Android NDK: Why is this malloc() having no observable effect?

Android NDK: Why is this malloc() having no observable effect? - android

Here's a simplified version of the code I'm using
Java:
private native void malloc(int bytes);
private native void free();
// this is called when I want to create a very large buffer in native memory
malloc(32 * 1024 * 1024);
// EDIT: after allocating, we need to initialize it before Android sees it as anythign other than a "reservation"
memset(blob, '\0', sizeof(char) * bytes);
...
// and when I'm done, I call this
free()
C:
static char* blob = NULL;
void Java_com_example_MyClass_malloc(JNIEnv * env, jobject this, jint bytes)
{
blob = (char*) malloc(sizeof(char) * bytes);
if (NULL == blob) {
__android_log_print(ANDROID_LOG_DEBUG, DEBUG_TAG, "Failed to allocate memory\n");
} else {
char m[50];
sprintf(m, "Allocated %d bytes", sizeof(char) * bytes);
__android_log_print(ANDROID_LOG_DEBUG, DEBUG_TAG, m);
}
}
void Java_com_example_MyClass_free(JNIEnv * env, jobject this)
{
free(blob);
blob = NULL;
}
Now when I call malloc() from MyClass.java, I would expect to see 32M of memory allocated and that I would be able to observe this drop in available memory somewhere.
I haven't seen any indication of that however, either in adb shell dumpsys meminfo or adb shell cat /proc/meminfo. I am pretty new to C, but have a bunch of Java experience. I'm looking to allocate a bunch of memory outside of Dalvik's heap (so it's not managed by Android/dalvik) for testing purposes. Hackbod has led me to believe that Android currently does not place restrictions on the amount of memory allocated in Native code, so this seems to be the correct approach. Am I doing this right?

You should see an increase in "private / dirty" pages after the memset(). If you have the extra developer command-line utilities installed on the device, you can run procrank or showmap <pid> to see this easily. Requires a rooted device.
Failing that, have the process copy the contents of /proc/self/maps to a file before and after the allocation. (Easiest is to write it to external storage; you'll need the WRITE_EXTERNAL_STORAGE permission in your manifest.) By comparing the map output you should either see a new 32MB region, or an existing region expanding by 32MB. This works because 32MB is above dlmalloc's internal-heap threshold, so the memory should be allocated using a call to mmap().
There is no fixed limit on the amount of memory you can allocate from native code. However, the more you allocate, the tastier you look to the kernel's low-memory process killer.

Related

Flutter C++ Memory allocation causes jank on raster thread - Android NDK Dart FFI

I have a flutter app which uses Dart ffi to connect to my custom C++ audio backend. There I allocate around 10MB of total memory for my audio buffers. Each buffer has 10MB / 84 of memory. I use 84 audio players. Here is the ffi flow:
C++ bridge:
extern "C" __attribute__((visibility("default"))) __attribute__((used))
void *
loadMedia(char *filePath, int8_t *mediaLoadPointer, int64_t *currentPositionPtr, int8_t *mediaID) {
LOGD("loadMedia %s", filePath);
if (soundEngine == nullptr) {
soundEngine = new SoundEngine();
}
return soundEngine->loadMedia(filePath, mediaLoadPointer, currentPositionPtr, mediaID);
}
In my sound engine I launch a C++ thread:
void loadMedia(){
std::thread{startDecoderWorker,
buffer,
}.detach();
}
void startDecoderWorker(float*buffer){
buffer = new float[30000]; // 30000 might be wrong here, I entered a huge value to just showcase the problem, the calculation of 10MB / 84 code is redundant to the code
}
So here is the problem, I dont know why but when I allocate memory with new keyword even inside a C++ thread, flutters raster thread janks and I can see that my flutter UI janks lots of frames. This is also present in performance overlay as it goes all red for 3 to 5 frames with each of it taking around 30 40ms. Tested on profile mode.
Here is how I came to this conclusion:
If I instantly return from my startDecoderWorker without running new memory allocation code, when I do this there is 0 jank. Everything is smooth 60fps, performance overlay doesnt show me red bars.
Here are some screenshots from Profile mode:

The actual cause, after discussions (in the comments of the question), is not because the memory allocation is too slow, but lie somewhere else - the calculations which will be heavy if the allocation is big.
For details, please refer to the comments and discussions of the question ;)

Shared memory between NDK and SDK below API Level 26

Library written in c++ produces continuous stream of data and same has to be ported on different platforms. Now integrating the lib to android application, I am trying to create shared memory between NDK and SDK.
Below is working snippet,
Native code:
#include <jni.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <linux/ashmem.h>
#include <android/log.h>
#include <string>
char *buffer;
constexpr size_t BufferSize=100;
extern "C" JNIEXPORT jobject JNICALL
Java_test_com_myapplication_MainActivity_getSharedBufferJNI(
JNIEnv* env,
jobject /* this */) {
int fd = open("/dev/ashmem", O_RDWR);
ioctl(fd, ASHMEM_SET_NAME, "shared_memory");
ioctl(fd, ASHMEM_SET_SIZE, BufferSize);
buffer = (char*) mmap(NULL, BufferSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
return (env->NewDirectByteBuffer(buffer, BufferSize));
}
extern "C" JNIEXPORT void JNICALL
Java_test_com_myapplication_MainActivity_TestBufferCopy(
JNIEnv* env,
jobject /* this */) {
for(size_t i=0;i<BufferSize;i = i+2) {
__android_log_print(ANDROID_LOG_INFO, "native_log", "Count %d value:%d", i,buffer[i]);
}
//pass `buffer` to dynamically loaded library to update share memory
//
}
SDK code:
//MainActivity.java
public class MainActivity extends AppCompatActivity {
// Used to load the 'native-lib' library on application startup.
static {
System.loadLibrary("native-lib");
}
final int BufferSize = 100;
#RequiresApi(api = Build.VERSION_CODES.Q)
#Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);
ByteBuffer byteBuffer = getSharedBufferJNI();
//update the command to shared memory here
//byteBuffer updated with commands
//Call JNI to inform update and get the response
TestBufferCopy();
}
/**
* A native method that is implemented by the 'native-lib' native library,
* which is packaged with this application.
*/
public native ByteBuffer getSharedBufferJNI();
public native int TestBufferCopy();
}
Question:
Accessing primitive arrays from Java to native is reference only if garbage collector supports pinning. Is it true for other way around ?
Is it guaranteed by android platform that ALWAYS reference is shared from NDK to SDK without redundant copy?
Is it the right way to share memory?

You only need /dev/ashmem to share memory between processes. NDK and SDK (Java/Kotlin) work in same Linux process and have full access to same memory space.
The usual way to define memory that can be used both from C++ and Java is by creating a Direct ByteBuffer. You don't need JNI for that, Java API has ByteBuffer.allocateDirect(int capacity). If it's more natural for your logical flow to allocate the buffer on the C++ side, JNI has the NewDirectByteBuffer(JNIEnv* env, void* address, jlong capacity) function that you used in your question.
Working with Direct ByteBuffer is very easy on the C++ side, but not so efficient on the JVM side. The reason is that this buffer is not backed by array, and the only API you have involves ByteBuffer.get() with typed variations (getting byte array, char, int, …). You have control of current position in the buffer, but working this way requires certain discipline: every get() operation updates the current position. Also, random access to this buffer is rather slow, because it involves calling both positioning and get APIs. Therefore, in some cases of non-trivial data structures, it may be easier to write your custom access code in C++ and have 'intelligent' getters called through JNI.
It's important not to forget to set ByteBuffer.order(ByteOrder.nativeOrder()). The order of a newly-created byte buffer is counterintuitively BIG_ENDIAN. This applies both to buffer created from Java and from C++.
If you can isolate the instances when C++ needs access to such shared memory, and don't really need it to be pinned all the time, it's worth to consider working with byte array. In Java, you have more efficient random access. On the NDK side, you will call GetByteArrayElements() or GetPrimitiveArrayCritical(). The latter is more efficient, but its use imposes restrictions on what Java functions you can call until the array is released. On Android, both methods don't involve memory allocation and copy (with no official guarantee, though). Even though C++ side uses the same memory as Java, your JNI code must call the appropriate Release…() function, and better do that as early as possible. It's a good practice to handle this Get/Release via RAII.

Let me summarize my findings,
Accessing primitive arrays from Java to native is reference only if garbage collector supports pinning. Is it true for other way around ?
The contents of a direct buffer can, potentially, reside in native memory outside of the ordinary garbage-collected heap. And hence garbage collector can't claim the memory.
Is it guaranteed by android platform that ALWAYS reference is shared from NDK to SDK without redundant copy?
Yes, As per documentation of NewDirectByteBuffer.
jobject NewDirectByteBuffer(JNIEnv* env, void* address, jlong capacity);
Allocates and returns a direct java.nio.ByteBuffer referring to the block of memory starting at the memory address address and extending capacity bytes.

Android MSM kernel: copy_to_user fails

I'm writing a kernel driver for a Linux kernel running on Android devices (Nexus 5X).
I have a kernel buffer and I want to expose a device to read from it. I can read and write from the kernel buffer but I cannot write to the userspace buffer received from the read syscall. The very strange thing is that copy_to_user works only for less than 128 bytes... it makes no sense to me.
The code is the following ( truncated ):
static ssize_t dev_read(struct file *filep, char __user *buffer, size_t len, loff_t *offset){
unsigned long sent;
// ...
pr_err("MYLOGGER: copying from buffer: head=%d, tail=%d, cnt=%d, sent=%lu, access=%lu\n",
head, tail, cnt, sent,
access_ok(VERIFY_WRITE, buffer, sent));
if(sent >= 1) {
sent -= copy_to_user(buffer, mybuf + tail, sent);
pr_err("MYLOGGER: sent %lu bytes\n", sent);
// ...
}
// ...
}
The output is the following:
[ 56.476834] MYLOGGER: device opened
[ 56.476861] MYLOGGER: reading from buffer
[ 56.476872] MYLOGGER: copying from buffer: head=5666644, tail=0, cnt=5666644, sent=4096, access=1
[ 56.476882] MYLOGGER: sent 0 bytes
As you can see from the log sent is 4096, no integer overflow here.
When using dd I'm able to read up to 128 bytes per call ( dd if=/dev/mylog bs=128 ). I think that when using more than 128 bytes dd uses a buffer from the heap and the kernel cannot access it anymore, which is what I cannot understand.
I'm using copy_to_user from the read syscall handler, I've also printed the current->pid and it is the same process.
The kernel sources can be found from google android sources.
The function copy_to_user is defined at arch/arm64/include/asm/uaccess.h and the __copy_to_user can be found in arch/arm64/lib/copy_to_user.S.
Thank you for your time, I hope to get rid of this madness with your precious help.
-- EDIT --
I've wrote a small snippet to get the vm_area_struct of the destination userspace buffer and I print out the permissions, this is the result:
MYLOGGER: buffer belongs to vm_area with permissions rw-p
So that address should be writable...
-- EDIT --
I've written more debugging code, logging the state of the memory page used by the userspace buffer.
MYLOGGER: page=(0x7e3782d000-0x7e3782e000) present=1
Long story short it works when the page is present and will not cause a page fault. This is insanely weird, the page fault shall be managed by the virtual memory allocator that would load the page into the main memory...

For some reason, if the page is not present in memory the kernel will not fetch it.
My best guess is the __copy_to_user assembly function exception handler, which returns the number of uncopied bytes.
This exception handler is executed before the virtual memory page fault callback. Thus you won't be able to write to userspace unless the pages are already present in memory.
My current workaround is to preload those pages using get_user_pages.
I hope that this helps someone else :)

The problem was that I held a spin_lock.
copy_{to,from}_user shall never be called while holding a spin_lock.
Using a mutex solves the problem.
I feel so stupid to had wasted days on this...

setprop libc.debug.malloc = 1 is not working

I tried to use setprop libc.debug.malloc = 1 to find out leak.
I made an demo program and introduced memory leak in that but the above flag is not able to detect this leak.
I tried below commands:
adb shell setprop libc.debug.malloc 1
adb shell stop
adb shell start
jstring Java_com_example_hellojni_HelloJni_stringFromJNI(JNIEnv* env,
jobject thiz) {
int *p = malloc(sizeof(int));
p[1] = 100;
return (*env)->NewStringUTF(env, "Hello from JNI !");
}
Any help would be appreciated.
Thanks

libc.debug.malloc is not valgrind. It tracks native heap allocations, but doesn't really detect leaks directly. It works best in conjuction with DDMS; see this answer for information about using it for native leak chasing (and maybe this older answer).
(Note you can use valgrind on recent versions of Android, but getting it set up can be an adventure.)
FWIW, different levels of libc.debug.malloc are reasonably good at finding use-after-free and buffer overruns:
/* 1 - For memory leak detections.
* 5 - For filling allocated / freed memory with patterns defined by
* CHK_SENTINEL_VALUE, and CHK_FILL_FREE macros.
* 10 - For adding pre-, and post- allocation stubs in order to detect
* buffer overruns.
For example, if you set libc.debug.malloc = 10 and add a free() call to your example above, you'll likely get a warning message from the library because you set p[1] rather than p[0].

Running generated ARM machine code on Android gives UnsupportedOperationException with Java Bitmap objects

We ( http://www.mosync.com ) have compiled our ARM recompiler with the Android NDK which takes our internal byte code and generates ARM machine code. When executing recompiled code we see an enormous increase in performance, with one small exception, we can't use any Java Bitmap operations.
The native system uses a function which takes care of all the calls to the Java side which the recompiled code is calling. On the Java (Dalvik) side we then have bindings to Android features. There are no problems while recompiling the code or when executing the machine code. The exact same source code works on Symbian and Windows Mobile 6.x so the recompiler seems to generate correct ARM machine code.
Like I said, the problem we have is that we can't use Java Bitmap objects. We have verified that the parameters which are sent from the Java code is correct, and we have tried following the execution down in Android's own JNI systems. The problem is that we get an UnsupportedOperationException with "size must fit in 32 bits.". The problem seems consistent on Android 1.5 to 2.3. We haven't tried the recompiler on any Android 3 devices.
Is this a bug which other people have encountered, I guess other developers have done similar things.

I found the message in dalvik_system_VMRuntime.c:
/*
* public native boolean trackExternalAllocation(long size)
*
* Asks the VM if <size> bytes can be allocated in an external heap.
* This information may be used to limit the amount of memory available
* to Dalvik threads. Returns false if the VM would rather that the caller
* did not allocate that much memory. If the call returns false, the VM
* will not update its internal counts.
*/
static void Dalvik_dalvik_system_VMRuntime_trackExternalAllocation(
const u4* args, JValue* pResult)
{
s8 longSize = GET_ARG_LONG(args, 1);
/* Fit in 32 bits. */
if (longSize < 0) {
dvmThrowException("Ljava/lang/IllegalArgumentException;",
"size must be positive");
RETURN_VOID();
} else if (longSize > INT_MAX) {
dvmThrowException("Ljava/lang/UnsupportedOperationException;",
"size must fit in 32 bits");
RETURN_VOID();
}
RETURN_BOOLEAN(dvmTrackExternalAllocation((size_t)longSize));
}
This method is called, for example, from GraphicsJNI::setJavaPixelRef:
size_t size = size64.get32();
jlong jsize = size; // the VM wants longs for the size
if (reportSizeToVM) {
// SkDebugf("-------------- inform VM we've allocated %d bytes\n", size);
bool r = env->CallBooleanMethod(gVMRuntime_singleton,
gVMRuntime_trackExternalAllocationMethodID,
jsize);
I would say it seems that the code you're calling is trying to allocate a too big size. If you show the actual Java call which fails and values of all the arguments that you pass to it, it might be easier to find the reason.

I managed to find a work-around. When I wrap all the Bitmap.createBitmap calls inside a Activity.runOnUiThread() It works.

Develop Reference

The Android operating system is a mobile operating system that was developed by Google (GOOGL?) to be primarily used for touchscreen devices, cell phones, and tablets.