I'm writing a kernel driver for the Linux kernel running on Android devices (Nexus 5X).
I have a kernel buffer and I want to expose a device to read from it. I can read and write the kernel buffer, but I cannot write to the userspace buffer received from the read syscall. The very strange thing is that copy_to_user works only for fewer than 128 bytes... it makes no sense to me.
The code is the following (truncated):
static ssize_t dev_read(struct file *filep, char __user *buffer, size_t len, loff_t *offset)
{
    unsigned long sent;
    // ...
    pr_err("MYLOGGER: copying from buffer: head=%d, tail=%d, cnt=%d, sent=%lu, access=%lu\n",
           head, tail, cnt, sent,
           access_ok(VERIFY_WRITE, buffer, sent));
    if (sent >= 1) {
        sent -= copy_to_user(buffer, mybuf + tail, sent);
        pr_err("MYLOGGER: sent %lu bytes\n", sent);
        // ...
    }
    // ...
}
The output is the following:
[ 56.476834] MYLOGGER: device opened
[ 56.476861] MYLOGGER: reading from buffer
[ 56.476872] MYLOGGER: copying from buffer: head=5666644, tail=0, cnt=5666644, sent=4096, access=1
[ 56.476882] MYLOGGER: sent 0 bytes
As you can see from the log, sent is 4096, so no integer overflow here.
When using dd I'm able to read up to 128 bytes per call (dd if=/dev/mylog bs=128). I think that when reading more than 128 bytes, dd uses a buffer from the heap, and the kernel can no longer access it, which is the part I cannot understand.
I'm calling copy_to_user from the read syscall handler, and I've also printed current->pid: it is the same process.
The kernel sources can be found in the Google Android sources.
The function copy_to_user is defined in arch/arm64/include/asm/uaccess.h, and __copy_to_user can be found in arch/arm64/lib/copy_to_user.S.
Thank you for your time, I hope to get rid of this madness with your precious help.
-- EDIT --
I've written a small snippet to get the vm_area_struct of the destination userspace buffer, and I print out its permissions; this is the result:
MYLOGGER: buffer belongs to vm_area with permissions rw-p
So that address should be writable...
-- EDIT --
I've written more debugging code, logging the state of the memory page used by the userspace buffer.
MYLOGGER: page=(0x7e3782d000-0x7e3782e000) present=1
Long story short: it works when the page is present and will not cause a page fault. This is insanely weird; a page fault should simply be handled by the virtual memory subsystem, which would load the page into main memory...
For some reason, if the page is not present in memory, the kernel will not fetch it.
My best guess is the __copy_to_user assembly function's exception handler, which returns the number of uncopied bytes.
This exception handler is executed before the virtual memory page-fault handling runs. Thus you won't be able to write to userspace unless the pages are already present in memory.
My current workaround is to preload those pages using get_user_pages.
I hope that this helps someone else :)
The problem was that I held a spin_lock.
copy_{to,from}_user must never be called while holding a spin_lock: it may sleep to fault in userspace pages, and with a spinlock held the kernel is in atomic context, where the page-fault handler refuses to bring pages in; __copy_to_user's exception path then simply returns the number of uncopied bytes. That is exactly why it only worked when the page was already present.
Using a mutex, which is allowed to sleep, solves the problem.
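For reference, a minimal sketch of the working pattern, reusing the mybuf/tail/cnt bookkeeping from the snippet above (mydev_lock and MYBUF_SIZE are hypothetical names; ring-buffer wrap-around and the rest of the driver are elided):

#include <linux/fs.h>
#include <linux/kernel.h> /* for min() */
#include <linux/mutex.h>
#include <linux/uaccess.h>

#define MYBUF_SIZE 8192 /* illustrative */
static char mybuf[MYBUF_SIZE];
static size_t tail, cnt; /* buffer state, as in the question */

static DEFINE_MUTEX(mydev_lock); /* replaces the old spinlock */

static ssize_t dev_read(struct file *filep, char __user *buffer,
                        size_t len, loff_t *offset)
{
    size_t sent;

    mutex_lock(&mydev_lock);
    sent = min(len, cnt); /* bytes available, capped at the user's count */
    /* copy_to_user() may sleep to fault in userspace pages; sleeping
     * is legal under a mutex, but forbidden while holding a spinlock */
    if (copy_to_user(buffer, mybuf + tail, sent)) {
        mutex_unlock(&mydev_lock);
        return -EFAULT;
    }
    tail += sent; /* real code would wrap the ring indices */
    cnt  -= sent;
    mutex_unlock(&mydev_lock);
    return sent;
}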
I feel so stupid to have wasted days on this...
I have a Flutter app which uses Dart FFI to connect to my custom C++ audio backend. There I allocate around 10 MB of total memory for my audio buffers. Each buffer gets 10 MB / 84 of memory; I use 84 audio players. Here is the FFI flow:
C++ bridge:
extern "C" __attribute__((visibility("default"))) __attribute__((used))
void *
loadMedia(char *filePath, int8_t *mediaLoadPointer, int64_t *currentPositionPtr, int8_t *mediaID) {
LOGD("loadMedia %s", filePath);
if (soundEngine == nullptr) {
soundEngine = new SoundEngine();
}
return soundEngine->loadMedia(filePath, mediaLoadPointer, currentPositionPtr, mediaID);
}
In my sound engine I launch a C++ thread:
void loadMedia() {
    std::thread{startDecoderWorker,
                buffer}.detach();
}

void startDecoderWorker(float *buffer) {
    buffer = new float[30000]; // 30000 might be wrong here; I entered a huge value just to showcase the problem. The 10MB / 84 calculation is redundant to the code.
}
So here is the problem: I don't know why, but when I allocate memory with the new keyword, even inside a C++ thread, Flutter's raster thread janks and my Flutter UI drops lots of frames. This also shows in the performance overlay, which goes all red for 3 to 5 frames, each taking around 30-40 ms. Tested in profile mode.
Here is how I came to this conclusion:
If I return instantly from startDecoderWorker without running the allocation code, there is zero jank. Everything is a smooth 60 fps, and the performance overlay doesn't show red bars.
The actual cause, after discussion in the comments on the question, is not that the memory allocation is too slow; it lies elsewhere: the calculations, which get heavy when the allocation is big.
For details, please refer to the comments and discussion on the question ;)
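Not from the discussion itself, but to illustrate the distinction it draws, here is a standalone hedged sketch (plain C, all numbers illustrative) separating the cost of the bare allocation from the cost of one pass of per-sample work over the same ~10 MB:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

int main(void) {
    size_t n = 10 * 1024 * 1024 / sizeof(float); /* ~10 MB, as in the question */

    double t0 = now_ms();
    float *buf = malloc(n * sizeof(float)); /* the allocation itself: typically cheap */
    double t1 = now_ms();

    for (size_t i = 0; i < n; i++) /* first pass of per-sample work: the expensive part */
        buf[i] = (float)i * 0.5f;
    double t2 = now_ms();

    printf("alloc: %.3f ms, fill: %.3f ms\n", t1 - t0, t2 - t1);
    free(buf);
    return 0;
}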
I've been using a few Android apps that hook onto another process, scan its allocated memory and edit it. Obviously, I was using it to mess around with some games.
Then, it got me thinking, "How are they doing it?"
I know how to get the list of currently running apps, but hooking onto another process and scanning and editing that process's memory are... beyond my knowledge.
It seems that I'd need some kind of "root" privileges to execute code like that, but I don't mind. I just want to know how these app developers did it, to sate my curiosity.
So..
Assuming root privileges are enabled..
1) How can I hook onto a currently running different app?
2) How can I scan its memory regions?
3) How can I edit its memory regions?
inb4 "Have you tried googling?"
I thought about it and did a ton of Googling (1+ hours), but got no results, because the words "RAM" and "memory" just give me things like how to track the current app's memory allocations and whatnot. In other words, not what I am looking for.
So, I finally turned to opening a thread here.
Putting this here for posterity
After a fair bit of research (read: 5 days straight), as far as Linux is concerned, one may attach to a process, read its memory, and detach by simply doing this:
Heavily commented for the newbies like me; strip the comments and tweak as you like if you're better.
#include <sys/ptrace.h> //For ptrace()
#include <sys/wait.h> //For waitpid()
int main () {
int pid = 1337; //The process id you wish to attach to
int address = 0x13371337; //The address you wish to read in the process
//First, attach to the process
//All ptrace() operations that fail return -1, the exceptions are
//PTRACE_PEEK* operations
if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1) {
//Read the value of errno for details.
//To get a human readable, call strerror()
//strerror(errno) <-- Returns a human readable version of the
//error that occurred
return 0;
}
//Now, attaching doesn't mean we can read the value straight away
//We have to wait for the process to stop
int status;
//waitpid() returns -1 on failure
//W.I.F, not W.T.F
//WIFSTOPPED() returns true if the process was stopped when we attached to it
if (waitpid(pid, &status, 0) == -1 || !WIFSTOPPED(status)) {
//Failed, read the value of errno or strerror(errno)
return 0;
}
errno = 0; //Set errno to zero
//We are about to perform a PTRACE_PEEK* operation, it is possible that the value
//we read at the address is -1, if so, ptrace() will return -1 EVEN THOUGH it succeeded!
//This is why we need to 'clear' the value of errno.
int value = ptrace(PTRACE_PEEKDATA, pid, (void*)addr, NULL);
if (value == -1 && errno != 0) {
//Failed, read the value of errno or strerror(errno)
return 0;
} else {
//Success! Read the value
}
//Now, we have to detach from the process
ptrace(PTRACE_DETACH, pid, NULL, NULL);
return 0;
}
References:
http://linux.die.net/man/2/ptrace
http://linux.die.net/man/2/waitpid
How does this relate to editing Android app memory values?
Well, the headers for ptrace and wait exist in the Android NDK. So, to read/write another app's RAM, you will need native code in your app.
Also, ptrace() requires root privileges.
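The snippet above only covers reading. For question 3 (editing), the write side is the mirror image via PTRACE_POKEDATA, which patches one machine word at a time. A hedged sketch, assuming the target is already attached and stopped exactly as above (write_word is my own name, not a real API):

#include <sys/ptrace.h>
#include <errno.h>

//Assumes 'pid' is already attached and stopped, as in the read example
int write_word(int pid, long address, long newval) {
    //PTRACE_POKEDATA writes exactly one machine word at 'address';
    //larger edits are done word by word
    if (ptrace(PTRACE_POKEDATA, pid, (void *)address, (void *)newval) == -1) {
        //Failed; check errno / strerror(errno)
        return -1;
    }
    return 0;
}

For question 2 (scanning), the usual approach is to parse /proc/<pid>/maps to enumerate readable regions, then walk them with PTRACE_PEEKDATA (or read /proc/<pid>/mem directly) looking for the value of interest.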
Why did it take you this long?
I've never written this kind of code before.
As far as Linux is concerned, the kernel forbids a process from modifying memory that belongs to other processes (by the way, this is part of why viruses are rare on Linux).
What you are actually editing is Shared Preferences. They are written in plain text, and that means they can be edited if you have access to them (root).
You can check out the CheatDroid application on the Play Store. Also, if you want to develop a similar app yourself, you can check this link to create your first root app: http://www.xda-developers.com/android/how-to-build-an-android-app-part-2-writing-a-root-app-xda-tv/
I'm trying to write a service that communicates with a USB device using USB interrupt transfers. Basically, I'm blocking on UsbConnection.requestWait() in a thread to wait for interrupt transfers in, then passing those to the activity using an intent.
I seem to be having problems when the USB device sends me a largish number of interrupt packets in a row (about 50). It sometimes works, but usually the app crashes with a message of this flavor:
02-23 01:55:53.387: A/libc(8460): ### ABORTING: heap corruption detected by tmalloc_small
02-23 01:55:53.387: A/libc(8460): Fatal signal 11 (SIGSEGV) at 0xdeadbaad (code=1), thread 8460 (pf.mustangtamer)
It's not always a malloc call that fails; I have seen several flavors of malloc (dlmalloc, malloc_small) as well as dlfree. In every instance I get a fatal signal 11 and a reference to 0xdeadbaad, so somehow I am corrupting the heap.
It's not obvious from the heap dump what is causing the corruption.
Here is what I believe is the offending code (the problem only occurs when receiving many packets back to back):
private class ReceiverThread extends Thread {
    public ReceiverThread(String string) {
        super(string);
    }

    public void run() {
        ByteBuffer buffer = ByteBuffer.allocate(BUFFER_SIZE);
        buffer.clear();
        UsbRequest inRequest = new UsbRequest();
        inRequest.initialize(mUsbConnection, mUsbEndpointIn);
        while (mUsbDevice != null) {
            if (inRequest.queue(buffer, BUFFER_SIZE) == true) {
                // mUsbConnection.requestWait() is blocking
                if (mUsbConnection.requestWait() == inRequest) {
                    buffer.flip();
                    byte[] bytes = new byte[buffer.remaining()];
                    buffer.get(bytes);
                    //TODO: use explicit intent, not broadcast
                    Intent intent = new Intent(RECEIVED_INTENT);
                    intent.putExtra(DATA_EXTRA, bytes);
                    sendBroadcast(intent);
                } else {
                    Log.d(TAG, "mConnection.requestWait() returned for a different request (likely a send operation)");
                }
            } else {
                Log.e(TAG, "failed to queue USB request");
            }
            buffer.clear();
        }
        Log.d(TAG, "RX thread terminating.");
    }
}
Right now the activity is not consuming the intents; I'm trying to get the USB communication to stop crashing before I implement that side.
I'm not seeing how the code above can corrupt the heap, except possibly through some non-thread-safe behavior. Only one request is queued at a time, so I think "buffer" is safe.
My target is a tablet running JB 4.3.1 if that makes a difference.
I'm not seeing anything wrong with this either. You may want to try removing code from your loop and seeing if it still corrupts the heap, to help you zero in on the offending area.
Remember that heap operations are usually delayed; the garbage collector doesn't run immediately, so you could be corrupting the heap somewhere else, and it only shows up in this loop because it is very heap-intensive.
Try using a larger heap size by setting android:largeHeap="true" in your application manifest.
I would have asked these questions in a comment, but alas, not enough rep.
I see nothing directly wrong with the code above, but I would check the following:
What is BUFFER_SIZE? Crazily, I've had very strange problems with UsbRequest.queue() for sizes greater than 15 KB. I'm pretty sure this wouldn't cause your heap corruption, but it could lead to weirdness later. I had to break my requests into multiple calls to queue() to do large reads.
Are you using a bulk USB endpoint? I don't know what your application is, so I can't say for sure whether you should be using a bulk endpoint or not, but it's the type of endpoint intended for large transfers.
Lastly, when I encountered this 0xdeadbaad problem (detected by tmalloc_large), it had nothing to do with the code I thought was at fault (the code near the malloc); it was, of course, a threading issue in which I had JNI native code reading/writing the same buffers on multiple separate threads. It only gets detected when malloc is called, as user3343927 mentioned.
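For what it's worth, the fix for that kind of bug is to serialize access to any buffer shared across native threads. A generic hedged sketch of the pattern (names and sizes are illustrative, not from the question):

#include <pthread.h>
#include <string.h>

/* Serialize access to a buffer shared between native threads
 * (e.g., JNI callbacks) to avoid the concurrent read/write that
 * shows up later as heap corruption. */
static pthread_mutex_t buf_lock = PTHREAD_MUTEX_INITIALIZER;
static char shared_buf[4096];

void write_packet(const char *data, size_t len) {
    pthread_mutex_lock(&buf_lock);
    if (len > sizeof(shared_buf))
        len = sizeof(shared_buf); /* never write past the buffer */
    memcpy(shared_buf, data, len);
    pthread_mutex_unlock(&buf_lock);
}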
I am trying to capture video on Android using V4L2 under JNI. I found a guide and followed these steps:
fd = open("/dev/video0", O_RDWR);
/* init part */
ioctl(fd, VIDIOC_QUERYCAP, &caps);
ioctl(fd, VIDIOC_ENUM_FMT, &fmtdesc);
ioctl(fd, VIDIOC_S_FMT, &fmt);
ioctl(fd, VIDIOC_REQBUFS, &req);
ioctl(fd, VIDIOC_QUERYBUF, &buf);
ioctl(fd, VIDIOC_QBUF, &buf);
/* capture part */
FILE *fp = fopen("/sdcard/img.yuv", "wb");
for (i = 0; i < 20; i++)
{
ioctl(fd, VIDIOC_DQBUF, &buf);
fwrite(buffers[buf.index].start, 1, buf.bytesused, fp);
ioctl(fd, VIDIOC_QBUF, &buf);
}
fclose(fp);
This is the main structure of my code. All the functions run correctly and return 0. However, when I open the output file with a binary viewer, I find that all the data is 0.
Is there any problem with my code? I'm confused because all the functions returned 0.
Thanks!!
You are using an array called buffers[], but I can't see where it's declared or what it stands for. If there is no code missing above, you will always get zeros, because you are writing buffers[] to the file and not the data you get from V4L2.
Furthermore, the initial values of caps, fmtdesc, fmt, req and buf prior to the ioctl commands would be interesting too. Depending on their initial values, you will get different communication interfaces; issues could be hidden in these parts.
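For comparison, here is a hedged sketch of the steps that most often go missing between VIDIOC_QUERYBUF and the capture loop: mmap()ing each driver buffer into buffers[] and switching streaming on. Without the mmap, buffers[i].start never points at the captured data (fd and buffers[] are assumed to be your file descriptor and bookkeeping array; one buffer shown):

#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/videodev2.h>

static int map_first_buffer_and_stream_on(int fd)
{
    struct v4l2_buffer buf;
    memset(&buf, 0, sizeof(buf));
    buf.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf.memory = V4L2_MEMORY_MMAP;
    buf.index  = 0;
    if (ioctl(fd, VIDIOC_QUERYBUF, &buf) != 0)
        return -1;

    /* map the driver's buffer into our address space; without this,
     * buffers[0].start points at memory the driver never writes */
    buffers[0].length = buf.length;
    buffers[0].start  = mmap(NULL, buf.length, PROT_READ | PROT_WRITE,
                             MAP_SHARED, fd, buf.m.offset);
    if (buffers[0].start == MAP_FAILED)
        return -1;

    if (ioctl(fd, VIDIOC_QBUF, &buf) != 0)
        return -1;

    /* streaming must be switched on before VIDIOC_DQBUF returns frames */
    enum v4l2_buf_type type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    return ioctl(fd, VIDIOC_STREAMON, &type);
}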
As you wrote in your question, all the ioctl commands return 0, so there should be no error if everything behaves as expected. Another way to check for issues is calling
perror("<your comment or hint to the line above>");
after each ioctl command. This prints more information about any error to stderr. (More details about perror can be found in this thread: When should I use perror("...") and fprintf(stderr, "...")?)
Are you trying to get the images from the camera? (On some phones, the video0 you used above is the back cam.) On some Android devices the camera has to be started by a complex procedure using other device drivers besides videoXY, and trying to get images from video0 while the official camera app is running might be difficult. The official V4L2 API says:
V4L2 drivers should not support multiple applications reading or writing the same data stream on a device by copying buffers, time multiplexing or similar means. This is better handled by a proxy application in user space.
From: http://linuxtv.org/downloads/v4l-dvb-apis/common.html#idp18553208
Can you post more (detailed) code? I might be able to help, as I'm doing very similar stuff.
To be able to reproduce it, it would also be very interesting to know which Android device you are working with (type / model number / Android version).
We (http://www.mosync.com) have compiled our ARM recompiler with the Android NDK; it takes our internal byte code and generates ARM machine code. When executing recompiled code we see an enormous increase in performance, with one small exception: we can't use any Java Bitmap operations.
The native system uses a function which takes care of all the calls to the Java side that the recompiled code makes. On the Java (Dalvik) side we then have bindings to Android features. There are no problems while recompiling the code or when executing the machine code, and the exact same source code works on Symbian and Windows Mobile 6.x, so the recompiler seems to generate correct ARM machine code.
Like I said, the problem is that we can't use Java Bitmap objects. We have verified that the parameters sent from the Java code are correct, and we have tried following the execution down into Android's own JNI system. The problem is that we get an UnsupportedOperationException with "size must fit in 32 bits.". The problem seems consistent from Android 1.5 to 2.3; we haven't tried the recompiler on any Android 3 devices.
Is this a bug which other people have encountered? I guess other developers have done similar things.
I found the message in dalvik_system_VMRuntime.c:
/*
 * public native boolean trackExternalAllocation(long size)
 *
 * Asks the VM if <size> bytes can be allocated in an external heap.
 * This information may be used to limit the amount of memory available
 * to Dalvik threads. Returns false if the VM would rather that the caller
 * did not allocate that much memory. If the call returns false, the VM
 * will not update its internal counts.
 */
static void Dalvik_dalvik_system_VMRuntime_trackExternalAllocation(
    const u4* args, JValue* pResult)
{
    s8 longSize = GET_ARG_LONG(args, 1);

    /* Fit in 32 bits. */
    if (longSize < 0) {
        dvmThrowException("Ljava/lang/IllegalArgumentException;",
            "size must be positive");
        RETURN_VOID();
    } else if (longSize > INT_MAX) {
        dvmThrowException("Ljava/lang/UnsupportedOperationException;",
            "size must fit in 32 bits");
        RETURN_VOID();
    }

    RETURN_BOOLEAN(dvmTrackExternalAllocation((size_t)longSize));
}
This method is called, for example, from GraphicsJNI::setJavaPixelRef:
size_t size = size64.get32();
jlong jsize = size;  // the VM wants longs for the size
if (reportSizeToVM) {
    // SkDebugf("-------------- inform VM we've allocated %d bytes\n", size);
    bool r = env->CallBooleanMethod(gVMRuntime_singleton,
                                    gVMRuntime_trackExternalAllocationMethodID,
                                    jsize);
I would say it seems that the code you're calling is trying to allocate too large a size. If you show the actual Java call which fails, and the values of all the arguments you pass to it, it might be easier to find the reason.
I managed to find a workaround: when I wrap all the Bitmap.createBitmap calls inside Activity.runOnUiThread(), it works.