PC and LR in same function in Android Kernel - android

I am facing a problem in which the PC1 and LR2 both are pointing with in the function cpuacct_charge() in the kernel's sched.c. Are there any scenario's in which this might happen? My analysis shows no recursion in the cpuacct_charge() function. I cannot provide the code. However, any scenario's when this happens would be a big help.
For Clarification : The value of PC and LR points to a different locations in function:
void cpuacct_charge(struct task_struct *tsk, u64 cputime)
Note 1: PC - Program Counter
Note 2: LR - Link Register

When a function returns it basically does a branch to the address in the link register.
So, presumably you've paused the program right after a function return.

Related

How to debug an Android stack trace with __cxa_rethrow

I am building an Android app which in the latest version has a lot of crash reports like the following one on google play dash. It consists of several libraries cross compiled with android-ndk.
Starting from frame #05 it halfway makes sense to me. What I wonder is how to go for the other half and what to make from the upper frames.
Trace:
#00 pc 0000000000083134 /apex/com.android.runtime/lib64/bionic/libc.so (abort+160)
#01 pc 000000000017cf00 /data/app/[...]==/lib/arm64/libqca-qt5_arm64-v8a.so
#02 pc 000000000017d070 /data/app/[...]==/lib/arm64/libqca-qt5_arm64-v8a.so
#03 pc 0000000000179f48 /data/app/[...]==/lib/arm64/libqca-qt5_arm64-v8a.so
#04 pc 0000000000179850 /data/app/[...]==/lib/arm64/libqca-qt5_arm64-v8a.so (__cxa_rethrow+196)
#05 pc 0000000000c0e10c /data/app/[...]==/lib/arm64/libqgis_core_arm64-v8a.so (QgsCoordinateTransform::transformInPlace(double&, double&, double&, QgsCoordinateTransform::TransformDirection) const+300)
#06 pc 00000000000340d8 /data/app/[...]==/lib/arm64/libqfield_qgsquick_arm64-v8a.so (QgsQuickCoordinateTransformer::updatePosition()+136)
#07 pc 0000000000034350 /data/app/[...]==/lib/arm64/libqfield_qgsquick_arm64-v8a.so (QgsQuickCoordinateTransformer::setDestinationCrs(QgsCoordinateReferenceSystem const&)+176)
#08 pc 0000000000028488 /data/app/[...]==/lib/arm64/libqfield_qgsquick_arm64-v8a.so
#09 pc 0000000000028a18 /data/app/[...]==/lib/arm64/libqfield_qgsquick_arm64-v8a.so (QgsQuickCoordinateTransformer::qt_metacall(QMetaObject::Call, int, void**)+316)
#10 pc 00000000002f36a8 /data/app/[...]==/lib/arm64/libQt5Qml_arm64-v8a.so (QV4::QQmlValueTypeWrapper::write(QObject*, int) const+180)
What I know: QgsCoordinateTransform::transformInPlace can throw a QgsCsException which is caught and handled inside updatePosition().
try
{
mCoordinateTransform.transformInPlace( x, y, z );
}
catch ( const QgsCsException &exp )
{
QgsDebugMsg( exp.what() );
}
Given that it's handled I'm not sure how that's related to a crash, nonetheless I think it could be interesting information.
What I can't make sense of is how libqca-qt5 comes into play, this is never used inside transformInPlace. Might it have some magic in place to handle unhandled exceptions (Can something be extracted from __cxa_rethrow)?
The only idea that comes to my mind is that it's not a QgsCsException but another (unhandled) exception that's raised and causes the crash. This would be an easy fix, but since I'm not able to reproduce this and all I have is this stack trace here and the only way to test is to ship a new apk and wait for reports to come in. This is a long roundtrip for feedback, so I'm very interested in either getting things right directly or at least improving the debug possibilities to fix it in two rounds.
So the question: what can be read from a stack trace like this and how to go about debugging this?
After having received feedback from a user about how exactly to reproduce this crash, it was possible to resolve this with the confidence of a positive test.
As assumed, it was a different exception raised by another library which was called from within transformInPlace().
Adding the following line fixed the crash and helped at least as a band aid.
try
{
mCoordinateTransform.transformInPlace( x, y, z );
}
catch ( const QgsCsException &exp )
{
QgsDebugMsg( exp.what() );
}
catch ( ... )
{
QgsDebugMsg( "Unknown exception caught" );
}
It is still unclear to me, how qca is involved in this stack trace. And if there is any code to handle unhandled exceptions or if it is just a random library defining some symbols or if there are other mechanisms at play. I assume __cxa_rethrow plays a role, after all it's related to exception handling. I'd still be happy if someone could shed some light. Meanwhile, for future readers, it's good to know that adding a catch all clause is a feasible approach here.

Android MSM kernel: copy_to_user fails

I'm writing a kernel driver for a Linux kernel running on Android devices (Nexus 5X).
I have a kernel buffer and I want to expose a device to read from it. I can read and write from the kernel buffer but I cannot write to the userspace buffer received from the read syscall. The very strange thing is that copy_to_user works only for less than 128 bytes... it makes no sense to me.
The code is the following ( truncated ):
static ssize_t dev_read(struct file *filep, char __user *buffer, size_t len, loff_t *offset){
unsigned long sent;
// ...
pr_err("MYLOGGER: copying from buffer: head=%d, tail=%d, cnt=%d, sent=%lu, access=%lu\n",
head, tail, cnt, sent,
access_ok(VERIFY_WRITE, buffer, sent));
if(sent >= 1) {
sent -= copy_to_user(buffer, mybuf + tail, sent);
pr_err("MYLOGGER: sent %lu bytes\n", sent);
// ...
}
// ...
}
The output is the following:
[ 56.476834] MYLOGGER: device opened
[ 56.476861] MYLOGGER: reading from buffer
[ 56.476872] MYLOGGER: copying from buffer: head=5666644, tail=0, cnt=5666644, sent=4096, access=1
[ 56.476882] MYLOGGER: sent 0 bytes
As you can see from the log sent is 4096, no integer overflow here.
When using dd I'm able to read up to 128 bytes per call ( dd if=/dev/mylog bs=128 ). I think that when using more than 128 bytes dd uses a buffer from the heap and the kernel cannot access it anymore, which is what I cannot understand.
I'm using copy_to_user from the read syscall handler, I've also printed the current->pid and it is the same process.
The kernel sources can be found from google android sources.
The function copy_to_user is defined at arch/arm64/include/asm/uaccess.h and the __copy_to_user can be found in arch/arm64/lib/copy_to_user.S.
Thank you for your time, I hope to get rid of this madness with your precious help.
-- EDIT --
I've wrote a small snippet to get the vm_area_struct of the destination userspace buffer and I print out the permissions, this is the result:
MYLOGGER: buffer belongs to vm_area with permissions rw-p
So that address should be writable...
-- EDIT --
I've written more debugging code, logging the state of the memory page used by the userspace buffer.
MYLOGGER: page=(0x7e3782d000-0x7e3782e000) present=1
Long story short it works when the page is present and will not cause a page fault. This is insanely weird, the page fault shall be managed by the virtual memory allocator that would load the page into the main memory...
For some reason, if the page is not present in memory the kernel will not fetch it.
My best guess is the __copy_to_user assembly function exception handler, which returns the number of uncopied bytes.
This exception handler is executed before the virtual memory page fault callback. Thus you won't be able to write to userspace unless the pages are already present in memory.
My current workaround is to preload those pages using get_user_pages.
I hope that this helps someone else :)
The problem was that I held a spin_lock.
copy_{to,from}_user shall never be called while holding a spin_lock.
Using a mutex solves the problem.
I feel so stupid to had wasted days on this...

ARM64 hooking: insert a jump in memory

I'm trying to hook on a 64 bit ARM Android device the SSL_do_handshake function available in libssl.so, until now I have:
address of libssl.so (found in proc//maps)
offset of SSL_do_handshake(found in libssl.so)
absolute address of SSL_do_handshake (address_of_libssl + offset_of_SSL_do_handshake)
What I'm doing is save the instructions at absolute address of libssl.so (for recovery) and overwrite them with my instructions (for jump):
LDR X9,#8
Br X9
address for jump 7f75a5b890
In detail what I'm saving in memory is 49000058 20011fd6 90b8a5757f if I look in memory after the overwriting I found my instructions and the address. If I try to execute it and call SSL_do_hanshake I have a crash before the jump.
I did the same for ARMv7 using thumb32 with the same approach but different instructions:
ldr pc [pc,#0]
address for jump (f36e240d)
In memory dff800f0 0d246ef3 and it works.
Any idea of what I'm doing wrong?

android crash on eglDestroyImageKHR using GraphicBuffer

I'm copying data to the GraphicBuffer using the following code:
uint8_t *ptr;
sp<GraphicBuffer> gBuffer = new GraphicBuffer(width,height,format,usage);
gBuffer->lock(GRALLOC_USAGE_SW_WRITE_OFTEN, (void**)(&ptr));
//Copy Data
gBuffer->unlock();
EGLClient clientBuffer = (EGLClientBuffer)gBuffer->getNativeBuffer();
EGLImageKHR img = eglCreateImageKHR(eglGetDisplay(EGL_DEFAULT_DISPLAY), EGL_NO_CONTEXT, EGL_NATIVE_BUFFER_ANDROID,clientBuffer, NULL);
glBindTexture(GL_TEXTURE_EXTERNAL_OES, textureHandle);
glEGLImageTargetTexture2DOES(GL_TEXTURE_EXTERNAL_OES, (GLeglImageOES)img);
//Finished using img, Crash Here:
eglDestroyImageKHR(eglGetDisplay(EGL_DEFAULT_DISPLAY), img);
And the problem comes when calling eglDestroyImageKHR which crashes in some devices and some others not. This is the backtrace:
00 pc 00006488 /system/lib/libui.so
01 pc 00006719 /system/lib/libui.so (android::GraphicBuffer::free_handle()+52)
02 pc 00006813 /system/lib/libui.so (android::GraphicBuffer::~GraphicBuffer()+22)
03 pc 00006841 /system/lib/libui.so (android::GraphicBuffer::~GraphicBuffer()+4)
04 pc 0000f823 /system/lib/libutils.so (android::RefBase::decStrong(void const*) const+40)
05 pc 00003bbb /system/vendor/lib/egl/eglsubAndroid.so
06 pc 0001b5f4 /system/vendor/lib/egl/libEGL_adreno.so (egliDoDestroyEGLImage+80)
07 pc 00006c88 /system/vendor/lib/egl/libEGL_adreno.so (eglDestroyImageKHR+16)
08 pc 0000e749 /system/lib/libEGL.so (eglDestroyImageKHR+44)
Here are a couple of complete backtraces:
http://pastebin.com/S0Ax6eNp
http://pastebin.com/bGWeWruw
Not calling eglDestroyImageKHR causes a leak and when calling again the above routine, gbuffer->lock() fails with an insufficient memory error message.
Crashes for example on a galaxy S4, galaxy s2 , xperia z1 and doesn't crash in a nexus 4, nexus 7, galaxy ace 2...etc
I would appreciate any help.
-EDITED-
The only workaround I have found is to decrease the reference counter to 0 so the GraphicBuffer destructor gets called and frees the memory.
if(gBuffer->getStrongCount() > 0){
gBuffer->decStrong(gBuffer->handle);
}
I had the same issue with EGL surfaces. Since 4.3 Samsung ROMs don't deactivate the active context and surface when destroying either one. The code now looks something like this:
// This line had to be added to prevent crashes:
mEgl.eglMakeCurrent(mEglDisplay, EGL10.EGL_NO_SURFACE, EGL10.EGL_NO_SURFACE, EGL10.EGL_NO_CONTEXT);
mEgl.eglDestroyContext(mEglDisplay, mEglContext);
mEgl.eglDestroySurface(mEglDisplay, mEglSurface);
The stack trace looked fairly similar. Have you tried destroying gBuffer before calling eglDestroyImageKHR?
FWIW, in the Mozilla AndroidGraphicBuffer.cpp code, the author writes:
/**
* XXX: eglDestroyImageKHR crashes sometimes due to refcount badness (I think)
*
* If you look at egl.cpp (https://github.com/android/platform_frameworks_base/blob/master/opengl/libagl/egl.cpp#L2002)
* you can see that eglCreateImageKHR just refs the native buffer, and eglDestroyImageKHR
* just unrefs it. Somehow the ref count gets messed up and things are already destroyed
* by the time eglDestroyImageKHR gets called. For now, at least, just not calling
* eglDestroyImageKHR should be fine since we do free the GraphicBuffer below.
*
* Bug 712716
*/
and essentially does not call eglDestroyImageKHR() which is apparently OK in that context. Bug report here.
James Willcox the author of the Mozilla code is also the author of the snorp blog post.

BX LR and ARM return from exception instructions (RFE and ERET)

In ARM architecture, returning from exception can be done in two ways which I know,(there might be others). But main logic is to modify PC, which will make processor trigger into mode set in CPSR.
So pop {...,pc} would make a switch to say user from supervisor
or mov pc,lr would do the same.
My Question is, would a BX lr make the switch ?
Assuming I am handling IRQ and I call a assembly routine say do_IRQ and then return from do_IRQ is via BX LR. Would this make the code after bl do_IRQ in my fault handler irrelevant ?
irq_fault_handler:
push {lr}
push {ro-r12}
mrs r0, spsr
push {r0}
bl do_IRQ
pop {r0}
msr cpsr, r0
pop {r0,-r12}
pop {lr}
subs pc, lr, #4
do_IRQ:
...
BX LR
It does not make it irrelevant since you called do_IRQ with bl instead of just a branch so you've already overwritten lr. Furthermore, even if you just did a branch, your stack would be messed up (you push r0-r12 but never pop them before returning).
Also, the code you show doesn't seem to return from the exception correctly either. Most of the time, when you return from an exception, you want to restore the program status registers as well as the registers which a mov pc, lr won't do nor will a bx lr.
Furthermore, the address contained in lr will actually be offsetted by a few words (it varies depending on the type of exception) so you're not returning to the correct instruction anyway. For IRQ, I believe it is off by 1 word.
The recommended way to return from an IRQ exception is a subs pc, lr, #4 instruction (after you've pop'd all your registers beforehand of course) which is special in that it also restores the CPSR.

Categories

Resources