Android crash on eglDestroyImageKHR using GraphicBuffer

I'm copying data to the GraphicBuffer using the following code:
uint8_t *ptr;
sp<GraphicBuffer> gBuffer = new GraphicBuffer(width,height,format,usage);
gBuffer->lock(GRALLOC_USAGE_SW_WRITE_OFTEN, (void**)(&ptr));
//Copy Data
gBuffer->unlock();
EGLClientBuffer clientBuffer = (EGLClientBuffer)gBuffer->getNativeBuffer();
EGLImageKHR img = eglCreateImageKHR(eglGetDisplay(EGL_DEFAULT_DISPLAY), EGL_NO_CONTEXT, EGL_NATIVE_BUFFER_ANDROID,clientBuffer, NULL);
glBindTexture(GL_TEXTURE_EXTERNAL_OES, textureHandle);
glEGLImageTargetTexture2DOES(GL_TEXTURE_EXTERNAL_OES, (GLeglImageOES)img);
//Finished using img, Crash Here:
eglDestroyImageKHR(eglGetDisplay(EGL_DEFAULT_DISPLAY), img);
The problem occurs when calling eglDestroyImageKHR, which crashes on some devices but not on others. This is the backtrace:
00 pc 00006488 /system/lib/libui.so
01 pc 00006719 /system/lib/libui.so (android::GraphicBuffer::free_handle()+52)
02 pc 00006813 /system/lib/libui.so (android::GraphicBuffer::~GraphicBuffer()+22)
03 pc 00006841 /system/lib/libui.so (android::GraphicBuffer::~GraphicBuffer()+4)
04 pc 0000f823 /system/lib/libutils.so (android::RefBase::decStrong(void const*) const+40)
05 pc 00003bbb /system/vendor/lib/egl/eglsubAndroid.so
06 pc 0001b5f4 /system/vendor/lib/egl/libEGL_adreno.so (egliDoDestroyEGLImage+80)
07 pc 00006c88 /system/vendor/lib/egl/libEGL_adreno.so (eglDestroyImageKHR+16)
08 pc 0000e749 /system/lib/libEGL.so (eglDestroyImageKHR+44)
Here are a couple of complete backtraces:
http://pastebin.com/S0Ax6eNp
http://pastebin.com/bGWeWruw
Not calling eglDestroyImageKHR causes a leak, and when the above routine is called again, gBuffer->lock() fails with an insufficient memory error message.
It crashes, for example, on a Galaxy S4, Galaxy S2 and Xperia Z1, and doesn't crash on a Nexus 4, Nexus 7, Galaxy Ace 2, etc.
I would appreciate any help.
-EDITED-
The only workaround I have found is to decrease the reference counter to 0 so the GraphicBuffer destructor gets called and frees the memory.
if(gBuffer->getStrongCount() > 0){
gBuffer->decStrong(gBuffer->handle);
}

I had the same issue with EGL surfaces. Since 4.3, Samsung ROMs don't deactivate the active context and surface when either one is destroyed. The code now looks something like this:
// This line had to be added to prevent crashes:
mEgl.eglMakeCurrent(mEglDisplay, EGL10.EGL_NO_SURFACE, EGL10.EGL_NO_SURFACE, EGL10.EGL_NO_CONTEXT);
mEgl.eglDestroyContext(mEglDisplay, mEglContext);
mEgl.eglDestroySurface(mEglDisplay, mEglSurface);
The stack trace looked fairly similar. Have you tried destroying gBuffer before calling eglDestroyImageKHR?

FWIW, in the Mozilla AndroidGraphicBuffer.cpp code, the author writes:
/**
* XXX: eglDestroyImageKHR crashes sometimes due to refcount badness (I think)
*
* If you look at egl.cpp (https://github.com/android/platform_frameworks_base/blob/master/opengl/libagl/egl.cpp#L2002)
* you can see that eglCreateImageKHR just refs the native buffer, and eglDestroyImageKHR
* just unrefs it. Somehow the ref count gets messed up and things are already destroyed
* by the time eglDestroyImageKHR gets called. For now, at least, just not calling
* eglDestroyImageKHR should be fine since we do free the GraphicBuffer below.
*
* Bug 712716
*/
and essentially does not call eglDestroyImageKHR(), which is apparently OK in that context. The bug report is referenced in the comment above (Bug 712716).
James Willcox, the author of the Mozilla code, is also the author of the snorp blog post.
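To make the refcount explanation in the Mozilla comment concrete, here is a minimal toy model in C++. It is only a sketch: ToyBuffer, ToyImage, createImage and destroyImage are hypothetical names invented for illustration, not android::RefBase or any EGL driver code. It shows the balanced case (create refs, destroy unrefs) and what happens if a destroy drops a reference that create never took, which is one way the count could get "messed up".

```cpp
#include <cassert>

// Toy stand-in for a reference-counted native buffer (NOT android::RefBase).
struct ToyBuffer {
    int strong = 0;         // current strong reference count
    bool destroyed = false; // set when the last reference is dropped
    int destroyCalls = 0;   // how many times the buffer was "freed"

    void incStrong() { ++strong; }
    void decStrong() {
        if (--strong == 0) { // last reference gone: free the buffer
            destroyed = true;
            ++destroyCalls;
        }
    }
};

// Hypothetical model of an EGLImage that wraps the buffer.
struct ToyImage { ToyBuffer* buf; };

// Models "eglCreateImageKHR just refs the native buffer".
ToyImage createImage(ToyBuffer& b) { b.incStrong(); return ToyImage{&b}; }

// Models "eglDestroyImageKHR just unrefs it".
void destroyImage(ToyImage& img) { img.buf->decStrong(); img.buf = nullptr; }

// Balanced case: the buffer survives the image and is freed exactly once.
bool balancedLifetime() {
    ToyBuffer b;
    b.incStrong();                 // owner's reference, strong == 1
    ToyImage img = createImage(b); // strong == 2
    destroyImage(img);             // strong == 1
    bool aliveAfterImage = !b.destroyed;
    b.decStrong();                 // owner releases, strong == 0
    return aliveAfterImage && b.destroyed && b.destroyCalls == 1;
}

// Unbalanced case: the image never took a reference but still drops one.
bool strayUnrefDestroysEarly() {
    ToyBuffer b;
    b.incStrong();      // owner's reference, strong == 1
    ToyImage img{&b};   // image created WITHOUT taking a reference
    destroyImage(img);  // unref anyway: strong == 0, buffer destroyed
    return b.destroyed; // true although the owner never released it
}
```

In the unbalanced case the owner still holds a pointer to a buffer that has already been freed, so a later release by the owner would be a double free. Whether any particular driver behaves this way is not established by the thread; the model only illustrates the mechanism the comment suspects.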

Related

How to debug an Android stack trace with __cxa_rethrow

I am building an Android app which in the latest version has a lot of crash reports like the following one on google play dash. It consists of several libraries cross compiled with android-ndk.
Starting from frame #05 it halfway makes sense to me. What I wonder is how to go for the other half and what to make from the upper frames.
Trace:
#00 pc 0000000000083134 /apex/com.android.runtime/lib64/bionic/libc.so (abort+160)
#01 pc 000000000017cf00 /data/app/[...]==/lib/arm64/libqca-qt5_arm64-v8a.so
#02 pc 000000000017d070 /data/app/[...]==/lib/arm64/libqca-qt5_arm64-v8a.so
#03 pc 0000000000179f48 /data/app/[...]==/lib/arm64/libqca-qt5_arm64-v8a.so
#04 pc 0000000000179850 /data/app/[...]==/lib/arm64/libqca-qt5_arm64-v8a.so (__cxa_rethrow+196)
#05 pc 0000000000c0e10c /data/app/[...]==/lib/arm64/libqgis_core_arm64-v8a.so (QgsCoordinateTransform::transformInPlace(double&, double&, double&, QgsCoordinateTransform::TransformDirection) const+300)
#06 pc 00000000000340d8 /data/app/[...]==/lib/arm64/libqfield_qgsquick_arm64-v8a.so (QgsQuickCoordinateTransformer::updatePosition()+136)
#07 pc 0000000000034350 /data/app/[...]==/lib/arm64/libqfield_qgsquick_arm64-v8a.so (QgsQuickCoordinateTransformer::setDestinationCrs(QgsCoordinateReferenceSystem const&)+176)
#08 pc 0000000000028488 /data/app/[...]==/lib/arm64/libqfield_qgsquick_arm64-v8a.so
#09 pc 0000000000028a18 /data/app/[...]==/lib/arm64/libqfield_qgsquick_arm64-v8a.so (QgsQuickCoordinateTransformer::qt_metacall(QMetaObject::Call, int, void**)+316)
#10 pc 00000000002f36a8 /data/app/[...]==/lib/arm64/libQt5Qml_arm64-v8a.so (QV4::QQmlValueTypeWrapper::write(QObject*, int) const+180)
What I know: QgsCoordinateTransform::transformInPlace can throw a QgsCsException which is caught and handled inside updatePosition().
try
{
mCoordinateTransform.transformInPlace( x, y, z );
}
catch ( const QgsCsException &exp )
{
QgsDebugMsg( exp.what() );
}
Given that it's handled I'm not sure how that's related to a crash, nonetheless I think it could be interesting information.
What I can't make sense of is how libqca-qt5 comes into play; it is never used inside transformInPlace. Might it have some magic in place to handle unhandled exceptions (can something be extracted from __cxa_rethrow)?
The only idea that comes to my mind is that it's not a QgsCsException but another (unhandled) exception that's raised and causes the crash. This would be an easy fix, but I'm not able to reproduce the crash: all I have is this stack trace, and the only way to test is to ship a new APK and wait for reports to come in. This is a long round trip for feedback, so I'm very interested in either getting things right directly or at least improving the debugging possibilities so that it can be fixed in two rounds.
So the question: what can be read from a stack trace like this and how to go about debugging this?
After having received feedback from a user about how exactly to reproduce this crash, it was possible to resolve this with the confidence of a positive test.
As assumed, it was a different exception raised by another library which was called from within transformInPlace().
Adding the following catch-all clause fixed the crash, at least as a band-aid.
try
{
mCoordinateTransform.transformInPlace( x, y, z );
}
catch ( const QgsCsException &exp )
{
QgsDebugMsg( exp.what() );
}
catch ( ... )
{
QgsDebugMsg( "Unknown exception caught" );
}
It is still unclear to me how qca is involved in this stack trace: whether it contains code to handle unhandled exceptions, whether it is just a random library defining some symbols, or whether other mechanisms are at play. I assume __cxa_rethrow plays a role; after all, it's related to exception handling. I'd still be happy if someone could shed some light. Meanwhile, for future readers, it's good to know that adding a catch-all clause is a feasible approach here.
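The effect of the catch-all clause can be reproduced in isolation. The sketch below uses hypothetical stand-ins (CsException for QgsCsException, transformInPlaceStub for transformInPlace, updatePositionStub for updatePosition); none of these names come from QGIS. It shows that a typed handler does not catch an unrelated exception type, while catch ( ... ) does. Without the catch-all, the second exception would keep propagating, and if nothing up the call chain handles it the C++ runtime calls std::terminate, which by default ends in abort, consistent with frame #00 of the trace.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for QgsCsException.
struct CsException : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical stand-in for transformInPlace(): always throws, either the
// expected exception type or one "from another library".
void transformInPlaceStub(bool throwForeign) {
    if (throwForeign)
        throw std::logic_error("thrown by another library");
    throw CsException("transform failed");
}

// Mirrors the structure of the fixed updatePosition(): a typed handler
// followed by a catch-all. Returns a description of which handler ran.
std::string updatePositionStub(bool throwForeign) {
    try {
        transformInPlaceStub(throwForeign);
    } catch (const CsException &e) {
        return std::string("CsException: ") + e.what();
    } catch (...) {
        // std::logic_error is not a CsException, so it lands here.
        return "unknown exception";
    }
    return "ok";
}
```

A catch-all hides the exception's type, so where the foreign library's exception class is known, adding a handler for that class (or for std::exception) before the catch-all would preserve the message for logging.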

Android MSM kernel: copy_to_user fails

I'm writing a kernel driver for a Linux kernel running on Android devices (Nexus 5X).
I have a kernel buffer and I want to expose a device to read from it. I can read and write from the kernel buffer but I cannot write to the userspace buffer received from the read syscall. The very strange thing is that copy_to_user works only for less than 128 bytes... it makes no sense to me.
The code is the following ( truncated ):
static ssize_t dev_read(struct file *filep, char __user *buffer, size_t len, loff_t *offset){
unsigned long sent;
// ...
pr_err("MYLOGGER: copying from buffer: head=%d, tail=%d, cnt=%d, sent=%lu, access=%lu\n",
head, tail, cnt, sent,
access_ok(VERIFY_WRITE, buffer, sent));
if(sent >= 1) {
sent -= copy_to_user(buffer, mybuf + tail, sent);
pr_err("MYLOGGER: sent %lu bytes\n", sent);
// ...
}
// ...
}
The output is the following:
[ 56.476834] MYLOGGER: device opened
[ 56.476861] MYLOGGER: reading from buffer
[ 56.476872] MYLOGGER: copying from buffer: head=5666644, tail=0, cnt=5666644, sent=4096, access=1
[ 56.476882] MYLOGGER: sent 0 bytes
As you can see from the log sent is 4096, no integer overflow here.
When using dd I'm able to read up to 128 bytes per call ( dd if=/dev/mylog bs=128 ). I think that when using more than 128 bytes dd uses a buffer from the heap and the kernel cannot access it anymore, which is what I cannot understand.
I'm using copy_to_user from the read syscall handler, I've also printed the current->pid and it is the same process.
The kernel sources can be found from google android sources.
The function copy_to_user is defined at arch/arm64/include/asm/uaccess.h and the __copy_to_user can be found in arch/arm64/lib/copy_to_user.S.
Thank you for your time, I hope to get rid of this madness with your precious help.
-- EDIT --
I've written a small snippet to get the vm_area_struct of the destination userspace buffer and print out its permissions. This is the result:
MYLOGGER: buffer belongs to vm_area with permissions rw-p
So that address should be writable...
-- EDIT --
I've written more debugging code, logging the state of the memory page used by the userspace buffer.
MYLOGGER: page=(0x7e3782d000-0x7e3782e000) present=1
Long story short it works when the page is present and will not cause a page fault. This is insanely weird, the page fault shall be managed by the virtual memory allocator that would load the page into the main memory...
For some reason, if the page is not present in memory the kernel will not fetch it.
My best guess is the __copy_to_user assembly function exception handler, which returns the number of uncopied bytes.
This exception handler is executed before the virtual memory page fault callback. Thus you won't be able to write to userspace unless the pages are already present in memory.
My current workaround is to preload those pages using get_user_pages.
I hope that this helps someone else :)
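The problem was that I was holding a spin_lock.
copy_{to,from}_user must never be called while holding a spin_lock: with the lock held the code runs in atomic context, where a page fault cannot sleep to bring the page in, so the copy only succeeds for pages that are already present.
Using a mutex solves the problem.
I feel so stupid for having wasted days on this...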
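The author's fix was to replace the spinlock with a mutex. A related arrangement, where a non-sleeping lock has to stay, is to copy the data into a temporary buffer while holding the lock and to do the copy that may block only after the lock is released. The sketch below is a userspace C++ model of that ordering, not kernel code: std::mutex stands in for the driver's lock, std::copy stands in for copy_to_user, and the LogBuffer class and its members are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Userspace model only. All names here are hypothetical.
class LogBuffer {
public:
    // Appends data to the internal buffer under the lock.
    void append(const std::vector<char>& data) {
        std::lock_guard<std::mutex> guard(lock_);
        buf_.insert(buf_.end(), data.begin(), data.end());
    }

    // Copies up to len unread bytes into dst and returns the number copied.
    std::size_t read(char* dst, std::size_t len) {
        std::vector<char> bounce;
        {
            // Critical section: only touch memory that cannot block.
            std::lock_guard<std::mutex> guard(lock_);
            std::size_t n = std::min(len, buf_.size() - tail_);
            auto first = buf_.begin() + static_cast<std::ptrdiff_t>(tail_);
            bounce.assign(first, first + static_cast<std::ptrdiff_t>(n));
            tail_ += n;
        }
        // Lock released: the copy that models copy_to_user happens here.
        std::copy(bounce.begin(), bounce.end(), dst);
        return bounce.size();
    }

private:
    std::mutex lock_;
    std::vector<char> buf_;
    std::size_t tail_ = 0; // index of the next unread byte
};
```

The cost of this ordering is one extra copy per read. In a real driver the read position would also have to be rolled back if the final copy reported uncopied bytes, which this model does not attempt.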

Min undequeued buffer count exceeded

I am using a SurfaceTexture to get preview frames in the following way.
First, I set a preview texture:
camera.setPreviewTexture(new SurfaceTexture(0));
Then, just before starting the preview and then each time onPreviewFrame is called, I set the callback buffer like this:
camera.addCallbackBuffer(buffer);
camera.setPreviewCallbackWithBuffer(this);
It works. Sometimes, I take a picture using camera.takePicture(null, null, callback), which results in calling onPictureTaken successfully. The image is saved. Since I want to restart the preview after the picture has been taken, I do the following:
try
{
camera.setPreviewTexture(new SurfaceTexture(0));
camera.startPreview();
}
...
The preview restarts and everything seems to be fine. But the following error is reported in my Logcat, seemingly after the preview has been restarted:
E/BufferQueue﹕ [unnamed-5682-5] dequeueBuffer: min undequeued buffer count (2) exceeded (dequeued=5 undequeudCount=1)
Am I missing something? Should I release the old texture at some point?
Configuration: Samsung Galaxy S4, Samsung Galaxy S5, Nexus 5, running on Android KitKat.
EDIT: I am not sure whether it is linked or not, but after a while, my app does not take pictures anymore and the following messages appear continuously in my Logcat:
E/LocSvc_api_v02( 318): I/---> locClientSendReq line 2332 QMI_LOC_INJECT_SENSOR_DATA_REQ_V02
E/gsiff_dmn( 318): I/loc_api_resp_ind_callback: Received LocAPI Resp ind = 77
E/LocSvc_api_v02( 318): D/loc_sync_process_ind:172]: loc_sync_array not in use
E/LocSvc_utils_q( 318): D/msg_q_rcv: Received message 0xB899D940 rv = 0
E/gsiff_dmn( 318): I/gsiff_data_task: Handling message type = 4
E/gsiff_dmn( 318): I/gsiff_daemon_inject_sensor_data_handler: Sending Sensor Data to LocApi. opaque_id = 1226
E/LocSvc_api_v02( 318): I/---> locClientSendReq line 2332 QMI_LOC_INJECT_SENSOR_DATA_REQ_V02
E/gsiff_dmn( 318): I/loc_api_resp_ind_callback: Received LocAPI Resp ind = 77
E/LocSvc_api_v02( 318): D/loc_sync_process_ind:172]: loc_sync_array not in use
E/mm-camera( 284): [cpp_hardware_process_frame:997] Too many cpp frames dropped!!
E/mm-camera( 284): cpp_thread_handle_process_buf_event:224] get buffer fail. drop frame id:1845 identity:0x20002
W/QCamera2HWI( 269): [CHECK_BUF_LOCK] Too many preview buffer is locked by surfaceflinger : 29
E/mm-camera( 284): [cpp_hardware_process_frame:997] Too many cpp frames dropped!!
E/mm-camera( 284): cpp_thread_handle_process_buf_event:224] get buffer fail. drop frame id:1846 identity:0x20002
EDIT 2: If, instead of a new SurfaceTexture(0), I always use the same SurfaceTexture (that I keep as a member), then some errors disappear and the App continues to work. The min undequeued buffer count exceeded error and the Too many preview buffer is locked by surfaceflinger warning stay.
It seems that the camera is holding something in its buffer that is not dequeued by your activity. You have to find the way to clear the camera buffer when you start a new preview.
As the Android documentation for the Camera class says:
The buffer queue will be cleared if this method [setPreviewCallbackWithBuffer] is called with a null callback, setPreviewCallback(Camera.PreviewCallback) is called, or setOneShotPreviewCallback(Camera.PreviewCallback) is called.
So it may be enough to remove your callback when you take a picture and then set it again when you restart your preview.

BX LR and ARM return from exception instructions (RFE and ERET)

In the ARM architecture, returning from an exception can be done in two ways that I know of (there might be others). The main idea is to modify the PC, which makes the processor switch into the mode set in the CPSR.
So pop {...,pc} would make a switch from, say, supervisor to user,
or mov pc,lr would do the same.
My question is: would a BX LR make the switch?
Assume I am handling an IRQ and I call an assembly routine, say do_IRQ, and the return from do_IRQ is via BX LR. Would this make the code after bl do_IRQ in my fault handler irrelevant?
irq_fault_handler:
push {lr}
push {r0-r12}
mrs r0, spsr
push {r0}
bl do_IRQ
pop {r0}
msr cpsr, r0
pop {r0-r12}
pop {lr}
subs pc, lr, #4
do_IRQ:
...
BX LR
It does not make it irrelevant: you called do_IRQ with bl instead of a plain branch, so you've already overwritten lr. Furthermore, even if you had used a plain branch, your stack would be messed up (you push r0-r12 but never pop them before returning).
Also, the code you show doesn't seem to return from the exception correctly either. Most of the time, when you return from an exception, you want to restore the program status register as well as the general registers, which neither mov pc, lr nor bx lr will do.
Furthermore, the address contained in lr is actually offset by a few words (it varies depending on the type of exception), so you're not returning to the correct instruction anyway. For IRQ, I believe it is off by 1 word.
The recommended way to return from an IRQ exception is a subs pc, lr, #4 instruction (after you've popped all your registers, of course), which is special in that it also restores the CPSR from the SPSR.
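The per-exception LR offset mentioned above can be written down as a small lookup. The sketch below is C++ used purely as a calculator, under the assumption that the exception was taken from ARM (A32) state on a classic ARM core; the offsets differ when the exception is taken from Thumb state. The enum and function names are hypothetical, chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

enum class Exception { Undefined, SupervisorCall, PrefetchAbort, DataAbort, Irq, Fiq };

// Bytes subtracted from the banked LR by the usual return instruction,
// assuming the exception was taken from ARM (A32) state.
inline std::uint32_t lrAdjustment(Exception e) {
    switch (e) {
        case Exception::Undefined:      return 0; // movs pc, lr
        case Exception::SupervisorCall: return 0; // movs pc, lr
        case Exception::PrefetchAbort:  return 4; // subs pc, lr, #4
        case Exception::DataAbort:      return 8; // subs pc, lr, #8 (retry the access)
        case Exception::Irq:            return 4; // subs pc, lr, #4
        case Exception::Fiq:            return 4; // subs pc, lr, #4
    }
    return 0; // unreachable for valid enum values
}

// Address execution resumes at, given the LR value seen by the handler.
inline std::uint32_t returnAddress(Exception e, std::uint32_t lr) {
    return lr - lrAdjustment(e);
}
```

For example, an IRQ handler that sees lr == 0x8004 resumes at 0x8000, which matches the "off by 1 word" remark for IRQ.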

PC and LR in same function in Android Kernel

I am facing a problem in which the PC (Note 1) and LR (Note 2) both point within the function cpuacct_charge() in the kernel's sched.c. Are there any scenarios in which this might happen? My analysis shows no recursion in the cpuacct_charge() function. I cannot provide the code. However, any scenarios in which this happens would be a big help.
For clarification: the values of PC and LR point to different locations in the function:
void cpuacct_charge(struct task_struct *tsk, u64 cputime)
Note 1: PC - Program Counter
Note 2: LR - Link Register
When a function returns it basically does a branch to the address in the link register.
So, presumably you've paused the program right after a function return.
