Secure Playback: Crash observed in MediaCodec


I am working on enabling secure playback on Lollipop, using ExoPlayer to validate the use case. I am able to create a secure OMX video decoder component (H264.secure).
However, after the component is created, I get a crash in MediaCodec, as shown below:
signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0xf0300000
eax f0300000 ebx ec1da038 ecx 00000005 edx 00000002
esi ec3ca200 edi f325b148
xcs 00000023 xds 0000002b xes 0000002b xfs 000000bf xss 0000002b
eip ec0e1655 ebp e05ffb28 esp e05ffa90 flags 00210202
#00 pc 000b5655 /system/lib/libstagefright.so (android::MediaCodec::onQueueInputBuffer(android::sp<android::AMessage> const&)+1061)
#01 pc 000b7b16 /system/lib/libstagefright.so (android::MediaCodec::onMessageReceived(android::sp<android::AMessage> const&)+1894)
#02 pc 0000e039 /system/lib/libstagefright_foundation.so (android::ALooperRoster::deliverMessage(android::sp<android::AMessage> const&)+345)
#03 pc 0000d3d0 /system/lib/libstagefright_foundation.so (android::ALooper::loop()+256)
#04 pc 0000d4ed /system/lib/libstagefright_foundation.so (android::ALooper::LooperThread::threadLoop()+29)
#05 pc 000169de /system/lib/libutils.so (android::Thread::_threadLoop(void*)+398)
#06 pc 0006fe92 /system/lib/libandroid_runtime.so (android::AndroidRuntime::javaThreadShell(void*)+98)
#07 pc 000160fa /system/lib/libutils.so (thread_data_t::trampoline(thread_data_t const*)+122)
After some analysis I found that the crash originates in the function ACodec::allocateBuffersOnPort.
I am a newbie on Android. Any pointers for debugging this would be helpful.

To summarize, the issue is specific to the case where kFlagIsSecure is set and the OMX buffer is created in a different process from MediaCodec, which leads to a segmentation fault when the buffer is accessed in MediaCodec. See the background further down for the details.
To overcome this issue, I would recommend the following changes in ACodec:
size_t totalSize = def.nBufferCountActual * def.nBufferSize;
mDealer[portIndex] = new MemoryDealer(totalSize, "ACodec");
/* Check if the component resides in same pid as ACodec */
bool isLocalComponent = mOMX->livesLocally(mNode, getpid()); // New Code
for (OMX_U32 i = 0; i < def.nBufferCountActual; ++i) {
    sp<IMemory> mem = mDealer[portIndex]->allocate(def.nBufferSize);
    ...
    ...
Then modify the allocation check as follows:
-- if ((portIndex == kPortIndexInput && (mFlags & kFlagIsSecure))
-- || mUseMetadataOnEncoderOutput) {
// Modified check
++if (isLocalComponent && ((portIndex == kPortIndexInput && (mFlags & kFlagIsSecure))
++ || mUseMetadataOnEncoderOutput)) {
P.S.: I would recommend checking with Google about this solution.
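The effect of the modified check can be sketched in isolation. This is a minimal illustration, not the AOSP code: CodecState, canWrapComponentPointer, and the bit value chosen for kFlagIsSecure are hypothetical stand-ins, and isLocalComponent is assumed to hold the result of mOMX->livesLocally(mNode, getpid()).

```cpp
#include <cstdint>

// Hypothetical stand-ins; the names mirror the snippet above, the values are
// assumptions made only for this illustration.
constexpr uint32_t kPortIndexInput = 0;
constexpr uint32_t kFlagIsSecure = 1u << 0;

struct CodecState {
    uint32_t flags;                   // corresponds to mFlags
    bool useMetadataOnEncoderOutput;  // corresponds to mUseMetadataOnEncoderOutput
    bool isLocalComponent;            // assumed result of livesLocally(mNode, getpid())
};

// Returns true only when the pointer handed back by allocateBuffer may be
// wrapped directly, i.e. the original condition holds AND the component lives
// in the caller's process. Otherwise the caller should fall through to the
// shared-memory path.
bool canWrapComponentPointer(const CodecState& s, uint32_t portIndex) {
    const bool originalCondition =
        (portIndex == kPortIndexInput && (s.flags & kFlagIsSecure) != 0) ||
        s.useMetadataOnEncoderOutput;
    return s.isLocalComponent && originalCondition;
}
```

With this shape, a secure input port on a remote component no longer takes the direct-pointer branch, which is the case that produced the SIGSEGV.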
Background:
ExoPlayer creates a video decoder as a MediaCodec component. When a new MediaCodec component is created, the corresponding object in JNI is created. Please note that there is no interaction with MediaPlayerService in this process.
MediaCodec internally creates an ACodec which interacts with the OMX core and subsequently OMX component.
ACodec is created in the same context as MediaCodec. When OMXClient::connect is invoked, the OMX handle is created in MediaPlayerService's context. Hence, the OMX component and ACodec run under different process IDs.
For secure input buffers, there is special handling in ACodec::allocateBuffersOnPort. Here, the buffer pointer returned from allocateBuffer is wrapped in an ABuffer and queued for consumption. In my view, there is a potential issue in the current implementation, as described below.
ACodec::allocateBuffersOnPort calls mOMX->allocateBuffer. mOMX is of type IOMX, i.e. a binder interaction is involved. Note the variable &buffer_data, which corresponds to ptr at the ACodec::allocateBuffersOnPort layer; it is critical for what follows.
In OMXNodeInstance, which actually runs in MediaPlayerService's context, a traditional OMX_AllocateBuffer is called. In OMXNodeInstance::allocateBuffer, after the allocation, *buffer_data is initialized with header->pBuffer, which is a local pointer allocated by the OMX component, potentially through a simple malloc call.
When control returns, the same pointer value is written into the binder interface and subsequently read back on the caller's side. So, when control returns from mOMX->allocateBuffer, the value of ptr equals the header->pBuffer allocated by the OMX component, but the two live in different processes.
Hence, when ACodec creates the ABuffer from this ptr and MediaCodec then accesses it, there is an access violation, because the address is only valid in the process that allocated it and not in MediaCodec's process.

Related

How to debug an Android stack trace with __cxa_rethrow

I am building an Android app whose latest version has a lot of crash reports like the following one on the Google Play dashboard. The app consists of several libraries cross-compiled with the Android NDK.
From frame #05 onwards the trace more or less makes sense to me. What I wonder is how to approach the other half and what to make of the upper frames.
Trace:
#00 pc 0000000000083134 /apex/com.android.runtime/lib64/bionic/libc.so (abort+160)
#01 pc 000000000017cf00 /data/app/[...]==/lib/arm64/libqca-qt5_arm64-v8a.so
#02 pc 000000000017d070 /data/app/[...]==/lib/arm64/libqca-qt5_arm64-v8a.so
#03 pc 0000000000179f48 /data/app/[...]==/lib/arm64/libqca-qt5_arm64-v8a.so
#04 pc 0000000000179850 /data/app/[...]==/lib/arm64/libqca-qt5_arm64-v8a.so (__cxa_rethrow+196)
#05 pc 0000000000c0e10c /data/app/[...]==/lib/arm64/libqgis_core_arm64-v8a.so (QgsCoordinateTransform::transformInPlace(double&, double&, double&, QgsCoordinateTransform::TransformDirection) const+300)
#06 pc 00000000000340d8 /data/app/[...]==/lib/arm64/libqfield_qgsquick_arm64-v8a.so (QgsQuickCoordinateTransformer::updatePosition()+136)
#07 pc 0000000000034350 /data/app/[...]==/lib/arm64/libqfield_qgsquick_arm64-v8a.so (QgsQuickCoordinateTransformer::setDestinationCrs(QgsCoordinateReferenceSystem const&)+176)
#08 pc 0000000000028488 /data/app/[...]==/lib/arm64/libqfield_qgsquick_arm64-v8a.so
#09 pc 0000000000028a18 /data/app/[...]==/lib/arm64/libqfield_qgsquick_arm64-v8a.so (QgsQuickCoordinateTransformer::qt_metacall(QMetaObject::Call, int, void**)+316)
#10 pc 00000000002f36a8 /data/app/[...]==/lib/arm64/libQt5Qml_arm64-v8a.so (QV4::QQmlValueTypeWrapper::write(QObject*, int) const+180)
What I know: QgsCoordinateTransform::transformInPlace can throw a QgsCsException which is caught and handled inside updatePosition().
try
{
  mCoordinateTransform.transformInPlace( x, y, z );
}
catch ( const QgsCsException &exp )
{
  QgsDebugMsg( exp.what() );
}
Given that it's handled, I'm not sure how that's related to a crash. Nonetheless, I think it could be interesting information.
What I can't make sense of is how libqca-qt5 comes into play, since it is never used inside transformInPlace. Might it have some magic in place to handle unhandled exceptions? Can something be extracted from __cxa_rethrow?
The only idea that comes to my mind is that it's not a QgsCsException but another (unhandled) exception that's raised and causes the crash. This would be an easy fix, but I'm not able to reproduce the crash: all I have is this stack trace, and the only way to test is to ship a new APK and wait for reports to come in. This is a long feedback round trip, so I'm very interested in either getting things right directly or at least improving the debugging possibilities so that it can be fixed in two rounds.
So the question is: what can be read from a stack trace like this, and how should I go about debugging it?

After having received feedback from a user about how exactly to reproduce this crash, it was possible to resolve this with the confidence of a positive test.
As assumed, it was a different exception raised by another library which was called from within transformInPlace().
Adding the catch-all clause shown here fixed the crash and helps at least as a band-aid.
try
{
  mCoordinateTransform.transformInPlace( x, y, z );
}
catch ( const QgsCsException &exp )
{
  QgsDebugMsg( exp.what() );
}
catch ( ... )
{
  QgsDebugMsg( "Unknown exception caught" );
}
It is still unclear to me how qca is involved in this stack trace: whether it contains code to handle unhandled exceptions, whether it is just a random library that happens to define some symbols, or whether other mechanisms are at play. I assume __cxa_rethrow plays a role; after all, it's related to exception handling. I'd still be happy if someone could shed some light on this. Meanwhile, for future readers, it's good to know that adding a catch-all clause is a feasible approach here.
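The behaviour described above can be reproduced in a few lines: a handler for one exception type does not catch an unrelated type, and an exception that nothing catches ends in std::terminate, which by default calls abort (the top frame of the trace). The sketch below is self-contained and uses hypothetical stand-ins (CsException for QgsCsException, ForeignError for whatever the other library threw, transformStub for transformInPlace); none of these names come from QGIS.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for QgsCsException.
struct CsException : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical stand-in for an error type thrown by another library; it is
// deliberately unrelated to both CsException and std::exception.
struct ForeignError {};

enum class Failure { None, Cs, Foreign };

// Simulates transformInPlace(): throws a different type depending on `mode`.
void transformStub(Failure mode) {
    if (mode == Failure::Cs) throw CsException("transform failed");
    if (mode == Failure::Foreign) throw ForeignError{};
}

// Returns a short description of which handler ran.
std::string guardedTransform(Failure mode) {
    try {
        transformStub(mode);
        return "ok";
    } catch (const CsException& e) {
        return std::string("cs: ") + e.what();
    } catch (const std::exception& e) {
        // Catches other standard exceptions and still gives a message to log.
        return std::string("std: ") + e.what();
    } catch (...) {
        // Last resort: the type is unknown, so there is nothing to inspect.
        return "unknown";
    }
}
```

Placing a std::exception handler before the catch-all is a design choice rather than a requirement: it preserves the what() message for anything derived from std::exception, which makes the next round of crash reports more informative than a bare "unknown".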

Meaning of /dev/ashmem (deleted) records in process maps

I'm wondering what the following records in /proc/<pid>/maps mean:
...
cdbc2000-cdbc6000 rw-p 00000000 00:04 1290888 /dev/ashmem/dalvik-large object space allocation (deleted)
cdbc6000-cdbcf000 rw-p 00000000 00:04 1290887 /dev/ashmem/dalvik-large object space allocation (deleted)
cdbcf000-cdbdf000 rw-p 00000000 00:04 1290886 /dev/ashmem/dalvik-large object space allocation (deleted)
...
In particular, what does (deleted) mean? Is this memory range held by something within the process, to be released later, or are these records just markers with no associated physical memory? That is, can a plain call to mmap() return a mapping that intersects such ranges?
P.S. The range names may differ slightly, but they always contain /dev/ashmem ... (deleted).

android crash on eglDestroyImageKHR using GraphicBuffer

I'm copying data to the GraphicBuffer using the following code:
uint8_t *ptr;
sp<GraphicBuffer> gBuffer = new GraphicBuffer(width,height,format,usage);
gBuffer->lock(GRALLOC_USAGE_SW_WRITE_OFTEN, (void**)(&ptr));
//Copy Data
gBuffer->unlock();
EGLClientBuffer clientBuffer = (EGLClientBuffer)gBuffer->getNativeBuffer();
EGLImageKHR img = eglCreateImageKHR(eglGetDisplay(EGL_DEFAULT_DISPLAY), EGL_NO_CONTEXT, EGL_NATIVE_BUFFER_ANDROID,clientBuffer, NULL);
glBindTexture(GL_TEXTURE_EXTERNAL_OES, textureHandle);
glEGLImageTargetTexture2DOES(GL_TEXTURE_EXTERNAL_OES, (GLeglImageOES)img);
//Finished using img, Crash Here:
eglDestroyImageKHR(eglGetDisplay(EGL_DEFAULT_DISPLAY), img);
The problem occurs when calling eglDestroyImageKHR, which crashes on some devices but not on others. This is the backtrace:
00 pc 00006488 /system/lib/libui.so
01 pc 00006719 /system/lib/libui.so (android::GraphicBuffer::free_handle()+52)
02 pc 00006813 /system/lib/libui.so (android::GraphicBuffer::~GraphicBuffer()+22)
03 pc 00006841 /system/lib/libui.so (android::GraphicBuffer::~GraphicBuffer()+4)
04 pc 0000f823 /system/lib/libutils.so (android::RefBase::decStrong(void const*) const+40)
05 pc 00003bbb /system/vendor/lib/egl/eglsubAndroid.so
06 pc 0001b5f4 /system/vendor/lib/egl/libEGL_adreno.so (egliDoDestroyEGLImage+80)
07 pc 00006c88 /system/vendor/lib/egl/libEGL_adreno.so (eglDestroyImageKHR+16)
08 pc 0000e749 /system/lib/libEGL.so (eglDestroyImageKHR+44)
Here are a couple of complete backtraces:
http://pastebin.com/S0Ax6eNp
http://pastebin.com/bGWeWruw
Not calling eglDestroyImageKHR causes a leak, and when the above routine is called again, gBuffer->lock() fails with an insufficient memory error message.
It crashes, for example, on a Galaxy S4, Galaxy S2, and Xperia Z1, and doesn't crash on a Nexus 4, Nexus 7, Galaxy Ace 2, etc.
I would appreciate any help.
-EDITED-
The only workaround I have found is to decrease the reference counter to 0 so the GraphicBuffer destructor gets called and frees the memory.
if (gBuffer->getStrongCount() > 0) {
    gBuffer->decStrong(gBuffer->handle);
}

I had the same issue with EGL surfaces. Since Android 4.3, Samsung ROMs don't deactivate the active context and surface when either one is destroyed. The code now looks something like this:
// This line had to be added to prevent crashes:
mEgl.eglMakeCurrent(mEglDisplay, EGL10.EGL_NO_SURFACE, EGL10.EGL_NO_SURFACE, EGL10.EGL_NO_CONTEXT);
mEgl.eglDestroyContext(mEglDisplay, mEglContext);
mEgl.eglDestroySurface(mEglDisplay, mEglSurface);
The stack trace looked fairly similar. Have you tried destroying gBuffer before calling eglDestroyImageKHR?

FWIW, in the Mozilla AndroidGraphicBuffer.cpp code, the author writes:
/**
* XXX: eglDestroyImageKHR crashes sometimes due to refcount badness (I think)
*
* If you look at egl.cpp (https://github.com/android/platform_frameworks_base/blob/master/opengl/libagl/egl.cpp#L2002)
* you can see that eglCreateImageKHR just refs the native buffer, and eglDestroyImageKHR
* just unrefs it. Somehow the ref count gets messed up and things are already destroyed
* by the time eglDestroyImageKHR gets called. For now, at least, just not calling
* eglDestroyImageKHR should be fine since we do free the GraphicBuffer below.
*
* Bug 712716
*/
and essentially does not call eglDestroyImageKHR(), which is apparently OK in that context. The bug report is Bug 712716, cited in the comment.
James Willcox, the author of the Mozilla code, is also the author of the snorp blog post.

SIGSEGV in Canvas.clipPath at the second clipPath

I have an ASUS Nexus 7 running Android 4.2.2. My application is generating a SIGSEGV in sk_malloc_flags when running the following code:
static Picture createDrawing() {
    Path firstPath = new Path();
    firstPath.moveTo(3058, 12365);
    firstPath.lineTo(8499, 3038);
    firstPath.lineTo(9494, 3619);
    firstPath.lineTo(4053, 12946);
    firstPath.close();
    Path fourthPath = new Path();
    fourthPath.moveTo(3065, 12332);
    fourthPath.lineTo(4053, 12926);
    fourthPath.lineTo(9615, 3669);
    fourthPath.lineTo(8628, 3075);
    fourthPath.close();
    Picture picture = new Picture();
    Canvas canvas = picture.beginRecording(12240, 15840);
    canvas.clipPath(firstPath);
    canvas.clipPath(fourthPath); // << SIGSEGV occurs here
    picture.endRecording();
    return picture;
}
The SIGSEGV is reported as follows:
I/DEBUG ( 124): signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr deadbaad
I/DEBUG ( 124): r0 00000027 r1 deadbaad r2 4017f258 r3 00000000
I/DEBUG ( 124): r4 00000000 r5 bed72434 r6 bed72508 r7 1be773bc
I/DEBUG ( 124): r8 1be730f9 r9 000042c3 sl 00000001 fp 67185010
I/DEBUG ( 124): ip 40443f3c sp bed72430 lr 401522f9 pc 4014e992 cpsr 60000030
...
I/DEBUG ( 124): backtrace:
I/DEBUG ( 124): #00 pc 0001a992 /system/lib/libc.so
I/DEBUG ( 124): #01 pc 00018070 /system/lib/libc.so (abort+4)
I/DEBUG ( 124): #02 pc 000be4b4 /system/lib/libskia.so (sk_malloc_flags(unsigned int, unsigned int)+28)
I/DEBUG ( 124): #03 pc 0008afc0 /system/lib/libskia.so (SkRegion::op(SkRegion const&, SkRegion const&, SkRegion::Op)+1716)
I/DEBUG ( 124): #04 pc 00089448 /system/lib/libskia.so (SkRasterClip::op(SkRasterClip const&, SkRegion::Op)+128)
I have obviously simplified the code to what is shown above; the full application uses transforms, etc., based on some input data to generate the values. Are there any suggestions as to how to fix this without implementing my own clipping code for the general case?

This looks like an ill-fated corner case in clipPath handling.
canvas.clipPath(fourthPath);
causes a merge with the previous firstPath. However, since these are complex (non-rectangular) shapes, the system tries to render them as scanlines and merge them afterwards. To perform this merge it needs to allocate some memory, but as you can see in SkRegion.cpp, it allocates for a heuristic worst case.
static int compute_worst_case_count(int a_count, int b_count) {
    int a_intervals = count_to_intervals(a_count);
    int b_intervals = count_to_intervals(b_count);
    // Our heuristic worst case is ai * (bi + 1) + bi * (ai + 1)
    int intervals = 2 * a_intervals * b_intervals + a_intervals + b_intervals;
    // convert back to number of RunType values
    return intervals_to_count(intervals);
}
For your paths this worst-case count comes close to 2GB, and you get an abort because malloc cannot provide that much memory.
I couldn't see any way out of it using different parameters. Anything that avoids merging clip paths should help, such as calling clipPath with Region.Op.REPLACE. Region.Op.INTERSECT (the default for clipPath(Path)) should fail too.
I would concentrate on avoiding a clipPath call with a complex path on top of another complex path.
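To see why two moderately complex regions are enough to exhaust memory, the interval formula from the heuristic quoted above can be evaluated on its own. This is a minimal sketch that reuses only the quoted expression; the helper names worstCaseIntervals and fitsInInt32 are hypothetical, and the conversion between run counts and intervals done by Skia's helpers is deliberately left out. 64-bit arithmetic is used so that the sketch itself cannot overflow.

```cpp
#include <cstdint>

// Same expression as in the quoted heuristic:
//   ai * (bi + 1) + bi * (ai + 1) == 2 * ai * bi + ai + bi
constexpr int64_t worstCaseIntervals(int64_t aIntervals, int64_t bIntervals) {
    return 2 * aIntervals * bIntervals + aIntervals + bIntervals;
}

// True when the value can still be represented by a 32-bit signed int, the
// type the quoted function computes with.
constexpr bool fitsInInt32(int64_t value) {
    return value >= INT32_MIN && value <= INT32_MAX;
}
```

The growth is quadratic: 10 intervals on each side give 220, while 20000 on each side give 800040000, and every interval still has to be stored as several values. Around 32768 intervals per side the count no longer fits in an int at all.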
If it suits your use case, you can use the same Path object for every canvas.clipPath() call. For example:
Picture picture = new Picture();
Canvas canvas = picture.beginRecording(12240, 15840);
Path path = new Path();
path.moveTo(3058, 12365);
path.lineTo(8499, 3038);
path.lineTo(9494, 3619);
path.lineTo(4053, 12946);
path.close();
canvas.clipPath(path);
// do stuff with canvas
path.moveTo(3065, 12332);
path.lineTo(4053, 12926);
path.lineTo(9615, 3669);
path.lineTo(8628, 3075);
path.close();
canvas.clipPath(path, Region.Op.REPLACE);
// do more stuff with canvas
picture.endRecording();
return picture;
Since path contains the previous drawings, you can just continue updating it. If this is not applicable to your case, you need to either make those numbers smaller or partition your complex regions into smaller ones, so that the worst-case heuristic does not grow too large.

OK, let me put this in an answer, since it looks quite logical to me:
To see if the problem comes from the consecutive calls to clipPath(Path), try removing the first call, or replacing canvas.clipPath(fourthPath); with canvas.clipPath(fourthPath, Region.Op.REPLACE); and see if that's the cause.
Another thing I can think of is to draw them separately:
Picture picture = new Picture();
Canvas canvas = picture.beginRecording(12240, 15840);
canvas.clipPath(firstPath);
picture.endRecording();
canvas = picture.beginRecording(12240, 15840);
canvas.clipPath(fourthPath);
picture.endRecording();

It looks like Canvas.clipPath() is not supported with hardware acceleration; at least that is what the docs say:
http://developer.android.com/guide/topics/graphics/hardware-accel.html#unsupported
The only workaround that comes to my mind is to turn off hardware acceleration.
You can do it at the following levels:
Application level
<application android:hardwareAccelerated="false" ...>
in the manifest
Activity level
android:hardwareAccelerated="false"
for the activity in the manifest
View level
view.setLayerType(View.LAYER_TYPE_SOFTWARE, null);
for an individual view at runtime
Docs:
http://developer.android.com/guide/topics/graphics/hardware-accel.html#controlling

PC and LR in same function in Android Kernel

I am facing a problem in which the PC (note 1) and the LR (note 2) both point within the function cpuacct_charge() in the kernel's sched.c. Are there any scenarios in which this might happen? My analysis shows no recursion in the cpuacct_charge() function. I cannot provide the code. However, knowing any scenarios in which this happens would be a big help.
For clarification: the values of PC and LR point to different locations within the function:
void cpuacct_charge(struct task_struct *tsk, u64 cputime)
Note 1: PC - Program Counter
Note 2: LR - Link Register

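When a function returns, it basically does a branch to the address in the link register. The return itself does not need to overwrite LR, so right after a call made from within cpuacct_charge() returns, the PC is back inside cpuacct_charge() while LR can still hold that return address, which also lies inside cpuacct_charge().
So, presumably you've paused the program right after a function return.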
