I'm adding black (0) padding around the region of interest (center) of an NV21 frame obtained from Android CameraPreview callbacks, in a background thread.
To avoid the overhead of converting to RGB/Bitmap and back, I'm manipulating the NV21 byte array directly, but this involves nested loops, which is also making the preview/processing slow.
This is my run() method, which sends frames to the detector after calling the method blackNonROI:
public void run() {
    Frame outputFrame;
    ByteBuffer data;

    while (true) {
        synchronized (mLock) {
            while (mActive && (mPendingFrameData == null)) {
                try {
                    mLock.wait();
                } catch (InterruptedException e) {
                    return;
                }
            }
            if (!mActive) {
                return;
            }

            // Region of Interest
            mPendingFrameData = blackNonROI(mPendingFrameData.array(),
                    mPreviewSize.getWidth(), mPreviewSize.getHeight(), 300, 300);

            outputFrame = new Frame.Builder()
                    .setImageData(mPendingFrameData, mPreviewSize.getWidth(),
                            mPreviewSize.getHeight(), ImageFormat.NV21)
                    .setId(mPendingFrameId)
                    .setTimestampMillis(mPendingTimeMillis)
                    .setRotation(mRotation)
                    .build();

            data = mPendingFrameData;
            mPendingFrameData = null;
        }

        try {
            mDetector.receiveFrame(outputFrame);
        } catch (Throwable t) {
        } finally {
            mCamera.addCallbackBuffer(data.array());
        }
    }
}
The following is the method blackNonROI:
private ByteBuffer blackNonROI(byte[] yuvData, int width, int height, int roiWidth, int roiHeight) {
    int hozMargin = (width - roiWidth) / 2;
    int verMargin = (height - roiHeight) / 2;

    // top/bottom of center
    for (int x = 0; x < width; x++) {
        for (int y = 0; y < verMargin; y++)
            yuvData[y * width + x] = 0;
        for (int y = height - verMargin; y < height; y++)
            yuvData[y * width + x] = 0;
    }

    // left/right of center
    for (int y = verMargin; y < height - verMargin; y++) {
        for (int x = 0; x < hozMargin; x++)
            yuvData[y * width + x] = 0;
        for (int x = width - hozMargin; x < width; x++)
            yuvData[y * width + x] = 0;
    }

    return ByteBuffer.wrap(yuvData);
}
Example output frame
Note that I'm not cropping the image, just padding black pixels around the specified center of the image to maintain coordinates for further activities. This works as it should, but it's not fast enough and causes lag in the preview and frame processing.
1. Can I further improve the byte array update?
2. Is the time/place for calling blackNonROI fine?
3. Is there any other way / library to do this more efficiently?
4. My simple pixel iteration is so slow; how do YUV/Bitmap libraries do complex things so fast? Do they use the GPU?
Edit:
I've replaced both for loops with the following code, and it's pretty fast now (please refer to greeble31's answer for details):
// full top padding
from = 0;
to = verMargin * width;
Arrays.fill(yuvData, from, to, (byte) 1);

// full bottom padding
from = (height - verMargin) * width;
to = height * width;
Arrays.fill(yuvData, from, to, (byte) 1);

for (int y = verMargin; y < height - verMargin; y++) {
    // left-middle padding
    from = y * width;
    to = y * width + hozMargin;
    Arrays.fill(yuvData, from, to, (byte) 1);

    // right-middle padding
    from = y * width + width - hozMargin;
    to = y * width + width;
    Arrays.fill(yuvData, from, to, (byte) 1);
}
1. Yes. To understand why, let's take a look at the bytecode Android Studio produces for your "left/right of center" nested loop:
(Annotated excerpt from a release build of blackNonROI, AS 3.2.1):
:goto_27
sub-int v2, p2, p4 ;for(int y=verMargin; y<height-verMargin; y++)
if-ge v1, v2, :cond_45
const/4 v2, 0x0
:goto_2c
if-ge v2, p3, :cond_36 ;for (int x = 0; x < hozMargin; x++)
mul-int v3, v1, p1
add-int/2addr v3, v2
.line 759
aput-byte v0, p0, v3
add-int/lit8 v2, v2, 0x1
goto :goto_2c
:cond_36
sub-int v2, p1, p3
:goto_38
if-ge v2, p1, :cond_42 ;for (int x = width-hozMargin; x < width; x++)
mul-int v3, v1, p1
add-int/2addr v3, v2
.line 761
aput-byte v0, p0, v3
add-int/lit8 v2, v2, 0x1
goto :goto_38
:cond_42
add-int/lit8 v1, v1, 0x1
goto :goto_27
.line 764
:cond_45 ;all done with the for loops!
Without bothering to decipher this whole thing line-by-line, it is clear that each iteration of your small inner loops performs:
1 comparison
1 integer multiplication
1 addition
1 store
1 goto
That's a lot, when you consider that all that you really need this inner loop to do is set a certain number of successive array elements to 0.
Moreover, some of these bytecodes require multiple machine instructions to implement, so I wouldn't be surprised if you're looking at over 20 cycles, just to do a single iteration of one of the inner loops. (I haven't tested what this code looks like once it's compiled by the Dalvik VM, but I sincerely doubt it is smart enough to optimize the multiplications out of these loops.)
POSSIBLE FIXES
You could improve performance by eliminating some redundant calculations. For example, each inner loop is recalculating y * width each time. Instead, you could pre-calculate that offset, store it in a local variable (in the outer loop), and use that when calculating the indices.
When performance is absolutely critical, I will sometimes do this sort of buffer manipulation in native code. If you can be reasonably certain that mPendingFrameData is a DirectByteBuffer, this is an even more attractive option. The disadvantages are 1.) higher complexity, and 2.) less of a "safety net" if something goes wrong/crashes.
MOST APPROPRIATE FIX
In your case, the most appropriate solution is probably just to use Arrays.fill(), which is more likely to be implemented in an optimized way.
Note that the top and bottom blocks are big, contiguous chunks of memory, and can be handled by one Arrays.fill() each:
Arrays.fill(yuvData, 0, verMargin * width, (byte) 0);                               // top
Arrays.fill(yuvData, width * height - verMargin * width, width * height, (byte) 0); // bottom
And then the sides could be handled something like this:
for (int y = verMargin; y < height - verMargin; y++) {
    int offset = y * width;
    Arrays.fill(yuvData, offset, offset + hozMargin, (byte) 0);                 // left
    Arrays.fill(yuvData, offset + width - hozMargin, offset + width, (byte) 0); // right
}
There are more opportunities for optimization here, but we're already at the point of diminishing returns. For example, since the end of each row is adjacent to the start of the next one (in memory), you could actually combine two smaller fill() calls into a larger one that covers both the right side of row N and the left side of row N + 1. And so forth.
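For illustration, a hedged sketch of that merged-fill idea (the class name is mine; margins are computed the same way as above):

```java
import java.util.Arrays;

public class MergedFill {
    // Sketch: the right margin of row y and the left margin of row y + 1
    // are contiguous in memory, so both are cleared with one Arrays.fill().
    public static void blackNonROI(byte[] y, int width, int height,
                                   int roiWidth, int roiHeight) {
        int hozMargin = (width - roiWidth) / 2;
        int verMargin = (height - roiHeight) / 2;

        // top block plus the left margin of the first ROI row
        Arrays.fill(y, 0, verMargin * width + hozMargin, (byte) 0);

        // right + left margins merged into one fill per row boundary
        for (int row = verMargin; row < height - verMargin - 1; row++) {
            int from = row * width + width - hozMargin;
            Arrays.fill(y, from, from + 2 * hozMargin, (byte) 0);
        }

        // right margin of the last ROI row plus the whole bottom block
        Arrays.fill(y, (height - verMargin) * width - hozMargin,
                    height * width, (byte) 0);
    }
}
```

This roughly halves the number of fill() calls; whether that is measurable depends on the margin sizes.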
2. Not sure. If your preview is displaying without any corruption/tearing, then it's probably a safe place to call the function from (from a thread-safety standpoint), and is therefore probably as good a place as any.
3 and 4. There could be libraries for doing this task; I don't know of any offhand for Java-based NV21 frames. You'd have to do some format conversions, and I don't think it'd be worth it. Using the GPU for this work is excessive over-optimization, in my opinion, but it may be appropriate for some specialized applications. I'd consider going to JNI (native code) before I'd ever consider using the GPU.
I think your choice to do the manipulation directly to the NV21, instead of converting to a bitmap, is a good one (considering your needs and the fact that the task is simple enough to avoid needing a graphics library).
Obviously, the most efficient way to pass an image for detection would be to pass the ROI rectangle to the detector; all our image processing functions accept a bounding box as a parameter.
If the black margin is used for display, consider using a black overlay mask on the preview layout instead of pixel manipulation.
If pixel manipulation is inevitable, check whether you can limit it to the Y plane (OK, you already do this!).
If your detector works on a downscaled image (as my face recognition engine does), it may be wise to apply the blackout to the resized frame.
At any rate, keep your loops clean and tidy, and remove all recurring calculations. Using Arrays.fill() may help significantly, but not dramatically.
Related
Using OpenGL ES 2.0 on a Galaxy S4 phone, I have a 1024x1024 RGBA8888 render target to which some textures are rendered each frame. I need to calculate how many red RGBA(1, 0, 0, 1) pixels were rendered on the render target (twice a second).
The main problem is that getting the texture back from the GPU is very expensive (~300-400 ms), and freezes are not acceptable for my application.
I know about the OES_shader_image_atomic extension for atomic counters (simply incrementing some value when the fragment shader runs), but it's available only in OpenGL ES 3.1 (and later), and I have to stick to ES 2.0.
Is there any common solution I missed?
What you can try is to "reduce" the texture in question to a significantly smaller one and read that one back to the CPU (which should be less expensive performance-wise). For example, you can split your texture into N-by-N squares (where N is preferably a power of two), then render a "whole screen" quad into a 1024/N by 1024/N texture with a fragment shader that sums the number of red pixels in the corresponding square:
uniform sampler2D texture;

void main(void) {
    vec2 offset = float(N) * gl_FragCoord.xy;
    int cnt = 0;
    for (float x = 0.; x < float(N); x += 1.) {
        for (float y = 0.; y < float(N); y += 1.) {
            if (texture2D(texture, (offset + vec2(x, y)) / 1024.) == vec4(1., 0., 0., 1.)) {
                cnt += 1;
            }
        }
    }
    // pack the per-square count into the low/high bytes of the output color
    gl_FragColor = vec4(mod(float(cnt), 256.) / 255.,
                        mod(floor(float(cnt) / 256.), 256.) / 255., /* ... */);
}
Also remember that glReadPixels synchronously waits until the GPU is done with all previously issued draws to the texture. So it may be beneficial to have two textures: on each frame, one is being rendered to and the other is being read from, and the next frame you swap them. That will delay obtaining the desired data slightly, but should eliminate some freezes.
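The CPU side of this reduction is plain arithmetic, so it can be sketched and checked off-device. Assuming the fragment shader packs each square's count with the low byte in R and the high byte in G (as in the excerpt above), the readback would decode like this (class and method names are illustrative):

```java
public class RedCountDecode {
    // Decode the total red-pixel count from the RGBA8888 bytes of the
    // reduced texture: each texel packs a per-square count as
    // cnt = r + 256 * g, so the total is the sum over all texels.
    public static int totalRedPixels(byte[] rgba) {
        int total = 0;
        for (int i = 0; i < rgba.length; i += 4) {
            int r = rgba[i] & 0xFF;     // low byte of the count
            int g = rgba[i + 1] & 0xFF; // high byte of the count
            total += r + (g << 8);
        }
        return total;
    }
}
```

Two bytes cap each square's count at 65535, which is plenty for N up to 255; for larger squares you would spill into the B channel as well.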
I'm new to LibGDX and was trying to implement a parallax background.
Everything went well until I hit this issue: I get some stripes when scrolling the background. You can see them in the attached image:
So I looked deeper into the issue and figured out that this is some sort of texture bleeding. But my textures already have the [Linear, Nearest] filter set, and TexturePacker uses duplicatePadding. I don't know any other methods to solve this issue. Please help!
Here's some of my code:
TexturePacker
TexturePacker.Settings settings = new TexturePacker.Settings();
settings.minWidth = 256;
settings.minHeight = 256;
settings.duplicatePadding = true;
TexturePacker.process(settings, "../../design", "./", "textures");
AssetLoader
textureAtlas = new TextureAtlas(Gdx.files.internal("textures.atlas"));
for (int i = 0; i < 2; i++) {
    Background.skies.add(textureAtlas.findRegion("background/sky", i));
    Background.skies.get(i).getTexture().setFilter(Texture.TextureFilter.Linear, Texture.TextureFilter.Nearest);
}
for (int i = 0; i < 2; i++) {
    Background.clouds.add(textureAtlas.findRegion("background/cloud", i));
    Background.clouds.get(i).getTexture().setFilter(Texture.TextureFilter.Linear, Texture.TextureFilter.Nearest);
}
for (int i = 0; i < 8; i++) {
    Background.cities.add(textureAtlas.findRegion("background/city", i));
    Background.cities.get(i).getTexture().setFilter(Texture.TextureFilter.Linear, Texture.TextureFilter.Nearest);
}
Background.moon = textureAtlas.findRegion("background/moon");
Background.forest = textureAtlas.findRegion("background/forest");
Background.road = textureAtlas.findRegion("background/road");
Background.moon.getTexture().setFilter(Texture.TextureFilter.Linear, Texture.TextureFilter.Nearest);
Background.forest.getTexture().setFilter(Texture.TextureFilter.Linear, Texture.TextureFilter.Nearest);
Background.road.getTexture().setFilter(Texture.TextureFilter.Linear, Texture.TextureFilter.Nearest);
BackgroundDrawer
private void drawParallaxTextureList(Batch batch, List<TextureAtlas.AtlasRegion> list,
                                     float moveX, float posY) {
    for (int i = 0; i < list.size(); i++) {
        boolean needDraw = false;
        float shift = GameScreen.VIEWPORT_WIDTH * i;
        float drawX = 0.0f;
        if (shift - moveX <= -(GameScreen.VIEWPORT_WIDTH)) { // if it's behind the screen
            if (i == 0) { // if it's the first element
                if (moveX >= GameScreen.VIEWPORT_WIDTH * (list.size() - 1)) { // show the first after the last
                    needDraw = true;
                    drawX = GameScreen.VIEWPORT_WIDTH
                            - (moveX - GameScreen.VIEWPORT_WIDTH * (list.size() - 1));
                }
            }
        } else if (shift - moveX < (GameScreen.VIEWPORT_WIDTH - 1)) {
            needDraw = true;
            drawX = shift - moveX;
        }
        if (needDraw) {
            batch.draw(list.get(i), (int) drawX, (int) posY);
        }
    }
}
NOTE: I don't use any camera for drawing right now. I only use a FitViewport with a size of 1920x1280. Also, the bleeding sometimes appears even at FullHD resolution.
UPDATE: Setting both minification and magnification filters to Nearest, increasing paddingX, and disabling antialiasing solved the issue, but the final image becomes too ugly! Is there a way to avoid disabling antialiasing? Without it, downscaling looks awful.
Try setting both the min and mag filters to Nearest:
.setFilter(Texture.TextureFilter.Nearest, Texture.TextureFilter.Nearest);
In the GUI TexturePacker there is an option to extrude graphics, which repeats every border pixel of the texture. Then you can set both filters to Linear:
.setFilter(Texture.TextureFilter.Linear, Texture.TextureFilter.Linear);
but unfortunately I cannot see this option in the TexturePacker.Settings object you are using. You can try setting Linear for both, but I'm pretty sure it won't work (the Linear filter takes the nearest 4 texels to generate one, so it will probably still produce issues).
So maybe try the GUI TexturePacker with the extrude option.
A few possible reasons for this artifact:
Maybe the padding is not big enough when the sprite resolution is shrunk down. Try changing your texture packer's filterMin to MipMapLinearNearest, and also try increasing paddingX and paddingY.
Maybe you're seeing dim or brightened pixels at the edge of your sprite because you're not using pre-multiplied alpha and your texture's background color (where its alpha is zero) is white. Try setting premultiplyAlpha: true. If you do this, you need to also change the SpriteBatch's blend function to (GL20.GL_ONE, GL20.GL_ONE_MINUS_SRC_ALPHA) to render properly.
You seem to be rounding your sprite positions and sizes to integers when you draw them. This would work in a pixel perfect game, where you're sure the sprites are being rendered exactly at 1:1 resolution to the screen. But once the screen size does not match exactly, your rounding might produce gaps that are less than 1 pixel wide, which will look like semi-transparent pixels.
I'm writing an image processing app on android, and I'm trying to speed it up using the NDK. I have the following for-loop:
int x, y, c, idx;
const int pitch3 = pitch * 3;
float adj, result;
...
// px, py, u, u_bar are all float arrays of size nx*ny*3
// theta, tau, denom are float constants
// idx >= pitch3
for (y = 1; y < ny; ++y)
{
    for (x = 1; x < nx; ++x)
    {
        for (c = 0; c < 3; ++c)
        {
            adj = -px[idx] - py[idx] + px[idx - 3] + py[idx - pitch3];
            result = ((u[idx] - tau * adj) + tau * f[idx]) * denom;
            u_bar[idx] = result + theta * (result - u[idx]);
            u[idx] = result;
            ++idx;
        }
    }
}
I'm wondering if it is possible to speed up this loop?
I'm thinking that using fixed-point arithmetic wouldn't do much, except on really old Android phones (which I'm not going to target). Would writing it in assembly give a big improvement?
EDIT: I know I could use SIMD/NEON instructions, but I think they are not so common ...
Since you're accessing the array as a flat structure, the three levels of looping only increase the value used for idx. You can loop with for (idx = pitch3; idx < nx*ny*3; idx++).
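A sketch of that flattening (in Java for convenience; the original is C). Note that this is exactly equivalent only if idx is meant to advance over every element from pitch3 onward; the question's loops start x and y at 1, so verify against your data layout before adopting it:

```java
public class FlattenedLoop {
    // The triple loop collapses to one because idx just increments
    // monotonically. Array and parameter names mirror the question.
    public static void update(float[] px, float[] py, float[] u, float[] uBar,
                              float[] f, int nx, int ny, int pitch,
                              float tau, float theta, float denom) {
        final int pitch3 = pitch * 3;
        final int end = nx * ny * 3;
        for (int idx = pitch3; idx < end; idx++) {
            float adj = -px[idx] - py[idx] + px[idx - 3] + py[idx - pitch3];
            float result = ((u[idx] - tau * adj) + tau * f[idx]) * denom;
            uBar[idx] = result + theta * (result - u[idx]);
            u[idx] = result;
        }
    }
}
```

The flat loop removes two loop counters and their compare/branch overhead, and gives the compiler a better shot at auto-vectorizing the body.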
Another option is to move to fixed-point math. Do you really need more than 64 bits of dynamic range?
I'm writing a scratch card like app, and I use a SurfaceView for that.
I fill it with some color and draw some Paths on it with a PorterDuff.Mode.CLEAR PorterDuffXfermode. I have to identify when the user has fully scratched it (i.e., the SurfaceView's canvas is fully transparent). Can anybody give me some advice on how to identify this?
I tried saving the coordinates of the paths, but because of the drawing stroke width I can't calculate the covered area well.
I tried getting a Bitmap from the SurfaceView's getDrawingCache method and iterating over its pixels with getPixel. It doesn't work, and I think it would not be an efficient way to examine the canvas.
Assuming the canvas is not large or scalable to an arbitrary size, I think looping over the pixels would be effective.
Given a canvas of large or arbitrary size, I would create an array representation of the canvas and mark pixels as the user scratches, keeping a count of how many they have hit at least once. Then test that number against a threshold value that determines how much of the ticket must be scratched for it to be considered "scratched off". Pseudo-code follows:
const int count = size_x * size_y;  // pixel count
const int threshold = 0.8 * count;  // user must hit 80% of the pixels to uncover
const int finger_rad = 2;           // the radius of our finger in pixels
int scratched_pixels = 0;
bit [size_x][size_y] pixels_hit;    // array of pixels, all initialized to 0

void OnMouseDown(int pos_x, int pos_y)
{
    // calculate the mouse position in the canvas
    int canvas_pos_x, canvas_pos_y = MousePosToCanvasPos(pos_x, pos_y);
    for(int x = canvas_pos_x - finger_rad; x <= canvas_pos_x + finger_rad; ++x)
    {
        for(int y = canvas_pos_y - finger_rad; y <= canvas_pos_y + finger_rad; ++y)
        {
            int dist_x = x - canvas_pos_x;
            int dist_y = y - canvas_pos_y;
            if((dist_x * dist_x + dist_y * dist_y) <= finger_rad * finger_rad
               && pixels_hit[x][y] == 0)
            {
                ++scratched_pixels;
                pixels_hit[x][y] = 1;
            }
        }
    }
}

bool IsScratched()
{
    return scratched_pixels > threshold;
}
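For reference, a runnable Java translation of the same idea (the names and the 80% threshold are illustrative choices, not a fixed API; bounds clamping is added so touches near an edge don't index outside the canvas):

```java
public class ScratchTracker {
    // Tracks which canvas pixels have been scratched at least once and
    // reports when a threshold fraction (80% here) has been uncovered.
    private final boolean[][] hit;
    private final int sizeX, sizeY, fingerRad;
    private int scratchedPixels = 0;

    public ScratchTracker(int sizeX, int sizeY, int fingerRad) {
        this.sizeX = sizeX;
        this.sizeY = sizeY;
        this.fingerRad = fingerRad;
        this.hit = new boolean[sizeX][sizeY];
    }

    // Mark every canvas pixel within fingerRad of (cx, cy) as scratched.
    public void onTouch(int cx, int cy) {
        int x0 = Math.max(0, cx - fingerRad), x1 = Math.min(sizeX - 1, cx + fingerRad);
        int y0 = Math.max(0, cy - fingerRad), y1 = Math.min(sizeY - 1, cy + fingerRad);
        for (int x = x0; x <= x1; x++) {
            for (int y = y0; y <= y1; y++) {
                int dx = x - cx, dy = y - cy;
                if (dx * dx + dy * dy <= fingerRad * fingerRad && !hit[x][y]) {
                    hit[x][y] = true;
                    scratchedPixels++;
                }
            }
        }
    }

    public int scratchedCount() {
        return scratchedPixels;
    }

    public boolean isScratched() {
        return scratchedPixels > 0.8 * sizeX * sizeY;
    }
}
```

In practice you would track at a coarser grid than screen pixels (say one cell per 8x8 block), which keeps the array small and the per-touch loop cheap.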
I have a requirement to display somewhat big images on an Android app.
Right now I'm using an ImageView with a source Bitmap.
I understand openGL has a certain device-independent limitation as to
how big the image dimensions can be in order for it to process it.
Is there ANY way to display these images (with fixed width, without cropping) regardless of this limit,
other than splitting the image into multiple ImageView elements?
Thank you.
UPDATE 01 Apr 2013
Still no luck; so far all suggestions were to reduce image quality. One person suggested it might be possible to bypass this limitation by using the CPU to do the processing instead of the GPU (though it might take more time).
I don't understand: is there really no way to display long images with a fixed width without reducing image quality? I bet there is; I'd love it if anyone would at least point me in the right direction.
Thanks everyone.
You can use BitmapRegionDecoder to break apart larger bitmaps (requires API level 10). I've written a method that will utilize this class and return a single Drawable that can be placed inside an ImageView:
private static final int MAX_SIZE = 1024;
private Drawable createLargeDrawable(int resId) throws IOException {
    InputStream is = getResources().openRawResource(resId);
    BitmapRegionDecoder brd = BitmapRegionDecoder.newInstance(is, true);

    try {
        if (brd.getWidth() <= MAX_SIZE && brd.getHeight() <= MAX_SIZE) {
            // the decoder has consumed this stream, so re-open it before
            // handing it to BitmapDrawable
            is.close();
            is = getResources().openRawResource(resId);
            return new BitmapDrawable(getResources(), is);
        }

        int rowCount = (int) Math.ceil((float) brd.getHeight() / (float) MAX_SIZE);
        int colCount = (int) Math.ceil((float) brd.getWidth() / (float) MAX_SIZE);
        BitmapDrawable[] drawables = new BitmapDrawable[rowCount * colCount];

        for (int i = 0; i < rowCount; i++) {
            int top = MAX_SIZE * i;
            int bottom = i == rowCount - 1 ? brd.getHeight() : top + MAX_SIZE;
            for (int j = 0; j < colCount; j++) {
                int left = MAX_SIZE * j;
                int right = j == colCount - 1 ? brd.getWidth() : left + MAX_SIZE;
                Bitmap b = brd.decodeRegion(new Rect(left, top, right, bottom), null);
                BitmapDrawable bd = new BitmapDrawable(getResources(), b);
                bd.setGravity(Gravity.TOP | Gravity.LEFT);
                drawables[i * colCount + j] = bd;
            }
        }

        LayerDrawable ld = new LayerDrawable(drawables);
        for (int i = 0; i < rowCount; i++) {
            for (int j = 0; j < colCount; j++) {
                ld.setLayerInset(i * colCount + j, MAX_SIZE * j, MAX_SIZE * i, 0, 0);
            }
        }
        return ld;
    } finally {
        brd.recycle();
    }
}
The method will check to see if the drawable resource is smaller than MAX_SIZE (1024) in both axes. If it is, it just returns the drawable. If it's not, it will break the image apart and decode chunks of the image and place them in a LayerDrawable.
I chose 1024 because I believe most available phones will support images at least that large. If you want to find the actual texture size limit for a phone, you have to do some funky stuff through OpenGL, and it's not something I wanted to dive into.
I wasn't sure how you were accessing your images, so I assumed they were in your drawable folder. If that's not the case, it should be fairly easy to refactor the method to take in whatever parameter you need.
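Since the row/column math above is pure arithmetic, it can be factored out and sanity-checked off-device. A small sketch (class and method names are mine):

```java
public class TileMath {
    // Number of tiles needed to cover imageSize pixels at maxSize per tile,
    // matching the Math.ceil computation in createLargeDrawable.
    public static int tileCount(int imageSize, int maxSize) {
        return (int) Math.ceil((double) imageSize / maxSize);
    }

    // Right edge (exclusive) of tile column j: full tiles are maxSize wide,
    // and the last column absorbs the remainder.
    public static int tileRight(int imageWidth, int maxSize, int j) {
        int colCount = tileCount(imageWidth, maxSize);
        return j == colCount - 1 ? imageWidth : (j + 1) * maxSize;
    }
}
```

For example, a 2500-px-wide image with MAX_SIZE = 1024 splits into three columns of widths 1024, 1024, and 452.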
You can use BitmapFactory.Options to reduce the size of the picture. You can use something like this:
BitmapFactory.Options options = new BitmapFactory.Options();
options.inSampleSize = 3; //reduce size 3 times
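Note that the decoder rounds inSampleSize values down to the nearest power of two, so a value of 3 acts like 2. A common pattern (sketched here with an illustrative helper, not an Android API) picks the largest power of two that keeps the decoded bitmap at or above the requested dimensions:

```java
public class SampleSize {
    // Largest power-of-two inSampleSize such that the decoded image is
    // still at least reqWidth x reqHeight.
    public static int calculateInSampleSize(int width, int height,
                                            int reqWidth, int reqHeight) {
        int inSampleSize = 1;
        while ((height / (inSampleSize * 2)) >= reqHeight
                && (width / (inSampleSize * 2)) >= reqWidth) {
            inSampleSize *= 2;
        }
        return inSampleSize;
    }
}
```

You would feed the result into options.inSampleSize before the actual decode, typically after a bounds-only decode (inJustDecodeBounds = true) to learn the source dimensions.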
Have you seen how your maps app works? I once made a map renderer. You can use the same trick to display your image.
Divide your image into square tiles (e.g., 128x128 pixels). Create a custom ImageView that supports rendering from tiles. Your ImageView knows which part of the bitmap it should show and displays only the required tiles, loading them from your SD card. Using such a tile map you can display endless images.
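The "which tiles are needed" computation such a view performs is simple integer math; a hedged sketch (tile size and names are illustrative):

```java
public class TileWindow {
    // Inclusive range [first, last] of tile columns that intersect the
    // viewport, given a horizontal scroll offset in pixels. The same
    // formula applies to rows with the vertical offset and height.
    public static int[] visibleColumns(int scrollX, int viewportWidth,
                                       int tileSize, int totalColumns) {
        int first = Math.max(0, scrollX / tileSize);
        int last = Math.min(totalColumns - 1,
                            (scrollX + viewportWidth - 1) / tileSize);
        return new int[]{first, last};
    }
}
```

Only tiles in this range need to be resident in memory; the rest can be evicted or lazily loaded as the user scrolls.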
It would help if you gave us the dimensions of your bitmap.
Please understand that OpenGL runs against natural mathematical limits.
For instance, there is a very good reason a texture in OpenGL must be 2 to the power of x: it's really the only way the math of any downscaling can be done cleanly without any remainder.
So if you give us the exact dimensions of the smallest bitmap that's giving you trouble, some of us may be able to tell you what kind of actual limit you're running up against.