I'm working on a project which ultimately uses an app to recognize spoken words via face recognition and gives feedback how good your pronunciation was.
I would like to know if there is a way to only partially get the data from the camera sensor (ROI) so not all pixels have to be parsed and processed to possibly improve framerates and lower the datastream.
I'm fairly new to android app dev, so I don't know if there is a way to call the sensor on this level or if such intervention on hardware specific elements can't be handeled by software methods.
So I would appreciate if there is anyone who could tell me if there is a way. Researching android docu didn't get me any results so far.
Thanks in advance and regards from Germany
Generally speaking, this isn't supported.
While the camera API allows for setting a level of digital zoom (for the deprecated android.hardware.Camera API) or an explicit crop region (on the current android.hardware.camera2 API), there's no guarantee that frame rate will increase when you select a smaller region.
This is because digital zoom is expected to be variable per-frame, and image sensors can generally not reconfigure their readout region that quickly. So digital zoom / crop is implemented in the camera image processing pipeline instead, and the sensor always reads out the full frame.
Some lower-resolution output configurations may set the sensor to skipping/binning modes, which does increase the maximum frame rate by reducing the number of pixels read off the sensor, but you indicate you need full resolution on a small ROI, so this doesn't help you.
Related
If I use camera2 API to capture some image I will get "final" image after image processing, so after noise reduction, color correction, some vendor algorithms and etc.
I should also be able to get raw camera image following this.
The question is can I get intermediate stages of image as well? For example let's say that raw image is stage 0, then noise reduction is stage 1 color correction stage 2 and etc. I would like to get all of those stages and present them to user in an app.
In general, no. The actual hardware processing pipelines vary a great deal between different chip manufacturers and chip versions even from the same manufacturer. Plus each Android device maker then adds their own software on top of that.
And often, it's not possible to dump outputs from every step of the process, only some of them.
So making a consistent API for fetching this isn't very feasible, and the camera2 API doesn't have support for it.
You can somewhat simulate it by turning things like noise reduction entirely off (if supported by the device) and capturing multiple images, but that of course isn't as good as multiple versions of a single capture.
In my app I'm trying to use ArCore as sort of a "camera assistant" in a custom camera view.
To be clear - I want to display images for the user in his camera and have him capture images that don't contain the AR models.
From what I understand, in order to capture an image with ArCore I'll have to use the Camera2 API which is enabled by configuring the session to use the "shared Camera".
However, I can't seem to configure the camera to use any high-end resolutions (I'm using pixel 3 so I should be able to go as high as 12MP).
In the "shared camera example", they toggle between Camera2 and ArCore (a shame there's no API for CameraX) and it has several problems:
In the ArCore mode the image is blurry (I assume that's because the depth sensor is disabled as stated in their documentation)
In the Camera2 mode I can't enhance the resolution at all.
I can't use the Camera2 API to capture an image while displaying models from ArCore.
Is this requirement at all possible at the moment?
I have not worked yet with shared camera with ARCore, but I can say a few things regarding the main point of your question.
In ARCore you can configure both CPU image size and GPU image size. You can do that by checking all available camera configurations (available through Session.getSupportedCameraConfigs(CameraConfigFilter cameraConfigFilter)) and selecting your preferred one by passing it back to the ARCore Session. On each CameraConfig you can check which CPU image size and GPU texture size you will get.
Probably you are currently using (maybe by default?) a CameraConfig with the lowest CPU image, 640x480 pixels if I remember correctly, so yes it definitely looks blurry when rendered (but nothing to do with depth sensor in this regard).
Sounds like you could just select a higher CPU image and you're good to go... but unfortunately that's not the case because that configuration applies to every frame. Getting higher resolution CPU images will result in much lower performance. When I tested this I got about 3-4 frames per second on my test device, definitely not ideal.
So now what? I think you have 2 options:
Pause the ARCore session, switch to a higher CPU image for 1 frame, get the image and switch back to the "normal" configuration.
Probably you are already getting a nice GPU image, maybe not the best due to camera Preview, but hopefully good enough? Not sure how you are rendering it, but with some OpenGL skills you can copy that texture. Not directly, of course, because of the whole GL_TEXTURE_EXTERNAL_OES thing... but rendering it onto another framebuffer and then reading the texture attached to it could work. Of course you might need to deal with texture coordinates yourself (full image vs visible area) but that's another topic.
Regarding CameraX, note that it is wrapping Camera2 API in order to provide some camera use cases so that app developers don't have to worry about the camera lifecycle. As I understand it would not be suitable for ARCore to use CameraX as I imagine they need full control of the camera.
I hope that helps a bit!
I want to get image flows with the least distortions possible(no noise reduction, etc) without having to deal with RAW outputs.
I'm working with two streams(one when using the deprecated camera), one for the preview and one for the processing. I understand the camera2 api, but am wondering what kind of upsampling/downsampling is used when fitting the sensor output to the surfaces?
More specifically, I'm working on zoomed images, and according to the camera2 documentation concerning cropping and the references:
For non-raw streams, any additional per-stream cropping will be done to maximize the final pixel area of the stream.
The whole concept is easy enough to understand, but it's also mentioned that:
Output streams use this rectangle to produce their output, cropping to a smaller region if necessary to maintain the stream's aspect ratio, then scaling the sensor input to match the output's configured resolution.
But I haven't been able to find any info about this scaling. Which method is used(filter based, bicubic, edge-directed, etc)? is there a way to get this info? and is there a way I can actually choose which one is used?
Concerning the deprecated camera, I'm guessing the zoom is just simpler, in the sense that it's probably equivalent to having SCALER_CROPPING_TYPE_CENTER_ONLY with only a limited set of crop regions corresponding to the exposed zoom ratios. But is the image scaling the same as in camera2? If someone could shed some light I'd be happy.
Real life example
Camera sensor: 5312x2988(16:9)
I want a 4x zoom so the crop region should be (1992, 1120, 1328, 747)
(btw what happens to odd sizes? for instance with SCALER_CROPPING_TYPE_CENTER_ONLY devices?)
Now I have a surface of size(1920, 1080), the crop area and the stream ratio fit, but the 1328x747 values must be transformed to fill the 1920x1080 surface. The nature of this transformation is what I want to know.
The scaling algorithm used depends on the device; generally for power efficiency and speed, scaling is done in hardware blocks usually at the end of camera image signal processor (ISP) pipeline.
Therefore, you can't generally rely on it being any particular kind of scaling or filtering. Unfortunately, if you want to understand the entire processing pipeline, you have to start with RAW and implement it yourself.
If you're on the same device, the old camera API and the new camera2 API talk to the same hardware abstraction layer, and the same hardware scalers, so the scaling output will generally match exactly for the same resolution. (with the exception of LEGACY-level devices, where camera2 may need additional GPU-based scaling, which will be bilinear downsampling - but you don't really know when this would apply).
The requirement is to create an Android application running on one specific mobile device that records video of a human eye pupil dilating in response to a bright light (which is physically attached to the mobile device). The video is then post-processed frame by frame on the device to detect & measure the diameter of the pupil AND the iris in each frame. Note the image processing does NOT need doing in real-time. The end result will be a dataset describing the changes in pupil (& iris) size over time. It's expected that the iris size can be used to enhance confidence in the pupil diameter data (eg removing pupil size data that's wildly wrong), but also as a relative measure for how dilated the eye is at any point.
I am familiar with developing Android mobile apps, but my experience with image processing is very limited. I've researched solutions and it seems that the answer may lie with the OpenCV/JavaCv libraries, which should provide shape detection (eg http://opencvlover.blogspot.co.uk/2012/07/hough-circle-in-javacv.html) but can anyone provide guidance on these specific questions:
Am I right to think it can detect the two circle shapes within a bitmap, one inside the other? ie shapes inside each other is not a problem.
Is it true that JavaCv can detect a circle, and return a position & radius/diameter? ie it doesn't return a set of vertices that then require further processing to compare with a circle? It seems to have a HoughCircle method, so I think yes.
What processing of each frame is typically used before doing shape detection? For example an algorithm to enhance edges, smooth, or remove colour?
Can I use it to not just detect presence of, but measure the diameter of the detected circles? (in pixels, but then can easily be converted to real-world measurements because known hardware is being used). I think yes, but would be great to hear confirmation from those more familiar.
This project is a non-commercial charitable project, so any help especially appreciated.
I would really suggest using ndk as it is a bit richer in features. Also it allows you to run and test your algorithms on a laptop with images before pushing it to a device, speeding up development.
Pre-processing steps:
Typically one would use thresholding or canny edge detection and morphological operations like erode dilate.
For detection of iris / pupil, houghcircles is not a very good method, feature detection methods like MSER work better for not-so-well-defined circles. Here is another answer I wrote on the same topic which has code that could help.
If you are looking to measure the regions, I would suggest going through this blog. It has a clear explanation on the steps involved for a reasonably accurate measurement.
I'm very new to android. I'm trying to use the new Android Camera2 api to build a real time image processing application. My application requires to maintain a good FPS rate as well. Following some examples i managed to do the image processing inside the onImageAvailable(ImageReader reader) method available with ImageReader class. However by doing so, i can only manage to get a frame rate around 5-7 FPS.
I've seen that it is advised to use RenderScript for YUV processing with Android camera2 api. Will using RenderScript gain me higher FPS rates?
If so please can someone guide me on how to implement that, as i'm new to android i'm having a hard time grasping concepts of Allocation and RenderScript. Thanks in advance.
I don't know what type of image processing you want to perform. But in case that you are interested only in the intensity of the image (i.e. grayvalue information) you don't need any conversion of the YUV data array (e.g. into jpeg). For an image consisting of n pixels the intensity information is given by the first n bytes of the YUV data array. So, just cut those bytes out of the YUV data array:
byte[] intensity = new byte[width*height];
intensity = Arrays.copyOfRange(data, 0, width*height);
In theory, you can get the available fps ranges with this call:
characteristics.get(CameraCharacteristics.CONTROL_AE_AVAILABLE_TARGET_FPS_RANGES);
and set the desired fps range here:
mPreviewRequestBuilder.set(CaptureRequest.CONTROL_AE_TARGET_FPS_RANGE, bestFPSRange);
So in principle, you should choose a range with the same lower and upper bound, and that should keep your frame rate constant.
HOWEVER, on devices with a LEGACY profile, none of the devices I have tested have been able to achieve 30fps at 1080p (S5, Z3 Compact, Huawei Mate S, and HTC One M9). The only way I was able to achieve that was by using a device (LG G4) that turned out to have a FULL profile.
Renderscript will not buy you anything here if you are going to use it inside the onImageAvailable callback. It appears that getting the image at that point is the bottleneck on LEGACY devices since the new camera2 API simply wraps the old one, and is presumably creating so much overhead that the callback does not occur at 30fps anymore. So if Renderscript is to work, you would need to need to create a Surface and find another way of grabbing the frames off of it.
Here is the kicker though... if you move back the deprecated API, I would almost guarantee 30fps at whatever resolution you want. At least that is what I found on all of the devices I tested....