I'm trying to capture images with 30 seconds exposure times in my app (I know it's possible since the stock camera allows it).
But SENSOR_INFO_EXPOSURE_TIME_RANGE (which it's supposed to be in nanoseconds) gives me the range :
13272 - 869661901
in seconds it would be just
0.000013272 - 0.869661901
Which obviously is less than a second.
How can I use longer exposure times?
Thanks in advance!.
The answer to your question:
You can't. You checked exactly the right information and interpreted it correctly. Any value you set for the exposure time longer than that will be clipped to that max amount.
The answer you want:
You can still get what you want, though, by faking it. You want 30 continuous seconds' worth of photons falling on the sensor, which you can't get. But you can get something (virtually) indistinguishable from it by accumulating 30 seconds' worth of photons with tiny missing intervals interspersed.
At a high level, what you need to do is create a List of CaptureRequests and pass it to CameraCaptureSession.captureBurst(...). This will take the shots with as minimal an interstitial time as possible. When each frame of image data is available, pass it to some new buffer somewhere and accumulate the information (simple point-wise addition). This is probably most properly done with an Allocation as the output Surface and some RenderScript.
Notes on data format:
The right way to do this is to use the RAW_SENSOR output format if you can. That way the accumulated output really is directly proportional to the light that was incident to the sensor over the whole 30s.
If you can't use that, for some reason, I would recommend using YUV_420_888 output, and make sure you set the tone map curve to be linear (unfortunately you have to do this manually by creating a curve with two points). Otherwise the non-linearity introduced will ruin our scheme. (Although I'm not sure simple addition is exactly right in a linear YUV space, but it's a first approach at least.) Whether you use this approach or RAW_SENSOR, you'll probably want to apply your own gamma curve/tone map after accumulation to make it "look right."
For the love of Pete don't use JPEG output, for many reasons, not the least of which is that this will most likely add a LOT of interstitial time between exposures, thereby ruining our approximation of 30s on continuous exposure.
Note on exposure equivalence:
This will produce almost exactly the exposure you want, but not quite. It differs in two ways.
There will be small missing periods of photon information in the middle of this chunk of exposure time. But on the time scale you are talking about (30s), missing a few milliseconds of light here and there is trivial.
The image will be slightly nosier than if you had taken a true single exposure of 30s. This is because each time you read out the pixel values from the actual sensor, a little electronic noise gets added to the information. So in the end you'll have 35 times as much of this additive noise (from the 35 exposures for your specific problem) as a single exposure would. There's no way around this, sorry, but it might not even be noticeable- this is usually fairly small relative to the meaningful photographic signal. It depends on the camera sensor quality (and ISO, but I imagine for this application you need that to be high.)
(Bonus!) This exposure will actually be superior in one way: Areas that might have been saturated (pure white) in a 30s exposure will still retain definition in these far shorter exposures, so you're basically guaranteed not to lose your high end details. :-)
You can't always trust SENSOR_INFO_EXPOSURE_TIME_RANGE as of May 2017. Try manually increasing the time and see what happens. I know my Pixel will actually take a 1.9 sec shot but SENSOR_INFO_EXPOSURE_TIME_RANGE has a value in the sub second range.
Related
Context
I'm building an app which performs real-time object detection throught the camera module of the device. The render is like the image below.
Let's say I try to recognize an apple, most of the time the app will recognize an apple. However, sometimes, the app will recognize the wrong fruit (let's say a lemon) on a few camera frames.
Goal
As the recognition of a fruit triggers an action in my code, my goal is to programmatically prevent a brief wrong recognition to trigger an action, and only take into account the majority result.
What I've tried
I tried this way : if the same fruit is recognized several frames in a row, I assumed the result is supposed to be the right one. But as my device process image recognition several times per second, even a wrong guess can be recognized several times in a row, and leads to the wrong action.
Question
Is there any known techniques for avoiding this behavior ?
I feel like you've already answered your own question. In general the interpretation of a model's inference is it's own tuning step. You know for example in logistic regression tasks that the threshold does NOT have to be 0.5. In fact, it's quite common to flex the threshold to see what the recall and precision are at various thresholds, and you can pick a threshold that works given your business/product problem. (Fraud detection might favor high recall if you never want to miss any fraud... or high precision if you don't want to annoy users with lots of false positives).
In video this broad concept is extended to multiple frames as you know. You now have the tune the hyperparameters, "how many frames total?" and "how many frames voting [apple]"?
If you are analyzing fruit going down a conveyer belt one by one, and you know each piece of fruit will be in frame for X seconds and you are shooting at 60 fps, maybe you want 60 * X frames. And maybe you want 90% of the frames to agree.
You'll want to visualize how often your detector "flips" detections so you can make a business/product judgement call on what your threshold ought to be.
This answer hasn't been very helpful in giving you a bright line rule here, but I hope it's helpful in suggesting that there is in fact NO bright line rule. You have to understand the problem to set the key hyperparameters:
For each frame, is top-1 acc sufficient, or do I need [.75] or higher confidence?
How many frames get to vote? Say [100].
How many correlated votes are necessary to trigger an actual signal? maybe it's [85].
The above algo assumes you take a hardmax after step 1. another option would be to just average all 100 frames and pick a threshold. that's kind of a soft label version of the above algo.
Goal
Capture images with Android smartphones attached to moving vehicles
frequency: 1 Hz
reference model: Google Pixel 3a
objects of interest: the road/way in front of the vehicle
picture usage: as input for machine learning (like RNN) to identify damages on the road/way surfaces
capture environment: outdoor, only on cloudy days
Current state
Capturing works (currently using JPEG instead of RAW because of the data size)
auto-exposure works
static focus distance works
Challenge
The surface of the ways/roads in the pictures are often blurry
The source of the motion blur is mostly from the shaking vehicle/fixed phone
To reduce the motion blur we want to use a "Shutter Speed Priority Mode"
i.e. minimize shutter speed => increase ISO (accept increase noise)
there is only one aperture (f/1.8) available
there is no "Shutter Speed Priority Mode" (short: Tv/S-Mode) available in the Camera2 API
the CameraX API does not (yet) offer what we need (static focus, Tv/S Mode)
Steps
Set the shutter speed to the fastest exposure supported (easy)
Automatically adjust ISO setting for auto-exposure (e.g. this formular)
To calculate the ISO the only missing part is the light level (EV)
Question
How can I estimate the EV continuously during capturing to adjust the ISO automatically while using a fixed shutter speed?
Ideas so far:
If I could read out the "recommendations" from the Camera2 auto exposure (AE) routine without actually enabling AE_MODE_ON then I could easily calculate the EV. However, I did not find an API for this so far. I guess it's not possible without routing the device.
If the ambient light sensor would provide all information needed to auto-expose (calculate EV) this would also be very easy. However, from my understanding it only measures the incident light not the reflected light so the measurement does not take the actual objects in the pictures into account (how their surfaces reflect light)
If I could get the information from the Pixels of the last captures this would also be doable (if the calculation time fits into the time between two captures). However, from my unterstanding the pixel "bightness" is heavily dependent on the objects captured, i.e. if the brightness of the objects captured changes (many "black horses" or "white bears" at the side of the road/way) I would calculate bad EV values.
Capture auto-exposed images in-between the actual captures and calculate the light levels from the auto-selected settings used in the in-between captures for the actual captures. This would be a relatively "good" way from my understanding but it's quite hard on the resources end - I am not sure I the time available between two captures is enough for this.
Maybe I don't see a simpler solution. Has anyone done something like this?
Yes, you need to implement your own auto-exposure algorithm.
All the 'real' AE has to go by is the image captured by the sensor as well, so in theory you can build something just as good at guessing the right light level.
In practice, it's unlikely you can match it, both because you have a longer feedback loop (the AE algorithm can cheat a bit on synchronization requirements and update sensor settings faster than an application can), and because the AE algorithm can use hardware statistics units (collect histograms and average values across the scene), which make it more efficient.
But a simple auto-exposure algorithm would be to average the whole scene (or a section of the scene, or every-tenth-pixel of the scene, etc) and if that average is below half max value, increase ISO, and if it's above, reduce. A basic feedback control loop, in other words. With all the issues about stability, convergence, etc, that apply. So a bit of control theory understanding can be quite helpful here. I'd recommend a low-resolution YUV output (640x480 maybe?) to an ImageReader from the camera to use as the source data, and then just look at the Y channel. Not a ton of data to churn through in that case.
Or as hb0 mentioned, if you have a very limited set of outdoor conditions, you can try to hardcode values for each of them. But the range of outdoor brightness can be quite large, so this would require a decent bit of testing to make sure it'll work, plus manual selection of the right values each time.
When the pictures are only captured in specific light situations like "outdoor, cloudy":
Tabulated values can be used for the exposure value (EV) instead of using light measurements.
Example
EV100 (iso100) for Outdoor cloudy (OC) = 13
EV (dynamic iso) for OC = EV100 + log2(iso/100)
Using this formula together with those formulas we can calculate the iso from:
aperture (fixed)
shutter speed (manually selected)
Additionally, we could add an UI option to choose a "light situation" like:
outdoor, cloudy
outdoor, sunny
etc.
This is probably not the most accurate way but for now a first, easy way to continue prototyping.
there is a MAX_LENGTH value that we can set.
that value limits the recording approximately.. sometimes it can get up to double.
so lets say i set the value of MAX_LENGTH to 15 then launch the app and start recording sometimes it exceed that limitation and get to 30!
that limitation mechanism is broken..
there is a way to hard limit it so it won't exceed MAX_LENGTH ever?
thanks
First
Grafika is experimental and has no support. From the README file:
It's not stable.
It's not polished or well tested. Expect the UI to be ugly and awkward.
It's not intended as a demonstration of the proper way to do things. The code may handle edge cases poorly or not at all. Logging is often left enabled at a moderately verbose level.
It's barely documented.
It's generally just not supported.
Second
The code comment itself specifies that the FPS can be variable:
Opens a camera, and attempts to establish preview mode at the specified width and height.
Sets mCameraPreviewFps to the expected frame rate (which might actually be variable).
Source Code: https://github.com/google/grafika/blob/master/src/com/android/grafika/ContinuousCaptureActivity.java
So I'm trying to build an android app which acts as a real time audio analyzer as a precursor to a project that will involve detecting and filtering out certain sounds.
So I think I've got the basics of discrete Fourier transforms down, however I'm not sure what the best parameters should be for doing real time frequency analysis.
I get the impression that under ideal situations (unlimited computing power), I would take all the samples from the 44100 sample/sec PCM stream I'm getting from the AudioRecord class and put them through a 44100 element fifo "window" (padded to 2**16 with 0's and maybe a tapering function?) , running an FFT on the window every time a new sample came in. This would (I think), give me the spectrum for 0 - ~22 KHz updated 44100 times per second.
It seems like this is not going to happen on a smartphone. The thing is, I'm not sure which parameters of the computation I should reduce in order to make in order to make it tractable on my Galaxy Nexus while still holding on to as much quality as possible. Eventually I would like to be using an external microphone with better sensitivity.
I figure it will involve moving the window more than one sample between taking FFT's, but I have no idea at what point this becomes more detrimental to accuracy/aliasing/whatever than just doing the FFT on a smaller window, or if there is a third option I'm overlooking.
With the natively implemented KissFFT I'm using from libgdx, I seem to be able to do somewhere between 30-42 44100 element FFT's per 44100 samples and still have it be responsive (meaning that the buffer getting filled from the thread doing AudioRecord.read() isn't filling up faster than the thread doing the fft's can drain it).
So my questions are:
Could the performance I'm currently getting just be the best I'm going to get? Or does it seem like I must be something stupid because much faster speeds are possible?
Is my approach to this at least fundamentally correct or am I barking entirely up the wrong tree?
I'd be happy to show any of my code if that would help answer my questions, but there's a lot of it so I figured I would do so selectively instead of posting it all.
if there is a third option I'm overlooking
Yes: doing both at the same time, a reduction of the FFT size as well as a larger step size. In a comment you pointed out that you want to detect "sniffling/chewing with mouth". So, what you want to do is similar to the typical task of speech recognition. There, you typically extract a feature vector in steps of 10ms (meaning with Fs=44.1kHz every 441 samples) and the signal window to transform is roughly about double the size of the step size, so 20ms which yields to a 2^X FFT size of 1024 samples (make sure that you choose an FFT size which is a power of 2, because it is faster).
Any increase in window size or reduction in step size increases the data but mainly adds redundancy.
Additional hints:
#SztupY correctly pointed out that you need to "window" your signal prior to the FFT, typically with a Hamming-wondow. (But this is not "filtering". It is just multiplying each sample value with the corresponding window value without accumulating the result).
The raw FFT output is hardly suited to recognize "sniffling/chewing with mouth", a classical recognizer consists of HMMs or ANNs which process sequences of MFCCs and their deltas.
Could the performance I'm currently getting just be the best I'm going to get? Or does it seem like I must be something stupid because much faster speeds are possible?
It's close to the best, but you are wasting all the CPU power to estimate highly redundant data, leaving no CPU power to the recognizer.
Is my approach to this at least fundamentally correct or am I barking entirely up the wrong tree?
After considering my answer you might re-think your approach.
I am working an a bike computer app. I was hoping to work out the inclination of the slope using the accelerometer but things are not working too well.
I have put in test code getting the sensor data I am just smapeling at the UI rate and keeping a moving average over 128 samples which is about 6 seconds worth. With the phone in hand the data is good and I can calculate a good angle compared to my calibration flat vector.
With the phone mounted on the bike things are not at all good. I expect to get a good bit of noise but I was hoping that the large number of samples over the big time window would remove the vibration effects and general bike movements. Unfortunately this just is not working, the magnitude of the acceleration vector is not really staying around the 9.8 mark but is dropping lower which indicates to me that something is not right somewhere.
Here is a plot of the data from part of a test ride.
As you can see when stationary at the start the magnitude is OK but once I get going it drops. I am fairly sure the problem is vibration related I initially descend and there was heavy vibration I then climb and the vibration is less and the magnitude gets back towards 9.8 but then I drop down quickly on a bad road and the magnitude ends up less than 3.
This is with a SonyErricson Xperia Active which uses a BMA250 sensor the datasheat looks like the sensor should be capable. My only theory for the cause of the problem is that the range is set to the 2g range and the vibration is causing data to go out of range and this is causing my problems.
Has anyone seen anything like this?
Has anyone got any ideas on the cause of the problem?
Is there any way to change the sensitivity that I have not found?
Additional information.
OK I logged the raw sensor data before my filtering. A very small portion presented here
The major axis is in green and on the flat as I belive this should be without the vibration it should be about 8.5. There is no obvious clamping on the data but I get more below 8.5 values than above 8.5 values. Even if the sensor is set up for it's most sensative 2g range it looks like the vibration shgould be OK I have a max value here of just over 15 and a minimum of -10 well ib a +- 20 ragnge just not centered correctly on the 8.5 it should be.
I will dig out my other phone which looks to have a slightly different sensor a BMA150 and try with that but unless it is perfect I think I will have to give up on the idea.
I suspect the accelerometer is not linear over such large G ranges. If so, and if there is any asymmetry, it will do what you see.
The solution for that is to pad the accelerometer mount a bit more, foam rubber, bungy-cord, whatever, possibly mount it on a heavier stage to filter the vibration more.
Or (not a good solution) try to model the error and compensate for it.
I used the same handset and by coincidence the same averaging interval of 6 seconds for an application a few years ago and I don't recall seeing the behaviour in the graph.
I'm wondering whether the issue is in the way the 6 second averages are being accumulated. One problem I had is that the sampling interval was not constant but depends on how busy the processor is. A sample is acquired in the specified time but the calling of the event handler depends on the scheduler. When the processor is unloaded sampling occurs at a constant frequency but as the processor works harder the sampling frequency becomes slower and more erratic. You can write your app to keep processor load low while sampling. What we did is sample for 6 seconds, doing nothing else, then stop sampling and process the last sample set but this was only partially successful as you can't control other apps running at the same time and the scheduler is sharing processor resources across them all. On the Xperia Active I found it can occasionally go out to seconds between samples which I attributed to garbage collection in one of the JVMs. The solution for us was to time stamp each sample then run some quality checks over a sample set and discard those that failed the quality check. This is a poor solution as defining what is good enough is imprecise and when the user runs another app that uses a lot of resources most sample sets can be discarded so the app needs additional logic to handle that.
The current Android API, unavailable on the Xperia Active, should have eliminated this as samples can be batched as described at https://source.android.com/devices/sensors/hal-interface.html#batch_sensor_flags_sampling_period_maximum_report_latency .
If the algorithm assumed a particular number of samples rather than counting them and the processor worked harder as the bike went faster, though I'm not sure why it would, it would produce something like the first graph because when the bike is going downhill magnitude goes down and when going up hill it goes up. There is a lot of speculation there but a 6 second average giving a magnitude of less than 3 m/s^2 looks implausible from my experience with this sensor.