I've got a rather complicated problem that I need to solve at work. It's pretty far out of my remit of "Android App Developer" - I would class it as a very specialized audio engineering problem.
I am tasked with developing an application that needs to be able to stream either a local audio file or audio from streaming service apps such as, but not limited to, Spotify, to another device over Bluetooth.
In addition, the app needs to be able to estimate the BPM of the streamed audio (it is assumed all audio will be musical) and use this BPM value to control the playback speed of a lighting sequence.
This question is about how to estimate the BPM of the streamed music.
For the case where the audio file is local, I can think of some solutions, such as hardcoding the BPM into the app in a map keyed by the audio resource's URL.
I have also investigated and experimented with a "static" library (aubio) that can estimate BPM from an audio file, but not on the fly; it also assumes .wav format. This won't be sufficient for what we are trying to achieve here.
However, given the requirement to stream external audio from streaming service apps such as Spotify, a static analysis solution is pointless: it wouldn't work for the streaming service case, whereas a solution for the streaming service case would work for both cases.
Therefore, I have come to the conclusion that I somehow need to analyze the streamed audio on the fly, perhaps with FFT or peak-detection algorithms.
This question isn't about the actual BPM estimation algorithm itself (or the implementation details of how I would get there); it is about the basic starting point of such a solution:
A) How might I get the raw bytes of streamed audio, for both the local file case and the external streaming service app case, and B) how might I process those bytes into a data structure representing the audio stream in a way amenable to running audio analysis algorithms on it?
I realize this is a very open-ended, quite vague question, but this is so far out of my comfort zone that I've no idea how to even formulate a more coherent one.
Any help would be greatly appreciated!
I'd start by creating some separate, more tightly defined questions for the different pieces. For example, ask how to get access to the raw bytes when streaming a local file, or when streaming URL-sourced audio. Android has some nice support for streaming, including the ability to stream PCM, so I'd be pretty surprised if getting a hook for access to the byte stream were not possible.
Once you have a hooking point, to convert the bytes to "something useful" I'd look at using the audio format to tell you how to read the incoming bytes. The format should tell you how many channels there are (mono or stereo), the encoding (signed PCM is common, but it might be normalized floats), the number of bits per value (16 is common), and the byte order (big-endian vs. little-endian).
I know there are posts that explain how to convert the raw audio bytes to PCM values based on this info, including some on Stack Overflow; they should be reachable via search. I think signed normalized floats are the most common data representation used for processing audio signals.
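To make that concrete, here is a minimal sketch, assuming the most common case of 16-bit signed little-endian PCM; the method name and the idea of handing it a whole byte chunk are just for illustration, and channel de-interleaving is left to the caller:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Convert interleaved 16-bit signed little-endian PCM bytes to floats in [-1, 1].
static float[] pcm16ToFloats(byte[] rawBytes) {
    ByteBuffer buffer = ByteBuffer.wrap(rawBytes).order(ByteOrder.LITTLE_ENDIAN);
    float[] samples = new float[rawBytes.length / 2];
    for (int i = 0; i < samples.length; i++) {
        samples[i] = buffer.getShort() / 32768f; // normalize the signed 16-bit value
    }
    return samples;
}

An array of floats like this (one per channel, or mixed down to mono) is the kind of structure most analysis code, FFTs included, expects.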
Related
How do streaming apps like YouTube, Hotstar, or any other video player app programmatically detect at runtime that the network is getting slow, and change the video quality based on changes in network speed?
Many streaming services nowadays use HTTP-based streaming protocols. But there are exceptions; especially with low-latency streaming; e.g. WebRTC or Websocket-based solutions.
Assuming that you're using an HTTP-based protocol like HLS or MPEG-DASH, the "stream" is a long chain of video segments that are downloaded one after another. A video segment is a file in "TS" or "MP4" format (in some MP4 cases, video and audio are split into separate files); typically a segment has 2, 6, or 10 seconds of audio and/or video.
Based on the playlist or manifest (or sometimes simply from decoding the segment), the player knows how many seconds a single segment contains. It also knows how long it took to download that segment. You can measure the available bandwidth by dividing the (average) size of a video segment file by the (average) time it took to download it.
The moment it takes more time to download a segment than to play it, you know that the player will stall as soon as the buffer is empty; stalling is generally referred to as "buffering". Adaptive Bitrate (ABR) is a technique that tries to prevent buffering; see https://en.wikipedia.org/wiki/Adaptive_bitrate_streaming (or search for the expression). When the player notices that the available bandwidth is lower than the bit rate of the video stream, it can switch to another version of the same stream that has a lower bit rate (typically achieved by higher compression and/or lower resolution, which results in lower quality, but that's better than buffering).
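As an illustration only (the method names and the 0.8 safety margin are invented, not taken from any particular player), the core of that measurement is just a division plus a comparison against the available renditions:

// Rough ABR sketch: estimate bandwidth from recently downloaded segments,
// then pick the best rendition that still fits under it.
static long estimateBitsPerSecond(long[] segmentSizesBytes, long[] downloadMillis) {
    long totalBytes = 0, totalMillis = 0;
    for (int i = 0; i < segmentSizesBytes.length; i++) {
        totalBytes += segmentSizesBytes[i];
        totalMillis += downloadMillis[i];
    }
    return totalBytes * 8 * 1000 / Math.max(totalMillis, 1);
}

// renditionBitrates must be sorted ascending; index 0 is the lowest-bitrate fallback.
static int pickRendition(long[] renditionBitrates, long estimatedBps) {
    int best = 0;
    for (int i = 0; i < renditionBitrates.length; i++) {
        if (renditionBitrates[i] <= estimatedBps * 0.8) { // keep a safety margin
            best = i;
        }
    }
    return best;
}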
PS #1: WebRTC and Websocket-based streaming solutions cannot use this measuring trick and must implement other solutions
PS #2: New/upcoming variants of HLS (e.g. LL-HLS and LHLS) and MPEG-DASH use other HTTP technologies (like chunked transfer encoding or HTTP push) to achieve lower latency; these typically do not work well with the measuring technique mentioned above and use different techniques, which I consider out of scope here.
You have to use a streaming server in order to do that. Wowza Server is one of them (not free). The client and server will exchange information about the connection and distribute chunks of the video, depending on the network speed.
I have a cross-platform (iOS and Android) app in which I record audio clips and then send them to the server for some machine learning operations. In my iOS app, I use AVAudioRecorder for recording the audio. In the Android app, I use MediaRecorder. On mobile I initially use the m4a format because of size constraints. After the audio reaches the server, I convert it to wav format before using it in the ML operations.
My problem is that on iOS, AVAudioRecorder by default applies some amplification to the raw audio data before we, the developers, get access to it, whereas on Android, MediaRecorder doesn't apply any default amplification to the raw data. In other words, on iOS I never get the raw audio stream from the microphone, whereas on Android I always get only the raw audio stream from the microphone. The distinction is clearly visible if you record the same audio on an iPhone and an Android phone side by side from a common audio source, then import the recordings into Audacity for a visual representation. I have attached a sample screenshot below.
In the image, the first track is the Android recording and the second track is the iOS recording. When I listen to both through headphones I can only vaguely distinguish them, but when I visualize the data points you can clearly see the difference in the image. These distinctions are bad for the ML operations.
Clearly, on the iPhone there is a certain amplification factor involved, which I would like to implement on Android as well.
Is anyone aware of the amplification factor? OR are there any other possible alternatives?
It's quite possible that the difference is the effect of Automatic Gain Control (AGC).
You can disable this in your app's AVAudioSession by setting its mode to AVAudioSessionModeMeasurement, which you do once in your application, usually at startup. This disables a great deal of input signal processing.
Reading your problem description, you might be better off enabling AGC on Android.
If neither of these yields results, you might want to gain-scale both signals so they are just below clipping.
let audioSession = AVAudioSession.sharedInstance()
try audioSession.setMode(AVAudioSessionModeMeasurement)
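On the Android side, a minimal sketch of the "enable AGC" suggestion could look like the following, assuming you record with AudioRecord (availability of the effect varies by device, so the checks matter):

import android.media.AudioRecord;
import android.media.audiofx.AutomaticGainControl;

// Attach the platform AGC effect to an existing AudioRecord, if the device supports it.
void enableAgc(AudioRecord recorder) {
    if (AutomaticGainControl.isAvailable()) {
        AutomaticGainControl agc = AutomaticGainControl.create(recorder.getAudioSessionId());
        if (agc != null) {
            agc.setEnabled(true);
        }
    }
}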
I want to write an app on Android to record snoring sounds of a sleeper and analyze it afterwards (i.e., not in real-time) for signs of a medical condition called obstructive sleep apnea.
The Android devices I've experimented with have voice recorders that produce a file format called .3ga. I want to programmatically read in the audio file and look at the amplitude for each individual time-sample. Then I can analyze that for patterns. Would this be easier if I converted this to a different format, e.g., MP3, and if so how can I do that programmatically?
I did a Google search on this and most of the hits seemed to be related to audio recording or playback which are unrelated to what I'm trying to do. I haven't coded anything yet because I don't know how to get started.
You are looking to do sample-based analysis on a raw audio signal, but the formats you mention are compressed. You will need to either deal with raw samples directly, or decompress the audio and then analyze.
Since you said you can do this work after-the-fact, why not upload to a server and analyze there?
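If you go the "raw samples" route on-device, a hedged sketch of recording PCM directly with AudioRecord (instead of a compressed .3ga file) could look like this; the 16 kHz mono settings and the one-second chunk size are illustrative, and the RECORD_AUDIO permission is assumed to be granted:

import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;

// Record raw 16-bit mono PCM at 16 kHz so each short is one time-sample you can analyze.
short[] recordOneSecond() {
    int sampleRate = 16000;
    int minBuf = AudioRecord.getMinBufferSize(sampleRate,
            AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
    AudioRecord recorder = new AudioRecord(MediaRecorder.AudioSource.MIC, sampleRate,
            AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, minBuf * 4);
    short[] samples = new short[sampleRate]; // one second of audio
    recorder.startRecording();
    recorder.read(samples, 0, samples.length); // blocking read of raw samples
    recorder.stop();
    recorder.release();
    return samples;
}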
I'm currently working on an app that lets the user choose an MP3 audio file. The file is then processed by my app.
For this processing, the application would need to decode audio files to get the raw PCM output.
To decode MP3, I have two options:
Use the Android system to decode MP3 and get the PCM data.
Decode the MP3 myself on the phone, WITHOUT paying MP3 licensing fees.
My question is whether #1 is technically possible, and for #2, whether the MP3 license on the phone covers an app as well.
To my knowledge, there is no Android-provided way to decode MP3s.
I've used JLayer in the past, and can recommend it for MP3 processing. Using the NDK with a C++ library might be faster, but if you're looking to keep it in Java, that's what I'd use. It's still faster than real time: roughly 30 seconds to decode all frames in an average-bitrate, 3-minute MP3. That's on a Galaxy S (1 GHz), so any newer phone is faster.
As far as licensing goes, I can't help you there. JLayer itself is LGPL, but the world of MP3 licensing is murkier than used motor oil. After a few days of searching for a concrete answer, I just gave up and did it. The world at large seems divided on who even holds the license in the first place.
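For reference, a minimal JLayer decoding loop looks roughly like this; it's a sketch using the javazoom.jl.decoder classes with error handling trimmed, and note that the decoder reuses its output buffer between frames:

import java.io.FileInputStream;
import javazoom.jl.decoder.Bitstream;
import javazoom.jl.decoder.Decoder;
import javazoom.jl.decoder.Header;
import javazoom.jl.decoder.SampleBuffer;

// Decode an MP3 file frame by frame into 16-bit PCM shorts using JLayer.
void decodeMp3(String path) throws Exception {
    Bitstream bitstream = new Bitstream(new FileInputStream(path));
    Decoder decoder = new Decoder();
    Header frame;
    while ((frame = bitstream.readFrame()) != null) {
        SampleBuffer output = (SampleBuffer) decoder.decodeFrame(frame, bitstream);
        short[] pcm = output.getBuffer(); // reused buffer; valid length is output.getBufferLength()
        // ... copy/process the PCM for this frame here ...
        bitstream.closeFrame();
    }
    bitstream.close();
}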
The Android system can decode MP3 files now; see here - it describes the media codec, container, and network protocol support provided by the Android platform.
MediaCodec is a very powerful framework for encoding and decoding media files.
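To illustrate, decoding an MP3 to PCM with MediaExtractor + MediaCodec looks roughly like the sketch below. This is not production code: it assumes API level 21+, assumes track 0 is the audio track, and trims error handling and format-change handling.

import android.media.MediaCodec;
import android.media.MediaExtractor;
import android.media.MediaFormat;
import java.nio.ByteBuffer;

// Decode the first track of a file into raw PCM using the platform decoder.
void decodeToPcm(String path) throws Exception {
    MediaExtractor extractor = new MediaExtractor();
    extractor.setDataSource(path);
    MediaFormat format = extractor.getTrackFormat(0);
    extractor.selectTrack(0);
    MediaCodec codec = MediaCodec.createDecoderByType(format.getString(MediaFormat.KEY_MIME));
    codec.configure(format, null, null, 0);
    codec.start();
    MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
    boolean inputDone = false;
    while (true) {
        if (!inputDone) {
            int inIndex = codec.dequeueInputBuffer(10000);
            if (inIndex >= 0) {
                int size = extractor.readSampleData(codec.getInputBuffer(inIndex), 0);
                if (size < 0) {
                    codec.queueInputBuffer(inIndex, 0, 0, 0, MediaCodec.BUFFER_FLAG_END_OF_STREAM);
                    inputDone = true;
                } else {
                    codec.queueInputBuffer(inIndex, 0, size, extractor.getSampleTime(), 0);
                    extractor.advance();
                }
            }
        }
        int outIndex = codec.dequeueOutputBuffer(info, 10000);
        if (outIndex >= 0) {
            ByteBuffer pcm = codec.getOutputBuffer(outIndex); // 16-bit PCM bytes
            // ... copy info.size bytes out of pcm for analysis here ...
            codec.releaseOutputBuffer(outIndex, false);
            if ((info.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) break;
        }
    }
    codec.stop();
    codec.release();
    extractor.release();
}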
Option 1 is definitely not possible (unless you want to target ICS+ devices and are willing to write native C code to decode MP3s with OpenSL). Geobits' recommendation of JLayer is a good one. For the most part, dealing with JLayer is a breeze. Here's a good blog post that will help: http://mindtherobot.com/blog/624/android-audio-play-an-mp3-file-on-an-audiotrack/
I'm looking for a way to programmatically save an array of shorts as PCM data. I know that this should be possible, but I haven't found a very easy way to do this on Android.
Essentially, I'm taking voltage data, and I want to save it in PCM format. My function looks something like this:
public void audifySignal(short[] signal) {
// Create a WAV file from the incoming signal
}
Any suggestions would be awesome, or even references. Seems like the audio APIs built in to android are more geared for directly recording from the mic, and not so much for lower level signal processing type work (at least for saving raw data to a file). I'd also like to avoid having to manually write the PCM file headers and what not...
Thanks!
Sam, I dunno about Android-specific libraries, but I'll go ahead and say this:
Raw PCM data is pretty straightforward. It's generally just sequential sample data. Maybe you need to understand the WAV format in order to understand what PCM is and how it works.
WAV is fairly widely used as a container for uncompressed audio. Gaining an understanding of how the WAV file contains the data will cast a fair bit of light on how raw digital audio works in general.
This page helped me a fair bit:
http://www.sonicspot.com/guide/wavefiles.html
Interestingly, you can more or less fire ANY data at a sound card and it'll play it. It'll probably sound crazy to us humans, as the sound card doesn't care whether it sounds garbled or not.
Whether it sounds pleasing to the ear will depend on whether you've provided the correct sample size, number of channels, frequency, and PCM data that conforms to all of the former.
You see, you can't "detect" the sample size, the number of channels, or the correct frequency from the raw PCM data itself. You have to store this crucial information ALONG with the PCM data so that other pieces of software can tell the sound card how to handle your PCM data.
That's where the WAV container format comes in.
There are other formats but WAV is pretty commonplace and it's therefore a good place to start.
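To make that concrete, here is a hedged sketch of writing a short[] into a minimal WAV container by hand, following the layout described on that page; the mono/16-bit assumption and the function name are just for illustration:

import java.io.FileOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Write 16-bit mono PCM shorts into a minimal canonical WAV file (44-byte header).
static void writeWav(String path, short[] signal, int sampleRate) throws Exception {
    int dataBytes = signal.length * 2;
    ByteBuffer buf = ByteBuffer.allocate(44 + dataBytes).order(ByteOrder.LITTLE_ENDIAN);
    buf.put("RIFF".getBytes()).putInt(36 + dataBytes).put("WAVE".getBytes());
    buf.put("fmt ".getBytes()).putInt(16);          // PCM fmt chunk is 16 bytes
    buf.putShort((short) 1).putShort((short) 1);    // PCM format, 1 channel
    buf.putInt(sampleRate).putInt(sampleRate * 2);  // sample rate, byte rate
    buf.putShort((short) 2).putShort((short) 16);   // block align, bits per sample
    buf.put("data".getBytes()).putInt(dataBytes);
    for (short s : signal) {
        buf.putShort(s);
    }
    try (FileOutputStream out = new FileOutputStream(path)) {
        out.write(buf.array());
    }
}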
Cheers
Tristen
You can use Android's AudioTrack to play raw PCM data, but it doesn't provide a function to generate a WAV file or the like.
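For completeness, a minimal AudioTrack sketch for playing such a short[] directly; the 44100 Hz mono settings are assumptions, and this uses the older stream-type constructor in static mode, where the whole buffer is written before play():

import android.media.AudioFormat;
import android.media.AudioManager;
import android.media.AudioTrack;

// Play a short[] of 16-bit mono PCM at 44.1 kHz using a static-mode AudioTrack.
void playPcm(short[] signal) {
    AudioTrack track = new AudioTrack(AudioManager.STREAM_MUSIC, 44100,
            AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT,
            signal.length * 2, AudioTrack.MODE_STATIC);
    track.write(signal, 0, signal.length); // load the whole buffer first in static mode
    track.play();
}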