Android encoder and file format for voice messages

I am developing an Android application that needs to send short (<60 second) voice messages to a server.
File size is very important because we don't want to eat up data plans. Sound quality is important to the point the message needs to be recognizable, but it should require significantly less bandwidth/quality than music files.
Which of the standard Android audio encoders (http://developer.android.com/reference/android/media/MediaRecorder.AudioEncoder.html) and file formats (http://developer.android.com/reference/android/media/MediaRecorder.OutputFormat.html) are likely to be best for this application?
Any hints on good starting places for bit rates, etc. would be welcome as well.
We need to ultimately be able to play them on Windows and iOS, but it's okay if that takes some back-end conversion. There doesn't seem to be an efficient cross-platform format/encoder so that's where we'll put in the work.

AMR is aimed precisely at speech compression, and is the codec most commonly used for normal circuit-switched voice calls. The narrow-band variant (AMR-NB, 8 kHz sample rate) is still the most widely used and should be supported on pretty much any mobile phone you can find. The wide-band variant (AMR-WB, 16 kHz sample rate) offers better quality and is preferred if the target device supports it and you can spare the bandwidth.
Typical bitrates for AMR range from around 6 to 14 kbit/s.
I'm not sure if there are any media players for Windows that handle .3GP files with AMR audio directly (VLC might). There are converters that can be used, though.
HE-AAC (v1) could also be used for speech encoding, however this page suggests that encoding support on Android is limited to Android 4.1 and above. Suitable rates might be 16 kHz / 64 kbps.
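To put those rates in terms of data-plan cost, here is a rough sketch in plain Java that estimates the audio payload of a 60-second message at the bitrates mentioned above (6 and 14 kbit/s for AMR, 64 kbit/s for HE-AAC). The class and method names are made up for illustration, and the figures ignore container overhead, so real files will be slightly larger.

```java
// Sketch: estimate the raw audio payload of a voice message.
// VoiceMessageSize and payloadBytes are hypothetical names, not Android APIs.
final class VoiceMessageSize {
    private VoiceMessageSize() {}

    /** Payload size in bytes for a constant bitrate, ignoring container overhead. */
    static long payloadBytes(int bitsPerSecond, int seconds) {
        return (long) bitsPerSecond * seconds / 8;
    }

    public static void main(String[] args) {
        int seconds = 60;                          // longest message in the question
        int[] bitRates = {6_000, 14_000, 64_000};  // bit/s, taken from the answer above
        for (int rate : bitRates) {
            System.out.printf("%d bit/s for %d s: about %d kB%n",
                    rate, seconds, payloadBytes(rate, seconds) / 1000);
        }
    }
}
```

At these rates a full 60-second AMR message stays around or below 100 kB, while the 64 kbit/s setting comes to roughly 480 kB.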

Related

Real time audio analysis Android

I've got a rather complicated problem that I need to solve at work. It's pretty far out of my remit of "Android App Developer" - I would class it as a very specialized audio engineering problem.
I am tasked with developing an application, which needs to be able to stream either a local audio file or audio from streaming service apps such as, but not limited to, Spotify, to another device over Bluetooth.
In addition, the app needs to be able to estimate the BPM of the streamed audio (it is assumed all audio will be musical) and use this BPM value to control the playback speed of a lighting sequence.
This question is about how to estimate the BPM of the streamed music.
For the case where the audio file is local, I can think of some solutions for this, such as hardcoding the BPM into the app, in a map keyed by the audio resource's URL.
I have also investigated and experimented with a "static" library (aubio) that can estimate BPM from an audio file, but not on the fly. It assumes .wav format. This won't be sufficient for what we are trying to achieve here.
However, given the requirement for streaming external audio from streaming service apps such as Spotify, a static analysis solution is pointless as the solution wouldn't work for the streaming service case, and the streaming service case solution will work for both cases.
Therefore, I have come to the conclusion that I somehow need to analyze the streamed audio on the fly, perhaps with FFT or peak detection algorithms.
This question isn't about the actual BPM estimation algorithm itself (or the implementation details of how I would get there) and is about the basic starting point of such a solution:
How might I go about getting A) the raw bytes of streamed audio for both the local file case and the external streaming service app case and B) how might I process these bytes into a data structure representing the audio stream in a way amenable to running audio analysis algorithms on it.
I realize this is a very open-ended, quite vague question, but this is so far out of my comfort zone that I have no idea how to even formulate a more coherent question.
Any help would be greatly appreciated!

I'd start by creating some separate, more tightly defined questions for the different pieces. For example, ask how to get access to the raw bytes when streaming a local file, or when streaming URL-sourced audio. Android has some nice support for streaming, including the ability to stream PCM, so I'd be pretty surprised if getting a hook for access to the byte stream were not possible.
Once you have a hooking point, to convert the bytes to "something useful" I'd look at using the audio format to tell you how to read the incoming bytes. The format should tell you how many channels (mono or stereo), the encoding (e.g., signed PCM is common, might be normalized floats), the number of bits per value (16 is common) and the order of the bytes (big-endian vs little endian).
I know that there are posts that explain how to convert the raw audio bytes to PCM values based on this info, including some on Stack Overflow. They should be reachable via search. I think signed normalized floats are the most common data representation used for processing audio signals.
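As a minimal sketch of that conversion, assuming the format turns out to be 16-bit signed PCM (other encodings need different handling), the following plain Java turns raw bytes into normalized floats and optionally mixes interleaved stereo down to mono. The class and method names are illustrative only; the byte order is passed in because it depends on the source format.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch: decode 16-bit signed PCM bytes into floats in the range [-1, 1).
// PcmDecoder is a hypothetical helper, not part of the Android SDK.
final class PcmDecoder {
    private PcmDecoder() {}

    /** Converts 16-bit signed PCM bytes (interleaved if multi-channel) to floats. */
    static float[] toFloats(byte[] pcm, ByteOrder order) {
        if (pcm.length % 2 != 0) {
            throw new IllegalArgumentException("16-bit PCM needs an even number of bytes");
        }
        ByteBuffer buffer = ByteBuffer.wrap(pcm).order(order);
        float[] samples = new float[pcm.length / 2];
        for (int i = 0; i < samples.length; i++) {
            // Dividing by 32768 maps the short range [-32768, 32767] onto [-1, 1).
            samples[i] = buffer.getShort() / 32768f;
        }
        return samples;
    }

    /** Averages interleaved left/right samples into a single mono channel. */
    static float[] stereoToMono(float[] interleaved) {
        float[] mono = new float[interleaved.length / 2];
        for (int i = 0; i < mono.length; i++) {
            mono[i] = 0.5f * (interleaved[2 * i] + interleaved[2 * i + 1]);
        }
        return mono;
    }
}
```

The resulting float array is the kind of structure that FFT or peak-detection code typically expects as input.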

Opentok SDK making Android and iOS devices too hot

I am using the OpenTok SDK for video calling on iOS and Android devices, with a Node.js server.
It is a group call scenario with a maximum of 4 people. When we stream for more than 10 minutes, both devices get too hot.
Does anyone have a solution for this?
We can't degrade the video quality.

This is likely because you are using the default video codec, VP8, which is not hardware accelerated. You can set the codec per publisher to either H.264 or VP8, but there are some trade-offs to this approach.
Their lack of H.264 SVC support is disappointing, but might be okay depending on your use case. If you read this whole post and still want more guidance, I'd recommend reaching out to their developer support team, and/or post more about your use case here.
Here's some more context from the OpenTok Documentation, but I recommend you read the whole page to understand where you need to make compromises:
The VP8 real-time video codec is a software codec. It can work well at lower bitrates and is a mature video codec in the context of WebRTC. As a software codec it can be instantiated as many times as is needed by the application within the limits of memory and CPU. The VP8 codec supports the OpenTok Scalable Video feature, which means it works well in large sessions with supported browsers and devices.
The H.264 real-time video codec is available in both hardware and software forms depending on the device. It is a relatively new codec in the context of WebRTC although it has a long history for streaming movies and video clips over the internet. Hardware codec support means that the core CPU of the device doesn’t have to work as hard to process the video, resulting in reduced CPU load. The number of hardware instances is device-dependent with iOS having the best support. Given that H.264 is a new codec for WebRTC and each device may have a different implementation, the quality can vary. As such, H.264 may not perform as well at lower bit-rates when compared to VP8. H.264 is not well suited to large sessions since it does not support the OpenTok Scalable Video feature.

What audio format is natively supported on all platforms, both for recording and playing back?

We're creating a range of apps that record user's voice for a wide range of applications. Users can register their ideas, or describe a scene, or give educational tips and notes to someone else.
We need to choose a file format that satisfies these conditions:
Better to be playable natively in Android, iOS and web
Better to reduce the cost of encoding-decoding
Better to reduce the cost of development (we're not sound experts)
Storage is not a big deal, so compression is not important, but network traffic IS a big deal, so for that reason better to be as compact as possible
The most obvious choice coming to mind is MP3, but to our surprise, MP3 encoding is not supported in Android Studio out of the box.
We searched and tried to find best practices for this, and again, to our surprise there is not much written in spite of huge usage of sounds and voices everywhere.
For example, in this post it's written that MP3 is the most used file format, followed by AAC. But we're completely unfamiliar with AAC.
So, what audio format is natively supported on all platforms, both for recording and playing back?

The file format can be .AAC. It is compressed, is compatible with iOS, the web and Android (3.1 or higher), and was developed by a group of companies that includes Nokia and Sony (this last part is just extra information).
See the AAC article on Wikipedia for the full list of compatible operating systems.
It is also compatible with webOS.

How to reduce MP3 streaming traffic consumption for mobile use?

I want to be able to send an audio stream to Android/iOS devices.
The current encoding for the stream is MP3 at 128 kbps. If I send this over the network, it will use a huge amount of mobile data.
I was thinking of compressing the data with gzip, but I think that would make no difference as MP3 is already compressed.
Is there any way to reduce the size of the stream and still play it on the mobile device?
Thanks,
Dan

First off, your math is ignoring a key unit. Your MP3 stream is 128 kilobits (note the bits) per second. This comes out to be a little under 60 megabytes per hour after you factor in a little bit of overhead and metadata.
Now, as Mark said, you can use a different bitrate and/or codec. For most mobile streams, I choose either a 64 kbit/s or 96 kbit/s stream, and then either MP3 or AAC depending on compatibility. AAC does compress a bit better, providing a better sounding stream at those low bitrates, but you will still need an MP3 stream for some devices.
Also note that you should not assume your users are using the mobile network on their mobile devices. Give your users a choice of which stream to use. Some have unlimited data and great coverage. Others use WiFi all the time.
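The arithmetic behind these figures can be checked with a short sketch in plain Java. It converts a stream bitrate in kilobits per second to megabytes per hour, before any protocol overhead or metadata; the class and method names are invented for this example.

```java
// Sketch: convert a stream bitrate to data usage per hour.
// StreamDataUsage is a hypothetical helper; overhead and metadata are not included.
final class StreamDataUsage {
    private StreamDataUsage() {}

    /** Megabytes (10^6 bytes) transferred per hour at the given bitrate in kbit/s. */
    static double megabytesPerHour(int kilobitsPerSecond) {
        double bytesPerSecond = kilobitsPerSecond * 1000.0 / 8.0; // 8 bits per byte
        return bytesPerSecond * 3600.0 / 1_000_000.0;             // 3600 seconds per hour
    }

    public static void main(String[] args) {
        for (int rate : new int[] {128, 96, 64}) {
            System.out.printf("%d kbit/s: %.1f MB per hour%n", rate, megabytesPerHour(rate));
        }
    }
}
```

At 128 kbit/s this gives 57.6 MB per hour, which matches the "a little under 60 megabytes" estimate once some overhead is added; dropping to 64 kbit/s halves it to 28.8 MB.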

All you can do is re-compress to a lower bit rate and use a different compression method, e.g. AAC. AAC should sound better at the same bit rate.

How to decode MP3 in Android within app?

I'm currently working on an app that lets the user choose an MP3 audio file. The file is then processed by my app.
For this processing, the application would need to decode audio files to get the raw PCM output.
To decode MP3, I have two options:
Use the Android system to decode MP3 and get the PCM data.
Decode the MP3 myself on the phone, WITHOUT paying MP3 licensing fees.
My question is whether #1 is technically possible? And for #2, whether the MP3 license on the phone covers an app as well?

To my knowledge, there is no Android-provided way to decode MP3s.
I've used JLayer in the past, and can recommend it for MP3 processing. Using the NDK with a C++ library might be faster, but if you're looking to keep it Java, that's what I'd use. It's still faster than real time: roughly 30 seconds to decode all frames in an average-bitrate 3-minute MP3. That's on a Galaxy S (1 GHz), so any newer phone will be faster.
As far as licensing goes, I can't help you there. JLayer itself is LGPL, but the world of MP3 licensing is murkier than used motor oil. After a few days of searching for a concrete answer, I just gave up and did it. The world at large seems divided on who even holds the license in the first place.

The Android system can decode MP3 files now. See here: it describes the media codec, container, and network protocol support provided by the Android platform.
MediaCodec is a very powerful framework for encoding and decoding media files.

Option 1 is definitely not possible (unless you want to target ICS+ devices and are willing to write native C code to decode MP3s with OpenSL). Geobits' recommendation of JLayer is a good one. For the most part, dealing with JLayer is a breeze. Here's a good blog post that will help: http://mindtherobot.com/blog/624/android-audio-play-an-mp3-file-on-an-audiotrack/
