Short version: What is the best way to get data encoded in an MP3 (and ideally AAC/Ogg/WMA) into a Java array or ByteBuffer that I can then manipulate?
I'm putting together a program that has slowing down and speeding up
sound files as one of its features. This works fine for WAV files,
which are a header plus the exact binary data that needs to be sent to
the speaker, and now I need to implement it for MP3 (ideally, this
would also support AAC, Ogg, and WMA, but since those are less popular
formats this is not required). Android does not expose an interface
to decode the MP3 without playing it, so I need to create that
interface.
Three options present themselves, though I'm open to others:
1) Write my own decoder. I already have a functional frame detector that I was hoping to use for option (3), and should now only need to implement the Huffman decoding tables.
2) Use JLayer, or an equivalent Java library, to handle the decoding (see the sketch after this list). I'm not entirely clear on what the license ramifications are here.
3) Connect to the libmedia library/MediaPlayerService. This is what SoundPool does, and the amount of use that service gets makes me believe that while it's officially unstable, that implementation isn't going anywhere. This means writing JNI code to connect to the service, but I'm finding that's a deep rabbit hole. At the surface, I'm having trouble with the sp<> template.
I did that with libmad and the NDK. JLayer is way too slow, and the media framework is a moving target. You can find info and source code at http://apistudios.com/hosted/marzec/badlogic/wordpress/?p=231
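If you go the libmad route, the Java side reduces to a thin JNI wrapper. The class below is a hypothetical sketch; the native method names and library name are assumptions, not taken from the linked article:

    public class Mp3Decoder {
        static { System.loadLibrary("mp3decoder"); } // hypothetical libmp3decoder.so built with the NDK

        // Returns an opaque handle to the native libmad decoder state.
        public native long open(String path);
        // Fills pcm with up to maxSamples decoded 16-bit samples; returns the count, 0 at end of stream.
        public native int readSamples(long handle, short[] pcm, int maxSamples);
        public native void close(long handle);
    }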
I have not tried it, but mp3transform is LGPL.
Related
We have a product consisting of documents and media files that are encrypted for DRM protection. On the production side, we have a Python script that encrypts the files, and on the client side, an Android app that decrypts them. This means we need to have an encryption/decryption scheme that can work compatibly on both Python and Android platforms. I've settled on libsodium/NaCl because it is available on both platforms, is free and open source, and it's designed with high-level APIs that are supposed to provide "Expert selection of default primitives" (http://nacl.cr.yp.to/features.html), thus helping the developer get things configured right without having to be an expert in the details of cryptography parameters.
Based on that, I've been able to test successfully that data encrypted by Sodium on Python can be decrypted by Sodium on Android. There's a fair bit of learning time invested in Sodium, so I'd prefer not to have to change that, if at all possible.
However, when it comes to playing large DRM-protected videos on the Android side, I believe we need a solution that works for streaming, not just for decrypting a whole file into memory. Currently we are just reading the whole file into memory and decrypting it:
final byte[] plainBytes = secretBox.decrypt(nonce, cipherText);
Obviously that's not going to work well with large video files. If we were using javax.crypto.Cipher instead of Sodium, we could use it to implement a CipherInputStream (and use that to implement an exoplayer2.upstream.DataSource or something). But I'm having difficulty seeing how to use libsodium to implement a decryption stream.
The libsodium library I'm using does provide bindings to "stream" functions. But this meaning of "stream" seems to be somewhat different from "streaming" in the sense of a Java InputStream. Moreover, all those functions seem to be very specific to low-level details that, up to this point, libsodium has not required me to be aware of: chacha20, salsa20, xsalsa20, xchacha20poly1305, and so on. So far I have had no idea which of these algorithms is being used on either side; SecretBox just works.
So I guess the question I would like answered most is, how can libsodium be used in Android to provide seekable, streaming decryption? Do you know of any good example code?
Subquestions of that:
Admittedly, now that I look closer at the docs, I see that PyNaCl's SecretBox uses the XSalsa20 stream cipher. I wonder if I can count on that always being the case, since I'm supposed to be insulated from those details?
I think for media playing, you need more than just streaming in the sense of being able to consume a small piece at a time, in sequence. For typical usage, you also need it to be seekable: the user wants to be able to skip back 5 seconds without having to wait for the player to reset to the beginning of the stream and process/decrypt the whole thing again up to 5 seconds ago (see the sketch after these subquestions).
Is it feasible that I could use javax.crypto.Cipher on the Android side, but configure it to be compatible with the encryption algorithm (XSalsa20) and its parameters from the PyNaCl SecretBox production process?
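For reference, one common way to get seekable decryption out of a box-style API is to encrypt the file as independent fixed-size chunks, each stored as nonce + ciphertext, so that seeking is just arithmetic. The sketch below assumes that layout on the Python side; the chunk sizes, names, and file format are hypothetical, and only the secretBox.decrypt(nonce, cipherText) call is taken from the code above:

    import java.io.IOException;
    import java.io.RandomAccessFile;

    public class ChunkedDecryptor {
        static final int CHUNK = 64 * 1024; // plaintext bytes per chunk (hypothetical choice)
        static final int NONCE = 24;        // crypto_secretbox nonce size
        static final int MAC = 16;          // crypto_secretbox MAC size
        static final int STORED = NONCE + CHUNK + MAC;

        // Decrypts the chunk containing plaintextOffset.
        // SecretBox is whichever libsodium binding the code above already uses.
        static byte[] chunkAt(RandomAccessFile f, SecretBox box, long plaintextOffset)
                throws IOException {
            long index = plaintextOffset / CHUNK;
            f.seek(index * STORED);
            byte[] nonce = new byte[NONCE];
            f.readFully(nonce);
            byte[] cipher = new byte[CHUNK + MAC]; // a real version must handle a short final chunk
            f.readFully(cipher);
            return box.decrypt(nonce, cipher);     // same call as the whole-file version
        }
    }

Wrapping this in an InputStream-like class with a seek method is then straightforward, and each 5-second skip only costs re-decrypting one chunk.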
Update:
To clarify,
The issue of decryption key delivery is already solved to our satisfaction, so that is not what I'm asking help on here.
Our app is completely offline, so the streaming issues I mentioned have to do with loading and decrypting files from local storage, rather than waiting for downloads.
For video you might find it easier to use existing mechanisms, as they will have already solved most of your issues.
For most video applications you will want to stream the video and play/seek as you go, rather than having to download the entire video, as you point out.
At this time there are three major DRM systems commonly used to encrypt and share keys between the server and the client: Widevine, PlayReady and FairPlay. All three will support the functionality you want for streamed videos. The disadvantage is that you will usually have to pay to use these DRM services.
You can also use HLS or DASH to stream the video; these are adaptive bitrate (ABR) streaming protocols (https://stackoverflow.com/a/42365034/334402).
These also allow you to use less secure, but possibly adequate for your needs, key-sharing mechanisms that essentially allow the key to be shared in the clear while the content itself is still encrypted. These are both free and well supported:
HLS AES Encryption
DASH ClearKey Encryption
Have a look at these answers for examples of generating both streams: https://stackoverflow.com/a/45103073/334402, https://stackoverflow.com/a/46897097/334402
You can play back the streams using open source players like dash.js in the browser and ExoPlayer for native Android.
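On the Android side, playing such a stream is short in ExoPlayer. A minimal sketch, assuming ExoPlayer 2.12 or newer and a hypothetical playlist URL; for HLS AES-128 the key URI travels in an #EXT-X-KEY tag inside the playlist itself, so no extra player code is needed:

    import android.content.Context;
    import com.google.android.exoplayer2.MediaItem;
    import com.google.android.exoplayer2.SimpleExoPlayer;

    public class PlayerSketch {
        static SimpleExoPlayer play(Context context) {
            SimpleExoPlayer player = new SimpleExoPlayer.Builder(context).build();
            // Hypothetical URL; ExoPlayer infers HLS from the .m3u8 extension.
            player.setMediaItem(MediaItem.fromUri("https://example.com/master.m3u8"));
            player.prepare();
            player.play();
            return player;
        }
    }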
If you wanted more security but still wanted to avoid using a commercial DRM, you could also modify the above to configure the key on your player client directly rather than transmitting it from server to client.
You then do have the risk that someone could hack or reverse engineer your client app to extract the key, but I think you will have that risk with your original approach anyway. The real value of DRM systems is not the content encryption, which is essentially just AES, but the mechanisms they use to securely transport and store the keys. Ultimately, it is a question of cost and benefit: it sounds like your solution may work quite adequately with a custom key configuration.
As an aside, on the seeking question: most video formats are broken into groups of pictures, which can be decoded separately from the rest of the video with the help of some header info. So you can decode at, or at least near, any given point without having to decode the entire video up to that point.
The thumbnails you see when you scroll or hover along the timeline in a player are generally a separate stream of still image snapshots taken at regular intervals in the video. This allows the player to show the appropriate thumbnail as if it were showing the frame at that point in the video. If the user seeks to that point, the player requests that section of the video (if it does not already have it), decodes the relevant chunk, and starts playing it.
I guess small audio clips are necessary for many applications, so I would expect Qt to support playing MP3 from memory slices. Decoding the MP3 data to WAV data in memory might be one solution, but that requires decoding all the data first, which is not a good idea for a real-time application. It also doesn't make sense to store mp3_data in a file and ask QMediaPlayer to play that; the performance is unacceptable.
This is my code after many searches on Google and Stack Overflow:
m_buffer.setBuffer(&mp3_data_in_memory);
m_player.setMedia(QMediaContent(), &m_buffer);
m_player.play();
where m_buffer is a QBuffer instance, and mp3_data_in_memory is a QByteArray one; m_player is a QMediaPlayer instance.
I have seen reports that this code doesn't work on macOS and iOS, but I am running on Android.
Does anyone have a solution for Android system? Thanks a lot.
Your code won't work because the media property requires a valid QMediaContent instance:
Setting this property to a null QMediaContent will cause the player to
discard all information relating to the current media source and to
cease all I/O operations related to that media.
There's also no way of telling QMediaPlayer what format the data is in; you're just dumping raw data on it. In principle QMediaResource can hold this information, but it requires a URL and is regarded as null without one.
As you may have guessed, QMediaPlayer and the related classes are high-level constructs not designed for this sort of thing. You need to use a QAudioDecoder to actually decode the raw data, and pipe the output to a QAudioOutput to hear it.
Hello sages of the Overflowing Stack, Android noob here..
I'm using CSipSimple and want to stream the call audio to another app, in chunks of 1 second of audio data, so that it can process the raw PCM data.
The code that handles the audio in CSipSimple is native, so I prefer native approaches rather than calling back into Java.
I thought of a few ways of doing so:
Use audio streaming and let the other app consume it.
Write the data to a file and let the other app read it.
Call a service in the other application (AIDL).
Use intents.
These are the considerations leading to my dilemma:
Streaming looks like the natural choice, but I couldn't find Android support for retrieving raw PCM data from an audio stream. The intent mechanism is flexible and convenient, but I don't think that's what intents are meant for. Using a file seems cumbersome, although it's well supported. Finally, using a service seems like a good option, but it seems less flexible and probably needs more error handling and thread management.
Can you guys point out the best alternative?
If you have another one you're welcome to share it..
I do not know about streaming audio API support, so I'll not touch that case.
As for writing the data to a file and letting the other application read it: this is one possible way to solve your problem.
As for calling a service through AIDL and using intents, I do not think these are good solutions. The problem is that Binder limits the size of the data that can be passed in a transaction (1 MB).
In my view, the best solution (especially if you're working in native code) is to use ashmem. This is a shared memory driver developed specifically for Android. In your service you create a shared memory region and pass a reference to it to your client app, which reads the information from that memory.
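On the Java side, ashmem is exposed through android.os.MemoryFile. A minimal sketch of the idea follows; passing the underlying file descriptor to the client over Binder is additional work not shown here:

    import android.os.MemoryFile;

    public class AshmemSketch {
        // One second of 44.1 kHz 16-bit stereo PCM is ~176 KB.
        static final int ONE_SECOND = 44100 * 2 * 2;

        static byte[] roundTrip(byte[] pcmChunk) throws java.io.IOException {
            MemoryFile region = new MemoryFile("pcm_chunk", ONE_SECOND);
            // Producer (the call-audio side) writes a decoded chunk...
            region.writeBytes(pcmChunk, 0, 0, pcmChunk.length);
            // ...and the consumer, holding the same region, reads it back.
            byte[] received = new byte[pcmChunk.length];
            region.readBytes(received, 0, 0, received.length);
            region.close();
            return received;
        }
    }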
I need to implement playback of separate audio files in N channels; files may play sequentially or in parallel. I need to implement it on Android.
Timeline:
|file a...|file d....|file b...|......|file k|....
|.....|file g|file c..|file p.|....
I'm considering two options, one being FMOD to decompress the files and play them simultaneously. I have done some research, and FMOD seems to fit well and be much easier than playing this manually using an AudioTrack. However, I can't tell whether FMOD would let us save the entire merged output without playing it through.
I know that using the solution here we can redirect output to a WAV file, but is it possible to just create the final output directly and save it using FMOD? Or will I have to manually merge the PCM streams into one after all?
Thanks.
An important question here is why you need to save the files out; if it's possible to do this offline, it would be a lot simpler. If you must record the concatenation of several files (including others played in parallel), it is quite possible with FMOD.
One way would be to use the wave-writer-nrt output mode (FMOD_OUTPUTTYPE_WAVWRITER_NRT), which allows you to output a WAV file based on FMOD playSound calls faster than realtime.
Another way is to use a custom DSP to access the data stream of any submix as it plays, useful if you want other sounds actually playing at the same time.
Another is to simply create the sound objects, then use Sound::lock to access the PCM data, which you could concatenate yourself to a destination buffer. Keep in mind all the sounds would need to have the same sample rate and channel count; otherwise you would need to do processing. Also keep in mind you cannot do this for parallel sounds unless you want to mix the audio yourself.
I'm looking for some way in Android to play in-memory audio in a manner analogous to the waveOutOpen family of methods in Windows programming.
The waveOut... methods essentially let an application create arrays of sample values (like in-memory WAV files without the headers) and dump them into a queue for sequential playback. Windows transitions seamlessly from one array to the next, so as long as the application keeps dumping arrays into the queue ahead of playback, the program can create and play continuous audio of any arbitrary length. The Windows API also incorporates a callback mechanism that the application can use to indicate progress and load additional buffers.
As far as I can tell, the Android audio API lets an application play a file from local storage or a URL, or from a memory stream. Is there any way to get Android to "queue up" MediaPlayer.start() calls so that one player transitions (without glitches) into the next upon play completion? It appears that Jet does something like this, but only with its own internal synthesis engine.
Is there any other way of accessing Android audio in a waveOutOpen way?
android.media.AudioTrack
... is the class you are probably looking for.
http://developer.android.com/reference/android/media/AudioTrack.html#AudioTrack%28int,%20int,%20int,%20int,%20int,%20int%29
After creating it, you simply feed it binary data in the matching format using the write method:
AudioTrack.write(...)
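A minimal streaming-mode sketch, assuming 44.1 kHz 16-bit mono PCM; write() blocks when the track's internal buffer is full, which is what lets you queue buffer after buffer without glitches, much like the waveOutOpen queue:

    import android.media.AudioFormat;
    import android.media.AudioManager;
    import android.media.AudioTrack;

    public class TrackSketch {
        static void playBuffers(Iterable<short[]> buffers) {
            int minBuf = AudioTrack.getMinBufferSize(44100,
                    AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT);
            AudioTrack track = new AudioTrack(AudioManager.STREAM_MUSIC, 44100,
                    AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT,
                    minBuf, AudioTrack.MODE_STREAM);
            track.play();
            for (short[] samples : buffers) {
                track.write(samples, 0, samples.length); // blocks, so playback stays seamless
            }
            track.stop();
            track.release();
        }
    }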