Use the ffmpeg API to convert audio files: crash on avcodec_encode_audio2 (Android)

From the examples I got the basic idea of this code. However, I am not sure what I am missing, as muxing.c, demuxing.c and decoding_encoding.c all use different approaches.
The process of converting an audio file to another file should go roughly like this:
inputfile -demux-> audiostream -read-> inPackets -decode2frames-> frames -encode2packets-> outPackets -write-> audiostream -mux-> outputfile
However I found the following comment in demuxing.c:
/* Write the raw audio data samples of the first plane. This works
* fine for packed formats (e.g. AV_SAMPLE_FMT_S16). However,
* most audio decoders output planar audio, which uses a separate
* plane of audio samples for each channel (e.g. AV_SAMPLE_FMT_S16P).
* In other words, this code will write only the first audio channel
* in these cases.
* You should use libswresample or libavfilter to convert the frame
* to packed data. */
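As an illustration of the layout difference that comment describes, here is a minimal sketch in plain C++. It assumes 16-bit samples and equally sized planes; `interleave_planar` is a hypothetical helper name, not an FFmpeg function. In real FFmpeg code this conversion would normally be left to libswresample, as the comment recommends.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Planar layout: one buffer per channel, e.g. LLLL... and RRRR...
// Packed (interleaved) layout: one buffer, e.g. LRLRLRLR...
// This hypothetical helper converts planar 16-bit audio to packed audio.
std::vector<int16_t> interleave_planar(
        const std::vector<std::vector<int16_t>>& planes) {
    if (planes.empty()) return {};
    const std::size_t channels = planes.size();
    const std::size_t samples_per_channel = planes[0].size();
    // Assumption of this sketch: every plane holds the same number of samples.
    for (const auto& plane : planes) {
        assert(plane.size() == samples_per_channel);
    }
    std::vector<int16_t> packed(channels * samples_per_channel);
    for (std::size_t i = 0; i < samples_per_channel; ++i) {
        for (std::size_t c = 0; c < channels; ++c) {
            // Sample i of channel c lands at position i * channels + c.
            packed[i * channels + c] = planes[c][i];
        }
    }
    return packed;
}
```

Writing only `planes[0]` to a file, as the demuxing.c example does with the first plane, would keep just the first channel, which is the situation the comment warns about.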
My questions about this are:
Can I expect a frame retrieved by calling one of the decoder functions, e.g. avcodec_decode_audio4, to hold suitable values so that it can be passed directly to an encoder, or is the resampling step mentioned in the comment mandatory?
Am I taking the right approach? ffmpeg is very asymmetric, i.e. if there is a function open_file_for_input there might not be a function open_file_for_output. There are also different versions of many functions (avcodec_decode_audio[1-4]) and different naming schemes, so it's very hard to tell whether the general approach is right or actually an ugly mixture of techniques that were used at different version bumps of ffmpeg.
ffmpeg uses a lot of specific terms, like 'planar sampling' or 'packed format', and I am having a hard time finding definitions for these terms. Is it possible to write working code without deep knowledge of audio?
Here is my code so far, which currently crashes at avcodec_encode_audio2, and I don't know why.
int Java_com_fscz_ffmpeg_Audio_convert(JNIEnv * env, jobject this, jstring jformat, jstring jcodec, jstring jsource, jstring jdest) {
jboolean isCopy;
jclass configClass = (*env)->FindClass(env, "com.fscz.ffmpeg.Config");
jfieldID fid = (*env)->GetStaticFieldID(env, configClass, "ffmpeg_logging", "I");
logging = (*env)->GetStaticIntField(env, configClass, fid);
/// open input
const char* sourceFile = (*env)->GetStringUTFChars(env, jsource, &isCopy);
AVFormatContext* pInputCtx;
AVStream* pInputStream;
open_input(sourceFile, &pInputCtx, &pInputStream);
// open output
const char* destFile = (*env)->GetStringUTFChars(env, jdest, &isCopy);
const char* cformat = (*env)->GetStringUTFChars(env, jformat, &isCopy);
const char* ccodec = (*env)->GetStringUTFChars(env, jcodec, &isCopy);
AVFormatContext* pOutputCtx;
AVOutputFormat* pOutputFmt;
AVStream* pOutputStream;
open_output(cformat, ccodec, destFile, &pOutputCtx, &pOutputFmt, &pOutputStream);
/// decode/encode
error = avformat_write_header(pOutputCtx, NULL);
DIE_IF_LESS_ZERO(error, "error writing output stream header to file: %s, error: %s", destFile, e2s(error));
AVFrame* frame = avcodec_alloc_frame();
DIE_IF_UNDEFINED(frame, "Could not allocate audio frame");
frame->pts = 0;
LOGI("allocate packet");
AVPacket pktIn;
AVPacket pktOut;
LOGI("done");
int got_frame, got_packet, len, frame_count = 0;
int64_t processed_time = 0, duration = pInputStream->duration;
while (av_read_frame(pInputCtx, &pktIn) >= 0) {
do {
len = avcodec_decode_audio4(pInputStream->codec, frame, &got_frame, &pktIn);
DIE_IF_LESS_ZERO(len, "Error decoding frame: %s", e2s(len));
if (len < 0) break;
len = FFMIN(len, pktIn.size);
size_t unpadded_linesize = frame->nb_samples * av_get_bytes_per_sample(frame->format);
LOGI("audio_frame n:%d nb_samples:%d pts:%s\n", frame_count++, frame->nb_samples, av_ts2timestr(frame->pts, &(pInputStream->codec->time_base)));
if (got_frame) {
do {
av_init_packet(&pktOut);
pktOut.data = NULL;
pktOut.size = 0;
LOGI("encode frame");
DIE_IF_UNDEFINED(pOutputStream->codec, "no output codec");
DIE_IF_UNDEFINED(frame->nb_samples, "no nb samples");
DIE_IF_UNDEFINED(pOutputStream->codec->internal, "no internal");
LOGI("tests done");
len = avcodec_encode_audio2(pOutputStream->codec, &pktOut, frame, &got_packet);
LOGI("encode done");
DIE_IF_LESS_ZERO(len, "Error (re)encoding frame: %s", e2s(len));
} while (!got_packet);
// write packet;
LOGI("write packet");
/* Write the compressed frame to the media file. */
error = av_interleaved_write_frame(pOutputCtx, &pktOut);
DIE_IF_LESS_ZERO(error, "Error while writing audio frame: %s", e2s(error));
av_free_packet(&pktOut);
}
pktIn.data += len;
pktIn.size -= len;
} while (pktIn.size > 0);
av_free_packet(&pktIn);
}
LOGI("write trailer");
av_write_trailer(pOutputCtx);
LOGI("end");
/// close resources
avcodec_free_frame(&frame);
avcodec_close(pInputStream->codec);
av_free(pInputStream->codec);
avcodec_close(pOutputStream->codec);
av_free(pOutputStream->codec);
avformat_close_input(&pInputCtx);
avformat_free_context(pOutputCtx);
return 0;
}

Meanwhile I have figured this out and written an Android Library Project that does this (for audio files): https://github.com/fscz/FFmpeg-Android
See the file /jni/audiodecoder.c for details.

Related

C++, Android NDK: How to save my raw audio data to file properly and load it again

I'm working on an Android app that plays back audio. To minimize latency I'm using C++ via JNI to play the audio with the C++ library Oboe.
Currently, before playback, the app has to decode the given file (e.g. an mp3) and then plays back the decoded raw audio stream. This leads to a waiting time before playback starts if the file is large.
So I would like to do the decoding beforehand, save the result, and when playback is requested just play the decoded data from the saved file.
I have next to no knowledge of how to do proper file i/o in C++ and have a hard time wrapping my head around it. It is possible that my problem can be solved just with the right library, I'm not sure.
So currently I am saving my file like this:
bool Converter::doConversion(const std::string& fullPath, const std::string& name) {
// here I'm setting up the extractor and necessary inputs. Omitted since not relevant
// this is where the decoder is called to decode a file to raw audio
constexpr int kMaxCompressionRatio{12};
const long maximumDataSizeInBytes = kMaxCompressionRatio * (size) * sizeof(int16_t);
auto decodedData = new uint8_t[maximumDataSizeInBytes];
int64_t bytesDecoded = NDKExtractor::decode(*extractor, decodedData);
auto numSamples = bytesDecoded / sizeof(int16_t);
auto outputBuffer = std::make_unique<float[]>(numSamples);
// This block is necessary to get the correct format for oboe.
// The NDK decoder can only decode to int16, we need to convert to floats
oboe::convertPcm16ToFloat(
reinterpret_cast<int16_t *>(decodedData),
outputBuffer.get(),
bytesDecoded / sizeof(int16_t));
// This is how I currently save my outputBuffer to a file. This produces a file on the disc.
std::string outputSuffix = ".pcm";
std::string outputName = std::string(mFolder) + name + outputSuffix;
std::ofstream outfile(outputName.c_str(), std::ios::out | std::ios::binary);
outfile.write(reinterpret_cast<const char *>(&outputBuffer), sizeof outputBuffer);
return true;
}
So I believe I take my float array, convert it to a char array and save it. I am not certain this is correct, but that is my best understanding of it.
There is a file afterwards, anyway.
Edit: As I found out when analyzing my saved file I only store 8 bytes.
Now how do I load this file again and restore the contents of my outputBuffer?
Currently I have this bit, which is clearly incomplete:
StorageDataSource *StorageDataSource::openPCM(const char *fileName, AudioProperties targetProperties) {
long bufferSize;
char * buffer;
std::ifstream stream(fileName, std::ios::in | std::ios::binary);
stream.seekg (0, std::ios::beg);
bufferSize = stream.tellg();
buffer = new char [bufferSize];
stream.read(buffer, bufferSize);
stream.close();
If this is correct, what do I have to do to restore the data as the original type? If I am doing it wrong, how does it work the right way?
I figured out how to do it thanks to @Michael's comments.
This is how I save my data now:
bool Converter::doConversion(const std::string& fullPath, const std::string& name) {
// here I'm setting up the extractor and necessary inputs. Omitted since not relevant
// this is where the decoder is called to decode a file to raw audio
constexpr int kMaxCompressionRatio{12};
const long maximumDataSizeInBytes = kMaxCompressionRatio * (size) * sizeof(int16_t);
auto decodedData = new uint8_t[maximumDataSizeInBytes];
int64_t bytesDecoded = NDKExtractor::decode(*extractor, decodedData);
auto numSamples = bytesDecoded / sizeof(int16_t);
// converting to float has moved to the reading function, so now i save decodedData directly.
std::string outputSuffix = ".pcm";
std::string outputName = std::string(mFolder) + name + outputSuffix;
std::ofstream outfile(outputName.c_str(), std::ios::out | std::ios::binary);
outfile.write((char*)decodedData, numSamples * sizeof (int16_t));
return true;
}
And this is how I read the stored file again:
long bufferSize;
char * inputBuffer;
std::ifstream stream;
stream.open(fileName, std::ifstream::in | std::ifstream::binary);
if (!stream.is_open()) {
// handle error
}
stream.seekg (0, std::ios::end); // seek to the end
bufferSize = stream.tellg(); // get size info, will be 0 without seeking to the end
stream.seekg (0, std::ios::beg); // seek to beginning
inputBuffer = new char [bufferSize];
stream.read(inputBuffer, bufferSize); // the actual reading into the buffer. would be null without seeking back to the beginning
stream.close();
// done reading the file.
auto numSamples = bufferSize / sizeof(int16_t); // calculate my number of samples, so the audio is correctly interpreted
auto outputBuffer = std::make_unique<float[]>(numSamples);
// the decoding bit now happens after the file is open. This avoids confusion
// The NDK decoder can only decode to int16, we need to convert to floats
oboe::convertPcm16ToFloat(
reinterpret_cast<int16_t *>(inputBuffer),
outputBuffer.get(),
bufferSize / sizeof(int16_t));
// here I continue working with my outputBuffer
The important bits of C++ understanding I was missing were:
a) the size of a pointer is not the same as the size of the data it points to, and
b) how seeking in a stream works: I needed to move the read position back to the start before I would find any data in my buffer.
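Both points can be shown in a self-contained sketch that does not depend on Oboe or the NDK extractor. The function names `save_pcm`, `load_pcm` and `pcm16_to_float` are hypothetical, and the 1/32768 scaling is an assumption about what a PCM16-to-float conversion typically does rather than a statement about Oboe's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Write raw 16-bit samples to a binary file. Returns false on I/O failure.
bool save_pcm(const std::string& path, const std::vector<int16_t>& samples) {
    std::ofstream out(path, std::ios::out | std::ios::binary | std::ios::trunc);
    if (!out.is_open()) return false;
    // Point a): the byte count is element count times element size,
    // not sizeof a pointer or of a smart pointer object.
    out.write(reinterpret_cast<const char*>(samples.data()),
              static_cast<std::streamsize>(samples.size() * sizeof(int16_t)));
    return out.good();
}

// Read a whole file back as 16-bit samples. Returns an empty vector on failure.
std::vector<int16_t> load_pcm(const std::string& path) {
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in.is_open()) return {};
    // Point b): seek to the end to learn the size, then back to the start to read.
    in.seekg(0, std::ios::end);
    const std::streamoff byte_count = in.tellg();
    if (byte_count <= 0) return {};
    in.seekg(0, std::ios::beg);
    std::vector<int16_t> samples(static_cast<std::size_t>(byte_count) / sizeof(int16_t));
    const std::size_t bytes_to_read = samples.size() * sizeof(int16_t);
    in.read(reinterpret_cast<char*>(samples.data()),
            static_cast<std::streamsize>(bytes_to_read));
    if (!in) return {};
    assert(static_cast<std::size_t>(in.gcount()) == bytes_to_read);
    return samples;
}

// Convert 16-bit samples to floats in [-1, 1), assuming 1/32768 scaling.
std::vector<float> pcm16_to_float(const std::vector<int16_t>& samples) {
    std::vector<float> out(samples.size());
    for (std::size_t i = 0; i < samples.size(); ++i) {
        out[i] = static_cast<float>(samples[i]) / 32768.0f;
    }
    return out;
}
```

Using std::vector here instead of `new[]` also means the buffers are released automatically, which the snippets above leave to the caller.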

How to play decoded in-memory PCM with Oboe properly?

I use Oboe to play sounds in my NDK library, and I use OpenSL with Android extensions to decode WAV files into PCM. The decoded signed 16-bit PCM samples are stored in memory (std::forward_list<int16_t>) and then sent into the Oboe stream via a callback. The sound I hear from my phone matches the original WAV file in volume level, but its quality does not: it bursts and crackles.
I am guessing that I send the PCM into the audio stream in the wrong order or format (sampling rate?). How can I use OpenSL decoding with an Oboe audio stream?
To decode files to PCM, I use AndroidSimpleBufferQueue as a sink, and AndroidFD with AAssetManager as a source:
// Loading asset
AAsset* asset = AAssetManager_open(manager, path, AASSET_MODE_UNKNOWN);
off_t start, length;
int fd = AAsset_openFileDescriptor(asset, &start, &length);
AAsset_close(asset);
// Creating audio source
SLDataLocator_AndroidFD loc_fd = { SL_DATALOCATOR_ANDROIDFD, fd, start, length };
SLDataFormat_MIME format_mime = { SL_DATAFORMAT_MIME, NULL, SL_CONTAINERTYPE_UNSPECIFIED };
SLDataSource audio_source = { &loc_fd, &format_mime };
// Creating audio sink
SLDataLocator_AndroidSimpleBufferQueue loc_bq = { SL_DATALOCATOR_ANDROIDSIMPLEBUFFERQUEUE, 1 };
SLDataFormat_PCM pcm = {
.formatType = SL_DATAFORMAT_PCM,
.numChannels = 2,
.samplesPerSec = SL_SAMPLINGRATE_44_1,
.bitsPerSample = SL_PCMSAMPLEFORMAT_FIXED_16,
.containerSize = SL_PCMSAMPLEFORMAT_FIXED_16,
.channelMask = SL_SPEAKER_FRONT_LEFT | SL_SPEAKER_FRONT_RIGHT,
.endianness = SL_BYTEORDER_LITTLEENDIAN
};
SLDataSink sink = { &loc_bq, &pcm };
Then I register a callback, enqueue buffers and move PCM from the buffer to storage until it's done.
NOTE: the WAV audio file is also 2-channel signed 16-bit 44.1 kHz PCM.
My oboe stream configuration is the same:
AudioStreamBuilder builder;
builder.setChannelCount(2);
builder.setSampleRate(44100);
builder.setCallback(this);
builder.setFormat(AudioFormat::I16);
builder.setPerformanceMode(PerformanceMode::LowLatency);
builder.setSharingMode(SharingMode::Exclusive);
Audio rendering works like this:
// Oboe stream callback
audio_engine::onAudioReady(AudioStream* self, void* audio_data, int32_t num_frames) {
auto stream = static_cast<int16_t*>(audio_data);
sound->render(stream, num_frames);
}
// Sound::render method
sound::render(int16_t* audio_data, int32_t num_frames) {
auto iter = pcm_data.begin();
std::advance(iter, cur_frame);
const int32_t rem_size = std::min(num_frames, size - cur_frame);
for(int32_t i = 0; i < rem_size; ++i, std::next(iter), ++cur_frame) {
audio_data[i] += *iter;
}
}
It looks like your render() method is confusing samples and frames.
A frame is a set of simultaneous samples.
In a stereo stream, each frame has TWO samples.
I think your iterator works on a sample basis. In other words next(iter) will advance to the next sample, not the next frame. Try this (untested) code.
void sound::render(int16_t* audio_data, int32_t num_frames) {
auto iter = pcm_data.begin();
const int samples_per_frame = 2; // stereo
std::advance(iter, cur_sample);
const int32_t num_samples = std::min(num_frames * samples_per_frame,
total_samples - cur_sample);
// ++iter moves the iterator; std::next(iter) only returns a copy and leaves iter unchanged
for(int32_t i = 0; i < num_samples; ++i, ++iter, ++cur_sample) {
audio_data[i] += *iter;
}
}
In short: essentially, I was experiencing an underrun, because of usage of std::forward_list to store PCM. In such a case (using iterators to retrieve PCM), one has to use a container whose iterator implements LegacyRandomAccessIterator (e.g. std::vector).
I was sure that the linear complexity of methods std::advance and std::next doesn't make any difference there in my sound::render method. However, when I was trying to use raw pointers and pointer arithmetic (thus, constant complexity) with debugging methods that were suggested in the comments (Extracting PCM from WAV with Audacity, then loading this asset with AAssetManager directly into memory), I realized, that amount of "corruption" of output sound was directly proportional to the position argument in std::advance(iter, position) in render method.
So, if the amount of sound corruption was directly proportional to the complexity of std::advance (and also std::next), then I had to make the complexity constant by using std::vector as a container. Using the answer from @philburk, I got this as a working result:
class sound {
private:
const int samples_per_frame = 2; // stereo
std::vector<int16_t> pcm_data;
...
public:
void render(int16_t* audio_data, int32_t num_frames) {
auto iter = std::next(pcm_data.begin(), cur_sample);
const int32_t s = std::min(num_frames * samples_per_frame,
total_samples - cur_sample);
for(int32_t i = 0; i < s; ++i, std::advance(iter, 1), ++cur_sample) {
audio_data[i] += *iter;
}
}
};
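For reference, the same idea can be written as a self-contained sketch that compiles without Oboe. The class name `Sound`, its constructor and the returned frame count are assumptions made for illustration and are not part of the original code. It keeps the frame/sample distinction explicit and positions the read pointer in constant time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the sound class discussed above.
class Sound {
public:
    Sound(std::vector<int16_t> pcm, int channels)
        : pcm_(std::move(pcm)), channels_(static_cast<std::size_t>(channels)) {
        assert(channels > 0);
        // Interleaved data must contain whole frames only.
        assert(pcm_.size() % channels_ == 0);
    }

    // Mixes up to num_frames frames into audio_data (interleaved samples).
    // Returns the number of frames actually mixed. No clipping is applied.
    int32_t render(int16_t* audio_data, int32_t num_frames) {
        // One frame holds one sample per channel.
        const std::size_t wanted = static_cast<std::size_t>(num_frames) * channels_;
        const std::size_t remaining = pcm_.size() - cur_sample_;
        const std::size_t n = std::min(wanted, remaining);
        // Contiguous storage: reaching the current position is O(1),
        // unlike std::advance on a std::forward_list iterator.
        const int16_t* src = pcm_.data() + cur_sample_;
        for (std::size_t i = 0; i < n; ++i) {
            audio_data[i] += src[i];
        }
        cur_sample_ += n;
        return static_cast<int32_t>(n / channels_);
    }

private:
    std::vector<int16_t> pcm_;
    std::size_t channels_;
    std::size_t cur_sample_ = 0;
};
```

Because the position is tracked in samples while the callback counts frames, a stereo request for 2 frames consumes 4 samples, which is the mismatch the first answer points out.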

Compress Videos using FFMPEG and JNI

I want to create an Android application that can locate a video file (larger than 300 MB) and compress it to a smaller MP4 file.
I already tried to do it with this
This tutorial is very effective as long as you are compressing a small video (below 100 MB).
So I tried to implement it using JNI.
I managed to build ffmpeg using this
But what I currently want to do is compress videos. I don't have very good knowledge of JNI, but I tried to understand it using the following link
If someone can guide me through the steps to compress a video after opening the file using JNI, that would be really great. Thanks.
Assuming you've got the String path of the input file, we can accomplish your task fairly easily. I'll assume you have an understanding of the NDK basics: How to connect a native .c file to native methods in a corresponding .java file (Let me know if that's part of your question). Instead I'll focus on how to use FFmpeg within the context of Android / JNI.
High-Level Overview:
#include <jni.h>
#include <android/log.h>
#include <string.h>
#include "libavcodec/avcodec.h"
#include "libavformat/avformat.h"
#define LOG_TAG "FFmpegWrapper"
#define LOGI(...) __android_log_print(ANDROID_LOG_INFO, LOG_TAG, __VA_ARGS__)
#define LOGE(...) __android_log_print(ANDROID_LOG_ERROR, LOG_TAG, __VA_ARGS__)
void Java_com_example_yourapp_yourJavaClass_compressFile(JNIEnv *env, jobject obj, jstring jInputPath, jstring jInputFormat, jstring jOutputPath, jstring jOutputFormat){
// One-time FFmpeg initialization
av_register_all();
avformat_network_init();
avcodec_register_all();
const char* inputPath = (*env)->GetStringUTFChars(env, jInputPath, NULL);
const char* outputPath = (*env)->GetStringUTFChars(env, jOutputPath, NULL);
// format names are hints. See available options on your host machine via $ ffmpeg -formats
const char* inputFormat = (*env)->GetStringUTFChars(env, jInputFormat, NULL);
const char* outputFormat = (*env)->GetStringUTFChars(env, jOutputFormat, NULL);
AVFormatContext *outputFormatContext = avFormatContextForOutputPath(outputPath, outputFormat);
AVFormatContext *inputFormatContext = avFormatContextForInputPath(inputPath, inputFormat /* not necessary since file can be inspected */);
copyAVFormatContext(&outputFormatContext, &inputFormatContext);
// Modify outputFormatContext->codec parameters per your liking
// See http://ffmpeg.org/doxygen/trunk/structAVCodecContext.html
int result = openFileForWriting(outputFormatContext, outputPath);
if(result < 0){
LOGE("openFileForWriting error: %d", result);
}
writeFileHeader(outputFormatContext);
// Copy input to output frame by frame
AVPacket *inputPacket;
inputPacket = av_malloc(sizeof(AVPacket));
int continueRecording = 1;
int avReadResult = 0;
int writeFrameResult = 0;
int frameCount = 0;
while(continueRecording == 1){
avReadResult = av_read_frame(inputFormatContext, inputPacket);
if(avReadResult != 0){
if (avReadResult != AVERROR_EOF) {
LOGE("av_read_frame error: %s", stringForAVErrorNumber(avReadResult));
}else{
LOGI("End of input file");
}
// No packet was read, so there is nothing to write
break;
}
frameCount++;
writeFrameResult = av_interleaved_write_frame(outputFormatContext, inputPacket);
if(writeFrameResult < 0){
LOGE("av_interleaved_write_frame error: %s", stringForAVErrorNumber(writeFrameResult));
}
}
// Finalize the output file
int writeTrailerResult = writeFileTrailer(outputFormatContext);
if(writeTrailerResult < 0){
LOGE("av_write_trailer error: %s", stringForAVErrorNumber(writeTrailerResult));
}
LOGI("Wrote trailer");
}
For the full content of all the auxiliary functions (the ones in camelCase), see my full project on Github. Got questions? I'm happy to elaborate.

Not able to read audio streams with ffmpeg

I am trying to solve a big problem but am stuck on a very small issue. I am trying to read the audio streams inside a video file with the help of ffmpeg, but the loop that should traverse all the streams in the file only runs a couple of times. I cannot figure out what the issue is, as others have used it very similarly.
Following is my code; please check:
JNIEXPORT jint JNICALL Java_ru_dzakhov_ffmpeg_test_MainActivity_logFileInfo
(JNIEnv * env,
jobject this,
jstring filename
)
{
AVFormatContext *pFormatCtx;
int i,j,k, videoStream, audioStream;
AVCodecContext *pCodecCtx;
AVCodec *pCodec;
AVFrame *pFrame;
AVPacket packet;
int frameFinished;
float aspect_ratio;
AVCodecContext *aCodecCtx;
AVCodec *aCodec;
//uint8_t inbuf[AUDIO_INBUF_SIZE + FF_INPUT_BUFFER_PADDING_SIZE];
j=0;
av_register_all();
char *str = (*env)->GetStringUTFChars(env, filename, 0);
LOGI(str);
// Open video file
if(av_open_input_file(&pFormatCtx, str, NULL, 0, NULL)!=0)
;
// Retrieve stream information
if(av_find_stream_info(pFormatCtx)<0)
;
LOGI("Separating");
// Find the first video stream
videoStream=-1;
audioStream=-1;
for(i=0; i<&pFormatCtx->nb_streams; i++) {
if(pFormatCtx->streams[i]->codec->codec_type==AVMEDIA_TYPE_AUDIO)
{
LOGI("Audio Stream");
audioStream=i;
}
}
av_write_header(pFormatCtx);
if(videoStream==-1)
LOGI("Video stream is -1");
if(audioStream==-1)
LOGI("Audio stream is -1");
return i;}
You may be having an issue related to library loading and unloading and how that relates to repeated calls through JNI. I am not sure from your description of the symptom, but if you have no solution, try reading:
here
and here

Too small ffmpeg rtsp decoding buffer

I'm decoding RTSP on Android with ffmpeg, and I quickly see pixelization when the image updates quickly or has a high resolution.
After googling, I found that it might be correlated to the UDP buffer size. I have then recompiled the ffmpeg library with the following parameters inside ffmpeg/libavformat/udp.c
#define UDP_TX_BUF_SIZE 327680
#define UDP_MAX_PKT_SIZE 655360
It seems to improve but it still starts to fail at some point. Any idea which buffer I should increase and how?
For my problem (http://libav-users.943685.n4.nabble.com/UDP-Stream-Read-Pixelation-Macroblock-Corruption-td4655270.html), I was trying to capture from a multicast UDP stream that had been set up by someone else. Because I couldn't change the source, I ended up switching from libav to libvlc as a wrapper, and it worked perfectly. Here is a summary of what worked for me:
stream.h:
#include <vlc/vlc.h>
#include <vlc/libvlc.h>
typedef unsigned char uchar; // uchar is not a standard type, so it is defined here
struct ctx {
uchar* frame;
};
stream.cpp:
void* lock(void* data, void** p_pixels){
struct ctx* ctx = (struct ctx*)data;
*p_pixels = ctx->frame;
return NULL;
}
void unlock(void* data, void* id, void* const* p_pixels){
struct ctx* ctx = (struct ctx*)data;
uchar* pixels = (uchar*)*p_pixels;
assert(id == NULL);
}
main.cpp:
struct ctx* context = (struct ctx*)malloc(sizeof(*context));
const char* const vlc_args[] = {"-vvv",
"-q",
"--no-audio"};
libvlc_media_t* media = NULL;
libvlc_media_player_t* media_player = NULL;
libvlc_instance_t* instance = libvlc_new(sizeof(vlc_args) / sizeof(vlc_args[0]), vlc_args);
media = libvlc_media_new_location(instance, "udp://@123.123.123.123:1000");
media_player = libvlc_media_player_new(instance);
libvlc_media_player_set_media(media_player, media);
libvlc_media_release(media);
context->frame = new uchar[VIDEOHEIGHT * VIDEOWIDTH * 3]; // 3 bytes per pixel for RV24; must match libvlc_video_set_format below
libvlc_video_set_callbacks(media_player, lock, unlock, NULL, context);
libvlc_video_set_format(media_player, "RV24", VIDEOWIDTH, VIDEOHEIGHT, VIDEOWIDTH * 3);
libvlc_media_player_play(media_player);
