How to build BufferReceived() to capture voice using RecognizerIntent? - android

I am working on an Android application using
RecognizerIntent.ACTION_RECOGNIZE_SPEECH. My problem is that I don't know how
to create the buffer that will capture the voice the user inputs. I have
read a lot on Stack Overflow, but I still don't understand how
to include the buffer and the recognition service callback in my code, or how to play back the contents that were saved into the buffer.
This is my code:
public class Voice extends Activity implements OnClickListener {
    byte[] sig = new byte[500000];
    int sigPos = 0;
    ListView lv;
    static final int check = 0;
    protected static final String TAG = null;
    protected void onCreate(Bundle savedInstanceState) {
        // TODO Auto-generated method stub
        Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        SpeechRecognizer recognizer = SpeechRecognizer
        RecognitionListener listener = new RecognitionListener() {
            public void onResults(Bundle results) {
                ArrayList<String> voiceResults = results
                if (voiceResults == null) {
                    Log.e(TAG, "No voice results");
                } else {
                    Log.d(TAG, "Printing matches: ");
                    for (String match : voiceResults) {
                        Log.d(TAG, match);
                    }
                }
            }
            public void onReadyForSpeech(Bundle params) {
                Log.d(TAG, "Ready for speech");
            }
            public void onError(int error) {
                "Error listening for speech: " + error);
            }
            public void onBeginningOfSpeech() {
                Log.d(TAG, "Speech starting");
            }
            public void onBufferReceived(byte[] buffer) {
                // TODO Auto-generated method stub
                TextView display=(TextView)findViewById (;
                System.arraycopy(buffer, 0, sig, sigPos, buffer.length);
                sigPos += buffer.length;
            }
            public void onEndOfSpeech() {
                // TODO Auto-generated method stub
            }
            public void onEvent(int eventType, Bundle params) {
                // TODO Auto-generated method stub
            }
            public void onPartialResults(Bundle partialResults) {
                // TODO Auto-generated method stub
            }
            public void onRmsChanged(float rmsdB) {
                // TODO Auto-generated method stub
            }
        };
    }
    public void onClick(View arg0) {
        // TODO Auto-generated method stub
    }
}

The Android speech recognition API (as of API level 17) does not offer a reliable way to capture audio.
You can use the "buffer received" callback but note that
RecognitionListener says about onBufferReceived:
More sound has been received. The purpose of this function is to allow
giving feedback to the user regarding the captured audio. There is no
guarantee that this method will be called.
buffer: a buffer containing a sequence of big-endian 16-bit
integers representing a single channel audio stream. The sample rate
is implementation dependent.
and RecognitionService.Callback says about bufferReceived:
The service should call this method when sound has been received. The
purpose of this function is to allow giving feedback to the user
regarding the captured audio.
buffer: a buffer containing a sequence of big-endian 16-bit
integers representing a single channel audio stream. The sample rate
is implementation dependent.
So this callback is for feedback regarding the captured audio and not necessarily the captured audio itself; it may be, for example, a reduced version of it for visualization purposes. Also, "there is no guarantee that this method will be called", i.e. Google Voice Search might provide it in v1 but then decide to remove it in v2.
Note also that this method can be called multiple times during recognition. It is not documented, however, whether the buffer represents the complete recorded audio or only the snippet since the last call. (I'd assume the latter, but you need to test it with your speech recognizer.)
So, in your implementation you should copy each buffer into a field that outlives the callback, so that the audio can be saved, e.g. into a WAV file, once the recognition has finished.
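A minimal sketch of that accumulate-then-save idea, written in plain Java so it can be tried off-device. The class name PcmWavSketch and its method names are hypothetical, not part of the Android API. The sample rate is a parameter because the documentation only says it is implementation dependent, so you would have to find out the real rate for your recognizer. The callback delivers big-endian samples while WAV stores little-endian PCM, so the bytes are swapped pairwise when the file is built. The sketch also assumes each callback delivers only new audio, which the answer above says you need to verify.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Collects onBufferReceived() chunks and renders them as a mono 16-bit WAV file. */
final class PcmWavSketch {
    private final ByteArrayOutputStream pcm = new ByteArrayOutputStream();

    /** Call from onBufferReceived(byte[]); assumes each call carries only new audio. */
    synchronized void append(byte[] buffer) {
        pcm.write(buffer, 0, buffer.length);
    }

    /** Builds the WAV bytes; sampleRate is an assumption the caller must supply. */
    synchronized byte[] toWav(int sampleRate) {
        byte[] bigEndian = pcm.toByteArray();
        int dataLen = bigEndian.length & ~1;       // drop a trailing odd byte, if any
        ByteBuffer out = ByteBuffer.allocate(44 + dataLen).order(ByteOrder.LITTLE_ENDIAN);
        out.put(ascii("RIFF")).putInt(36 + dataLen).put(ascii("WAVE"));
        out.put(ascii("fmt ")).putInt(16)
           .putShort((short) 1)                    // format tag: PCM
           .putShort((short) 1)                    // channels: mono
           .putInt(sampleRate)
           .putInt(sampleRate * 2)                 // byte rate = rate * channels * 2
           .putShort((short) 2)                    // block align
           .putShort((short) 16);                  // bits per sample
        out.put(ascii("data")).putInt(dataLen);
        for (int i = 0; i < dataLen; i += 2) {     // swap each sample to little-endian
            out.put(bigEndian[i + 1]).put(bigEndian[i]);
        }
        return out.array();
    }

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }
}
```

On a device, the returned bytes could be written to a file when onResults() or onError() fires and then played back with MediaPlayer. Whether the result sounds right depends on the sample rate guess and on the recognizer actually delivering the full audio.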


How do you get the audio byte[] from the synthesised speech created by the TextToSpeech engine?

I'm trying to get the audio byte[] that is created when the TextToSpeech engine synthesizes text.
I've tried creating a Visualizer and assigning an OnDataCaptureListener, but the byte[] it provides is always the same, so I don't believe the array is connected to the spoken text.
This is my implementation:
AudioManager audioManager = (AudioManager) this.getSystemService(Context.AUDIO_SERVICE);
audioManager.requestAudioFocus(focusChange -> Log.d(TAG, "focusChange is: is: " + focusChange), AudioManager.STREAM_MUSIC, AudioManager.AUDIOFOCUS_GAIN_TRANSIENT_MAY_DUCK);
int audioSessionId = audioManager.generateAudioSessionId();
mVisualizer = new Visualizer(audioSessionId);
mVisualizer.setDataCaptureListener(
    new Visualizer.OnDataCaptureListener() {
        public void onWaveFormDataCapture(Visualizer visualizer,
                byte[] bytes, int samplingRate) {
            //here the bytes are always equal to the bytes received in the last call
        }
        public void onFftDataCapture(Visualizer visualizer, byte[] bytes, int samplingRate) {
        }
    }, Visualizer.getMaxCaptureRate(), true, true);
I also found that you can use SynthesisCallback to receive the byte[] via its audioAvailable() method, but I can't seem to implement it properly.
I created a TextToSpeechService, but its onSynthesizeText() method is never called. However, I can tell that the service is working, because onLoadLanguage() is called.
My question in a nutshell: how do I get the byte[] representation of the audio created when the TextToSpeech engine synthesizes text?
Thanks in advance.
I heard that onAudioAvailable() was deprecated, and my callback is never called either.
So a workaround is:
In the Activity:
try {
    tts = null;
} catch (Exception e) {
}
tts = new TextToSpeech(this, this);
In the onInit() method:
public void onInit(int p1) {
    HashMap<String,String> mTTSMap = new HashMap<String,String>();
    tts.setOnUtteranceProgressListener(new UtteranceProgressListener() {
        public void onStart(final String p1) {
            // TODO: Implement this method
            Log.e(TAG, "START");
        }
        public void onDone(final String p1) {
            if (p1.compareTo("abcde") == 0) {
                synchronized (MainActivity.this) {
                    // wake up the thread that is waiting for the file
                    MainActivity.this.notify();
                }
            }
        }
        public void onError(final String p1) {
            //this is also deprecated...
        }
        public void onAudioAvailable(final String id, final byte[] bytes) {
            //never called!
            runOnUiThread(new Runnable() {
                public void run() {
                    // TODO: Implement this method
                    Toast.makeText(MainActivity.this, "id:" + id /*"bytes:" + Arrays.toString(bytes)*/, 1).show();
                    Log.v(TAG, "BYTES");
                }
            });
        }
    });
    Locale enEn = new Locale("en_EN");
    if (tts.isLanguageAvailable(enEn) == TextToSpeech.LANG_AVAILABLE) {
        tts.setLanguage(enEn);
    }
    /*public int synthesizeToFile(java.lang.CharSequence text, android.os.Bundle params, file, java.lang.String utteranceId);*/
    // public int synthesizeToFile(java.lang.String text, java.util.HashMap<java.lang.String, java.lang.String> params, java.lang.String filename);
    mTTSMap.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, "abcde");
    tts.synthesizeToFile("Hello", mTTSMap, "/storage/emulated/0/a.wav");
    synchronized (MainActivity.this) {
        try {
            // block until onDone() calls notify()
            MainActivity.this.wait();
        } catch (InterruptedException e) {}
    }
}
Your remaining job is then to load a.wav into the buffer you want, for example with a library like the one mentioned in this SO answer.
In summary:
1. Create the TTS engine.
2. Initialize it.
3. onInit() is called.
4. In onInit(), set up a new HashMap and put the utterance ID into it.
5. Register the listener with setOnUtteranceProgressListener().
6. Synthesize something to a file.
7. Call wait().
8. In the onDone() method, call notify().
9. After wait() returns, read the synthesized file into a buffer.
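The wait-then-read part of these steps can also be sketched with a CountDownLatch instead of wait()/notify(). This is a swapped-in technique rather than the answer's exact code: a latch does not miss the signal if the "done" callback fires before the waiting thread starts to wait. The class SynthesisWaiter and its method names are hypothetical, and the sketch is plain Java with no Android dependencies. On Android, markDone() would be called from onDone() and awaitBytes() from a background thread, since blocking inside onInit() may stall the UI thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Waits for a "synthesis finished" signal, then loads the output file into memory. */
final class SynthesisWaiter {
    private final CountDownLatch done = new CountDownLatch(1);

    /** Call once the utterance has been fully written (e.g. from onDone()). */
    void markDone() {
        done.countDown();
    }

    /**
     * Blocks until markDone() has been called or the timeout elapses,
     * then returns the file contents. Returns null on timeout.
     */
    byte[] awaitBytes(Path file, long timeoutMillis) throws IOException, InterruptedException {
        if (!done.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return null;
        }
        return Files.readAllBytes(file);
    }
}
```

The returned array is the whole file, so it still starts with the WAV header. A canonical PCM header is 44 bytes long, but that offset is not guaranteed for every engine, so parsing the header is safer than skipping a fixed number of bytes.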

new PeerConnectionFactory() gives an error on Android

I am trying to implement a WebRTC DataChannel on Android. I want to create a simple PeerConnection object that opens a DataChannel to send data over the network using WebRTC. I get an error when I try to create my PeerConnection object. I learnt that a factory is used to create it, via factory.createPeerConnection().
For this, I first have to create the PeerConnectionFactory object, which I can then use to create the PeerConnection object. When I try to create the PeerConnectionFactory object, I get the errors "Could not find method" and "Fatal Signal 11 (SIGSEGV) at 0x00000000 (code=1)". I also tried calling PeerConnectionFactory.initializeAndroidGlobals(this, false, false, false); first. This is what I am trying to do:
PeerConnectionFactory factory = new PeerConnectionFactory();
peer = new Peer();
This is what my Peer class looks like:
public class Peer implements SdpObserver, PeerConnection.Observer, DataChannel.Observer {
    private PeerConnection pc;
    private DataChannel dc;
    public Peer() {
        this.pc = factory.createPeerConnection(RTCConfig.getIceServer(),
                RTCConfig.getMediaConstraints(), this);
        dc = this.pc.createDataChannel("sendDataChannel", new DataChannel.Init());
    }
    public void onAddStream(MediaStream arg0) {
        // TODO Auto-generated method stub
    }
    public void onDataChannel(DataChannel dataChannel) {
        this.dc = dataChannel;
    }
    public void onIceCandidate(final IceCandidate candidate) {
        try {
            JSONObject payload = new JSONObject();
            payload.put("type", "candidate");
            payload.put("label", candidate.sdpMLineIndex);
            payload.put("id", candidate.sdpMid);
            payload.put("candidate", candidate.sdp);
        } catch (JSONException e) {
        }
    }
    public void onIceConnectionChange(IceConnectionState iceConnectionState) {
    }
    public void onIceGatheringChange(IceGatheringState arg0) {
        // TODO Auto-generated method stub
    }
    public void onRemoveStream(MediaStream arg0) {
        // TODO Auto-generated method stub
    }
    public void onRenegotiationNeeded() {
        // TODO Auto-generated method stub
    }
    public void onSignalingChange(SignalingState arg0) {
        // TODO Auto-generated method stub
    }
    public void onCreateFailure(String msg) {
        msg, Toast.LENGTH_SHORT)
    }
    public void onCreateSuccess(SessionDescription sdp) {
        try {
            JSONObject payload = new JSONObject();
            payload.put("type", sdp.type.canonicalForm());
            payload.put("sdp", sdp.description);
            pc.setLocalDescription(FilePeer.this, sdp);
        } catch (JSONException e) {
        }
    }
    public void onSetFailure(String arg0) {
        // TODO Auto-generated method stub
    }
    public void onSetSuccess() {
        // TODO Auto-generated method stub
    }
    public void onMessage(Buffer data) {
        Log.w("FILE", data.toString());
    }
    public void onStateChange() {
        "State Got Changed", Toast.LENGTH_SHORT)
        byte[] bytes = new byte[10];
        bytes[0] = 0;
        bytes[1] = 1;
        bytes[2] = 2;
        bytes[3] = 3;
        bytes[4] = 4;
        bytes[5] = 5;
        bytes[6] = 6;
        bytes[7] = 7;
        bytes[8] = 8;
        bytes[9] = 9;
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        Buffer b = new Buffer(buf, true);
    }
}
Can anybody point me to sample source code that implements a DataChannel on Android? Please also let me know if I am doing this the wrong way. I could not find documentation for native WebRTC on Android that explains how to do it, so I am trying to apply what I learnt from using WebRTC on the web.
Please let me know if my question is not clear.
PeerConnectionFactory no longer requires the audio and video engines to be enabled at initialization:
PeerConnectionFactory.initializeAndroidGlobals(this, false, false, false);
Now you can disable audio and video and use data channels only.
This is a known bug in the WebRTC code for Android, and it is currently still open. However, there is a workaround that works for now: when initializing the Android globals, pass the audio and video parameters as true:
PeerConnectionFactory.initializeAndroidGlobals(getApplicationContext(), true, true, VideoRendererGui.getEGLContext());
Use this instead: PeerConnectionFactory.initializeAndroidGlobals(acontext, true, false, false, null);
Then create the factory: factory = new PeerConnectionFactory();
Then, in your Peer class, create the peer connection like this: factory.createPeerConnection(iceServers, sdpMediaConstraints, this);
This worked for me to establish only a DataChannel, without an EGLContext for video streaming.
UPDATE: If you still get this error, move to a newer version of the library. This API is long deprecated.

Use Speech Recognizer to open apps

I want to create a simple app that runs in the background as a service.
Using SpeechRecognizer, it would listen for application names and, when it hears the name of an installed app, open it. If it finds no match or the result is unclear, it would suggest some options, either in a list or by voice.
I already know how to use SpeechRecognizer; what I need is to keep this service running in the background and prevent it from being killed. Can this be done?
In addition to all this, I'd add at least this point:
SpeechRecognizer is better for hands-free user interfaces, since your app actually gets to respond to error conditions like "No matches" and perhaps restart itself. When you use the Intent, the app beeps and shows a dialog that the user must press to continue.
My summary is as follows:
SpeechRecognizer:
- Show a different UI or no UI at all. Do you really want your app's UI to beep? Do you really want your UI to show a dialog when there is an error and wait for the user to click?
- The app can do something else while speech recognition is happening.
- Can recognize speech while running in the background or from a service.
- Can handle errors better.
- Can access low-level speech data like the raw audio or the RMS. You can analyze that audio, or use the loudness to drive some kind of flashing light that indicates the app is listening.
RecognizerIntent:
- Consistent and easy-to-use UI for users.
- Easy to program.
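As one illustration of the "use the loudness" point in the summary above, the level of a raw audio buffer can be computed directly. This is only a sketch: it assumes the layout that the RecognitionListener documentation describes for onBufferReceived() (big-endian, 16-bit, single channel), and the class and method names are hypothetical. In many cases the value passed to onRmsChanged() may already be enough for a listening indicator.

```java
/** Computes a loudness estimate from big-endian 16-bit mono PCM. */
final class Loudness {
    /** Root-mean-square amplitude, normalised to the range 0.0 .. 1.0. */
    static double rms(byte[] bigEndianPcm) {
        int samples = bigEndianPcm.length / 2;
        if (samples == 0) {
            return 0.0;
        }
        double sumSquares = 0.0;
        for (int i = 0; i < samples; i++) {
            // High byte first; the cast to short keeps the sign of the sample.
            short s = (short) ((bigEndianPcm[2 * i] << 8) | (bigEndianPcm[2 * i + 1] & 0xFF));
            double v = s / 32768.0;
            sumSquares += v * v;
        }
        return Math.sqrt(sumSquares / samples);
    }

    /** Level in dB relative to full scale; silence maps to negative infinity. */
    static double dbfs(byte[] bigEndianPcm) {
        return 20.0 * Math.log10(rms(bigEndianPcm));
    }
}
```

A flashing-light indicator could, for example, map the rms() value to the alpha of a view each time a buffer arrives.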
This is a workaround for Android 4.1.1.
public class MyService extends Service {
    protected AudioManager mAudioManager;
    protected SpeechRecognizer mSpeechRecognizer;
    protected Intent mSpeechRecognizerIntent;
    protected final Messenger mServerMessenger = new Messenger(new IncomingHandler(this));
    protected boolean mIsListening;
    protected volatile boolean mIsCountDownOn;
    private boolean mIsStreamSolo;
    static final int MSG_RECOGNIZER_START_LISTENING = 1;
    static final int MSG_RECOGNIZER_CANCEL = 2;
    public void onCreate() {
        mAudioManager = (AudioManager) getSystemService(Context.AUDIO_SERVICE);
        mSpeechRecognizer = SpeechRecognizer.createSpeechRecognizer(this);
        mSpeechRecognizer.setRecognitionListener(new SpeechRecognitionListener());
        mSpeechRecognizerIntent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
    }
    protected static class IncomingHandler extends Handler {
        private WeakReference<MyService> mtarget;
        IncomingHandler(MyService target) {
            mtarget = new WeakReference<MyService>(target);
        }
        public void handleMessage(Message msg) {
            final MyService target = mtarget.get();
            switch (msg.what) {
                case MSG_RECOGNIZER_START_LISTENING:
                    // turn off beep sound
                    if (!target.mIsStreamSolo) {
                        target.mAudioManager.setStreamSolo(AudioManager.STREAM_VOICE_CALL, true);
                        target.mIsStreamSolo = true;
                    }
                    if (!target.mIsListening) {
                        target.mSpeechRecognizer.startListening(target.mSpeechRecognizerIntent);
                        target.mIsListening = true;
                        //Log.d(TAG, "message start listening"); //$NON-NLS-1$
                    }
                    break;
                case MSG_RECOGNIZER_CANCEL:
                    if (target.mIsStreamSolo) {
                        target.mAudioManager.setStreamSolo(AudioManager.STREAM_VOICE_CALL, false);
                        target.mIsStreamSolo = false;
                    }
                    target.mSpeechRecognizer.cancel();
                    target.mIsListening = false;
                    //Log.d(TAG, "message canceled recognizer"); //$NON-NLS-1$
                    break;
            }
        }
    }
    // Count down timer for Jelly Bean work around
    protected CountDownTimer mNoSpeechCountDown = new CountDownTimer(5000, 5000) {
        public void onTick(long millisUntilFinished) {
            // TODO Auto-generated method stub
        }
        public void onFinish() {
            mIsCountDownOn = false;
            Message message = Message.obtain(null, MSG_RECOGNIZER_CANCEL);
            try {
                mServerMessenger.send(message);
                message = Message.obtain(null, MSG_RECOGNIZER_START_LISTENING);
                mServerMessenger.send(message);
            } catch (RemoteException e) {
            }
        }
    };
    public void onDestroy() {
        if (mIsCountDownOn) {
            mNoSpeechCountDown.cancel();
        }
        if (mSpeechRecognizer != null) {
            mSpeechRecognizer.destroy();
        }
    }
    protected class SpeechRecognitionListener implements RecognitionListener {
        public void onBeginningOfSpeech() {
            // speech input will be processed, so there is no need for count down anymore
            if (mIsCountDownOn) {
                mIsCountDownOn = false;
                mNoSpeechCountDown.cancel();
            }
            //Log.d(TAG, "onBeginingOfSpeech"); //$NON-NLS-1$
        }
        public void onBufferReceived(byte[] buffer) {
        }
        public void onEndOfSpeech() {
            //Log.d(TAG, "onEndOfSpeech"); //$NON-NLS-1$
        }
        public void onError(int error) {
            if (mIsCountDownOn) {
                mIsCountDownOn = false;
                mNoSpeechCountDown.cancel();
            }
            mIsListening = false;
            Message message = Message.obtain(null, MSG_RECOGNIZER_START_LISTENING);
            try {
                mServerMessenger.send(message);
            } catch (RemoteException e) {
            }
            //Log.d(TAG, "error = " + error); //$NON-NLS-1$
        }
        public void onEvent(int eventType, Bundle params) {
        }
        public void onPartialResults(Bundle partialResults) {
        }
        public void onReadyForSpeech(Bundle params) {
            mIsCountDownOn = true;
            mNoSpeechCountDown.start();
            Log.d(TAG, "onReadyForSpeech"); //$NON-NLS-1$
        }
        public void onResults(Bundle results) {
            //Log.d(TAG, "onResults"); //$NON-NLS-1$
        }
        public void onRmsChanged(float rmsdB) {
        }
    }
}
As commented, I don't think you need a BroadcastReceiver for what you are trying to do. Instead, you should define a service that listens for speech continuously. You can find an implementation here:
As for Android killing services: you cannot prevent a service from being killed by the system; even system services can be killed.
However, you can use the Service's startForeground() method:
By default services are background, meaning that if the system needs
to kill them to reclaim more memory (such as to display a large page
in a web browser), they can be killed without too much harm. You can
set this flag if killing your service would be disruptive to the user,
such as if your service is performing background music playback, so
the user would notice if their music stopped playing.
You can see the implementation here.

Get result metadata using HAL3 in Android

This question is about HAL3 on Android.
When I use the capture method of CameraDevice, does CameraDevice.CaptureListener actually work?
I was able to get the image data, but I could not receive the result metadata.
This is my ResultMetaDataListener:
class ResultMetaDataListener extends CameraDevice.CaptureListener {
    public void onCaptureStarted(CameraDevice camera, CaptureRequest request, long timestamp) {
        // TODO Auto-generated method stub
        super.onCaptureStarted(camera, request, timestamp);
    }
    public void onCaptureCompleted(CameraDevice camera, CaptureRequest request, CaptureResult result) {
        // TODO Auto-generated method stub
        super.onCaptureCompleted(camera, request, result);
        Log.i(TAG, "Capture result is available");
        Integer reqCtrlMode;
        Integer resCtrlMode;
        if (request == null || result == null) {
            Log.e(TAG, "request/result is invalid");
        }
        Log.i(TAG, "Capture complete");
    }
}
And this is the capture call; mCamera is a CameraDevice object:
ResultMetaDataListener resultListener = new ResultMetaDataListener();
mCamera.capture(, resultListener, mOpsHandler);
Please help me if you know about this.

Implement barge-in for Android TTS

I am having difficulty resolving this issue. I am not sure whether I am setting up my threads incorrectly, or whether it is even possible to do this properly.
This is an Android app that reads certain strings out as TTS (using the native Android TTS) at certain timings. During this TTS reading, the user should be able to barge-in with instructions such as "Stop" or "Pause." This recognition is done by using the iSpeech API.
Our current solution is to have the TTS running as a Thread that will output the proper strings. Once the user presses a button to begin the voice recognition (using an Intent), the app does voice recognition and handles it perfectly, but then TTS never again outputs anything. Logcat shows the following error:
11-28 02:18:57.072: W/TextToSpeech(16383): speak failed: not bound to TTS engine
I have thought about making the voice recognition a thread of its own that pauses the TTS, but the problem would then be that the timer controlling the TTS would become unsynced with what it should be.
Any advice or help would be appreciated.
The relevant code for the thread and the intent is below:
public void onCreate(Bundle savedInstanceState) {
    //Prevent device from sleeping mid build.
    mPlayer = MediaPlayer.create(BuildOrderActivity.this,;
    tts = new TextToSpeech(BuildOrderActivity.this, new TextToSpeech.OnInitListener() {
        public void onInit(int status) {
            if(status != TextToSpeech.ERROR)
                tts.setOnUtteranceCompletedListener(new OnUtteranceCompletedListener() {
                    public void onUtteranceCompleted(String utteranceId) {
                    }
                });
        }
    });
    buttonStart = (Button) findViewById(;
    buttonStart.setOnClickListener(new View.OnClickListener() {
        public void onClick(View v) {
            startBuild = new StartBuildRunnable();
            Thread t = new Thread(startBuild);
        }
    });
    ...//code continues oncreate setup for the view
}
public class StartBuildRunnable implements Runnable {
    public void run() {
        double delay;
        buildActions = parseBuildXMLAction();
        buildTimes = parseBuildXMLTime();
        say("Build has started");
        delayForNextAction((getSeconds(buildTimes.get(0)) * 1000));
        for (int i = 1; i < buildActions.size(); i++) {
            delay = calcDelayUntilNextAction(buildTimes.get(i - 1), buildTimes.get(i));
            delayForNextAction((long) (delay * 1000));
        }
        say("Build has completed");
    }
}
/**
 * Fire an intent to start the speech recognition activity.
 * @throws InvalidApiKeyException
 */
private void startRecognition() {
    try {
        recognizer.startRecord(new SpeechRecognizerEvent() {
            public void onRecordingComplete() {
                updateInfoMessage("Recording completed.");
            }
            public void onRecognitionComplete(SpeechResult result) {
                Log.v(TAG, "Recognition complete");
                //TODO: Once something is recognized, tie it to an action and continue recognizing.
                // currently recognizes something in the grammar and then stops listening until
                // the next button press.
                if (result != null) {
                    Log.d(TAG, "Text Result:" + result.getText());
                    Log.d(TAG, "Text Conf:" + result.getConfidence());
                    updateInfoMessage("Result: " + result.getText() + "\n\nconfidence: " + result.getConfidence());
                } else
                    Log.d(TAG, "Result is null...");
            }
            public void onRecordingCancelled() {
                updateInfoMessage("Recording cancelled.");
            }
            public void onError(Exception exception) {
                updateInfoMessage("ERROR: " + exception.getMessage());
            }
        });
    } catch (BusyException e) {
    } catch (NoNetworkException e) {
    }
}

