I am new to NLP and I am using a different pretrained model than the default Wav2Vec2 one.
I am now playing with create_wav2vec2.py provided by PyTorch:
https://github.com/pytorch/android-demo-app/blob/master/SpeechRecognition/create_wav2vec2.py
I load the pretrained model from Hugging Face, but during the sanity check the transcribed text is wrong.
The place I changed in the code, from
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
to
model1 = Wav2Vec2ForCTC.from_pretrained("patrickvonplaten/wav2vec2-base-timit-demo-colab")
The correct result would be:
Result: I HAD THAT CURIOSITY BESIDE ME AT THIS MOMENT
But I got:
Result: J <pad></s>DJ<pad>F</s>DJF<pad>JBJSN JKJCJ JFJO<pad>YLJCJ L<pad>HL<pad> F<pad>F</s> JC<pad>JHKJHLRFJ<pad>
Could somebody advise what is wrong here?
Your problem is the alphabet variable in https://github.com/pytorch/android-demo-app/blob/master/SpeechRecognition/create_wav2vec2.py. You should replace it with the vocabulary from https://huggingface.co/patrickvonplaten/wav2vec2-base-timit-demo-colab/blob/main/vocab.json, using only the keys of that dict as a list.
For the <pad> token, you have to specify when loading the tokenizer/processor that you want it treated as <pad>.
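For example (a minimal sketch; it assumes you have downloaded vocab.json from the model repo, and the sort-by-token-id ordering is the usual Wav2Vec2 convention rather than code taken from the demo script):

import json
from transformers import Wav2Vec2CTCTokenizer

# vocab.json maps token -> id; sort by id so that index i of the CTC output
# corresponds to alphabet[i]
with open("vocab.json") as f:
    vocab = json.load(f)
alphabet = [token for token, _ in sorted(vocab.items(), key=lambda kv: kv[1])]

# Tell the tokenizer explicitly which token is the padding/blank token
tokenizer = Wav2Vec2CTCTokenizer("vocab.json", pad_token="<pad>")

With that alphabet list substituted into create_wav2vec2.py, the greedy CTC decode should line up with the ids this checkpoint actually emits.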
Related
This is my code for a TensorFlow Lite model imported into Android Studio:
[screenshot of the model code]
And this is the output when I run the app:
[screenshot of the logged output]
I don't understand it. How can I get the model output?
Update:
The output is a float array of 6 elements, but what I want is the index of the largest element. I tried this code:
[screenshot of the argmax attempt]
Is it right? I'm getting the same output, 1, on every prediction.
Looks like your TFLite model generates a feature vector as a float array, which represents the characteristics of the image data.
See also https://brilliant.org/wiki/feature-vector/
A feature vector is usually intermediate data; it often needs to be fed into an additional model (or classification layer) to do image classification or other tasks.
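If, on the other hand, your model's output really is a vector of six class scores, picking the index of the largest element would look roughly like this (a sketch; the [1][6] output shape and the interpreter/input variable names are assumptions, since your actual code is only visible in the screenshot):

float[][] output = new float[1][6];          // assumed output shape: one row of 6 scores
interpreter.run(inputBuffer, output);        // 'interpreter' is your org.tensorflow.lite.Interpreter

int best = 0;                                // index of the largest score = predicted class
for (int i = 1; i < output[0].length; i++) {
    if (output[0][i] > output[0][best]) {
        best = i;
    }
}

If that index is always the same, it is worth double-checking the input preprocessing (scaling, channel order) rather than the argmax itself.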
I'm working with Google Places API.
I'm getting the right place in my log, but I just want the name of the type of place. Let's say this is my output:
Place 'Parque Las Tejas' has likelihood: 0.950000 Type: [69, 1013, 34]
So, at first I get the place where I am and the likelihood of where I am, and then I just used:
List<Integer> types = placeLikelihood.getPlace().getPlaceTypes();
thinking it would return something like "park" or "square", but instead of that I get that array of numbers, [69, 1013, 34].
According to what I read here, there are lots of types that define a certain place.
What I want is to get only that kind of type, so if I'm at a restaurant I don't want the name of the restaurant but just the type, so "Restaurant" would be my output.
I need this because I want to give the user options depending on what type of place they are at.
Any idea what I'm doing wrong?
The List<Integer> that you get actually contains the ids of the place types; according to the docs:
The elements of this list are drawn from Place.TYPE_*
The list is here. So basically your goal is to convert the int code to a string using this list. You can find your solution here: basically, you obtain all the fields of the Place class via reflection, find the fields whose names start with "TYPE", get their int values, and compare them to the values you get from getPlaceTypes().
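A rough sketch of that reflection lookup (assuming the old com.google.android.gms.location.places.Place class with its TYPE_* integer constants; adjust the import to whichever Places client you use):

import com.google.android.gms.location.places.Place;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Map a Place.TYPE_* integer constant back to a readable name like "PARK"
static String typeName(int typeId) throws IllegalAccessException {
    for (Field field : Place.class.getFields()) {
        if (Modifier.isStatic(field.getModifiers())
                && field.getType() == int.class
                && field.getName().startsWith("TYPE_")
                && field.getInt(null) == typeId) {
            return field.getName().substring("TYPE_".length());
        }
    }
    return "UNKNOWN";
}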
Even though this is an old post, it may help you if you are encountering this issue. I was encountering it too, and here is my approach:
List<Place.Type> types = placeLikelihood.getPlace().getTypes();
To get all the types you can use a for-each loop:
for (Place.Type type : types) {
    // handle each individual type, e.g. type.name()
}
I'm trying to use the model from this tutorial in an Android app. I wanted to modify DetectorActivity.java and TensorFlowMultiBoxDetector.java found here, but it seems like I'm missing some parameters, such as imageMean, imageStd, inputName, outputLocationsName and outputScoresName.
From what I understand, inputName is the name of the model's input and the two output names are for the locations and scores outputs, but what do imageMean and imageStd stand for?
I don't need to use the model with a camera, I just need to detect objects on bitmaps.
Your understanding of the input/output names is correct: they are TensorFlow node names that receive the input and hold the outputs at the end. imageMean and imageStd are used to scale the RGB values of the image to a mean of 0 and a standard deviation of 1. See the 8 lines starting from https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/android/src/org/tensorflow/demo/TensorFlowMultiBoxDetector.java#L208
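Roughly, that preprocessing looks like this per pixel (a sketch in the spirit of the demo, not the exact demo code):

// Convert packed ARGB pixels from a Bitmap into normalized float RGB values.
static float[] normalize(int[] argbPixels, float imageMean, float imageStd) {
    float[] floatValues = new float[argbPixels.length * 3];
    for (int i = 0; i < argbPixels.length; i++) {
        int pixel = argbPixels[i];
        floatValues[i * 3]     = (((pixel >> 16) & 0xFF) - imageMean) / imageStd; // R
        floatValues[i * 3 + 1] = (((pixel >> 8) & 0xFF) - imageMean) / imageStd;  // G
        floatValues[i * 3 + 2] = ((pixel & 0xFF) - imageMean) / imageStd;         // B
    }
    return floatValues;
}

So, for example, a model trained on inputs in roughly [-1, 1] would use imageMean = 128 and imageStd = 128; the right values depend on how your particular model was trained.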
The TensorFlow Android demo app you are referring to has been updated; it now supports MobileNets. Check it out on GitHub: commit 53aabd5cb0ffcc1fd33cbd00eb468dd8d8353df2.
I am new to Parse and I am working on a project that uses the Android Parse SDK, so I was wondering how I can make a where query with an OR condition using the Android SDK.
I want to make a query like this [pseudo code]:
SELECT * FROM employ WHERE employId IN ("1", "2", "3")
I found this in the Parse documentation; I don't know if it helps or not.
Edit:
This is what I found on Parse, but it's not working:
String[] id= {"1","2","3"};
query.whereContainedIn("employId ", Arrays.asList(id));
It returns an empty list, but if I query them one by one I get results... Can anyone tell me what's wrong?
You can use whereContainedIn to select specific rows. See this post for more ideas: https://www.parse.com/questions/different-arrays-and-wherecontainedin-for-android.
List<String> employId = new ArrayList<String>();
employId.add("1"); employId.add("2"); employId.add("3");
query.whereContainedIn("employId", employId);
If you are still not clear, check this: https://www.parse.com/docs/android_guide#queries
I have found the solution, and I must say it's pretty lame of Parse not to mention this anywhere in their documentation.
The problem was that the values I was passing to the whereContainedIn method were of type String, but in reality the column holds pointers to another table's rows.
I was trying to query using only their ids (as displayed on Parse), but instead I had to pass the whole objects in order to retrieve them. That was the reason why it was returning an empty list.
The thing is, even though Parse displays the IDs [pointers to objects in another table], we can't search using only the IDs; instead we have to use complete ParseObjects if we want to search a table based on a specific object.
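In code, what ended up working looks roughly like this (a sketch; the "Employ" and "Record" class names, the employId column, and the example ids are placeholders, not the real schema):

import com.parse.FindCallback;
import com.parse.ParseException;
import com.parse.ParseObject;
import com.parse.ParseQuery;
import java.util.ArrayList;
import java.util.List;

// Build pointer stubs for the rows we want to match, instead of passing raw id strings.
List<ParseObject> employees = new ArrayList<>();
for (String id : new String[] {"1", "2", "3"}) {
    // createWithoutData makes a pointer stub from just the class name and objectId
    employees.add(ParseObject.createWithoutData("Employ", id));
}

ParseQuery<ParseObject> query = ParseQuery.getQuery("Record");
query.whereContainedIn("employId", employees);   // compares against the Pointer column
query.findInBackground(new FindCallback<ParseObject>() {
    @Override
    public void done(List<ParseObject> results, ParseException e) {
        // handle results or error here
    }
});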
I am new to both Android and Stack Overflow. I have started developing an Android app and I am wondering two things:
1) Is it possible to parametrize a TextView? Let's say I want to render a text message which states something like: "The user age is 38". Let's suppose that the user age is the result of an algorithm. Using a typical i18n framework I would write in my i18n file something like "The user age is {0}" and then populate the parameters at run time. I haven't been able to figure out how to do this, or a similar approach, in Android.
2) Let's suppose I have a complex object with many fields, e.g. a PersonModel which has id, name, age, country, favorite video game, whatever. If I want to render all this information into a single layout in one of my activities, the only way I have found is getting all the needed TextViews by id and then populating them one by one in code.
I was wondering if there is some mapping/binding mechanism with which I could execute something like render(myPerson, myView) so that, through reflection, each of the model's properties gets mapped onto its TextView.
If someone has ever worked with Spring MVC, I'm looking for something similar to its mechanism for mapping domain objects/models to views (e.g. spring:forms).
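To make the idea concrete, I'm imagining a helper roughly like this (purely hypothetical, just to illustrate; matching field names to view ids is my own assumed convention):

import android.app.Activity;
import android.widget.TextView;
import java.lang.reflect.Field;

// Hypothetical render(model, activity): for each field of the model, look up a view id
// with the same name and set the corresponding TextView's text via reflection.
static void render(Object model, Activity activity) throws IllegalAccessException {
    for (Field field : model.getClass().getDeclaredFields()) {
        field.setAccessible(true);
        Object value = field.get(model);
        int viewId = activity.getResources().getIdentifier(
                field.getName(), "id", activity.getPackageName());
        if (viewId != 0 && value != null) {
            TextView view = (TextView) activity.findViewById(viewId);
            if (view != null) {
                view.setText(String.valueOf(value));
            }
        }
    }
}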
Thanks a lot in advance for your help. Hope this is useful for somebody else =)
bye!
In answer to #1: You want String.format(). It'll let you do something like:
int age = 38;
String ageMessage = "The user age is %d";
myTextView.setText(String.format(ageMessage, age));
The two you'll use the most are %d for numbers and %s for strings. It uses printf format; if you don't know it, there's a quick tutorial in the Formatter docs.
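For example, mixing both (with made-up values):
String name = "Alice";
int age = 38;
myTextView.setText(String.format("%s is %d years old", name, age));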
For #2 I think you're doing it the best way there is (grab the view references and fill them in manually). If you come across anything else, I'd love to see it.