Talkback list of common issues with reading

I'm trying to make my application accessible to people with disabilities.
Most of the text in the application is retrieved dynamically from a server, so I can't predict what content will be displayed.
Is there a list of the most common reading issues with TalkBack?
So far I have found: date formats, the # character, 'OR' vs 'or', 'IT' vs 'it', the $1,555.55 format, etc. I want full support in my app, but I can't find even a list of the most common issues.
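Two of the listed issues can be handled in code before the text reaches TalkBack. The following is a hypothetical helper, not an Android API: it rewrites server text for View.setContentDescription(). It assumes, as the question observes, that TalkBack spells out all-caps words letter by letter, so words like "OR" and "IT" are lower-cased. It also speaks "#" before a number as "number", which is an assumed wording. It cannot tell a genuine acronym from shouted text, so real acronyms would need a whitelist.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not an Android API): prepares server text for
// View.setContentDescription() so TalkBack reads it more naturally.
final class SpokenTextNormalizer {
    // Whole words made of two or more upper-case letters, e.g. "OR", "IT".
    private static final Pattern ALL_CAPS_WORD = Pattern.compile("\\b\\p{Lu}{2,}\\b");

    private SpokenTextNormalizer() {}

    static String normalize(String text) {
        // "#42" -> "number 42" (assumed wording; adjust per language).
        String result = text.replaceAll("#(?=\\d)", "number ");
        // Lower-case all-caps words so they are read as words, not spelled out.
        Matcher m = ALL_CAPS_WORD.matcher(result);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement(m.group().toLowerCase(Locale.ROOT)));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

For example, you would keep the visible text unchanged and set only the spoken form: textView.setText(serverText); textView.setContentDescription(SpokenTextNormalizer.normalize(serverText));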

How to limit the use of certain character sets

I hope this question isn't going to be down-flagged for not showing any actual code, but that's the core of the situation: I simply have no clue where to start, even after trying several combinations of keywords on both Google and here on SO.
My client suddenly decided that half of the Android app I'm developing for him has to be in Chinese. I have changed the database so that some fields can store Simplified Chinese characters, and now I need to make sure that my client (living in Holland) only uses those characters in one particular EditText field in the app. (Other database fields now also only allow Simplified Chinese, but their values come from a dropdown list in the app, so I don't need to worry about wrong characters there.)
So how would one make sure that only Simplified Chinese is used in an EditText field?

Here is a project in Ruby that attempts to detect whether characters are Traditional Chinese, Simplified Chinese, or Japanese (maybe others?): https://github.com/jpatokal/script_detector
This detection is based on the Unihan Database, which includes a file called Unihan_Variants.txt. (The file is distributed inside the Unihan zip archive published by the Unicode Consortium.)
Conceivably, you could parse the txt file into a lookup table and check each code point as text is entered, for example in onTextChanged() of a TextWatcher attached to your EditText. However, the readme of the project linked above states: "It is important to understand that this requires long sections of text to work reliably, since a single character or even several characters may be valid Japanese, traditional Chinese and simplified Chinese simultaneously." So weeding out characters individually might prove difficult.
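Here is one way to act on the lookup-table idea, sketched under several assumptions. The parser reads only the kSimplifiedVariant lines of Unihan_Variants.txt, which are tab-separated with values like U+4E07. It treats a character whose simplified variant differs from itself as traditional-only, and it accepts every other character that Java classifies as Han script. This is not how the linked Ruby project works. As its readme warns, many characters are valid in several scripts, so this only rejects characters that are clearly traditional. Punctuation, digits and Latin letters are rejected too, which you may want to relax.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;

// Sketch: rejects Han characters that Unihan_Variants.txt lists as having
// a simplified form different from themselves (i.e. traditional-only).
final class SimplifiedChineseChecker {
    private final Set<Integer> traditionalOnly = new HashSet<>();

    SimplifiedChineseChecker(Reader unihanVariants) {
        try (BufferedReader in = new BufferedReader(unihanVariants)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty() || line.startsWith("#")) continue;
                String[] fields = line.split("\t");
                if (fields.length < 3 || !fields[1].equals("kSimplifiedVariant")) continue;
                int source = parseCodePoint(fields[0]);
                // Some characters list themselves among their simplified forms.
                boolean alsoSimplified = false;
                for (String value : fields[2].trim().split("\\s+")) {
                    if (parseCodePoint(value) == source) alsoSimplified = true;
                }
                if (!alsoSimplified) traditionalOnly.add(source);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Parses tokens such as "U+4E07", ignoring any "<source" suffix.
    private static int parseCodePoint(String token) {
        int cut = token.indexOf('<');
        String hex = (cut >= 0 ? token.substring(0, cut) : token).substring(2);
        return Integer.parseInt(hex, 16);
    }

    // True if every code point is a Han ideograph not known to be traditional-only.
    boolean isAcceptable(CharSequence text) {
        return text.codePoints().allMatch(cp ->
                Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN
                        && !traditionalOnly.contains(cp));
    }
}
```

You could build the checker once from the bundled file. Then call isAcceptable() from the TextWatcher's afterTextChanged() and, when it returns false, flag the field with EditText.setError().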

Tesseract - OCR issues with typewriter style fonts

We are using Tesseract.NET (and the Android version too) to recognize and extract document data. It worked really well with Arial and Cambria fonts, but now we have to recognize documents printed in a typewriter-style font.
Tesseract recognizes absolutely nothing in them (except the large serial number in the upper-right corner).
We tried to train it, but (maybe it's our fault) the results are still unstable.
What can we do?
(By the way, the font is used by national offices; we cannot get it as a TrueType file or in any other font format.)

In the current form it is very hard for an OCR tool to recognize any letters.
Serif fonts are hard to OCR.
Letters are very close together. Some are joined.
A dictionary is not of any help.
You might be able to improve the result with the following:
As this looks like a vehicle registration certificate, you should be able to predict the positions of the text strings of interest and then OCR them separately.
Then use the -psm 7 or -psm 8 option (treat the image as a single text line or a single word; in Tesseract 4 and later the option is spelled --psm).
As some strings seem to be numbers only, you can help Tesseract by using the digits argument.
For the alphanumeric strings it might help to reduce the dictionary pruning (or remove the dawg files completely).
If strings like 'ETZ' or 'MZ' are abbreviations, you could also build a dictionary with them.
Reducing the yellow and green colors is also an (easy) option you could test.
Use the barcode instead of trying to OCR the string.
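Below is a minimal sketch of two of these suggestions: cropping a field at a predicted position and reducing the yellow and green colors. It uses only java.awt. That is an assumption: the question uses Tesseract.NET and Android, where you would use the platform's own bitmap API instead. The field rectangle and the chroma threshold are hypothetical values you would measure on your own documents. Pixels whose color saturation (max minus min of R, G, B) exceeds the threshold become white. Everything else becomes grey using standard luma weights, so dark, unsaturated typewriter ink survives.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

// Sketch: crop one field at a known position and whiten strongly coloured
// pixels (e.g. a yellow/green background) before passing it to OCR.
final class OcrPreprocess {
    static BufferedImage cropAndSuppressColour(BufferedImage page, Rectangle field, int maxChroma) {
        BufferedImage out = new BufferedImage(field.width, field.height, BufferedImage.TYPE_BYTE_GRAY);
        for (int y = 0; y < field.height; y++) {
            for (int x = 0; x < field.width; x++) {
                int rgb = page.getRGB(field.x + x, field.y + y);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                // Chroma: 0 for greys and black, large for saturated colours.
                int chroma = Math.max(r, Math.max(g, b)) - Math.min(r, Math.min(g, b));
                int gray = chroma > maxChroma ? 255 : (r * 299 + g * 587 + b * 114) / 1000;
                out.getRaster().setSample(x, y, 0, gray);
            }
        }
        return out;
    }
}
```

Each cropped field could then be saved (for example with javax.imageio.ImageIO.write) and passed to tesseract with the single-line page segmentation mode, plus the digits config for numeric fields.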
For Tesseract questions it always helps to specify the version used and, if you do image preprocessing, to provide a sample image of the processed input.

Support multiple languages at the same time?

I have seen many apps that ask for a language setting the first time you open them and then use that language the whole time. In my case I need to support several languages at the same time: the user can have songs in more than two languages, and the app should handle metadata in four languages (English, Korean, Japanese, Chinese) simultaneously. I get the metadata either from the server or from local songs, so for Korean I use something like this (in this case Chinese is not supported):
String songnametemp = json.getString(0); // from server
// Recover the raw bytes (decoded as ISO-8859-1) and decode them as Korean (EUC-KR)
String songname = new String(songnametemp.getBytes(StandardCharsets.ISO_8859_1), Charset.forName("EUC-KR"));
TextView songtext = (TextView) findViewById(R.id.song);
songtext.setText(songname);
The problem is that I can't hard-code each and every language; I don't know which language the metadata of the songs coming from the server is in. Is there a better way to do it?

My Lollipop device supports Korean and Hindi, but my KitKat device supports neither. I think each new version supports more languages.
In Windows this is handled by a system library called Uniscribe, on Apple systems by ATSUI, and on Linux systems by Pango. Android is based on Linux, but unfortunately Google seems to have removed the parts that handle complex scripts. (A rather strange decision, since most Android devices are used for communication, including text.) Complex scripts work fine on other mobile devices with Linux-based operating systems, such as the Nokia N9 and N900.
In Android there is no way to identify which language (encoding) your song metadata uses; if you know the language, you can decode it with that specific encoding, as you did above.
Alternatively, you can offer a "choose language" setting when the app is first launched. However, if you want to support all languages at the same time, you have to handle the encoding on the server; that way you can support all languages simultaneously.

When I run into this issue and have control of the data stored on the server, I convert everything to Unicode, e.g. UTF-8, before storage. This way the conversion is only done once and there is no need for downstream applications to handle anything other than say UTF-8. A major benefit is your use case of displaying multiple languages together side by side which can now happen transparently.
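The following sketches the convert-once idea for the server side. It assumes each record's source charset is already known, for example from the song file's tags or from the upload that supplied it. Nothing here detects the charset. Decoding uses a strict CharsetDecoder, so bytes in the wrong charset fail loudly instead of silently turning into garbage.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Sketch: decode raw bytes in their (known) legacy charset once, store UTF-8.
final class Utf8Normalizer {
    static byte[] toUtf8(byte[] raw, Charset sourceCharset) {
        try {
            String text = sourceCharset.newDecoder()
                    // Fail instead of substituting replacement characters.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
            return text.getBytes(StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Bytes are not valid " + sourceCharset.name(), e);
        }
    }
}
```

Once the stored bytes are UTF-8, the Android client can decode every song title the same way (new String(bytes, StandardCharsets.UTF_8), or simply through a JSON parser) regardless of its language.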

I don't fully understand what you are trying to do, but to support multiple languages in Android:
- static values: use resource folders such as "values-en" and "values-ar"
- server values: make your web services handle the language, e.g.
"example.com/en/getdata"
"example.com/ar/getdata"
and put the language-specific link in each values folder
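Here is a sketch of the server-values approach. It assumes the web service really exposes language-prefixed paths like the example.com placeholders above; the /getdata path is taken from the answer, and the class and method names are hypothetical. The language segment comes from the user's Locale.

```java
import java.util.Locale;

// Hypothetical helper: builds a language-prefixed endpoint such as
// "http://example.com/en/getdata" from the user's locale.
final class LocalizedEndpoint {
    static String dataUrl(String baseUrl, Locale locale) {
        // getLanguage() returns the lower-case ISO 639 code, e.g. "en", "ko", "ar".
        return baseUrl + "/" + locale.getLanguage() + "/getdata";
    }
}
```

On Android you would call it as LocalizedEndpoint.dataUrl("http://example.com", Locale.getDefault()). The answer's alternative is to store the full link as a string resource in each values-xx folder and read it with getString().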

Android pocketsphinx & Fsg model

Context
I am currently building an SDK/service through which applications can access voice-based commands.
For the moment I'm using Android PocketSphinx to detect a keyword ("wake") and then analyse the whole sentence with Google voice recognition.
But my problem is that I want to make it all offline! So I'm on my way to replacing Google voice recognition with full use of PocketSphinx...
My Problem
The user defines which word he wants to detect; previously I just compared the spoken word with what Google's speech-to-text returned.
So now I want to update the grammar that PocketSphinx uses with just the word given by the user, which is problematic because (according to the Android PocketSphinx javadoc) it can only take grammar files!
Question
Is there any way I can update the Android PocketSphinx grammar on the fly?
Edit
I forgot to mention this method:
public void addFsgSearch(String searchName, FsgModel fsgModel) (in pocketsphinx on GitHub)
which doesn't seem to take a grammar file like the other grammar setter methods, but rather a class/struct? The problem is that it isn't documented...

If you need to detect just one word, consider using addKeywordSearch.
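The keyword-list route only needs a small text file, which can be generated at run time from the word the user picked. The helper below is hypothetical. The "phrase /threshold/" line format is the PocketSphinx keyword-list format, while the threshold values are placeholders you would tune. It lower-cases input on the assumption that your pronunciation dictionary uses lower-case entries, as the default English one does.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: builds PocketSphinx keyword-list contents,
// one "phrase /threshold/" entry per line, from user-chosen words.
final class KeywordList {
    static String build(Map<String, String> thresholdByPhrase) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : thresholdByPhrase.entrySet()) {
            // Normalise the user's input to match lower-case dictionary entries.
            sb.append(e.getKey().trim().toLowerCase(Locale.ROOT))
              .append(" /").append(e.getValue()).append("/\n");
        }
        return sb.toString();
    }
}
```

Write the result to a file in app storage, then register it with recognizer.addKeywordSearch("user", file) and start it with recognizer.startListening("user"). Whenever the user changes the word, regenerate the file and add the search again.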

I had the same issue, and more. Perhaps these undocumented discoveries can help you.
Using the overloaded method "addGrammarSearch(String name, String fsgString)" lets you put your entire FSG or JSGF grammar definition in a string rather than sourcing it from a file, if you wish (the only advantage is saving a small file open/read time).
"addKeyphraseSearch(String name, String keyphrase)" // only accommodates ONE WORD or PHRASE, no threshold, no grammar.
"addKeywordSearch(String name, File keywordList)" // accommodates MULTIPLE key WORDS or PHRASES, adding thresholds for each.
Caveats:
1. The grammar searches use JSGF format and parse the defined syntax correctly. However:
1.1 Tags are not implemented.
1.2 It is unclear whether weights (which use the same /.../ syntax as thresholds in keyword lists) actually apply recognizer thresholds (they have different meanings in PocketSphinx and in Sun Microsystems' specification).
1.3 Rule names are not implemented either.
1.4 In other words, you provide a grammar in JSGF, and your Hypothesis and FinalResult strings still give you only the recognized lowest-level phrase from the grammar, NOT the grammar tags, nor even the rule metasymbols.
1.4.1 IMHO, that makes grammars pointless, and actually less efficient and less flexible than keyword list files (which are also words or phrases), because keyword lists let you provide a threshold for recognizer scrutiny per phrase. Further, if the RULE and TAG names are not returned, then there is zero information about the structure of the grammar that was recognized. So, as syntactically complex and flexible as it is, I do not see the advantage of bothering with a grammar definition at all in PocketSphinx; the best multiple-keyphrase approach is simply to expand your grammar into a keyword list file. Please correct me if I am mistaken.
2. Search methods, whether their names contain the word "phrase" or the word "word", actually accommodate both phrases and single words.
3. I have assumptions about the undocumented FsgModel class, but we're not supposed to post guesses here.
Though this may help clarify some aspects, the above fails to add any functionality to the package. Lastly, the C source code has the methods getRuleName() and getTagName(). But discussions on this topic between users and developers seem to stall: there is no motivation to add tag or rule-name associations to recognized words or phrases in a defined grammar, apparently because the developers believe grammars are old-school and nobody uses them anymore.
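The suggestion above to expand a grammar into a keyword list file can be automated for simple grammars. This is a hypothetical helper, not part of PocketSphinx. It assumes the grammar is just a sequence of slots, each with a list of alternatives, and expands it into every phrase (a cartesian product). Each resulting phrase then becomes one "phrase /threshold/" line of the keyword list. Grammars with recursion or optional parts would need a fuller expander.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: expands a sequence of alternative slots, e.g.
// [turn] [on | off] [the light], into every phrase the grammar accepts.
final class GrammarExpander {
    static List<String> expand(List<List<String>> slots) {
        List<String> phrases = Collections.singletonList("");
        for (List<String> slot : slots) {
            List<String> next = new ArrayList<>();
            for (String prefix : phrases) {
                for (String option : slot) {
                    next.add(prefix.isEmpty() ? option : prefix + " " + option);
                }
            }
            phrases = next;
        }
        return phrases;
    }
}
```

The number of phrases grows multiplicatively with each slot, so this is practical only for small command sets, which is also where keyword lists with per-phrase thresholds are most useful.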

Android: Find out which font file is appropriate for the characters I want to display

I am maintaining an Android app that people use to display strings in various exotic languages like Tibetan or old Greek. Because Android devices come with very few fonts, users can put font files on the SD card, and the app will use them.
QUESTION: Given a string, how can I automatically decide which font file is the most appropriate, so that this string appears without characters being replaced with squares/boxes?
Notes:
Each string is in one language.
Strings are displayed in a WebView.
Custom fonts work, the only problem is deciding which font file to use.
Instead of a single font, it could provide a list of fonts that are acceptable for that string.
Unnecessary context, for the curious: I am trying to develop this feature:
http://code.google.com/p/ankidroid/issues/detail?id=779
UPDATE: I ended up creating the Antisquare Open Source library based on Mostafa's idea.
It has a getSuitableFonts method which is blazingly fast.

Android by itself does not provide enough for such a task. Loading and rendering fonts in Android happens in Skia, which is written in C++. Skia detects when a character can't be found in a font and falls back to another font for that character (not the whole string). That's how Japanese, Hebrew, or Arabic text is shown in Android, and that's exactly why these scripts don't have a bold face! (Their font is selected through fallback, and fallback only selects one font file.)
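The per-character fallback can be illustrated with a simplified model in plain Java; this is not Skia's actual code. The model assumes you have each font's set of covered code points and a fallback order. Every character gets the first font that covers it, and neighbouring characters with the same font are merged into runs. A character that no font covers gets no font, which is where the squares come from.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Simplified model of per-character font fallback (not Skia's implementation).
final class FallbackModel {
    // A run of consecutive characters drawn with the same font (null = no font covers it).
    static final class Run {
        final String font;
        final String text;

        Run(String font, String text) {
            this.font = font;
            this.text = text;
        }

        @Override
        public String toString() {
            return font + ":" + text;
        }
    }

    static List<Run> splitIntoRuns(String text, List<String> fallbackChain,
                                   Map<String, Set<Integer>> coverage) {
        List<Run> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentFont = null;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            // First font in the chain that has a glyph for this code point.
            String font = null;
            for (String name : fallbackChain) {
                if (coverage.getOrDefault(name, Collections.emptySet()).contains(cp)) {
                    font = name;
                    break;
                }
            }
            // Font changed: close the previous run.
            if (current.length() > 0 && !Objects.equals(font, currentFont)) {
                runs.add(new Run(currentFont, current.toString()));
                current.setLength(0);
            }
            currentFont = font;
            current.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            runs.add(new Run(currentFont, current.toString()));
        }
        return runs;
    }
}
```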
Unfortunately, this mechanism is not exposed through the APIs, so you have to build something similar on your own. It seems complicated, but it is easier than it looks. All you have to do is:
Prepare lists of characters available in each font file.
For every string, find the font that covers the most characters of the string.
Getting list of characters in each font
You don't have to do this on-the-fly in your Android app. You can prepare the list of characters in each font and put these lists in your app. I say that because this is way easier with tools that may not be available in Android. I would do that through Python scripting in a font app (most serious font tools have awesome Python scripting environments), but these apps are expensive and are for serious type designers. Since you're an Android developer, I recommend using sfntly, a library in Java and C++. Doing what you need (getting a list of Unicode characters available in a font file) is easy with sfntly. This sample works with CMap tables (tables that hold character to glyph mapping) and should be a good starting point for you.
Now the interesting part is that sfntly is in Java, so you may be able to include it in your Android app and do everything automatically. That's awesome, but I recommend you start by getting familiar with sfntly.
Selecting the font
After the previous part you'll have a list of Unicode characters for every font, and based on these lists, selecting the font file that provides the most characters of each string is trivial.
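Here is a sketch of the selection step. It assumes the per-font character lists prepared offline (for example with sfntly) have been loaded into a map from font file name to a set of code points; the class and method names are hypothetical. It returns both the fonts that cover the whole string, which is the list of acceptable fonts the question asks for, and a ranking by coverage. Whitespace is ignored, on the assumption that every font has it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch: pick fonts for a string from precomputed per-font code point sets.
final class FontPicker {
    // Turns a prepared character list into a set of code points.
    static Set<Integer> codePointsOf(String chars) {
        return chars.codePoints().boxed().collect(Collectors.toSet());
    }

    // How many non-whitespace code points of the text the font covers.
    static long coveredCount(String text, Set<Integer> fontChars) {
        return text.codePoints()
                .filter(cp -> !Character.isWhitespace(cp))
                .filter(cp -> fontChars.contains(cp))
                .count();
    }

    // Fonts that can display every (non-whitespace) character of the text.
    static List<String> fullyCoveringFonts(String text, Map<String, Set<Integer>> coverage) {
        long needed = text.codePoints().filter(cp -> !Character.isWhitespace(cp)).count();
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<Integer>> e : coverage.entrySet()) {
            if (coveredCount(text, e.getValue()) == needed) result.add(e.getKey());
        }
        return result;
    }

    // All fonts ordered by coverage of the text, best first.
    static List<String> rankFonts(String text, Map<String, Set<Integer>> coverage) {
        List<String> names = new ArrayList<>(coverage.keySet());
        names.sort(Comparator.comparingLong((String n) -> coveredCount(text, coverage.get(n))).reversed());
        return names;
    }
}
```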
