I want to develop a complex-script IME, but I am not quite sure how responsibilities are divided between the IME and the underlying renderer. I think KitKat uses HarfBuzz-ng. For complex scripts, the mapping from input to display isn't linear as it is for English: characters need to be rearranged or displayed differently as you type.
Assumptions: The language is displayed properly on the device. e.g. you can read news etc. Android version: KitKat
So my questions are,
Is the reordering the job of the IME or the underlying engine?
Is the IME only responsible for feeding Unicode code points to the system, after which
the renderer does the rearrangement?
Please point me to some reading on this topic.
Yes to both of your questions: the reordering is the engine's job. The IME should just feed Unicode characters in their logical order to the underlying system; the final rendering (also known as shaping) is done by the text layout engine.
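To make "logical order" concrete, here is a minimal sketch in Python. The Devanagari syllable is my own choice of example, not the asker's script. The vowel sign is drawn to the left of the consonant, yet the stored text (what an IME commits) keeps the consonant first; the visual reordering is left to the shaping engine.

```python
import unicodedata

# The Devanagari syllable "ki" in logical (typing/storage) order:
# consonant KA first, then the dependent vowel sign I.
text = "\u0915\u093F"

def describe(s):
    """Return (code point, Unicode name) pairs in stored order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in s]

for code_point, name in describe(text):
    print(code_point, name)
```

The IME never needs to know that the vowel sign ends up on the left; it only commits the two code points in this order.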
I have been working on an application that involves font recognition based on a user's freehand-drawn characters on an Android Canvas.
In this application the user is asked to enter some predefined characters in a predefined order (A, a, B, c). Based on this, is there any way to show the font that most closely matches the user's handwriting?
I have researched this topic and found some papers and articles, but most of them recognize a font from a captured image. In that case they have a lot of problems with segmenting paragraphs, individual letters and so on. But in my scenario I know which letter the user is drawing.
I have some knowledge of OpenCV and machine learning. I need help on how to proceed with this problem.
It is not exactly clear to me what you want to accomplish with your application, but I assume that you are trying to output the font, from a database of fonts, that best matches a user's handwriting.
In machine learning this would be a classification problem. The number of classes will be equal to the number of different fonts in your database.
You could solve this with a convolutional neural network (CNN); CNNs are widely used for image and video recognition tasks. If you've never implemented a CNN before, I would suggest these resources to learn about Torch, an easy-to-start-with toolkit for implementing CNNs. (Of course there are other frameworks, such as TensorFlow, Caffe, Lasagne, ...)
Torch Homepage
Deep learning with Torch: 60 minutes blitz
Torch Cheatsheet
The main obstacle you will face is that neural networks need a very large number of images (>100,000) to train properly and achieve satisfying results. Furthermore, you need not only the images but also a correct label for each one. That is, each training example would be a handwritten character together with the font from your database that it matches most closely.
I would suggest that you read about so-called transfer learning, which can give you an initial boost because you do not need to set up a CNN model completely by yourself. In addition, people have pre-trained such models for related tasks, so you save extra time by not having to train for many hours on a GPU. (see CUDA)
A great resource to start with is the paper: How transferable are features in deep neural networks?, which could be helpful for the stated reasons.
To get tons of training and testing data you can look up the following open datasets that provide all types of characters that can be helpful for your task:
Artificial Characters Data Set
UJI Pen Characters Data Set
The Chars74K dataset
Hand written - Datasets
A New Benchmark Dataset for Handwritten Character Recognition
For access to a lot of fonts and maybe even the possibility to create further datasets on your own you can have a look at Google Fonts.
You might find this article very interesting: https://erikbern.com/2016/01/21/analyzing-50k-fonts-using-deep-neural-networks/
This seems like a pretty straightforward supervised deep learning problem.
Generate a ton of randomly deformed samples for letters of each target font type, and train a convnet on that set?
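As a rough illustration of "randomly deformed samples", here is a sketch that jitters a glyph outline with a random rotation, scale and shift. It assumes glyphs are available as lists of (x, y) points in a unit box, which is my assumption; for raster images you would apply the same kind of transform to pixels. The function names and parameter ranges are made up for illustration.

```python
import math
import random

def deform(points, rng, max_rotation=0.15, max_scale=0.15, max_shift=0.05):
    """Apply one random rotation (radians), scale and shift to (x, y) points."""
    angle = rng.uniform(-max_rotation, max_rotation)
    scale = 1.0 + rng.uniform(-max_scale, max_scale)
    dx = rng.uniform(-max_shift, max_shift)
    dy = rng.uniform(-max_shift, max_shift)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return [
        (scale * (x * cos_a - y * sin_a) + dx,
         scale * (x * sin_a + y * cos_a) + dy)
        for x, y in points
    ]

def augment(points, count, seed=0):
    """Generate `count` deformed copies of one outline, reproducibly."""
    rng = random.Random(seed)
    return [deform(points, rng) for _ in range(count)]

# A hypothetical three-point outline standing in for a real glyph.
outline = [(0.0, 0.0), (0.5, 1.0), (1.0, 0.0)]
samples = augment(outline, count=5)
```

Each deformed copy keeps the label of the font it came from, so the training set grows without any manual labelling.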
The ideal would be to have a huge set of labeled, handwriting to font data, but that feels unlikely.
You could also use the generated, progressive to font code to take a bunch of handwritten samples, and transform them to look more like the font of your choice, as a dataset.
This is a good place to start: https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py
Digit letter recognition with convnets.
This is quite a bit of work though if you haven't worked with that stuff before.
I would suggest using the OCR library Tesseract. It is very well developed and mature. It also supports training for other languages, which you can use to train it over a set of fonts.
Approach
Training:
Take all 26 letter images (one per letter of the alphabet) for each of the n fonts. Train Tesseract over the A's, then the B's, and so on.
Testing:
Take a sentence and separate all characters.
For each character, get the certainty score (supported by the library) from Tesseract. Note: for the character 'a', use the model trained on all the 'a's from the different fonts.
Across all characters, find the best font using some metric (average, median, etc.). For example, you can sum the certainty scores each font received over all characters and pick the font with the highest total.
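The final voting step could be sketched as below. The scores are invented placeholders, not real Tesseract output, and `best_font` is a hypothetical helper; only the aggregation logic is being illustrated.

```python
from statistics import mean

def best_font(scores, aggregate=sum):
    """Pick the font with the highest aggregated certainty.

    scores: {font name: [certainty for each drawn character]}
    aggregate: function reducing a list of scores to one number.
    """
    totals = {font: aggregate(values) for font, values in scores.items()}
    return max(totals, key=totals.get), totals

# Made-up certainty scores for four drawn characters.
scores = {
    "FontA": [0.62, 0.71, 0.55, 0.80],
    "FontB": [0.90, 0.40, 0.45, 0.50],
}

winner, totals = best_font(scores)            # sum of scores
winner_by_mean, _ = best_font(scores, mean)   # average instead
```

Passing `statistics.median` as `aggregate` would make the vote less sensitive to one badly drawn character.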
I have been tasked with creating a new third-party Android keyboard that supports customized emoji (my own icons) loaded from assets.
I want to implement a soft keyboard with my own emoji icons, either without using Unicode or using my own custom Unicode code points.
Questions:
If I create a custom emoji from some string of characters that does not map to the standard set of emoji, and text this message to a friend with the customized app/keyboard, what shows up on their device? The plain ASCII character string, or the image?
I have read two ways to add image to textView.
Html.ImageGetter
Spannable Image (String consisting of image)
Which way should I prefer?
Is there any way to display (send) the customized emoji on the recipient's device without them downloading the app/keyboard?
Is it possible to send text with an image (emoji) to other apps such as Facebook and Skype, and in messaging?
Need suggestions.
Simple Words
I simply want to send my custom emoji icons to other apps, as this app does, either without using Unicode or with my own custom Unicode code points.
Thanks.
To answer the first part of your question: by definition, emoji are encoded characters, and they are a part of Unicode. See here:
http://emojipedia.org/unicode-8/
There are many references to this if you look. You will also discover that for a long time Apple and Google used two different sets. They are now merged, but then Android manufacturers and carriers have added their own emoji "versions."
Changing the keyboard to have custom images will not change the data that is transmitted to the other device. So, to answer the next part of your question: what shows up on their device is whatever the ASCII or Unicode character that was transmitted, not what the sender "typed."
In other words, to answer the next part of your question, generally speaking there is no way to send custom characters to another device without the recipient having your app. A keyboard would not suffice because apps do the job of displaying text and images. So unless an app knows that you are the content provider or source of the image, it will display whatever it knows how to display. A custom keyboard won't even display custom emoji on your own device unless you are also using your own app.
I said "generally not possible" because here are your options:
You can become a part of the Unicode Consortium (http://unicode.org/) and submit your emoji images for approval to go into a future version of Unicode. There are future emoji already in the works, FYI. That will likely take several years, by the way, and it's unlikely they will approve commercially biased images. However, Unicode has room for just over 1.1 million code points (U+0000 to U+10FFFF) and is nowhere near full; that limit is the same whether text is encoded as UTF-8 or UTF-16. Even then, the Android team would need to adopt it and include it in a future release, as it did with smileys and the current set of emoji.
You build your own app with your own emoji and get people on both sides of the communication to download it, like everyone else does. IMO, this is not ideal for anyone but the developer of the app. Still, I applaud the ones that people enjoy for their work and success. That industry is fickle and difficult to really gain a presence in.
I'm a part of sdmmllc.com - and we're trying to develop a messaging "platform" exactly for situations like yours. We want to allow messaging apps to "discover" other messaging apps, incorporate features like custom emoji, without the user getting confused or having to download tons of apps. This is similar to plug-ins in web browsers. Our developers love us, our users love us, but it's a slow process.
Develop a competing platform. (And good luck with that - no one really seems to be getting the concept, except the few developers we have, and the hundreds of users that download our app every day and love our idea and platform... but there's no money in it so far...)
You can only use the Unicode characters that are supported; you cannot add your own for generic use. You can, however, use custom ones within your app and between copies of your app. Beyond that, it is not possible.
In short, it is not possible to create your own Unicode characters, but you can do it app to app: on both ends you have to store those characters in a database and match them when they are received.
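A minimal sketch of that app-to-app matching, under the assumption that custom emoji travel as plain-text shortcodes such as `:party_cat:`. The shortcode syntax, the table and the asset paths are all hypothetical; a recipient without the app would simply see the shortcode text.

```python
import re

# Hypothetical lookup table shipped with the app on both ends.
EMOJI_ASSETS = {
    "party_cat": "assets/emoji/party_cat.png",
    "thumbs_robot": "assets/emoji/thumbs_robot.png",
}

SHORTCODE = re.compile(r":([a-z0-9_]+):")

def parse_message(text):
    """Split text into ("text", str) and ("image", asset path) segments."""
    segments = []
    pos = 0
    for match in SHORTCODE.finditer(text):
        path = EMOJI_ASSETS.get(match.group(1))
        if path is None:
            continue  # unknown shortcode: leave it as plain text
        if match.start() > pos:
            segments.append(("text", text[pos:match.start()]))
        segments.append(("image", path))
        pos = match.end()
    if pos < len(text):
        segments.append(("text", text[pos:]))
    return segments

print(parse_message("Hi :party_cat: there"))
```

On Android, the receiving app would turn each "image" segment into an inline image in its own text view; that rendering step is outside this sketch.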
I'd like to localize an iOS and Android app for over 25 different languages using a custom font. The problem is no new fonts cover that type of ground. What's the current best practice for this problem in app development?
I've only come up with the following 2 solutions; however, I'm unsure whether either is possible or a good idea.
1.) Hire a font designer to create a massive custom font in at least 3 different styles (regular, bold, italic). But that could be extremely costly, considering that the app license for some single-weight Simplified Chinese fonts is 5k alone.
2.) Use a custom font that covers about 10 languages thanks to Latin characters (e.g. Proxima Nova) and then similar-looking fonts for unsupported languages.
It seems to me the current best practice is to use a custom font that covers the bulk of Latin-based languages and let all unsupported languages fall back to the device fonts. But I've experienced problems there as well, particularly with localizing dynamic third-party data from Facebook Connect. If I'm in America and my friend in China has Chinese characters in their username, a custom font outputs little square glyphs instead of falling back on the device's Chinese character set.
In any case, both solutions add quite a bit of file size to the app, which itself could be a deal-breaker. For solution 2 I've also considered using static images instead of embedding additional fonts, but that also presents a problem in localizing dynamic third-party data and creates a ton of work if the app should ever need updating.
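One way to picture the fallback described above is to split each string into runs the custom font covers and runs it does not, then draw the uncovered runs with a device font. The sketch below assumes a made-up coverage set (Basic Latin plus Latin-1); a real app would read coverage from the font itself.

```python
from itertools import groupby

# Hypothetical coverage of the custom font: U+0020..U+00FF only.
LATIN_FONT_COVERAGE = set(range(0x20, 0x100))

def split_runs(text, covered):
    """Group text into (is_covered, substring) runs, in order."""
    return [
        (is_covered, "".join(chars))
        for is_covered, chars in groupby(text, key=lambda ch: ord(ch) in covered)
    ]

# A username mixing Latin letters and two Chinese characters.
runs = split_runs("Li \u674e\u96f7", LATIN_FONT_COVERAGE)
for is_covered, run in runs:
    print("custom font" if is_covered else "device fallback", repr(run))
```

Runs flagged as uncovered are exactly the ones that would otherwise show up as square glyphs.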
Any suggestions would be greatly appreciated. Thanks.
Background
TextView always had issues with RTL (Right-To-Left) languages. Since I know only how to read Hebrew (in addition to English), I will talk about its issues:
Text alignment (and I'm not talking about gravity). As an RTL language, Hebrew puts words from right to left (whereas English does the opposite).
To demonstrate how annoying this is, imagine that instead of showing "Hello world." you usually get ".Hello world". This could be easily fixed if you had a single sentence, but it's harder when there are multiple punctuation characters.
Vowel positions. Hebrew doesn't require vowels in order to read text, but sometimes it's very hard to read without them (especially the Bible). For vowels, Hebrew has what is called "NIKUD", which are like dots inside the letters. The problem in Android was that they were usually positioned in the wrong location.
To demonstrate how annoying this is, imagine that instead of showing "Hello world." you usually get ".eHlol owrld". Even if you try to fix it (by always putting the vowels one character after the current one), the position within the letter wasn't correct (imagine that the "e" in "Hello" sat above the "H", for example).
Only in version 4.2 (read here, under "Native RTL support") did Google fix all of the Hebrew-related issues (or at least it seems so).
The problem
The problems with Hebrew have caused each Israeli carrier and each custom ROM maker to come up with its own fix for the different issues, which makes it practically impossible to handle RTL text on pre-4.2 devices.
Things can get even more frustrating when the text includes both Hebrew and English letters.
What I've tried
I've read many websites talking about those problems, and I've tried many variants of the solutions, none has solved the problem on all devices:
Some suggest to put the character '\u200F' (or '\u202D') at the end/start/both of the text.
Some suggest using Html.fromHtml() method and put something special there.
Some even suggest using a WebView instead (and maybe using WebSettings.setDefaultTextEncodingName()).
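For reference, the first suggestion amounts to something like the sketch below, which also prints what the two control characters actually are. `wrap_with_rlm` is a hypothetical helper; whether wrapping helps on a given pre-4.2 device is exactly what varies.

```python
import unicodedata

RLM = "\u200F"  # invisible mark with strong right-to-left direction
LRO = "\u202D"  # forces the following characters to be laid out left to right

def wrap_with_rlm(text):
    """Put a right-to-left mark at both the start and the end of text."""
    return f"{RLM}{text}{RLM}"

# Look up the official Unicode names of the two characters.
names = {f"U+{ord(ch):04X}": unicodedata.name(ch) for ch in (RLM, LRO)}
print(names)

# "Shalom" in Hebrew followed by a period, wrapped in marks.
wrapped = wrap_with_rlm("\u05E9\u05DC\u05D5\u05DD.")
```

Note that the two characters do different jobs: U+200F only nudges the direction of neighbouring neutral characters such as punctuation, while U+202D overrides the direction of everything after it.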
The question
Is there a definite solution for this problem?
I would assume the best thing is that because Android 4.2 solves this, and Android is open source, we should have its TextView imported into a library that we can use, but Google hasn't provided such a library yet.
Sadly, I don't think there's a good solution ("good" meaning "does the job and is readily available"). For our own Android apps that support Hebrew, we use a custom rendering mechanism that we developed over many years. The rendering mechanism does everything: proprietary fonts; bidirectional (bidi) analysis; glyph placement; line break analysis; text flow; etc. Some of the problems trying to use native Android text handling capabilities (especially pre-4.2) are:
Really crappy fonts. However, you can package third-party fonts like DejaVu that are pretty good. The right font can do wonders for the positioning of nekudot (and te'amim [1], if you need that). (I agree with you about how important correct pointing placement is; reading Hebrew text with misplaced nekudot is like reading a screen-full of CAPTCHAs.)
Buggy bidi analysis. What makes it worse is that the bugs seem to be different for different versions of Android. Modifying the text to include strategically placed bidi formatting codes (RTL mark; LTR mark; etc.) can overcome many of these bugs (see the discussion here, which isn't specific to Android). However, it's a nuisance to do this and, because of the inconsistencies among Android versions, it is difficult to predict in advance what help the framework is going to need.
No (or poorly thought out) framework-level awareness of right-to-left issues. For instance, good luck getting the scroll bar to display on the left side of a Hebrew TextView. For our apps, we had to build an entire scroll-bar system just to get this to work how we wanted. (Good thing Android is open source!)
Poor line and word break analysis. At least one early version of Android on which we tested thought that each nikud mark was a word boundary. When it comes to line breaks, the system often doesn't know how to handle Hebrew punctuation like maqaf, gershayim, or sof pasuk.
Some of the newer Unicode characters (like HOLAM HASER FOR VAV—U+05BA—new to Unicode 5.0) are not recognized as Hebrew script by the system.
My recommendation is that, unless you are prepared to build a top-to-bottom text handling system yourself, you give up on high-quality text display on pre-4.2 versions of Android, particularly if you need to support nekudot and te'amim. Also, plan to use the techniques I mentioned in the first two points above.
[1] Biblical cantillation marks
As of August 2013, Android has posted API documentation for a Bi Directional Formatter which might suit your needs. This is contained in the Android Support v4 library which I believe should run in versions prior to Android 4.2.
Refer to:
http://developer.android.com/reference/android/support/v4/text/BidiFormatter.html
I am trying to determine the best approach on Android for supporting multiple languages. I understand how resource folders work, and how they get selected when the activity loads and/or has a configuration change. I have also seen a technique of creating a new locale, assigning it as the default, and broadcasting a config change. This works. But I get the impression from this thread (https://groups.google.com/forum/?fromgroups=#!topic/android-developers/_ZGOTHwzl-w) and the answers from the Google framework team that this way of doing things is not recommended or supported. So my questions are:
What is the recommended way to support multiple languages on the fly without sending the user to the OS menus for language selection?
Same question for keyboard input.
Finally, I see on my Motorola Xoom when I ask the Locale class for supported languages an impressive list. For instance, ja-JP, which I've tested and seen allows me to display Japanese chars. However there is no SIP for this language on the device. Can I download new keyboards to my platform in these cases? It just seems odd to me that the platform would support displaying many more languages than it could input.
Just let the system do the work.
A user who has selected a language and a keyboard in settings will expect the same conditions in your app.
As far as I know, there's no better approach than the strings.xml files in the different values folders.