I have a misunderstanding concerning FTS, and I'd be thankful if someone could help me.
GOAL: Full text search using MATCH function.
Problem: Unable to search for extended ASCII characters such as '#¿®£$ etc.
Details: There are three predefined tokenizers: simple, porter and unicode61. But all of these tokenizers recognize special symbols as separators, because the documentation says:
A term is a contiguous sequence of eligible characters, where eligible characters are all alphanumeric characters and all characters with Unicode codepoint values greater than or equal to 128.
Possible solution (bad one): There is a way to specify extra symbols which should be used as separators for tokens or otherwise as a part of token.
CREATE VIRTUAL TABLE text USING FTS4(column, tokenize=unicode61 "tokenchars='$%")
After that I can find words like that's, doll$r, 60%40, etc., because the tokenizer doesn't split tokens at the '$% symbols.
But it doesn't suit me, because there are a lot of extended symbols in the ASCII table and listing all of them by hand is not a good solution.
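If typing the list by hand is the main objection, the tokenchars value could be generated instead. The following is only a sketch: the class name TokenCharsBuilder and its methods are made up for illustration, the range U+0021 to U+00FF is an assumption about what "extended ASCII" means here, and the double quote is skipped because it would end the quoted tokenizer argument. Whether unicode61 accepts every one of these characters as a token character has not been verified, and characters that are special in MATCH query syntax may still need care in queries.

```java
final class TokenCharsBuilder {

    /**
     * Collects every character in U+0021..U+00FF that is not a letter,
     * digit, space, or control character. The range is an assumption.
     */
    static String latin1Symbols() {
        StringBuilder sb = new StringBuilder();
        for (char c = 0x21; c <= 0xFF; c++) {
            boolean skip = Character.isLetterOrDigit(c)
                    || Character.isWhitespace(c)
                    || Character.isSpaceChar(c)   // covers the no-break space U+00A0
                    || Character.isISOControl(c)  // covers U+007F..U+009F
                    || c == '"';                  // would terminate the quoted argument
            if (!skip) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds a statement of the same shape as the one in the question. */
    static String createTableSql(String table, String column) {
        return "CREATE VIRTUAL TABLE " + table + " USING FTS4(" + column
                + ", tokenize=unicode61 \"tokenchars=" + latin1Symbols() + "\")";
    }
}
```

Calling TokenCharsBuilder.createTableSql("text", "column") yields a statement like the one above, only with the full generated symbol list in place of '$%.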
The main question: What is the best way to search for special symbols?
Thanks a lot, and feel free to ask for more details if needed.
Related
I'm trying to put together a small android app that can randomly return an emoji to the user. My intention is to just use actual unicode emoji characters, and return them as unicode string characters.
I built a full array of unicode strings that could be randomly chosen from, and many will display correctly. However some are showing up as unsupported characters (a rectangle with an x through it).
Obviously not every platform will support every unicode emoji character, but if possible I'd like a way to determine what is and isn't a supported character. The ideal would be to query for a list of supported characters, but being able to test individual characters would also do the job just fine.
Also check out Paint.hasGlyph(String), which was added in API level 23. You can use this to test if a character like an emoji has a glyph available.
This is what the documentation says:
boolean hasGlyph (String string)
Determine whether the typeface set on the paint has a glyph supporting
the string. The simplest case is when the string contains a single
character, in which this method determines whether the font has the
character. In the case of multiple characters, the method returns true
if there is a single glyph representing the ligature. For example, if
the input is a pair of regional indicator symbols, determine whether
there is an emoji flag for the pair.
Finally, if the string contains a variation selector, the method only
returns true if the fonts contains a glyph specific to that variation.
Checking is done on the entire fallback chain, not just the immediate
font referenced.
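As a rough sketch of how that check could be used to prune the emoji array: the helper below takes the glyph test as a parameter so it can be tried off-device. The class name EmojiFilter is made up for illustration. On Android (API level 23 or later) the predicate could be paint::hasGlyph for some Paint instance; that wiring is an assumption and is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class EmojiFilter {

    /**
     * Returns only the candidates for which the glyph test passes.
     * On Android the test could be paint::hasGlyph (API 23+); here it is
     * just a parameter so the logic can be exercised anywhere.
     */
    static List<String> supported(List<String> candidates, Predicate<String> hasGlyph) {
        List<String> result = new ArrayList<>();
        for (String emoji : candidates) {
            if (hasGlyph.test(emoji)) {
                result.add(emoji);
            }
        }
        return result;
    }
}
```

The random pick would then be made from the filtered list rather than from the full array.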
See also
How to detect emoji support on Android by code
So, when you talk about a character being "unsupported", it sounds like what you mean is that the current font doesn't have a glyph for the character (and either the application doesn't have fallback logic to find a different font that does, or the system doesn't have any font that does).
In regular Java, this is pretty easy: given an instance of java.awt.Font, you can see if it has a glyph for a given Unicode character by using the canDisplay method.
The Android APIs, for whatever reason, don't seem to expose a way to figure out what font you're actually working with. (android.graphics.Typeface keeps that information private: see "Check the family of a Typeface object in Android".) However, you might at least try something like new java.awt.Font("SansSerif", java.awt.Font.PLAIN, 12) to get a basic 12-point sans-serif font. You'll want to test, of course, to see if that gives a usable approximation for the emoji that the real font will be able to display.
You can use Character.isDefined to check if a character is defined in the version of Unicode on the device.
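A minimal sketch of that check, working on code points so that emoji outside the Basic Multilingual Plane are handled (the class and method names are made up for illustration). Note that a defined code point only means the runtime's Unicode tables know the character; it does not by itself mean that any font has a glyph for it.

```java
final class UnicodeSupport {

    /** True if every code point in the string is defined in this runtime's Unicode version. */
    static boolean allCodePointsDefined(String s) {
        return s.codePoints().allMatch(cp -> Character.isDefined(cp));
    }
}
```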
Interesting problem here, wondering if anyone has come across it.
I am building an Android app that has some special characters as text (mainly Japanese characters), and our designers want some soft returns strategically placed in case the text needs to wrap due to width limitations on smaller devices.
The problem is that since the text is essentially Japanese, there are no spaces between words.
I know we can use \u200b as a zero-width space resulting in a string like this:
abcdef\u200bghijklmnop
appearing like this when there is enough room:
abcdefghijklmnop
and like this if it needs to wrap:
abcdef
ghijklmnop
The problem is that if we use Japanese characters instead of standard English characters, it doesn't seem to work. We don't get soft line breaks at all. Instead, the text always breaks right where it runs out of space, regardless of where we put the \u200b:
ヘルプヘルプ\u200bヘルプヘルプ
results in:
ヘルプヘルプヘルプヘ
ルプ
Has anyone dealt with this before or have any ideas on how to solve this?
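One possible workaround, sketched here under assumptions rather than as a known fix, is to do the soft wrapping manually: split the string at each \u200b and pack the pieces into lines greedily, then join the lines with "\n". The class name SoftBreakWrapper and the width function are hypothetical. On Android the width could come from something like Paint.measureText; in the sketch it is just a parameter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

final class SoftBreakWrapper {

    static final String ZWSP = "\u200b";

    /**
     * Greedily packs the ZWSP-separated segments of text into lines whose
     * measured width does not exceed maxWidth. A single segment that is
     * wider than maxWidth is kept whole on its own line.
     */
    static List<String> wrap(String text, int maxWidth, ToIntFunction<String> width) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String segment : text.split(ZWSP)) {
            // Start a new line if adding this segment would overflow.
            if (line.length() > 0 && width.applyAsInt(line + segment) > maxWidth) {
                lines.add(line.toString());
                line.setLength(0);
            }
            line.append(segment);
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }
}
```

With String::length as a stand-in width function and a limit of 10, "abcdef\u200bghijklmnop" comes back as the two lines abcdef and ghijklmnop; with a limit of 20 it stays on one line.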
I have a sort-of Scrabble-dictionary lookup app in mind (already have Java version working in Windows) in which the user can:
simply look up a word (e.g., hacker)
or
use wildcards (#, *, ?) to define a pattern for words he might be able to make
E.g.:
fixed length word: h???er (6 letters)
varying length word: h*er (3 to max number of letters)
fixed length word with no repeated letters: h###er (6 letters)
and unlimited combinations of ? and *: h?*er (4 to max number of letters)
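For reference, the pattern rules above could be turned into a regular expression roughly as follows. This is a sketch with made-up names, and it assumes that words are lowercase a-z, that * may match zero letters (so h*er can be 3 letters long), and that "no repeated letters" means the letters in the # positions differ from each other, not that the whole word has no repeats.

```java
import java.util.Locale;
import java.util.regex.Pattern;

final class WildcardMatcher {

    /** Translates a wildcard pattern (?, *, #) into a regular expression. */
    static Pattern compile(String wildcard) {
        StringBuilder regex = new StringBuilder();
        int groups = 0; // number of '#' capture groups emitted so far
        for (char c : wildcard.toLowerCase(Locale.ROOT).toCharArray()) {
            switch (c) {
                case '?':
                    regex.append("[a-z]");
                    break;
                case '*':
                    regex.append("[a-z]*");
                    break;
                case '#':
                    // Negative lookahead: the next letter must differ from
                    // every letter captured by an earlier '#'.
                    if (groups > 0) {
                        regex.append("(?!");
                        for (int g = 1; g <= groups; g++) {
                            if (g > 1) regex.append('|');
                            regex.append('\\').append(g);
                        }
                        regex.append(')');
                    }
                    regex.append("([a-z])");
                    groups++;
                    break;
                default:
                    regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    /** True if the whole word matches the wildcard pattern. */
    static boolean matches(String wildcard, String word) {
        return compile(wildcard).matcher(word.toLowerCase(Locale.ROOT)).matches();
    }
}
```

For example, WildcardMatcher.matches("h###er", "hacker") is true because a, c and k are all different, while the same pattern rejects "hooker" because o repeats.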
I'd hate to have to make my own keypad and worry about positioning the cursor and editing text fields. It would be great if I could remap an existing keypad to look something like this:
Is it possible for an Android developer to change the face and action of a key on one of the standard keyboards? Or is there a keyboard template for just such purpose?
EDIT
Of course I mean to remap during execution of my app, which, the more I think about it, is probably impossible, except how does the "enter" key morph into "go" and "next" and stuff like that?
Likely I'll just have to settle for this setup with a standard qwerty pad and a few buttons, though I'd still have to worry about inserting the 3 wildcards into the text fields:
Might not be so bad. Agree?
I am developing an Android and iOS app which will be translated into multiple languages, including English, Hindi and Gurmukhi. We have some PDF files in Gurmukhi that use ASCII fonts instead of Unicode, and we need to translate them into other languages, but when we try to copy the text we end up with garbled text. Is there a way to convert it?
Thanks in advance.
Every symbol font has its own repertoire, so you need a custom lookup table from the ‘wrong’ character to the real character corresponding to what the glyph looks like in the particular font you are using. You can work out this table by installing the font in question and opening it in the Character Map (charmap.exe), noting down which symbol appears in each of the boxes normally occupied by A, B, C...
There is a converter tool here which knows about some common symbol fonts. If the font you want is in that list, you could try to extract the lookup table by pasting in all the characters in the range 32–126;128–255 (!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ) and grabbing what comes out when converted to ‘Unicode’.
Do you need to do the conversion in the application itself? If so you will typically need to write a character-by-character lookup loop. If not, best to use a tool like this manually to convert the material you want into standard Unicode, because dealing with text encoded in arbitrary symbol fonts is a pain in the neck.
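If the conversion does have to happen in the application, the character-by-character loop might look something like this sketch. The class name SymbolFontConverter is made up, and the lookup table has to be supplied by you for the specific font; nothing here encodes the real mapping of any Gurmukhi font. Characters without a table entry are passed through unchanged, which is an assumption about the desired behaviour.

```java
import java.util.HashMap;
import java.util.Map;

final class SymbolFontConverter {

    // Maps each 'wrong' character to the Unicode text its glyph stands for.
    private final Map<Character, String> table;

    SymbolFontConverter(Map<Character, String> table) {
        this.table = new HashMap<>(table);
    }

    /** Converts legacy symbol-font text to Unicode, one character at a time. */
    String toUnicode(String legacy) {
        StringBuilder out = new StringBuilder(legacy.length());
        for (int i = 0; i < legacy.length(); i++) {
            char c = legacy.charAt(i);
            String mapped = table.get(c);
            // Unmapped characters (spaces, digits, ...) are kept as they are.
            out.append(mapped != null ? mapped : String.valueOf(c));
        }
        return out.toString();
    }
}
```

A one-to-one table like this will not handle fonts where the visual order of glyphs differs from the logical order of Unicode characters; such cases would need extra reordering logic on top.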
I would like to display special characters such as: ṁ ṭ m ē. In case they don't display well here, this is how the four characters should look:
In Android, these display as squares. For other scripts, I am able to overcome this problem by using a different font. But in this case setting the font (TextView.setTypeface) does not solve the issue. These characters display correctly in, for example, OpenOffice (using Arial or Courier New), but inside Android they don't, even when using the same fonts.
I also tried having the string saved as a unicode encoded string (e.g. in strings.xml: \u1E41 \u1E6D) getting the same result (in the logs they appear as they should). Any ideas?
If these characters are representable in Unicode, then you should be able to use Html.fromHtml() to get the glyph into a TextView, e.g.
textView.setText(Html.fromHtml("&#1234;"), TextView.BufferType.SPANNABLE);
It was really only a font issue. It was just hard to find a font that supports all characters I need.
Seeing that Google Translate has no problems with transliteration characters motivated me to make a more thorough search for fonts. Below is a list of useful fonts for this purpose:
http://guindo.pntic.mec.es/jmag0042/alphaeng.html (extensive but non-free)
http://users.teilar.gr/~g1951d/
http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=FontDownloads
http://www.mufi.info/fonts/