I need to intercept an emoticon as it is entered and replace it with my own emoticon.
When I intercept an emoticon, for example FACE WITH MEDICAL MASK (U+1F604), I get a UTF-16 surrogate pair (0xD83D 0xDE04). Is it possible to convert these char values to the Unicode value?
I need to convert 0xD83D 0xDE04 to \u1f604.
Thanks,
I get a UTF-16 char (0xD83D 0xDE04). Is it possible to convert this char value to the Unicode value?
For just a single code point in a string, you can convert it to an integer with:
int codepoint = "\uD83D\uDE04".codePointAt(0); // 0x1F604
It is, however, quite tedious to go over a whole string with codePointCount/codePointAt. Java/Dalvik's String type is strongly tied to UTF-16 code units, and the codePoint methods are a poorly integrated afterthought. If you are simply hoping to replace an emoji with some other string of characters, you are probably best off doing a plain string replace or regex with the two code units as they appear in the String type, e.g. text.replace("\uD83D\uDE04", ":-D").
(BTW Face with medical mask is U+1F637.)
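For what it's worth, here is a minimal sketch of both approaches mentioned above: walking a string one code point at a time, and the plain replace on the two code units. The class and method names (CodePointWalk, codePoints, replaceSmile) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class CodePointWalk {
    // Collects every Unicode code point in text, counting a surrogate pair as one entry.
    static List<Integer> codePoints(String text) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            result.add(cp);
            // Advance by 1 for a BMP character, by 2 for a surrogate pair.
            i += Character.charCount(cp);
        }
        return result;
    }

    // Replaces U+1F604, written as its two UTF-16 code units, with an ASCII smiley.
    static String replaceSmile(String text) {
        return text.replace("\uD83D\uDE04", ":-D");
    }

    public static void main(String[] args) {
        String text = "a\uD83D\uDE04b";
        for (int cp : codePoints(text)) {
            System.out.printf("U+%04X%n", cp);
        }
        System.out.println(replaceSmile(text));
    }
}
```

The loop is the tedious part; if all you need is the substitution, the one-line replace is enough.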
0x0001F604 is the UTF-32 encoding of that emoticon (U+1F604). You can convert a string to UTF-32 bytes this way:
byte[] bytes = "\uD83D\uDE37".getBytes("UTF-32BE"); // 00 01 F6 37, i.e. U+1F637
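If you would rather get an int than a byte array, the surrogate pair can also be combined directly. This is only a sketch: the class name SurrogateMath and the method combine are invented here, and combine simply spells out the standard UTF-16 decoding formula that Character.toCodePoint already implements.

```java
final class SurrogateMath {
    // Combines a high and a low surrogate with the UTF-16 decoding formula.
    static int combine(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        int manual = combine('\uD83D', '\uDE04');
        int library = Character.toCodePoint('\uD83D', '\uDE04');
        // Both values should be the same code point.
        System.out.printf("U+%X U+%X%n", manual, library);
    }
}
```

In real code Character.toCodePoint (or String.codePointAt) is the better choice; the manual version is only there to show where the number comes from.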
We want to draw musical symbols in View.onDraw(), and found that Unicode contains a few such symbols; here is the code chart.
But when I call drawText("\u1D100"), only the four hex digits after \u are treated as the escape, and the last "0" is still drawn as a literal "0". How can I solve this problem?
Strings in Java/Android are encoded using UTF-16. The \u escape notation takes exactly 4 hex digits. So, to encode a Unicode code point above U+FFFF, you have to encode it as a UTF-16 surrogate pair. This is clearly explained in the Java/Android documentation.
U+1D100 is 0xD834 0xDD00 in UTF-16, so use this instead:
drawText("\uD834\uDD00", ...)
Alternatively, you can convert the Unicode codepoint to a char[] array at runtime and then draw it:
char[] ch = Character.toChars(0x1D100);
drawText(ch, 0, ch.length, ...)
Either way, of course you have to use a font that actually supports U+1D100.
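As a quick check outside of any drawing code, this sketch shows that Character.toChars produces the same two code units as the hand-written surrogate pair. The class and method names (SupplementaryChars, fromCodePoint, hexUnits) are illustrative only.

```java
final class SupplementaryChars {
    // Returns the UTF-16 form of a code point: one char for the BMP, two above U+FFFF.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Formats each UTF-16 code unit of s as four hex digits, separated by spaces.
    static String hexUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hexUnits(fromCodePoint(0x1D100)));
    }
}
```

The resulting String can be passed to the String overloads of drawText in the same way as the literal "\uD834\uDD00".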
I am trying to convert an NSString (password) to MD5 and then Base64. For ASCII characters everything works fine, but when I test with the '£' or '?' sign, MD5 gives me junk values, which fail at my server end.
NSData *pwdData = [password dataUsingEncoding:NSUTF16StringEncoding allowLossyConversion:YES];
unsigned char result[CC_MD5_DIGEST_LENGTH];
CC_MD5(pwdData.bytes, (CC_LONG)pwdData.length, result);
[Base64 initialize];
[Base64 encode:result length:CC_MD5_DIGEST_LENGTH];
But the same thing works fine on Android.
MessageDigest msgDigest = java.security.MessageDigest.getInstance("MD5");
msgDigest.update(s.getBytes("US-ASCII"));
byte bytes[] = msgDigest.digest();
return android.util.Base64.encodeToString(bytes, Base64.NO_WRAP);
I am not sure whether the problem is with MD5 or Base64, since Android does the same thing and it works fine.
You can't encode a symbol like '£' in ASCII. A lossy conversion gives the conversion process permission to drop or substitute such special characters.
(The ASCII character set does not include the '£' symbol. It includes the US "#" symbol instead. Back in the day, UK machines displayed that ASCII code as '£' and US machines displayed it as "#", but with the advent of Unicode there are separate characters for both.)
You should use UTF-8, and use it consistently on every platform that computes the hash. As I understand it, UTF-8 encodes every ASCII character as the same single byte, and encodes each non-ASCII Unicode character as a multi-byte sequence, so the text can be "reconstituted" exactly when converted back to Unicode.
(disclaimer: I'm not an expert on the different encodings of Unicode.)
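Here is a rough sketch of what the UTF-8 version could look like on the Java side. It assumes java.util.Base64 rather than the android.util.Base64 used in the question so that it runs on a plain JVM, and the names PasswordDigest and md5Base64 are invented for the example. The other clients and the server would have to hash the same UTF-8 bytes for the values to match.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

final class PasswordDigest {
    // Hashes the UTF-8 bytes of the password with MD5 and Base64-encodes the digest.
    static String md5Base64(String password) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(password.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on the Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The pound sign is written as an escape to avoid source-encoding problems.
        String pound = "\u00A3";
        // US-ASCII cannot represent the pound sign, so Java substitutes '?' (0x3F).
        byte[] ascii = pound.getBytes(StandardCharsets.US_ASCII);
        System.out.printf("US-ASCII byte: 0x%02X%n", ascii[0]);
        System.out.println(md5Base64(pound));
    }
}
```

The substitution of '?' is why the US-ASCII version appears to work: it hashes without error, but every non-ASCII character collapses to the same byte.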
size_t mbstowcs(wchar_t *dest, const char *src, size_t n);
I have some information encoded in GB2312 which needs to be converted to Unicode on the Android platform.
1. Before calling this method, is it right to call setlocale(LC_ALL, "zh_CN.UTF-8")?
2. How large does dest need to be?
3. What should I pass for n? Is it strlen(src)?
Thank you very much.
mbstowcs() will convert a string from the current locale's multibyte encoding into a wide character string. Wide character strings are not necessarily Unicode, but on Linux they are (UCS-4, i.e. UTF-32).
If you set the locale to zh_CN.UTF-8 then the current locale's multibyte encoding will be UTF-8, not GB2312. You would need to set a GB2312 locale for the input to be treated using that multibyte encoding.
The C standard implies that a single multibyte character will produce at most one wide character, so you can use strlen(src) as the upper bound on the number of wide characters required:
size_t n = strlen(src) + 1;
wchar_t *dest = malloc(n * sizeof dest[0]);
glibc has an extension to the standard mbstowcs() interface, which allows you to pass it a NULL pointer to find out exactly how many wide characters will be produced by the conversion, although that won't help you on Android. It works like this:
size_t n = mbstowcs(NULL, src, 0) + 1;
The value of n that should be passed is the maximum number of wide characters that should be written through the dest pointer, including the terminating null wide character.
However, you should instead look into using libiconv, which has been successfully compiled for Android. It allows you to explicitly choose the source and destination character sets you are interested in, and is a much better fit for this problem.
I have a String displayed in a WebView as "Siwy & Para Wino".
I fetch it from a URL and get the string "Siwy%2B%2526%2BPara%2BWino". // be corrected
Now I'm trying to use URLDecoder to solve this problem:
String decoded_result = URLDecoder.decode(url); // the url is "Siwy+%26+Para+Wino"
Then I print it out, and I still see "Siwy+%26+Para+Wino".
Could anyone tell me why?
From the documentation (of URLDecoder):
This class is used to decode a string which is encoded in the application/x-www-form-urlencoded MIME content type.
We can look at the specification to see what a form-urlencoded MIME type is:
The form field names and values are escaped: space characters are replaced by '+', and then reserved characters are escaped as per [URL]; that is, non-alphanumeric characters are replaced by '%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks, as in multi-line text field values, are represented as CR LF pairs, i.e. '%0D%0A'.
Since the specification calls for a percent sign followed by two hexadecimal digits for the ASCII code, the first time you call the decode(String s) method, it converts each such sequence into a single character. In %2526, the value %25 translates to %, leaving the two additional characters 26 intact, so the result after the first decoding is %26 (likewise, %2B becomes +). Running decode one more time translates %26 into & and each + into a space.
String decoded_result = URLDecoder.decode(URLDecoder.decode(url));
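To make the two passes visible, here is a small sketch that decodes the string from the question step by step. It uses the two-argument URLDecoder.decode with an explicit "UTF-8" charset, since the one-argument overload is deprecated; the names DoubleDecode, decodeOnce and decodeTwice are made up for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class DoubleDecode {
    // Performs one round of form-urlencoded decoding using UTF-8.
    static String decodeOnce(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Undoes a double encoding by decoding twice.
    static String decodeTwice(String s) {
        return decodeOnce(decodeOnce(s));
    }

    public static void main(String[] args) {
        String url = "Siwy%2B%2526%2BPara%2BWino";
        System.out.println(decodeOnce(url));  // Siwy+%26+Para+Wino
        System.out.println(decodeTwice(url)); // Siwy & Para Wino
    }
}
```

Needing two passes usually means the value was encoded twice somewhere upstream, so it may be worth checking whether the server or the fetching code encodes an already-encoded string.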
You can also use the Uri class if you have UTF-8-encoded strings:
Decodes '%'-escaped octets in the given string using the UTF-8 scheme.
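Then use:
String decoded_result = Uri.decode(Uri.decode(url));
Note that, as far as I know, Uri.decode only handles '%' escapes and does not turn '+' into a space, so any '+' left in the result would have to be replaced separately.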
Thanks for all the answers, I finally solved it.
Solution:
After I used URLDecoder.decode twice (oh my god), I got what I wanted.
String temp = URLDecoder.decode( url); // url = "Siwy%2B%2526%2BPara%2BWino"
String result = URLDecoder.decode( temp ); // temp = "Siwy+%26+Para+Wino"
// result = "Siwy & Para Wino". !!! oh good job.
But I still don't know why. Could someone tell me?
I'm parsing an input stream coming from Facebook. I'm using something like
BufferedReader in =
new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
And then in.readLine to actually read from the stream.
The stream seems to have Unicode characters already encoded in ASCII, so I see things like \u00e4 (with \u actually being two discrete ASCII characters). Right now, I'm fishing for "\u", decoding the subsequent two hex bytes, turning them into a char, and replacing the escape in the string with that char, which is obviously the worst way to do it.
I'm sure there's a cool way to use a native function to decode the special characters as the stream is being read (I was hoping it could be done on the InputStreamReader layer). But how?
The data format is JSON, which I didn't mention (and which Thanatos already assumed). Using Android's JSON parser will automatically decode the characters properly. Parsing JSON yourself is obviously a dumb idea on several levels.
If you see '\u00e4' with the '\' and the 'u' being separate, then the '0', '0', 'e' and '4' probably make up the 4 hex digits of a 2-byte (16-bit) Unicode character, i.e. one UTF-16 code unit. The notation is based on C99; the alternative is '\U00XXYYZZ', where there are 8 hex digits representing a 32-bit UTF-32 character (but, because Unicode is a 21-bit code set, the first 2 of the 8 digits are always 0, and the next is often (usually) 0 too).
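Purely to illustrate the notation (for real JSON data a JSON parser is still the right tool), the four hex digits map to a char like this. The class and method names (UnicodeEscape, fromHexDigits) are hypothetical.

```java
final class UnicodeEscape {
    // Converts the four hex digits that follow a backslash-u marker into one UTF-16 code unit.
    static char fromHexDigits(String fourHexDigits) {
        return (char) Integer.parseInt(fourHexDigits, 16);
    }

    public static void main(String[] args) {
        char c = fromHexDigits("00e4");
        // Prints the numeric value of the decoded code unit.
        System.out.printf("U+%04X%n", (int) c);
    }
}
```

A code point above U+FFFF arrives as two such escapes in a row, one for each surrogate, so a hand-written decoder would also have to keep pairs together.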
However, that doesn't answer your question about what's the right Android way to read the data, and you are right that there probably is one.