I'm downloading the HTML source of a remote page into a String variable. Unfortunately, the page is encoded in ISO-8859-2 and contains characters from the Polish alphabet. How can I convert this string to UTF-8 so I can display parts of it in a TextView?
Thanks
You shouldn't need to "convert" the string at all if you honor the charset that the web server declares in the Content-Type header.
Right now, you probably ignore that header while reading the response from the server (some BufferedReader-to-StringBuffer/Builder loop, I assume). Try this in your download code instead:
HttpResponse response = ....
String text = EntityUtils.toString(response.getEntity());
EntityUtils will automatically use the charset specified by the server.
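If the download code cannot use EntityUtils, the same idea can be sketched by hand: read the raw bytes, look up the charset named in the Content-Type header, and decode with it. The class and method names below are hypothetical, and the ISO-8859-1 fallback is an assumption for servers that declare no charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class CharsetAwareDecoder {

    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=iso-8859-2". Falls back to ISO-8859-1 (assumption).
    static Charset charsetFrom(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).trim().replace("\"", "");
                    try {
                        if (Charset.isSupported(name)) {
                            return Charset.forName(name);
                        }
                    } catch (IllegalArgumentException ignored) {
                        // Malformed charset name: use the fallback below.
                    }
                }
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    // Decodes the raw response body using the declared charset.
    static String decode(byte[] body, String contentType) {
        return new String(body, charsetFrom(contentType));
    }

    public static void main(String[] args) {
        // Four Polish letters as ISO-8859-2 bytes.
        byte[] body = {(byte) 0xBF, (byte) 0xF3, (byte) 0xB3, (byte) 0xE6};
        System.out.println(decode(body, "text/html; charset=iso-8859-2"));
    }
}
```

Once the bytes are decoded with the right charset, the resulting String needs no further conversion before it is handed to a TextView.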
My problem is that I am getting strings in which some characters arrive as Unicode escapes.
"fieldName": "Ac6jHguQjKKUxx6MSOpjO2kOLKPAdjStVs1pgTGNSU8\u003d"
Then I immediately send such a string to another API, and the server returns an error with code 500. If I send the same string from Postman with the Unicode escape replaced by the plain character, the server returns 200.
I thought there was a problem on the server, but they checked it and said that they were sending it as expected.
How do I decode the Unicode escapes?
The easiest way is to use URLDecoder. Here is an example.
String str = "Ac6jHguQjKKUxx6MSOpjO2kOLKPAdjStVs1pgTGNSU8\u003d";
String decode = URLDecoder.decode(str, "UTF-8");
System.out.println(decode);
//Ac6jHguQjKKUxx6MSOpjO2kOLKPAdjStVs1pgTGNSU8=
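One caveat worth noting: in Java source the compiler already turns \u003d inside a string literal into "=", and URLDecoder itself only handles percent-encoding and "+". When the backslash-u sequence is literally present in the text received at runtime, something has to unescape it; a JSON parser normally does that while reading the value. The following is a minimal hand-rolled sketch of that step, with a hypothetical class name. It does not handle escaped backslashes or other JSON escapes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeEscapes {

    // Matches a literal backslash, the letter u, and four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replaces every backslash-u escape in the text with the character it names.
    static String unescape(String raw) {
        Matcher m = ESCAPE.matcher(raw);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The doubled backslash keeps the escape literal, as it would be in a raw response.
        String raw = "Ac6jHguQjKKUxx6MSOpjO2kOLKPAdjStVs1pgTGNSU8\\u003d";
        System.out.println(unescape(raw));
    }
}
```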
I have a Java class which I convert to a string using GSON. After this, the string is Base64 encoded (for some reason, let's not go there :) ). When I decode it back, I lose all { and " characters in the JSON.
For example: {"name":"ABC"} decoded and encoded back becomes nameABC
I want to get my old data back, i.e. I want {"name":"ABC"} back
String json = "{\"name\":\"ABC\"}";
byte en[] = android.util.Base64.decode(json,Base64.NO_WRAP);
String st = android.util.Base64.encodeToString(en,Base64.NO_WRAP);
With something as simple as the above, content is lost.
Please help
You can't. Base64 uses a fixed set of 64 characters that can be converted to binary and back, and characters like { and " are not in that set, which is why they disappear when you decode a string that was never Base64-encoded.
Try using URLDecoder with UTF-8 or any other encoding method that supports UTF-8.
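For comparison, a round trip that keeps the JSON intact encodes first and decodes second. This is a minimal sketch using java.util.Base64 (Java 8+) so that it runs on a plain JVM; the class name is hypothetical. On Android the corresponding calls would be android.util.Base64.encodeToString(bytes, Base64.NO_WRAP) and android.util.Base64.decode(text, Base64.NO_WRAP).

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64RoundTrip {

    // Text -> UTF-8 bytes -> Base64 text.
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // Base64 text -> bytes -> text decoded as UTF-8.
    static String decode(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String json = "{\"name\":\"ABC\"}";
        String encoded = encode(json);     // only Base64 alphabet characters
        String restored = decode(encoded); // the original JSON again
        System.out.println(encoded);
        System.out.println(restored);
    }
}
```

The snippet in the question calls decode on the raw JSON, which was never encoded, so the order of the two calls is the thing to check.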
I am using Retrofit 2.1. When I post a field that contains a Cyrillic word, I get an empty response, although it should return 2-3 items. Here is the API:
@FormUrlEncoded
@POST("my_awesome_base_url")
Call<Questions> getQuestions(@Field(value = "rowsdata", encoded = false) String rowsdata);
The rowsdata field contains a Cyrillic word that the database should search for, returning similar results. Here is an example rowsdata:
rowsdata = {"code":"-4","start":"1","where":"where short_question like 'Вақт' ","end":"2"}
In the rowsdata, Вақт is in Cyrillic, but it is somehow being encoded into other characters, so the server gives me an empty list.
I checked this in Postman, and it gave me the desired results, but when I send the request using Retrofit, the response looks as if nothing was found...
Probably an encoding issue.
From the developer site:
A String represents a string in the UTF-16 format in which
supplementary characters are represented by surrogate pairs (see the
section Unicode Character Representations in the Character class for
more information). Index values refer to char code units, so a
supplementary character uses two positions in a String.
Try encoding the string as UTF-8, and make sure your source file is UTF-8 as well (the default in Android Studio, I think).
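To check what a UTF-8 form-encoded value should look like on the wire, the word can be percent-encoded by hand and compared with the logged request body. This is a minimal sketch with a hypothetical class name; it says nothing about what Retrofit does internally.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class FormFieldEncoder {

    // Percent-encodes a form field value using UTF-8 bytes.
    static String encodeUtf8(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The Cyrillic word from the question, written with escapes so the
        // source file encoding cannot change it.
        String word = "\u0412\u0430\u049B\u0442";
        System.out.println(encodeUtf8(word)); // %D0%92%D0%B0%D2%9B%D1%82
    }
}
```

If the logged body shows different bytes for the same word, the string was encoded with another charset somewhere before it reached the network layer.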
I'm sending XML with HttpPost to a server. This used to work fine, and I'm doing it successfully in other parts of the project.
I'm using a StringBuilder to create the XML request, but since I am appending strings as data to the nodes, I am getting an error response from the parser on the server:
Invalid byte 2 of 2-byte UTF-8 sequence.
When I log the request and check it in the W3C XML validator, there are no errors.
This is an excerpt (the whole method would be too big and has sensitive data) from my StringBuilder method:
StringBuilder baseDocument = new StringBuilder();
baseDocument.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?><request><setDisposalRequest><customer><company><![CDATA[");
baseDocument.append(company);
baseDocument.append("]]></company>");
baseDocument.append("<firstName><![CDATA[");
baseDocument.append(name);
baseDocument.append("]]></firstName>");
...
As soon as I replace the String variables I append with hardcoded strings, everything works fine,
i.e.
baseDocument.append(name);
to
baseDocument.append("name");
All the strings have values; none of them are null or empty!
Before the request I set the StringEntity content type to XML:
se.setContentType("application/xml");
What am I missing?
Your XML header claims that it's UTF-8, yet you never mention whether you actually write UTF-8. Make sure the actual bytes you send are UTF-8 encoded. The error message suggests that you're using another encoding (probably an ISO-8859-* variant).
This is another reason that manually constructing XML like this is dangerous: there are just too many corner cases to observe and it's much easier to use a real XML handling library. Those tend to get the corner cases correct ;-)
And no: StringBuilder certainly does not break UTF-8. The problem is somewhere else.
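The mismatch can be reproduced without a server. The sketch below (hypothetical class name) encodes the same text once as ISO-8859-1 and once as UTF-8, then checks each byte array with a strict UTF-8 decoder, which is roughly what the server-side parser is doing when it reports the invalid sequence. With Apache HttpClient, passing the charset explicitly, as in new StringEntity(xml, "UTF-8"), is one way to make sure the bytes match the XML declaration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {

    // Returns true only if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A value with one non-ASCII letter (A with diaeresis), as user data might contain.
        String name = "\u00C4rzte";
        byte[] latin1 = name.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        System.out.println(isValidUtf8(latin1)); // false
        System.out.println(isValidUtf8(utf8));   // true
    }
}
```

This also explains why hardcoded ASCII strings work: ASCII characters have the same byte values in ISO-8859-1 and UTF-8, so the mismatch only shows up with real user data.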
As the title says, when I read the HTTP headers returned from the server in my Android program, some strings appear garbled. What I don't know is which charset the server uses to encode the HTTP response headers, and which charset Android uses to decode them.
How do I escape or otherwise deal with the garbled text?
HTTP headers follow the MIME header format; see RFC 822, which defines header fields as ASCII.
3.1.2. STRUCTURE OF HEADER FIELDS
Once a field has been unfolded, it may be viewed as being composed of
a field-name followed by a colon (":"), followed by a field-body, and
terminated by a carriage-return/line-feed. The field-name must be
composed of printable ASCII characters (i.e., characters that have
values between 33. and 126., decimal, except colon). The field-body
may be composed of any ASCII characters, except CR or LF. (While CR
and/or LF may be present in the actual text, they are removed by the
action of unfolding the field.)
Then RFC 2047
describes extensions to RFC 822 to allow non-US-ASCII text data in
Internet mail header fields
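If the server does use RFC 2047 encoded-words (an assumption; many HTTP servers do not), a header value looks like =?charset?B?...?= or =?charset?Q?...?= and can be decoded as sketched below. The class name is hypothetical, the sketch uses java.util.Base64 (Java 8+), and it skips parts of the RFC such as joining adjacent encoded-words.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EncodedWordDecoder {

    // =?charset?B-or-Q?encoded-text?=
    private static final Pattern ENCODED_WORD =
            Pattern.compile("=\\?([^?]+)\\?([bBqQ])\\?([^?]*)\\?=");

    // Replaces each encoded-word in the header value with its decoded text.
    static String decode(String headerValue) {
        Matcher m = ENCODED_WORD.matcher(headerValue);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Charset charset = Charset.forName(m.group(1));
            byte[] bytes = m.group(2).equalsIgnoreCase("B")
                    ? Base64.getDecoder().decode(m.group(3))
                    : decodeQ(m.group(3));
            m.appendReplacement(out, Matcher.quoteReplacement(new String(bytes, charset)));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Q encoding: "_" means space, "=XX" is a hex byte, anything else is literal ASCII.
    private static byte[] decodeQ(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '_') {
                buf.write(' ');
            } else if (c == '=' && i + 2 < text.length()) {
                buf.write(Integer.parseInt(text.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                buf.write(c);
            }
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(decode("=?UTF-8?B?SGVsbG8=?="));
        System.out.println(decode("=?ISO-8859-2?Q?=BF=F3=B3=E6?="));
    }
}
```

Values that contain no encoded-word pass through unchanged, so the method can be applied to every header value that might carry non-ASCII text.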