Converting String to Android JSONObject loses UTF-8

I am trying to get a (JSON formatted) String from a URL and consume it as a Json object. I lose UTF-8 encoding when I convert the String to JSONObject.
This is the function I use to connect to the URL and get the string:
private static String getUrlContents(String theUrl) {
StringBuilder content = new StringBuilder();
try {
URL url = new URL(theUrl);
URLConnection urlConnection = url.openConnection();
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(urlConnection.getInputStream()));
String line;
while ((line = bufferedReader.readLine()) != null) {
content.append(line + "\n");
}
bufferedReader.close();
} catch(Exception e) {
e.printStackTrace();
}
return content.toString();
}
When I get data from server, the following code displays correct characters:
String output = getUrlContents(url);
Log.i("message1", output);
But when I convert the output string to a JSONObject, the Persian characters become question marks like this: ??????. (messages is the name of the array in the JSON.)
JSONObject reader = new JSONObject(output);
String messages = new String(reader.getString("messages").getBytes("ISO-8859-1"), "UTF-8");
Log.i("message2", messages);

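You're telling Java to convert the string (with key messages) to bytes using ISO-8859-1 and then to create a new String from these bytes, interpreted as UTF-8. ISO-8859-1 cannot represent Persian letters, so each one is replaced by a '?' byte in the first step: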
new String(reader.getString("messages").getBytes("ISO-8859-1"), "UTF-8");
You could simply use:
String messages = reader.getString("messages");
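A small sketch of the round trip the question performs, runnable outside Android. The Persian sample word is an assumption chosen for illustration; any text outside Latin-1 behaves the same way, while plain ASCII passes through unchanged.

```java
import java.nio.charset.StandardCharsets;

// Demonstrates why re-encoding an already-decoded String through ISO-8859-1
// turns Persian text into question marks.
final class Latin1RoundTrip {

    // Mirrors the question's new String(s.getBytes("ISO-8859-1"), "UTF-8").
    static String brokenRoundTrip(String s) {
        // ISO-8859-1 has no Persian letters, so each one is encoded as the
        // charset's replacement byte '?' (0x3F).
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        // Decoding those '?' bytes as UTF-8 leaves them as '?'.
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String persian = "\u0633\u0644\u0627\u0645"; // sample word (assumption)
        System.out.println(brokenRoundTrip(persian)); // prints ????
        System.out.println(brokenRoundTrip("hello")); // prints hello
    }
}
```

The information is lost in the first step, so no later decoding can bring the original characters back.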

You can update your code as follows, passing the charset explicitly to the InputStreamReader:
private static String getUrlContents(String theUrl) {
StringBuilder content = new StringBuilder();
try {
URL url = new URL(theUrl);
URLConnection urlConnection = url.openConnection();
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(urlConnection.getInputStream(), "utf-8"));
String line;
while ((line = bufferedReader.readLine()) != null) {
content.append(line).append("\n");
}
bufferedReader.close();
} catch(Exception e) {
e.printStackTrace();
}
return content.toString().trim();
}
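A variant of the same idea as a sketch, assuming Java 7+ / Android API 19+ for try-with-resources and StandardCharsets. It reads characters in chunks instead of lines, so the original line breaks are kept exactly and there is no extra trailing newline to trim. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Reads a whole stream into a String, decoding with an explicit charset
// instead of the platform default.
final class StreamText {

    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader (and the wrapped stream) even on error
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n); // keeps the original line breaks intact
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Simulated UTF-8 response body (assumption); in the app this would be
        // urlConnection.getInputStream().
        byte[] body = "{\"messages\":\"\u0633\u0644\u0627\u0645\"}".getBytes(StandardCharsets.UTF_8);
        String json = readAll(new ByteArrayInputStream(body), StandardCharsets.UTF_8);
        System.out.println(json);
    }
}
```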

You've got two encoding issues:
The server sends text encoded in a character set. When you set up your InputStreamReader, you need to pass the encoding the server used so the bytes can be decoded properly. The character encoding is usually given in the charset parameter of the Content-Type HTTP response header. JSON is typically UTF-8 encoded (the older RFC 4627 also allowed UTF-16 and UTF-32, while RFC 8259 requires UTF-8 for JSON exchanged between systems), so you need to check. Without a specified encoding, the platform default charset is used when converting bytes to Strings, and vice versa. Basically, you should always specify the charset.
String messages = new String(reader.getString("messages").getBytes("ISO-8859-1"), "UTF-8"); is obviously going to cause issues (if you have non-ASCII characters): it encodes the string as ISO-8859-1 and then tries to decode it as UTF-8.
A simple regex pattern can be used to extract the charset value from the Content-Type header before reading the InputStream. I've also included a neat InputStream -> String converter.
private static String getUrlContents(String theUrl) {
try {
URL url = new URL(theUrl);
URLConnection urlConnection = url.openConnection();
InputStream is = urlConnection.getInputStream();
// Get charset field from Content-Type header
String contentType = urlConnection.getContentType();
// matches value in key / value pair
Pattern encodingPattern = Pattern.compile(".*charset\\s*=\\s*([\\w-]+).*");
// getContentType() may return null, so match against "" in that case
Matcher encodingMatcher = encodingPattern.matcher(contentType != null ? contentType : "");
// set charsetString to match value if charset is given, else default to UTF-8
String charsetString = encodingMatcher.matches() ? encodingMatcher.group(1) : "UTF-8";
// Quick way to read from InputStream.
// \A is a boundary match for beginning of the input
return new Scanner(is, charsetString).useDelimiter("\\A").next();
} catch(Exception e) {
e.printStackTrace();
}
return null;
}
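URLConnection.getContentType() can return null when the server sends no Content-Type header, so any charset parsing needs a null check. A sketch of a null-safe helper (hypothetical names) that also accepts a quoted charset value such as charset="utf-8" and falls back to a default when the name is missing or unknown to the runtime:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Picks the charset named in a Content-Type header value, falling back to a
// default when the header is missing, has no charset, or names an unknown one.
final class ContentTypeCharset {

    // Case-insensitive; tolerates optional double quotes around the value.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*\"?([\\w.:-]+)\"?", Pattern.CASE_INSENSITIVE);

    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback; // no Content-Type header at all
        }
        Matcher m = CHARSET.matcher(contentType);
        if (!m.find()) {
            return fallback; // header present but without a charset parameter
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // IllegalCharsetNameException and UnsupportedCharsetException both extend it
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("application/json; charset=ISO-8859-1", StandardCharsets.UTF_8));
        System.out.println(charsetOf("application/json", StandardCharsets.UTF_8));
        System.out.println(charsetOf(null, StandardCharsets.UTF_8));
    }
}
```

The returned Charset can be passed straight to new InputStreamReader(is, charset) or new Scanner(is, charset.name()).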

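Not sure if this will help, but if you have the raw response bytes, you might be able to do something like this:
// outputBytes is a hypothetical byte[] holding the raw response body;
// new String(..., "UTF-8") takes bytes, not an already-decoded String
String messages = null;
try
{
String str = new String(outputBytes, "UTF-8");
JSONObject result = (JSONObject) new JSONTokener(str).nextValue();
messages = result.getString("messages");
}
catch (Exception e) {
e.printStackTrace();
}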

Related

Loading JSON in browser works, in Android it's garbage

So I'm trying to load this JSON in Android from here, and have tried both Volley and regular HTTP requests. The page (eventually) loads fine as UTF-8 JSON and it looks fine. However, in android, I get garbage like this:
I checked document.characterSet; it's UTF-8.
Example as Volley (trimmed out some code, so brackets may not be exact):
final JsonObjectRequest jsonObjectRequest = new JsonObjectRequest(url, new Response.Listener<JSONObject>() {
@Override
public void onResponse(JSONObject response) {
}
}, new Response.ErrorListener() {
@Override
public void onErrorResponse(VolleyError ex) {
Log.e("LOG", ex.toString());
}
});
requestQueue.add(jsonObjectRequest);
Example as regular HTTP GET:
HttpURLConnection urlConnection = null;
URL url = new URL(urlString);
urlConnection = (HttpURLConnection) url.openConnection();
urlConnection.setRequestMethod("GET");
urlConnection.setReadTimeout(10000 /* milliseconds */);
urlConnection.setConnectTimeout(2500 /* milliseconds */);
urlConnection.setDoOutput(true);
urlConnection.connect();
String UTF8 = "UTF-8";
int BUFFER_SIZE = 8192;
BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream(), UTF8), BUFFER_SIZE);
StringBuilder sb = new StringBuilder();
String line;
while ((line = br.readLine()) != null) {
sb.append(line + "\n");
}
br.close();
String jsonString = sb.toString();
urlConnection.disconnect();
urlConnection = null;
br = null;
sb = null;
if (jsonString != null &&
jsonString.length() > 0) {
return new JSONObject(jsonString);
}
Both give garbage responses. What am I missing? I'm able to access other data on other sites.
The content is compressed using zlib. See header:
Content-Encoding: deflate
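You'll have to read the raw bytes and decompress them before attempting to parse them as JSON. Android provides native zlib support through java.util.zip: decompression is done by the Inflater class (usually via InflaterInputStream), while Deflater is for compression.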
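A sketch of the decompression step, assuming the server sends zlib-wrapped data, which is what the HTTP specification (RFC 7230) defines for Content-Encoding: deflate. Some servers send raw deflate without the zlib header instead; for those, new InflaterInputStream(in, new Inflater(true)) is the usual variant. The class name is hypothetical, and main only simulates a response by compressing a string locally.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

final class DeflateBody {

    // Decompresses a zlib-wrapped ("Content-Encoding: deflate") body into raw bytes.
    static byte[] inflate(InputStream compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(compressed)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Simulate a deflate-encoded server response.
        ByteArrayOutputStream packed = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(packed)) {
            dos.write("{\"ok\":true}".getBytes(StandardCharsets.UTF_8));
        }
        byte[] raw = inflate(new ByteArrayInputStream(packed.toByteArray()));
        // Decode only after decompressing, with the charset the content uses.
        System.out.println(new String(raw, StandardCharsets.UTF_8)); // prints {"ok":true}
    }
}
```

With HttpURLConnection the compressed bytes come from urlConnection.getInputStream(); checking urlConnection.getContentEncoding() first tells you whether to wrap the stream at all.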
Note also that readers use the platform default character encoding unless told otherwise. Unless your platform default happens to match that of the delivered content, you'll need to tell the reader which charset to decode with. The correct approach is to read raw bytes from the stream, then turn the bytes into a string using the proper character encoding:
byte[] buffer = new byte[1024];
int c;
ByteArrayOutputStream baos = new ByteArrayOutputStream();
while ((c = inputStream.read(buffer)) != -1) {
baos.write(buffer, 0, c);
}
String json = new String(baos.toByteArray(), "UTF-8"); // assuming the encoding is UTF-8
You can either know the encoding ahead of time, or parse it from the Content-Type header. I looked at the response from the provided URL and it does not specify a charset, so you'll have to hardcode the known value.
EDIT: apparently you can do this with a reader, although I haven't tried it:
Reader r = new InputStreamReader(inputStream, "UTF-8");

How to download a String file that contains special Slovenian characters

I am trying to download a JSON file that contains Slovenian characters. While downloading the JSON file as a string, I get broken special characters, as shown below in the JSON data:
"send_mail": "Po�lji elektronsko sporocilo.",
"str_comments_likes": "Komentarji, v�ecki in mejniki",
Code I am using:
URL url = new URL(f_url[0]);
URLConnection conection = url.openConnection();
conection.connect();
try {
InputStream input1 = new BufferedInputStream(url.openStream(), 300);
String myData = "";
BufferedReader r = new BufferedReader(new InputStreamReader(input1));
StringBuilder totalValue = new StringBuilder();
String line;
while ((line = r.readLine()) != null) {
totalValue.append(line).append('\n');
}
input1.close();
String value = totalValue.toString();
Log.v("To Check Problem from http paramers", value);
} catch (Exception e) {
Log.v("Exception Character Isssue", "" + e.getMessage());
}
I want to know how to get characters downloaded properly.
You need to re-decode the string's bytes as UTF-8. Please check the following code:
String slovenianJSON = new String(value.getBytes([Original Code]),"utf-8");
JSONObject newJSON = new JSONObject(slovenianJSON);
String javaStringValue = newJSON.getString("content");
I hope it will help you!
Encoding each line in the while loop can work. You should also put the connection setup inside the try/catch block in case of an IOException:
URL url = new URL(f_url[0]);
try {
URLConnection conection = url.openConnection();
conection.connect();
InputStream input1 = new BufferedInputStream(url.openStream(), 300);
String myData = "";
BufferedReader r = new BufferedReader(new InputStreamReader(input1));
StringBuilder totalValue = new StringBuilder();
String line;
while ((line = r.readLine()) != null) {
line = URLEncoder.encode(line, "UTF8");
totalValue.append(line).append('\n');
}
input1.close();
String value = totalValue.toString();
Log.v("To Check Problem from http paramers", value);
} catch (Exception e) {
Log.v("Exception Character Isssue", "" + e.getMessage());
}
It's not entirely clear why you're not using Android's JSONObject class (and related classes). You can try this, however:
String str = new String(value.getBytes("ISO-8859-1"), "UTF-8");
But you really should use the JSON libraries rather than parsing it yourself.
When creating the InputStreamReader at this line:
BufferedReader r = new BufferedReader(new InputStreamReader(input1));
pass the charset to the InputStreamReader constructor (not to BufferedReader) like this:
BufferedReader r = new BufferedReader(new InputStreamReader(input1, StandardCharsets.UTF_8));
The problem is the character set.
According to Wikipedia, the Slovene alphabet is supported by UTF-8, UTF-16, and ISO/IEC 8859-2 (Latin-2). Find out which character set the server uses, and use the same character set for decoding.
If it is UTF-8, decode like this:
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream, StandardCharsets.UTF_8));
If you have a different character set, use that one instead.
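To see what a charset mismatch does, here is a sketch that decodes the same bytes twice. It assumes the server uses windows-1252 (the charset the asker eventually found, further down); in windows-1252 'š' is the single byte 0x9A, which is not valid UTF-8 on its own, so decoding as UTF-8 yields the replacement character U+FFFD, the � seen in the question. The class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Decodes the same response bytes with the right and the wrong charset.
final class CharsetMismatch {

    static String decode(byte[] body, Charset charset) {
        return new String(body, charset); // malformed input becomes U+FFFD
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252"); // assumed server charset
        // "Pošlji" as a windows-1252 server would send it: 'š' is the byte 0x9A.
        byte[] body = "Po\u0161lji".getBytes(cp1252);
        System.out.println(decode(body, StandardCharsets.UTF_8)); // 'š' shows up as U+FFFD
        System.out.println(decode(body, cp1252));                 // 'š' is preserved
    }
}
```

Once the bytes have been decoded with the wrong charset, the U+FFFD characters cannot be turned back into 'š', which is why the charset has to be fixed where the InputStreamReader is created.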
I faced the same issue because of Swedish characters.
I used a BufferedReader to resolve it: I decoded the response using StandardCharsets.ISO_8859_1 and used that result. Please find my answer below:
BufferedReader r = new BufferedReader(new InputStreamReader(response.body().byteStream(), StandardCharsets.ISO_8859_1));
StringBuilder total = new StringBuilder();
String line;
while ((line = r.readLine()) != null)
{
total.append(line).append('\n');
}
Then I used total.toString() and assigned this response to my class.
I used Retrofit to call the web service.
I finally found a way that worked for me:
InputStream input1 = new BufferedInputStream(conection.getInputStream(), 300);
BufferedReader r = new BufferedReader(new InputStreamReader(input1, "Windows-1252"));
I figured out windows-1252 by putting the JSON file in the assets folder of the Android project. There it showed the same special characters as above, and Android Studio offered to change the file's encoding to UTF-8, ISO-8859-1, ASCII, or Windows-1252. I switched to windows-1252, which displayed correctly in Android Studio, so I used the same charset in our code, and it worked.

Incorrect special character on textview in android

I am trying to display a simple ñ (a special Spanish character) in a TextView, but instead of ñ it displays a junk character: �. I have tried many Stack Overflow solutions, but none worked for me. The ñ comes from a SOAP web service.
Below is the code:
InputStream in = urlConnection.getInputStream();
SoapObject soapObject=Utility.InToSoapObject(in);
public static SoapObject InToSoapObject(InputStream inputStream) {
SoapObject soap = null;
SoapSerializationEnvelope envelope = new SoapSerializationEnvelope(
SoapEnvelope.VER12);
try {
XmlPullParser p = Xml.newPullParser();
p.setInput(inputStream, "utf-8");
envelope.parse(p);
soap = (SoapObject) envelope.bodyIn;
} catch (Exception e) {
e.printStackTrace();
}
return soap;
}
A few things I have tried so far that didn't work for me:
Replacing ñ with \u0148 (note that \u0148 is ň; ñ is \u00F1)
Html.fromHtml(str)
URLEncoder.encode(str)
Replacing UTF-8 with iso-8859-1
I am extracting the String from the SOAP response correctly; I have cross-checked that. Maybe there is some issue with UTF-8. Any help or suggestions will be appreciated. Thanks in advance.
I had the same issue when extracting the String output from a web service, but when I replaced UTF-8 with ISO-8859-1, it was resolved. Here is what I used:
I converted the InputStream to a BufferedReader using the ISO-8859-1 charset and then read the resulting BufferedReader into a String.
private String getResponseString(InputStream stream) throws IOException {
BufferedReader reader = new BufferedReader(new InputStreamReader(stream,"ISO-8859-1"));
StringBuilder sb = new StringBuilder();
String line = null;
try {
while ((line = reader.readLine()) != null) {
sb.append(line + "\n");
}
}
finally {
stream.close();
}
return sb.toString();
}
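If you can't be sure which charset the service really uses, one heuristic (an assumption, not something the answer above does) is to decode strictly as UTF-8 and fall back to ISO-8859-1 only when the bytes are not valid UTF-8. This works for ñ because ISO-8859-1 encodes it as the single byte 0xF1, which is malformed as UTF-8, while ISO-8859-1 decoding itself never fails. It is a guess, not detection: other single-byte charsets would need their own fallback. The class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Strict UTF-8 decoding with an ISO-8859-1 fallback for mislabeled bodies.
final class LenientDecoder {

    static String decodeUtf8OrLatin1(byte[] body) {
        try {
            // REPORT makes the decoder throw instead of inserting U+FFFD.
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(body))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: every byte sequence is valid ISO-8859-1.
            return new String(body, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00F1".getBytes(StandardCharsets.UTF_8);        // 0xC3 0xB1
        byte[] latin1 = "\u00F1".getBytes(StandardCharsets.ISO_8859_1); // 0xF1
        System.out.println(decodeUtf8OrLatin1(utf8).equals("\u00F1"));   // prints true
        System.out.println(decodeUtf8OrLatin1(latin1).equals("\u00F1")); // prints true
    }
}
```

For the SOAP case, the decoded String could then be handed to the parser through a StringReader instead of passing the raw stream with a fixed "utf-8".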
Try:
String encoded = URLEncoder.encode(string, "UTF-8");

How to parse a full HTML page in Android

I am calling an HTML page via a web service. I need to get the whole source code of the HTML page.
My problem is that when I convert the HTTP response to a string, I only get part of the HTML page. How can I get the whole HTML page? Please help me.
//paramString1 = url, paramString2 = referer header, paramList = parameters
public String a(String paramString1, String paramString2, List paramList)
{
String str1 = null;
HttpPost localHttpPost = new HttpPost(paramString1);
localHttpPost.addHeader("Accept-Encoding", "gzip");
InputStream localInputStream = null;
try
{
localHttpPost.setEntity(new UrlEncodedFormEntity(paramList));
localHttpPost.setHeader("Referer", paramString2);
HttpResponse localHttpResponse = this.c.execute(localHttpPost);
int i = localHttpResponse.getStatusLine().getStatusCode();
localInputStream = localHttpResponse.getEntity().getContent();
Header localHeader = localHttpResponse.getFirstHeader("Content-Encoding");
if ((localHeader != null) && (localHeader.getValue().equalsIgnoreCase("gzip")))
{
GZIPInputStream localObject = null;
localObject = new GZIPInputStream(localInputStream);
Log.d("API", "GZIP Response decoded!");
BufferedReader localBufferedReader = new BufferedReader(new InputStreamReader((InputStream)localObject, "UTF-8"));
StringBuilder localStringBuilder = new StringBuilder();
while(true){
String str2 = localBufferedReader.readLine();
if (str2 == null)
break;
localHttpResponse.getEntity().consumeContent();
str1 = localStringBuilder.toString();
localStringBuilder.append(str2);
continue;
}
}
}
catch (IOException localIOException)
{
localHttpPost.abort();
}
catch (Exception localException)
{
localHttpPost.abort();
}
Object localObject = localInputStream;
return (String)str1;
Are you receiving the HTML in the variable paramString1? If so, are you encoding the String somehow, or is it just plain HTML?
Maybe the HTML special characters are breaking your response. Try encoding the String with URL-safe Base64 on the server side and decoding it on the client side.
You can use the Base64 class from Apache Commons Codec.
Server Side:
Base64 encoder = new Base64(true);
encoder.encode(yourBytes);
Client side:
Base64 decoder = new Base64(true);
byte[] decodedBytes = decoder.decode(paramString1);
HttpPost localHttpPost = new HttpPost(new String(decodedBytes));
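If you'd rather avoid the Apache Commons dependency, java.util.Base64 (Java 8+, Android API 26+) offers a URL-safe encoder and decoder; this swaps in a different library from the one the answer names. A sketch of the same server-side/client-side round trip, with hypothetical method names and UTF-8 assumed for the text:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helpers mirroring the server-side encode / client-side decode steps.
final class UrlSafeBase64 {

    // Server side: URL-safe alphabet ('-' and '_' instead of '+' and '/'), no '=' padding.
    static String encode(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // Client side: the URL decoder accepts input with or without '=' padding.
    static String decode(String encoded) {
        byte[] bytes = Base64.getUrlDecoder().decode(encoded);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String html = "<p title=\"a&b\">caf\u00E9?</p>";
        String wire = encode(html);
        System.out.println(wire);
        System.out.println(decode(wire).equals(html)); // prints true
    }
}
```

Encoding the text as UTF-8 bytes on both ends keeps non-ASCII characters intact across the round trip.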
You may not get the complete source code in your StringBuilder if it exceeds the maximum size of a StringBuilder, since a StringBuilder is backed by an array. If you want to store that particular source code, you may try this: store the InputStream data (which contains the HTML source code) directly in a File. Then you will have the complete source code in that file and can perform whatever file operations you require. See if this helps.
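Another likely cause is visible in the question's loop: str1 is assigned from the StringBuilder before the current line is appended, so the last line read never reaches str1, and localHttpResponse.getEntity().consumeContent() is called inside the loop, which can release the underlying stream after the first line. A sketch of a loop that appends every line first and builds the String only afterwards (release the entity after reading, not inside the loop). The class name is hypothetical and the gzip body is simulated locally.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipText {

    // Reads an entire gzip-compressed stream as UTF-8 text.
    static String readGzipText(InputStream compressed) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(compressed), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n'); // append every line, including the last
            }
        }
        // Build the result only after the loop has consumed the whole stream.
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream packed = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(packed)) {
            gz.write("<html>\n<body>hi</body>\n</html>\n".getBytes(StandardCharsets.UTF_8));
        }
        System.out.print(readGzipText(new ByteArrayInputStream(packed.toByteArray())));
    }
}
```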

HttpUrlConnection.getInputStream returns empty stream in Android

I make a GET request to a server using HttpUrlConnection.
After connecting:
I get response code: 200
I get response message: OK
I get input stream, no exception thrown but:
in a standalone program I get the body of the response, as expected:
{"name":"my name","birthday":"01/01/1970","id":"100002215110084"}
in an Android activity, the stream is empty (available() == 0), and thus I can't get any text out.
Any hint or trail to follow? Thanks.
EDIT: here is the code.
Please note: I use import java.net.HttpURLConnection; this is the standard HTTP Java library. I don't want to use any other external library. In fact, I did have problems in Android using Apache's httpclient library (some of its anonymous classes can't be used by the APK compiler).
Well, the code:
URLConnection theConnection;
theConnection = new URL("http://www.example.com/?query=value").openConnection();
theConnection.setRequestProperty("Accept-Charset", "UTF-8");
HttpURLConnection httpConn = (HttpURLConnection) theConnection;
int responseCode = httpConn.getResponseCode();
String responseMessage = httpConn.getResponseMessage();
InputStream is = null;
if (responseCode >= 400) {
is = httpConn.getErrorStream();
} else {
is = httpConn.getInputStream();
}
String resp = responseCode + "\n" + responseMessage + "\n>" + Util.streamToString(is) + "<\n";
return resp;
In a standalone program I see:
200
OK
the body of the response
but in Android only:
200
OK
Trying Tomislav's code, I found the answer.
My function streamToString() used .available() to check whether any data had been received, and it returns 0 in Android. Surely, I called it too soon.
If I use readLine() instead:
class Util {
public static String streamToString(InputStream is) throws IOException {
StringBuilder sb = new StringBuilder();
BufferedReader rd = new BufferedReader(new InputStreamReader(is));
String line;
while ((line = rd.readLine()) != null) {
sb.append(line);
}
return sb.toString();
}
}
then, it waits for the data to arrive.
Thanks.
You can try this code, which returns the response as a String:
public String ReadHttpResponse(String url){
StringBuilder sb= new StringBuilder();
HttpClient client= new DefaultHttpClient();
HttpGet httpget = new HttpGet(url);
try {
HttpResponse response = client.execute(httpget);
StatusLine sl = response.getStatusLine();
int sc = sl.getStatusCode();
if (sc==200)
{
HttpEntity ent = response.getEntity();
InputStream inpst = ent.getContent();
BufferedReader rd= new BufferedReader(new InputStreamReader(inpst));
String line;
while ((line=rd.readLine())!=null)
{
sb.append(line);
}
}
else
{
Log.e("log_tag","I didn't get the response!");
}
} catch (ClientProtocolException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
return sb.toString();
}
The stream data may not be ready yet, so instead of relying on available() (which can return 0 before anything has arrived), keep reading in a loop until read() returns -1; read() blocks until data arrives or the stream ends.
Once the data is read, store it somewhere else, such as a byte array; a ByteArrayOutputStream is a nice choice for collecting data as a byte array. A byte array is the better choice because the data may be binary, like an image file.
InputStream is = httpConnection.getInputStream();
byte[] bytes = null;
ByteArrayOutputStream baos = new ByteArrayOutputStream();
// fixed-size buffer: a buffer sized by available() can be empty,
// and read() into an empty buffer returns 0 forever instead of -1
byte[] temp = new byte[8192];
int n;
while ((n = is.read(temp, 0, temp.length)) != -1) {
baos.write(temp, 0, n); // write only the bytes actually read
}
bytes = baos.toByteArray();
In the code above, bytes holds the response as a byte array. If it is text data, you can convert it to a String, for example for UTF-8 encoded text:
String text = new String(bytes, Charset.forName("utf-8"));
