XMLPullParser black diamond question marks with certain characters - android

I'm making an Android app that needs to fetch and parse XML. The class for that was written following the instructions at http://www.tutorialspoint.com/android/android_rss_reader.htm and the fetcher method looks like this:
public void fetchXML() {
Thread thread = new Thread(new Runnable() {
@Override
public void run() {
try {
URL url = new URL(urlString);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setReadTimeout(10000 /* milliseconds */);
conn.setConnectTimeout(15000 /* milliseconds */);
conn.setRequestMethod("GET");
conn.setDoInput(true);
// Starts the query
conn.connect();
InputStream stream = conn.getInputStream();
xmlFactoryObject = XmlPullParserFactory.newInstance();
xmlFactoryObject.setValidating(false);
xmlFactoryObject.setFeature(Xml.FEATURE_RELAXED, true);
xmlFactoryObject.setNamespaceAware(true);
XmlPullParser myparser = xmlFactoryObject.newPullParser();
//myparser.setFeature(XmlPullParser.FEATURE_PROCESS_NAMESPACES, false);
myparser.setInput(new InputStreamReader(stream, "UTF-8"));
parseXMLAndStoreIt(myparser);
stream.close();
} catch (Exception e) {
e.printStackTrace();
}
}
});
thread.start();
}
The parser looks like the one in the tutorial, with my parsing logic in it.
As you can see from
myparser.setInput(new InputStreamReader(stream, "UTF-8"));
I'm using the UTF-8 charset. Now when I call getText() in my parser, for example on the word 'Jõhvi', the logcat output is 'J�hvi'. It's the same for other characters of my native language, Estonian, that aren't in the English alphabet. I need to use this string as a key and in the user interface, so this isn't acceptable. I think it's a charset problem, but there is no info on the XML site I'm pulling this from, and calling
conn.getContentEncoding()
returns null so I'm in the dark here.

Content encoding and character encoding are not the same thing.
Content encoding refers to compression such as gzip. Since getContentEncoding() is null, that tells you there's no compression.
You should be looking at conn.getContentType(), because the character encoding can usually be found in the content-type response header.
conn.getContentType() might return something like:
text/xml; charset=ISO-8859-1
so you will have to do some parsing. Look for the character set name after "charset=", but be prepared for the case where the MIME type is specified but the charset is not.
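The parsing described above can be sketched roughly as follows. This is a minimal sketch, not a full header parser: the class name ContentTypeCharset, the method charsetOf, and the idea of passing a fallback charset name are all assumptions made for illustration.

```java
import java.util.Locale;

final class ContentTypeCharset {
    // Returns the charset named in a Content-Type header value, or the
    // fallback when the header is missing or carries no charset parameter.
    // (Hypothetical helper; the name and signature are not from any library.)
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        // Parameters follow the MIME type, separated by semicolons.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // The value may be quoted, e.g. charset="utf-8".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? fallback : value;
            }
        }
        return fallback;
    }
}
```

It could then be used as new InputStreamReader(stream, ContentTypeCharset.charsetOf(conn.getContentType(), "UTF-8")). If the header names no charset, another option is to call myparser.setInput(stream, null), which asks the parser to detect the encoding from the XML document itself.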

Related

Content length is not found in this URL

String thisurl ="http://songolum.com/file/ppdVkTxqhwcJu-CAzCgtNeICMi8mHZnKQBgnksb5o2Q/Ed%2BSheeran%2B-%2BPerfect.mp3?r=idz&dl=311&ref=ed-sheran-perfect";
url = null;
try {
url = new URL(thisurl);
HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
try {
// urlConnection.setRequestProperty("Transfer-Encoding", "chunked");
urlConnection.setRequestProperty("Accept-Encoding", "identity");
urlConnection.setDoOutput(true);
urlConnection.setChunkedStreamingMode(0);
int l=0;
InputStream in = new BufferedInputStream(urlConnection.getInputStream());
while(in.read()!=-1)
{
l=l+in.read();
}
System.out.println("Content-length" +l);
I checked with other software and found that it's a gzip-compressed file of about 10 MB, but I'm getting almost 1 MB.
To answer your question directly, you were going wrong because you were calling read() twice, and also because you were adding together the values of each byte read, instead of counting them. InputStream.read() reads one byte and returns its value, or -1 on EOF. You need to read a number of bytes into a buffer and count how many bytes each read() call returned:
InputStream in = urlConnection.getInputStream();
byte[] buffer = new byte[4096];
int countBytesRead;
while((countBytesRead = in.read(buffer)) != -1) {
l += countBytesRead;
}
System.out.println("Content-length: " + l);
However, I suspect that this is not really what you need to do. The above code simply counts the bytes of the response body as delivered by the input stream (the HTTP headers are not part of that stream). Perhaps what you are looking for is the length of the document (or the file to be downloaded). You can use the Content-Length HTTP header for that purpose (see other SO questions for how to get HTTP headers).
Also, note that the content may or may not be gzip-compressed. It depends on what the HTTP request says it accepts.
Please try this; I hope it helps.
Using a HEAD request, I got my web server to reply with the correct Content-Length field, which was otherwise wrong. I don't know if this works in general, but in my case it does:
private int tryGetFileSize(URL url) {
HttpURLConnection conn = null;
try {
conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("HEAD");
conn.getInputStream();
return conn.getContentLength();
} catch (IOException e) {
return -1;
} finally {
// conn is still null if openConnection() threw
if (conn != null) {
conn.disconnect();
}
}
}

httpurlconnection and special characters

I can send an HTTP POST with my app.
The problem is that special characters like ä, ö, ... do not arrive correctly.
This is my code:
@Override
protected String doInBackground(String... params) {
try {
URL url = new URL("https://xxx");
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
String urlParameters = "&name" + name;
connection.setRequestMethod("POST");
connection.setRequestProperty("Accept-Charset", "UTF-8");
connection.setDoOutput(true);
DataOutputStream dStream = new DataOutputStream(connection.getOutputStream());
dStream.writeBytes(urlParameters);
dStream.flush();
dStream.close();
} catch (MalformedURLException e) {
Log.e("-->", Log.getStackTraceString(e));
} catch (IOException e) {
Log.e("-->", Log.getStackTraceString(e));
}
return null;
}
This HTTP POST is sent to a PHP file, which saves the value of name into a database.
Example:
The app sends the value "Getränke"
Result in the database: "Getr"
Where is my mistake?
Try this; it may help.
You need to set the encoding in your Content-Type header instead of Accept-Charset.
Set it to application/x-www-form-urlencoded; charset=utf-8.
The internal string representation in Java is always UTF-16.
Every encoding has to be done when strings enter or leave the VM. That means in your case you have to set the encoding when you write the body to the stream.
Try with following:
dStream.write(urlParameters.getBytes("UTF-8"));
Also you may need to set the encoding in your Content-Type header.
Set it to "application/x-www-form-urlencoded; charset=utf-8".
Currently you are only setting the Accept-Charset - this tells the server what to send back.
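Putting the two points together (encode when the string leaves the VM, and declare the encoding in Content-Type), a minimal sketch of building the request body might look like this. The class FormBody and its methods are hypothetical helpers, and percent-encoding the value with URLEncoder is an addition that a form-urlencoded body needs anyway.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class FormBody {
    // Builds one key=value pair, percent-encoding both sides as UTF-8.
    static String encodePair(String key, String value) {
        try {
            return URLEncoder.encode(key, "UTF-8") + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }

    // Converts the encoded body to the bytes that go on the wire.
    static byte[] toBytes(String encodedBody) {
        return encodedBody.getBytes(StandardCharsets.UTF_8);
    }
}
```

With the connection from the question, this would be used as connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded; charset=utf-8") followed by connection.getOutputStream().write(FormBody.toBytes(FormBody.encodePair("name", name))). For the value "Getränke" the body becomes name=Getr%C3%A4nke.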

how to send unicode characters in http post request from asyncTask

I saw the post "How to send unicode characters in an HttpPost on Android", but I usually make requests this way in an AsyncTask class. My log also prints the local-language text in urlParameters, but the server returns no result, while it works perfectly for English strings:
@Override
protected String doInBackground(String... URLs) {
StringBuffer response = new StringBuffer();
try {
URL obj = new URL(URLs[0]);
HttpURLConnection con = (HttpURLConnection) obj.openConnection();
// add request header
con.setRequestMethod("POST");
if (URLs[0].equals(URLHelper.get_preleases)) {
urlCall = 1;
} else
urlCall = 2;
// String urlParameters = "longitude=" + longitude + "&latitude="+latitude;
// Send post request
con.setDoOutput(true);
DataOutputStream wr = new DataOutputStream(con.getOutputStream());
wr.writeBytes(urlParameters);
wr.flush();
wr.close();
int responseCode = con.getResponseCode();
if (responseCode == 200) {
BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null) {
response.append(inputLine);
}
in.close();
}
} catch (MalformedURLException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
return response.toString();
}
Is there a way to set character set UTF-8 to request parameters coding this way?
String urlParameters = "longitude=" + longitude + "&latitude="+latitude;
You need to URL-encode components you are injecting into an application/x-www-form-urlencoded context. (Even aside from non-ASCII characters, characters like the ampersand will break otherwise.)
Specify the string-to-bytes encoding that you are using for your request in that call, for example:
String urlParameters = "longitude=" + URLEncoder.encode(longitude, "UTF-8")...
...
DataOutputStream wr = new DataOutputStream(con.getOutputStream());
A DataOutputStream is for sending struct-like Java-typed binary data down a stream. It doesn't give you anything you need for writing HTTP request bodies. Maybe you meant OutputStreamWriter?
But since you already have the string all in memory you could simply do:
con.getOutputStream().write(urlParameters.getBytes("UTF-8"))
(Note the UTF-8 here is somewhat superfluous. Because you will already have URL-encoded all the non-ASCII characters into %xx escapes, there will be nothing left to UTF-8-encode. However, in general it is almost always better to specify a particular encoding than to omit it and fall back to the unreliable system default encoding.)
new InputStreamReader(con.getInputStream())
is also omitting the encoding and reverting to the default encoding which is probably not the encoding of the response. So you will probably find non-ASCII characters get read incorrectly in the response too.
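A minimal sketch of reading the response with an explicit encoding follows. The class ResponseReader and the method readAll are hypothetical names, and which charset to pass is an assumption: it should be whatever the server actually sends (often declared in the Content-Type response header).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

final class ResponseReader {
    // Reads the whole stream into a String, decoding bytes with the given
    // charset instead of the platform default. Closes the stream when done.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[4096];
            int n;
            // read() may return fewer chars than the buffer holds, so loop until EOF.
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }
}
```

In the question's code this would replace the readLine() loop, for example ResponseReader.readAll(con.getInputStream(), StandardCharsets.UTF_8) if the server responds in UTF-8.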

DownloadWebpageTask only handles webpage content smaller than 4048 characters? (Android)

I'm following the tutorial to download content from webpage. http://developer.android.com/training/basics/network-ops/connecting.html#download (code is copied below so you don't have to go to this link)
The example uses len = 500, and I changed it to a large value such as 50000, but while experimenting I realized this method only downloads the first 4048 characters of a webpage no matter how large I set len. So I'm wondering if I should use another method to download web content.
Actually I'm not downloading a normal webpage. I've put a PHP script on my server that searches my database and then encodes a JSON array as the content of the page. It's not very large, about 20,000 characters.
Main code from the above link:
// Given a URL, establishes an HttpUrlConnection and retrieves
// the web page content as a InputStream, which it returns as
// a string.
private String downloadUrl(String myurl) throws IOException {
InputStream is = null;
// Only display the first 500 characters of the retrieved
// web page content.
int len = 500; // I've change this to 50000
try {
URL url = new URL(myurl);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setReadTimeout(10000 /* milliseconds */);
conn.setConnectTimeout(15000 /* milliseconds */);
conn.setRequestMethod("GET");
conn.setDoInput(true);
// Starts the query
conn.connect();
int response = conn.getResponseCode();
Log.d(DEBUG_TAG, "The response is: " + response);
is = conn.getInputStream();
// Convert the InputStream into a string
String contentAsString = readIt(is, len);
return contentAsString;
// Makes sure that the InputStream is closed after the app is
// finished using it.
} finally {
if (is != null) {
is.close();
}
}
}
// Reads an InputStream and converts it to a String.
public String readIt(InputStream stream, int len) throws IOException, UnsupportedEncodingException {
Reader reader = null;
reader = new InputStreamReader(stream, "UTF-8");
char[] buffer = new char[len];
reader.read(buffer);
return new String(buffer);
}
Are you sure it's not just LogCat truncating the message?
(Android - Set max length of logcat messages)
Try:
Printing out line by line in your readIt method
Doing this (Displaying More string on Logcat)
Saving to SD card and looking at the file
Actually doing what you want to do with it (put it in a TextView or whatever)
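To check whether LogCat truncation is the cause, one option is to split the string into pieces before logging. The sketch below is only an illustration: the class LogChunks and the method split are hypothetical, and any chunk size passed to it (for example 4000) is an assumption about the log limit rather than a documented value.

```java
import java.util.ArrayList;
import java.util.List;

final class LogChunks {
    // Splits a long message into consecutive pieces of at most chunkSize chars.
    static List<String> split(String message, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> chunks = new ArrayList<>();
        for (int start = 0; start < message.length(); start += chunkSize) {
            int end = Math.min(message.length(), start + chunkSize);
            chunks.add(message.substring(start, end));
        }
        return chunks;
    }
}
```

Each piece can then be logged separately, for example for (String chunk : LogChunks.split(contentAsString, 4000)) Log.d(DEBUG_TAG, chunk);. If the full content shows up this way, the download itself was complete.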

OutputStreamWriter's flush method throws IOException when trying to write Chinese characters

Below is the code I am using to send SOAP requests in my Android app. It works fine with all requests except one: it throws IOException: Content-length exceeded on wr.flush(); when there are Chinese characters in the requestBody variable.
The content length in that case is 409.
URL url = new URL(Constants.HOST_NAME);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
// Modify connection settings
connection.setRequestMethod("POST");
connection.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
connection.setRequestProperty("SOAPAction", soapAction);
String requestBody = new String(soapRequest.getBytes(),"UTF-8");
int lngth = requestBody.length();
connection.setRequestProperty("Content-Length", (""+lngth));
// Enable reading and writing through this connection
connection.setDoInput(true);
connection.setDoOutput(true);
// Connect to server
connection.connect();
OutputStreamWriter wr = new OutputStreamWriter(connection.getOutputStream(), "UTF-8");
wr.write(requestBody);
wr.flush();
wr.close();
Any clue what is going wrong when there are Chinese characters in the string?
EDIT: I have removed the 'Content-Length' header field and it works, but why?
This code sets the request's Content-Length property to the number of characters in the string representation of the message:
String requestBody = new String(soapRequest.getBytes(),"UTF-8");
int lngth = requestBody.length();
connection.setRequestProperty("Content-Length", (""+lngth));
But then you convert that string representation back to bytes before writing:
OutputStreamWriter wr = new OutputStreamWriter(connection.getOutputStream(), "UTF-8");
So you end up writing more bytes than you've claimed. You'll run into the same problem with any non-ASCII characters. Instead, you should do something like this (copy-and-paste, so may have syntax errors):
byte[] message = soapRequest.getBytes("UTF-8"); // same charset as declared in Content-Type
int lngth = message.length;
connection.setRequestProperty("Content-Length", (""+lngth));
// ...
connection.getOutputStream().write(message);
To simplify the other answer: Content-Length MUST be the length in bytes, and you are specifying the length in chars (Java's 16-bit char type). These are different, in general. Since UTF-8 is a variable-length encoding, there is a difference for anything beyond the basic 7-bit ASCII range. The other answer shows the proper way to write the code.
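The difference between the two lengths is easy to see in a small sketch. The class ContentLengthDemo and the method utf8ByteLength are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;

final class ContentLengthDemo {
    // Number of bytes the string occupies once encoded as UTF-8.
    // This is the value Content-Length has to carry, not String.length().
    static int utf8ByteLength(String body) {
        return body.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

For a plain ASCII string such as "abc" both lengths are 3. For a string of two Chinese characters, length() is 2 while utf8ByteLength is 6, because each of those characters takes three bytes in UTF-8. That mismatch is what triggers the "Content-length exceeded" error in the question.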
My guess is that you have not converted the Chinese text to UTF-8. If you support users entering double-width and extended character sets into your fields, you'll need to make sure to convert your inputs from those character sets (ASCII, UNICODE or UCS) to UTF-8.
Once you determine the character encodings you are working with, you can use something like the following when creating your streams for reading/writing to convert between the two:
Reader input = new InputStreamReader(new FileInputStream(inputFile), "inputencoding");
Writer output = new OutputStreamWriter(new FileOutputStream(outputFile), "outputencoding");
Another alternative is to look into setting the request property controlling the language of the HTTP request. I do not know much about that.
