Hello, I am trying to read a UTF-8 encoded text file with Hebrew characters in my Android application. I have managed to read it, but for some reason the 'a' character is always added at the beginning of the String I read, and I wonder why.
Here is my code:
void Read(){
try {
File fileDir = new File("/sdcard/test.txt");
BufferedReader in = new BufferedReader( new InputStreamReader(
new FileInputStream(fileDir), "UTF8"));
String str;
while ((str = in.readLine()) != null) {
Log.i("TEST",str);
}
in.close();
}
catch (UnsupportedEncodingException e)
{
System.out.println(e.getMessage());
}
catch (IOException e)
{
System.out.println(e.getMessage());
}
catch (Exception e)
{
System.out.println(e.getMessage());
}
}
This is the result I get:
05-15 01:53:25.269: INFO/TEST(16236): אבגדהוזחטיכלמנסעפצקרשתa
In order to get a better answer, I need two questions answered:
What is the exact code point of the character in question (your "a")?
What is the exact byte sequence in your file, around the questionable area?
I'm going to take a guess here: you say the character is the first thing in the file ("appended at the beginning of the String"), and I suspect that what you got back is in the Arabic Presentation Forms-B block. The last code point of Arabic Presentation Forms-B, which oddly has nothing to do with Arabic, is U+FEFF, the byte order mark (BOM). It usually appears at the beginning of UTF-16 or UTF-32 encoded files, and identifies the "endianness" of the encoding (whether the file is UTF-16LE or UTF-16BE encoded, likewise for UTF-32). It typically does not appear, however, in UTF-8 data, as UTF-8 has no notion of "byte order". That said, some brain-dead Windows programs will stick it there, and then have an additional option of "UTF-8 without BOM". (The BOM is used then to identify a file as likely being encoded in UTF-8.) My guess is you have a BOM in your data, and your program is reading it and passing it on to you.
If this is your problem, and your file is genuinely encoded in UTF-8, you should be able to find the following byte sequence at the very beginning of the file: EF BB BF. This is the UTF-8 representation of U+FEFF.
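If the BOM guess is right, one way to deal with it is to drop a leading U+FEFF after decoding, since Java's UTF-8 decoder passes it through as an ordinary character. The following is only a minimal sketch: the class name BomStripper and the method stripBom are made up for illustration, and the sample bytes stand in for the start of your file (BOM followed by the Hebrew letter alef). StandardCharsets needs API level 19 or later on Android; on older versions the charset name "UTF-8" can be passed instead.

```java
import java.nio.charset.StandardCharsets;

class BomStripper {
    /** U+FEFF, which the UTF-8 decoder produces from the bytes EF BB BF. */
    static final char BOM = '\uFEFF';

    /** Returns s without a leading BOM; any other input is returned unchanged. */
    static String stripBom(String s) {
        if (s != null && !s.isEmpty() && s.charAt(0) == BOM) {
            return s.substring(1);
        }
        return s;
    }

    public static void main(String[] args) {
        // Hypothetical file content: BOM (EF BB BF) followed by alef (D7 90).
        byte[] fileBytes = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, (byte) 0xD7, (byte) 0x90};
        String decoded = new String(fileBytes, StandardCharsets.UTF_8);

        System.out.println(decoded.length());           // 2: the BOM survives decoding
        System.out.println(stripBom(decoded).length()); // 1: only the alef is left
    }
}
```

In the reading loop from the question, this would only need to be applied to the first line returned by readLine().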
I need to write a PDF file, and I am using this sample (http://www.vogella.com/tutorials/JavaPDF/article.html) with "itextpdf-5.4.1.jar".
This creates the PDF file, but a word such as "você" is written incorrectly: the accented character comes out garbled.
I found this code, but it does not work:
Document document;
...
...
document.addLanguage("pt-BR");
How do I set the encoding or language to Brazilian Portuguese?
Thanks!
Take a look at my answer to Divide page in 2 parts so we can fill each with different source (this is yet another question answered in The Best iText Questions on StackOverflow). In this example, we read a series of text files that are stored in UTF-8. To achieve this, we use this method:
public Phrase createPhrase(String path) throws IOException {
Phrase p = new Phrase();
BufferedReader in = new BufferedReader(
new InputStreamReader(new FileInputStream(path), "UTF8"));
String str;
while ((str = in.readLine()) != null) {
p.add(str);
}
in.close();
return p;
}
If you remove the "UTF8" and if you read that text as if it were ASCII, then you'd get the same behavior you are describing in your question: each byte would be treated as a single character whereas you have characters that require two bytes.
This is not really an iText question. This is a pure encoding question.
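To make the point concrete, here is a small sketch that decodes the same UTF-8 bytes twice, once with the right charset and once with a single-byte one. ISO-8859-1 is used here as an assumed stand-in for "a charset that treats each byte as one character"; the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {
    /** Decodes raw bytes with the given charset. */
    static String decodeAs(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "você" in UTF-8 is 76 6F 63 C3 AA: the "ê" takes two bytes.
        byte[] utf8 = "voc\u00EA".getBytes(StandardCharsets.UTF_8);

        String right = decodeAs(utf8, StandardCharsets.UTF_8);
        String wrong = decodeAs(utf8, StandardCharsets.ISO_8859_1);

        System.out.println(right.length()); // 4
        System.out.println(wrong.length()); // 5: C3 and AA became two separate characters
    }
}
```

The wrongly decoded string is "voc" followed by U+00C3 and U+00AA, which is the typical two-character garbage seen in place of one accented letter.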
I am working on, and in the meantime learning, Android development.
I came across MediaStore, which returns details about all the MP3 files stored on the card (internal and external).
But this method is very slow.
I guess that's what the default music application uses.
No wonder it sometimes fails to find all the files...
I was thinking of implementing a faster search algorithm for this purpose, but I am confused about the initial requirements of these algorithms:
1: I thought of implementing binary search (divide and conquer) to find files, but that algorithm requires knowing the number of files to be scanned. How do I get that?
2: I thought of running a separate thread for each divided cluster. But will that really work?
Please help me with this!
One last question: how in the world does Poweramp find all the files so quickly?
My Android device has about 200 songs on the card, but this app takes only a few seconds to get them all!
Really puzzled!
Try this code.
Go to the Download and DCIM folders; there you can use this command to list all files. If you want to filter files, then here is the link
// Runs "ls -l" in the current working directory and collects its output.
StringBuilder total = new StringBuilder();
try {
    Process process = Runtime.getRuntime().exec("ls -l");
    BufferedReader in = new BufferedReader(new InputStreamReader(process.getInputStream()));
    String line;
    while ((line = in.readLine()) != null) {
        System.out.println("line: " + line);
        // Keep the line breaks so the collected output stays readable
        total.append(line).append('\n');
    }
    in.close();
} catch (IOException ex) {
    // Covers both a failed exec() and a failed read, so process is never used while null
    ex.printStackTrace();
}
System.out.println("Command Output: " + total);
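As an alternative to shelling out to ls, the same listing can be done with plain java.io.File. This is only a sketch under my own assumptions: the class name Mp3Finder and its methods are made up, the ".mp3" filter is just an example, and I am not claiming this is how MediaStore or Poweramp work internally.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class Mp3Finder {
    /** Returns every file under root (recursively) whose name ends in ".mp3". */
    static List<File> findMp3s(File root) {
        List<File> found = new ArrayList<>();
        collect(root, found);
        return found;
    }

    private static void collect(File dir, List<File> found) {
        File[] children = dir.listFiles(); // null if dir is not a readable directory
        if (children == null) {
            return;
        }
        for (File child : children) {
            if (child.isDirectory()) {
                collect(child, found);
            } else if (child.getName().toLowerCase(Locale.ROOT).endsWith(".mp3")) {
                found.add(child);
            }
        }
    }

    public static void main(String[] args) {
        // The directory to scan is taken from the first argument, or "." if none is given.
        File root = new File(args.length > 0 ? args[0] : ".");
        for (File f : findMp3s(root)) {
            System.out.println(f.getPath());
        }
    }
}
```

On a device this should run off the main thread, since walking a large card can take a while.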
I know this is an old question, but I am not satisfied with the answers given before.
My question is how to display French letters (such as é and à) using UTF-8 or any other technique.
Currently these are displayed as "?".
My code is:
InputStream is = null;
try {
if(CommonUtilities.prefs.getString("KEY_LOCALE", "fr").equalsIgnoreCase("fr")){
is = getAssets().open("terms_txt_fr.txt");
}else{
is = getAssets().open("terms_txt_en.txt");
}
String term_tx =convertStreamToString(is);
String valueUTF8 = new String(term_tx.getBytes(), "UTF-8");
terms_tx.setText(valueUTF8);
} catch (IOException e) {
e.printStackTrace();
}
Your code is fine, but you need to make sure your input is UTF-8 encoded.
There are many ways to do that, each specific to your environment, such as your operating system. A good summary is given here:
Best way to convert text files between character sets?
On Linux (I guess on OSX it's the same) you do:
iconv -f ISO-8859-15 -t UTF-8 in.txt > out.txt
Most good text editors have options for how to save files. You might want to check that too.
In modern operating systems UTF-8 should be the default.
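Once the file itself is known to be UTF-8, it is also worth checking that the stream is decoded as UTF-8 when it is read, because re-encoding an already decoded String generally cannot bring back characters that were lost during decoding. A minimal sketch, assuming a hypothetical helper readUtf8 (the question's convertStreamToString is not shown, so I can't say what it does):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class StreamText {
    /** Reads the whole stream and decodes it as UTF-8 in one step. */
    static String readUtf8(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        // Decoding once at the end avoids splitting a multi-byte character across chunks.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an asset file: the word "été" encoded as UTF-8.
        byte[] data = "\u00E9t\u00E9".getBytes(StandardCharsets.UTF_8);
        String text = readUtf8(new ByteArrayInputStream(data));
        System.out.println(text.length()); // 3
    }
}
```

With something like this, the result can be passed to setText directly, without the extra new String(term_tx.getBytes(), "UTF-8") step.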
I'm encountering an odd situation whereby strings that I load from my resource XML file that have Spanish characters in them display correctly in my TextViews, but strings that I'm fetching from a JSON file that I load via HTTP at runtime display the missing char [] boxes
ESPAÑOL, for example, works fine when embedded in my XML strings, but when pulled from my JSON it is rendered as SPAÃ[]OL, so the Ñ is transformed into an Ã and a missing char!
I'm not sure at what point I need to intercept these strings and set the correct encoding on them. The JSON text file itself is generated on the server via Node, so, I'm not entirely sure if that's the point at which I should be encoding it, or if I should be encoding the fileReader on the Android side, or perhaps setting the TextView itself to be of some special encoding type (I'm unaware that this is an option, just sort of throwing my hands in the air, really).
[EDIT]
As per ianhanniballake's suggestion I am logging and seeing that the screwy characters are actually showing up in the log as well. However, when I look at the JSON file with a text viewer on the Android file system (it's sitting on the SDCARD) it appears correct.
So, it turned out that the text file was, indeed, encoded correctly, and the issue was that I wasn't specifying UTF-8 as the encoding on the InputStreamReader that wraps my FileInputStream...
The solution is to read the file thusly:
static String readInput() {
StringBuffer buffer = new StringBuffer();
try {
FileInputStream fis = new FileInputStream("myfile.json");
InputStreamReader isr = new InputStreamReader(fis, "UTF8");
Reader in = new BufferedReader(isr);
int ch;
while ((ch = in.read()) > -1) {
buffer.append((char) ch);
}
in.close();
return buffer.toString();
} catch (IOException e) {
e.printStackTrace();
return null;
}
}
I am trying to read a CSV file and have it display the contents as a basic list for an Android app. I am using the method given by Kopfgeldjaeger in this thread.
I have added a couple of 'toasts' which either display 'success' or 'fail' at the bottom of the Android screen if the code managed (or didn't manage) to load the CSV file properly. See below:
try {
CSVReader reader = new CSVReader(new InputStreamReader(getAssets().open("file.csv")));
for(;;) {
next = reader.readNext();
if(next != null) {
list.add(next);
} else {
break;
}
}
Toast.makeText(getApplicationContext(), "SUCCESS",
Toast.LENGTH_SHORT).show();
} catch (IOException e) {
e.printStackTrace();
Toast.makeText(getApplicationContext(), "FAIL",
Toast.LENGTH_SHORT).show();
}
When I load the app, I get the 'SUCCESS' message, so all is well so far. Now, I'd like to see if I can load any of the data. In Kopfgeldjaeger's answer, it is suggested that I could access a string using the following code:
list.get(1)[1]
So, in order to check that it's worked, I try to generate another toast, as follows:
Toast.makeText(getApplicationContext(), list.get(1)[1],
Toast.LENGTH_SHORT).show();
This added toast causes the program to fail to load properly. The question is, have I gotten the toast syntax wrong, or is my CSV file not loading properly?
There are a couple of things to check:
Make sure your csv file has a size of at least 2 x 2 entries, otherwise retrieving the data from line index 1 and column index 1 won't work. For example, print or debug list.size() and list.get(0).length to see if they're both at least 2.
Confirm that your csv file is actually comma separated, and not e.g. semicolon separated. I have seen occasions where certain software seems to choose its own delimiter.
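The first check can also be built into the lookup itself, so that a short file yields null instead of an exception. This is just a sketch: the class CsvCells and the method cellOrNull are names I made up, and the rows are hard-coded here in place of whatever your CSV reader returns.

```java
import java.util.ArrayList;
import java.util.List;

class CsvCells {
    /** Returns rows.get(row)[col], or null if either index is out of range. */
    static String cellOrNull(List<String[]> rows, int row, int col) {
        if (row < 0 || row >= rows.size()) {
            return null;
        }
        String[] cells = rows.get(row);
        if (cells == null || col < 0 || col >= cells.length) {
            return null;
        }
        return cells[col];
    }

    public static void main(String[] args) {
        // Hypothetical parsed data: two rows, the second with only one column.
        List<String[]> list = new ArrayList<>();
        list.add(new String[] {"name", "city"});
        list.add(new String[] {"Anna"});

        System.out.println(cellOrNull(list, 0, 1)); // city
        System.out.println(cellOrNull(list, 1, 1)); // null: row 1 has no column 1
    }
}
```

A null result here would tell you the data is the problem rather than the toast call.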
As a recommendation: the referenced csv reader is part of ByteCode's OpenCSV. You may want to include the latest source code or jar from that project. It supports custom delimiter characters and also provides a shorthand for parsing all the csv data into a list of string arrays:
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
List<String[]> myEntries = reader.readAll();