Why are foreign characters not read using inputStream? - android

I have a text file which contains data I need to preload into a SQLite database. I saved it in res/raw.
I read the whole file using readTxtFromRaw(), then I use the StringTokenizer class to process the file line by line.
However the String returned by readTxtFromRaw does not show foreign characters that are in the file. I need these as some of the text is Spanish or French. Am I missing something?
Code:
String fileCont = new String(readTxtFromRaw(R.raw.wordstext));
StringTokenizer myToken = new StringTokenizer(fileCont , "\t\n\r\f");
The readTxtFromRaw method is:
private String readTxtFromRaw(Integer rawResource) throws IOException
{
InputStream inputStream = mCtx.getResources().openRawResource(rawResource);
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
int i = inputStream.read();
while (i != -1)
{
byteArrayOutputStream.write(i);
i = inputStream.read();
}
inputStream.close();
return byteArrayOutputStream.toString();
}
The file was created using Eclipse, and all characters appear fine in Eclipse.
Could this have something to do with Eclipse itself? I set a breakpoint and checked myToken in the Watch window. I tried to manually replace the weird character with the correct one (for example í or é), and it would not let me.

Have you checked the encodings involved?
What is the encoding of your source file?
What is the encoding of your output stream?
byteArrayOutputStream.toString() decodes the bytes using the platform's default character encoding. If that differs from the encoding the file was saved in, the foreign characters will be decoded incorrectly and will not display properly in your output.
Have you already tried byteArrayOutputStream.toString(String enc)? Try "UTF-8", "ISO-8859-1" or "UTF-16" as the encoding.
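As a minimal sketch of the suggestion above, the whole stream can be read in chunks and decoded with an explicitly named charset instead of the platform default. The class and method names (RawTextReader, readAll) are hypothetical, and the right charset depends on how the file was actually saved. On Android the stream would come from openRawResource().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

// Hypothetical helper, not part of the original question's code.
final class RawTextReader {

    /** Reads the whole stream and decodes the bytes with the given charset. */
    static String readAll(InputStream in, Charset charset) throws IOException {
        // try-with-resources closes the stream even if reading fails
        try (InputStream input = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            // read in chunks rather than one byte at a time
            while ((n = input.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            // decode with the caller's charset, never the platform default
            return new String(out.toByteArray(), charset);
        }
    }
}
```

A call would look like readAll(mCtx.getResources().openRawResource(R.raw.wordstext), StandardCharsets.UTF_8), assuming the file was saved as UTF-8.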

Related

Can I remove the symbol new line from base 64 encoded key file?

I have a base 64 encoded key file. If I open it by Text Editor, I see 4 lines like this:
Then I copy the text and paste to Android Studio, I see the symbol "\n" is generated as below:
This public key doesn't work. So I tried:
Removing all "\n" symbols. It still doesn't work.
Replacing each "\n" symbol with a space " ". Again, it doesn't work.
Could you please show me where I am going wrong?
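Rather than pasting the contents of the file into a string, why not just copy the file itself into your assets folder? For example:
public String readPublicKeyFromFile() {
    // try-with-resources closes the stream even if reading fails
    try (InputStream is = getAssets().open("public_key.txt")) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        // a single read() call may not fill the buffer, so loop until end of stream
        while ((n = is.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        // Convert the bytes into a string. Base64 text is plain ASCII, so UTF-8 is safe.
        return out.toString("UTF-8");
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}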
It is a limitation of Android Studio that it shows a long string on multiple lines.
The best way is to copy the string into any text editor (such as Notepad), make it a single-line string and then paste it into Android Studio.
Another way is to delete the '\n' characters from your string so that it becomes a single-line string.
e.g.
private static final String PUBLIC_KEY = "abcdefgh" +
"ijklmnop" +
"qrstuvwxyz";
Just remove the '\n' characters from your string.
If you are creating the "publickey.txt" (Base64) file yourself, use the "Base64.NO_WRAP" flag when creating it. This flag prevents "\n" characters from being inserted.
With the "Base64.DEFAULT" flag, a "\n" is added automatically after every 76 characters of output and at the end.
// for encoding the String with out \n
String base64Str=Base64.encodeToString(your_string.getBytes("UTF-8"),Base64.NO_WRAP);
// for decoding
byte[] resByte=Base64.decode(base64Str,Base64.NO_WRAP);
// convert into String
String resStr=new String(resByte,"UTF-8");
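If the key text has already been pasted with line breaks in it, one option is to strip all whitespace before decoding. The sketch below is only illustrative: it uses java.util.Base64 as a plain-Java stand-in for android.util.Base64 (an assumption made so the example runs off-device), and the names KeyText, stripWhitespace and decodeWrapped are hypothetical.

```java
import java.util.Base64;

// Hypothetical helper; java.util.Base64 stands in for android.util.Base64 here.
final class KeyText {

    /** Removes every whitespace character (spaces, tabs, \r, \n) from pasted Base64 text. */
    static String stripWhitespace(String pastedKey) {
        return pastedKey.replaceAll("\\s+", "");
    }

    /** Decodes Base64 text that may have been wrapped across several lines. */
    static byte[] decodeWrapped(String pastedKey) {
        // the basic decoder rejects line breaks, so strip them first
        return Base64.getDecoder().decode(stripWhitespace(pastedKey));
    }
}
```

Stripping all whitespace, rather than only "\n", also covers "\r" and stray spaces that an editor may have introduced.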

Open raw text resource and display wrong line breaks

InputStream inputStream = getResources().openRawResource(idtext);
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
String myText = "";
int in;
try {
in = inputStream.read();
while (in != -1) {
byteArrayOutputStream.write(in);
in = inputStream.read();
}
inputStream.close();
myText = byteArrayOutputStream.toString();
} catch (IOException e) {
e.printStackTrace();
}
myTextView.setText(myText);
My code is used to display a long text file from res/raw. I don't know why, but some of the text files are displayed with wrong line breaks. Any help?
It would help if you could share a sample of the actual display versus your expected output.
As an initial guess, there could be a couple of reasons:
The encoding used in the text file. If the file was saved in one encoding (for example ISO-8859-1) and is decoded as another (for example UTF-8), some characters will be garbled. The encoding should be consistent on both sides.
How line breaks are encoded in the file, for example \r\n versus just \n.
You can also try wrapping your InputStream in an InputStreamReader and a BufferedReader, which are designed for reading text line by line, rather than converting the bytes yourself.
You can also use a library such as Apache Commons IO to handle all of this for you.
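A minimal sketch of the line-reading approach, assuming the file is UTF-8 (an assumption, since the question does not say). BufferedReader.readLine() accepts \n, \r and \r\n as line terminators, so re-joining the lines with '\n' gives consistent line breaks. The names LineNormalizer and readNormalized are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original question's code.
final class LineNormalizer {

    /** Reads text line by line and joins the lines with '\n' only. */
    static String readNormalized(InputStream in) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            // readLine() strips the terminator, whichever form it had
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }
}
```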

How to read/write a string encoded with android.util.Base64

I would like to store some strings in a simple .txt file and then read them, but when I want to encode them using Base64 it doesn't work anymore: it writes well but the reading doesn't work. ^^
The write method:
private void write() throws IOException {
String fileName = "/mnt/sdcard/test.txt";
File myFile = new File(fileName);
BufferedWriter bW = new BufferedWriter(new FileWriter(myFile, true));
// Write the string to the file
String test = "http://google.fr";
test = Base64.encodeToString(test.getBytes(), Base64.DEFAULT);
bW.write("here it comes");
bW.write(";");
bW.write(test);
bW.write(";");
bW.write("done");
bW.write("\r\n");
// save and close
bW.flush();
bW.close();
}
The read method :
private void read() throws IOException {
String fileName = "/mnt/sdcard/test.txt";
File myFile = new File(fileName);
FileInputStream fIn = new FileInputStream(myFile);
BufferedReader inBuff = new BufferedReader(new InputStreamReader(fIn));
String line = inBuff.readLine();
int i = 0;
ArrayList<List<String>> matrice_full = new ArrayList<List<String>>();
while (line != null) {
matrice_full.add(new ArrayList<String>());
String[] tokens = line.split(";");
String decode = tokens[1];
decode = new String(Base64.decode(decode, Base64.DEFAULT));
matrice_full.get(i).add(tokens[0]);
matrice_full.get(i).add(tokens[1]);
matrice_full.get(i).add(tokens[2]);
line = inBuff.readLine();
i++;
}
inBuff.close();
}
Any ideas why?
You have a couple of errors in your code.
First, a couple of notes on your code:
When posting here, attaching an SSCCE helps others to debug your code. This is not an SSCCE because it doesn't compile. It lacks several variable definitions, so one must guess what you really mean. Also, you have pasted a close-comment token (*/) in your code, but there is no matching start-comment token.
Catching and just suppressing exceptions (like in the catch block in the read method) is a really bad idea unless you really know what you're doing. Most of the time it hides potential problems from you. At least print the stack trace of the exception in the catch block.
Why don't you just debug it and check exactly what is written to the destination file? You should learn how to do that because it will speed up your development process, especially for larger projects with hard-to-catch problems.
Back to the solution:
Run the program. It throws an exception:
02-01 17:18:58.171: E/AndroidRuntime(24417): Caused by: java.lang.ArrayIndexOutOfBoundsException
caused by this line:
matrice_full.get(i).add(tokens[2]);
Inspecting the variable tokens reveals that it has 2 elements, not 3.
So let's open the file generated by the write method. Doing that shows this output:
here it comes;aHR0cDovL2dvb2dsZS5mcg==
;done
here it comes;aHR0cDovL2dvb2dsZS5mcg==
;done
here it comes;aHR0cDovL2dvb2dsZS5mcg==
;done
Note the line breaks here. This is because Base64.encodeToString() with Base64.DEFAULT appends a newline at the end of the encoded string. To generate a single line without extra newlines, pass Base64.NO_WRAP as the second parameter, like this:
test = Base64.encodeToString(test.getBytes(), Base64.NO_WRAP);
Note that you must delete the file that was created earlier, as it contains the improper line breaks.
Run the code again. It now creates a file with the proper contents:
here it comes;aHR0cDovL2dvb2dsZS5mcg==;done
here it comes;aHR0cDovL2dvb2dsZS5mcg==;done
Printing the output of matrice_full now gives:
[
[here it comes, aHR0cDovL2dvb2dsZS5mcg==, done],
[here it comes, aHR0cDovL2dvb2dsZS5mcg==, done]
]
Note that you're not doing anything with the value of the decode variable in your code, hence the second element is still the Base64 representation that was read from the file.
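To illustrate both points, here is a small sketch that checks the field count before indexing and stores the decoded value rather than the raw token. It uses java.util.Base64 as a plain-Java stand-in for android.util.Base64 (an assumption made so the example runs off-device), and the names RecordParser and parseLine are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

// Hypothetical helper; java.util.Base64 stands in for android.util.Base64 here.
final class RecordParser {

    /** Splits "label;base64;label" and returns the fields with the middle one decoded. */
    static List<String> parseLine(String line) {
        String[] tokens = line.split(";");
        // fail with a clear message instead of an ArrayIndexOutOfBoundsException
        if (tokens.length != 3) {
            throw new IllegalArgumentException(
                "expected 3 fields but found " + tokens.length + ": " + line);
        }
        String decoded = new String(
            Base64.getDecoder().decode(tokens[1]), StandardCharsets.UTF_8);
        // store the decoded text, not the Base64 token
        return Arrays.asList(tokens[0], decoded, tokens[2]);
    }
}
```

With the corrected file, the middle field of each record would then hold the original URL instead of its Base64 form.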

Android: Character encoding raw resource files

I'm in the process of translating one of my apps to Spanish, and I'm having a character encoding problem with a raw HTML file I'm sticking into a WebView. I have the Spanish translation of the file in my raw-es folder, and I'm reading it in with the following function:
private CharSequence getHtmlText(Activity activity) {
BufferedReader in = null;
try {
in = new BufferedReader(new InputStreamReader(getResources().openRawResource(R.raw.help), "utf-8"));
String line;
StringBuilder buffer = new StringBuilder();
while ((line = in.readLine()) != null) buffer.append(line).append('\n');
return buffer;
} catch (IOException e) {
return "";
} finally {
closeStream(in);
}
}
But everywhere there is a Spanish character in the file, a diamond with a question mark inside it appears when I run the app and look at the activity that displays the HTML. I'm using the following to load the text into the WebView:
mWebView.loadData(text, "text/html", "utf-8");
I originally created the file in Microsoft Word, so I'm sure there is some sort of character encoding issue going on, but I'm not really sure how to fix it, and a Google search isn't helping. Any ideas?
Don't use loadData. Use loadDataWithBaseURL instead. You would say:
mWebView.loadDataWithBaseURL( null, text, "text/html", "utf-8", null );
I had a similar issue with a French translation where diamond symbols with question marks were appearing in place of certain characters, including those which I had escaped. I got around it by opening file properties in Eclipse and changing the encoding to "ISO-8859-1". Don't know if this would work for Spanish though.
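The diamond with a question mark is typically U+FFFD, the replacement character a UTF-8 decoder emits for bytes that are not valid UTF-8. The sketch below shows one way such a mismatch can arise, assuming the file was saved in a single-byte encoding such as ISO-8859-1 and then decoded as UTF-8. Whether that is what happened in the question is a guess, and the names MojibakeDemo, decodeAsUtf8 and hasReplacementChar are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for diagnosing an encoding mismatch.
final class MojibakeDemo {

    /** Decodes raw bytes as UTF-8; invalid sequences become U+FFFD. */
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Returns true if the text contains the U+FFFD replacement character. */
    static boolean hasReplacementChar(String text) {
        return text.indexOf('\uFFFD') >= 0;
    }
}
```

Running hasReplacementChar on the string built by getHtmlText would be a quick way to tell whether the damage happens while reading the file or later, when the WebView loads it.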

Android: How to read a txt file which contains Chinese characters?

I have a txt file which contains many Chinese characters, and the txt file is in the directory res/raw/test.txt. I want to read the file but somehow I can't make the Chinese characters display correctly. Here is my code:
try {
InputStream inputstream = getResources().openRawResource(R.raw.test);
BufferedReader bReader = new BufferedReader(
new InputStreamReader(inputstream,Charset.forName("UTF-8")));
String line = null;
while ((line= bReader.readLine())!= null) {
Log.i("lolo", line);
System.out.println("here is some chinese character 这是一些中文字");
}
} catch (IOException e) {
e.printStackTrace();
}
Neither Log.i("lolo", line); nor System.out.println("here is some chinese character 这是一些中文字") shows the characters correctly. I cannot even see the Chinese characters from the println() call.
What can I do to fix this problem? Can anybody help me?
In order to correctly handle non-ASCII characters such as UTF-8 multi-byte characters, it's important to understand how these characters are encoded and displayed.
Your console (output screen) may not support the display of non-ASCII characters. If that's the case, your UTF-8 characters will be displayed as garbage. Sometimes, you will be able to change the character encoding on the console. Sometimes not.
Even if the console correctly displayed UTF-8 characters, it's possible that your string does not correctly store the Chinese characters. You may think that it's correct because your editor displays them, but ensure that the character encoding of your editor also supports UTF-8.
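One way to separate the two cases above is to print the code points of the string as ASCII text, which any console can display. This is a sketch with hypothetical names (CodePointDump, describe). It relies on String.codePoints(), which needs Java 8 or, on Android, API level 24.

```java
// Hypothetical helper for checking what a string really contains.
final class CodePointDump {

    /** Renders each code point as U+XXXX, separated by spaces. */
    static String describe(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            // at least four uppercase hex digits, for example U+4E2D
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }
}
```

If the logged code points match the expected characters, the string was read correctly and only the console is failing to display it. If they show U+FFFD or unrelated values, the file or its declared encoding is the problem.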
I was also trying to figure that out. First, open the .txt file in Notepad and click File->Save As. There you will see a dropdown menu labelled Encoding, so change it to UTF-8. After saving the file, remove the .txt extension from the file name, add the file to res/raw, and then you can refer to it from the code as R.raw.txtFileName.
That's all. Below is the code I used with Chinese characters, which displayed correctly in the emulator.
If you have any other questions, let me know, because I am also developing something related to characters. Here is the code:
public List<String> getWords() {
List<String> contents = new ArrayList<String>();
try {
InputStream inputStream = getResources().openRawResource(R.raw.chardb);
BufferedReader input = new BufferedReader(new InputStreamReader(inputStream,Charset.forName("UTF-8")));
try {
String line = null;
while (( line = input.readLine()) != null){
contents.add(line);
}
}
finally {
input.close();
}
}
catch (IOException ex){
ex.printStackTrace();
}
return contents;
}
