Reading Windows Unicode files on Android


I just found out that Android can correctly read in a file which is encoded using Windows ANSI (or the so-called multi-byte encoding) and convert it to Java Unicode strings. But it fails when reading a Unicode file. It seems that Android is reading it in a byte-by-byte fashion. A Unicode string "ABC" in the file would be read in to a Java String of length 6, and the characters are 0x41, 0x00, 0x42, 0x00, 0x43, 0x00.
BufferedReader in = new BufferedReader(new FileReader(pathname));
String str = in.readLine();
Please, is there a way to read Windows Unicode files correctly on Android? Thank you.
[Edited]
Experiments: I saved two Chinese characters "難哪" in two Windows text files:
ANSI.txt -- C3 F8 AD FE
UNICODE.txt -- FF FE E3 96 EA 54
Then I put these files on the emulator's SD card and used the following program to read them in. (Note that the emulator's locale has already been set to zh_TW.)
BufferedReader in = new BufferedReader(new FileReader("/sdcard/ANSI.txt"));
String szLine = in.readLine();
int n = szLine.length(), j, i;
in.close();
for (i = 0; i < n; i++)
j = szLine.charAt(i);
Here is what I saw on the Emulator:
ANSI.txt -- FFFD FFFD FFFD
UNICODE.txt -- FFFD FFFD FFFD FFFD 0084
Apparently Android (or Java) is unable to properly decode the Chinese characters. So, how do I do this? Thank you in advance.

FileReader(String) decodes with the platform default charset, which is ASCII-compatible (UTF-8 on Android), so it cannot decode UTF-16 input.
Also, it is not a "Unicode file"; it is a UTF-16 encoded file.
You will have to use an InputStreamReader and specify the encoding yourself:
BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream(pathname), "UTF-16LE"));
You should also really read that article - it seems to me that there is a lot that you misunderstand about character sets and encoding.
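As a rough illustration of the point above, here is a minimal sketch that decodes the six bytes the question lists for UNICODE.txt. The class and method names are made up for the example, and the bytes are fed from memory instead of /sdcard/UNICODE.txt. It uses the "UTF-16" charset, not "UTF-16LE", on the assumption that the file starts with a byte-order mark (FF FE), as the hex dump in the question suggests.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any Android API.
class Utf16ReadSketch {
    // Reads the first line of a stream, decoding it as UTF-16.
    // The "UTF-16" decoder consumes a leading byte-order mark and
    // uses it to choose between little-endian and big-endian.
    static String firstLine(InputStream raw) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(raw, StandardCharsets.UTF_16))) {
            return in.readLine();
        }
    }

    // The six bytes the question reports for UNICODE.txt.
    static byte[] sampleBytes() {
        return new byte[] {(byte) 0xFF, (byte) 0xFE, (byte) 0xE3, (byte) 0x96, (byte) 0xEA, 0x54};
    }

    public static void main(String[] args) throws IOException {
        String line = firstLine(new ByteArrayInputStream(sampleBytes()));
        // Print each UTF-16 code unit in hex, as the question's test program intended.
        for (int i = 0; i < line.length(); i++) {
            System.out.printf("%04X%n", (int) line.charAt(i));
        }
    }
}
```

With "UTF-16LE" the same bytes should still decode, but the byte-order mark is then kept as a leading U+FEFF character in the string, which may need to be stripped by hand.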

You can try the following code.
A Windows ANSI text file that contains Chinese characters is normally not decoded correctly on Android, because stream processing there defaults to UTF-8.
Once you copy such a file to an Android device, the default reader cannot recognize the Chinese part.
The following code correctly parses strings from a Windows ANSI text file containing Chinese characters, whether the file sits on the SD card or in the assets folder.
It is very simple: just pass the "BIG5" decoder to the InputStreamReader object.
I have verified it, and it works well. Try it! FYI. KNC.
String pathname="AAA.txt";
BufferedReader inBR;
inBR = new BufferedReader(new InputStreamReader(new FileInputStream(pathname), "BIG5"));
String sData="";
while ((sData = inBR.readLine()) != null) {
System.out.println(sData);
}

A Unicode string "ABC" in the file would be read in to a Java String of length 6, and the characters are 0x41, 0x00, 0x42, 0x00, 0x43, 0x00.
How are you getting the length? What you have described is absolutely correct for a Java String. Java strings are UTF-16 (i.e., Unicode). This means that ABC will be stored in a Java string exactly as you describe (0x41, 0x00, 0x42, 0x00, 0x43, 0x00).
The String 'length', however, as returned by int String.length() will be 3 even though it is 6 bytes long.
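To see where a length of 3 or 6 can come from, here is a small sketch (class and method names are invented for the example). It assumes the file holds "ABC" as UTF-16LE without a byte-order mark, and compares the number of chars produced by different decoders with the number of bytes on disk.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares char counts with byte counts for "ABC".
class StringLengthSketch {
    // "ABC" as it would be stored in a UTF-16LE file without a byte-order mark (assumption).
    static final byte[] ABC_UTF16LE = {0x41, 0x00, 0x42, 0x00, 0x43, 0x00};

    // Number of chars in the String obtained by decoding the bytes with the given charset.
    static int charsWhenDecodedAs(Charset charset) {
        return new String(ABC_UTF16LE, charset).length();
    }

    public static void main(String[] args) {
        // Decoded as UTF-16LE, each pair of bytes becomes one char.
        System.out.println(charsWhenDecodedAs(StandardCharsets.UTF_16LE));
        // Decoded as UTF-8, every byte (including each 0x00) becomes its own char.
        System.out.println(charsWhenDecodedAs(StandardCharsets.UTF_8));
        // String.length() counts chars, not bytes.
        System.out.println("ABC".length());
        System.out.println("ABC".getBytes(StandardCharsets.UTF_16LE).length);
    }
}
```

If String.length() reports 6 for such a file, that would suggest the bytes were decoded one at a time with an ASCII-compatible charset, not as UTF-16.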

Related

Android how add cyrillic symbol in exif file

How can I add Cyrillic symbols to an EXIF file?
My code always writes the symbol "?" instead.
This code:
String userComment = "АБВГДЕЁЖЗИЙКЛСНОПРСТУФХЦЧШЩЪЫЬЭЮЯ";
exifInterface.setAttribute("UserComment", userComment );
exifInterface.saveAttributes();
or
String userComment = "АБВГДЕЁЖЗИЙКЛСНОПРСТУФХЦЧШЩЪЫЬЭЮЯ";
exifInterface.setAttribute("UserComment", new String(userComment.getBytes(), "UTF-8"));
exifInterface.saveAttributes();

I think the ExifInterface class does not support Cyrillic.
I tried to get the bytes from that tag with
byte [] bytes1 = exifInterface.getAttribute("UserComment").getBytes();
byte [] bytes2 = exifInterface.getAttribute("UserComment").getBytes("utf-8");
// byte [] bytes3 = exifInterface.getAttribute("UserComment").getBytes("utf-16");
// byte [] bytes3 = exifInterface.getAttribute("UserComment").getBytes("ISO-8859-1");
byte [] bytes3 = exifInterface.getAttribute("UserComment").getBytes("windows-1252");
I then displayed the bytes in hexadecimal notation. They were all rubbish.
There were 66 bytes for your 33 characters. I don't know which encoding is used.
I wanted to compare them with the bytes of your alphabet string.
I also tried compiling for Android 7, with the same result.
I give up ;-).

"UserComment" tag in Exif support ASCII or Unicode.
Unfortunately, Android's ExifInterface only uses ASCII to write or read the tag.
So, Cyrillic symbols are not supported by Android's ExifInterface.
But this library may help you:
https://github.com/ddyos/UnicodeExifInterface

Android write PDF file with itextpdf pt-BR language

I need to write a PDF file, and I am using this sample (http://www.vogella.com/tutorials/JavaPDF/article.html) with "itextpdf-5.4.1.jar".
It creates the PDF file, but a word such as "você" is written as "vocÃª".
I found this code, but it does not work:
Document document;
...
...
document.addLanguage("pt-BR");
How do I set the encoding or language for Brazil?
Thanks!

Take a look at my answer to Divide page in 2 parts so we can fill each with different source (this is yet another question answered in The Best iText Questions on StackOverflow). In this example, we read a series of text files that are stored in UTF-8. To achieve this, we use this method:
public Phrase createPhrase(String path) throws IOException {
Phrase p = new Phrase();
BufferedReader in = new BufferedReader(
new InputStreamReader(new FileInputStream(path), "UTF8"));
String str;
while ((str = in.readLine()) != null) {
p.add(str);
}
in.close();
return p;
}
If you remove the "UTF8" and if you read that text as if it were ASCII, then you'd get the same behavior you are describing in your question: each byte would be treated as a single character whereas you have characters that require two bytes.
This is not really an iText question. This is a pure encoding question.
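The effect described above can be reproduced without iText or any file. The following is only a sketch with made-up names; it assumes the wrong decoder behaves like ISO-8859-1 (one byte per character), which matches the "vocÃª" output in the question.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class showing how UTF-8 bytes turn into mojibake.
class MojibakeSketch {
    // Encodes the text as UTF-8, then decodes those bytes with a
    // single-byte charset, so every byte becomes one character.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Decodes the same bytes with the charset they were written in.
    static String roundTrip(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String word = "voc\u00ea"; // "você", written with an escape to avoid source-encoding issues
        System.out.println(misdecode(word).length()); // one char more than the original
        System.out.println(roundTrip(word).equals(word));
    }
}
```

The letter ê occupies two bytes in UTF-8 (C3 AA); read one byte at a time, they show up as the two characters Ã and ª.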

Use IPP(Internet Printing Protocol) or LPR(Line printer Remote) to print file in android

My requirement is to print a file from an android device without using any cloud based service.
I have been able to achieve it using "Raw" print protocol i.e by simply sending the file to printer's IP address at Port 9100. Here is the code snippet for that:
client = new Socket(ip, port); // port is 9100
outputStream = client.getOutputStream();
bufferedInputStream = new BufferedInputStream(new FileInputStream(file));
byte[] buffer = new byte[8192];
int count;
// a single read() may return fewer bytes than requested, so copy in a loop until end of file
while ((count = bufferedInputStream.read(buffer)) != -1) {
    outputStream.write(buffer, 0, count);
}
outputStream.flush();
bufferedInputStream.close();
outputStream.close();
The problem with "Raw" printing protocol is that there is no way to get the status back from the printer.
So, I recently read about IPP and LPR, which make it possible to get the status back from the printer.
I have tried to find a way to implement them on Android but had no success. I have already gone through this answer but could not find a solution there.
It would be really helpful if someone could guide me on how to implement IPP or LPR on Android.
Thanks in advance!

General usage of IPP:
1. Once a print job has been submitted, the printer returns a job-id.
2. Use the Get-Job-Attributes operation to get the current job-state.
3. Wait until the job-state attribute equals 9 (which means 'completed').
4. Also check for the other final job-states: aborted (8) and canceled (7).
For prototyping you could use the ipptool (native for desktop usage):
# ipptool -t -d job=482 ipp://192.168.2.113/ipp job.ipp
{
OPERATION Get-Job-Attributes
GROUP operation-attributes-tag
ATTR charset attributes-charset utf-8
ATTR language attributes-natural-language en
ATTR uri printer-uri $uri
ATTR integer job-id $job
}
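The polling logic from the steps above might look roughly like this in Java. It is a sketch only: the IntSupplier stands in for a real Get-Job-Attributes request (no actual network code is shown), the class and method names are invented, and the numeric values are the job-state codes from the IPP/1.1 specification (RFC 8011).

```java
import java.util.function.IntSupplier;

// Hypothetical sketch of waiting for an IPP job to reach a final state.
class IppJobStateSketch {
    // Final job-state values as defined by IPP/1.1 (RFC 8011).
    static final int CANCELED = 7;
    static final int ABORTED = 8;
    static final int COMPLETED = 9;

    // A job is finished once it is canceled, aborted, or completed.
    static boolean isTerminal(int jobState) {
        return jobState == CANCELED || jobState == ABORTED || jobState == COMPLETED;
    }

    // Polls the supplied job-state source until a final state is reported.
    // fetchJobState is a placeholder for code that sends Get-Job-Attributes.
    static int waitForTermination(IntSupplier fetchJobState, long pollMillis)
            throws InterruptedException {
        int state = fetchJobState.getAsInt();
        while (!isTerminal(state)) {
            Thread.sleep(pollMillis);
            state = fetchJobState.getAsInt();
        }
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated printer: pending (3), processing (5), processing (5), completed (9).
        int[] simulated = {3, 5, 5, 9};
        int[] next = {0};
        int finalState = waitForTermination(
                () -> simulated[Math.min(next[0]++, simulated.length - 1)], 10);
        System.out.println("final job-state: " + finalState);
    }
}
```

A real client would probably also want a timeout, since a job held or stopped at the printer may never reach a final state on its own.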
Update 5/2020
I have published a Kotlin implementation of the IPP protocol.
https://github.com/gmuth/ipp-client-kotlin
Once submitted you can wait for the print job to terminate: job.waitForTermination()

Using Base64 encoded characters as filename in Android app

I'm trying to generate filenames in my Android app from a 4-byte array. I'm Base64 encoding the byte array with the URL_SAFE option. However, the generated string seems to end with a newline character, which makes it unusable as a filename. Is there any way to remove the newline?
My code is as follows:
byte[] myByteArray = new byte[4];
myByteArray = generateBytes(myByteArray); // fills the byte array with some data
final String byteString = Base64.encodeToString(myByteArray, Base64.URL_SAFE);
After some googling, I found out that in Android, Base64 encoding automatically appends a newline to the string, and that using the NO_WRAP flag would solve this. However, is the output generated with the NO_WRAP flag safe to use as a filename?
Thanks.

OK, turns out I can use (Base64.URL_SAFE | Base64.NO_WRAP) to apply both flags.
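For a quick check of what URL-safe output looks like, here is a sketch that uses java.util.Base64 from the standard library as a stand-in, so that it runs on a desktop JVM; it is not the android.util.Base64 class from the question, and the class name and sample bytes are made up. The URL-safe alphabet replaces '+' and '/' with '-' and '_', and this encoder adds no line breaks. The '=' padding remains unless it is switched off, which on Android corresponds to adding the Base64.NO_PADDING flag.

```java
import java.util.Base64;

// Hypothetical demo class; java.util.Base64 is used here in place of android.util.Base64.
class UrlSafeBase64Sketch {
    // URL-safe encoding with '=' padding kept.
    static String withPadding(byte[] data) {
        return Base64.getUrlEncoder().encodeToString(data);
    }

    // URL-safe encoding with the trailing '=' padding dropped.
    static String withoutPadding(byte[] data) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(data);
    }

    public static void main(String[] args) {
        // Four sample bytes chosen so that the '-' and '_' characters appear.
        byte[] sample = {(byte) 0xFB, (byte) 0xFF, (byte) 0xFE, 0x01};
        System.out.println(withPadding(sample));
        System.out.println(withoutPadding(sample));
    }
}
```

Four input bytes always yield six significant characters plus two padding characters, so dropping the padding loses no information for fixed-length input like this.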

How do I go about setting TextView to display UTF-8 when the String is not an embedded Resource?

I'm encountering an odd situation: strings with Spanish characters that I load from my resource XML file display correctly in my TextViews, but strings that I fetch from a JSON file loaded via HTTP at runtime display the missing-character [] boxes.
ESPAÑOL for example, when embedded in my XML strings works fine, but when pulled from my JSON is rendered as SPAÃ[]OL, so the Ñ is transformed into a Ã and a missing char!
I'm not sure at what point I need to intercept these strings and set the correct encoding on them. The JSON text file itself is generated on the server via Node, so, I'm not entirely sure if that's the point at which I should be encoding it, or if I should be encoding the fileReader on the Android side, or perhaps setting the TextView itself to be of some special encoding type (I'm unaware that this is an option, just sort of throwing my hands in the air, really).
[EDIT]
As per ianhanniballake's suggestion I am logging and seeing that the screwy characters are actually showing up in the log as well. However, when I look at the JSON file with a text viewer on the Android file system (it's sitting on the SDCARD) it appears correct.

So, it turned out that the text file was, indeed, encoded correctly and the issue was that I wasn't specifying UTF-8 as the encoding when reading from the FileInputStream...
The solution is to read the file like this:
static String readInput() {
    StringBuilder buffer = new StringBuilder();
    // try-with-resources closes the reader even if reading fails
    try (Reader in = new BufferedReader(
            new InputStreamReader(new FileInputStream("myfile.json"), "UTF-8"))) {
        int ch;
        // read one decoded character at a time until end of stream
        while ((ch = in.read()) > -1) {
            buffer.append((char) ch);
        }
        return buffer.toString();
    } catch (IOException e) {
        e.printStackTrace();
        return null;
    }
}
