What is the charset for the characters ́ ̂? - android

When I save a PDF file on Android from my application, I use the Zip4J library to unzip the file I downloaded and then save the PDF file in an Android folder.
The PDF files that have accents (special characters) appear like:
é = ╠ü
â = ╠é
and so on.
Do you know what the charset of these files is, and why they are being saved like this?

Looks like UTF-8 octets displayed as DOS code page 850:
╠ is 0xcc in cp850
ü is 0x81 in cp850
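The UTF-8 octets cc 81 encode U+0301 COMBINING ACUTE ACCENT; preceded by e, it renders as é.
Similarly, ╠é is cc 82 (é is 0x82 in cp850), which is U+0302 COMBINING CIRCUMFLEX ACCENT; preceded by a, it renders as â.
So the data seems to be saved correctly as UTF-8, in decomposed form (base letter plus combining accent). You are just displaying it incorrectly.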
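To make the byte-level explanation concrete, here is a small Java sketch (class and method names are illustrative) that takes é and â in their decomposed (NFD) form, prints the UTF-8 bytes, and then decodes those same bytes as code page 850. It assumes the JVM provides the IBM850 charset, which full JDKs normally include; the check simply skips that step otherwise.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class MojibakeDemo {
    // Formats bytes as space-separated lower-case hex, e.g. "65 cc 81"
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // UTF-8 bytes of the decomposed (NFD) form: base letter followed by a combining mark
    static byte[] decomposedUtf8(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] e = decomposedUtf8("\u00e9"); // é
        byte[] a = decomposedUtf8("\u00e2"); // â
        System.out.println(hex(e)); // 65 cc 81
        System.out.println(hex(a)); // 61 cc 82

        // Reading the same bytes as DOS code page 850 reproduces the garbled names
        if (Charset.isSupported("IBM850")) {
            Charset cp850 = Charset.forName("IBM850");
            System.out.println(new String(e, cp850)); // decodes to e╠ü
            System.out.println(new String(a, cp850)); // decodes to a╠é
        }
    }
}
```

If a consumer expects precomposed characters, Normalizer.normalize(name, Normalizer.Form.NFC) turns e + U+0301 back into the single code point é.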

Related

Converting .m3u playlists from Windows for Android media players using Notepad++

Winamp saves playlists that are stored in the same folder as the music with Windows-style relative paths, but copying them to Android doesn't work unless I convert them to Linux-style relative paths. So
#EXTM3U
#EXTINF:262,Corona - Rhythm Of The Night
Unsorted\Corona - Rhythm Of The Night.mp3
#EXTINF:324,The B-52's - Love Shack
The B-52's - Love Shack.mp3
needs conversion to
#EXTM3U
#EXTINF:262,Corona - Rhythm Of The Night
./Unsorted/Corona - Rhythm Of The Night.mp3
#EXTINF:324,The B-52's - Love Shack
./The B-52's - Love Shack.mp3
for VLC Player on Android to read the playlist properly.
Converting \ to / in Notepad++ without regular expressions was easy enough, but I'm too new to regex to even make sense of the guides' tables of contents, even though all I want to do after that is add ./ to the start of every odd-numbered line after the first line.
You may use
(?:.*\R){2}\K
and replace with ./.
Details
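(?:.*\R){2} - two consecutive occurrences ({2}) of any 0+ chars other than line break chars, as many as possible (.*), followed by a line break sequence (\R),
\K - match reset operator discarding all text matched so far from the match buffer.
The replacement is ./, i.e. it is inserted at the end of the match, which is the start of every odd line after the first. Set Search Mode to "Regular expression" and leave ". matches newline" unchecked.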
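For scripting the same substitution outside Notepad++, a minimal Java sketch follows (class and method names are illustrative). java.util.regex supports \R but not \K, so this version keeps the two matched lines via $0 and appends ./ right after them, which lands at the same position. Like the Notepad++ pattern, it assumes every track is an #EXTINF line followed by a path line.

```java
public class M3uPrefix {
    // Each match consumes two whole lines (text plus line break); "./" is
    // appended right after them, i.e. at the start of the following line.
    static String addDotSlash(String playlist) {
        return playlist.replaceAll("(?:.*\\R){2}", "$0./");
    }

    public static void main(String[] args) {
        String in = "#EXTM3U\n"
                + "#EXTINF:262,Corona - Rhythm Of The Night\n"
                + "Unsorted/Corona - Rhythm Of The Night.mp3\n"
                + "#EXTINF:324,The B-52's - Love Shack\n"
                + "The B-52's - Love Shack.mp3\n";
        System.out.print(addDotSlash(in));
    }
}
```

Because \R matches \r\n as one line break, the same call works on files that still have Windows line endings.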
The method I use is to open the playlist in VLC for Windows; then, in VLC:
[Menu] Media
[select] {Save Playlist to File}
[edit] {File name}
[select] {Save as type} M3U playlist (*.m3u)
[Enter]
Then I open it in Notepad++ and do the following two search-and-replace operations:
1) Change line termination characters
Find -- "CRLF"
\r\n
Replace with {Extended} -- "LF"
\n
[Select] {Search Mode} Extended
[Select] {Replace All}
2) Change/replace path separation characters
Find -- percent-encoded "\" -- value:
%5C
Replace with -- percent-encoded "/" -- value:
%2F
[Select] {Search Mode} Normal
[Select] {Replace All}
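The two Notepad++ replacements above can also be scripted. This Java sketch (illustrative class name) assumes the VLC-exported playlist is UTF-8 and that backslashes appear as %5C, as described above; it matches %5C in either hex case.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class VlcPlaylistFix {
    // 1) CRLF -> LF line endings, 2) percent-encoded "\" (%5C) -> percent-encoded "/" (%2F)
    static String toAndroid(String m3u) {
        return m3u.replace("\r\n", "\n")
                  .replaceAll("(?i)%5C", "%2F");
    }

    // Usage: java VlcPlaylistFix playlist.m3u  (rewrites the file in place)
    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java VlcPlaylistFix <playlist.m3u>");
            return;
        }
        Path path = Paths.get(args[0]);
        String text = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        Files.write(path, toAndroid(text).getBytes(StandardCharsets.UTF_8));
    }
}
```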
You do not need to use regular expressions for this.
Just replace "/" with "\"; it's that simple.
Then specify the correct path by removing the previous Android path prefix (internal memory or SD card) so that it matches your directory tree.
In the example in the image above, the playlist must be in the same directory as the songs (the same folder structure) on Android for it to work, because the player looks for the songs relative to the current directory, symbolized by ".\".
For the playlist to work from any location in Windows, you need to replace the relative path with the full path, as in the image below.

About Android webview file upload

I am using a WebView in an Android app; the WebView loads an HTML page that contains a file upload control. I override the "openFileChooser" function so that I can select a file in the WebView, but after I choose the file and return to the WebView, the file name changes to URL-encoded format (if the file name contains non-ASCII characters). Why?
There's nothing you can (or should) do here.
The operating system handles these things, i.e. placing the selected file in the value attribute of the <input type="file">.
But there's nothing wrong with these URL-encoded names. You shouldn't rely on the filenames from the client side, anyway.
What you should use is only the file's content and maybe its size and extension as reported.
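If you only want to show the original name to the user (not to trust it), the percent-encoding can be reversed. This sketch assumes the name was percent-encoded from UTF-8; the class name is illustrative. URLDecoder performs form decoding, which also turns '+' into a space, so a literal '+' is protected first.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class DisplayName {
    // "%C3%A9t%C3%A9.pdf" -> "été.pdf"; malformed input is returned unchanged
    static String decodeFileName(String encoded) {
        try {
            // Re-encode literal '+' as %2B so form decoding does not turn it into a space
            return URLDecoder.decode(encoded.replace("+", "%2B"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        } catch (IllegalArgumentException e) {
            // Malformed escape such as "100%.pdf": show the name as-is
            return encoded;
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeFileName("%C3%A9t%C3%A9.pdf"));
    }
}
```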

Android replaces certain characters in the filename with ASCII hex value

I'm creating files in my app with '{' and '}' in the file name (e.g. {foo}.xml). However, the special characters are being replaced with their ASCII hex values instead (i.e. {foo}.xml is created as %7Bfoo%7D.xml). Any thoughts on how to get around this and have it actually create '{foo}.xml'?
new File("/sdcard/{file name}.xml").createNewFile() successfully creates a file with name "{file name}.xml" on SD card in my case.
new File(context.getFilesDir(), "{file name}.xml").createNewFile() successfully creates such a file in the private app files area.
I verified both cases using the ADB shell and a file explorer.
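The same check can be sketched on a desktop JVM, assuming a temp directory on a file system that allows braces (on Android, context.getFilesDir() would play that role; the class name is illustrative). For contrast, it also prints File.toURI(), which does percent-encode the braces and the space; whether that is how the question's names got encoded is only a guess.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

public class BraceFileName {
    // Creates "{file name}.xml" in dir (no-op if it already exists) and returns it
    static File createBraceFile(File dir) {
        File f = new File(dir, "{file name}.xml");
        try {
            f.createNewFile(); // returns false if the file already exists, which is fine here
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return f;
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("braces").toFile();
        File f = createBraceFile(dir);
        for (String name : dir.list()) {
            System.out.println(name); // the literal name, braces included
        }
        System.out.println(f.toURI()); // the URI form escapes '{', '}' and ' '
    }
}
```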

Base64 special characters new File

I'm working on Android, developing an app in which I upload files to Dropbox. As I don't want the names of these files to be seen, I encrypt them and then encode the resulting byte array. The problem appears when I use these statements:
String fileNameEncrypted = Base64.encodeToString(encrypted, Base64.DEFAULT);
File file = new File(mDirectoryPath + "/" + fileNameEncrypted);
The string "fileNameEncrypted" contains '/' characters (and, with Base64.DEFAULT, line breaks) and maybe other characters that are not allowed in a file name. Besides, the forward slashes are interpreted as subfolders.
How can I solve this problem?
PS: my goal is that the file name can't be read in the Dropbox app.
[EDIT the whole message according to comments]
Because Base64 uses a special character (/) as well as both lower- and upper-case letters, it is not well suited to file names on some operating systems, such as Windows, where "aaa.txt" and "AAA.txt" refer to the same file.
Even the URL-safe variant of Base64 uses both lower- and upper-case letters.
Hexadecimal (Base16) encoding provides a more portable character set, 0-9 and A-F, for storing a byte array:
the character 'A' is 0x41, which you can write in Base16 as "41".
A more complete example:
"test.txt" can be translated to 746573742E747874
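The Base16 round trip needs no library; here is a plain-Java sketch (class name illustrative) that reproduces the "test.txt" example and decodes it back, accepting either hex case.

```java
import java.nio.charset.StandardCharsets;

public class HexName {
    private static final char[] DIGITS = "0123456789ABCDEF".toCharArray();

    // Encodes each byte as two upper-case hex digits
    static String encode(byte[] bytes) {
        char[] out = new char[bytes.length * 2];
        for (int i = 0; i < bytes.length; i++) {
            int v = bytes[i] & 0xff;
            out[2 * i] = DIGITS[v >>> 4];
            out[2 * i + 1] = DIGITS[v & 0x0f];
        }
        return new String(out);
    }

    // Decodes a hex string (either case) back into bytes
    static byte[] decode(String hex) {
        if (hex.length() % 2 != 0) throw new IllegalArgumentException("odd length: " + hex);
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) throw new IllegalArgumentException("not hex: " + hex);
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        String name = encode("test.txt".getBytes(StandardCharsets.UTF_8));
        System.out.println(name); // 746573742E747874
        System.out.println(new String(decode(name), StandardCharsets.UTF_8)); // test.txt
    }
}
```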
If you really need to hide the name, you can combine the encoding with a hash function. Because a hash is a one-way function, it hides the file name, but you will not be able to recover the real name from it.
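A one-way name can be sketched with SHA-256 from the JDK, hex-encoded so it stays a single-case file name (class name illustrative). One caveat: a plain hash of a guessable name such as test.txt can be confirmed by hashing candidate names, so a keyed hash (e.g. HMAC) is the stronger choice when that matters.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashedName {
    // Same input always gives the same 64-character lower-case hex name;
    // the original name cannot be computed back from it.
    static String hashedFileName(String originalName) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha256.digest(originalName.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hashedFileName("test.txt"));
    }
}
```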
If you need a two-way function, you can use a symmetric cipher such as AES with an internal key.
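A reversible variant can be sketched with AES-GCM from javax.crypto: the name is encrypted, a random IV is prepended, and the result is hex-encoded so it stays single-case. The class name and the demo key generation are assumptions for illustration; a real app must store and reuse one key, and a key hard-coded in the APK can be extracted. Because the IV is random, the same name encrypts to a different file name each time.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class EncryptedName {
    private static final int IV_LEN = 12;    // 96-bit nonce, the usual size for GCM
    private static final int TAG_BITS = 128; // authentication tag length in bits
    private static final SecureRandom RANDOM = new SecureRandom();

    // Demo only: generates a fresh AES key (a real app would load its stored key)
    static SecretKey newKey() {
        try {
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(128);
            return kg.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns lower-case hex of IV || ciphertext-with-tag
    static String encryptName(String name, SecretKey key) {
        try {
            byte[] iv = new byte[IV_LEN];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(name.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[IV_LEN + ct.length];
            System.arraycopy(iv, 0, out, 0, IV_LEN);
            System.arraycopy(ct, 0, out, IV_LEN, ct.length);
            StringBuilder sb = new StringBuilder(out.length * 2);
            for (byte b : out) sb.append(String.format("%02x", b & 0xff));
            return sb.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reverses encryptName; fails if the name was altered or the key is wrong
    static String decryptName(String hex, SecretKey key) {
        byte[] in = new byte[hex.length() / 2];
        for (int i = 0; i < in.length; i++) {
            in[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, in, 0, IV_LEN));
            byte[] pt = cipher.doFinal(in, IV_LEN, in.length - IV_LEN);
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalArgumentException("cannot decrypt file name", e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        String stored = encryptName("test.txt", key);
        System.out.println(stored);
        System.out.println(decryptName(stored, key)); // test.txt
    }
}
```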
You can use the Guava library to convert to Base16 or Base32, both of which have a more Windows-friendly character set than Base64:
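import com.google.common.io.BaseEncoding;
import java.nio.charset.StandardCharsets;
byte[] encrypted = "test.txt".getBytes(StandardCharsets.UTF_8); // stands in for the encrypted bytes
BaseEncoding encoder = BaseEncoding.base16().lowerCase();
String newFilename = encoder.encode(encrypted); // "746573742e747874"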
If you want to use Base32, just change the encoder (e.g. BaseEncoding.base32()).
You can also use the Base64 encoder in filename-safe mode; adding NO_WRAP keeps it from inserting line breaks:
Base64.encodeToString(encrypted, Base64.URL_SAFE | Base64.NO_WRAP)
Documentation:
Encoder/decoder flag bit to indicate using the "URL and filename safe" variant of Base64 (see RFC 3548 section 4) where - and _ are used in place of + and /.
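Outside android.util.Base64, the same URL- and filename-safe alphabet is available in java.util.Base64 (on Android from API level 26). A minimal sketch with an illustrative class name; withoutPadding() also drops trailing '=' characters, similar to combining URL_SAFE with NO_PADDING on Android.

```java
import java.util.Base64;

public class UrlSafeName {
    // "URL and Filename safe" alphabet (RFC 4648): '-' and '_' replace '+' and '/'
    static String encode(byte[] data) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(data);
    }

    // The URL decoder accepts input with or without '=' padding
    static byte[] decode(String name) {
        return Base64.getUrlDecoder().decode(name);
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xfb, (byte) 0xff};
        System.out.println(Base64.getEncoder().encodeToString(sample)); // +/8=
        System.out.println(encode(sample));                             // -_8
    }
}
```

Note that this alphabet still mixes upper and lower case, so the case-insensitivity concern raised above for Windows still applies.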

iText as text Extracting/Reading from PDF on android

I'm having a problem with iText. Other people say that iText is only for creating PDFs and cannot read or extract text from a PDF. Is that true?
If it is true, what other options can I use to extract text from a PDF file and store it in a variable or display it on an Android device?
If iText is capable of extracting text from a PDF, then how?
iText can extract text from PDFs. While it is true that it originated as a tool to create new PDFs and manipulate existing ones, in recent years it has also become better and better at extracting text. This obviously implies that you should use a current iText version (5.3.x) for text extraction.
The book "iText in Action, second edition" by the main iText developer, Bruno Lowagie, explains basic iText text extraction in chapter 15, and the samples from that chapter are available in the iText SourceForge SVN repository, cf. Samples for chapter 15. A good starting point is ExtractPageContentSorted2, which extracts the text of a whole page.
If you have special requirements, you may use ExtractPageContentSorted1 as a starting point, which explicitly defines a text extraction strategy; depending on your requirements, you will need your own strategy. If you want the text from a specific region only, look at ExtractPageContentArea.
To really fine-tune the text extraction capabilities of iText, you should have a look at the itext-questions mailing list archive (e.g. at nabble.com), as the iText text extraction API was recently extended to serve additional use cases.
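Use the code below to extract text from a PDF (iText 5.x):
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;
import java.io.File;
import java.io.FileInputStream;
import java.io.StringWriter;
String pat = data.getData().getPath();
// works for file:// URIs; for a content:// URI use getContentResolver().openInputStream(data.getData()) instead
PdfReader reader = new PdfReader(new FileInputStream(new File(pat)));
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
StringWriter strw = new StringWriter();
for (int page = 1; page <= reader.getNumberOfPages(); page++) { // iText page numbers start at 1
    TextExtractionStrategy strategy = parser.processContent(page, new SimpleTextExtractionStrategy());
    strw.write(strategy.getResultantText());
}
reader.close();
String da = strw.toString();
// show the text extracted from the PDF file in the EditText
edt1.setText(da);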
