Best Character for splitting Android


I started an Android project, something like a chat program.
The data downloaded from my server looks like this:
1~my name~my username~message
So my question is: is there any character, compatible with Android, that I can use
to replace the delimiter (~) above? I'm afraid that if a user types the
character ~ some day, the program will crash.
I tried the character ÷, but my Android device can't read it; it turns into '?'.
Has anyone had the same problem?

First of all, it is almost always a bad idea to create your own format for client-server communication; my best advice is to give JSON or XML a shot. There are lots of libraries available on both the client side and the server side to build and parse them. All you have to do is have your back-end language return one of these formats.
For Python: http://docs.python.org/library/json.html
For PHP: http://php.net/manual/en/book.json.php
For Android: http://developer.android.com/reference/org/json/JSONObject.html
You can easily find other languages with simple search.

If you're also using Java on the server side, you could define an object like ChatMessage and simply send it to the server over a Socket using an object stream.
As Burak noted, your approach is the wrong one... but there are several alternatives, and IMHO an object stream might be the easiest solution for you.
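As a rough sketch of the object-stream idea (assuming Java on both ends), the hypothetical ChatMessage class below is written with ObjectOutputStream and read back with ObjectInputStream. To keep it runnable without a server, an in-memory byte buffer stands in for the socket; on a real connection you would wrap socket.getOutputStream() and socket.getInputStream() once per connection. Because each field travels as its own value, a ~ inside the message text needs no special treatment.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class ChatMessageDemo {
    // Hypothetical message class; its fields mirror the "1~my name~my username~message" record.
    // The server needs the identical class (same name and serialVersionUID) on its classpath.
    static class ChatMessage implements Serializable {
        private static final long serialVersionUID = 1L;
        final int id;
        final String name;
        final String username;
        final String text;

        ChatMessage(int id, String name, String username, String text) {
            this.id = id;
            this.name = name;
            this.username = username;
            this.text = text;
        }
    }

    // Sender side: with a socket, create ONE ObjectOutputStream per connection and reuse it.
    static void send(ObjectOutputStream out, ChatMessage msg) throws IOException {
        out.writeObject(msg);
        out.flush();
    }

    // Receiver side: likewise one ObjectInputStream per connection.
    static ChatMessage receive(ObjectInputStream in) throws IOException, ClassNotFoundException {
        return (ChatMessage) in.readObject();
    }

    // Round-trips a message through an in-memory buffer instead of a socket.
    static ChatMessage roundTrip(ChatMessage msg) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                send(out, msg);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return receive(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ChatMessage back = roundTrip(new ChatMessage(1, "my name", "my username", "a ~ inside the text"));
        System.out.println(back.username + ": " + back.text);
    }
}
```

Java serialization should only be used between peers you trust, and it ties both ends to Java; that trade-off is why JSON or XML is often preferred.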

If you use a delimiter that can also occur in the data you are delimiting, you will have a problem.
To prevent that, you need to stop the character from occurring in a way that could be misinterpreted.
On the input side, detect occurrences and replace them either with a special code or with an escape-prefixed sequence, or quote the contents (though then you have to handle literal occurrences of the quote character).
If you use an escape character or quoting, your splitting code must ignore any delimiter that follows an escape character or lies within a quoted sequence.
On the output side, replace the codes or escape sequences with a literal instance of the encoded character, or remove the quoting characters.
As others have mentioned, there are a number of standard schemes and functions for handling them.
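A minimal sketch of the escape-character approach, assuming ~ as the delimiter and a backslash as the escape character (both are illustrative choices): join() prefixes every literal ~ or \ inside a field with \, and split() treats the character after a \ as literal, so only unescaped ~ characters separate fields.

```java
import java.util.ArrayList;
import java.util.List;

public class EscapedFields {
    static final char DELIMITER = '~';
    static final char ESCAPE = '\\';

    // Input side: join fields with '~', prefixing any literal '~' or '\' with '\'.
    static String join(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) sb.append(DELIMITER);
            for (char c : fields.get(i).toCharArray()) {
                if (c == DELIMITER || c == ESCAPE) sb.append(ESCAPE);
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Output side: split on unescaped '~' only and drop the escape prefixes.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean escaped = false;
        for (char c : line.toCharArray()) {
            if (escaped) {
                current.append(c); // literal character, even if it is '~' or '\'
                escaped = false;
            } else if (c == ESCAPE) {
                escaped = true;
            } else if (c == DELIMITER) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (escaped) throw new IllegalArgumentException("line ends with a dangling escape character");
        fields.add(current.toString());
        return fields;
    }
}
```

For example, the fields 1, my name, my username and hi ~ there are joined to 1~my name~my username~hi \~ there, and split() returns the original four fields. Escaping the escape character itself is what keeps a field that ends in \ from swallowing the delimiter after it.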

Related

How do the Unicode control characters work?

What I'm trying to do is display a phone number correctly in a right-to-left layout. I want +111111111, but it currently appears as 111111111+. I found a solution using the LRM (left-to-right mark), which is the Unicode control character '\u200E'.
Phone numbers come in several formats in different parts of the world, such as XXX-XXX-XXXX. To prevent further bugs, I need to understand how these control characters work, especially the ones that change the direction of strings.
In my understanding, for ordinary characters:
1) strings are stored as bytes in memory;
2) the editor/TextView loads the bytes and looks them up in Unicode;
3) the editor/TextView renders those Unicode characters using fonts.
So at which step do control characters like LRM take effect? And how can I make sure that using them does not cause further bugs?
I hope I have made this clear.
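One way to see at which step the LRM acts is to run the Unicode Bidirectional Algorithm directly. The sketch below uses java.text.Bidi from the standard JDK (Android's TextView uses its own bidi implementation, assumed here to follow the same Unicode rules) to list the resolved embedding level of each character in a right-to-left paragraph; odd levels are laid out right-to-left, even levels left-to-right. The LRM is just one more stored character (the string gets one char longer, the digits are unchanged); its effect only appears when levels are resolved for layout. Without it, the leading + resolves to an odd level while the digits get an even one, which is why + ends up on the other side; with the LRM in front, + joins the digits' left-to-right level.

```java
import java.text.Bidi;
import java.util.Arrays;

public class LrmDemo {
    static final char LRM = '\u200E'; // LEFT-TO-RIGHT MARK, a strong left-to-right character

    // Resolved embedding level of each char when the paragraph direction is right-to-left.
    static int[] levels(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_RIGHT_TO_LEFT);
        int[] result = new int[text.length()];
        for (int i = 0; i < text.length(); i++) {
            result[i] = bidi.getLevelAt(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String plain = "+111111111";
        String marked = LRM + plain; // storage only gains one invisible char
        System.out.println(plain.length() + " vs " + marked.length());
        System.out.println(Arrays.toString(levels(plain)));
        System.out.println(Arrays.toString(levels(marked)));
    }
}
```

Running the same check on other formats (for example with hyphens) is a cheap way to test a number format before shipping it.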

How to limit the use of certain character sets

I hope this question isn't going to be down-flagged for not showing any actual code, but that's the core of this situation: I simply have no clue where to start solving this issue, even after trying several combinations of keywords on both Google and here on SO.
My client suddenly decided that half of the Android app I'm developing for him has to be in Chinese. After making some changes in the database so that some fields can take Simplified Chinese characters, I need to make sure that my client (living in Holland) only uses those characters in that particular EditText field in the app. (There are more database fields that now only allow Simplified Chinese, but their values come from a dropdown list in the app, so I don't need to worry about wrong characters there.)
So how would one make sure that only Simplified Chinese is used in an EditText field?

Here is a project in Ruby that attempts to detect whether characters are Traditional Chinese, Simplified Chinese, or Japanese (maybe others?): https://github.com/jpatokal/script_detector
This detection is based on the Unihan Database, which contains a file called Unihan_Variants.txt. (Download the zip file containing this text file here.)
Conceivably, you could parse the txt file into a lookup table and check the Unicode value of each character as the text is entered, in onTextChanged() for your EditText. However, the readme of the project linked above states: "It is important to understand that this requires long sections of text to work reliably, since a single character or even several characters may be valid Japanese, traditional Chinese and simplified Chinese simultaneously." So weeding out characters one by one might prove difficult.
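A sketch of the lookup-table idea, under these assumptions: the lines of Unihan_Variants.txt are available as a List<String>, each relevant line has the tab-separated form U+6F22<TAB>kSimplifiedVariant<TAB>U+6C49, and any character whose kSimplifiedVariant is a different character is treated as Traditional-only. The class SimplifiedChineseFilter and its methods are hypothetical; on Android you might call isAcceptable() from a TextWatcher's onTextChanged() or from an InputFilter. As the readme quoted above warns, characters that are valid in both scripts pass this check.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SimplifiedChineseFilter {
    // Code points whose kSimplifiedVariant is some other character, i.e. Traditional-only forms.
    private final Set<Integer> traditionalOnly = new HashSet<>();

    // Builds the lookup table from the lines of Unihan_Variants.txt.
    // Assumed line format: "U+6F22<TAB>kSimplifiedVariant<TAB>U+6C49" (multiple values space-separated).
    SimplifiedChineseFilter(List<String> unihanVariantLines) {
        for (String line : unihanVariantLines) {
            if (line.isEmpty() || line.startsWith("#")) continue; // skip blanks and comments
            String[] fields = line.split("\t");
            if (fields.length < 3 || !fields[1].equals("kSimplifiedVariant")) continue;
            int source = parseCodePoint(fields[0]);
            boolean simplifiesToItself = false;
            for (String value : fields[2].trim().split("\\s+")) {
                if (parseCodePoint(value) == source) simplifiesToItself = true;
            }
            if (!simplifiesToItself) traditionalOnly.add(source);
        }
    }

    // "U+6C49" -> 0x6C49; drops a trailing "<source" annotation if one is present.
    private static int parseCodePoint(String token) {
        int cut = token.indexOf('<');
        String bare = cut >= 0 ? token.substring(0, cut) : token;
        return Integer.parseInt(bare.substring(2), 16);
    }

    // Accepts text only if every code point is a Han ideograph not marked Traditional-only.
    // Latin letters and punctuation (Common script) are rejected; the empty string is accepted.
    boolean isAcceptable(String text) {
        return text.codePoints().allMatch(cp ->
                Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN
                        && !traditionalOnly.contains(cp));
    }
}
```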

Identify type of regex pattern

In my application I am adding EditTexts based on the response provided by the server. For each EditText, the server also provides a regex pattern to match. I am able to match the patterns and do the validations successfully, but I want to identify the type of each regex pattern so that I can open a keyboard suited to the values the EditText should accept.
For example,
if the EditText should accept an email address, a keyboard with the @ sign should open; if it accepts numeric values, the numeric keypad should open.
Is there any library that can return a type such as "Email" or "Number" from a regex pattern, given that there can be many different kinds of regex patterns?
EDIT: I know how to set the input type of an EditText, but I need to find out the type from the regex pattern. I cannot make changes on the server; I have to handle this on the client side.

There most probably isn't one, because there is no way to tell for sure. Anything you come up with will be a heuristic.
Heuristic one:
If the pattern looks for something containing a dot, followed by an @ sign, followed by something containing a dot, it's an email validation.
If the pattern contains only \d, number ranges ([1-5]) or single digits (7), plus repetition metacharacters (?, *, +, {4,12}), it's a number validation.
If the pattern contains \w and no @ sign, it's regular text.
Continue in the same spirit.
+ high control: you can always add new guesses when you see that your results aren't accurate in some case
- requires more code to implement
- requires very good knowledge of regexes
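A rough sketch of heuristic one in Java. The class and enum names are hypothetical, and the string checks only approximate the rules above (for instance, "email" is reduced to "contains @ and an escaped dot", and anchors and groups are ignored when deciding "number").

```java
public class PatternTextHeuristic {
    enum FieldType { EMAIL, NUMBER, TEXT, UNKNOWN }

    // Heuristic one: classify a validation regex by inspecting its source text.
    static FieldType guess(String regex) {
        if (regex.isEmpty()) return FieldType.UNKNOWN;
        // Email: an "@" plus an escaped dot somewhere in the pattern.
        if (regex.contains("@") && regex.contains("\\.")) return FieldType.EMAIL;
        // Number: nothing left after removing \d, digit ranges such as [1-5],
        // quantifiers such as {4,12}, single digits, ?, *, +, anchors and groups.
        String rest = regex
                .replace("\\d", "")
                .replaceAll("\\[[0-9\\-]+\\]", "")
                .replaceAll("\\{\\d+(,\\d*)?\\}", "")
                .replaceAll("[0-9?*+^$()|]", "");
        if (rest.isEmpty()) return FieldType.NUMBER;
        // Regular text: word characters and no "@".
        if (regex.contains("\\w") && !regex.contains("@")) return FieldType.TEXT;
        return FieldType.UNKNOWN;
    }
}
```

With these rules, ^\d{4,12}$ and [1-5] are classified as NUMBER and ^\w+$ as TEXT; every misclassification you find in practice becomes another rule.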
Heuristic two:
Use a list of strings whose type you know, and try to match them against the regex. For example, for emails try example@gmail.com.
+ easy to implement; small chance of problematic logic
- least amount of control: if the server gives you email patterns for specific domains, you can't tell that it is an email pattern unless you know all possible domains
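Heuristic two can be sketched with java.util.regex alone. The probe strings and the class name are illustrative assumptions: each type has one string its regex should accept and one it should reject, so that a catch-all pattern such as .* is not classified as an email. The sketch uses matches(), i.e. it assumes the server's patterns are meant to match the whole input.

```java
import java.util.regex.Pattern;

public class SampleMatcher {
    // Hypothetical probes: {type, string the type's regex should accept, string it should reject}.
    private static final String[][] PROBES = {
        {"Email", "example@gmail.com", "not an email"},
        {"Number", "12345", "abc"},
    };

    // Heuristic two: return the first type whose probes behave as expected.
    static String guessType(String regex) {
        Pattern pattern = Pattern.compile(regex);
        for (String[] probe : PROBES) {
            boolean acceptsPositive = pattern.matcher(probe[1]).matches();
            boolean rejectsNegative = !pattern.matcher(probe[2]).matches();
            if (acceptsPositive && rejectsNegative) return probe[0];
        }
        return "Text"; // fallback when no probe type fits
    }
}
```

A domain-restricted email pattern would reject example@gmail.com and fall through to "Text", which is exactly the weakness listed above.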
Heuristic three:
Use a library that can generate example strings from a regex, and match those strings against your own regexes to determine the type. Here is one for Java and another one for JavaScript.
+ gives a good combination of high control and easy implementation
- you still have to write your own regexes (not as trivial as the 2nd heuristic)
- people sometimes write regexes that allow some false positives. Therefore, generated strings might not be in the perfect format (not as much control as the 1st heuristic)
Are the regexes static?
If yes, you should make a mapping and use that.
If no, use a heuristic like one of the above and improve it over time as you gather more statistics about what the generated regexes usually look like.

Extract text from PDF in code

I'm making an app for my school that people can use to check whether they have a schedule change. All schedule changes are listed here: http://www.augustinianum.eu/roosterwijzigingen/14062012.pdf. I want to search that page for a keyword (the user's group, which is entered in an EditText). I've found out how to make the app check whether the EditText matches a certain string, so now I only need to download all of the text on that page into a string. The problem is that it's not a simple web page but a PDF. I've heard that you need a special PDF library or something similar to extract the text from the PDF, put that text into a string, and then search the string for keywords using contains().
However, I've got some questions about that:
This PDF was made with a PDF creator; it's not a scanned page. You can, for example, select the text or search it for keywords using Ctrl+F. So I wonder whether it is actually necessary to extract the text from the PDF, or whether there is an easier way.
I want the app to check for changes every hour, say. So it also has to download the PDF and extract the text every hour (about 8 pages). Would that consume a lot of battery?
I've heard that there are many libraries that do what I want. Which one should I use? (If possible, I'd like a free one.)
Could anyone explain how to use it in my code? (I'm not very experienced, so please keep it simple.)
Thank you all so much!

Unfortunately, I don't work with Java, so you will have to implement this in Java yourself. Here is how I finally did it (in PHP):
1) I fetched the file from your link. PHP does this with @fopen("http://...").
2) I opened it in binary mode (this is important) and extracted two parts:
2.1) The data of 3 0 obj, which holds the creation and modification dates. I did this with a regex; it was simple, and I describe it in my other answer.
2.2) The data stream of 5 0 obj, which holds the deflated (compressed) data. IMPORTANT! Microsoft Excel inserts the two bytes 0D 0A as a line break. Don't forget this when you filter the content with a regex: these bytes at the start and at the end must not be included in the extracted string.
3) I inflated the compressed data with $uncompressed = @gzuncompress($compressed) and wrote it to an external file. You can see the results there.
4) The funniest part: the raw data inside the file is in a textual format. It looks like [(V)-4(RI)16(J)] TJ, which means VRIJ. You can read about text in PDF in the PDF Reference v1.7, chapter 5.
5) I believe regular expressions can help you extract and/or transform the data.
IMPORTANT: I said "data stream from 5 0 obj", but the object number is subject to change. You must follow the reference to the object through the dictionary -> pages -> page -> contents chain. You can find a description of these "breadcrumbs" in the manual mentioned above.
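A Java sketch of steps 3 and 4, assuming the stream bytes (without the surrounding 0D 0A bytes) have already been cut out of the file. java.util.zip.Inflater reads the same zlib format that PHP's gzuncompress() handles. The TJ parsing is a simplification: it keeps only the literal strings in ( ) and ignores kerning numbers, escaped parentheses, Tj operators and font encodings. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

public class PdfStreamSketch {
    // Step 3: inflate a FlateDecode stream (zlib format, the same format gzuncompress() reads).
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater(); // default constructor expects the zlib header
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated stream or preset dictionary required");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not a valid zlib stream", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Step 4: collect the literal strings of every TJ array,
    // e.g. "[(V)-4(RI)16(J)] TJ" -> "VRIJ"; the kerning numbers are skipped.
    private static final Pattern TJ_ARRAY = Pattern.compile("\\[(.*?)\\]\\s*TJ");
    private static final Pattern LITERAL = Pattern.compile("\\(([^)]*)\\)");

    static String textOfTjOperators(String content) {
        StringBuilder text = new StringBuilder();
        Matcher array = TJ_ARRAY.matcher(content);
        while (array.find()) {
            Matcher literal = LITERAL.matcher(array.group(1));
            while (literal.find()) {
                text.append(literal.group(1));
            }
        }
        return text.toString();
    }
}
```

Decode the inflated bytes with StandardCharsets.ISO_8859_1 before passing them to textOfTjOperators(), so that every byte maps to exactly one char.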
Unfortunately, Excel does not embed any table structure in the PDF, but you can find the coordinates of the text portions and interpret them. Either way, it is a mess.
Do you think, dear Merlin, that it is hard? No, dear, it is not. It is not hard, because there are no Unicode symbols. Unicode in PDF is what really sucks!
Good luck!

This PDF was made by Microsoft Excel and has these date stamps:
3 0 obj
<</Author(Janszen, Jan)
/CreationDate(D:20120613153635+02'00')
/ModDate(D:20120613153635+02'00')
/Producer(Microsoft® Excel® 2010)
/Creator(Microsoft® Excel® 2010)>>
endobj
You can use almost any programming language to fetch the file by URL and extract the "ModDate" value. A new ModDate means the information has been updated. You don't need any libraries to extract this information; it is plain text in the file, at lines 9, 10 and 11.
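A minimal sketch of the ModDate check, assuming the downloaded PDF bytes are decoded as ISO-8859-1 text so that the binary parts don't break the string. The class and method names are hypothetical. The app would store the last timestamp it saw and skip the expensive inflating and parsing when hasChanged() returns false.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ModDateCheck {
    // Matches e.g. /ModDate(D:20120613153635+02'00') and captures the digits after "D:"
    // (PDF dates are D:YYYYMMDDHHmmSS followed by an optional time-zone part).
    private static final Pattern MOD_DATE = Pattern.compile("/ModDate\\(D:(\\d{4,14})");

    // Returns the captured timestamp, or null if the text has no /ModDate entry.
    static String modDate(String pdfText) {
        Matcher m = MOD_DATE.matcher(pdfText);
        return m.find() ? m.group(1) : null;
    }

    // True if the file carries a ModDate different from the one stored after the last check.
    static boolean hasChanged(String previousModDate, String pdfText) {
        String current = modDate(pdfText);
        return current != null && !current.equals(previousModDate);
    }
}
```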
Ask Jan Janszen to add you to the distribution list. The data in the file is encoded, and you would have to use a lot of programming techniques to get to the source and restore the information.

What is the best way to store and work with locale-dependent regular expressions in Android?

I use a bunch of locale-dependent regular expressions in my project.
For example (one of the simplest):
\b(one|two|three|...|\d+)\b
So I want to store those regular expressions in something like values-en/re.xml and then use them through Context/R.re.* to parse the string entered by the user.
<string name="number_re">\b(one|two|three|...|\d+)\b</string>
So if the user has a Russian locale and enters a phrase in Russian, I will use values-ru/re.xml with a value like:
<string name="number_re">\b(один|два|три|...|\d+)\b</string>
Is this the intended way (and will it not fail on special characters that have meaning in both string resources and regular expressions), or is there another way to do it that I've missed?

I don't know what's considered proper or not proper (that's more of an app-by-app case). This seems like a reasonable approach: you have separated the locale-specific aspect of your algorithm into a place that has a mechanism for loading content by locale. Technically, these regexes are resources. If it works for you and you aren't hindered by it now, I don't foresee it coming back to cause issues later.
