Handling phone numbers properly (storing, ideally using a unique form)

This question is not specific to Android but I have included the tag.
I need to be able to store phone numbers in some sort of standard form (ideally a string) so that equality can be tested quickly.
I found some answers already, the best ones pointed to http://developer.android.com/reference/android/telephony/PhoneNumberUtils.html (I'm fine with using a library to do it for me)
But this isn't really good enough. I've tried a variety of number formats and learnt about the Editable factory so that I could use some of the static methods in that class, but they don't seem to return the form I was expecting.
I was expecting something like a phone-number hash: two inputs representing the same number would yield the same "standard form", and one could dial this standard form and be fine. I thought that all the various +s and whatnot would be shorthands for this standard form.
I'm not sure if such a thing exists now.
I understand that some things mean "current area" (or country), which is why landlines can omit area codes. I expected a function that would return the format for the current location (this doesn't apply to mobiles, but for a landline it would prepend the area code, for example), which would be closer to the "standard form" I keep assuming exists.
I am pretty sure that some full form for phone numbers exists. Thinking about how the telephone system works (which, I admit, I am inferring), there ought to be a form that identifies a number uniquely across the whole planet, and where this form is not used (such as local calls from landlines without area codes) it is an optimisation.
So I have two questions:
How can I "expand" a phone number to a unique string for that number, such that any alternate forms of writing that number (with spaces, a 0 or +44....) "expand" to this unique string?
Are there any ISO(/IEC?) (what's the O stand for?) standard documents with drafts open to the public? I've read the Wikipedia page (ages ago; I've spent so many hours wiki-browsing and opened hundreds of tabs), but it only covers history and some information on formatting. I'd like to know more about the thing I've taken for granted now for some 8 years or so.
Additionally, why is Windows Phone 8 a tag? To make the 12 proud Lumia owners not feel left out? (It was suggested as a tag!)
Addendum
Unfortunately there are no solutions in "Any API in android to normalize phone number" (this includes libphonenumber), and my quest to find out has led to some interesting reads:
http://en.wikipedia.org/wiki/Panel_switch
http://en.wikipedia.org/wiki/Nonblocking_minimal_spanning_switch
http://en.wikipedia.org/wiki/Telephone_exchange
and I still cannot conclude there isn't some "full form" for numbers.
I dare not create a solution that simply swaps +44 for a 0 and such.

After reading your question, I was reminded of Google's library called libphonenumber. It's Google's common library for parsing, formatting, storing and validating international phone numbers. It does the following things (some of which you might be able to use):
Parsing/formatting/validating phone numbers for all countries/regions of the world.
getNumberType - gets the type of the number based on the number itself; able to distinguish Fixed-line, Mobile, Toll-free, Premium Rate, Shared Cost, VoIP and Personal Numbers (whenever feasible).
isNumberMatch - gets a confidence level on whether two numbers could be the same.
isPossibleNumber - quickly guesses whether a number is a possible phone number by using only the length information, much faster than a full validation.
isValidNumber - full validation of a phone number for a region using length and prefix information.
AsYouTypeFormatter - formats phone numbers on-the-fly when users enter each digit.
PhoneNumberOfflineGeocoder - provides geographical information related to a phone number.
As far as the international format of phone numbers is concerned, E.164 is the format recommended by the International Telecommunication Union. It defines a numbering plan for the world-wide public switched telephone network and is a general format for international telephone numbers (usually written starting with + followed by the country code, area code and the number).
Using the above library, the validity of a phone number can be checked if you supply the international code along with it (for example, 1 for the US and Canada). If you don't have the code but you know the country the number belongs to, you can still validate it. You can also convert all the valid numbers to one standard E.164 format using this library, or 'expand' a number into the local national format of that particular country. You can save the result as a String as well. Note that the library's main class is named PhoneNumberUtil, which is separate from the Android PhoneNumberUtils class that you mentioned in your question.
I am not sure if this is what you are looking for but I hope this information helps you.

Related

What this string means? [duplicate]

ua:fa95ebdb-6da9-498c-aabb-77c7baaa28d3
When I woke up this morning, this "code" was the last text copied on my phone (Samsung, Android version 12). Of course it was not me who copied this string, and I spent the night alone. I saw it when opening the GBoard keyboard, which offered to paste it. I haven't enabled the clipboard for this keyboard, so it must have been copied less than an hour before I woke up.
At first I thought I had accidentally typed this text while sleeping (my phone is next to me all night, not turned off). But looking closer, I saw that it is a hex code. While searching on the net, I saw that ua can mean "user agent", but I couldn't find what this hexadecimal string means...
Does anyone recognise this kind of string? Or have any idea what could have caused it to be copied on my phone? I admit it scares me...

The format of this string is called "UUID" (universally unique identifier), or "GUID" (globally ...) in some places.
It represents a series of 128 bits which are usually chosen at random when a unique ID for anything is needed.
The collision probability on 128 random bits is so low that they are considered unique (or more precisely: unique enough) even without a central coordinating instance that guarantees global uniqueness.
That being said - if "ua:" stands for "user agent" in this case, then it seems to be a string identifying your browser, and it might have gotten into your clipboard from a badly programmed tracking script on some website you visited.
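To check whether a string like the one in the question really has UUID shape, it can be parsed with the standard java.util.UUID class. This is only a sketch: the class name UuidInspect and the helper parseTagged are made up for illustration, and treating everything before the colon as a label (such as "ua") is an assumption about the format, not something the string itself confirms.

```java
import java.util.UUID;

final class UuidInspect {
    /**
     * Hypothetical helper: drops an optional "label:" prefix (e.g. "ua:")
     * and parses the remainder as a UUID.
     * Throws IllegalArgumentException if the remainder is not UUID-shaped.
     */
    static UUID parseTagged(String text) {
        int colon = text.indexOf(':');
        String candidate = colon >= 0 ? text.substring(colon + 1) : text;
        return UUID.fromString(candidate.trim());
    }

    public static void main(String[] args) {
        UUID id = parseTagged("ua:fa95ebdb-6da9-498c-aabb-77c7baaa28d3");
        // version() reports how the UUID was generated; 4 means random bits.
        System.out.println("version = " + id.version());
        System.out.println("variant = " + id.variant());
    }
}
```

For the string from the question, version() returns 4, i.e. a randomly generated UUID, which fits the description above. It says nothing about which app or website produced it.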

This looks like an app's internal ID (GUID) used to identify an entity. Within the context of the app in question, copying and pasting something results in the entity with that ID being copied. But if you paste it outside the app, you see the entity's ID instead. This is a common technique in app development.

How to get the current locale's alphabet?

Background
Today I've noticed that on Google's Contacts app, if you have both English and Hebrew contacts, and you switch to English locale as the main one, the first contacts are in English:
But, if you switch to Hebrew locale as the main one, the first contacts are in Hebrew:
The problem
I don't see which functions are used to do that. I tried to search over the Internet about this behavior and how it's done, but couldn't find it.
Comparing the values of characters will always return the same result, so the order here should be more dynamic.
What I've found
I thought this would help me:
val unicodeLocaleKeys = Locale.getDefault().unicodeLocaleKeys
But it always returns an empty set.
I also searched for such a function in classes such as Character, Unicode*, and String. I don't think it exists there.
The question
How does Google Contacts app get to sort the contacts by the current locales?
Is it possible perhaps to get the whole set of characters used by a specific locale?
Maybe it's possible to compare characters, while giving order of priorities of locales (users can choose multiple locales) ?

Maybe you are looking at the wrong thing.
The Contacts app does not seem to have an alphabet built in (per locale), but just uses a collation (locale-specific sort) and displays the first character. Possibly it detects "symbols" (by Unicode category) and puts all symbols in the same bin.
You can also get the script name (and the direction) from Unicode, and you may find the alphabet in a few places (e.g. Wikipedia). That will fail for Chinese and other scripts with very large character sets. The problem is that the "alphabet" is language-specific: in some European countries you may have (some) accented characters, or character groups that are treated as a single letter (also in phone books).
So, if you want to keep things simple:
use collation and just the first character
the same, but remove the accent and check whether the letter has the same priority in alphabetical order: if it does, ignore the accent, otherwise keep it (see e.g. Å - place in alphabet). Maybe do the same with two-letter groups, e.g. ll in the past.
find a library which handles such complex cases (and which is updated regularly). This will probably help for Chinese and other languages with a huge number of characters.
EDIT: in short, instead of normal sorting of strings using str1.compareTo(str2), you should use :
Collator.getInstance().compare(str1,str2)
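As a rough illustration of the difference between the two comparisons, here is a small sketch using only java.text.Collator from the standard library. The class name LocaleSort and the helper sortFor are made up for this example, and how any particular locale orders different scripts depends on the collation data of the platform, so this is not a claim about what the Contacts app does internally.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class LocaleSort {
    /** Returns a copy of names sorted with the collation rules of the given locale. */
    static List<String> sortFor(Locale locale, List<String> names) {
        Collator collator = Collator.getInstance(locale);
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(collator); // Collator is a Comparator, so it can be passed directly
        return sorted;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("cherry", "Banana", "apple");

        // Plain String ordering compares UTF-16 code units, so 'B' sorts before 'a'.
        List<String> byCodeUnit = new ArrayList<>(names);
        byCodeUnit.sort(String::compareTo);
        System.out.println(byCodeUnit);

        // Collation orders by letter first, so "apple" comes before "Banana".
        System.out.println(sortFor(Locale.ENGLISH, names));
    }
}
```

Passing the user's current locale (for example Locale.getDefault()) instead of Locale.ENGLISH gives the ordering for that locale.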

How to limit the use of certain character sets

I hope this question isn't going to be down-flagged for not showing any actual code, but that's the core of the situation: I simply have no clue where to start, even after trying several combinations of keywords on both Google and here on SO.
My client suddenly decided that half of the Android app I'm developing for him has to be in Chinese. I have made some changes in the database so that some fields can take Simplified Chinese characters, and now I need to make sure that my client (living in Holland) only uses those characters in one particular EditText field in the app. (There are more database fields that now only allow Simplified Chinese, but their values come from a dropdown list in the app, so I don't need to worry about wrong characters for them.)
So how would one make sure that only Simplified Chinese is used in an EditText field?

Here is a project in Ruby that attempts to detect whether characters are Traditional Chinese, Simplified Chinese, or Japanese (maybe others?): https://github.com/jpatokal/script_detector
This detection is based on the Unihan Database, in which there is a file called Unihan_Variants.txt. (Download zip file containing this text file here.)
Conceivably, you could parse the txt file into a lookup table and check the unicode value as the text is entered during onTextChanged() for your EditText. However, the readme on the project linked above states: "It is important to understand that this requires long sections of text to work reliably, since a single character or even several characters may be valid Japanese, traditional Chinese and simplified Chinese simultaneously." So, weeding out characters on an individual basis might prove difficult.
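As a much coarser first step than the Unihan lookup table described above, the standard library can at least tell whether each code point belongs to the Han script. The sketch below assumes that is an acceptable pre-filter; the class name HanFilter and the method isAllHan are made up for illustration. It does not distinguish Simplified from Traditional Chinese (or from Japanese kanji), which is exactly the hard part the readme warns about.

```java
final class HanFilter {
    /**
     * Returns true if the text is non-empty and every code point is in the Han script.
     * Assumption: punctuation, digits and Latin letters should all be rejected.
     * This cannot tell Simplified from Traditional characters.
     */
    static boolean isAllHan(CharSequence text) {
        return text.length() > 0
                && text.codePoints().allMatch(
                        cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    public static void main(String[] args) {
        // Unicode escapes are used so the source file stays ASCII-only.
        System.out.println(isAllHan("\u6c49\u5b57")); // two Han characters
        System.out.println(isAllHan("abc"));          // Latin letters
    }
}
```

A check like this could be called from the EditText's text-change callback, with the Unihan-based lookup layered on top if per-character Simplified/Traditional data turns out to be reliable enough.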

Validating phone number in android

I want to validate phone number.
Phone Number = countryCode+areaCode+PhoneNumber.
I want to know the standard minimum and maximum length of a phone number so that I can validate it.
-thanks

Try this regex:
^[+]?[0-9]{10,13}$
Or this:
PhoneNumberUtils.isGlobalPhoneNumber("+912012185234");
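For reference, this is roughly how the regex above could be applied with java.util.regex. The class name PhoneShape and the method looksLikePhoneNumber are made up for this sketch. The 10 to 13 digit range is simply the one from the pattern above; E.164 itself allows up to 15 digits, so the bounds may need adjusting, and a pattern like this only checks the shape of the input, not whether the number actually exists.

```java
import java.util.regex.Pattern;

final class PhoneShape {
    // Optional leading '+', then 10 to 13 digits and nothing else.
    private static final Pattern SHAPE = Pattern.compile("^[+]?[0-9]{10,13}$");

    /** Returns true if the input matches the digit-count pattern above. */
    static boolean looksLikePhoneNumber(String input) {
        return input != null && SHAPE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikePhoneNumber("+912012185234"));    // '+' and 12 digits
        System.out.println(looksLikePhoneNumber("+44 20 7946 0958")); // contains spaces
    }
}
```

Spaces, dashes and brackets are rejected as written, so user input would need to be stripped of formatting characters first.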

It depends on whether you are accepting any country code or are looking for specific country codes.
It also depends on how many digits you are expecting for the area code and for the actual phone number, and on whether these arrive as separate variables or as one string.
Once you have answered these questions, you can use a regex (learn how to use it, and test your code), as #Basil has suggested, but vary the length (and possibly the digits, if there are limited country codes) accordingly.
Here is a good answer on using regex in Android.

How To Detect Is Text Human Readable?

I am wondering if there's a way to tell a given text is human readable. By human readable, I mean: it has some meanings, format like an article written by somebody, or at least generated by a software translator that is intended to be read by a human.
Here's the background story: recently I have been making an app that allows users to upload a short text to a database. At the early stage of deployment I noticed that some users kept uploading corrupted text due to a problem with encoding. This problem was fixed later, but it left me wondering whether there's a way to pick up non-human-readable text before serving the text back to users.
Any advice will be appreciated. The scope might be too large to include other languages, so at the moment let's limit the discussion to English only.

You can try a language identification tool, or something similar.
Basically you have to count the characters, or groups of characters (character n-grams), and compare the distribution of the letters in the submitted text with the distribution of the letters in a collection of texts written in good English. (Make sure that this collection of texts is representative of the expected input.)
As a continuation of the n-gram approach, you might want to try a dictionary-based approach and check for the presence of 'stop words' (e.g. 'the', 'a', 'an', 'of') in the input text.
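A minimal sketch of the stop-word check, using only the standard library. The class name ReadabilityHeuristic, the method stopWordRatio and the ten-word list are all assumptions made for illustration; a real list would be longer, and any cut-off on the ratio would have to be tuned on texts that are representative of the expected input.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class ReadabilityHeuristic {
    // Small illustrative stop-word list; not a standard or complete one.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "the", "a", "an", "of", "and", "to", "in", "is", "it", "that"));

    /** Returns the fraction of alphabetic tokens that are stop words (0.0 if there are none). */
    static double stopWordRatio(String text) {
        // Lower-case, then split on anything that is not an ASCII letter.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^a-z]+");
        int total = 0;
        int hits = 0;
        for (String token : tokens) {
            if (token.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            total++;
            if (STOP_WORDS.contains(token)) {
                hits++;
            }
        }
        return total == 0 ? 0.0 : (double) hits / total;
    }

    public static void main(String[] args) {
        System.out.println(stopWordRatio("The cat sat on the mat in the house"));
        System.out.println(stopWordRatio("xqzv kjhg wrtp"));
    }
}
```

Ordinary English prose tends to score well above zero, while corrupted or random text scores at or near zero. Very short inputs are unreliable either way.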

Most NLP libraries will do the job (spaCy is a very common one). You can also go for language detection: langdetect (https://pypi.org/project/langdetect/) will support you on this, as will many others. If you need to be less specific (more math than language), you should look into phonotactics (with BLICK for Python: https://github.com/mmcauliffe/python-BLICK), which looks at how the characters in a string are ordered.

Do a hexdump and make sure each character is less than or equal to 0x7f.
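The same check can be done in code instead of by eye. The sketch below assumes the text is already available as a Java String; the class name AsciiCheck and the method isAscii are made up for illustration. Note that it only detects non-ASCII characters: it will also reject legitimate English text containing curly quotes or accented names, and it will accept ASCII gibberish.

```java
final class AsciiCheck {
    /** Returns true if every UTF-16 code unit in the text is at most 0x7f. */
    static boolean isAscii(CharSequence text) {
        return text.chars().allMatch(c -> c <= 0x7f);
    }

    public static void main(String[] args) {
        System.out.println(isAscii("plain text"));
        System.out.println(isAscii("caf\u00e9")); // contains U+00E9
    }
}
```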

Develop Reference

The Android operating system is a mobile operating system developed by Google, primarily for touchscreen devices such as cell phones and tablets.