I have data in a local-language font in my database. I can extract it successfully in PHP (the same local-language word displays correctly on a web page). Now I want to send this data as JSON so that an Android app can access it. When encoding to JSON, is it possible to encode in a different font's Unicode?
After retrieving JSON encoded values, the word can be displayed in font A (but in broken format). But I want to show it using font B (so that broken stuffs are solved in this).
So, is there any way to encode using a different font so that I can fix my issue?
Thanks.
"So, is there any way to encode using different font so that I can fix my issue?"
No. Unicode strings contain only plain semantic text. There are no markup constructs in it that would allow you to choose a different font; that has to be provided by a layer above text such as HTML.
"the word can be displayed in font A (but in broken format). But I want to show it using font B (so that broken stuffs are solved in this)."
The app that consumes the JSON and displays data from it must be altered to display using Font B.
Note that Android support for Indic scripts has historically been notoriously poor and still suffers from bugs; prior to the 4.x series you will probably be missing any kind of Font B, or even correct Devanagari glyph layout.
Per the JSON specification (RFC 8259), JSON exchanged between systems must be encoded in UTF-8.
When you talk about font, I assume you are talking about encoding: text by itself does not specify which font will be used to render it on the user's screen. The best option is to always store your data in UTF-8, so you don't have to recode anything. The major open-source databases (PostgreSQL, MySQL, etc.) can store text in UTF-8. Android's default charset is also UTF-8, so JSON and XML are read as UTF-8 unless you say otherwise.
If your data is encoded in some legacy encoding (like CP1250 or CP1251), you should use an appropriate method to convert that encoding into UTF-8 before creating the JSON. Almost every platform provides a way to do this conversion. One of the popular libraries for this is iconv (it has bindings for PHP, C/C++, etc.), but I am sure there are many other ways to accomplish that.
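For illustration, here is a minimal sketch of that conversion step in Java, using only the standard library instead of iconv. The class and method names are made up for the example, and it assumes the source bytes really are CP1251 (windows-1251); if the stored encoding is something else, the charset name has to change accordingly.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class LegacyToUtf8 {
    /** Decodes bytes stored in a legacy charset and re-encodes the text as UTF-8. */
    public static byte[] toUtf8(byte[] legacyBytes, String legacyCharsetName) {
        // Step 1: interpret the raw bytes using the charset they were written in.
        String text = new String(legacyBytes, Charset.forName(legacyCharsetName));
        // Step 2: write the same text out again as UTF-8.
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A six-letter Cyrillic word as stored in CP1251: one byte per letter.
        byte[] cp1251 = {(byte) 0xCF, (byte) 0xF0, (byte) 0xE8,
                         (byte) 0xE2, (byte) 0xE5, (byte) 0xF2};
        byte[] utf8 = toUtf8(cp1251, "windows-1251");
        // In UTF-8 each of these letters takes two bytes.
        System.out.println(cp1251.length + " bytes in, " + utf8.length + " bytes out");
    }
}
```

The UTF-8 bytes (or the decoded string) are what should go into the JSON encoder; the font used to draw the text is still decided by the app that displays it.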
Related
I am thinking about making a keyboard (for myself and a few friends) with the sprites of the 721 Pokémon in it. First, however, I need to figure out a good way to store these characters. My idea was to store these in unused Unicode characters, but I need 721 of them.
Is there a better way to have custom emoji without overwriting existing ones? If not, what are 721 characters I can use (preferably together, no breaks in between) to store the Pokémon?
As a bonus, how can I store the shiny versions of these Pokémon?
Then, how do I draw these characters using my keyboard?
Encoding
You probably want to store them in a Private Use Area (PUA) block.
There are 3 of them. I would use one of the supplementary ones, as the risk of stumbling on someone else's private use is reduced.
Don't override the existing ones.
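As a rough sketch of what such an assignment could look like in Java: the layout below (index 1 maps to the first code point of Supplementary Private Use Area-A, U+F0000) is purely an assumption for illustration, as are the class and method names. Nothing standardizes it, which is the whole point of a private-use block.

```java
final class PokemonCodePoints {
    // First code point of Supplementary Private Use Area-A (U+F0000..U+FFFFD).
    static final int PUA_A_START = 0xF0000;
    static final int COUNT = 721;

    /** Maps a 1-based index (1..721) to a private-use code point. Hypothetical layout. */
    public static int codePointFor(int index) {
        if (index < 1 || index > COUNT) {
            throw new IllegalArgumentException("index out of range: " + index);
        }
        return PUA_A_START + (index - 1);
    }

    /** Returns a string holding that single code point (two UTF-16 chars in Java). */
    public static String glyphFor(int index) {
        return new String(Character.toChars(codePointFor(index)));
    }

    /** True if Unicode classifies the code point as private use. */
    public static boolean isPrivateUse(int codePoint) {
        return Character.getType(codePoint) == Character.PRIVATE_USE;
    }

    public static void main(String[] args) {
        String glyph = glyphFor(25);
        int cp = glyph.codePointAt(0);
        System.out.printf("U+%X, private use: %b%n", cp, isPrivateUse(cp));
    }
}
```

That block has 65,534 code points, so a second contiguous run of 721 for the shiny versions would fit right after the first one, if you want to go that way.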
Rendering
You will need to use your own font and embed it in your application.
You will need a way to tell the text engine "hey, for characters in this range use this font". So you need some rich text format (e.g. HTML), which means using a WebView.
TextView might also work, if you use a Spanned created with Html.fromHtml. But I am not sure whether that supports specifying an embedded font; you would need to try.
Input
You would need a custom keyboard. There might be an open-source, data-driven one, or one in the store that allows you to customize it. Or you can add them to the dictionary with a shortcut, but then you will need to know all the names (so that you can type p.pikachu). Or you can use a character picker.
Storage
At this point they are strings. You can store them like any other text, send them over the wire, save them to disk, etc.
Currently I am creating an Android application that extracts the main content and picture from a website. Right now I am using the Jsoup API to extract all p tags from the HTML. However, it is not a good solution. Is there a better way to extract the main content and picture from a website on Android?
I didn't find anything that works for me, so I published Goose for Android, here: https://github.com/milosmns/goose
Some description follows...
Document cleaning
When you pass a URL to Goose, the first thing it starts to do is clean
up the document to make it easier to parse. It will go through the
whole document and remove comments, common social network sharing
elements, convert em and other tags to plain text nodes, try to
convert divs used as text nodes to paragraphs, as well as do a general
document cleanup (spaces, new lines, quotes, encoding, etc).
Content / Images Extraction
When dealing with random article links you're bound to come across the
craziest of HTML files. Some sites even like to include 2 or more HTML
files per site. Goose uses a scoring system based on clustering of
English stop words and other factors that you can find in the code.
Goose also does descending scoring so as the nodes move down - the
lower their scores become. The goal is to find the strongest grouping
of text nodes inside a parent container and assume that's the relevant
group of content as long as it's high enough (up) on the page.
Image extraction is the one that takes the longest. Trying to find the
most important image on a page proved to be challenging and required
to download all the images to manually inspect them using external
tools (not all images are considered, Goose checks mime types,
dimensions, byte sizes, compression quality, etc). Java's Image
functions were just too unreliable and inaccurate. On Android, Goose
uses the BitmapFactory class, it is well documented, tested, and is
fast and accurate. Images are analyzed from the top node that Goose
finds the content in, then comes a recursive run outwards trying to
find good images - Goose also checks if those images are ads, banners
or author logos, and ignores them if so.
Output Formatting
Once Goose has the top node where we think the content is, Goose will
try to format the content of that node for the output. For example,
for NLP-type applications, Goose's output formatter will just suck all
the text and ignore everything else, and other (custom) extractors can
be built to offer a more Flipboardy-type experience.
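To make the scoring idea from the "Content / Images Extraction" part more concrete, here is a heavily simplified sketch in plain Java. It is not Goose's actual code: the stop-word list, the linear position weight, and all names are assumptions chosen only to show the principle (count stop words per text block, then discount blocks that sit lower on the page).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class StopWordScorer {
    // Tiny illustrative stop-word list; a real one would be much longer.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "the", "a", "an", "and", "of", "to", "in", "is",
            "it", "that", "was", "for", "on", "with"));

    /** Counts how many words in the text are stop words. */
    public static int stopWordCount(String text) {
        int count = 0;
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (STOP_WORDS.contains(word)) {
                count++;
            }
        }
        return count;
    }

    /** Stop-word count scaled down linearly the further down the page the block sits. */
    public static double score(String text, int position, int total) {
        double positionWeight = 1.0 - (double) position / total; // 1.0 for the first block
        return stopWordCount(text) * positionWeight;
    }

    /** Returns the index of the highest-scoring block, or -1 if no block scores above zero. */
    public static int bestBlock(List<String> blocks) {
        int best = -1;
        double bestScore = 0.0;
        for (int i = 0; i < blocks.size(); i++) {
            double s = score(blocks.get(i), i, blocks.size());
            if (s > bestScore) {
                bestScore = s;
                best = i;
            }
        }
        return best;
    }
}
```

Navigation bars and footers tend to contain few stop words, so even this crude version prefers running prose over menu text. The real library works on DOM nodes and combines more signals than this.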
Why do you think it's not a good solution to use Jsoup?
I've written many web scrapers for different webpages, and in my experience Jsoup is the way to go for that task. You should study the Jsoup selector syntax; it is very powerful, and with the right selectors you can extract most information from HTML documents very easily. Generally, it becomes harder to extract information when the document has no id or class attributes or other unique features.
Other HTML parsers that might be interesting for you are JTidy and TagSoup.
You could try the textracto API; it automatically identifies the main content of HTML documents. It can also parse OpenGraph metadata, so you would be able to extract a picture as well (og:image).
I'm relatively new to mobile app development - I'm kinda learning as I go. I'm creating an app that will serve multiple purposes - notifications, audio/video, etc. One of the features of the app will be to display the contents of an unpublished book (no plans to publish it either via the traditional methods available today). Essentially, I want that part of the app to do the following:
1) Have a menu which will serve as a table of contents.
2) Display the text, which will be in English and Arabic.
3) Have the English text searchable.
4) Have the ability to favorite certain sections of the text.
Just wondering what's the best way to build this? Should I convert sections of my file to html and use webview? Or should I use textview?
I'm looking for the option that gives me the most robustness in terms of functionality, and flexibility when it comes to design (i.e. background images, custom fonts, formatting).
Thanks in advance.
WebView or HTML page is not a very good approach.
You can try an approach in which your data is stored in JSON format in your resources folder (res/raw), and you then parse each element of the JsonObject to populate views dynamically. (If you have a server, you can fetch the data via HttpConnection instead.) For a start, you can see here
Convert your file to an HTML file and display it in a WebView container. If your book is in Word format, convert it to an HTML file with any online converter and then display it in a WebView.
I've just discovered an issue where city names that contain accent marks, e.g. La Cañada, Peñasco, etc., won't save to my database. Looking through the answers to another SO question, What is the best collation to use for MySQL with PHP?, I've tried changing both my database and the varchar's collation type from latin1_swedish_ci to utf8_general_ci which still refused the character. I also tried utf8_unicode_ci with a similar result.
I've verified that the save works if I strip out the accent mark on the client side, but ideally I'd like to keep it, since that is the real name of the city (according to the Google Maps APIs, anyway).
What collation types do you use to support ñ?
Additional info: Using MySQL, phpMyAdmin, and CakePHP with an Android app as the client
Thanks for the suggestions so far. I guess this is turning into a CakePHP question now... I noticed that by default utf8 is not enabled, so I enabled it in my app/config/database.php file. I FTPed the file back to the server and tried it again still without any luck. Do I need to redeploy the application to kick off those db config changes, or is there another area of my application I should check? First time CakePHP user here.
Collation is merely the order in which characters are sorted, which is a necessary step in performing comparisons. It has nothing to do with how data is stored (except insofar as a given collation is specific to some encoding).
Character encoding is the mapping of characters to their binary representation for storage, which determines the supported character set.
However, just because a database/table/column are using a particular character encoding is not the end of the story. You must also consider the encoding used within your application and on its interfaces with other components (such as MySQL).
Whilst it's aimed at PHP, UTF-8 all the way through pretty much covers all of the things you need to consider.
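A small Java sketch of why the interfaces matter as much as the column definition. It simulates one side writing "La Cañada" as UTF-8 and the other side reading the same bytes as Latin-1. The class and method names are invented for the example, and it only models the byte-level mismatch, not MySQL or CakePHP themselves.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingMismatchDemo {
    /** Encodes with one charset and decodes with another, as a misconfigured link would. */
    public static String roundTrip(String text, Charset writeAs, Charset readAs) {
        return new String(text.getBytes(writeAs), readAs);
    }

    public static void main(String[] args) {
        String city = "La Ca\u00F1ada"; // \u00F1 is the letter n with tilde

        // Both ends agree on UTF-8: the text survives unchanged.
        String ok = roundTrip(city, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        // Writer sends UTF-8, reader assumes Latin-1: the two bytes of the
        // accented letter are read as two separate characters.
        String garbled = roundTrip(city, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println("matching charsets preserve text: " + ok.equals(city));
        System.out.println("length before: " + city.length() + ", after mismatch: " + garbled.length());
    }
}
```

No collation setting can repair this, because the damage happens before sorting or comparison ever comes into play. The fix is to make the client, the connection, and the column all agree on the same encoding.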
I'm using an SQLite database in my Android project. I want to store lots of strings containing Polish characters in it. To manage the database I'm using SQLite Database Browser. The problem is: when I import a CSV file filled with strings into the database, my text gets changed from, for example, "Wysyłaj własnoręcznie" to "Wysy³aj w³asnorêcznie". Any ideas how to convert these characters properly?
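Presumably you are opening an InputStream to some source for the CSV text, then wrapping that in an InputStreamReader. You need to specify the proper encoding when creating the InputStreamReader. The text is probably being decoded as ISO-8859-1 while the file is actually in a different encoding. The garbling you show (ł turning into ³, ę into ê) is what a Polish single-byte encoding such as windows-1250 or ISO-8859-2 looks like when read as ISO-8859-1; a UTF-8 file read that way would show two characters for each Polish letter instead. Either way, a mismatch like this explains why characters beyond U+007F are not interpreted correctly.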
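Here is a minimal sketch of passing the charset explicitly, using only standard Java I/O. The helper name is made up, and the sample bytes are an assumption: they are built as windows-1250 because that reproduces the ł to ³ garbling from the question. Your file's real encoding may differ.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CsvCharsetReader {
    /** Reads the first line of a stream, decoding bytes with the given charset. */
    public static String readFirstLine(InputStream in, Charset charset) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Charset cp1250 = Charset.forName("windows-1250");
        String original = "Wysy\u0142aj"; // \u0142 is the Polish l with stroke
        byte[] fileBytes = original.getBytes(cp1250); // stands in for the CSV file contents

        String wrong = readFirstLine(new ByteArrayInputStream(fileBytes), StandardCharsets.ISO_8859_1);
        String right = readFirstLine(new ByteArrayInputStream(fileBytes), cp1250);

        System.out.println("decoded as ISO-8859-1 matches original: " + wrong.equals(original));
        System.out.println("decoded as windows-1250 matches original: " + right.equals(original));
    }
}
```

If the import itself is done by a desktop tool rather than your own code, the equivalent fix is to choose the file's encoding in the import dialog, or to re-save the CSV as UTF-8 first.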