I'm making an app for my school that people can use to check whether they have a schedule change. All schedule changes are listed here: http://www.augustinianum.eu/roosterwijzigingen/14062012.pdf. I want to search that page for a keyword (the user's group, which is entered in an EditText). I've found out how to make the app check whether the EditText matches a certain string, so now I only need to download all of the text on that page into a string. The problem is that it's not a simple web page but a PDF. I've heard that you need a special PDF library to extract the text from the PDF, put that text into a string, and then search the string for keywords using contains().
However I've got some questions about that:
This PDF is made with a PDF-creator, it's not a scanned page or so. You can actually for example select the text or search it for keywords using CTRL+F. So I wonder if it is actually required to extract the PDF and stuff or is there maybe an easier way.
I want the app to check for changes every hour, let's say. So it also has to download the PDF (about 8 pages) and extract the text every hour. Would that consume a lot of battery?
I've heard that there are many many libraries which do what I want. So which should I use? (If possible, I'd like one which is free :))
Could anyone explain to me how to use it in my code? (I'm not really experienced, so plz keep it a little easy :))
THANK YOU ALL SO MUCH!!!
Unfortunately, I have not worked with Java, so you will have to implement this in Java yourself. Here is how I finally did it in PHP:
1) I fetched the file from your link. PHP does this with fopen("http://...").
2) I opened it as binary (this is important) and extracted two parts:
2.1) The data of the 3 0 obj part, which holds the creation and modification dates. I did this with a regex. It was simple, and I mentioned it in my other answer.
2.2) The data stream from 5 0 obj, which holds the deflated data. IMPORTANT! Microsoft Excel inserts the two bytes 0D 0A as a line break. Do not forget this when you filter the content with a regexp. These bytes at the start and at the end must not be included in the extracted string.
3) I inflated the encoded data with $uncompressed = gzuncompress($compressed) and wrote it to an external file. You can see results there
4) The funniest part: the raw data inside that file is in a textual format. It looks like [(V)-4(RI)16(J)] TJ, which means VRIJ. You can read about text in PDF in the PDF Reference v1.7, part 5.
5) I believe regular expressions can help you extract and/or transform the data.
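Since the question asks for Java, steps 3 and 4 above could be sketched roughly as follows. This is only an illustration under assumptions: the class and method names are made up, the compressed bytes are assumed to have already been cut out of the file (without the surrounding 0D 0A), and the regexes ignore escaped parentheses and brackets inside strings. java.util.zip.Inflater reads the same zlib format that gzuncompress handles.

```java
import java.io.ByteArrayOutputStream;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical helper, not part of any library.
final class PdfStreamSketch {
    // Matches one text-showing array such as [(V)-4(RI)16(J)] TJ
    private static final Pattern TJ_ARRAY = Pattern.compile("\\[(.*?)\\]\\s*TJ");
    // Matches one literal string inside the array; escaped parentheses are not handled
    private static final Pattern LITERAL = Pattern.compile("\\(([^)]*)\\)");

    // Step 3: inflate the zlib-compressed stream data.
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated input or a preset dictionary: give up
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not a zlib stream", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Step 4: join the string pieces of every TJ array, one array per output line.
    static String textOf(String contentStream) {
        StringBuilder text = new StringBuilder();
        Matcher array = TJ_ARRAY.matcher(contentStream);
        while (array.find()) {
            Matcher literal = LITERAL.matcher(array.group(1));
            while (literal.find()) {
                text.append(literal.group(1));
            }
            text.append('\n');
        }
        return text.toString();
    }
}
```

The resulting string could then be searched with contains(), as the question intends. The numbers between the string pieces (-4, 16) are spacing adjustments and are simply dropped here.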
IMPORTANT: I said "data stream from 5 0 obj", but the number of the object is subject to change. You must follow the reference to the object through the dictionary->pages->page->content chain. A description of these "bread crumbs" can be found in the manual I mentioned above.
Unfortunately, Excel does not embed any table structure in the PDF, but you can find the coordinates of the text portions and interpret them. Anyway, it is a mess.
Do you think, dear Merlin, that it is hard? No, dear, it is not. It is not hard, because there are no Unicode symbols. Unicode in PDF is THE REAL SUCK!
Good luck!
This PDF was made by Microsoft Excel and has these date stamps:
3 0 obj
<</Author(Janszen, Jan)
/CreationDate(D:20120613153635+02'00')
/ModDate(D:20120613153635+02'00')
/Producer(Microsoft® Excel® 2010)
/Creator(Microsoft® Excel® 2010)>>
endobj
You can use almost any programming language to fetch the file by URL and extract the "ModDate" content. A new ModDate means the information was updated. You don't need any libraries to extract this information: it is plain text in the file, on lines 9, 10 and 11.
Ask Jan Janszen to add you to the distribution list. The data in the file is encoded. You would have to use a lot of programming techniques to reach the source and restore the information.
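In Java, reading the ModDate could look roughly like the sketch below. The class and method names are hypothetical, and downloading the bytes is left out; the method only assumes it receives the raw file content.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class ModDateSketch {
    // Captures the yyyyMMddHHmmss digits of an entry like /ModDate(D:20120613153635+02'00')
    private static final Pattern MOD_DATE = Pattern.compile("/ModDate\\s*\\(D:(\\d{14})");

    // Returns the 14 date digits, or null if the file has no matching /ModDate entry.
    static String modDateOf(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary parts cannot break decoding.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        Matcher m = MOD_DATE.matcher(raw);
        return m.find() ? m.group(1) : null;
    }
}
```

An app could store the last value it saw and treat any different value as a sign that the schedule was regenerated.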
I hope this question isn't going to be down-flagged for not showing any actual code, but that's the core of the situation: I simply have no clue where to start solving this issue, even after trying several combinations of keywords on both Google and here on SO.
My client suddenly decided that half of the Android app I'm developing for him has to be in Chinese. I have changed the database so that some fields can take Simplified Chinese character sets, and now I need to make sure that my client (who lives in Holland) only uses those characters in one particular EditText field in the app. (There are more database fields that now only allow Simplified Chinese, but their values come from a dropdown list in the app, so I don't need to worry about wrong characters there.)
So how would one make sure that only Simplified Chinese is used in an EditText field?
Here is a project in Ruby that attempts to detect whether characters are Traditional Chinese, Simplified Chinese, or Japanese (maybe others?): https://github.com/jpatokal/script_detector
This detection is based on the Unihan Database, in which there is a file called Unihan_Variants.txt. (Download zip file containing this text file here.)
Conceivably, you could parse the txt file into a lookup table and check the unicode value as the text is entered during onTextChanged() for your EditText. However, the readme on the project linked above states: "It is important to understand that this requires long sections of text to work reliably, since a single character or even several characters may be valid Japanese, traditional Chinese and simplified Chinese simultaneously." So, weeding out characters on an individual basis might prove difficult.
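A lookup table of that kind could be sketched as below. This is an assumption-laden illustration: the class name is made up, and it assumes each data line of Unihan_Variants.txt has the tab-separated form U+XXXX, field name, value list, with kSimplifiedVariant as the field of interest. It can only reject characters that are known to have a different simplified form; as the readme warns, it cannot prove that a given character is Simplified Chinese.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of any library.
final class SimplifiedCheckSketch {
    // Code points that have a kSimplifiedVariant other than themselves,
    // in other words characters that look traditional-only.
    private final Set<Integer> traditionalOnly = new HashSet<>();

    SimplifiedCheckSketch(Iterable<String> unihanVariantLines) {
        for (String line : unihanVariantLines) {
            if (line.isEmpty() || line.startsWith("#")) continue; // skip comments
            String[] fields = line.split("\t");
            if (fields.length < 3 || !fields[1].equals("kSimplifiedVariant")) continue;
            int source = parse(fields[0]);
            boolean mapsToItself = false;
            for (String target : fields[2].split(" ")) {
                if (parse(target) == source) mapsToItself = true;
            }
            if (!mapsToItself) traditionalOnly.add(source);
        }
    }

    // "U+5B78" -> 0x5B78; any "<source" suffix after the code point is dropped.
    private static int parse(String token) {
        int cut = token.indexOf('<');
        String codePoint = cut >= 0 ? token.substring(0, cut) : token;
        return Integer.parseInt(codePoint.substring(2), 16);
    }

    // True if no character in the text is known to have a distinct simplified form.
    boolean looksSimplified(CharSequence text) {
        for (int i = 0; i < text.length(); ) {
            int cp = Character.codePointAt(text, i);
            if (traditionalOnly.contains(cp)) return false;
            i += Character.charCount(cp);
        }
        return true;
    }
}
```

A TextWatcher attached to the EditText could call looksSimplified() on the current text and flag or reject the input when it returns false.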
I am currently developing an Android app which is a dictionary, where I am fetching meanings online with the Wiktionary API like this: http://en.wiktionary.org/w/api.php?action=query&prop=revisions&titles=overflow&rvprop=content&format=jsonfm
But I want to download the Wiktionary database offline and embed it inside my Android App.
Here is the Wiktionary Database Download Page:
1. Wiktionary
2. Wikimedia Downloads
According to my research, the Wiktionary offline database is available as XML and SQL. But these files are too big, and embedding them would make the APK huge. So is there any way to embed this easily in my app?
The developer [ of English Dictionary - Offline ] claims that they are using Wiktionary. I am still wondering where they got a Wiktionary dump file >22 MB
I'm not being paid enough to tell you that... (joke). The thing is, you need to extract the dictionary entries from the XML files; once you keep only those, the final content (text) file becomes much smaller.
Alternatively...
You can try this TSV file (courtesy of semisignal.com), which is a snapshot of the November 2012 definitions. It contains most words that an end user checking English would need. The TSV is 54 MB and is handled like a text file.
Try a definition, for example brushable. The TSV has the lines below (compare with Wiktionary's entry for brushable):
English brushable Adjective # Able to be [[brushed]]
English brushable Adjective # Able to be controlled by [[brushing]]
TIPS: To reduce the file size, you can trim off the leading "English", since you already know that all entries are English definitions. Each trim will save you 7 bytes (multiply by the total number of definitions).
Use a String.replace on "English " (with that space) to clear it.
Also replace "Adjective", "Verb" and "Noun" with short codes that your app understands and can show as the entry type in the user interface. For example, the code 1 could mean "list this entry as Adjective".
Your trimmed text file could look like the example below. Each double full stop just means "next section of entry", so the layout is basically entry..type..definition, where <xyz> is a link to another entry in the dictionary. The 54-byte TSV entry now becomes 35 bytes for that one line.
brushable..1..Able to be <brushed>.
Save the final edited (reduced) text file. Embed that into your APK.
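The trimming described in the tips could be done with a small Java routine like the sketch below. It assumes the TSV columns are language, word, type and definition separated by tabs, and that the definition starts with "# ". Only the code 1 for Adjective comes from the example; the other codes and all names are made up for illustration. Links with a pipe such as [[a|b]] are not handled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of any library.
final class TsvCompactSketch {
    private static final Map<String, String> TYPE_CODES = new HashMap<>();
    static {
        TYPE_CODES.put("Adjective", "1"); // code used in the example entry
        TYPE_CODES.put("Verb", "2");      // hypothetical code
        TYPE_CODES.put("Noun", "3");      // hypothetical code
    }

    // Turns one TSV line into the compact entry..type..definition form.
    static String compact(String tsvLine) {
        String[] f = tsvLine.split("\t", 4);
        if (f.length < 4) return tsvLine; // unexpected layout: leave the line alone
        String type = TYPE_CODES.getOrDefault(f[2], f[2]); // unknown types stay spelled out
        String definition = f[3].replaceFirst("^#\\s*", "") // drop the leading "# "
                                .replace("[[", "<")
                                .replace("]]", ">");
        return f[1] + ".." + type + ".." + definition;
    }
}
```

Running every line of the TSV through compact() once, on a desktop machine, would produce the reduced file that gets embedded in the APK.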
I suggest implementing the online API access first, so that a small app can be downloaded and used, and adding a button somewhere that downloads the offline part. Also check the network connection, and if it's not Wi-Fi, warn the user so that the mobile data plan is not used up by downloading a 100 MB dictionary.
I want to create a Word document within an Android app and send that document by mail.
Is there any tutorial for creating a Word document on Android? I have gone through several other questions on this website, but I didn't get a clear answer.
Can we do this on Android using Apache POI?
If any sample exists, please mention it.
Thanks in advance.
You can use any Java library in Android, so I do think this would be the way you could accomplish what you want (using Apache POI).
You can send the attachment by adding it as an extra to the Intent you use to create a mail message (lots of examples of that).
Apache POI looks like your best bet, but note that the component that deals with Word docs only supports simple files:
HWPF and XWPF for Word Documents
HWPF is our port of the Microsoft Word 97 (-2003) file format to pure Java. It supports read, and limited write capabilities. It also provides simple text extraction support for the older Word 6 and Word 95 formats. Please see the HWPF project page for more information. This component remains in early stages of development. It can already read and write simple files.
We are also working on the XWPF for the WordprocessingML (2007+) format from the OOXML specification. This provides read and write support for simpler files, along with text extraction capabilities.
You should seriously consider whether you can use a different format for your emails - plain text, or maybe HTML.
I started an Android project, something like a chat program.
The data downloaded from my server looks like this:
1~my name~my username~message
So my question is: is there any character that works on Android that I could use to replace the delimiter (~) above? I'm afraid that if a user types the character ~ some day, the program will crash.
I tried the character ÷, but my Android device can't read it; it turns into '?'.
Has anyone had the same problem?
First of all, it is almost always a bad idea to create your own format for client-server communication. My best advice is to give JSON or XML a shot. There are lots of libraries available on both the client side and the server side to build and parse them; all you have to do is use your back-end language to return one of these formats.
For python : http://docs.python.org/library/json.html
For php : http://php.net/manual/en/book.json.php
For Android : http://developer.android.com/reference/org/json/JSONObject.html
You can easily find other languages with simple search.
If you're also using Java on the server side, you could define an object like ChatMessage and just send it to the server over a Socket using an object stream.
As Burak noted, your way is the wrong way... but there are several other ways. IMHO an object stream might be the easiest solution for you.
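A minimal sketch of the object stream idea follows. The ChatMessage fields are guessed from the sample line in the question (1~my name~my username~message), and in-memory byte streams stand in for the socket's streams so the example runs on its own; with a real connection you would wrap socket.getOutputStream() and socket.getInputStream() instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical message type; the same class must exist on client and server.
final class ChatMessage implements Serializable {
    private static final long serialVersionUID = 1L;
    final int id;
    final String name;
    final String username;
    final String message;

    ChatMessage(int id, String name, String username, String message) {
        this.id = id;
        this.name = name;
        this.username = username;
        this.message = message;
    }
}

final class ObjectStreamSketch {
    // Serializes a message; no delimiter is involved, so '~' in the text is harmless.
    static byte[] write(ChatMessage message) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(message);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads a message back from the serialized bytes.
    static ChatMessage read(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (ChatMessage) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that this only works when both ends run Java and share the class, which is the condition the answer states.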
If you use a delimiter which is a possible content of the data put into the flow you are delimiting, you will have a problem.
To prevent that, you need to prevent the character from occurring in a way that could be misinterpreted.
At the input side, detect occurrences and replace them with a special code, prefix them with an escape character, or quote the contents (though then you have to handle literal occurrences of the quote characters).
If you use an escaping character, your splitting code must ignore any delimiter following an escape character or within a quoted sequence.
At the output side you should replace the codes or escape sequences with a literal instance of the encoded character or remove any quoting characters.
As others have mentioned, there are a number of standard schemes and functions for handling them.
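The escape-character variant could be sketched like this, keeping ~ as the delimiter from the question. The choice of backslash as the escape character and all names are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
final class DelimiterEscapeSketch {
    private static final char DELIMITER = '~';
    private static final char ESCAPE = '\\';

    // Input side: prefix every literal delimiter or escape character with the escape character.
    static String join(List<String> fields) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) out.append(DELIMITER);
            for (char c : fields.get(i).toCharArray()) {
                if (c == DELIMITER || c == ESCAPE) out.append(ESCAPE);
                out.append(c);
            }
        }
        return out.toString();
    }

    // Output side: split only on unescaped delimiters and drop the escape prefixes.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean escaped = false;
        for (char c : line.toCharArray()) {
            if (escaped) {
                current.append(c); // taken literally, whatever it is
                escaped = false;
            } else if (c == ESCAPE) {
                escaped = true;
            } else if (c == DELIMITER) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```

With this in place, a message containing ~ travels as \~ on the wire and comes back unchanged after split().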
Is there a definitive method of creating either a PDF or an MS Word doc file within the app and emailing it immediately (and possibly also storing it)?
I have been trying for quite some time and have found the Java libraries apwlibrary and iText, but neither of them provides any tutorials of sorts.
Could anyone point me in the right direction?
EDIT: Come to think of it, could an online PDF generator be used, by first sending the data to the service, then retrieving the result and saving it on the phone?
I would recommend Apache FOP: http://xmlgraphics.apache.org/fop/
You can use standard FOP to generate PDFs.
Unless creating a PDF file is a core feature of your app, I would suggest not doing it yourself. Adding PDF creation is potentially going to be quite a lot of work, depending on your performance needs. Java libraries will be easier to add but less performant. Native libraries combined with Java will be more of a hassle to maintain, both for builds and for bug fixing.
If you just need to email some information, why don't you create the message text in HTML and use an intent to send it with the built-in email program instead? Or, if you want, you could put the PDF generation on a server and just email a link.
I'm working right now with JasperReports, an open source library to create reports in Java and export them to PDF, DOC, XLS... Using it in conjunction with iReport to create a group of templates makes it really easy to create files filled with content from different types of sources (I'm using JavaBeans).
If you don't like the idea of having static templates (That's a bit annoying depending on your needs), you can always take a look at DynamicJasper (The examples on the website are great).
Good Luck!
I have used Apache POI. It seemed to work well. http://poi.apache.org/
This actually, http://poi.apache.org/hwpf/