I am currently developing an Android app, but this question applies to almost any kind of software development.
My question is this: say your program or app saves a text file that contains:
"hello hello hello world!!!"
and you save it. One day your app loads that file and adds a single character, "a", within the original text, so the file now has to contain
"hello hello a hello world!!!"
If you then want to save the changed file, is the app obliged to write the whole file again from start to end, even though so little changed? I have heard that modern file systems split a file into pieces when saving it and reassemble them on load by locating each piece through some sort of address table.
So wouldn't it be faster to rewrite only the revised part and leave the other parts alone, and then, when loading the file, put the revised piece in the right place among the others?
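To make the question concrete, here is a minimal Java sketch of what an "insert" looks like through the ordinary file API. It assumes the usual situation where a file is exposed as a flat byte sequence with no insert operation, so everything from the insertion point onward has to be written again, while the bytes before it are left alone. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class InsertIntoFile {
    /** Inserts text at byte offset pos; only bytes from pos onward are rewritten. */
    static void insertAt(Path file, long pos, String text) throws IOException {
        byte[] inserted = text.getBytes(StandardCharsets.UTF_8);
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            // Read the tail that has to move to make room for the new bytes.
            byte[] tail = new byte[(int) (raf.length() - pos)];
            raf.seek(pos);
            raf.readFully(tail);
            // Overwrite from the insertion point: new text first, then the old tail.
            raf.seek(pos);
            raf.write(inserted);
            raf.write(tail);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("hello", ".txt");
        Files.write(file, "hello hello hello world!!!".getBytes(StandardCharsets.UTF_8));
        insertAt(file, 12, "a ");
        // prints: hello hello a hello world!!!
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
    }
}
```

For a file this small the saving is negligible, and the sketch holds the whole tail in memory, so it is only meant to show which bytes are touched.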
Related
I am new to Android development. My problem is simple. I have some test case files in a folder. I want to read them, perform some test operations on them, and show the names of the files with their test results on screen. But I don't know how to do that. Any suggestion will be helpful.
In an Android project I have several thousand highly dynamic audio files, in multiple languages. Highly dynamic as in they may change from week to week during development.
Some files are duplicates (within a language), and have to remain so in order not to break the logic in the code and for maintainability, but it is a waste of space!
Example (just an example, don't worry about semantics):
raw/en/time_rep.mp3 - used as in "one more time"
raw/en/time.mp3 - used as in "it is now time for"
raw/de/time_rep.mp3 - may be translated to "mal"
raw/de/time.mp3 - may be translated to "zeit"
So, the word is same in English, and therefore a duplicate, but not in German, therefore, we need two resources.
Ideally in English both R.raw.time_rep and R.raw.time would refer to the same time.mp3 audio file, but not in German.
For strings and images it is possible to create an AliasResource, but not for raw files.
Any thoughts on how I can create "soft links" to avoid having duplicate raw resources, so that I can still reference R.raw.time_rep and R.raw.time from within the code, with little to no manual changes whenever I get a new batch of updated raw audio files?
NB: Don't worry about identifying the duplicates. I can do this in a batch script while converting and post processing the audio files.
Any thoughts on how I can create "soft links" to avoid having duplicate raw resources, so that I can still reference R.raw.time_rep and R.raw.time from within the code, with little to no manual changes whenever I get a new batch of updated raw audio files?
Just create a lookup table (in any form you like: HashMap, database table, etc.) and use it to pick the right audio file, instead of picking it directly like you do now.
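A rough sketch of such a lookup table in plain Java follows. The integer constants are hypothetical stand-ins for the generated R.raw.* ids, and the class name and "lang/key" scheme are assumptions of mine, not anything Android provides.

```java
import java.util.HashMap;
import java.util.Map;

class AudioLookup {
    // Hypothetical stand-ins for generated resource ids such as R.raw.time.
    static final int RAW_EN_TIME = 1;
    static final int RAW_DE_TIME = 2;
    static final int RAW_DE_TIME_REP = 3;

    private final Map<String, Integer> table = new HashMap<>();

    void put(String lang, String key, int resId) {
        table.put(lang + "/" + key, resId);
    }

    /** Returns the resource id to play for a logical key in a given language. */
    int resolve(String lang, String key) {
        Integer id = table.get(lang + "/" + key);
        if (id == null) {
            throw new IllegalArgumentException("No audio for " + lang + "/" + key);
        }
        return id;
    }

    static AudioLookup example() {
        AudioLookup lookup = new AudioLookup();
        // English: both logical keys share one physical file.
        lookup.put("en", "time", RAW_EN_TIME);
        lookup.put("en", "time_rep", RAW_EN_TIME);
        // German: two different recordings.
        lookup.put("de", "time", RAW_DE_TIME);
        lookup.put("de", "time_rep", RAW_DE_TIME_REP);
        return lookup;
    }
}
```

The code then asks for resolve("en", "time_rep") instead of touching R.raw.time_rep directly, and the batch script that finds duplicates could generate the put(...) calls or an equivalent data file.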
I've built an application that uses Tesseract (V3.03 rc1) to identify some specific text strings. These are, unfortunately, printed in a custom font that requires me to build my own traineddata file. I've built the application on both iOS (using https://github.com/gali8/Tesseract-OCR-iOS for inspiration) and Android (using https://github.com/rmtheis/tess-two/ for inspiration as well).
The workflow for both platforms is as follows:
I select a bounding box on the preview screen for where I can crop out the relevant text, and crop the image accordingly.
I use OpenCV to get a binary image (using OpenCV's adaptive threshold function with the same parameters for both platforms)
I pass this binary image to Tesseract. Both platforms (Android and iOS) use the same traineddata file.
And yet, iOS recognizes the text strings perfectly, while Android keeps misidentifying certain characters (6s for Ss, As for Hs).
On both platforms, I use the same white list string, I disable load_type_dawg and load_system_dawg, and also choose to save the blob choices.
Has anyone encountered this kind of situation before? Am I missing a setting on Android that's automatically handled in iOS? Is there something particular about Android that hasn't crossed my mind?
Any thoughts or advice would be greatly appreciated!
So, after a lot of work, I found out what was wrong with my Android application (thankfully, it wasn't an issue with Tesseract at all). As I'm more familiar with iOS apps than Android, I wasn't sure how I could load the traineddata file onto the application without requiring the user to have the file loaded on their external storage device. I found inspiration in this project (http://www.codeproject.com/Tips/840623/Android-Character-Recognition), as they autoload the trained data file.
However, I misunderstood how it worked. I originally thought that the TessDataManager did a file lookup on the project's local tesseract/tessdata folder in order to get the trained data file (as I do this also on iOS). However, that's not what it does. It, rather, checks the internal file structure (data/data/projectname/files/tesseract/tessdata/traineddatafilegoeshere) to see if the file exists and if it doesn't, it copies over the trained data file it keeps in the Resources/Raw directory. In my case, it defaulted to the eng file, so it never read my custom font file.
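The copy-if-missing behaviour described above can be sketched in plain Java roughly like this. It is not the TessDataManager code from the linked project; the class name, method name and the "custom.traineddata" file name used in the note below are placeholders of mine.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class TrainedDataInstaller {
    /**
     * Copies the bundled trained data into tessdataDir unless a file with that
     * name is already there. The caller remains responsible for closing bundled.
     */
    static File ensureInstalled(File tessdataDir, String fileName, InputStream bundled)
            throws IOException {
        File target = new File(tessdataDir, fileName);
        if (target.exists()) {
            return target; // already installed, keep the existing copy
        }
        if (!tessdataDir.isDirectory() && !tessdataDir.mkdirs()) {
            throw new IOException("Cannot create " + tessdataDir);
        }
        try (OutputStream out = new FileOutputStream(target)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = bundled.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return target;
    }
}
```

On Android the stream would typically come from the raw resources or assets of the app, and the thing to check is that fileName is the custom font file (for example a hypothetical custom.traineddata) rather than the default eng one.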
Hopefully this helps someone else having similar issues. Thanks to Robin and RmTheis for all of your help!
Many times I've seen Android apps that display a list of languages, where I can tap on any of these languages and download it for that specific app (the GO Weather widget has this functionality).
I'm interested in how this is implemented, and in the best way to load languages dynamically in Android apps. Adding 100 strings.xml resources to the app project is not a solution, and besides, if I wanted to provide some kind of "funny holiday language" pack or add a new language, I would need to upload the project to Google Play again and again...
Thanks!
While it's possible to use Expansion Files to add on to your app, they are limited in some ways. The main problem for you would be that you can only have a limited number of expansion files. If you wanted 100 languages, your only option would be to load them all in the expansion file, and download the whole thing. While that might not be a problem, since a list of translated strings probably isn't that large, you may want to go a different route.
The best option I see for downloading separate language add-ons is to forgo using strings.xml altogether. Just use a simple CSV file to hold your strings, mapping names to strings. When your program starts, read it in to a string array/map/whatever, and you have all your strings at the ready. This way, if you want to add a language, it's as easy as downloading a text file and saving it to your data directory.
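As a sketch of that idea, the loader below reads "name,string" lines into a map. The file layout is an assumption of mine (one entry per line, split on the first comma only, no quoting or escaping), so a real language file format may need more care.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Map;

class StringTable {
    /** Parses lines of the form name,string; the string itself may contain commas. */
    static Map<String, String> parse(Reader source) throws IOException {
        Map<String, String> strings = new LinkedHashMap<>();
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            int comma = line.indexOf(',');
            if (comma < 0) {
                continue; // skip blank or malformed lines
            }
            String name = line.substring(0, comma).trim();
            String value = line.substring(comma + 1).trim();
            strings.put(name, value);
        }
        return strings;
    }
}
```

At startup the app would open the downloaded file for the chosen language, call parse once, and then look strings up by name instead of going through R.string.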
Also, you can keep a file listing all the available languages on the same server, so you don't have to update the app if you want to add seasonal or limited-time-only languages, like you mentioned. Just read in the file to get the list.
Note, you'll need somewhere to host the files, but that's hardly a barrier in this day and age.
I'm making an app for my school in which people can check whether they have a schedule change. All schedule changes are listed here: http://www.augustinianum.eu/roosterwijzigingen/14062012.pdf. I want to search that page for a keyword (the user's group, which is entered in an EditText). I've found out how to make the app check whether the EditText matches a certain string, so now I only need to download all of the text on that page into a string. The problem is that it's not a simple web page but a PDF. I've heard that you need a special PDF library or something to extract the text from the PDF, put that text into a string, and then search the string for keywords using contains().
However I've got some questions about that:
This PDF is made with a PDF-creator, it's not a scanned page or so. You can actually for example select the text or search it for keywords using CTRL+F. So I wonder if it is actually required to extract the PDF and stuff or is there maybe an easier way.
I want the app to check for changes every hour, let's say. So it also has to download the PDF (about 8 pages) and extract the text every hour. Would that consume a lot of battery?
I've heard that there are many many libraries which do what I want. So which should I use? (If possible, I'd like one which is free :))
Could anyone explain to me how to use it in my code? (I'm not really experienced, so plz keep it a little easy :))
THANK YOU ALL SO MUCH!!!
Unfortunately, I have not worked with Java, so you will have to implement this in Java yourself. Here is how I finally did it:
1) I fetched the file from your link. In PHP this is done with @fopen("http://...").
2) I opened it as binary (this is important) and extracted two parts:
2.1) The data of the 3 0 obj part, which holds the creation and modification dates. I did this with a regex. It was simple, and I have already described it.
2.2) The data stream from 5 0 obj, which holds the deflated data. IMPORTANT! Microsoft Excel inserts the two bytes 0D 0A as a line break. Do not forget this when you filter the content with a regexp: these bytes at the start and at the end must not be included in the extracted string.
3) I inflated the encoded data with $uncompressed = @gzuncompress($compressed) and put it in an external file. You can see the results there
4) The funniest part: the raw data inside the file is in textual format. It looks like [(V)-4(RI)16(J)] TJ, which means VRIJ. You can read about text in PDF in the PDF Reference v1.7, part 5.
5) I believe regular expressions can help you extract and/or transform the data.
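Since the question asks for Java, here is a rough Java counterpart of steps 3 to 5, as a sketch only. It assumes the stream is plain zlib data (which is what gzuncompress handles) and that text appears in simple [...] TJ arrays like the one shown; it ignores other text operators, octal escapes and font encodings. The class and method names are mine.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

class PdfTextSketch {
    /** Step 3: inflate a zlib-wrapped stream, similar to PHP's gzuncompress(). */
    static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported stream");
                }
                out.write(buf, 0, n);
            }
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // An array operand followed by the TJ operator, e.g. [(V)-4(RI)16(J)] TJ
    private static final Pattern TJ_ARRAY =
            Pattern.compile("\\[(.*?)\\]\\s*TJ", Pattern.DOTALL);
    // A literal string in parentheses; a backslash escapes the next character.
    private static final Pattern PDF_STRING =
            Pattern.compile("\\(((?:\\\\.|[^\\\\()])*)\\)");

    /** Steps 4 and 5: join the string pieces of each TJ array, dropping the kerning numbers. */
    static List<String> textFromTj(String content) {
        List<String> result = new ArrayList<>();
        Matcher array = TJ_ARRAY.matcher(content);
        while (array.find()) {
            StringBuilder text = new StringBuilder();
            Matcher piece = PDF_STRING.matcher(array.group(1));
            while (piece.find()) {
                // Simplified unescaping: \( \) \\ become ( ) \ ; other escapes are not decoded.
                text.append(piece.group(1).replaceAll("\\\\(.)", "$1"));
            }
            result.add(text.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(textFromTj("[(V)-4(RI)16(J)] TJ")); // prints [VRIJ]
    }
}
```

The inflated bytes can be turned into a string with ISO-8859-1 before calling textFromTj, so that every byte maps to exactly one character.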
IMPORTANT: I said "data stream from 5 0 obj", but the object number is subject to change. You must follow the reference to the object through the dictionary->pages->page->content chain. A description of these "bread crumbs" can be found in the manual I mentioned above.
Unfortunately, Excel does not embed any table structure in the PDF, but you can find the coordinates of the text portions and interpret them. Anyway, it is a mess.
Do you think, dear Merlin, it is hard? No, dear, it is not. It is not hard, because there are no Unicode symbols. Unicode in PDF is THE REAL SUCK!
Good luck!
This PDF was made by Microsoft Excel and has these date stamps:
3 0 obj
<</Author(Janszen, Jan)
/CreationDate(D:20120613153635+02'00')
/ModDate(D:20120613153635+02'00')
/Producer(˛ˇMicrosoftÆ ExcelÆ 2010)
/Creator(˛ˇMicrosoftÆ ExcelÆ 2010)>>
endobj
You can use almost any programming language to fetch the file by URL and extract the "ModDate" content. A new ModDate means the information was updated. You do not need any libraries to extract this information: it is plain text in the file, on lines 9, 10 and 11.
Ask Jan Janszen to add you to the distribution list. The data in the file is encoded. You have to use a lot of programming techniques to reach the source and restore the information.
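In Java, pulling out the ModDate could look roughly like the sketch below. It assumes the date is stored uncompressed as /ModDate(D:...) exactly as in the object shown above, and the class and method names are made up. Downloading the bytes from the URL is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PdfModDate {
    private static final Pattern MOD_DATE =
            Pattern.compile("/ModDate\\s*\\((D:[^)]*)\\)");

    /** Returns the raw ModDate value, or null if the file does not contain one. */
    static String modDate(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to one character, so binary data does not break decoding.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        Matcher m = MOD_DATE.matcher(raw);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String sample = "3 0 obj\n<</Author(Janszen, Jan)\n"
                + "/CreationDate(D:20120613153635+02'00')\n"
                + "/ModDate(D:20120613153635+02'00')\n>>\nendobj\n";
        // prints: D:20120613153635+02'00'
        System.out.println(modDate(sample.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

The app can store the last value it saw and only go on to extract the text when the value changes.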