Intelligent and thrifty HTML parsing (without downloading complete page source)? - android

I need to parse a webpage in my application, but I have one big problem: data usage. The page I want to parse is between 400 and 500 KB, depending on the time. I need to parse it several times per day, depending on user requests, but 10-20 times per day should be typical. However, I'm worried about data: parsing it 10-20 times per day adds up to 150-300 MB per month (10 x 30 x 0.5 MB = 150 MB, 20 x 30 x 0.5 MB = 300 MB). That is too much, as many people have a 100 MB limit. Even with a 500 MB limit, I can't eat half of it with my app.
I need only a very small part of the page data. Is there a way to download, for example, only a part of the page source, or only some specific tags, or to download it compressed, or any other kind of download that doesn't eat hundreds of MB per month?

Doing this would probably need some co-operation from the web server. If you are downloading the page from a server that isn't under your control, then this is probably not possible.
One thing to bear in mind is that modern web browsers and servers typically gzip text-based data, so the actual amount of data being transferred will be significantly less than the uncompressed size of the pages (to get a rough idea of how big the transfer will be, try using a zip utility to squash the raw HTML).
One further thing that might help is the HTTP Range header, which may or may not be supported by your server - this lets you request particular parts of a resource, specified by a byte range.
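As a rough sketch of the two ideas above (gzip and the Range header), something like the following could work with plain HttpURLConnection. The class and method names are made up for illustration, and whether the server honours either header is an assumption you would have to check. Note that on Android, HttpURLConnection already asks for gzip and decompresses transparently by default; setting Accept-Encoding yourself turns that off, so the sketch decodes the body manually.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of any library.
final class ThriftyFetch {

    // Builds the value of a Range request header, e.g. "bytes=0-1023".
    static String rangeHeader(long firstByte, long lastByte) {
        return "bytes=" + firstByte + "-" + lastByte;
    }

    // Wraps the body in a GZIPInputStream only if the server says it is gzipped.
    static InputStream decode(String contentEncoding, InputStream body) throws IOException {
        if (contentEncoding != null && contentEncoding.equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(body);
        }
        return body;
    }

    // Requests the page compressed; the server may ignore the header.
    static InputStream openCompressed(String pageUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(pageUrl).openConnection();
        conn.setRequestProperty("Accept-Encoding", "gzip");
        InputStream body = conn.getInputStream();
        return decode(conn.getContentEncoding(), body);
    }

    // Requests only a byte range; fails if the server does not answer 206 Partial Content.
    static InputStream openRange(String pageUrl, long firstByte, long lastByte) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(pageUrl).openConnection();
        conn.setRequestProperty("Range", rangeHeader(firstByte, lastByte));
        int code = conn.getResponseCode();
        if (code != HttpURLConnection.HTTP_PARTIAL) {
            conn.disconnect();
            throw new IOException("Range request not honoured, got HTTP " + code);
        }
        return conn.getInputStream();
    }
}
```

A byte range only helps if the part you need sits at a predictable offset in the page, which is often not the case for generated HTML.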

The best way I can think of doing it is to set up a proxy server, which will download the page periodically, and extract the data you need, exposing that to your app in a smaller, more suitable format.
You could for example use a command line tool like wget or curl on a linux server, then use a script (php/perl/python/ruby/bash) to parse the data, and re-format it. Then you would serve the content using a web server (apache/lighttpd).
Personally, I would do the whole thing in node.js, if you have the luxury of your own server to use for this task.
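The extraction step on such a proxy can be very small. Here is a minimal sketch in Java (rather than the scripting languages or node.js mentioned above), assuming the data you need sits between two known markers in the page. The class name and the markers are hypothetical; a real page would probably call for a proper HTML parser.

```java
// Hypothetical helper for the proxy: keeps only the text between two markers.
final class PageExtract {

    // Returns the trimmed text between startMarker and endMarker, or null if either is missing.
    static String between(String html, String startMarker, String endMarker) {
        int start = html.indexOf(startMarker);
        if (start < 0) {
            return null;
        }
        start += startMarker.length();
        int end = html.indexOf(endMarker, start);
        if (end < 0) {
            return null;
        }
        return html.substring(start, end).trim();
    }
}
```

The proxy would then serve only this extracted string (a few bytes or kilobytes) to the app instead of the full 400-500 KB page.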

Related

How to send big size image to a server in android?

I have to send a DNG file of around 30 MB to my server, process it there in MATLAB, and then get the results back from MATLAB to the Android device. I am new to sending images to a server and I do not know whether there is any special way to handle large images. I saw similar questions, but I could not understand what to do to send images to a server.
Could you please tell me which steps I should follow, in order, and which methods or libraries I need to use? Thanks.
If you'd like to send big files using HTTP, chunks are the way to go.
You would need a backend server supporting this kind of operation (either with some homemade recipe or with a standardized implementation).
You'd basically need an API to create the file description (including the expected size) which would return a handle on this future file (at least an ID). Then use PUT or PATCH and send the chunks one by one.
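To make the chunking concrete, here is a minimal sketch that only plans the chunks (offset and length of each piece) for a file of a given size. The class names are made up, and using a Content-Range style string to label each chunk is an assumption about the backend API, not something every server accepts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical planner for a chunked upload; the transport itself is left out.
final class ChunkPlan {

    static final class Chunk {
        final long offset;
        final int length;

        Chunk(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    // Splits fileSize bytes into consecutive chunks of at most chunkSize bytes.
    static List<Chunk> plan(long fileSize, int chunkSize) {
        if (fileSize < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and chunkSize > 0");
        }
        List<Chunk> chunks = new ArrayList<>();
        for (long offset = 0; offset < fileSize; offset += chunkSize) {
            int length = (int) Math.min(chunkSize, fileSize - offset);
            chunks.add(new Chunk(offset, length));
        }
        return chunks;
    }

    // Label for one chunk in the "bytes first-last/total" form (assumed to be what the server expects).
    static String contentRange(Chunk chunk, long fileSize) {
        long last = chunk.offset + chunk.length - 1;
        return "bytes " + chunk.offset + "-" + last + "/" + fileSize;
    }
}
```

Each chunk would then be read from the file at its offset and sent with PUT or PATCH against the handle returned by the file-description call, so a failed chunk can be retried without resending the whole 30 MB.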

Give me a suggestion to improve my app performance, on retrieving huge data(nearly more than 10,000 records) from server?

From a performance point of view, JSON parsing takes a huge amount of time when retrieving data. In my app I need to get nearly 10,000 records from the server. On the emulator, it gets the data immediately and works efficiently. But on my Android phone, it takes more than 2 minutes to retrieve all the data. Kindly give me a suggestion to improve the performance on the phone.
The emulator has access to your host machine's resources and is therefore not a good way to test performance.
I have used the Jackson streaming JSON parser with large data sets and it works well for me. However, I run this process in the background and am able to accept long fetch/parse times. Depending on the size of the data and the speed of the device you're running on, 2 minutes does not seem extraordinarily long to me.
Maybe you could fetch a smaller subset of the data first, and then display it while you fetch the rest in the background. You're probably going to have to do some kind of optimization like this in order to improve performance.
I think you can parse the complex JSON response using GSON. Please check this tutorial: http://www.javacodegeeks.com/2011/01/android-json-parsing-gson-tutorial.html
You just create the model classes and use the proper annotations then the data will be parsed to model objects directly.
The question is what causes this slowdown. Because everything works like a charm in the emulator, it is probably the network speed. You can improve this if you find a way to compress the JSON data.
It is text with a lot of repetition, so it compresses very, very well. And HTTP supports compression.
You need to enable it in your HTTP server.
If you find this a promising direction, I suggest asking a new question, giving your HTTP server version. Good luck!
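To get a feel for how well repetitive JSON compresses, here is a small sketch that builds a made-up list of records and gzips it in memory. The record layout is invented for illustration, so the ratio you see with your real data will differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo: compares raw and gzipped size of a repetitive JSON array.
final class JsonGzipDemo {

    // Builds a JSON array of made-up records such as {"id":7,"name":"user7","active":true}.
    static String sampleJson(int records) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < records; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"id\":").append(i)
              .append(",\"name\":\"user").append(i)
              .append("\",\"active\":true}");
        }
        return sb.append(']').toString();
    }

    // Compresses the bytes with gzip, the same format HTTP uses for Content-Encoding: gzip.
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        }
        return out.toByteArray();
    }

    // Convenience: raw size and gzipped size of the sample, in bytes.
    static int[] sizes(int records) throws IOException {
        byte[] raw = sampleJson(records).getBytes(StandardCharsets.UTF_8);
        return new int[] { raw.length, gzip(raw).length };
    }
}
```

Calling JsonGzipDemo.sizes(10000) gives the two sizes side by side, which is a quick way to estimate what enabling compression on the server would save.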

Android: How to check a modified status of image on web

I'm developing an Image Manager in Android. It constantly checks images on servers in order to draw them...
I use a disk cache to store images on the SD card, and I must refresh them in cycles every few minutes.
But performance is not good if I always re-download unchanged files.
How can I check whether an image has changed? I want to fetch only the changed files...
Can I get a hash code of the image? Or a checksum?
I thought of a solution: create an XML file on the server that stores the list of all hashes...
But that is not possible, because the images are stored on many, many sites...
Thanks!!!
A simple solution:
Attach a time stamp to each image every time it is updated/modified.
On each refresh cycle, get only the time stamps of images and compare with timestamps of images on device.
Only download images which have timestamps greater than that of the corresponding cached images.
This way, you'll only be downloading text data (very small in size) in each cycle and image downloading will only be done when it is necessary.
You can also wire up your backend server to respond to a REST API and return only the updated images. The decision on whether to handle the calculations and logic on the client side or the server side will need consideration of the tradeoffs involved and will depend a lot on your app's specific requirements.
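The client-side comparison described above could look roughly like this. It is only a sketch: the class name is made up, and it assumes you already have two maps from image name to last-modified timestamp (one from the disk cache, one fetched from the server).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: decides which cached images are out of date.
final class StaleImages {

    // Returns the names whose remote timestamp is newer than the cached one,
    // plus any names that are not cached at all.
    static List<String> toDownload(Map<String, Long> cached, Map<String, Long> remote) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Long> entry : remote.entrySet()) {
            Long local = cached.get(entry.getKey());
            if (local == null || entry.getValue() > local) {
                stale.add(entry.getKey());
            }
        }
        return stale;
    }
}
```

Only the names returned by toDownload need a full image download in a given refresh cycle; everything else stays in the cache untouched.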

Android: Set HttpUrlConnection speed limit

Is it possible to set a download speed limit for an HttpUrlConnection in Android?
My app displays data. The data is retrieved from a web server.
There are two types of data that are loaded from the web server to display them in my app:
- small files (about 1 MB)
- big files (about 100 MB)
The problem is:
When I start to download a big file, which is about 100 MB and may take about 5 minutes,
my app is nearly unusable in the meantime.
A typical scenario is:
User clicks on a big file --> big file is downloaded in the background.
In the meantime the user wants to display another small file (1 MB, which should take a few seconds to load from the server). But the problem is that the first download (loading the big file) uses the whole bandwidth, and therefore the download of the small file takes about 2 minutes (instead of a few seconds).
So I would like to set a speed limit for big files (for example half of the bandwidth) or to implement some priority queue for downloads...
How do I set the download limit?
What I would do is use the DownloadManager, as the previous commenter suggested, if you're developing for API level 9+. The trouble with this is that downloads are shown in the notification bar, and you might not want that.
As far as I can see, there is no way to limit bandwidth on a specific download using the HttpClient that ships with Android. But I am guessing that you are downloading each file in its own AsyncTask, and AsyncTasks are executed serially by default (since Android 3.0), which might explain why the 2nd file doesn't start downloading.
I strongly suggest looking at RoboSpice, which is well suited to this type of background downloading. I'm pretty sure you will be able to download multiple files at once as well.
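If you still want a per-download speed cap, one workaround is to throttle your own read loop rather than the connection. This is a sketch of a different technique than the ones above: the class and method names are made up, and it only approximates a limit, since it relies on slow reads eventually making the sender slow down.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: copies a stream while sleeping to stay under a byte rate.
final class ThrottledCopy {

    // How long to sleep so that bytesSoFar over the elapsed time does not exceed the cap.
    static long pauseMillis(long bytesSoFar, long elapsedMillis, long maxBytesPerSecond) {
        long expectedMillis = bytesSoFar * 1000L / maxBytesPerSecond;
        return Math.max(0L, expectedMillis - elapsedMillis);
    }

    // Copies everything from in to out, pausing between reads; returns the byte count.
    static long copy(InputStream in, OutputStream out, long maxBytesPerSecond)
            throws IOException, InterruptedException {
        byte[] buffer = new byte[8192];
        long total = 0;
        long startNanos = System.nanoTime();
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
            long elapsedMillis = (System.nanoTime() - startNanos) / 1_000_000L;
            long pause = pauseMillis(total, elapsedMillis, maxBytesPerSecond);
            if (pause > 0) {
                Thread.sleep(pause);
            }
        }
        return total;
    }
}
```

You would pass the connection's input stream for the big file and a cap such as half of the bandwidth you measured, leaving small downloads unthrottled on their own thread.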

Android uploading pictures to server in most efficient way

I need to get images along with other data (very similar to an email with attachments) to the server. I also need to do it in a reliable manner so I can retry on failure, etc.
The server is a WCF REST server and I do a lot of other communication with it (JSON), but I just got this new requirement to upload images.
Since I use JSON to post data to my server, I use GSON on the Android side to serialize data.
Here is how I have implemented it so far (everything else works this way, but I just started with images):
1. User fills in the activity fields (text data)
2. User takes some picture(s) via camera intents. Currently I just use 1 file for pictures
3. I take the picture from the SD card, load/resize it, display it in an ImageView and store it in a byte[]
4. User submits - I take all the data along with the images from byte[] and put it into a Java object
5. Call the GSON converter and serialize the object
6. Save the object into SQLite
7. AsyncTask looks in SQLite for records, opens a cursor and gets the text
8. AsyncTask creates an HttpConnection and posts the text data to my server.
THE END
Now to my problems...
Obviously, in step #3 I "explode" RAM with my byte arrays. Sometimes I even feel my Nexus S become sluggish. But by doing that I avoid filling the SD card or app folder with many files. I take a picture and then grab it. The next picture overwrites the previous one.
Step #5 IS slow. I haven't tried a custom serializer in GSON; instead of serializing the byte array into something like [1,-100,123,-12], I could get a much smaller size with Base64, but it will still be slow. And I can have up to 20 images...
Step #6 is no problem. But at a certain size (I tried a 300px image) I started to get an error in step #7 when opening the cursor:
07-06 20:28:47.113: ERROR/CursorWindow(16292): need to grow: mSize = 1048576, size = 925630, freeSpace() = 402958, numRows = 2
07-06 20:28:47.113: ERROR/CursorWindow(16292): not growing since there are already 2 row(s), max size 1048576
07-06 20:28:47.113: ERROR/Cursor(16292): Failed allocating 925630 bytes for text/blob at 1,1
So, this whole thing is not something I like. Ideally I want all the data to be uploaded to the server in a single piece.
I was thinking of maybe storing timestamped images on the SD card and storing only their names in the DB. Then I would process them right before sending them to the server, and on success I would delete those images. This kind of logic will make the SQLite schema much more complex, but maybe there is no better way?!
I guess I'm looking for best practice for dealing with images. How do I do the following with minimal memory/CPU usage:
Take picture
Display thumbnail
Resize
Send to server
EDIT 1:
Currently I'm researching the possibility of uploading the whole thing as a multi-part MIME message. That would require adding some JARs to my Android package. Also, I'm not sure how efficient the Apache code will be at loading images and sending them (better than my code, I guess).
http://okandroidletsgo.wordpress.com/2011/05/30/android-to-wcf-streaming-multi-part-binary-images/
And I would have to deal with parsing all this on the WCF side, since there is no way to do it with the built-in .NET framework.
http://antscode.blogspot.com/2009/11/parsing-multipart-form-data-in-wcf.html
PLEASE TELL ME IF YOU TRIED THIS!
EDIT 2:
MIME is no good for my case. There is no point, since the approach I looked at serializes binary data using Base64, which is the same thing I already have.
Nobody answered, but here is what I figured out the hard way:
Rule #1: When dealing with images, avoid using objects/memory. Sounds obvious, but it's not. I figured that resizing an image to 800x600 is OK. For anything bigger, consider just leaving it as is, because it is possible to stream a bigger file over HTTP, but it's hard to deal with OOM exceptions when you load images into memory for processing.
Rule #2: When using GSON, use JsonWriter to populate a stream. Otherwise memory will explode. Then pass that stream into HttpClient. JsonWriter will write in chunks and the data will be sent as it is processed.
Rule #3: See rule #2. It will work OK for multiple small images. This way GSON will serialize them one by one and feed them into the stream. Each image WILL be loaded into memory anyway.
Rule #4: This is probably the best solution, but it requires more coordination with the server. Images are sent one by one before the message is sent to the server. They are sent as a stream without any encoding. This way they don't have to be Base64 encoded and they don't have to be loaded into memory on the device. The size of the transmission will be smaller as well. When all images are sent, post the main informational object and assemble the whole package on the server.
Rule #5: Forget about storing BLOBs in SQLite.
Bottom line:
It is much cheaper in terms of resources to send images WITHOUT any resizing. Resizing makes sense only when the image gets down to about 800x600-ish.
Sending multiple images in a single package makes sense when the images get small, like 600x400-ish.
As soon as you need to upload files, start thinking streams everywhere. DO NOT load stuff into memory.
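For rule #4, a streamed upload of one image could be sketched like this with plain HttpURLConnection. The class name, the endpoint parameter and the choice of POST with application/octet-stream are assumptions for illustration; the server has to be set up to accept a raw body, and chunked transfer encoding in particular is something to verify against your server.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper: sends a file as a raw request body without holding it in memory.
final class StreamingUpload {

    // Copies in to out through a small fixed buffer; returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[16 * 1024];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    // Streams the file to the endpoint and returns the HTTP status code.
    static int upload(File image, String endpoint) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        try {
            conn.setDoOutput(true);
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/octet-stream");
            // Chunked streaming keeps HttpURLConnection from buffering the whole body.
            conn.setChunkedStreamingMode(0);
            try (InputStream in = new FileInputStream(image);
                 OutputStream out = conn.getOutputStream()) {
                copy(in, out);
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

Only the 16 KB buffer lives in memory at any time, regardless of the image size, which is the point of "thinking streams everywhere".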
