Sync large amounts of data between mobile app and web server - Android

The Setup
I have native iOS and Android apps that sync data to and from my web server. A requirement of the apps is that they work offline, so data is stored on the devices in SQLite databases.
The apps communicate with the server through a series of REST calls, which return JSON for the apps to store in their databases.
My Problem
The scale of this data is very large: some tables can have a million records, and the final size of the phone databases can approach 100 MB.
The REST endpoints must limit how much data they return, so they have to be called many times with different offsets to achieve a full sync.
So I'm looking for ways to improve the efficiency of this process.
My Idea
An idea I had was to create a script that runs on the server and builds an SQLite file from the server's database, compresses it, and puts it somewhere for the apps to download, effectively creating a snapshot of the server's current data.
The apps would download this snapshot but still call their REST methods in case something had changed since the snapshot was taken.
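To make the idea concrete, here is a rough client-side sketch in Kotlin. The /snapshot.db.gz and /changes?since= endpoints, and both helper functions, are hypothetical; this only shows the shape of the flow, not a drop-in implementation.

```kotlin
import java.io.File
import java.net.URL
import java.util.zip.GZIPInputStream

// Hypothetical endpoints: /snapshot.db.gz serves the gzipped SQLite snapshot,
// /changes?since=<millis> returns JSON for rows changed after the snapshot.
fun syncFromSnapshot(baseUrl: String, dbFile: File) {
    // 1. Download and decompress the snapshot into a temp file.
    val tmp = File.createTempFile("snapshot", ".db")
    URL("$baseUrl/snapshot.db.gz").openStream().use { raw ->
        GZIPInputStream(raw).use { gz ->
            tmp.outputStream().use { out -> gz.copyTo(out) }
        }
    }

    // 2. Replace the app's database with the snapshot (a real app would
    //    close any open connections first).
    tmp.copyTo(dbFile, overwrite = true)
    tmp.delete()

    // 3. Catch up on anything that changed after the snapshot was taken.
    //    The snapshot's generation time could be stored inside it, e.g.
    //    in a one-row metadata table.
    val since = readSnapshotTimestamp(dbFile)
    val changesJson = URL("$baseUrl/changes?since=$since").readText()
    applyChanges(dbFile, changesJson)
}

// Hypothetical helpers, left as stubs.
fun readSnapshotTimestamp(dbFile: File): Long = TODO("read the metadata table")
fun applyChanges(dbFile: File, json: String): Unit = TODO("upsert the changed rows")
```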
The Question
This would add another level of complexity to my webapp and I'm wondering if this is the right approach. Are there other techniques that people use when syncing large amounts of data?

This is a complex question, as the answer depends on your constraints:
How often will the data change? If it changes too often, the snapshot will go stale quickly, so the apps will effectively be re-downloading data all the time. With this volume of data, an app will also waste CPU time on synchronization (even if the user isn't actively using all of it!), or it may quickly fall out of sync with the server. This is especially true on iOS, where applications have very limited background capabilities (only a small, throttled window) compared to Android apps.
Is the DB read-only, or are you sending updates to the server? If you are, you need conflict-resolution techniques and must cover cases in which data is modified locally but not immediately posted to the server.
You also need to support DB schema changes. With your approach, that effectively means keeping multiple (initial) databases ready for different versions of your application.
Your idea is good if there are not too many updates to the database and the regular means of download are inefficient, which is what you described: sending millions of records through multiple REST calls is quite a pain.
But beware of hitting a wall: if the data changes a lot and you are forced to update tens or hundreds of thousands of records every day, on every device, then you probably need to consider a completely different approach. That might mean supporting only a partial offline mode (for the most recent or most important items), or a hybrid data model where live requests are made for the most recent data whenever the user wants to edit something, as in the sketch below.
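To illustrate that hybrid model, a minimal Kotlin sketch; Api, LocalStore, and Item are hypothetical stand-ins, not a real library:

```kotlin
data class Item(val id: Long, val name: String, val updatedAt: Long)

interface Api { suspend fun fetchItem(id: Long): Item }   // hypothetical remote call
interface LocalStore {                                     // hypothetical SQLite wrapper
    fun get(id: Long): Item?
    fun upsert(item: Item)
}

class Repository(private val api: Api, private val local: LocalStore) {
    // Hybrid model: browse from the cache, but refresh from the server
    // right before an edit so the user works on the latest version.
    suspend fun loadForEditing(id: Long): Item {
        val live = runCatching { api.fetchItem(id) }.getOrNull()
        return live?.also { local.upsert(it) }   // online: refresh the cache
            ?: local.get(id)                     // offline: fall back to the cache
            ?: error("Item $id is not available offline")
    }
}
```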

100 MB is not so big; my apps have been syncing many GBs at this point. If your data can be statically generated and updated, one thing you can do is write everything to the server (JSON, images, etc.) and then sync it all to your local filesystem. In my case I use S3. At a set time, or when the user chooses, the app syncs and pulls/updates only what has changed. AWS actually has an API call named sync that works on a local/remote folder or bucket, in a single call. I do mine custom, but essentially it's the same: check the last update date and file size locally, and if they differ, add that file to the download queue.
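If it helps, the custom check amounts to something like this Kotlin sketch, where RemoteEntry and the manifest are simplified stand-ins for the real file metadata:

```kotlin
import java.io.File

// Decide which files need downloading, assuming the server can produce a
// manifest of (path, size, lastModified) for its files; this is the same
// comparison `aws s3 sync` makes between a bucket and a local folder.
data class RemoteEntry(val path: String, val size: Long, val lastModified: Long)

fun filesToDownload(manifest: List<RemoteEntry>, localRoot: File): List<RemoteEntry> =
    manifest.filter { remote ->
        val local = File(localRoot, remote.path)
        // Queue the file if it's missing, a different size, or stale.
        !local.exists() ||
            local.length() != remote.size ||
            local.lastModified() < remote.lastModified
    }
```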

Related

Export data to preload into a Firebase instance in a mobile app

The use case is a dual-platform mobile app for an event. There is a schedule with photos, links, bios of the speakers, and talk descriptions. If all of the attendees happen to download and open the app at the same time and in the same place, they might not get the best experience: the Wi-Fi might slow the calls to the data server, and calls to the Firebase server side will spike.
Is it possible to export a database from the server side and preload the event schedule data into the mobile app download? Upon launch the app can sync any last-minute updates, as needed, with a connection and a short sync to Firebase.
If this type of architecture is not available, is there an alternative that the Firebase team would recommend?
There is no way within the Firebase Database API to preload data into the disk cache.
Two things I can think of (neither of them very nice):
Have the client read the JSON file from your app's resources and write it to the location. The end result is that the data on the server stays unmodified (each client writes the same values), but it does mean every client performs the same writes to the server, so the inverse of your original problem (and likely worse performing).
Have a wrapper around the Firebase API calls that loads from the JSON file first, and then attach the real listeners after a random delay (to spread out the rush).
As said, neither of these is very good. For both of them, you can download the JSON from the Firebase Database console.
In my experience the usage of conference apps is a lot lower than most developers/organizers imagine. It's also typically quite well spread out over the duration of the conference. So reducing the amount of data you load might be enough to make things work.
On Android you can ship a SQLite database in the assets directory with the app and then reconcile it with updates when the user opens the app. A Firebase database is a JSON file, so you could also ship that in the assets directory and reconcile on first load.
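A minimal sketch of the assets approach in Kotlin, assuming hypothetical file names seed.db and app.db:

```kotlin
import android.content.Context
import java.io.File

// Ship a prepopulated SQLite database in assets/ and copy it into place
// on first launch; "seed.db" and "app.db" are hypothetical names.
fun ensureSeedDatabase(context: Context) {
    val dbFile: File = context.getDatabasePath("app.db")
    if (dbFile.exists()) return              // already installed, nothing to do
    dbFile.parentFile?.mkdirs()
    context.assets.open("seed.db").use { input ->
        dbFile.outputStream().use { output -> input.copyTo(output) }
    }
    // Open the database normally afterwards, then reconcile it with any
    // updates fetched from the server.
}
```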

Android: store data locally or not?

My Android app fetches data from the web (a node.js server).
The user creates a list of items (usually 20-30, but it can be 60+). For each item I query the server to get its information. Once this info is fetched (per item), it never changes, but new records will be added as time goes by (another server call, unrelated to the previous one).
My question is whether to store this info locally (SQLite?) or to fetch it from the server every time the user asks for it (bear in mind the number of calls).
What should my guidelines be for deciding whether to store it locally, other than "speed"?
You should read about the "offline first" principles.
To summarize: mobile users won't always have a stable internet connection (or any connection at all), and the use of your application should not depend on full-time internet access.
You should decide which data is eligible for offline storage.
It will mainly depend on what the user is supposed to access most often.
If your items don't vary, you should persist them locally to act as a cache. Even if the data isn't really big, users will welcome it, as your app will use the network less and so avoid long waits, timeouts, and the like.
You could make use of Retrofit to make the calls to the web service.
When it comes to persisting data locally within an Android application, you can store it in several ways.
The first one, the easiest, is to use Shared Preferences. I wouldn't suggest it this time, as you're dealing with structured objects.
The second one is to use a raw SQLite database.
However, I'd avoid writing raw SQL queries and give ORM frameworks a try. On Android you can find several, such as GreenDAO, ORMLite, and so on. This is the choice you should make. And believe me, ORMs might seem difficult to understand at first, but once you learn how they work and the benefits they provide, you'll love them.
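For the fetch side, a minimal Retrofit sketch; the endpoint path, DTO fields, and base URL are hypothetical stand-ins for the real node.js API:

```kotlin
import retrofit2.Retrofit
import retrofit2.converter.gson.GsonConverterFactory
import retrofit2.http.GET
import retrofit2.http.Path

// Hypothetical DTO for the per-item info described in the question.
data class ItemInfo(val id: Long, val details: String)

interface ItemService {
    @GET("items/{id}")
    suspend fun getItem(@Path("id") id: Long): ItemInfo
}

val service: ItemService = Retrofit.Builder()
    .baseUrl("https://example.com/api/")   // hypothetical base URL
    .addConverterFactory(GsonConverterFactory.create())
    .build()
    .create(ItemService::class.java)

// Since an item's info never changes once fetched, persist each response
// locally and serve later reads from the database: this cache never goes stale.
```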

Looking to implement a local data cache in an android app

I have an in-house app that reads data from a WCF server. It keeps a local copy of important items, but once it uploads successfully, it deletes the local copy. This works great as long as there is cell coverage. I have figured out that I need to keep a local copy of all recently accessed data so that the tablet isn't rendered useless if it loses cell coverage. I was wondering if anyone had already written or thought through a system that would manage local data efficiently. There are several important aspects I would like to see:
Whenever a record is read from the server, a local copy is created on "disk"
Also, when data is read from the server, the app checks whether the local copy has been successfully uploaded to the server before overwriting it. If the local copy hasn't reached the server yet, the app needs to keep using the local copy.
If an upload to the server fails, a background process needs to retry it later, when a cell signal becomes available.
It needs to be able to handle different record types and different key types for looking up the records.
It needs to be able to purge local copies of the records if they have not been accessed for a certain period of time.
Number 4 is my big sticking point. Is there a good way to keep a collection of records of different types, with different key types and numbers of keys, and still look them up?
I wound up using Akavache (github.com/akavache/Akavache) to implement my scenario. It is a really cool set of libraries, and you can load it using NuGet. Here is a blog post with a basic demo of how to use it: https://codemilltech.com/akavache-is-aka-awesome/
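Akavache is a .NET library, but the underlying pattern is portable. As a rough illustration only, here is the same single-table idea sketched in Kotlin against Android's SQLite, with hypothetical table and column names:

```kotlin
import android.database.sqlite.SQLiteDatabase

// Assumed schema (hypothetical names): one key/value table holds serialized
// records of any type under a "Type:key" string key.
//   CREATE TABLE cache (
//     cacheKey     TEXT PRIMARY KEY,  -- e.g. "Order:42" or "Customer:ab-12"
//     payload      TEXT NOT NULL,     -- record serialized to JSON
//     dirty        INTEGER NOT NULL,  -- 1 = not yet uploaded to the server
//     lastAccessed INTEGER NOT NULL   -- epoch millis, updated on every read
//   );

// The sticking point (different record and key types) collapses into one string.
fun cacheKey(recordType: String, key: Any): String = "$recordType:$key"

// Purge entries untouched for maxAgeMillis, but never purge dirty rows,
// since they still need to reach the server.
fun purgeStale(db: SQLiteDatabase, maxAgeMillis: Long) {
    val cutoff = System.currentTimeMillis() - maxAgeMillis
    db.delete("cache", "dirty = 0 AND lastAccessed < ?", arrayOf(cutoff.toString()))
}
```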

Web server best practices

I need a suggestion on a design decision.
I am creating an e-commerce app that has a lot of items (>10,000) to show. Here are the two options I have:
1) Get all the item information from the server, save it in a local DB, and synchronize the information periodically (say, every 15 minutes).
2) Fetch the information from the web server every time (through a REST API).
There are pros and cons to both methods. With a local DB I get fast results and use less server bandwidth, but I have to handle synchronization.
With the second approach there will be a lot of requests going back and forth between the app and the server, putting load on the server.
I would also like to know how apps like Amazon and Flipkart handle this. Do they save to a local DB, or request the server every time?
What you should be looking for is a mixed design between local and remote.
In terms of data there are two major types:
blobs ('binary large objects'), for example images and videos;
and small data (usually JSON/XML representations of the items).
Amazon and other apps provide fresh data every time the app loads, while keeping a local copy of the data in case the app goes offline; sometimes they even show that copy on the next load while waiting for the backend.
At the same time, those apps maintain a cache layer for large data so they don't have to load it more than once.
But the key to making this work is a very fast backend with features that improve its speed, including:
a cloud front end that lets users communicate with the closest server to them;
Memcached or another caching technology that keeps item information in the servers' RAM, so the database doesn't have to be queried for it.
What usually happens is that the backend ensures its data is always loaded in RAM/cache, either by querying the database on a fixed schedule or by pushing to the cache every time an insert/update/delete hits the database.
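A minimal sketch of that push-on-write idea, with a hypothetical Database interface standing in for the real store:

```kotlin
import java.util.concurrent.ConcurrentHashMap

interface Database {
    fun save(id: Long, item: String)
    fun load(id: Long): String?
}

class WriteThroughCache(private val db: Database) {
    private val cache = ConcurrentHashMap<Long, String>()

    fun put(id: Long, item: String) {
        db.save(id, item)    // persist first...
        cache[id] = item     // ...then push to the cache, keeping both in sync
    }

    // Reads are served from RAM; fall back to the database on a cold start.
    fun get(id: Long): String? =
        cache[id] ?: db.load(id)?.also { cache[id] = it }
}
```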
Here is how Twitter is doing it
One last note: your app shouldn't take longer to become interactive than a web page; it's not acceptable for native apps to make the user wait longer than web apps do.
Only load what you want to show, but cache intelligently.
For example, an image for a product isn't going to change very often, but the number of units available for a very popular item can change every second. Decide on a case-by-case basis which piece of information to refresh, and when.
You definitely don't want to pull down everything from your server every time someone launches the app. That does not result in lower bandwidth: it will melt your server, eat up the user's data plan, and fill their phone storage with products they will never see.
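One way to encode that case-by-case decision is a per-type time-to-live; a small sketch, with purely illustrative TTL values:

```kotlin
// Give each kind of data its own time-to-live instead of one global
// sync interval.
enum class DataKind(val ttlMillis: Long) {
    PRODUCT_IMAGE(7 * 24 * 60 * 60 * 1000L),   // images rarely change: a week
    PRODUCT_DETAILS(60 * 60 * 1000L),          // descriptions: an hour
    STOCK_LEVEL(30_000L)                       // availability: seconds
}

fun needsRefresh(
    kind: DataKind,
    fetchedAtMillis: Long,
    now: Long = System.currentTimeMillis()
): Boolean = now - fetchedAtMillis > kind.ttlMillis
```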

providing a web service: what are best practices for splitting JSON data into two datasets?

I have a database and I need to present its data over HTTP as JSON web services. Currently, I'm designing the JSON datasets that will be provided. The data from the tables will be aggregated to suit the app's needs.
If the data is large and we try to download it all at once, it might take too long and the app will be unresponsive at startup. That's bad. It's well known that we should minimize the number of HTTP requests an app makes to download data; however, if we split the data into small chunks, the app will be making HTTP requests at every step, which might make for an unresponsive solution too.
We are talking about mobile app development here, so connectivity will be over a cellular ISP or Wi-Fi, and speeds might be quite slow. I understand that how to split depends on the app's workflow and so on; I'm just curious whether there are any general guidelines. For example: if the JSON data is larger than 1 MB, definitely split it into smaller chunks.
Look at how your mail reader works. You probably have tens of thousands of emails in your account, yet the app shows only the first batch and provides a button at the bottom of the list to display more. That is usually a pretty good way to deliver a lot of data.
Also, @Selvin's ideas are great: don't use the UI thread to download stuff, use a different thread. Services are pretty good for fetching data asynchronously.
One way is to create a service that starts when the network is available. All downloaded data can be cached in SQLite, with a content provider to serve it. But it depends on your app.
Sometimes it depends on your UI screen. For example, you can create a ListView with a "load more" row; tapping it loads extra data.
Another way is to create an API that returns only updates since a given timestamp, as in the sketch below. But it all depends on the app. Just sharing my ideas; they might not be perfect, and others can surely suggest better ones.
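A rough sketch putting the "load more" and timestamp ideas together; both endpoints are hypothetical:

```kotlin
interface ItemsApi {
    suspend fun page(offset: Int, limit: Int): List<String>    // e.g. /items?offset=&limit=
    suspend fun changedSince(timestamp: Long): List<String>    // e.g. /items?since=
}

class Feed(private val api: ItemsApi, private val pageSize: Int = 50) {
    private val items = mutableListOf<String>()
    private var lastSync = 0L

    // Called when the user taps "load more" at the bottom of the list.
    suspend fun loadMore() {
        items += api.page(offset = items.size, limit = pageSize)
    }

    // Called on app start or periodically: fetch only what changed.
    // (A real app would merge by id instead of appending.)
    suspend fun refresh(now: Long = System.currentTimeMillis()) {
        items += api.changedSince(lastSync)
        lastSync = now
    }
}
```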
