Avoid duplicates while syncing offline data in mobile app to server - android

We have a mobile application with a sales module. It calls the sales REST API to create a sales order.
Format of the sales API POST request:
// The server generates the primary key. There is currently no support for a request ID.
{
  "customer_id": 1,
  "items": [
    { "item_id": 123 },
    { "item_id": 456 }
  ],
  "store_id": 10,
  "sale_time": "2019-10-01 13:45:01"
}
Because we have to support offline mode, all sales are stored locally on the device and synced with the server when an internet connection is available.
However, I have learned that HTTP libraries can retry an HTTP request multiple times when connectivity is poor, which may create multiple sales entries on the server.
How should offline data syncing be implemented properly, given that the server (REST API) generates the primary key for each new record? Updating an existing record is fine because it already has a server-generated primary key, but how do I deal with new records that have no primary key yet?
Scenario:
1) The app is in offline mode.
2) The user creates a sales order and saves it locally.
3) After some time, limited internet connectivity becomes available.
4) The app fetches all records that are yet to be synced and calls the REST API for each one.
5) The sales REST API is called for the first record, and the app waits for the response.
6) The internet connection fails; the server creates the record but cannot notify the app.
7) A few minutes later the connection is back. The app calls the sales API for the first record again and this time receives a success response, but there are now two entries on the server.
How to fix this?

There are two ways to handle this:
1) Using a sync status
When you store the data locally, add a boolean "sync-status" flag that defaults to false. When syncing, fetch only the records whose sync-status is false, and set the flag to true once a record has been sent to the server successfully.
2) Using a server sync date-time
If your server returns a sync date-time, you only need to sync the data that changed after that server sync date-time.

This is a tricky scenario. If you only have to deal with a single client communicating with an API, then passing a unique identifier from the client to the API, storing it, and checking it before creating any future records will do the trick, as mentioned by Md. Asaduzzaman.
However, in a scenario where many clients use the same API, I would extend this to include user information, so you know which client a request came from. This avoids overlapping identifiers between clients. Yes, GUIDs are unlikely to overlap, but it is possible.
This is fine for creating new records, but what about updating existing records? In that case you would also need something like a lastModifiedTime field on both the client and the API. If the API's lastModifiedTime is ahead of the client's, you have a conflict that needs to be dealt with.
You could have a single table in the API called EntitySyncStatus, consisting of EntityTypeId, EntityId, ClientEntityId, and UserId (or DeviceId). lastModifiedTime could be maintained either in this table or in the entity table.
This model will then support syncing multiple objects, without storing sync related fields on each individual entity, but also provide more flexibility and visibility.
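As a rough illustration of the client-generated identifier idea, here is a minimal Python sketch. Everything in it is hypothetical: `SalesApi`, `create_sale`, `device_id` and `client_request_id` are names invented for this example, and the in-memory dictionaries stand in for a database table with a unique index on the (device, request id) pair. It assumes the API can be changed to accept such an identifier.

```python
import uuid


class SalesApi:
    """In-memory stand-in for the server side (hypothetical, for illustration)."""

    def __init__(self):
        self._next_id = 1
        self.sales = {}   # server primary key -> payload
        self._seen = {}   # (device_id, client_request_id) -> server primary key

    def create_sale(self, device_id, client_request_id, payload):
        key = (device_id, client_request_id)
        if key in self._seen:
            # A retry of a request that already succeeded: return the same id
            # instead of creating a second record.
            return self._seen[key]
        sale_id = self._next_id
        self._next_id += 1
        self.sales[sale_id] = payload
        self._seen[key] = sale_id
        return sale_id


api = SalesApi()
# Generated once on the device and stored with the local record,
# so every retry of this order sends the same value.
request_id = str(uuid.uuid4())
order = {"customer_id": 1, "store_id": 10}

first = api.create_sale("device-42", request_id, order)
retry = api.create_sale("device-42", request_id, order)  # response was lost, app retries
```

With this shape, the retry in the question's scenario returns the id of the record that was already created, so the app can mark the local record as synced without a duplicate appearing on the server.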

Related

In any sort of Mobile/Web combo application, how would they synchronize the data between the 2 databases? [duplicate]

I'm looking for some general strategies for synchronizing data on a central server with client applications that are not always online.
In my particular case, I have an android phone application with an sqlite database and a PHP web application with a MySQL database.
Users will be able to add and edit information on the phone application and on the web application. I need to make sure that changes made one place are reflected everywhere even when the phone is not able to immediately communicate with the server.
I am not concerned with how to transfer data from the phone to the server or vice versa. I'm mentioning my particular technologies only because I cannot use, for example, the replication features available to MySQL.
I know that the client-server data synchronization problem has been around for a long, long time and would like information - articles, books, advice, etc - about patterns for handling the problem. I'd like to know about general strategies for dealing with synchronization to compare strengths, weaknesses and trade-offs.
The first thing you have to decide is a general policy about which side is considered "authoritative" in case of conflicting changes.
I.e.: suppose Record #125 is changed on the server on January 5th at 10pm and the same record is changed on one of the phones (let's call it Client A) on January 5th at 11pm.
Last synch was on Jan 3rd. Then the user reconnects on, say, January 8th.
Identifying what needs to be changed is "easy" in the sense that both the client and the server know the date of the last synch, so anything created or updated (see below for more on this) since the last synch needs to be reconciled.
So, suppose that the only changed record is #125.
You either decide that one of the two automatically "wins" and overwrites the other, or you need to support a reconcile phase where a user can decide which version (server or client) is the correct one, overwriting the other.
This decision is extremely important, and you must weigh the "role" of the clients, especially if conflicts can arise not only between client and server but also between different clients changing the same record(s).
[Assuming that #125 can be modified by a second client (Client B) there is a chance that Client B, which hasn't synched yet, will provide yet another version of the same record, making the previous conflict resolution moot]
Regarding the "created or updated" point above... how can you properly identify a record if it has been originated on one of the clients (assuming this makes sense in your problem domain)?
Let's suppose your app manages a list of business contacts. If Client A says you have to add a newly created John Smith, and the server has a John Smith created yesterday by Client D... do you create two records because you cannot be certain that they aren't different persons? Will you ask the user to reconcile this conflict too?
Do clients have "ownership" of a subset of data? I.e. if Client B is setup to be the "authority" on data for Area #5 can Client A modify/create records for Area #5 or not? (This would make some conflict resolution easier, but may prove unfeasible for your situation).
To sum it up the main problems are:
How to define "identity" considering that detached clients may not have accessed the server before creating a new record.
The previous situation, no matter how sophisticated the solution, may result in data duplication, so you must foresee how to periodically solve these and how to inform the clients that what they considered as "Record #675" has actually been merged with/superseded by Record #543
Decide if conflicts will be resolved by fiat (e.g. "The server version always trumps the client's if the former has been updated since the last synch") or by manual intervention
In case of fiat, especially if you decide that the client takes precedence, you must also take care of how to deal with other, not-yet-synched clients that may have some more changes coming.
The previous items don't take into account the granularity of your data (in order to make things simpler to describe). Suffice it to say that instead of reasoning at the "Record" level, as in my example, you may find it more appropriate to record changes at the field level. Or to work on a set of records (e.g. Person record + Address record + Contacts record) at a time, treating their aggregate as a sort of "Meta Record".
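To make the "changed since last synch" reasoning concrete, here is a small Python sketch. The function names `classify` and `resolve_by_fiat` are made up for this example, the year in the dates is arbitrary, and the sketch assumes client and server timestamps are comparable (clocks roughly in sync), which is itself a real-world concern.

```python
from datetime import datetime


def classify(last_sync, server_modified, client_modified):
    """Decide what a record needs, based on who changed it since the last sync."""
    server_changed = server_modified > last_sync
    client_changed = client_modified > last_sync
    if server_changed and client_changed:
        return "conflict"
    if server_changed:
        return "pull"       # take the server version
    if client_changed:
        return "push"       # send the client version
    return "unchanged"


def resolve_by_fiat(status):
    """Example policy: the server version wins whenever there is a conflict."""
    return "pull" if status == "conflict" else status


# Record #125: last sync on Jan 3rd, server edit on Jan 5th at 10pm,
# Client A edit on Jan 5th at 11pm (year chosen arbitrarily).
last_sync = datetime(2024, 1, 3)
status = classify(last_sync,
                  server_modified=datetime(2024, 1, 5, 22),
                  client_modified=datetime(2024, 1, 5, 23))
action = resolve_by_fiat(status)
```

A manual-reconciliation policy would replace `resolve_by_fiat` with a step that shows both versions to the user.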
Bibliography:
More on this, of course, on Wikipedia.
A simple synchronization algorithm by the author of Vdirsyncer
OBJC article on data synch
SyncML®: Synchronizing and Managing Your Mobile Data (Book on O'Reilly Safari)
Conflict-free Replicated Data Types
Optimistic Replication YASUSHI SAITO (HP Laboratories) and MARC SHAPIRO (Microsoft Research Ltd.) - ACM Computing Surveys, Vol. V, No. N, 3 2005.
Alexander Traud, Juergen Nagler-Ihlein, Frank Kargl, and Michael Weber. 2008. Cyclic Data Synchronization through Reusing SyncML. In Proceedings of the The Ninth International Conference on Mobile Data Management (MDM '08). IEEE Computer Society, Washington, DC, USA, 165-172. DOI=10.1109/MDM.2008.10 http://dx.doi.org/10.1109/MDM.2008.10
Lam, F., Lam, N., and Wong, R. 2002. Efficient synchronization for mobile XML data. In Proceedings of the Eleventh international Conference on information and Knowledge Management (McLean, Virginia, USA, November 04 - 09, 2002). CIKM '02. ACM, New York, NY, 153-160. DOI= http://doi.acm.org/10.1145/584792.584820
Cunha, P. R. and Maibaum, T. S. 1981. Resource = abstract data type + synchronization - A methodology for message oriented programming -. In Proceedings of the 5th international Conference on Software Engineering (San Diego, California, United States, March 09 - 12, 1981). International Conference on Software Engineering. IEEE Press, Piscataway, NJ, 263-272.
(The last three are from the ACM digital library, no idea if you are a member or if you can get those through other channels).
From the Dr.Dobbs site:
Creating Apps with SQL Server CE and SQL RDA by Bill Wagner May 19, 2004 (Best practices for designing an application for both the desktop and mobile PC - Windows/.NET)
From arxiv.org:
A Conflict-Free Replicated JSON Datatype - the paper describes a JSON CRDT implementation (Conflict-free replicated datatypes - CRDTs - are a family of data structures that support concurrent modification and that guarantee convergence of such concurrent updates).
I would recommend that you have a timestamp column in every table and, every time you insert or update, update the timestamp value of each affected row. Then you iterate over all tables, checking whether the timestamp is newer than the one you have in the destination database. If it's newer, check whether you have to insert or update.
Observation 1: be aware of physical deletes, since the rows are deleted from the source DB and you have to do the same on the server DB. You can solve this by avoiding physical deletes or by logging every delete in a table with timestamps. Something like this: DeletedRows = (id, table_name, pk_column, pk_column_value, timestamp). You then read all the new rows of the DeletedRows table and execute a delete on the server using table_name, pk_column and pk_column_value.
Observation 2: be aware of foreign keys, since inserting data into a table that's related to another table could fail. You should deactivate every FK before data synchronization.
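The DeletedRows idea can be sketched with Python's built-in `sqlite3` module. The `contacts` table and the helper names `delete_and_log` and `replay_deletes` are invented for this example; the two in-memory databases stand in for the device database and the server database.

```python
import sqlite3
import time


def open_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE DeletedRows (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            table_name TEXT,
            pk_column TEXT,
            pk_column_value,
            timestamp REAL
        );
    """)
    return db


def delete_and_log(db, table, pk_column, pk_value):
    # table and pk_column must come from trusted code, never from user input,
    # because they are interpolated into the SQL text.
    db.execute(f"DELETE FROM {table} WHERE {pk_column} = ?", (pk_value,))
    db.execute(
        "INSERT INTO DeletedRows (table_name, pk_column, pk_column_value, timestamp) "
        "VALUES (?, ?, ?, ?)",
        (table, pk_column, pk_value, time.time()),
    )


def replay_deletes(source, target, since):
    """Apply every delete logged in `source` after `since` to `target`."""
    rows = source.execute(
        "SELECT table_name, pk_column, pk_column_value FROM DeletedRows "
        "WHERE timestamp > ? ORDER BY id",
        (since,),
    ).fetchall()
    for table, pk_column, pk_value in rows:
        target.execute(f"DELETE FROM {table} WHERE {pk_column} = ?", (pk_value,))
    return len(rows)


client, server = open_db(), open_db()
for db in (client, server):
    db.execute("INSERT INTO contacts (id, name) VALUES (1, 'John Smith')")

delete_and_log(client, "contacts", "id", 1)
replayed = replay_deletes(client, server, since=0)
```

In a real setup the log rows would be sent over the network rather than read from a second connection, and `since` would be the time of the last successful sync.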
If anyone is dealing with similar design issue and needs to synchronize changes across multiple Android devices I recommend checking Google Cloud Messaging for Android (GCM).
I am working on one solution where changes done on one client must be propagated to other clients. And I just implemented a proof of concept implementation (server & client) and it works like a charm.
Basically, each client sends delta changes to the server. E.g. resource id ABCD1234 has changed from value 100 to 99.
Server validates these delta changes against its database and either approves the change (client is in sync) and updates its database or rejects the change (client is out of sync).
If the change is approved, the server then notifies the other clients (excluding the one that sent the delta change) via GCM by sending a multicast message carrying the same delta change. Clients process this message and update their databases.
The cool thing is that these changes are propagated almost instantaneously if those devices are online, and I do not need to implement any polling mechanism on the clients.
Keep in mind that if a device is offline for too long and more than 100 messages are waiting in the GCM queue for delivery, GCM will discard those messages and send a special message when the device gets back online. In that case the client must do a full sync with the server.
Check also this tutorial to get started with GCM client implementation.
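The approve-or-reject step for a delta change could look roughly like the following Python sketch. `DeltaServer` and `apply_delta` are hypothetical names; the push notification to the other clients is only indicated by a comment, since it depends on the messaging service used.

```python
class DeltaServer:
    """Hypothetical server-side check for delta changes."""

    def __init__(self, resources):
        self.resources = dict(resources)

    def apply_delta(self, resource_id, old_value, new_value):
        # Approve only if the client's view of the old value matches the server's.
        if self.resources.get(resource_id) != old_value:
            return False   # client is out of sync and must refresh first
        self.resources[resource_id] = new_value
        # Here the server would push the same delta to the other clients.
        return True


server = DeltaServer({"ABCD1234": 100})
accepted = server.apply_delta("ABCD1234", 100, 99)   # client in sync
stale = server.apply_delta("ABCD1234", 100, 98)      # client still thinks the value is 100
```

The second call is rejected because the server value is already 99, which is the "client is out of sync" case described above.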
This answer is for developers who are using the Xamarin framework (see https://stackoverflow.com/questions/40156342/sync-online-offline-data)
A very simple way to achieve this with the Xamarin framework is to use Azure's Offline Data Sync, as it allows you to push and pull data from the server on demand. Read operations are done locally, and write operations are pushed on demand. If the network connection breaks, the write operations are queued until the connection is restored, then executed.
The implementation is rather simple:
1) Create a Mobile App in the Azure portal (you can try it for free here https://tryappservice.azure.com/)
2) Connect your client to the Mobile App.
https://azure.microsoft.com/en-us/documentation/articles/app-service-mobile-xamarin-forms-get-started/
3) the code to setup your local repository:
const string path = "localrepository.db";
// Create our Azure mobile app client
this.MobileService = new MobileServiceClient("the api address as setup on Mobile app services in azure");
// Set up our local SQLite store
var repository = new MobileServiceSQLiteStore(path);
// Define a Foo table on the local store
repository.DefineTable<Foo>();
// Initialize repository synchronisation
await this.MobileService.SyncContext.InitializeAsync(repository);
var fooTable = this.MobileService.GetSyncTable<Foo>();
4) Then push and pull your data to ensure you have the latest changes:
await this.MobileService.SyncContext.PushAsync();
await fooTable.PullAsync("allFoos", fooTable.CreateQuery());
https://azure.microsoft.com/en-us/documentation/articles/app-service-mobile-xamarin-forms-get-started-offline-data/
I suggest you also take a look at SymmetricDS. It is a replication library that works with SQLite and is available for Android. You can use it to synchronize your client and server databases. I also suggest having a separate database on the server for each client. Holding the data of all users in one MySQL database is not always the best idea, especially if the user data is going to grow fast.
Let's call it the CUDR Sync problem (I don't like CRUD, because Create/Update/Delete are writes and should be grouped together).
The problem can also be looked at from a write-offline-first or a write-online-first perspective. The write-offline-first approach has a problem with unique identifier conflicts, and also needs multiple network calls for the same transaction, which increases risk (or cost).
I personally find the write-online-first approach easier to manage (the server is then the single source of truth, from which everything else is synced). The write-online-first approach means not letting users write offline first: the local write happens only after an OK response from the online write.
The user may read offline first and, as soon as the network is available, fetch the data from online, update the local database and then update the UI.
One way to avoid the unique identifier conflict would be to use a combination of unique user id + table name or table id + row id (generated by SQLite), together with a synced boolean flag column. The registration still has to be done online first to get the unique user id on which all other ids are based. Unsynchronized clocks, which someone mentioned above, remain an issue here too.

Sync data between client and server

I have a mobile app, something like a to-do list or calendar. Theoretically a user can have several devices with the application, on different platforms and so on. I would like to create automatic synchronization between them through my own server. What is the best practice: send all the information or only the changes? On the one hand there is usually not a lot of data in a to-do list, but who knows?
The correct approach is not date/time as others suggest, as time can go out of sync. The right algorithm is to keep the checksum of the data entries during last synchronization. On next synchronization you compare current checksums with stored ones, then you know whether the entry has been changed on the server, on the client or both.
Our open-source Rethync SDK lets you implement the above approach quite easily and is available for Android (not for iOS at the moment).
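The checksum comparison described in this answer can be sketched in a few lines of Python. This is not the Rethync SDK's API; `checksum` and `detect_change` are names made up for the illustration, and hashing the sorted JSON form of an entry with SHA-256 is just one possible choice.

```python
import hashlib
import json


def checksum(entry):
    """Stable checksum of an entry (keys sorted so ordering does not matter)."""
    encoded = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def detect_change(stored, client_entry, server_entry):
    """Compare both sides with the checksum saved at the last synchronization."""
    client_changed = checksum(client_entry) != stored
    server_changed = checksum(server_entry) != stored
    if client_changed and server_changed:
        return "both"
    if client_changed:
        return "client"
    if server_changed:
        return "server"
    return "none"


base = {"id": 7, "title": "Buy milk", "done": False}
stored = checksum(base)   # saved at the end of the previous sync

result = detect_change(stored, {**base, "done": True}, base)
```

Because the comparison uses content rather than clocks, it is unaffected by devices whose time is wrong; a result of "both" still needs a conflict policy.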
I am doing something similar in my application. I have a last modified date field on each entity that I need to sync. Periodically, I post this data to the server (actual data + date and time). The server then does one of two things. It checks the corresponding data on the server side and compares the last modified dates. If what the server has is the latest, it returns the latest data in the response. If not, it updates its data and sends a response indicating that what the client has is the latest.
Of course you can make several optimizations. For example, mark the data as "dirty" so you know whether to send your data to the server at all. If the phone does not have modified data, your sync basically just gets the latest data from the server.
Basically server does the heavy lifting and does all the logic necessary to maintain the latest data on its end and send responses to client appropriately.
Good Luck
The best approach is to use a timestamp to handle this.
1) Make the initial request to the server with timestamp value 0.
2) The server returns all the data the first time, along with a timestamp.
3) Store the timestamp in SharedPreferences.
4) In all subsequent requests, pass the timestamp back to the server.
5) The server sends only the data that was added or updated after the given timestamp.
That is it.
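The steps above can be sketched in Python as follows. The `Server` and `Client` classes are invented for this example, and the server "timestamp" is modelled as a simple counter that increases on every write; a real server would more likely use its own clock or a sequence column.

```python
class Server:
    def __init__(self):
        self.clock = 0
        self.rows = {}   # row id -> (value, modified_at)

    def upsert(self, row_id, value):
        self.clock += 1
        self.rows[row_id] = (value, self.clock)

    def changes_since(self, timestamp):
        """Return rows modified after `timestamp`, plus the current server timestamp."""
        changed = {rid: value for rid, (value, modified) in self.rows.items()
                   if modified > timestamp}
        return changed, self.clock


class Client:
    def __init__(self):
        self.last_sync = 0   # on Android this could live in SharedPreferences
        self.rows = {}

    def sync(self, server):
        changed, server_time = server.changes_since(self.last_sync)
        self.rows.update(changed)
        self.last_sync = server_time   # always store the server's value, not the device's
        return len(changed)


server = Server()
server.upsert("a", "first")
server.upsert("b", "second")

client = Client()
initial = client.sync(server)       # timestamp 0: everything comes down
server.upsert("b", "second, edited")
incremental = client.sync(server)   # only the row changed since the last sync
```

Storing the timestamp returned by the server, rather than the device's own time, avoids problems when the device clock is wrong. Deletions are not covered by this sketch and need separate handling.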
There is a new alternative to the syncing problem. It's called EnduroSync from Orando Labs. You can sync object data stores between devices on Android and iOS now, with others coming soon.
Full Disclosure: I work for Orando Labs.
The EnduroSync clients allow you to create object data stores on the local devices. The clients are fairly sophisticated - data is modeled as native objects for each client we support (iOS and Android now, more coming). The clients work offline and online. The data is saved to an sqlite database locally.
As you change objects in your model, the deltas are recorded on the device. At some point, you can 'sync' the object data store. Syncing uses a commit/push/pull process (like git), but this is invisible to you. The sync brings your local copy up to date with whatever is on the server, and sends up any changes you have made. Conflicts are resolved using a timestamp based merge, so newer data is not overwritten by older data.
EnduroSync is an online service, so there is no server setup on your end.
There is also a flexible permission system which lets you share the object data stores in a variety of ways. For instance, most applications will have one or more object data stores for each user, for preferences, notes, tags, etc. You can also share object data stores per app, per user type, and with wild cards, many other ways.
So basically you use our client SDK's to model your data on the device. Modeling is with simple objects in the native programming language of the device. If you sign up for the syncing service, you get the syncing also.
Here is another approach.
Issue: I need to have doctors' appointments synced from the server to the client (mobile device). Appointments can drop off, or the data can change on the server. Having the client work out what changed and send a request back to the server could be expensive.
Possible approach: Have the server do the heavy lifting. Keep a table that stores a timestamp and whether a change happened to an appointment (cancellation, reschedule, etc.). The client then looks at this table to see if anything changed. In reality we don't need to sync everything, only the delta, which the server can provide to the client based on what it has and what the client has. One aspect that still needs to be taken care of is updating information from client to server; traditional conflict management can be applied there, with the client updating the server when data connectivity between them exists.
Essentially the approach is to have only the deltas synced, by maintaining a checksum or data change log to PUSH changes to the client.

Android Single Server Multiple Client Data Synchronization

These days I am working on an Android application and I have a problem with data synchronization.
I am using JSON for transferring data.
Now I'll explain my problem.
Assume that you have one server and multiple Android devices that exchange data with it. The same database tables exist on the Android devices and on the server.
The system operates like this:
At the beginning of the day, Android devices must receive data from the server.
During the day, Android devices can change data in their own database, but this does not change data in the server database.
At the end of the day, Android devices send inserted, updated and deleted data to the server.
My problem starts here.
My code logs every change on the Android device so it can be sent to the server at the end of the day.
User A added this data during the day (id=1024 name=testA value=testAvalue)
User B added this data during the day (id=1024 name=testB value=testBvalue)
Then user B changed the name during the day (id=1024 name=testC value=testBvalue)
At the end of the day, user A sends data to the server first. Now the server has this data
(id=1024 name=testA value=testAvalue)
Next, user B sends data to the server. Create(id=1024 name=testB value=testBvalue)
Here the id of this data will not be 1024, because the server database already has a row whose id is 1024. The new id will be 1025.
Now server has 2 data
(id=1024 name=testA value=testAvalue)
*(id=1025 name=testB value=testBvalue)*
Then the server receives an edit command like this
Edit(id=1024 name=testC value=testCvalue)
It edits the row with id 1024, which is wrong
(id=1024 name=testC value=testCvalue)
(id=1025 name=testB value=testBvalue)
This is a typical "he who saves last wins" problem. You have copies of the data out in the field, and you need to aggregate and synchronize updates at the end of the day. The problem here is not so much technical as it is a problem of design:
"How can I accept batched data updates from multiple sources when the updates conflict?"
So, this is really a conflict resolution problem. The "right" solution will depend on your application requirements.
One solution is to assign "ownership" of records to a device, so that only that device can make an update. An example might be a sales force in which representatives are assigned clients. Only the rep assigned to a client can make changes to records associated with that client.
Another solution is to write specific rules for resolving conflicts into the system. Your specific business case will determine exactly how to resolve each conflict. As long as your application cannot make changes to the server in realtime, this is probably your best bet to accept updates to the same record from multiple sources.
One of your problems is ID collision. You need to select appropriate column types for your primary keys based on your specific requirements.
Let me suggest a few options (they all have advantages and disadvantages).
Auto-increment (identity) columns: the clients create only temporary IDs and replace them with permanent ones after the server creates the new records and sends the newly generated IDs back to the clients.
Or the server designates a range of IDs for each client, e.g. Client A inserts records with IDs 1-100000, Client B with 100001-200000, etc. Once a client's new records reach the range limit, the server should issue it a new range.
GUIDs: each client generates a unique ID on insert (for example with UUID.randomUUID() in Java), and your server can then insert or update the client's changes without any problems.
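The ID-range option can be sketched in Python like this. `RangeAllocator` and `ClientIds` are hypothetical names; the block size of 2 is chosen only so that the example exhausts a range quickly, and leasing a range is assumed to happen while the device is online.

```python
class RangeAllocator:
    """Server side: hands out non-overlapping blocks of ids."""

    def __init__(self, block_size=100_000):
        self.block_size = block_size
        self.next_start = 1

    def lease(self):
        start = self.next_start
        self.next_start += self.block_size
        return range(start, start + self.block_size)


class ClientIds:
    """Client side: draws ids from its leased block and asks for a new one when empty."""

    def __init__(self, allocator):
        self.allocator = allocator
        self.ids = iter(allocator.lease())

    def new_id(self):
        try:
            return next(self.ids)
        except StopIteration:
            # Range exhausted: request a fresh block from the server.
            self.ids = iter(self.allocator.lease())
            return next(self.ids)


allocator = RangeAllocator(block_size=2)
client_a, client_b = ClientIds(allocator), ClientIds(allocator)

ids_a = [client_a.new_id() for _ in range(3)]   # third id comes from a new block
ids_b = [client_b.new_id() for _ in range(2)]
```

Since no two clients ever hold the same block, records created offline keep their ids when they reach the server, so a later edit such as the one in the question targets the right row. A client that runs out of ids while offline cannot create more records until it reconnects, which is the main drawback compared with GUIDs.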

Up-Sync and Down-Sync in Android?

I am working on a Point of Sale application that needs a very good syncing mechanism. We have a Magento database on the server. The Android devices have a local SQLite DB. We need to sync in the following way:
Local ------Sync To---------------> Server (Up Sync)
Server------Sync To---------------> Locals (Down Sync)
There are 2 things:
1) write-to (how to handle this?)
Every change I make locally should be synced directly from my local DB to the server.
2) write-back (how to handle this?)
Whenever there is a change on the server, we need to sync all our local DBs with the server.
So the task is to identify a server update and sync our local DBs.
For example, 4 devices are running in a store and we add one new customer through one device. I want the local DBs of the other three devices to be updated with the information about that customer, and the server to be updated as well.
I have heard about background threads and running threads at a time interval. But what is the best way to do this without affecting the application? Also, all the big retail stores use a syncing process. What do they use for it?
Any help is appreciated.
It fully depends on your database structure.
You have a database on the local device and one on the server.
You need to add a TIMESTAMP field to the tables that you actually want to keep in sync.
Whenever you make a change on the server, the TIMESTAMP is updated there, and the same applies to the local database.
Run a service in the background that keeps comparing the local TIMESTAMPs with those on the server.
If the server TIMESTAMP is newer than the local one, bring the changes from the server to the local database,
and vice versa to take changes from local to the server.
You also have to decide how frequently you want to run this service.
Alternatively:
You can create a table on the server that stores the LAST_SYNCHED date for each device.
Whenever you log in on your device (or on any other event you choose), the server will:
check when this device was LAST_SYNCHED,
compare it to today's date,
check which updates actually happened between these dates,
and send the changes to the local device,
and vice versa for local device to server.
You have to play with the timestamps; beyond that you can use your own logic to structure the database.
This is what I observed when I was part of a similar project.
EDIT
The process above describes the strategy for syncing the devices with the server.
If you want your devices to be notified by the server when to sync, instead of hitting the web service repeatedly,
you can make use of push notifications. GCM is one service that sends push notifications to devices, and you can integrate it into your project.
For syncing you need to handle following two situations.
How and when to receive server updates
How to identify local non-synced data
How and when to receive server updates:
For receiving updates, we can use GCM (Google Cloud Messaging). If any updates are made on the server, the server sends a push message to all devices. Devices receive that push and, based on the message, download the data from the server. (I think this is a better approach than continuously hitting the service at fixed intervals, i.e. polling.)
To receive only updated data, the server maintains a modified_timestamp column in all tables. The first time, a device sends an empty timestamp, so the server sends all data to the device along with the server timestamp. The device receives the new data, updates its local DB and saves the latest server timestamp. The next time it asks for server updates, the device sends the stored server timestamp, and the server sends only the data modified after that timestamp. The server sends its timestamp with each response; the device needs to store it and use it when calling the service.
How to identify local non-synced data:
For sending local updates, the local DB needs to maintain an 'isSynced' column in its tables. If a row is modified locally, isSynced becomes false; after the local data has been synced to the server successfully, isSynced becomes true. This way we can keep local data up to date with the server.
Updated:
You can find more information on this developer link
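A minimal sketch of the 'isSynced' column, using Python's built-in `sqlite3` module in place of Android's SQLite API. The `customers` table and the helper names are made up for the example, and `upload` stands for whatever call sends one row to the server and reports success.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, name TEXT, isSynced INTEGER NOT NULL DEFAULT 0)"
)


def save_local(name):
    # New rows start with isSynced = 0 (false).
    return db.execute("INSERT INTO customers (name) VALUES (?)", (name,)).lastrowid


def rename_local(row_id, name):
    # Any local modification resets the flag so the row is sent again.
    db.execute("UPDATE customers SET name = ?, isSynced = 0 WHERE id = ?", (name, row_id))


def push_pending(upload):
    """Send every non-synced row; mark it synced only if the upload succeeded."""
    pending = db.execute("SELECT id, name FROM customers WHERE isSynced = 0").fetchall()
    pushed = 0
    for row_id, name in pending:
        if upload(row_id, name):
            db.execute("UPDATE customers SET isSynced = 1 WHERE id = ?", (row_id,))
            pushed += 1
    return pushed


uploaded = []


def fake_upload(row_id, name):
    uploaded.append((row_id, name))
    return True


alice = save_local("Alice")
first = push_pending(fake_upload)    # sends the new row
second = push_pending(fake_upload)   # nothing left to send
rename_local(alice, "Alice B.")
third = push_pending(fake_upload)    # sends the modified row again
```

Note that the flag only records what the device believes. If the server processed an upload but the response was lost, the row stays unsynced and is sent again, so the server still needs a way to recognize repeats.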
Have you considered using commercial solution?
http://www.mobeelizer.com/ seems like what you want to achieve. There are probably many other.
Note: no affiliation with that company.
I would say that the problem statement is incomplete. What is missing from the setup described above is what you are actually going to synchronise.
The usual case in POS is that there are a few index tables (id, value, ...) that must be distributed from the central server to the client devices. In most cases these are the price list, stock list, etc. Those tables should rarely be modified on client devices (they could be, but then the change has to be redistributed from the central server and acknowledged by the client devices).
The other direction tends to be pretty straightforward too: on the client device you generate bills or invoices. These are local data that must be propagated to the server. So you store them locally and dispatch them to the server at the sync point. Later on you might receive your own items back from the server as an acknowledgement.
EDIT: to detect changes, on-write timestamps as mentioned above are a must.
So far, the above describes the data flow.
Next you have to move into the solution domain and implement these rules. There are a couple of sync approaches (e.g. SyncML). On the other hand, keeping it simple rules, so the main concern should be some kind of locking and queueing that makes the thing robust.
You could also use an agent-based client, where each device has its own agent (which could be a replica of the last known state of the device DB), but I would consider this an advanced feature that might come in a future release :-)
I am also working on a sales app in which I have to sync my local goals to the server and the server's goals to my local database.
My procedure is this: whenever my app starts, I fetch the latest data for all my team members from the server and update my local database with it, and whenever I change data in my local database, I also update it on the server side.
I also added a sync button that fetches the latest data from the server in case a team member changes their goal or profile.
If you want up-to-date data on all devices, why don't you use only the remote database? Why introduce a local database for this?
For your case I suggest working directly with the remote database only, so that things happen in real time.

Client Server data sharing issue

I have a web service with a mobile application. Using the application, the user shares data on the server. There is a constraint in the DB that the name of a shared object is unique per user. The application also stores locally all data created by the user (including data that is shared).
I have the following scenario:
The user creates data with data-name X.
The user shares this data.
The server has data-name X in its DB for this user.
The user gets a new phone and installs the application.
NO INTERNET CONNECTION
The user again creates data with data-name X.
It is stored only locally, since there is NO INTERNET CONNECTION.
The internet connection is restored.
Now a background service runs and starts sharing all unshared data in the background.
The share fails because of the constraint.
What should be done to solve the problem? I could pop up a window saying that the data is already shared and ask the user to rename or overwrite it, offer to download the server copy to the local DB, etc. But since this happens in the background, is it user-friendly to show this popup?
Any other ideas?
There is probably a common way of doing this.
I could really use some help regarding this issue.
Here is how a recent app I've done handles this:
A user creates a new record on the mobile device. The new record gets assigned a negative primary key _id number.
The app checks for internet connection and, if a connection exists, the app does an HTTP post to the server, creating the record on the server side. The server then sends back a response creating the new PK _id which gets updated in the app.
If no internet connection, the app creates a record in my db_changes table containing the table name and the PK _id of the new record.
A service runs every 5 minutes; it gets updates from the server and posts new data to the server. Each time, the db_changes table is polled for any inserts or updates that have yet to be posted to the server.
Once successful, the record from the db_changes table gets deleted.
In my situation, this works perfectly.
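The flow described in this answer might look roughly like this in Python. `LocalStore`, `create`, `sync` and `post_to_server` are hypothetical names; for brevity the sketch always queues new records in `db_changes` rather than trying an immediate HTTP post first, and the server is faked with a function that returns new primary keys.

```python
class LocalStore:
    def __init__(self):
        self.records = {}       # _id -> data
        self.db_changes = []    # (table_name, _id) entries waiting to be posted
        self._next_temp = -1    # temporary ids are negative

    def create(self, data):
        temp_id = self._next_temp
        self._next_temp -= 1
        self.records[temp_id] = data
        self.db_changes.append(("records", temp_id))
        return temp_id

    def sync(self, post_to_server):
        """Post queued records; swap each temporary id for the server's id."""
        remaining = []
        for table, local_id in self.db_changes:
            server_id = post_to_server(self.records[local_id])
            if server_id is None:
                # Post failed (e.g. still offline): keep the entry queued.
                remaining.append((table, local_id))
                continue
            self.records[server_id] = self.records.pop(local_id)
        self.db_changes = remaining


server_ids = iter([501, 502])


def fake_post(data):
    return next(server_ids)   # pretend the server created the row


store = LocalStore()
temp = store.create({"name": "X"})
store.sync(lambda data: None)   # first attempt: no connection
queued_after_failure = list(store.db_changes)
store.sync(fake_post)           # second attempt succeeds
```

Negative ids can never collide with the server's positive auto-increment keys, which is what makes the later swap safe. Any local rows that reference the temporary id would need to be updated in the same step.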
For this, you generally don't use a name or something of that sort, but UUIDs, i.e. 128-bit random identifiers, usually written as 36-character strings, that uniquely identify an object. When you create an object, just create a UUID on the device and sync it to the server. Here's the documentation of the UUID class in Android.
While it is theoretically possible for two UUIDs to be the same, it's something you generally don't need to worry about, as stated here: http://en.wikipedia.org/wiki/Universally_unique_identifier#Random_UUID_probability_of_duplicates
For iOS, you can use the CFUUID class to generate UUIDs.
Another name for UUID is GUID, globally unique identifier. With UUIDs as identifiers, you can remove the uniqueness constraint on the name.
