I'm working on an app that downloads a set of files and processes all of them on the fly. Here is what I'm doing:
Observable.from(fileUrls)
.compose(downloadAndPersistFiles())
.compose(processPersistedData())
.subscribe()
fileUrls is the set of file URLs. downloadAndPersistFiles extracts data from the downloaded files and persists it into a local db. It emits an event every time a file's data has been successfully downloaded and persisted. Moreover, I use Schedulers.io() to spin up a pool of threads to download those files as fast as possible.
private <T> Observable.Transformer<T, T> downloadAndPersistFiles() {
return observable -> observable
.flatMap(fileUrls -> Observable.from(fileUrls)
.subscribeOn(Schedulers.io())
.compose(download())
.compose(saveToDb()));
}
For every successfully downloaded and processed file, I run an extra task, which is basically a set of queries against the db to extract additional data.
private <T> Observable.Transformer<T, T> processPersistedData() {
return observable -> observable
//modified place - debounce, throttleFirst, throttleLast etc
.flatMap(file -> Observable.from(tasks)
.compose(runQueryToExtractData())
.toList()
.flatMap(ignored -> Observable.just(file)));
}
I'm aware it doesn't scale well: the data set in the database grows, so the queries take more and more time.
processPersistedData is invoked for every event from downloadAndPersistFiles (which uses a pool of threads), so at some point there are a few processPersistedData operations running in parallel, and I want to limit that to only one.
Here is what I've tried so far:
debounce with timeout - it adds an extra delay after every downloaded file, and if downloading a file takes less time than the timeout, the stream starves until there is a file big enough that downloading and persisting it takes longer
throttleLast - it adds an extra delay after every downloaded file because I need to wait until the time window ends
throttleFirst - no delay for the first file, but I may miss the last few events - the best solution I found so far. The main problem I have here is that I can't synchronise downloading files and running queries - at the beginning the queries are super fast, so I want to use as short a timeout as possible, but over time they may take over 10-20s, so obviously I would like to slow down at that point. Moreover it doesn't prevent running two
debounce with selector - it sounds perfect! I could use processPersistedData as a selector, which would debounce all events while processPersistedData is running and consume any new event as soon as it has finished. But when I tried it, processPersistedData ran every time - it looked like a new processPersistedData stream was created for every event.
Do you have any other ideas on how to approach this problem? Or did I miss something when I tried debounce with selector?
The flatMap() operator takes an additional parameter that constrains the number of parallel operations.
private <T> Observable.Transformer<T, T> processPersistedData() {
return observable -> observable
.flatMap(input -> Observable.from(tasks)
//modified place - debounce, throttleFirst, throttleLast etc
.compose(runQueryToExtractData())
.toList()
.flatMap(ignored -> Observable.just(input)), 1);
}
The 1 indicates that flatMap() will only process a single item at a time.
As an aside, where you have compose(runQueryToExtractData()), you might want to use Completable instead.
The first time, I want to retrieve data from the server and cache it; on later runs I want to show data on the UI from local storage, request fresh data from the server, and update local storage and the UI as
I have tried
(getCachedData()).concatWith(getRemoteData())
getCachedData returns Single
return apiSeResource.getData()
.doAfterSuccess { response ->
saveData(response.body())
}
}
.onErrorReturn {
return@onErrorReturn emptyList()
}
}
The problem with `concat` is that the subsequent observable doesn't even start until the first Observable completes. That can be a problem. We want all observables to start simultaneously but produce the results in a way we expect.
I can use `concatEager` : It starts both observables but buffers the result from the latter one until the former Observable finishes.
Sometimes though, I just want to start showing the results immediately.
I don't necessarily want to "wait" on any Observable. In these situations, we could use the `merge` operator.
However the problem with merge is: if for some strange reason an item is emitted by the cache or slower observable after the newer/fresher observable, it will overwrite the newer content.
So none of the solutions mentioned above is suitable. What is your solution?
Create two data sources, one local and one remote, and use flatMap to run the Observables. You can publish the data from the cache, and when you get data from the remote source, save it to the cache and publish it.
Or you can also try Observable.merge(dataRequestOne, dataRequestTwo) and run the two Observables on different threads.
I am writing a sample app that processes a bitmap. The process can be controlled by a slider, so when the slider position changes, I generate another bitmap.
When the user drags the slider, it emits some 10-20 events per second. Processing the bitmap takes about 1 second, so the processing queue becomes quickly stuck with requests.
It seems like a good backpressure example to me, but I couldn't figure out how to use stuff like Flowable and BackpressureStrategy to handle it properly. Moreover, I couldn't make this small sample work:
val pubsub = PublishSubject.create<Int>()
pubsub
.toFlowable(BackpressureStrategy.LATEST)
.observeOn(computation())
.subscribe {
Timber.d("consume %d - %s", it, Thread.currentThread().name)
Thread.sleep(3000)
}
for (i in 0 .. 1000) {
Timber.d("emit %d - %s", i, Thread.currentThread().name)
pubsub.onNext(i)
}
Well, I expect this code to emit 1000 integers through the PublishSubject, but since processing each one takes 3 seconds, 999 of the integers should be dropped and only "0" and "1000" should be processed...
But in the logs I see that all my integers are slowly processed, one by one, and the backpressure strategy is ignored. Actually, the toFlowable(...) expression seems to do nothing. With or without backpressure, I see 1000 emissions followed by several minutes of consumption.
What am I missing here? How can I drop the intermediate elements and consume only the latest available?
solved:
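observeOn(computation()) is actually observeOn(computation(), delayErrors = false, bufferSize = 128), so up to 128 items are buffered before the LATEST strategy drops anything. To see real backpressure, decrease the bufferSize when you call observeOn(...), for example observeOn(computation(), false, 1).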
This might be related to observeOn(computation()). Depending on the backing thread, this might be throttled automatically. The emission of the items is queued. Therefore there's no backpressure on the Flowable.
Try putting these thread changes before toFlowable(LATEST) or use a different Scheduler which is not as forgiving or put even more items to pubsub.
Also you could use observeOn(Scheduler scheduler, boolean, int) to enforce a bufferSize.
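To make the buffering effect concrete, here is a minimal single-threaded model in plain Java, with no RxJava involved. `LatestWithPrefetch` and its methods are hypothetical names made up for this sketch; it only mimics the idea of a fixed-size prefetch queue sitting in front of a "keep latest" slot and is not how RxJava is implemented internally.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical model: a bounded prefetch queue (standing in for observeOn's buffer)
// in front of a single "latest" slot (standing in for BackpressureStrategy.LATEST).
// Null items are not supported because null is used as the "empty" marker.
final class LatestWithPrefetch<T> {
    private final int bufferSize;
    private final Queue<T> buffer = new ArrayDeque<>();
    private T latest; // overwritten while the buffer has no room

    LatestWithPrefetch(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    // Producer side: never blocks. Queues the item if there is room,
    // otherwise overwrites the latest slot and thereby drops the older value.
    void emit(T item) {
        if (latest == null && buffer.size() < bufferSize) {
            buffer.add(item);
        } else {
            latest = item;
        }
    }

    // Consumer side: moves the latest value into the queue once there is room,
    // then hands out the oldest queued item, or null when nothing is left.
    T poll() {
        if (latest != null && buffer.size() < bufferSize) {
            buffer.add(latest);
            latest = null;
        }
        return buffer.poll();
    }

    // Emits 0..1000 before the slow consumer gets a turn and returns what it would see.
    static List<Integer> simulate(int bufferSize) {
        LatestWithPrefetch<Integer> pipe = new LatestWithPrefetch<>(bufferSize);
        for (int i = 0; i <= 1000; i++) {
            pipe.emit(i);
        }
        List<Integer> consumed = new ArrayList<>();
        for (Integer v = pipe.poll(); v != null; v = pipe.poll()) {
            consumed.add(v);
        }
        return consumed;
    }
}
```

In this model, `simulate(128)` yields the first 128 values followed by 1000, while `simulate(1)` yields only 0 and 1000, which is the behaviour the question expected.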
I am fairly new to RxJava and trying things out on my own. I would like to get some advice on whether I'm doing it right.
Usecase: On the first run of my app, after a successful login I have to download and save in a local database several dictionaries for the app to run with. The user has to wait till the downloading process finishes.
Current solution: I am using retrofit 2 with rxjava adapter in order to get the data. I am bundling all Observables into one using the zip operator. After all downloads are done the callback triggers and saving into database begins.
Nothing speaks better than some code:
Observable<List<OrderType>> orderTypesObservable = backendService.getOrderTypes();
Observable<List<OrderStatus>> orderStatusObservable = mockBackendService.getOrderStatuses();
Observable<List<Priority>> prioritiesObservable = backendService.getPriorities();
return Observable.zip(orderTypesObservable,
orderStatusObservable,
prioritiesObservable,
(orderTypes, orderStatuses, priorities) -> {
orderTypeDao.deleteAll();
orderTypeDao.insertInTx(orderTypes);
orderStatusDao.deleteAll();
orderStatusDao.insertInTx(orderStatuses);
priorityDao.deleteAll();
priorityDao.insertInTx(priorities);
return null;
});
Questions:
Should I use the zip operator, or is there a better one for my case?
It seems a bit messy doing it this way. This is only a part of the code; I currently have 12 dictionaries to load. Is there a way to refactor it?
I would like to insert a single dictionary's data as soon as it finishes downloading and have a retry mechanism if the download fails. How can I achieve that?
I think in your case it's better to use Completable, because only task completion matters to you.
Completable getAndStoreOrderTypes = backendService.getOrderTypes()
.doOnNext(types -> *store to db*)
.toCompletable();
Completable getAndStoreOrderStatuses = backendService.getOrderStatuses()
.doOnNext(statuses -> *store to db*)
.toCompletable();
Completable getAndStoreOrderPriorities = backendService.getPriorities()
.doOnNext(priorities -> *store to db*)
.toCompletable();
return Completable.merge(getAndStoreOrderTypes,
getAndStoreOrderStatuses,
getAndStoreOrderPriorities);
If you need serial execution - use Completable.concat() instead of merge()
a retry mechanism if the download fails
Use the handy retry() operator.
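Conceptually, retrying means running the failed source again a limited number of times. The following is a plain-Java sketch of that idea under stated assumptions, not RxJava's retry() itself: `RetrySketch`, `callWithRetry` and `flakyDownload` are hypothetical names, and the "download" is simulated with a counter.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class RetrySketch {
    // Runs the task and, if it throws, runs it again up to maxRetries more times.
    // Rethrows the last failure when every attempt has failed.
    static <T> T callWithRetry(Supplier<T> task, int maxRetries) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return task.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    // Hypothetical flaky download: fails on the first `failures` calls, then succeeds.
    static Supplier<String> flakyDownload(AtomicInteger calls, int failures) {
        return () -> {
            if (calls.incrementAndGet() <= failures) {
                throw new IllegalStateException("simulated network error");
            }
            return "order types";
        };
    }
}
```

For example, `callWithRetry(flakyDownload(new AtomicInteger(), 2), 3)` succeeds on the third call, whereas allowing fewer retries than there are failures rethrows the simulated error.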
It is not a good idea to push a null value into an Rx stream (your zip function returns null).
Try to avoid that.
In your case, you have 1 api call and 2 actions to save response into the database, so you can create the chain with flatMap.
It will look like:
backendService.getOrderTypes()
.doOnNext(savingToDatabaseLogic)
.flatMap(data -> mockBackendService.getOrderStatuses())
.doOnNext(...)
.flatMap(data -> backendService.getPriorities())
.doOnNext(...)
If you want to react to an error in a particular observable, you can add onErrorResumeNext(exception -> Observable.empty()) to it. The stream will then complete normally instead of failing, although any step after the empty observable receives no item.
Also, you can create something like BaseDao, which can save any Dao objects.
The case I'm into right now is quite hard to explain so I will write a simpler version just to explain the issue.
I have an Observable.from() which emits a sequence of files defined by an ArrayList of files. All of these files should be uploaded to a server. For that I have a function that does the job and returns an Observable.
Observable<Response> uploadFile(File file);
When I run this code it goes crazy: Observable.from() emits all of the files and they are uploaded all at once, or at least as many at a time as the available threads can handle.
I want to have a max of 2 file uploads in parallel. Is there any operator that can handle this for me?
I tried buffer, window and some others, but they seem to only emit two items together instead of having two parallel file uploads constantly. I also tried to set a maximum thread pool size on the uploading part, but this cannot be used in my case.
There should be a simple operator for this right? Am I missing something?
I think all files are uploaded in parallel because you're using flatMap(), which executes all transformations simultaneously. Instead you should use concatMap(), which runs one transformation after another. And to run two parallel uploads you need to call window(2) on your files observable and then invoke flatMap() as you did in your code.
Observable<Response> responses =
files
.window(2)
.concatMap(windowFiles ->
windowFiles.flatMap(file -> uploadFile(file))
);
UPDATE:
I found a better solution, which does exactly what you want. There's an overload of flatMap() that accepts the maximum number of inner observables subscribed to concurrently (maxConcurrent).
Observable<Response> responses =
files
.onBackpressureBuffer()
.flatMap(file -> {
return uploadFile(file).subscribeOn(Schedulers.io());
}, 2);
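The effect of the concurrency limit can be checked without RxJava. The sketch below is a plain-Java analogue that swaps in a `Semaphore` for the maxConcurrent argument: every simulated upload must hold a permit, so no more than the given number are in flight at once. `BoundedUploads` and `peakConcurrency` are hypothetical names, and the upload itself is replaced by a short sleep.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedUploads {
    // Runs fileCount simulated uploads with at most maxConcurrent in flight
    // and returns the highest number of uploads observed running together.
    static int peakConcurrency(int fileCount, int maxConcurrent) {
        ExecutorService pool = Executors.newCachedThreadPool();
        Semaphore permits = new Semaphore(maxConcurrent);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (int i = 0; i < fileCount; i++) {
                pending.add(pool.submit(() -> {
                    permits.acquireUninterruptibly(); // wait for a free slot
                    try {
                        int now = inFlight.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        sleepQuietly(20); // stand-in for the actual upload
                        inFlight.decrementAndGet();
                    } finally {
                        permits.release();
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for every upload to finish
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Calling `peakConcurrency(8, 2)` never reports more than 2, regardless of how many threads the pool creates.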
I'm still fairly new to RxJava and I'm using it in an Android application. I've read a metric ton on the subject but still feel like I'm missing something.
I have the following scenario:
I have data stored in the system which is accessed via various service connections (AIDL) and I need to retrieve data from this system (1-n number of async calls can happen). Rx has helped me a ton in simplifying this code. However, this entire process tends to take a few seconds (upwards of 5 seconds+) therefore I need to cache this data to speed up the native app.
The requirements at this point are:
Initial subscription, the cache will be empty, therefore we have to wait the required time to load. No big deal. After that the data should be cached.
Subsequent loads should pull the data from the cache, but then the data should be reloaded and the disk cache should be updated behind the scenes.
The Problem: I have two Observables - A and B. A contains the nested Observables that pull data from the local services (tons going on here). B is much simpler. B simply contains the code to pull the data from disk cache.
Need to solve:
a) Return a cached item (if cached) and continue to re-load the disk cache.
b) Cache is empty, load the data from system, cache it and return it. Subsequent calls go back to "a".
I've had a few folks recommend a few operations such as flatmap, merge and even subjects but for some reason I'm having trouble connecting the dots.
How can I do this?
Here are a couple options on how to do this. I'll try to explain them as best I can as I go along. This is napkin-code, and I'm using Java8-style lambda syntax because I'm lazy and it's prettier. :)
1. A subject, like AsyncSubject, would be perfect if you could keep these as instance states in memory, although it sounds like you need to store these to disk. However, I think this approach is worth mentioning just in case you are able to. Also, it's just a nifty technique to know.
AsyncSubject is an Observable that only emits the LAST value published to it (a Subject is both an Observer and an Observable), and will only start emitting after onCompleted has been called. Thus, anything that subscribes after that completion will still receive that last value.
In this case, you could have (in an application class or other singleton instance at the app level):
public class MyApplication extends Application {
private final AsyncSubject<Foo> foo = AsyncSubject.create();
/** Asynchronously gets foo and stores it in the subject. */
public void fetchFooAsync() {
// Gets the observable that does all the heavy lifting.
// It should emit one item and then complete.
FooHelper.getTheFooObservable().subscribe(foo);
}
/** Provides the foo for any consumers who need a foo. */
public Observable<Foo> getFoo() {
return foo;
}
}
2. Deferring the Observable. Observable.defer lets you wait to create an Observable until it is subscribed to. You can use this to allow the disk cache fetch to run in the background, and then return the cached version or, if not in cache, make the real deal.
This version assumes that your getter code, both the cache fetch and the non-cache creation, consists of blocking calls, not observables, and that the defer does the work in the background. For example:
public Observable<Foo> getFoo() {
return Observable.defer(() -> {
if (FooHelper.isFooCached()) {
return Observable.just(FooHelper.getFooFromCacheBlocking());
}
return Observable.just(FooHelper.createNewFooBlocking());
}).subscribeOn(Schedulers.io());
}
3. Use concatWith and take. Here we assume our method to get the Foo from the disk cache either emits a single item and completes or else just completes without emitting, if empty.
public Observable<Foo> getFoo() {
return FooHelper.getCachedFooObservable()
.concatWith(FooHelper.getRealFooObservable())
.take(1);
}
That method should only attempt to fetch the real deal if the cached observable finished empty.
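The "only fetch the real deal if the cache was empty" behaviour can be illustrated without RxJava. This is a plain-Java analogue using lazy streams rather than concatWith and take; `CacheThenRemote` and `firstAvailable` are hypothetical names, and each source is modelled as a `Supplier<Optional<T>>` that returns empty when it has nothing.

```java
import java.util.Optional;
import java.util.function.Supplier;
import java.util.stream.Stream;

final class CacheThenRemote {
    // Asks each source in order and stops at the first one that yields a value.
    // The stream is evaluated lazily and findFirst short-circuits, so sources
    // after the first hit are never invoked.
    @SafeVarargs
    static <T> Optional<T> firstAvailable(Supplier<Optional<T>>... sources) {
        return Stream.of(sources)
                .map(Supplier::get)
                .filter(Optional::isPresent)
                .map(Optional::get)
                .findFirst();
    }
}
```

Passing a cache supplier first and an expensive supplier second means the expensive one runs only when the cache supplier returns `Optional.empty()`.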
4. Use amb or ambWith. This is probably one of the craziest solutions, but fun to point out. amb basically takes a couple (or more with the overloads) observables and waits until one of them emits an item, then it completely discards the other observable and just takes the one that won the race. The only way this would be useful is if it's possible for the computation step of creating a new Foo to be faster than fetching it from disk. In that case, you could do something like this:
public Observable<Foo> getFoo() {
return Observable.amb(
FooHelper.getCachedFooObservable(),
FooHelper.getRealFooObservable());
}
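The "first one wins, the loser is discarded" idea has a rough plain-Java counterpart in `ExecutorService.invokeAny`, sketched here under stated assumptions. It is not equivalent to amb: invokeAny skips tasks that fail, whereas amb also follows a source whose first signal is an error or a completion. `RaceSketch`, `firstToFinish` and `demo` are hypothetical names, and the slow disk read is simulated by a task that blocks until it is cancelled.

```java
import java.util.Arrays;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class RaceSketch {
    // Returns the result of whichever task completes successfully first;
    // invokeAny cancels the tasks that have not completed by then.
    static <T> T firstToFinish(Callable<T> a, Callable<T> b) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            return pool.invokeAny(Arrays.asList(a, b));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    // Simulated race: the disk read never finishes on its own, the computation is instant.
    static String demo() {
        CountDownLatch neverOpened = new CountDownLatch(1);
        Callable<String> stuckDiskRead = () -> {
            neverOpened.await(); // blocks until the task is interrupted
            return "cached foo";
        };
        Callable<String> quickCompute = () -> "fresh foo";
        return firstToFinish(stuckDiskRead, quickCompute);
    }
}
```

Here `demo()` returns the computed value because the simulated disk read can only end by being cancelled.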
I kinda prefer Option 3. As far as actually caching it, you could have something like this at one of the entry points (preferably before we're gonna need the Foo, since as you said this is a long-running operation). Later consumers should get the cached version as long as it has finished writing. Using an AsyncSubject here may help as well, to make sure we don't trigger the work multiple times while waiting for it to be written. The consumers would only get the completed result, but again, that only works if it can be reasonably kept around in memory.
if (!FooHelper.isFooCached()) {
getFoo()
.subscribeOn(Schedulers.io())
.subscribe((foo) -> FooHelper.cacheTheFoo(foo));
}
Note that you should either keep around a single-thread scheduler meant for disk writing (and reading) and pass it to .observeOn(...) after .subscribeOn(...), or otherwise synchronize access to the disk cache to prevent concurrency issues.
I’ve recently published a library on Github for Android and Java, called RxCache, which meets your needs about caching data using observables.
RxCache implements two caching layers, memory and disk, and it provides several annotations to configure the behaviour of every provider.
It is highly recommended to use with Retrofit for data retrieved from http calls. Using lambda expression, you can formulate expression as follows:
rxCache.getUser(retrofit.getUser(id), () -> true).flatmap(user -> user);
I hope you will find it interesting :)
Take a look at the project below. This is my personal take on things and I have used this pattern in a number of apps.
https://github.com/zsiegel/rxandroid-architecture-sample
Take a look at the PersistenceService. Rather than hitting the database (or MockService in the example project) you could simply have a local list of users that are updated with the save() method and just return that in the get().
Let me know if you have any questions.