RxJava - how to see Flowable and backpressure in action? - android

I am writing a sample app, that processes the bitmap. The process can be controlled by a slider, so when the slider position is changed, I generate another bitmap.
When the user drags the slider, it emits roughly 10-20 events per second. Processing the bitmap takes about 1 second, so the processing queue quickly becomes clogged with requests.
It seems like a good backpressure example to me, but I couldn't figure out how to use stuff like Flowable and BackpressureStrategy to handle it properly. Moreover, I couldn't make this small sample work:
val pubsub = PublishSubject.create<Int>()
pubsub
    .toFlowable(BackpressureStrategy.LATEST)
    .observeOn(computation())
    .subscribe {
        Timber.d("consume %d - %s", it, Thread.currentThread().name)
        Thread.sleep(3000)
    }
for (i in 0..1000) {
    Timber.d("emit %d - %s", i, Thread.currentThread().name)
    pubsub.onNext(i)
}
Well, I expect this code to emit 1000 integers through the PublishSubject, but since processing each one takes 3 seconds, most of the integers should be dropped; only "0" and "1000" should be processed...
But in the logs I see that all my integers are slowly processed, one by one, and the backpressure strategy is ignored. The toFlowable(...) expression seems to do nothing: with or without backpressure, I see 1000 emissions followed by several minutes of consumption.
What am I missing here? How can I drop the intermediate elements and consume only the latest available?
solved:
observeOn(computation()) is actually observeOn(computation(), delayErrors = false, bufferSize = 128). To see real backpressure, decrease the bufferSize when you call observeOn(...).

This might be related to observeOn(computation()): it has its own internal buffer (128 items by default), so emissions are queued there instead of being dropped, and the Flowable never experiences backpressure.
Try putting these thread changes before toFlowable(LATEST), use a different Scheduler that is not as forgiving, or push even more items into pubsub.
You could also use observeOn(Scheduler scheduler, boolean delayError, int bufferSize) to enforce a smaller bufferSize.
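To see why LATEST drops intermediate values once the buffer is small, here is a minimal stdlib sketch (not RxJava code): the strategy effectively keeps a one-slot buffer that each new emission overwrites, and the slow consumer only sees whatever value is current when it next takes one. The loop counts and drain interval are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class LatestStrategyDemo {
    public static void main(String[] args) {
        // One-slot "latest" buffer, as BackpressureStrategy.LATEST behaves.
        AtomicReference<Integer> latest = new AtomicReference<>();
        List<Integer> consumed = new ArrayList<>();

        for (int i = 0; i <= 1000; i++) {
            latest.set(i);          // fast producer overwrites the slot
            if (i % 250 == 0) {     // slow consumer drains only occasionally
                consumed.add(latest.getAndSet(null));
            }
        }
        // Everything between drains was silently replaced by a newer value.
        System.out.println(consumed);
    }
}
```

With a 128-slot buffer in front of the consumer (the observeOn default), the first 128 values would all fit, which is why the original sample appears to ignore the strategy.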

Related

Collect flow but only new values, not the currently existing value

Currently struggling with this one, and so far no combination of SharedFlow and StateFlow has worked.
I have a flow that might have already started with a value, or not.
Using that flow I want to collect any new values that are emitted after I start collecting.
At this moment all my attempts have always failed, no matter what I try it always gets the current value as soon as I start collecting.
An example of what I am trying to achieve:
Given a Flow (could be any type, Int is just for simplification)
with the following timeline: value 4 is emitted | value 2 is emitted | value 10 is emitted
I want to be able to do the following:
If I start collecting after value 4 has already been emitted, I want to only receive anything after that, in this case it would collect 2 and 10 once emitted
If I start collecting after value 2 then it would only receive the 10
If I start collecting before 4 then it would receive 4, 2 and 10
I tried SharedFlow and StateFlow, tried replay = 0 and WhileSubscribed; no combination I could find does what I am looking for.
The only workaround I have found so far is to locally record the time I start my .collect { } and compare it with the start time of the item I receive in the collect. This works because the object I am using happens to carry a specific origin time, but the workaround will not generalize to everything, as the Integer example above shows.
EDIT: Adding implementation example as requested for SharedFlow
This is tied to a Room database call that returns a Flow<MyObject>
MyFragment.kt
lifecycleScope.launch(Dispatchers.IO) {
    viewModel.getMyObjectFlow.shareIn(
        viewModel.viewModelScope, // also tried with the fragment lifecycleScope
        SharingStarted.WhileSubscribed(), // also tried with the other 2 options
        replay = 0,
    ).collect {
        ...
    }
}
You have a misconception of how flows work. They always emit only after you start collecting. They emit on demand. Let's take this example:
val flow1 = flow {
    println("Emitting 1")
    emit(1)
    delay(10.seconds)
    println("Emitting 2")
    emit(2)
}
delay(5.seconds)
println("Start collecting")
flow1.collect {
    println("Collected: $it")
}
The output is:
Start collecting
Emitting 1
Collected: 1
not:
Emitting 1
Start collecting
Collected: 1
This is because flow starts emitting only after you start collecting it. Otherwise, it would have nowhere to emit.
Of course, there are flows which emit from some kind of cache, queue, or buffer; shared flows, for example, do this. In that case it looks like you collect after emitting, but this is not really the case. Technically speaking, it works like this:
val buffer = listOf(1, 2, 3)
val flow1 = flow {
    buffer.forEach {
        println("Emitting $it")
        emit(it)
    }
}
It still emits after you start collecting, but it just emits from the cache. Of course, the item was added to the cache before you started collecting, but this is entirely abstracted from you. You can't know why a flow emitted an item. From the collector perspective it always emitted just now, not in the past. Similarly, you can't know if a webserver read the data from the DB or a cache - this is abstracted from you.
Summing up: it is not possible to collect only new items from just any flow in a universal way. Flows in general don't understand the concept of "new items". They just emit, and you don't know why. Maybe they generate items on the fly, maybe they passively observe external events, or maybe they re-transmit items they collected from another flow. You can't tell.
While developing your solution, you need to understand what the source of the items is and design your code accordingly. For example, if the source is a regular cold flow, it never starts doing anything before you start collecting. If the source is a state flow, you can just drop the first item. If it is a shared flow or a flow with some replay buffer, the situation is more complicated.
One possible approach would be to start collecting earlier than we need, initially ignore all collected items and at some point in time start processing them. But this is still far from perfect and it may not work as we expect.
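The cold-source behavior described above has a close stdlib analogue: a Java Stream also does no work until a terminal operation "collects" it. This is only an illustrative sketch of the same laziness, not Kotlin Flow code; peek() plays the role of the println inside the flow builder.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ColdSourceDemo {
    public static void main(String[] args) {
        // Building the stream runs nothing yet, just like building a cold flow.
        Stream<Integer> source = Stream.of(1, 2)
            .peek(i -> System.out.println("Emitting " + i));

        System.out.println("Start collecting");
        // Emission happens only here, during the terminal operation.
        List<Integer> collected = source.collect(Collectors.toList());
        System.out.println("Collected: " + collected);
    }
}
```

"Start collecting" prints before any "Emitting" line, mirroring the flow example above.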
It doesn't make sense to use shareIn at the use site like that. You're creating a shared flow that cannot actually be shared, because you don't store the reference anywhere for other classes to access and use.
Anyway, the problem is that you are creating the SharedFlow at the use site, so your shared flow only begins collecting from upstream when the fragment calls this code. If the upstream flow is cold, then you will be getting the first value emitted by the cold flow.
The SharedFlow should be created in the ViewModel and put in a property so each Fragment can collect from the same instance. You'll want to use SharingStarted.Eagerly to prevent the cold upstream flow from restarting from the beginning when there are new subscribers after a break.

How to keep track of the number of emits in flowable?

Let's say I have a flowable that some view is subscribed to, listening for changes. I would like to add a custom method based only on the first emission of the flowable, while keeping the other methods that listen to the changes. What is the best way to approach this?
The naive approach I have is to duplicate the flowable and convert the copy to a Single or Completable to get the result, but that seems redundant.
Thank you.
Use .take(1). Also make sure the flowable is shared (otherwise some observers will miss events).
I think you can use the share operator for that. The share operator wraps the upstream in a ConnectableObservable, which then publishes each item to all current subscribers.
val o = Flowable.fromArray(1, 2, 3, 4, 5)
    .map {
        println("heavy operation")
        it + it
    }
    .share() // publish the changes
    .subscribeOn(Schedulers.computation()) // for testing; change to what you want

o.take(1).subscribe { println("Special work: $it") } // take one
o.subscribe { println("Normal work: $it") }
Result
heavy operation
Special work: 2
Normal work: 2
heavy operation
Normal work: 4
heavy operation
Normal work: 6
heavy operation
Normal work: 8
heavy operation
Normal work: 10

Avoiding same-pool deadlocks when using Flowable in Reactive Extensions

While subscribing to a Reactive Extensions Flowable stream, I noticed the stream halts/hangs (no further items are emitted, and no error is returned) after 128 items have been received.
val download: Flowable<DownloadedRecord> = sensor.downloadRecords()
download
    .doOnComplete { Log.i("TEST", "Finished!") }
    .subscribe(
        { record ->
            Log.i("TEST", "Got record: ${record.record.id}; left: ${record.recordsLeft}")
        },
        { error ->
            Log.i("TEST", "Error while downloading records: $error")
        })
Most likely this is related to Reactive Extensions. I discovered that the default buffer size of Flowable is 128; this is unlikely to be a coincidence.
While trying to understand what is happening, I ran into the following documentation on Flowable.subscribeOn.
If there is a create(FlowableOnSubscribe, BackpressureStrategy) type source up in the chain, it is recommended to have requestOn false to avoid same-pool deadlock because requests may pile up behind an eager/blocking emitter.
Although I do not quite understand what a same-pool deadlock is in this situation, it looks like something similar is happening to my stream.
1. What is a same-pool deadlock in Reactive Extensions? What would be a minimal code sample to recreate it (on Android)?
Currently at a loss, I tried applying .subscribeOn(Schedulers.io(), false) before .subscribe, without really understanding what it does, but my stream still locks up after 128 items have been emitted.
2. How could I go about debugging this issue, and how/where can it be resolved?
What is a same-pool deadlock in Reactive Extensions?
RxJava's standard schedulers use single-threaded executors. When a blocking or eager source is emitting items, it occupies that single thread; even though the downstream requests more, subscribeOn schedules those requests behind the currently running/blocking code, which then never gets notified of the new demand.
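The mechanism can be sketched with plain java.util.concurrent, without RxJava: a task occupies the only thread of a single-threaded executor and then waits for a second task queued on that same executor, so the second task can never start. This is an illustrative simulation of the same-pool deadlock, not the actual RxJava internals; the timeout is only there so the demo terminates.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SamePoolDeadlock {
    public static void main(String[] args) throws Exception {
        ExecutorService single = Executors.newSingleThreadExecutor();

        // Task A occupies the only thread, then blocks waiting for task B,
        // which is queued behind A on the same thread and can never run.
        Future<String> a = single.submit(() -> {
            Future<String> b = single.submit(() -> "from B");
            return b.get(); // would block forever without the timeout below
        });

        try {
            a.get(1, TimeUnit.SECONDS);
            System.out.println("completed");
        } catch (TimeoutException e) {
            System.out.println("deadlocked");
        } finally {
            single.shutdownNow();
        }
    }
}
```

In RxJava terms, task A is the eager/blocking emitter and task B is the downstream request that subscribeOn scheduled onto the same single-threaded worker.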
What would be a minimal code sample to recreate it (on Android)?
Why would you want code that deadlocks?
I tried applying .subscribeOn( Schedulers.io(), false )
What is your actual flow? You likely applied subscribeOn too far from the source, so it has no effect. The most reliable approach is to put it right next to create.
How could I go about debugging this issue, and how/where can it be resolved?
Put doOnNext and doOnRequest at various places and see where the signals disappear.

How to debounce event's stream with long processing task in RxJava

I'm working on the app where we download a set of files and process all of them on a fly. Here is what I'm doing:
Observable.from(fileUrls)
    .compose(downloadAndPersistFiles())
    .compose(processPersistedData())
    .subscribe()
fileUrls is the set of file URLs. downloadAndPersistFiles extracts data from the downloaded files and persists it into a local db. It emits an event every time a file's data has been successfully downloaded and persisted. Moreover, I use Schedulers.io() to spin up a pool of threads to download the files as fast as possible.
private <T> Observable.Transformer<T, T> downloadAndPersistFiles() {
    return observable -> observable
        .flatMap(fileUrls -> Observable.from(fileUrls)
            .subscribeOn(Schedulers.io())
            .compose(download())
            .compose(saveToDb()));
}
For every successfully downloaded and processed file, I run an extra task, which is basically a set of queries against the db to extract additional data.
private <T> Observable.Transformer<T, T> processPersistedData() {
    return observable -> observable
        //modified place - debounce, throttleFirst, throttleLast etc
        .flatMap(file -> Observable.from(tasks)
            .compose(runQueryToExtractData())
            .toList()
            .flatMap(ignored -> Observable.just(file)));
}
I'm aware this doesn't scale well; the data set in the database grows, so the queries take more and more time.
processPersistedData is invoked for every event from downloadAndPersistFiles (which uses a pool of threads), so at some point there are several processPersistedData operations running in parallel, and I want to limit it to one only.
Here is what I've tried so far:
debounce with timeout - it adds an extra delay after every downloaded file, and if downloading a file takes less time than the timeout, the stream will starve until there is a file big enough that its downloading and persisting take longer
throttleLast - it adds an extra delay after every downloaded file, because I need to wait until the time window ends
throttleFirst - no delay for the first file, but I may miss a few of the last events - the best solution I found so far. The main problem here is that I can't synchronise downloading files with running queries - at the beginning the queries are super fast, so I want to use as short a timeout as possible, but over time they may take 10-20s, so obviously I would like to slow down then. Moreover, it doesn't prevent two operations from running in parallel.
debounce with selector - it sounds perfect! I could use processPersistedData as the selector, which would debounce all events while processPersistedData is running and consume any new events as soon as it finished, but when I tried it, processPersistedData ran every time - a new processPersistedData stream was created for every event.
Do you have any other ideas how this problem could be approached? Or did I miss something when I tried debounce with a selector?
The flatMap() operator takes an additional parameter that constrains the number of parallel operations.
private <T> Observable.Transformer<T, T> processPersistedData() {
    return observable -> observable
        .flatMap(input -> Observable.from(tasks)
            //modified place - debounce, throttleFirst, throttleLast etc
            .compose(runQueryToExtractData())
            .toList()
            .flatMap(ignored -> Observable.just(input)), 1);
}
The 1 indicates that flatMap() will only process a single item at a time.
As an aside, where you have compose(runQueryToExtractData()), you might want to use Completable instead.
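The effect of flatMap's maxConcurrency = 1 can be sketched with a plain stdlib semaphore that caps how many "queries" run at once. This is an illustrative simulation, not RxJava: the thread pool size, task count, and sleep are made-up stand-ins for the parallel downloads and db work.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SingleConcurrency {
    public static void main(String[] args) throws Exception {
        Semaphore permit = new Semaphore(1);        // at most one task at a time
        AtomicInteger active = new AtomicInteger();
        AtomicInteger maxActive = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(4);

        for (int i = 0; i < 8; i++) {               // 8 "downloaded files" arrive
            pool.submit(() -> {
                permit.acquireUninterruptibly();    // gate, like maxConcurrency = 1
                try {
                    int now = active.incrementAndGet();
                    maxActive.accumulateAndGet(now, Math::max);
                    Thread.sleep(10);               // simulated query work
                    active.decrementAndGet();
                } catch (InterruptedException ignored) {
                } finally {
                    permit.release();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("max parallel tasks: " + maxActive.get());
    }
}
```

flatMap(…, 1) achieves the same ordering without blocking threads: it simply does not subscribe to the next inner observable until the current one completes.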

RxJava onBackpressureBuffer not emitting items

I've witnessed weird behavior with onBackpressureBuffer, and I'm not sure if it is valid behavior or a bug.
I have a TCP call that emits items at a certain rate (it uses streaming and an InputStream, but that's just background info).
On top of it I've created an observable, using create, that emits an item each time one is ready.
Let's call it messages().
Then I'm doing this:
messages()
    .subscribeOn(Schedulers.io())
    .observeOn(AndroidSchedulers.mainThread())
    .subscribe({ /* do some work */ });
I've noticed, using analytics tools, that a MissingBackpressureException is thrown occasionally, so I added onBackpressureBuffer to the call.
If I'm adding it after observeOn:
messages()
    .subscribeOn(Schedulers.io())
    .observeOn(AndroidSchedulers.mainThread())
    .onBackpressureBuffer()
    .subscribe({ /* do some work */ })
everything works fine, but it means it only buffers after reaching the UI main thread, so I preferred it like this:
messages()
    .onBackpressureBuffer()
    .subscribeOn(Schedulers.io())
    .observeOn(AndroidSchedulers.mainThread())
    .subscribe({ /* do some work */ });
And that's where things start to get weird.
I've noticed that while messages() keeps emitting items, at some point they stop being delivered to the subscriber.
More precisely, after exactly 16 items, what apparently happens is that the buffer starts holding items without passing them on.
Once I cancel messages() with some sort of timeout mechanism, messages() emits onError() and the buffer immediately emits all the items it has kept (and they are handled).
I've checked whether the subscriber is at fault for doing too much work, but it isn't; it finishes its work and still doesn't get the items...
I've also tried using the request(n) method in the subscriber, asking for one more item after onNext() finishes, but the buffer doesn't respond.
I suspect the messaging system of the main Android UI thread, combined with backpressure, is causing this, but I can't explain why.
Can someone explain why this is happening? Is this a bug or valid behaviour?
Thanks!
Not knowing how messages() is implemented, based on the behavior described this is a same-pool deadlock similar to the one in the question above.
The workaround, which you didn't try, is to put .onBackpressureBuffer between subscribeOn and observeOn:
messages()
    .subscribeOn(Schedulers.io())
    .onBackpressureBuffer() // <---------------------
    .observeOn(AndroidSchedulers.mainThread())
    .subscribe({ /* do some work */ });
The question is different, but the answer comes down to the same thing.
Look at the implementation of the observeOn constructor, OperatorObserveOn(Scheduler scheduler, boolean delayError, int bufferSize):
public OperatorObserveOn(Scheduler scheduler, boolean delayError, int bufferSize) {
    this.scheduler = scheduler;
    this.delayError = delayError;
    this.bufferSize = (bufferSize > 0) ? bufferSize : RxRingBuffer.SIZE;
}
The last line points to the buffer size.
The buffer size on Android is 16.
The solution is simply passing a bigger buffer size to observeOn(Scheduler scheduler, int bufferSize) operator:
messages()
    .observeOn(AndroidSchedulers.mainThread(), {buffer_size})
Be careful not to pass too high a value, as Android has limited memory.
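Why exactly 16 items before the stall? A bounded 16-slot queue makes the arithmetic concrete. This is a stdlib sketch of the ring buffer's capacity limit, not RxJava code; a real RxJava chain would hold further emissions upstream or signal MissingBackpressureException rather than silently refusing them.

```java
import java.util.concurrent.ArrayBlockingQueue;

public class SmallBufferDemo {
    public static void main(String[] args) {
        // RxRingBuffer.SIZE is 16 on Android; model it with a 16-slot queue.
        ArrayBlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(16);

        int accepted = 0;
        for (int i = 0; i < 100; i++) {
            // offer() is non-blocking: once 16 items are queued and the
            // consumer has drained none of them, every further emission
            // is refused.
            if (buffer.offer(i)) {
                accepted++;
            }
        }
        System.out.println("accepted before stall: " + accepted);
    }
}
```

If the main-thread consumer never drains the queue (because of the same-pool deadlock described above), the 17th item is exactly where delivery stops.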
