My app collects accelerometer values at the highest possible sample rate (~200 Hz on my device) and saves them in a Room database. I also want to update some graphs frequently with the latest measurements, let's say at a refresh rate of 5 times per second. Ever since the app also started collecting the linear acceleration (without g), likewise at ~200 Hz (so two sensors, each inserting values into the database at roughly 200 Hz), I have noticed a strong decrease in the app's performance, and there is a lag of a few seconds between the acceleration values being collected and them showing up in the plot.
From the profiler my guess is that the RxComputationThread is the bottleneck since it is active almost all the time due to the Flowables.
I use sample() to limit the receiver updates, since my graphs do not need to update very often. This led to acceptable performance when I collected just one sensor. I saw that RxJava provides an interval() method to limit the emit frequency on the emitter side, but that does not seem to be available to me (Unresolved reference).
Maybe someone has an idea how to improve the performance? I like the concepts of RxJava and Room in general and would like to stick with them, but I am pretty much stuck at this point.
Here is the code I use to observe the Room SQL table and update the graphs:
// Observe changes to the datasource and create a new subscription if necessary
sharedViewModel.dataSource.observe(viewLifecycleOwner, Observer { source ->
Log.d("TAG", "Change observed!")
when (source) {
"acc" -> {
val disposableDataSource =
sharedViewModel.lastSecondsAccelerations
.sample(200, TimeUnit.MILLISECONDS)
.onBackpressureDrop()
.subscribeOn(Schedulers.io())
.subscribe { lastMeasurements ->
Log.d("TAG", Thread.currentThread().name)
if (sharedViewModel.isReset.value == true && lastMeasurements.isNotEmpty()) {
val t =
lastMeasurements.map { (it.time.toDouble() * 1e-9) - (lastMeasurements.last().time.toDouble() * 1e-9) }
val accX = lastMeasurements.map { it.accX.toDouble() }
val accY = lastMeasurements.map { it.accY.toDouble() }
val accZ = lastMeasurements.map { it.accZ.toDouble() }
// Update plots
updatePlots(t, accX, accY, accZ)
}
}
compositeDisposable.clear()
compositeDisposable.add(disposableDataSource)
}
"lin_acc" -> {
val disposableDataSource =
sharedViewModel.lastSecondsLinAccelerations
.sample(200, TimeUnit.MILLISECONDS)
.onBackpressureDrop()
.subscribeOn(Schedulers.io())
.subscribe { lastMeasurements ->
Log.d("TAG", Thread.currentThread().name)
if (sharedViewModel.isReset.value == true && lastMeasurements.isNotEmpty()) {
val t =
lastMeasurements.map { (it.time.toDouble() * 1e-9) - (lastMeasurements.last().time.toDouble() * 1e-9) }
val accX = lastMeasurements.map { it.accX.toDouble() }
val accY = lastMeasurements.map { it.accY.toDouble() }
val accZ = lastMeasurements.map { it.accZ.toDouble() }
// Update plots
updatePlots(t, accX, accY, accZ)
}
}
compositeDisposable.clear()
compositeDisposable.add(disposableDataSource)
}
}
})
The query for getting the last 10 seconds of measurements:
@Query("SELECT * FROM acc_measurements_table WHERE time > ((SELECT MAX(time) from acc_measurements_table)- 1e10)")
fun getLastAccelerations(): Flowable<List<AccMeasurement>>
Thanks for your comments. I have now figured out what the bottleneck was: the huge number of insertion calls, which is not too surprising. It is possible to improve the performance by using some kind of buffer to insert multiple rows at a time.
This is what I added, in case someone runs into the same situation:
class InsertHelper(private val repository: Repository){
var compositeDisposable = CompositeDisposable()
private val measurementListAcc: FlowableList<AccMeasurement> = FlowableList()
private val measurementListLinAcc: FlowableList<LinAccMeasurement> = FlowableList()
fun insertAcc(measurement: AccMeasurement) {
measurementListAcc.add(measurement)
}
fun insertLinAcc(measurement: LinAccMeasurement) {
measurementListLinAcc.add(measurement)
}
init {
val disposableAcc = measurementListAcc.subject
.buffer(50)
.subscribe {measurements ->
GlobalScope.launch {
repository.insertAcc(measurements)
}
measurementListAcc.remove(measurements as ArrayList<AccMeasurement>)
}
val disposableLinAcc = measurementListLinAcc.subject
.buffer(50)
.subscribe {measurements ->
GlobalScope.launch {
repository.insertLinAcc(measurements)
}
measurementListLinAcc.remove(measurements as ArrayList<LinAccMeasurement>)
}
compositeDisposable.add(disposableAcc)
compositeDisposable.add(disposableLinAcc)
}
}
// Dynamic list that can be subscribed on
class FlowableList<T> {
private val list: MutableList<T> = ArrayList()
val subject = PublishSubject.create<T>()
fun add(value: T) {
list.add(value)
subject.onNext(value)
}
fun remove(value: ArrayList<T>) {
list.removeAll(value)
}
}
I basically use a dynamic list to buffer a few dozen measurement samples, then insert them as a whole into the Room database and remove them from the dynamic list. Here is also some information on why batch insertion is faster: https://hackernoon.com/squeezing-performance-from-sqlite-insertions-with-room-d769512f8330
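The batching idea can also be sketched without any Rx types. The following is only a minimal illustration, assuming a hypothetical `BatchBuffer` helper (not part of Room or RxJava) whose `flush` callback stands in for a bulk insert such as `repository.insertAcc(list)`:

```kotlin
// Hypothetical helper: collects items and hands them to `flush` in batches.
class BatchBuffer<T>(
    private val batchSize: Int,
    private val flush: (List<T>) -> Unit // stand-in for a bulk insert call
) {
    private val pending = ArrayList<T>(batchSize)

    // Synchronized because sensor callbacks may arrive on different threads.
    @Synchronized
    fun add(item: T) {
        pending.add(item)
        if (pending.size >= batchSize) flushNow()
    }

    // Call this when recording stops so that a partial batch is not lost.
    @Synchronized
    fun flushNow() {
        if (pending.isEmpty()) return
        flush(pending.toList()) // pass a copy so the buffer can be reused
        pending.clear()
    }
}

fun main() {
    val batches = mutableListOf<List<Int>>()
    val buffer = BatchBuffer<Int>(batchSize = 50) { batches.add(it) }
    repeat(120) { buffer.add(it) }
    buffer.flushNow()
    println(batches.map { it.size }) // [50, 50, 20]
}
```

Note that `buffer(50)` on the subject already delivers each batch as a list, so the extra list kept inside `FlowableList` is probably not needed; the explicit final flush above covers the samples that would otherwise stay behind when fewer than 50 are pending.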
I'm still quite new to Android development, so if you see any mistakes or have suggestions, I appreciate every comment :)
To start, I must say I am a beginner in RxJava.
Data class:
@Entity(tableName = "google_book")
data class GoogleBook (
@PrimaryKey(autoGenerate = true) val id: Int=0,
val items: ArrayList<VolumeInfo>)
data class VolumeInfo(val volumeInfo: BookInfo){
data class BookInfo(val title: String, val publisher: String, val description: String, val imageLinks: ImageLinks?)
data class ImageLinks(val smallThumbnail: String?)
}
Function which helps me save data to database:
fun searchBooks(query: String) {
searchJob?.cancel()
searchJob = viewModelScope.launch {
val text = query.trim()
if (text.isNotEmpty()) {
bookRepository.getBooksFromApi(query)
.map { t ->
t.items.map {
it.volumeInfo.imageLinks?.smallThumbnail?.filter { x -> x != null }
}
t
}
.subscribeOn(Schedulers.io())
.observeOn(AndroidSchedulers.mainThread())
.subscribe { x ->
x?.let { googleBook ->
searchJob?.cancel()
searchJob = viewModelScope.launch {
bookRepository.deleteGoogleBook()
bookRepository.insertGoogleBook(googleBook)
}
} ?: kotlin.run {
Log.d(TAG, "observeTasks: Error")
}
}
}
}
}
As you can see, I want to filter the list within the GoogleBook object by the image parameter, but it doesn't work. I cannot add filtering to the ImageLinks data class, so I have no idea how to do this correctly.
I am asking mostly about this part:
.map { t ->
t.items.map {
it.volumeInfo.imageLinks?.smallThumbnail?.filter { x -> x != null }
}
t
}
Thanks for reading
Welcome to RxJava, you're gonna love it.
As far as I can tell, the issue with your filtering lies here:
.map { t ->
t.items.map {
it.volumeInfo.imageLinks?.smallThumbnail?.filter { x -> x != null }
} // this returns a new list, but does not modify the original one
t // but you return the same data object here, it is not modified at all
}
// also consider naming it bookInfo if it is actually a bookInfo
What you should do is make a copy of your object with the filtered elements, something like this:
fun filterGoogleBookBySmallThumbNail(googleBook: GoogleBook): GoogleBook {
val filteredItems = googleBook.items.filter { it.volumeInfo.imageLinks?.smallThumbnail != null } // keep only the books that have a thumbnail
return googleBook.copy(items = ArrayList(filteredItems)) // now a new googleBook item is created with the filtered elements
}
// snippet to adjust then
bookRepository.getBooksFromApi(query)
.map { googleBook -> filterGoogleBookBySmallThumbNail(googleBook) }
//...
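To see the difference between mapping and copying in isolation, here is a small stand-alone sketch. The classes `Thumb`, `Book` and `Shelf` are simplified, hypothetical stand-ins for the ones in the question, and the sketch assumes the goal is to keep only the books that have a thumbnail:

```kotlin
// Simplified, hypothetical stand-ins for ImageLinks, BookInfo and GoogleBook.
data class Thumb(val smallThumbnail: String?)
data class Book(val title: String, val imageLinks: Thumb?)
data class Shelf(val items: List<Book>)

// Returns a new Shelf that keeps only the books with a thumbnail.
// The receiver is not modified; copy() builds a fresh object.
fun Shelf.withThumbnailsOnly(): Shelf =
    copy(items = items.filter { it.imageLinks?.smallThumbnail != null })

fun main() {
    val shelf = Shelf(
        listOf(
            Book("A", Thumb("http://example.com/a.png")),
            Book("B", Thumb(null)),
            Book("C", null)
        )
    )
    val filtered = shelf.withThumbnailsOnly()
    println(filtered.items.map { it.title }) // [A]
    println(shelf.items.size)                // 3, the original is untouched
}
```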
Some additional notes / suggestions I have:
I don't see you actually disposing of the subscription to the Observable.
If bookRepository.getBooksFromApi(query) returns an Observable, then even if you cancel the job you will still be observing that Observable. If it returns a Single you are in luck, because it is disposed after one element.
To dispose properly on cancellation, you would have to do something like this (I would still recommend the other two suggestions instead; I just wanted to point out the missing disposal):
searchJob = viewModelScope.launch {
val text = query.trim()
if (text.isNotEmpty()) {
val disposable = bookRepository.getBooksFromApi(query)
//...
.subscribe { x ->
//...
}
try {
awaitCancellation() // this actually suspends the coroutine until it is cancelled
} catch (cancellationException: CancellationException) {
disposable.dispose() // this disposes the observable subscription
// that way the coroutine stays alive as long as it's not cancelled, and at that point it actually cleans up the Rx subscription
throw cancellationException // rethrow so the cancellation is propagated normally
}
}
}
It seems wasteful to start a new coroutine job just to perform these actions.
If you want to go the Rx way, you could make
bookRepository.deleteGoogleBook() and bookRepository.insertGoogleBook(googleBook) return a Completable, and set up the observable as:
bookRepository.getBooksFromApi(query)
//..
.flatMap {
bookRepository.deleteGoogleBook().andThen(bookRepository.insertGoogleBook(it)).andThen(Observable.just(it))
}
//..subscribeOn
.subscribe()
It seems odd that you are mixing coroutines and Rx this way.
If you don't want to go full Rx, you may consider converting your Observable into a Kotlin coroutine Flow, which would be easier to handle with coroutine cancellation and calling suspend functions.
I hope this is helpful.
I have a connection to a Bluetooth device that emits data every 250 ms.
In my ViewModel I want to subscribe to that data, run some suspending code (which takes approximately 1000 ms to run) and then present the result.
The following is a simple example of what I'm trying to do.
Repository:
class Repo() : CoroutineScope {
private val supervisor = SupervisorJob()
override val coroutineContext: CoroutineContext = supervisor + Dispatchers.Default
private val _dataFlow = MutableSharedFlow<Int>()
private var dataJob: Job? = null
val dataFlow: Flow<Int> = _dataFlow
init {
launch {
var counter = 0
while (true) {
counter++
Log.d("Repo", "emmitting $counter")
_dataFlow.emit(counter)
delay(250)
}
}
}
}
The ViewModel:
class VM(app:Application):AndroidViewModel(app) {
private val _reading = MutableLiveData<String>()
val latestReading: LiveData<String> = _reading
init {
viewModelScope.launch(Dispatchers.Main) {
repo.dataFlow
.map {
validateData() //this is where some validation happens it is very fast
}
.flowOn(Dispatchers.Default)
.onEach {
delay(1000) //this is to simulate the work that is done,
}
.flowOn(Dispatchers.IO)
.map {
transformData() //this will transform the data to be human readable
}
.flowOn(Dispatchers.Default)
.collect {
_reading.postValue(it)
}
}
}
}
As you can see, when data comes in, I first validate it to make sure it is not corrupt (on the Default dispatcher). Then I perform some operation on it (saving it and running a long algorithm, on the IO dispatcher), then I transform it so the user can understand it (back on the Default dispatcher), and finally I post it to a MutableLiveData so that any subscriber in the UI layer can see the current data (on the Main dispatcher).
I have two questions:
a) If validateData fails, how can I cancel the current emission and move on to the next one?
b) Is there a way for the dataFlow subscriber in the ViewModel to spawn new threads so that the delay parts can run in parallel?
The timeline right now looks like the first part, but I want it to run like the second one.
Is there a way to do this?
I've tried using buffer(), which as the documentation states "Buffers flow emissions via channel of a specified capacity and runs collector in a separate coroutine." But when I set it to BufferOverflow.SUSPEND I get the behaviour of the first part, and when I set it to BufferOverflow.DROP_OLDEST or BufferOverflow.DROP_LATEST I lose emissions.
I have also tried using .conflate() like so:
repo.dataFlow
.conflate()
.map { ....
Even though the emissions start one after the other, the part with the delay still waits for the previous one to finish before starting the next one.
When I use .flowOn(Dispatchers.Default) for that part, I lose emissions, and when I use .flowOn(Dispatchers.IO) or something like Executors.newFixedThreadPool(4).asCoroutineDispatcher(), each one always waits for the previous one to finish before starting a new one.
Edit 2:
After about 3 hours of experiments this seems to work
viewModelScope.launch(Dispatchers.Default) {
repo.dataFlow
.map {
validateData(it)
}
.flowOn(Dispatchers.Default)
.map {
async {
delay(1000)
it
}
}
.flowOn(Dispatchers.IO) // NOTE (A)
.map {
val result = it.await()
transformData(result)
}
.flowOn(Dispatchers.Default)
.collect {
_reading.postValue(it)
}
}
However, I still haven't figured out how to cancel the emission if validateData fails.
Also, for some reason it only works if I use Dispatchers.IO, Executors.newFixedThreadPool(20).asCoroutineDispatcher() or Dispatchers.Unconfined where I put note (A). Dispatchers.Main does not seem to work (which I expected), but Dispatchers.Default does not seem to work either, and I don't know why.
First question: You cannot recover from an exception in the sense of continuing the collection of the flow. As per the docs, "Flow collection can complete with an exception when an emitter or code inside the operators throw an exception." Therefore, once an exception has been thrown, the collection is completed (exceptionally). You can, however, handle the exception by either wrapping your collection inside a try/catch block or using the catch() operator.
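A minimal sketch of the catch() behaviour described above, assuming kotlinx.coroutines is on the classpath. The names `readings` and `collectWithFallback` are made up for illustration, and -1 is an arbitrary marker value:

```kotlin
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.catch
import kotlinx.coroutines.flow.flowOf
import kotlinx.coroutines.flow.map
import kotlinx.coroutines.flow.toList
import kotlinx.coroutines.runBlocking

// Hypothetical source: the third value makes the upstream operator throw.
fun readings(): Flow<Int> = flowOf(1, 2, 3, 4).map {
    if (it == 3) throw IllegalStateException("corrupt packet $it")
    it
}

// catch() handles the upstream failure and may emit a replacement value,
// but the upstream flow is finished at that point: 4 is never delivered.
fun collectWithFallback(): List<Int> = runBlocking {
    readings()
        .catch { emit(-1) }
        .toList()
}

fun main() {
    println(collectWithFallback()) // [1, 2, -1]
}
```

The output shows that the exception is handled, yet collection does not resume with the remaining values.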
Second question: You cannot. While the producer (emitting side) can be made concurrent by using the buffer() operator, collection is always sequential.
As per your diagram, you need fan-out (one producer, many consumers), and you cannot achieve that with flows. Flows are cold: each time you collect from them, they start emitting from the beginning.
Fan-out can be achieved using channels, where you can have one coroutine producing values and many coroutines that consume those values.
Edit: Oh, you meant that the validation failed, not the function itself. In that case you can use the filter() operator.
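As a rough sketch of that suggestion, assuming kotlinx.coroutines and a made-up validation rule (`isValid` is hypothetical and simply treats negative readings as corrupt):

```kotlin
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.filter
import kotlinx.coroutines.flow.flowOf
import kotlinx.coroutines.flow.toList
import kotlinx.coroutines.runBlocking

// Hypothetical validation rule: negative readings are treated as corrupt.
fun isValid(reading: Int): Boolean = reading >= 0

// Invalid readings are dropped; the flow simply moves on to the next value.
fun validReadings(source: Flow<Int>): Flow<Int> =
    source.filter { isValid(it) }

fun main() = runBlocking {
    val result = validReadings(flowOf(4, -1, 7, -3, 9)).toList()
    println(result) // [4, 7, 9]
}
```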
BroadcastChannel and ConflatedBroadcastChannel are being deprecated. SharedFlow cannot help you in your use case, as it emits values in a broadcast fashion, meaning the producer waits until all consumers have consumed each value before producing the next one. That is still sequential, and you need parallelism. You can achieve it using the produce() channel builder.
A simple example:
val scope = CoroutineScope(Job() + Dispatchers.IO)
val producer: ReceiveChannel<Int> = scope.produce {
var counter = 0
val startTime = System.currentTimeMillis()
while (isActive) {
counter++
send(counter)
println("producer produced $counter at ${System.currentTimeMillis() - startTime} ms from the beginning")
delay(250)
}
}
val consumerOne = scope.launch {
val startTime = System.currentTimeMillis()
for (x in producer) {
println("consumerOne consumed $x at ${System.currentTimeMillis() - startTime}ms from the beginning.")
delay(1000)
}
}
val consumerTwo = scope.launch {
val startTime = System.currentTimeMillis()
for (x in producer) {
println("consumerTwo consumed $x at ${System.currentTimeMillis() - startTime}ms from the beginning.")
delay(1000)
}
}
val consumerThree = scope.launch {
val startTime = System.currentTimeMillis()
for (x in producer) {
println("consumerThree consumed $x at ${System.currentTimeMillis() - startTime}ms from the beginning.")
delay(1000)
}
}
Observe production and consumption times.
I saw this, but I'm not sure how to implement it or whether it is the same issue. I have a MediatorLiveData that updates when either of its two source LiveData objects updates or when the underlying data (Room DB) updates. It seems to work fine, but if the data updates a lot, it refreshes many times in quick succession and I get an error:
Cannot run invalidation tracker. Is the db closed?
Cannot access database on the main thread since it may potentially lock the UI for a long period of time
This doesn't happen every time, only when there are a lot of updates to the database in very quick succession. Here's the problematic part of the view model:
var search: MutableLiveData<String> = getSearchState()
val filters: MutableLiveData<MutableSet<String>> = getCurrentFiltersState()
val searchPokemon: LiveData<PagingData<PokemonWithTypesAndSpeciesForList>>
val isFiltersLayoutExpanded: MutableLiveData<Boolean> = getFiltersLayoutExpanded()
init {
val combinedValues =
MediatorLiveData<Pair<String?, MutableSet<String>?>?>().apply {
addSource(search) {
value = Pair(it, filters.value)
}
addSource(filters) {
value = Pair(search.value, it)
}
}
searchPokemon = Transformations.switchMap(combinedValues) { pair ->
val search = pair?.first
val filters = pair?.second
if (search != null && filters != null) {
searchAndFilterPokemonPager(search, filters.toList())
} else null
}.distinctUntilChanged()
}
@SuppressLint("DefaultLocale")
private fun searchAndFilterPokemonPager(search: String, filters: List<String>): LiveData<PagingData<PokemonWithTypesAndSpeciesForList>> {
return Pager(
config = PagingConfig(
pageSize = 20,
enablePlaceholders = false,
maxSize = 60
)
) {
if (filters.isEmpty()){
searchPokemonForPaging(search)
} else {
searchAndFilterPokemonForPaging(search, filters)
}
}.liveData.cachedIn(viewModelScope)
}
@SuppressLint("DefaultLocale")
private fun getAllPokemonForPaging(): PagingSource<Int, PokemonWithTypesAndSpecies> {
return repository.getAllPokemonWithTypesAndSpeciesForPaging()
}
@SuppressLint("DefaultLocale")
private fun searchPokemonForPaging(search: String): PagingSource<Int, PokemonWithTypesAndSpeciesForList> {
return repository.searchPokemonWithTypesAndSpeciesForPaging(search)
}
@SuppressLint("DefaultLocale")
private fun searchAndFilterPokemonForPaging(search: String, filters: List<String>): PagingSource<Int, PokemonWithTypesAndSpeciesForList> {
return repository.searchAndFilterPokemonWithTypesAndSpeciesForPaging(search, filters)
}
The error is actually thrown from the function searchPokemonForPaging.
For instance, it happens when the app starts, which does about 300 writes. If I force the calls off the main thread by making everything suspend and use runBlocking to return the Pager, it does work and I don't get the error anymore, but it obviously blocks the UI. So is there a way to make the switchMap asynchronous, or to make the searchAndFilterPokemonPager method return a Pager asynchronously? I know the second is technically possible (return from async), but maybe there is a way for coroutines to solve this.
Thanks for any help.
You can simplify the combining by using combineTuple (optional; it is available as a library that I wrote for this specific purpose).
Afterwards, you can use the liveData { } coroutine builder to move execution to a background thread.
Now your code will look like this:
val search: MutableLiveData<String> = getSearchState()
val filters: MutableLiveData<Set<String>> = getCurrentFiltersState()
val searchPokemon: LiveData<PagingData<PokemonWithTypesAndSpeciesForList>>
val isFiltersLayoutExpanded: MutableLiveData<Boolean> = getFiltersLayoutExpanded()
init {
searchPokemon = combineTuple(search, filters).switchMap { (search, filters) ->
liveData {
val search = search ?: return@liveData
val filters = filters ?: return@liveData
withContext(Dispatchers.IO) {
emit(searchAndFilterPokemonPager(search, filters.toList()))
}
}
}.distinctUntilChanged()
}
I have a flow like this:
fun createRawDataFlow() = callbackFlow<String> {
SensorProProvider.getInstance(this@MainActivity).registerDataCallback { sensorProDeviceInfo, bytes ->
val dataString = bytes.map { it.toString() }.reduce { acc, s -> "$acc, $s" }
val hexString = HEXUtils.byteToHex(bytes)
Log.e("onDataReceived", "deviceInfo: ${sensorProDeviceInfo.deviceIdentify}, dataSize:${bytes.size}, data:$dataString")
offer(hexString)
}
awaitClose { }
}
GlobalScope.launch(Dispatchers.IO) {
createRawDataFlow()
.map {
Log.e("onDataReceived", "map2: ${Thread.currentThread().name}")
// what I want is collecting 10 emits of sensor's data, and return a list of them
// arraylistOf<String>(/* 10 hexStrings here*/)
}
.flowOn(Dispatchers.IO)
.collect {
Log.e("onDataReceived", "thread: ${Thread.currentThread().name}, hexData:$it")
}
}
As the comment in the code says, I want to collect 10 hex strings from the flow, because these strings come from the same period of time, and then pack them into an ArrayList to return. How can I achieve this? Is there an operator similar to map that does this? By the way, forgive my poor English.
If you need batch collection and you do not want to cancel the original flow, you could adjust your emitting flow function so that it holds a cache for the values.
/*
* Returns a flow that emits lists of [cacheSize] integers.
*/
fun aFlow(cacheSize: Int = 10): Flow<List<Int>> {
var counter: Int = 0
val cache: MutableList<Int> = mutableListOf()
return flow {
while(currentCoroutineContext().isActive) {
cache.add(counter++)
if (cache.size >= cacheSize) {
emit(cache.toList()) // emit a copy, because the cache is cleared right after
cache.clear()
}
delay(500L) // the delay is just to simulate incoming sensor data
}
}
}
Generic Solution
To make this a bit more generic, I created an extension function on Flow that you can apply to any Flow for which you want batched lists returned.
Consider we have an infiniteFlow of integers:
fun infiniteFlow(): Flow<Int> {
var counter: Int = 0
return flow {
while (currentCoroutineContext().isActive) {
emit(counter++)
delay(250L) // the delay is just to simulate incoming sensor data
}
}
}
And this batch extension function:
/**
* Maps the Flow<T> to Flow<List<T>>. The list size is at least [batchSize]
*/
fun <T> Flow<T>.batch(batchSize: Int = 10): Flow<List<T>> {
val cache: MutableList<T> = mutableListOf()
return map {
cache.apply { add(it) }
}.filter { it.size >= batchSize }
.map {
mutableListOf<T>().apply { // copy the list and clears the cache
addAll(cache)
cache.clear()
}
}
}
Note: This is just an example. It is not optimized or tested for edge-cases!
You can then use this function like:
infiniteFlow().batch(batchSize = 12).collect { println(it) }
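For comparison, here is a variant sketch that keeps the buffer local to each collection by using the flow builder, and that also emits a trailing partial batch when the upstream completes. The name `batched` is made up for this example, and the code assumes kotlinx.coroutines:

```kotlin
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.asFlow
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.flow.toList
import kotlinx.coroutines.runBlocking

// Hypothetical operator: groups upstream values into lists of `size` items.
fun <T> Flow<T>.batched(size: Int): Flow<List<T>> {
    require(size > 0) { "size must be positive" }
    return flow {
        // Created inside the builder, so every collector gets its own buffer.
        var buffer = ArrayList<T>(size)
        collect { value ->
            buffer.add(value)
            if (buffer.size == size) {
                emit(buffer)             // hand over the full list
                buffer = ArrayList(size) // and start a fresh one
            }
        }
        // Upstream finished: emit whatever is left over.
        if (buffer.isNotEmpty()) emit(buffer)
    }
}

fun main() = runBlocking {
    println((1..7).asFlow().batched(3).toList()) // [[1, 2, 3], [4, 5, 6], [7]]
}
```

Because a new list is allocated for every batch, a collector can safely keep a reference to an emitted list. With an infinite source such as a sensor callback the trailing branch never runs, which is fine.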
First you apply take
fun <T> Flow<T>.take(count: Int): Flow<T>
Returns a flow that contains first count elements. When count elements are consumed, the original flow is cancelled. Throws IllegalArgumentException if count is not positive.
Then you apply toCollection
suspend fun <T, C : MutableCollection<in T>> Flow<T>.toCollection(destination: C): C
Collects given flow into a destination
Between take and toCollection you can place other operations if you need. This is how it looks all together:
val hexList: List<String> = createRawDataFlow().take(10) // take first 10 items
.map { ... } // do some (optional) mapping
.toCollection(mutableListOf())
// after the flow is cancelled (all items consumed) you have the elements in hexList
I'm investigating the use of Kotlin Flow within my current Android application.
My application retrieves its data from a remote server via Retrofit API calls.
Some of these APIs return 50,000 data items in 500-item pages.
Each API response contains an HTTP Link header containing the next page's complete URL.
These calls can take up to 2 seconds to complete.
In an attempt to reduce the elapsed time, I have employed a Kotlin Flow to process each page of data concurrently while also making the next page's API call.
My flow is defined as follows:
private val persistenceThreadPool = Executors.newFixedThreadPool(3).asCoroutineDispatcher()
private val internalWorkWorkState = MutableStateFlow<Response<List<MyPage>>?>(null)
private val workWorkState = internalWorkWorkState.asStateFlow()
private val myJob: Job
init {
myJob = GlobalScope.launch(persistenceThreadPool) {
workWorkState.collect { page ->
if (page == null) {
} else managePage(page!!)
}
}
}
My recursive function, which fetches all pages, is defined as follows:
private suspend fun managePages(accessToken: String, response: Response<List<MyPage>>) {
when {
result != null -> return
response.isSuccessful -> internalWorkWorkState.emit(response)
else -> {
manageError(response.errorBody())
result = Result.failure()
return
}
}
response.headers().filter { it.first == HTTP_HEADER_LINK && it.second.contains(REL_NEXT) }.forEach {
val parts = it.second.split(OPEN_ANGLE, CLOSE_ANGLE)
if (parts.size >= 2) {
managePages(accessToken, service.myApiCall(accessToken, parts[1]))
}
}
}
private suspend fun managePage(response: Response<List<MyPage>>) {
val pages = response.body()
pages?.let {
persistResponse(it)
}
}
private suspend fun persistResponse(myPage: List<MyPage>) {
val myPageDOs = ArrayList<MyPageDO>()
myPage.forEach { page ->
myPageDOs.add(page.mapDO())
}
database.myPageDAO().insertAsync(myPageDOs)
}
My issues are:
This code does not insert all the data items that I retrieve.
How do I complete the flow when all data items have been retrieved?
How do I complete the GlobalScope job once all the data items have been retrieved and persisted?
UPDATE
By making the following changes I have managed to insert all the data
private val persistenceThreadPool = Executors.newFixedThreadPool(3).asCoroutineDispatcher()
private val completed = CompletableDeferred<Int>()
private val channel = Channel<Response<List<MyPage>>?>(UNLIMITED)
private val channelFlow = channel.consumeAsFlow().flowOn(persistenceThreadPool)
private val frank: Job
init {
frank = GlobalScope.launch(persistenceThreadPool) {
channelFlow.collect { page ->
if (page == null) {
completed.complete(totalItems)
} else managePage(page!!)
}
}
}
...
...
...
channel.send(null)
completed.await()
return result ?: Result.success(outputData)
I do not like having to rely on a CompletableDeferred. Is there a better approach than this to know when the Flow has completed everything?
You are looking for the flow builder and Flow.buffer():
fun getData(): Flow<Data> = flow {
var pageData: List<Data>
var pageUrl: String? = "bla"
while (pageUrl != null) {
TODO("fetch pageData from pageUrl and change pageUrl to the next page")
emitAll(pageData.asFlow()) // emitAll takes a Flow, so the page list is converted first
}
}
.flowOn(Dispatchers.IO /* no need for a thread pool executor, IO does it automatically */)
.buffer(3)
You can use it just like a normal Flow, iterate, etc. If you want to know the total length of the output, you should calculate it on the consumer with a mutable closure variable. Note you shouldn't need to use GlobalScope anywhere (ideally ever).
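A runnable sketch of this pattern, assuming kotlinx.coroutines. The `Page` class and `fetchPage` function are hypothetical stand-ins for the Retrofit response and call, serving three fake pages:

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.buffer
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.flow.flowOn
import kotlinx.coroutines.flow.toList
import kotlinx.coroutines.runBlocking

// Hypothetical page model: the items plus the URL of the next page, if any.
data class Page(val items: List<Int>, val nextUrl: String?)

// Stand-in for the network call; serves pages "page/0" to "page/2".
suspend fun fetchPage(url: String): Page {
    delay(50) // simulate network latency
    val index = url.removePrefix("page/").toInt()
    val items = List(3) { index * 3 + it }
    return Page(items, if (index < 2) "page/${index + 1}" else null)
}

// Follows the next-page links until there are none left.
fun allItems(firstUrl: String): Flow<Int> = flow {
    var url: String? = firstUrl
    while (url != null) {
        val page = fetchPage(url)
        page.items.forEach { emit(it) }
        url = page.nextUrl
    }
}
    .flowOn(Dispatchers.IO) // fetching runs on the IO dispatcher
    .buffer(3)              // lets fetching run ahead of the consumer

fun main() = runBlocking {
    println(allItems("page/0").toList()) // [0, 1, 2, 3, 4, 5, 6, 7, 8]
}
```

The terminal operator returns only after the last page has been emitted and consumed, so no separate completion signal is needed.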
There are a few ways to achieve the desired behaviour. I would suggest using coroutineScope, which is designed specifically for parallel decomposition. It also provides good cancellation and error-handling behaviour out of the box. In conjunction with the Channel.close behaviour, it makes the implementation pretty simple. Conceptually, the implementation may look like this:
suspend fun fetchAllPages() {
coroutineScope {
val channel = Channel<MyPage>(Channel.UNLIMITED)
launch(Dispatchers.IO){ loadData(channel) }
launch(Dispatchers.IO){ processData(channel) }
}
}
suspend fun loadData(sendChannel: SendChannel<MyPage>){
while(hasMoreData()){
sendChannel.send(loadPage())
}
sendChannel.close()
}
suspend fun processData(channel: ReceiveChannel<MyPage>){
for(page in channel){
// process page
}
}
It works in the following way:
coroutineScope suspends until all children are finished. So you don't need CompletableDeferred anymore.
loadData() loads pages in a loop and posts them into the channel. It closes the channel as soon as all pages have been loaded.
processData() fetches items from the channel one by one and processes them. The loop finishes as soon as all the items have been processed (and the channel has been closed).
In this implementation the producer coroutine works independently, with no back-pressure, so it can take a lot of memory if the processing is slow. Limit the buffer capacity to have the producer coroutine suspend when the buffer is full.
It might also be a good idea to use the channel's fan-out behaviour and launch multiple processors to speed up the computation.
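The last two points can be sketched together as follows. This is only an illustration under assumptions: pages are represented by plain integers, `processAll` is a made-up name, and the capacity of 2 and the worker count are arbitrary:

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.channels.Channel
import java.util.concurrent.ConcurrentLinkedQueue

// Hypothetical pipeline: one producer, several workers, bounded channel.
suspend fun processAll(pageCount: Int, workers: Int): List<Int> {
    val processed = ConcurrentLinkedQueue<Int>() // safe to fill from many workers
    coroutineScope {
        // Capacity 2: send() suspends once two pages are waiting (back-pressure).
        val channel = Channel<Int>(capacity = 2)
        launch {
            for (page in 1..pageCount) channel.send(page)
            channel.close() // lets the worker loops below finish
        }
        // Fan-out: each page is received by exactly one of the workers.
        repeat(workers) {
            launch(Dispatchers.Default) {
                for (page in channel) {
                    delay(10) // simulate processing
                    processed.add(page)
                }
            }
        }
    } // returns only after the producer and all workers are done
    return processed.sorted()
}

fun main() = runBlocking {
    println(processAll(pageCount = 10, workers = 3)) // [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
}
```

The result is sorted before returning because the order in which workers finish is not deterministic.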