I have a situation where I need to dispatch an indeterminate number of network calls known only at runtime. Each call returns a list. As each one returns, I need to combine these lists into a single merged list. I am using coroutines to do this.
The problem I am having relates to the fact that I do not know how many network calls the app will need to make. To address this, I am using a loop to iterate over the list of calls at runtime:
private suspend fun fetchData(params: List<Interval>): List<Item> {
    val smallLists = mutableListOf<Deferred<List<Item>>>()
    val merged = mutableListOf<List<Item>>()
    for (index in 0 until params.size) {
        val param = params[index]
        // the loop stops iterating after this call is dispatched
        smallLists[index] = CoroutineScope(Dispatchers.IO).async {
            fetchList(param)
        }
    }
    for (index in 0 until smallLists.size) {
        merged[index] = smallLists[index].await()
    }
    return merged.flatMap { it.toList() }
}

private fun fetchList(param: Interval): List<Item> {
    return dataSource.fetchData(param)
}
What happens in this code is that it enters the first loop. The params list is correct. It dispatches the first query, and this query returns (I can see this via a Charles proxy).
But this is where everything just dies. The app does nothing with the network response and the loop terminates (i.e. there is no second iteration of the loop).
I know that everything else is intact because I have an alternate version that does not include looping. It just does two queries, awaits their results, and returns the combined list. It works fine, except that it won't handle a dynamic runtime situation:
private suspend fun fetchData(params: List<Interval>): List<Item> {
    val list1 = CoroutineScope(Dispatchers.IO).async {
        fetchList(params[0])
    }
    val list2 = CoroutineScope(Dispatchers.IO).async {
        fetchList(params[1])
    }
    return list1.await() + list2.await()
}
Probably a simple solution here, but I don't see it. Any help is appreciated.
This is not correct:
smallLists[index] = CoroutineScope(Dispatchers.IO).async {
    fetchList(param)
}
Your smallLists is empty, so assigning to index index throws an IndexOutOfBoundsException, which is why the loop never reaches a second iteration. Change it like this:
smallLists.add(CoroutineScope(Dispatchers.IO).async {
    fetchList(param)
})
Note that you can call awaitAll() on your list of asyncs as well, to simplify your code:
private suspend fun fetchData(params: List<Interval>): List<Item> {
    val smallLists = mutableListOf<Deferred<List<Item>>>()
    for (index in 0 until params.size) {
        val param = params[index]
        smallLists.add(CoroutineScope(Dispatchers.IO).async {
            fetchList(param)
        })
    }
    val merged = smallLists.awaitAll()
    return merged.flatMap { it.toList() }
}
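As a side note, the same pattern can be written a bit more compactly with coroutineScope, map and awaitAll, which also ties the child coroutines to the caller instead of creating a detached CoroutineScope. A minimal sketch, assuming the same fetchList, Interval and Item types from the question:

import kotlinx.coroutines.*

private suspend fun fetchData(params: List<Interval>): List<Item> = coroutineScope {
    params
        .map { param ->
            // every call is dispatched concurrently on the IO dispatcher
            async(Dispatchers.IO) { fetchList(param) }
        }
        .awaitAll()   // suspends until every Deferred has completed
        .flatten()    // merge the per-call lists into one list
}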
Related
I'm still having a little trouble putting together all the information about the thread-safety of using coroutines to launch network requests.
Let's say we have the following use case: we get a list of users, and for each of those users I do a specific check that requires a network request to the API and gives me back some information about that user.
The userCheck happens inside a library, which doesn't expose suspend functions but still uses a callback.
Inside this library, I have seen code like this used to launch each of the network requests:
internal suspend fun <T> doNetworkRequest(request: suspend () -> Response<T>): NetworkResult<T> {
    return withContext(Dispatchers.IO) {
        try {
            val response = request.invoke()
            ...
According to the documentation, Dispatchers.IO can use multiple threads to execute the code; the request function is simply a function from a Retrofit API.
So what I did is launch the request for each user and use a single resultHandler object, which adds each result to a list and checks whether the length of the result list equals the length of the user list. If so, all userChecks are done and I know I can do something with the results, which need to be returned all together.
val userList: List<String>? = getUsers()
val userCheckResultList = mutableListOf<UserCheckResult>()

val handler = object : UserCheckResultHandler {
    override fun onResult(userCheckResult: UserCheckResult?) {
        userCheckResult?.let {
            userCheckResultList.add(it)
        }
        if (userCheckResultList.size == userList?.size) {
            doSomethingWithResultList()
            print("SUCCESS")
        }
    }
}

userList?.forEach {
    checkUser(it, handler)
}
My question is: is this implementation thread-safe? As far as I know, Kotlin objects should be thread-safe, but I have gotten feedback that this is possibly not the best implementation :D
But in theory, even if the requests are launched asynchronously and several run at the same time, only one at a time can hold the lock of the thread the result handler is running on, so there should be no race condition or problems with adding items to the list and comparing the sizes.
Am I wrong about this?
Is there a better way to handle this scenario?
If you are executing multiple requests in parallel, it's not: List is not thread-safe. But there is a simple fix for that. Create a Mutex object and then just wrap your operations on the list in a lock, like this:
val lock = Mutex()

val userList: List<String>? = getUsers()
val userCheckResultList = mutableListOf<UserCheckResult>()

val handler = object : UserCheckResultHandler {
    override fun onResult(userCheckResult: UserCheckResult?) {
        lock.withLock {
            userCheckResult?.let {
                userCheckResultList.add(it)
            }
            if (userCheckResultList.size == userList?.size) {
                doSomethingWithResultList()
                print("SUCCESS")
            }
        }
    }
}

userList?.forEach {
    checkUser(it, handler)
}
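One caveat about the snippet above: Mutex.withLock is a suspend function, so it only compiles if onResult can be called from a coroutine (for example, if the handler interface declares it as suspend). If the library invokes the callback on ordinary threads, a plain synchronized block gives the same mutual exclusion. A sketch, reusing the handler from the question:

val monitor = Any()

val handler = object : UserCheckResultHandler {
    override fun onResult(userCheckResult: UserCheckResult?) {
        synchronized(monitor) {
            userCheckResult?.let { userCheckResultList.add(it) }
            if (userCheckResultList.size == userList?.size) {
                doSomethingWithResultList()
                print("SUCCESS")
            }
        }
    }
}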
I have to add that this whole solution seems very hacky. I would go a completely different route: run all of your requests wrapped in async { /* network request */ }, which returns a Deferred object. Add these Deferreds to a list, and then wait for all of them using awaitAll(). Like this:
// inside a coroutine scope, e.g. coroutineScope { ... }
val jobs = mutableListOf<Deferred<UserCheckResult?>>()
userList?.forEach {
    // assuming checkUser is a suspend function here that returns the result directly
    jobs += async { checkUser(it) }
}
// wait for all requests
val results = jobs.awaitAll()
// after that you can access all results, e.g.:
val resultOfJob0 = results[0]
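For this to work, checkUser has to be a suspend function that returns the result directly. Since the library only exposes a callback, one way to bridge it is suspendCancellableCoroutine. A sketch, assuming the UserCheckResultHandler interface from the question:

import kotlinx.coroutines.suspendCancellableCoroutine
import kotlin.coroutines.resume

suspend fun checkUserSuspending(userId: String): UserCheckResult? =
    suspendCancellableCoroutine { continuation ->
        checkUser(userId, object : UserCheckResultHandler {
            override fun onResult(userCheckResult: UserCheckResult?) {
                // resume the waiting coroutine once the library delivers its callback
                continuation.resume(userCheckResult)
            }
        })
    }

With that in place, async { checkUserSuspending(it) } gives you a Deferred<UserCheckResult?> that awaitAll() can collect.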
I have a flow like this:
fun createRawDataFlow() = callbackFlow<String> {
    SensorProProvider.getInstance(this@MainActivity).registerDataCallback { sensorProDeviceInfo, bytes ->
        val dataString = bytes.map { it.toString() }.reduce { acc, s -> "$acc, $s" }
        val hexString = HEXUtils.byteToHex(bytes)
        Log.e("onDataReceived", "deviceInfo: ${sensorProDeviceInfo.deviceIdentify}, dataSize:${bytes.size}, data:$dataString")
        offer(hexString)
    }
    awaitClose { }
}
GlobalScope.launch(Dispatchers.IO) {
    createRawDataFlow()
        .map {
            Log.e("onDataReceived", "map2: ${Thread.currentThread().name}")
            // what I want is to collect 10 emissions of the sensor's data and return a list of them
            // arraylistOf<String>(/* 10 hexStrings here */)
        }
        .flowOn(Dispatchers.IO)
        .collect {
            Log.e("onDataReceived", "thread: ${Thread.currentThread().name}, hexData:$it")
        }
}
As the comment in the code says, I want to collect 10 hex strings from the flow, because these strings come from the same period of time, and then pack them into an array list to return. How can I achieve this? Is there any operator similar to map that does this? Btw, forgive my poor English.
If you need batch collection and you do not want to cancel the original flow, you can adjust your emitting flow function so that it keeps a cache of the values.
/**
 * Returns a flow that emits lists of at least [cacheSize] integers.
 */
fun aFlow(cacheSize: Int = 10): Flow<List<Int>> {
    var counter: Int = 0
    val cache: MutableList<Int> = mutableListOf()
    return flow {
        while (currentCoroutineContext().isActive) {
            cache.add(counter++)
            if (cache.size >= cacheSize) {
                emit(cache.toList()) // emit a copy of the cache before clearing it
                cache.clear()
            }
            delay(500L) // the delay is just to simulate incoming sensor data
        }
    }
}
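Collecting it then simply means iterating over the emitted lists (a minimal usage sketch, to be called from a coroutine):

aFlow(cacheSize = 10).collect { batch ->
    println(batch) // each emission is a list of (at least) cacheSize integers
}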
Generic Solution
To make this a bit more generic, I created an extension function on Flow that you can apply to any Flow from which you want batched lists.
Consider an infiniteFlow of integers:
fun infiniteFlow(): Flow<Int> {
    var counter: Int = 0
    return flow {
        while (currentCoroutineContext().isActive) {
            emit(counter++)
            delay(250L) // the delay is just to simulate incoming sensor data
        }
    }
}
And this batch extension function:
/**
 * Maps the Flow<T> to Flow<List<T>>. The list size is at least [batchSize].
 */
fun <T> Flow<T>.batch(batchSize: Int = 10): Flow<List<T>> {
    val cache: MutableList<T> = mutableListOf()
    return map { cache.apply { add(it) } }
        .filter { it.size >= batchSize }
        .map {
            mutableListOf<T>().apply { // copy the list and clear the cache
                addAll(cache)
                cache.clear()
            }
        }
}
Note: This is just an example. It is not optimized or tested for edge-cases!
You can then use this function like:
infiniteFlow().batch(batchSize = 12).collect { println(it) }
First you apply take:
fun <T> Flow<T>.take(count: Int): Flow<T>
Returns a flow that contains first count elements. When count elements are consumed, the original flow is cancelled. Throws IllegalArgumentException if count is not positive.
Then you apply toCollection:
suspend fun <T, C : MutableCollection<in T>> Flow<T>.toCollection(destination: C): C
Collects given flow into a destination
Between take and toCollection you can place other operations if you need. This is how it looks all together:
val hexList: List<String> = createRawDataFlow()
    .take(10)                      // take the first 10 items
    .map { ... }                   // do some (optional) mapping
    .toCollection(mutableListOf())
// after the flow is cancelled (all items consumed) you have the elements in hexList
Documentation: Flow
I'm investigating the use of Kotlin Flow within my current Android application.
My application retrieves its data from a remote server via Retrofit API calls.
Some of these APIs return 50,000 data items in 500-item pages.
Each API response contains an HTTP Link header with the next page's complete URL.
These calls can take up to 2 seconds to complete.
In an attempt to reduce the elapsed time, I have employed a Kotlin Flow to concurrently process each page of data while also making the next page's API call.
My flow is defined as follows:
private val persistenceThreadPool = Executors.newFixedThreadPool(3).asCoroutineDispatcher()
private val internalWorkWorkState = MutableStateFlow<Response<List<MyPage>>?>(null)
private val workWorkState = internalWorkWorkState.asStateFlow()
private val myJob: Job

init {
    myJob = GlobalScope.launch(persistenceThreadPool) {
        workWorkState.collect { page ->
            if (page == null) {
            } else managePage(page!!)
        }
    }
}
My recursive function that fetches all the pages is defined as follows:
private suspend fun managePages(accessToken: String, response: Response<List<MyPage>>) {
    when {
        result != null -> return
        response.isSuccessful -> internalWorkWorkState.emit(response)
        else -> {
            manageError(response.errorBody())
            result = Result.failure()
            return
        }
    }

    response.headers().filter { it.first == HTTP_HEADER_LINK && it.second.contains(REL_NEXT) }.forEach {
        val parts = it.second.split(OPEN_ANGLE, CLOSE_ANGLE)
        if (parts.size >= 2) {
            managePages(accessToken, service.myApiCall(accessToken, parts[1]))
        }
    }
}
private suspend fun managePage(response: Response<List<MyPage>>) {
    val pages = response.body()
    pages?.let {
        persistResponse(it)
    }
}

private suspend fun persistResponse(myPage: List<MyPage>) {
    val myPageDOs = ArrayList<MyPageDO>()
    myPage.forEach { page ->
        myPageDOs.add(page.mapDO())
    }
    database.myPageDAO().insertAsync(myPageDOs)
}
My issues are:
This code does not insert all the data items that I retrieve.
How do I complete the flow when all data items have been retrieved?
How do I complete the GlobalScope job once all the data items have been retrieved and persisted?
UPDATE
By making the following changes I have managed to insert all the data:
private val persistenceThreadPool = Executors.newFixedThreadPool(3).asCoroutineDispatcher()
private val completed = CompletableDeferred<Int>()
private val channel = Channel<Response<List<MyPage>>?>(UNLIMITED)
private val channelFlow = channel.consumeAsFlow().flowOn(persistenceThreadPool)
private val frank: Job

init {
    frank = GlobalScope.launch(persistenceThreadPool) {
        channelFlow.collect { page ->
            if (page == null) {
                completed.complete(totalItems)
            } else managePage(page!!)
        }
    }
}

...
...
...

channel.send(null)
completed.await()

return result ?: Result.success(outputData)
I do not like having to rely on a CompletableDeferred; is there a better approach than this to know when the Flow has completed everything?
You are looking for the flow builder and Flow.buffer():
fun getData(): Flow<Data> = flow {
    var pageUrl: String? = "bla"
    while (pageUrl != null) {
        val pageData: List<Data> = TODO("fetch pageData from pageUrl and set pageUrl to the next page's URL, or null when there is no next page")
        emitAll(pageData.asFlow())
    }
}
    .flowOn(Dispatchers.IO /* no need for a thread pool executor, IO does it automatically */)
    .buffer(3)
You can use it just like a normal Flow, iterate, etc. If you want to know the total length of the output, you should calculate it on the consumer with a mutable closure variable. Note you shouldn't need to use GlobalScope anywhere (ideally ever).
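Consuming it could then look roughly like this (just a sketch; persistItem is a hypothetical per-item persistence call, and the counter is the mutable closure variable mentioned above):

suspend fun loadEverything() {
    var totalItems = 0
    getData().collect { item ->
        persistItem(item) // hypothetical: persist each Data item
        totalItems++
    }
    // collect only returns once the flow completes, so no CompletableDeferred is needed
    println("persisted $totalItems items")
}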
There are a few ways to achieve the desired behaviour. I would suggest using coroutineScope, which is designed specifically for parallel decomposition. It also provides good cancellation and error-handling behaviour out of the box. In conjunction with Channel.close behaviour it makes the implementation pretty simple. Conceptually the implementation may look like this:
suspend fun fetchAllPages() {
    coroutineScope {
        val channel = Channel<MyPage>(Channel.UNLIMITED)
        launch(Dispatchers.IO) { loadData(channel) }
        launch(Dispatchers.IO) { processData(channel) }
    }
}

suspend fun loadData(sendChannel: SendChannel<MyPage>) {
    while (hasMoreData()) {
        sendChannel.send(loadPage())
    }
    sendChannel.close()
}

suspend fun processData(channel: ReceiveChannel<MyPage>) {
    for (page in channel) {
        // process page
    }
}
It works in the following way:
coroutineScope suspends until all children are finished. So you don't need CompletableDeferred anymore.
loadData() loads pages in cycle and posts them into the channel. It closes the channel as soon as all pages have been loaded.
processData fetches items from the channel one by one and processes them. The cycle finishes as soon as all the items have been processed (and the channel has been closed).
In this implementation the producer coroutine works independently, with no back-pressure, so it can take a lot of memory if the processing is slow. Limit the buffer capacity to have the producer coroutine suspend when the buffer is full.
It might also be a good idea to use the channel's fan-out behaviour to launch multiple processors and speed up the computation, as sketched below.
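For illustration, a fan-out variant of fetchAllPages could look roughly like this (a sketch reusing loadData and processData from above; the bounded capacity provides the back-pressure mentioned in the previous paragraph):

suspend fun fetchAllPages() {
    coroutineScope {
        // bounded capacity: loadData suspends on send() when the processors fall behind
        val channel = Channel<MyPage>(capacity = 10)
        launch(Dispatchers.IO) { loadData(channel) }
        // fan-out: several processors consume from the same channel until it is closed
        repeat(3) {
            launch(Dispatchers.IO) { processData(channel) }
        }
    }
}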
I have a method that checks whether a list contains a user or not. For some reason it always returns false, even though the user is in the list. The function does work; I know it finds the user, I'm just not sure why it never returns anything but false.
I know it works because I have another method with this code snippet in it to check if the user is in the list and remove or add them. That works, so I know it is pulling the list.
Method:
fun checkUserChatChannel(channelId: String): Boolean {
    var list = mutableListOf<String>()
    val currentUserId = FirebaseAuth.getInstance().currentUser!!.uid
    var bool = false

    chatChannelsCollectionRef.document(channelId).get().addOnSuccessListener {
        if (it.exists()) {
            list = it.get("userIds") as MutableList<String>
            if (list.contains(currentUserId)) {
                bool = true
            }
        }
    }
    return bool
}
Calling:
boolean inOut = FirestoreUtil.INSTANCE.checkUserChatChannel(moduleId);
if (!inOut) {
    JLChat.setText("Join");
} else {
    JLChat.setText("Leave");
}
But the button will always remain on "Join".
This is the other method I use to remove a user from the list, so I know the code works (nothing to do with the question):
fun LeaveGroup(channelId: String) {
    currentUserDocRef.collection("engagedChatChannels").document(channelId)
        .delete()

    var list = mutableListOf<String>()
    val currentUserId = FirebaseAuth.getInstance().currentUser!!.uid

    chatChannelsCollectionRef.document(channelId).get().addOnSuccessListener {
        if (it.exists()) {
            list = it.get("userIds") as MutableList<String>
            if (list.contains(currentUserId)) {
                list.remove(currentUserId)
            }
        }
        val newChannel = chatChannelsCollectionRef.document(channelId)
        newChannel.set(GChatChannel(channelId, list))
    }
}
Firebase APIs are asynchronous: get() returns immediately after it's invoked, and the callback on the Task it returns will only be called some time later. There are no guarantees about how long that will take, so it may be anywhere from a few hundred milliseconds to a few seconds before the data is available. Because the method returns immediately, the value of the bool variable you're trying to return will not have been populated from the callback yet.
Basically, you're trying to return a value synchronously from an API that's asynchronous. That's not a good idea. You should handle the APIs asynchronously as intended.
A quick solution for this problem would be to use the value of your bool variable only inside the callback. This also means that you need to change checkUserChatChannel to return nothing instead of a Boolean. Otherwise, I recommend you see the last part of my answer to this post, in which I have explained how it can be done using a custom callback.
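For illustration, a callback-based version could look roughly like this (a sketch reusing the fields and names from the question):

fun checkUserChatChannel(channelId: String, onResult: (Boolean) -> Unit) {
    val currentUserId = FirebaseAuth.getInstance().currentUser!!.uid
    chatChannelsCollectionRef.document(channelId).get().addOnSuccessListener { snapshot ->
        // the result is only known here, inside the callback
        val userIds = snapshot.get("userIds") as? List<String> ?: emptyList()
        onResult(snapshot.exists() && userIds.contains(currentUserId))
    }
}

The caller then sets the button text inside the lambda it passes in, instead of branching on a returned value.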
I need to implement a search on a large data set that can take some time to complete on mobile devices. So I want to display each matching result as soon as it becomes available.
I need to fetch all available data from a data store that decides whether to get it from the network or from the device. This call is an Observable. As soon as the data from that Observable becomes available, I want to loop over it, apply a search predicate, and notify any observers of any match found.
So far my idea was to use a PublishSubject to subscribe to and call its onNext function every time the search finds a new match. However, I can't seem to get the desired behavior to work.
I'm using MVVM + Android Databinding and want to display every matched entry in a RecyclerView so for every onNext event that is received by the observing viewModel I have to call notifyItemRangeInserted on the RecyclerView's adapter.
class MySearch(val dataStore: MyDataStore) {
    private val searchSubject = PublishSubject.create<List<MyDto>>()

    fun findEntries(query: String): Observable<List<MyDto>> {
        return searchSubject.doOnSubscribe {
            // dataStore.fetchAll returns an Observable<List<MyDto>>
            dataStore.fetchAll.doOnNext { myDtos ->
                if (query.isNotBlank()) {
                    search(query, myDtos)
                } else {
                    searchSubject.onNext(myDtos)
                }
            }.subscribe(searchSubject)
        }
    }

    private fun search(query: String, data: List<MyDto>) {
        data.forEach {
            if (it.matches(query)) {
                // in real life I cache a few results and don't send each single item
                searchSubject.onNext(listOf(it))
            }
        }
    }

    fun MyDto.matches(query: String): Boolean // stub
}
-
class MyViewModel(val mySearch: MySearch, val viewNotifications: Observer<Pair<Int, Int>>) : BaseObservable() {
    var displayItems: List<MyItemViewModel> = listOf()

    fun loadData(query: String): Subscription {
        return mySearch.findEntries(query)
            .observeOn(AndroidSchedulers.mainThread())
            .doOnNext(this::onSearchResult)
            .doOnCompleted(viewNotifications::onCompleted)
            .doOnError(viewNotifications::onError)
            .subscribe()
    }

    private fun onSearchResult(data: List<MyDto>) {
        val lastIndex = displayItems.lastIndex
        displayItems = data.map { createItem(it) }
        notifyChange()
        viewNotifications.onNext(Pair(lastIndex, data.count()))
    }

    private fun createItem(dto: MyDto): MyItemViewModel // stub
}
The problem I have with the above code is that with an empty query MyViewModel::onSearchResult is called three times in a row, and when the query is not empty MyViewModel::onSearchResult isn't called at all.
I suspect the problem lies somewhere in the way I have nested the Observables in findEntries, or that I'm subscribing incorrectly / getting data from the wrong thread.
Does anyone have an idea about this?