Background
I am trying to use Google's new Firebase services for A/B testing. This requires both Firebase Analytics and Firebase Remote Config.
The problem
Using Firebase Remote Config, I want to fetch the variables from the server (which have a different value per variant of each experiment), but on some devices the fetch seems to get stuck and never calls its callback (OnCompleteListener.onComplete).
I used about the same code as in the samples (here):
// init:
boolean isDebug = ... ;
mFirebaseRemoteConfig = FirebaseRemoteConfig.getInstance();
FirebaseRemoteConfigSettings configSettings = new FirebaseRemoteConfigSettings.Builder().setDeveloperModeEnabled(isDebug).build();
mFirebaseRemoteConfig.setConfigSettings(configSettings);
final HashMap<String, Object> defaults = new HashMap<>();
for (...)
    defaults.put(...);
mFirebaseRemoteConfig.setDefaults(defaults);
// fetching the variables:
long cacheExpiration = isDebug ? 0 : java.util.concurrent.TimeUnit.HOURS.toSeconds(1);
mFirebaseRemoteConfig.fetch(cacheExpiration).addOnCompleteListener(new OnCompleteListener<Void>() {
    @Override
    public void onComplete(@NonNull Task<Void> task) {
        // on some devices, I never get here at all
        if (task.isSuccessful()) {
            mFirebaseRemoteConfig.activateFetched();
            final FirebaseAnalytics firebaseAnalytics = FirebaseAnalytics.getInstance(context);
            for (...) {
                String experimentVariantValue = mFirebaseRemoteConfig.getString(...);
                firebaseAnalytics.setUserProperty(..., experimentVariantValue);
            }
        } else {
        }
    }
});
The thing is, the callback fails to fire only on some devices:
Nexus 5 with Android 6.0.1: almost always succeeds.
Nexus 4 with Android 5.1.1 and LG G2 with Android 4.2.2: almost always freeze (meaning the callback is never reached).
I've also found that when it does work, it keeps working in the sessions that follow shortly afterwards.
The question
Why does it occur? What can I do to solve this?
Just to add to what Cachapa just posted: the bug happens when fetch() is called too early. We found two workarounds (both work, but neither is satisfactory): call fetch() from onResume, or add a 3-second delay before actually issuing the fetch().
As for "it works in the near sessions afterwards": once you get the callback and call activateFetched(), all subsequent sessions will get the correct values.
Update
The 3-second delay was measured from Activity.onCreate(). After checking further, it only worked on one device, a Nexus 4. It didn't do the trick on a Samsung S3 or on a Moto X Pure.
We also checked on a Samsung S7; there it worked without any delay, and the problem never manifested at all.
We are discussing it by email with Firebase support. I will update here when they get back to me.
Update 2
The Firebase team claims this is solved in Google Play Services (GPS) 9.4, and I think this time they're right. They also claimed it was solved in 9.3, but my tests disproved that. Now (after updating our gms dependency to 9.4), I get the callbacks correctly on my dev devices. However, I still get indications that not all of our production devices are getting the correct config. My assumption is that devices that haven't updated GPS to the latest version are still affected, but I'm not sure.
It seems to be some sort of race condition in the Firebase initialization code, according to this answer: https://stackoverflow.com/a/37664946
I tried quite a few of the posted workarounds and none worked reliably enough for my satisfaction. The Firebase devs seem to be aware of the problem, though, so it will probably be fixed soon.
Why does it occur? What can I do to solve this?
I am guessing the callbacks do not get called because Firebase Remote Config still has several unresolved issues.
Below is a list of Remote Config issues that my team and I have found so far.
fetch() does not call its callbacks if the method gets called too early after app initialization.
(FIXED in 9.2.1) Gradle build process failed with "minifyEnabled" - Proguard
Debug and release builds may affect how Remote Config behaves (unconfirmed).
The first two come from a Google search. The last one, about debug and release builds behaving differently, comes from our own observation after testing Remote Config for a while, so we are not sure about it yet.
I am not sure this solves your problem, but if your issue is the first one listed above (fetch() being called too early), you could try calling fetch() via postDelayed. We tried this and it improved the chance of Remote Config successfully calling its listeners. Overall, though, in my personal opinion Remote Config is just not ready for production yet.
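To make the delay idea concrete, here is a minimal sketch in plain Java. It is not Firebase or Android code: DelayedFetch and fetchAfterDelay are hypothetical names, and CompletableFuture.delayedExecutor (Java 9+) stands in for Handler.postDelayed, which is what you would more likely use on Android. The Runnable is a placeholder for the real fetch(...).addOnCompleteListener(...) call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Firebase: postpones an action (for example
// the call to fetch()) so that it does not run during early app initialization.
final class DelayedFetch {
    private DelayedFetch() {}

    // Runs fetchAction once, no earlier than delayMillis from now, on a
    // background thread. The returned future completes when the action returns.
    static CompletableFuture<Void> fetchAfterDelay(Runnable fetchAction, long delayMillis) {
        Executor delayed = CompletableFuture.delayedExecutor(delayMillis, TimeUnit.MILLISECONDS);
        return CompletableFuture.runAsync(fetchAction, delayed);
    }
}
```

For example, DelayedFetch.fetchAfterDelay(() -> startFetch(), 3000) would issue a hypothetical startFetch() about three seconds later. The delay only improves the odds; it does not guarantee that the callback fires.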
Related
I'm seeing a crash come through Crashlytics that I'm unable to reproduce or locate the cause of. The crash only ever happens on Google Pixel devices running Android 12, and the crash always happens in the background.
This is the crash log from Crashlytics:
Fatal Exception: android.app.RemoteServiceException$CannotDeliverBroadcastException: can't deliver broadcast
at android.app.ActivityThread.throwRemoteServiceException(ActivityThread.java:1939)
at android.app.ActivityThread.access$2700(ActivityThread.java:256)
at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2190)
at android.os.Handler.dispatchMessage(Handler.java:106)
at android.os.Looper.loopOnce(Looper.java:201)
at android.os.Looper.loop(Looper.java:288)
at android.app.ActivityThread.main(ActivityThread.java:7870)
at java.lang.reflect.Method.invoke(Method.java)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:548)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1003)
I've looked at similar questions (like this and this) but Crashlytics is showing that these users all have plenty of free memory, and nowhere in our codebase are we calling registerReceiver or sendBroadcast so the solutions in that second question aren't any help.
Based on limited logs I'm pretty sure the crash happens when the user receives a push notification, but I have a Google Pixel 4a running Android 12 and I haven't been able to reproduce it at all when sending myself notifications.
We have a custom FirebaseMessagingService to listen for notifications that we register in the Manifest and a couple of BroadcastReceivers that listen for geofencing updates and utilize WorkManager to do some work when a transition is detected. The only thing that's changed with any of those recently is we updated WorkManager to initialize itself using Android's app startup library, but I'm not sure if that's even relevant since the crash logs give me no information, and if there was a problem with our implementation it wouldn't limit itself to just Pixel devices running Android 12.
Has anyone seen this before, or is there a bug exclusive to Pixel devices that run Android 12? I've spent hours digging into this and am at a complete loss.
With reference to Android 13, rather than earlier issues on 12, there is a Google tracker issue here, which at the time of writing is assigned but awaiting a meaningful response.
A summary of the issue is that it only occurs on 13, and only on Pixel devices.
CommonsWare has a blog entry on this here, and the only other clue I found anywhere was in the changelog for GrapheneOS, here, which has this line entry:
Sandboxed Google Play compatibility layer: don't report CannotDeliverBroadcastException to the user
We use this Play library and experience the fault, so it's possible Graphene have encountered this and had to put in an OS fix.
Update:
I tentatively believe that we as an app have suppressed this issue, and stopped it polluting our stats.
We set an exception handler to absorb it, which is what GrapheneOS are doing - credit to them.
class CustomUncaughtExceptionHandler(
    private val uncaughtExceptionHandler: Thread.UncaughtExceptionHandler
) : Thread.UncaughtExceptionHandler {

    override fun uncaughtException(thread: Thread, exception: Throwable) {
        if (shouldAbsorb(exception)) {
            return
        }
        uncaughtExceptionHandler.uncaughtException(thread, exception)
    }

    /**
     * Evaluate whether to silently absorb uncaught crashes such that they
     * don't crash the app. We generally want to avoid this practice - we would
     * rather know about them. However in some cases there's nothing we can do
     * about the crash (e.g. it is an OS fault) and we would rather not have them
     * pollute our reliability stats.
     */
    private fun shouldAbsorb(exception: Throwable): Boolean {
        return when (exception::class.simpleName) {
            "CannotDeliverBroadcastException" -> true
            else -> false
        }
    }
}
We have to operate off class name strings because the CannotDeliverBroadcastException class is hidden and not available to us.
We install this handler early in the Application.onCreate() method, like this:
val defaultUncaughtExceptionHandler = Thread.getDefaultUncaughtExceptionHandler()
Thread.setDefaultUncaughtExceptionHandler(
CustomUncaughtExceptionHandler(defaultUncaughtExceptionHandler)
)
I might be a bit premature with this, but so far this has resulted in none of these crashes appearing in Play Console. A few did appear in our other reporting platform, where what is/isn't reported has always varied.
To be clear, I'm not suggesting this is a good approach, or one that you should necessarily take. It requires a client release and it risks masking exceptions not relating to this root cause. Fixing or ignoring this issue at the point of collection is Google's responsibility. However, it has seemingly stopped the impact on our headline reliability statistics, so I thought I'd share it as a possibility.
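One way to narrow the risk of masking unrelated exceptions is to match on the full binary class name from the stack trace rather than on the simple name alone. The following is a minimal Java sketch of that variation, not the handler described above: BroadcastCrashFilter and isTargetClassName are hypothetical names, and the only value taken from this thread is the class name string that appears in the crash log.

```java
// Hypothetical variation: absorbs only exceptions whose binary class name
// matches the one shown in the crash log, and forwards everything else.
final class BroadcastCrashFilter implements Thread.UncaughtExceptionHandler {
    // Binary name copied from the stack trace; the class itself is hidden,
    // so it can only be referenced as a string.
    static final String TARGET_CLASS_NAME =
            "android.app.RemoteServiceException$CannotDeliverBroadcastException";

    private final Thread.UncaughtExceptionHandler delegate;

    BroadcastCrashFilter(Thread.UncaughtExceptionHandler delegate) {
        this.delegate = delegate;
    }

    // True only for an exact match on the full binary name.
    static boolean isTargetClassName(String className) {
        return TARGET_CLASS_NAME.equals(className);
    }

    @Override
    public void uncaughtException(Thread thread, Throwable exception) {
        if (isTargetClassName(exception.getClass().getName())) {
            return; // absorb the known OS-level crash
        }
        if (delegate != null) {
            delegate.uncaughtException(thread, exception); // everything else crashes as usual
        }
    }
}
```

A same-named exception from another package would no longer be swallowed. The trade-off is that the match breaks if the platform ever moves or renames the class.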
Our team has noticed a significant downtrend in this crash over the past few months. Possibly Google has begun to roll out a fix for Pixel devices. We also have the same crash happening for Pixels running Android 13, and it's also seeing a downtrend. Hopefully others are seeing this as well.
I'm not sure if this will be helpful to anyone, but once we removed WorkManager (which was being initialized via the App Startup library) the crash stopped happening. This was removed alongside a bunch of other code, so I can't say for sure if WorkManager was the problem, if the App Startup library was the problem, or if something else that we removed fixed it.
I received a pre-launch report of my app showing the same problem:
android.app.RemoteServiceException$CannotDeliverBroadcastException: can't deliver broadcast
Device: google Redfin 64-bit only
Android version: Android 13 (SDK 33)
The solution proposed by Rob Pridham fixed the problem. Just in case someone prefers to use Java...
This is the CustomUncaughtExceptionHandler class added to MainActivity:
public static final class CustomUncaughtExceptionHandler implements Thread.UncaughtExceptionHandler {
    private final Thread.UncaughtExceptionHandler uncaughtExceptionHandler;

    public CustomUncaughtExceptionHandler(@NotNull Thread.UncaughtExceptionHandler uncaughtExceptionHandler) {
        this.uncaughtExceptionHandler = uncaughtExceptionHandler;
    }

    @Override
    public void uncaughtException(@NotNull Thread thread, @NotNull Throwable exception) {
        // Forward everything except the crash we deliberately absorb.
        if (!shouldAbsorb(exception)) {
            uncaughtExceptionHandler.uncaughtException(thread, exception);
        }
    }

    // The exception class is hidden, so match on its simple name.
    private boolean shouldAbsorb(Throwable exception) {
        return "CannotDeliverBroadcastException".equals(exception.getClass().getSimpleName());
    }
}
This is the code added to onCreate:
Thread.UncaughtExceptionHandler uncaughtExceptionHandler = Thread.getDefaultUncaughtExceptionHandler();
assert uncaughtExceptionHandler != null;
Thread.setDefaultUncaughtExceptionHandler(new CustomUncaughtExceptionHandler(uncaughtExceptionHandler));
I call a Firebase Cloud Function from my app and most of the time it works normally, but from time to time (I would say twice a day) an invocation, after a long wait, produces an "Internal Error" exception.
Apparently the invocation never even reaches the server, because no record of the function's execution appears in the Firebase logs. To confirm this, I tried calling another very simple cloud function (see below) right after the occurrence, and that invocation also failed.
const functions = require('firebase-functions');
module.exports = functions.https.onCall((whatever, context) => {
return true;
});
This situation happens only when the app is running on Android devices.
I've already tried to implement the retry strategy suggested here: https://cloud.google.com/storage/docs/retry-strategy, to no avail. I've implemented a timer to forcibly cancel the task so that the retry strategy could kick in, but no matter how many times I repeat the invocation, it always fails.
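For reference, the general technique behind such a retry strategy is, as I understand it, truncated exponential backoff with jitter. Below is a minimal Java sketch of the delay calculation only. Backoff and its method names are hypothetical, the "full jitter" variant (a random delay between zero and the cap) is an assumption rather than what the linked page prescribes, and the code that actually performs the call and the retries is left out.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper: computes how long to wait before each retry.
// Inputs are assumed to be non-negative.
final class Backoff {
    private Backoff() {}

    // Deterministic part: baseMillis * 2^attempt, truncated at maxMillis.
    // attempt is 0-based; the exponent is clamped to 30 to avoid overflow
    // for realistic base delays.
    static long cappedDelayMillis(int attempt, long baseMillis, long maxMillis) {
        long factor = 1L << Math.min(attempt, 30);
        return Math.min(baseMillis * factor, maxMillis);
    }

    // Randomized part: a uniformly random delay in [0, cappedDelayMillis],
    // so that many clients do not retry in lockstep.
    static long jitteredDelayMillis(int attempt, long baseMillis, long maxMillis) {
        long cap = cappedDelayMillis(attempt, baseMillis, maxMillis);
        return ThreadLocalRandom.current().nextLong(cap + 1);
    }
}
```

With a 1-second base and a 32-second cap, the caps grow as 1 s, 2 s, 4 s, 8 s and so on, up to 32 s. Backoff of this kind only helps when the failure is transient, which may not be the case here.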
I suspect the connection with the Firebase server is somehow being lost. How can I force a reconnection? Is there anything else I can do to solve this problem?
The app is still under development (the load on the Firebase servers is minimal and sporadic), but I'm afraid I won't be able to launch it until I solve this problem.
We're in this situation where our clients use our mobile application offline 95% of the time. At the end of their work day, when they get back to the office, they synchronize all data with our servers while they have network connectivity.
We have ACRA set up with the AcraHttpSender plugin to attempt sending us the crash reports directly, however this usually fails because they're using the application offline and ACRA stores the reports instead.
From what I understand the pending reports will only be sent by ACRA when the application is restarted, through ACRA.init. The problem is the users have no reason to restart the application at the end of their work day (while they have network connectivity). I have to stress that the users are complete tech illiterates, our clients made that clear to us.
So, we would really need to be able to tell ACRA to send us any pending crash reports it has during the short time network connectivity is available. Without user interaction of any kind. I was thinking maybe in the onCreate function of our main activity.
However I've been looking at documentation and other people asking the same question for a while and haven't found anything obvious. Is this possible?
EDIT: This is the current working code with the suggestion made by @F43nd1r and @CommonsWare. It wasn't working for me with 5.4.0, but with 5.5.1 it is.
Gradle
def acraVersion = '5.5.1'
implementation "ch.acra:acra-core-ktx:$acraVersion"
implementation "ch.acra:acra-http:$acraVersion"
implementation "ch.acra:acra-advanced-scheduler:$acraVersion"
implementation "ch.acra:acra-toast:$acraVersion"
Initialization
initAcra {
    setBuildConfigClass(BuildConfig::class.java)
    setReportFormat(StringFormat.JSON)
    plugin<ToastConfigurationBuilder> {
        setResText(R.string.acra_crash_text)
        setLength(Toast.LENGTH_LONG)
        setEnabled(true)
    }
    plugin<HttpSenderConfigurationBuilder> {
        setUri("${BuildConfig.protocol}://${BuildConfig.host}/${BuildConfig.codemrc}/acra")
        setHttpMethod(HttpSender.Method.POST)
        setBasicAuthLogin("acra")
        setBasicAuthPassword("******")
        setEnabled(true)
    }
    plugin<SchedulerConfigurationBuilder> {
        setRequiresNetworkType(JobInfo.NETWORK_TYPE_ANY)
        setRestartAfterCrash(true)
        setResReportSendSuccessToast(R.string.acra_report_sent_text)
        setEnabled(true)
    }
}
// Turn this on to obtain more messages in the log to debug ACRA
ACRA.DEV_LOGGING = BuildConfig.DEBUG
As @CommonsWare stated in the comments, AdvancedSenderScheduler is the way to go.
Example usage:
implementation "ch.acra:acra-advanced-scheduler:5.5.1"
@AcraScheduler(requiresNetworkType = JobInfo.NETWORK_TYPE_UNMETERED,
        requiresBatteryNotLow = true)
In case you're not satisfied with AdvancedSenderScheduler options, you could also register your own SenderScheduler, but that should rarely be necessary.
My retrofit interface is:
@GET("vocab/word/details")
Call<EnglishWord> getWord(@Query("word_id") int id);
Call:
Call<EnglishWord> call = getSingleWord.getWord(id);
Log.d("WORDID",String.valueOf(id));
The word ID is logged correctly on the next line, but the actual request is:
vocab/word/details?word_id=0
Similarly, the same thing happens for another PUT request: word_id is sent as 0, while I can confirm (through logs) that the correct id is passed to the call.
The weird part is that this problem is random; I have only seen it on Marshmallow, and only a few times. Any help would be appreciated.
After a lot of debugging without finding a clue, I tried disabling Instant Run, and it works fine now.
Disabling Instant Run solved the problem (hopefully). I have seen other posts (from Jan 2016) where Instant Run was causing problems; I wonder why it's not fixed yet.
Problem: onRoomCreated returns STATUS_NETWORK_ERROR_NO_DATA about 5% of the time, for no reason that we can figure out.
The game worked well for about 16 months of development (no problem of missing "permissions" in the manifest or the Google Play Console) but starting with Google Play Service 29 (that's "allegedly"... it could be unrelated), this unpredictable behaviour started, and it is blocking any further attempt to create a room (same wrong statusCode over again, even after restarting the game).
The only way to make it work again is either to restart the game after 10-15 minutes or to restart the device (usually works but not always).
The problem is the same on 3 different devices (no emulators here).
What we found about this problem (on SO and elsewhere) is that it could be related to NOT leaving the room (RealTimeMultiplayer.leave(...)) before trying to create a new one. So we are waiting AT LEAST the end of "onLeftRoom" (plus 3 sec, just in case) before trying to create a new room. To no avail.
Obviously we are following the recommended guidelines: instantiate GoogleApiClient in onCreate, .connect in onStart, .disconnect in onStop (even if .connect is on its way)...
Also notice that, because it's supposed to be a "NETWORK_ERROR", we are validating the Internet connection (with a ping) before each attempt to create a room.
Please, if you have ANY info about this issue, or if you know how to make Google Play create a room after this statusCode WITHOUT restarting the device, please let us know, because until then our release date is postponed indefinitely. Thanks a lot.
[UPDATE 2]: app was rewritten to stop making automated room creations (hence less frequent requests), and... nothing... still the same bug/problem/you-name-it ... over and over ... then Google Play updated the app "Google Play Games" (and maybe "services") and it kinda worked: only one STATUS_NETWORK_ERROR_NO_DATA in 2 weeks... now I'm going to sleep for 2 months because I'm [redacted].
[UPDATE]: according to this post, STATUS_NETWORK_ERROR_NO_DATA is used to limit the frequency of requests. Because we do automate room creations and closings this could be the definitive answer. I'll update once again when it's validated or not.
I leave the rest of this answer because, although it's not directly related, not closing rooms also induces errors.[END OF UPDATE]
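If STATUS_NETWORK_ERROR_NO_DATA does act as a limit on request frequency, one client-side mitigation is to enforce a minimum interval between automated room-creation requests. Here is a minimal Java sketch of such a throttle. RoomRequestThrottle and tryAcquire are hypothetical names, and the interval is something you would have to tune yourself, since the actual limit is not known.

```java
// Hypothetical throttle: allows a request only if enough time has passed
// since the last allowed one. The caller supplies the current time, which
// keeps the class easy to test.
final class RoomRequestThrottle {
    private final long minIntervalMillis;
    private long lastAllowedMillis;
    private boolean hasAllowedBefore;

    RoomRequestThrottle(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    // Returns true (and records the time) if a room may be created now;
    // returns false if the previous allowed request was too recent.
    synchronized boolean tryAcquire(long nowMillis) {
        if (hasAllowedBefore && nowMillis - lastAllowedMillis < minIntervalMillis) {
            return false;
        }
        hasAllowedBefore = true;
        lastAllowedMillis = nowMillis;
        return true;
    }
}
```

The caller would check tryAcquire(SystemClock.elapsedRealtime()) (or any monotonic clock) before each automated room creation and skip or postpone the request when it returns false.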
Here's the answer to this unpredictable behavior (didn't come from SO but some of you might still be interested): in "onStart" and "onStop", all calls/requests to Google Play Services (with mGoogleApiClient or not) must be made BEFORE "super.onStart();" and "super.onStop();".
Otherwise what comes after "super.onStop();" will be interrupted before it's over (with some randomness added by device's speed and load), which means that any call to ".leave" an open room will fail, and then prevent the creation of a new room (hence the STATUS_NETWORK_ERROR_NO_DATA error).
Also notice that the variable mGoogleApiClient should not be declared as static, and remember to call "mGoogleApiClient.disconnect();" in "onStop" (the example "ButtonClicker" doesn't do that when it's recommended elsewhere).