I have an Android app Activity which fills a RecyclerView with some content from a DB hosted in Google's GCP CloudSQL. The data is served via a python flask app run by Google's AppEngine.
My problem is that some step or steps along the way from sending the GET request to having the RecyclerView inflated and full with the content introduce a heavy delay. The process takes about 10 to 12 seconds.
I have trouble understanding where the problem lies. I have tried the following and haven't been able to isolate a candidate for the delay. Taking into account that the Android app runs on an Android Studio emulator on my localhost:
If I run my flask app locally on my computer, but still request the data from the CloudSQL DB, the process is fast. So, it would seem that the problem is neither the DB nor the Android app RecyclerView inflating step (therefore, it must be the AppEngine flask app).
But if I run both the DB and the flask web server on GCP and request the data via my web browser, I also get the data (JSON) fast. So it would seem that the flask app hosted on GCP's AppEngine is also fine.
So if, according to the above tests, all three individual elements (the DB, the AppEngine Flask app and the RecyclerView inflation) seem to behave well in terms of speed, why is the chained process so slow in my app?
Most of the information out there on similar problems ascribes the slow response to AppEngine's cold start, but after many attempts I am starting to think that this might not be my problem. Aside from the fact that I get the response decently fast when I request the DB content via a web browser, I have checked and/or tested:
Reducing number of items in the list.
Setting the minimum number of instances to 1 + have enabled warmup requests (and my app processes them).
Setting the minimum number of idle instances to 1.
Use a WSGI production server (waitress) instead of the development flask service.
Checking that the AppEngine app is located in the "right" GCP region (europe-west3). By "right" I mean the one geographically closest to me, and the same region in which the CloudSQL DB is hosted.
Have set a "keep-alive" refresh cron job every minute to ensure no instance has to start cold.
Have tried going to manual scaling with one fixed instance.
None of the above solved the problem or reduced the total waiting time in the app. According to the AppEngine Dashboard, the loading latency of many requests is between 2 and 4 seconds, which is not the 10-12 seconds I have to wait on the Android app, but still seems abnormally long, taking into account all the measures in place for avoiding cold start (and, again, the fact that retrieving the DB info via web browser works at normal speed). This makes me think that either I have not successfully solved the cold start issue, or the latency problem lies elsewhere.
I am lost, and I do not know where to continue looking for issues. I would appreciate getting some tips in the right direction before I have to implement an in-device DB cache.
EDIT
Below is a summary of HTTP request latencies to the web server (/refresh is the instance keep-alive resource, and /allrecords is an actual working endpoint). As can be seen, the latencies are quite OK (which matches the good speed when retrieving the data via web browser).
I am quite confident the problem does not have to do with AppEngine cold start, so one would think the problem must lie within the Android app, but if I do the DB request via a web server on my local machine, the Android app works at normal speed.
EDIT 2
Retrieving info from the web server in JSON format via the web browser of the emulated device also works fast. So it does not seem to be a problem of the emulated device with internet connection speed.
Related
I am deploying my Nodejs sample app to the Google App Engine Flexible environment. When I use the App Engine URL (of the form appspot.com) to hit my API over mobile data, it takes around 11 secs to send a response, whereas other APIs respond in millisecs.
Also, the delay only happens when I open my Android app and send the first request to the server. After that, all requests take a normal amount of time, and the delay comes back when I open the app again and send a request to the server.
Edit - I found that
This can be caused when your application is still booting up or warming up instances to serve the request, and can be called loading latency. To avoid such scenarios you can implement a health check handler like a readiness check, so that your application will only receive traffic when it's ready
That's why I checked in my Logs that readiness check is performed sometimes around 1 sec
and sometimes around 200 ms
Can anyone please tell me whether there is anything wrong with how I am warming up my instances? I don't think cold boot time is causing this problem.
Edit 2
I have also tried setting min_num_instances: 2 so that, once loaded, at least 2 instances would not have to boot up again, but the delay is still the same.
Edit 3
runtime: nodejs
#vm: true
env: flex
automatic_scaling:
  min_num_instances: 2
  max_num_instances: 3
Edit 4
I am noticing a strange behaviour: when I use the Packet Capture app to capture traffic, all HTTPS requests (if I do not enable SSL proxying) and all HTTP requests execute in millisecs, whereas without this app all HTTP/HTTPS requests take 11-16 secs.
I don't know how, but is there some kind of certificate issue here?
Edit 5
Below I have attached a Network Profiler capture where the delay is about 15 secs
Please Help
It depends on which App Engine environment you are using and how you set up the scaling: there's always a loading time if you don't have a ready instance to serve a request. But if you have a readiness check to ensure your instance is ready (and not cold started for the request), then there shouldn't be a problem.
Can you find a loading request or any corresponding slow request in your logs? If not, then it's likely an issue with the app. If possible, instead of calling this API from a single app instance, do it from two (one already open, one not). Make calls from both, and if the one that's already open gets a response faster than the other, the problem is in the app itself. App Engine can't determine whether or not your app was already open, so any difference would be client side.
=== Additional information ===
In your logs, there's no delay at all. The request entered Google and was processed within a few milliseconds. I am sure there's something application-side. Maybe your app is constructing the request URL (first request) from some other source that results in the delay? App Engine has no knowledge of whether your app is open or whether it's sending a first request after being opened, so it cannot act differently based on that. As long as your App Engine instance is ready and available, it will treat your request the same way regardless of whether or not it's your first request after the app is opened.
The issue is resolved now. It was caused by the network service provider, Bharti Airtel: their DNS lookup was taking a long time to resolve the hostname. After explicitly using an alternative DNS such as Google's 8.8.8.8, the issue was completely resolved. Maybe it's a compatibility issue between Airtel and Google Cloud.
Last time I checked I remember having to put a warmup request handler so that Google would know that the instance is up and running and can be used to answer calls. Keep in mind that code has to be EXACTLY under the endpoint you specify in the handler under the yaml file. (Wouldn't be the first time someone forgets that)
Here are the docs https://cloud.google.com/appengine/docs/standard/python/configuring-warmup-requests this is python specific, but you can also check other languages like Go, Java, and such in the docs.
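As a rough illustration of the point about the handler path, here is a minimal sketch of a warmup handler written as a plain WSGI callable in Python. It assumes the App Engine standard convention that warmup requests arrive at /_ah/warmup once warmup is listed under inbound_services in app.yaml; the warm_up() function and the STATE dictionary are hypothetical placeholders for whatever your app needs to preload (DB connections, caches, and so on).

```python
# Assumed convention: App Engine standard sends warmup requests to this path.
WARMUP_PATH = "/_ah/warmup"

# Hypothetical flag standing in for real preloaded resources.
STATE = {"warmed": False}


def warm_up():
    """Placeholder for expensive initialisation (DB pool, caches, imports)."""
    STATE["warmed"] = True


def app(environ, start_response):
    """Minimal WSGI app that answers warmup requests on the exact path."""
    path = environ.get("PATH_INFO", "")
    if path == WARMUP_PATH:
        warm_up()
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"warm"]
    # Any other path is outside the scope of this sketch.
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

With Flask the same idea would be a route registered on that exact path; if the route is even slightly different, the warmup request gets a 404 and the instance is never actually warmed.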
If the problem is client dependent (each time a new client spawns and makes a call, it gets the latency), then it is most likely either a problem with the client app itself or with initialization, registration or DNS resolution.
You could also try to reproduce the requests with CURL or similar, and see if also with those you see the mentioned delay.
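Since DNS resolution is one of the suspects, and slow DNS turned out to be the cause in the Airtel case above, it may be worth timing the lookup on its own. The following is a small Python sketch that only measures the resolver step; the function name time_dns_lookup and the HOST value are assumptions for illustration, so substitute your real backend hostname and run it on the affected network.

```python
import socket
import time


def time_dns_lookup(hostname, port=443):
    """Return (seconds, addresses) for one resolver lookup of hostname."""
    start = time.perf_counter()
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    elapsed = time.perf_counter() - start
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the resolved IP address.
    addresses = sorted({info[4][0] for info in infos})
    return elapsed, addresses


# Placeholder host: replace with the real backend hostname.
HOST = "localhost"

for attempt in range(3):
    seconds, ips = time_dns_lookup(HOST)
    print(f"lookup {attempt + 1}: {seconds * 1000:.1f} ms -> {ips}")
```

If the first lookup takes seconds while later ones are fast (served from a cache), the delay is in name resolution rather than in App Engine.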
My app has a login screen that is shown on startup, but there are no ajax calls made to my server until after the login button is pressed. There is no code that makes a call to my server immediately when the app starts, so there really isn't any reason for it to be downloading a huge amount of data on startup right?
I am however using Urban Airship for push notifications. Could that be the cause? The app starts almost instantly when using WiFi, but on a perfect 4G LTE connection it takes around 15-20 seconds.
Given we've established you are loading all of your HTML, CSS, font and image assets locally then startup time should not be affected by having to load any of those over the slower network.
The only other thing I can think of would be that one or more plugins in your application are trying to do network operations on startup, some of which may be failing because for example the plugin is misconfigured or just plain written incorrectly.
I would suggest you look at the plugins you are using as your next focus area for debugging.
I am trying to test web service vs. app vs. network performance. I have 3 teams pointing fingers at each other over the bad performance of the app. The web service team says that the XML service returns data in less than half a second and that the app takes a long time to process the XML payload. The app team says that the web service is slow to respond and that the app is loading data at 1 by 10 seconds. The QA guys say that both are fast, but the network is slow. Are there any tools to test app speed vs. web service speed vs. network speed?
I would suggest dividing up the tests before you attempt a whole-system performance test.
You have to verify the load on each component independently of the other components.
For example, you can develop a simple request simulator that works against the web service. The simulator can sit on the same LAN as the web service in order to rule out network problems. Then you can measure the service's responses in isolation. You can do the same with the app. If both work fast, the network is probably your problem.
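A request simulator of this kind can be very small. The sketch below is one possible shape in Python, not a full load-testing tool: measure_latencies and summarize are hypothetical helper names, and fake_request is a stand-in that you would replace with the real call to the web service.

```python
import statistics
import time


def measure_latencies(call, runs=20):
    """Invoke call() `runs` times and return per-call durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return the min, median and max of a list of durations."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


def fake_request():
    """Stand-in for the real request against the web service."""
    time.sleep(0.01)


print(summarize(measure_latencies(fake_request, runs=5)))
```

Running the same simulator once from the service's LAN and once from the client's network gives two sets of numbers whose difference is roughly what the network adds.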
A good, flexible performance-testing tool that I recommend is Gatling.
For the app side, you could at first simply add logs in your application. One log just before the HTTP request, one just after the response, then another just before parsing your payload and one after the parsing.
In these logs, you could write the current timestamp.
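The same four-timestamp idea can be sketched in a few lines. The example below is in Python for brevity (in the app itself you would do the equivalent with the platform's clock and logger); timed, TIMINGS and fetch_payload are hypothetical names, and fetch_payload just returns a canned JSON string instead of doing a real HTTP request.

```python
import contextlib
import json
import time

# Collected durations, keyed by phase label.
TIMINGS = {}


@contextlib.contextmanager
def timed(label, sink=None):
    """Record how long the wrapped block takes, in seconds, under `label`."""
    sink = TIMINGS if sink is None else sink
    start = time.perf_counter()
    try:
        yield
    finally:
        sink[label] = time.perf_counter() - start
        print(f"{label}: {sink[label] * 1000:.1f} ms")


def fetch_payload():
    """Hypothetical stand-in for the real HTTP request."""
    return '{"records": [1, 2, 3]}'


# One measurement around the request, one around the parsing.
with timed("http"):
    payload = fetch_payload()
with timed("parse"):
    records = json.loads(payload)["records"]
```

Comparing the "http" and "parse" figures shows directly whether the time goes into waiting for the response or into processing it.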
On a JBoss server, lies a JSP. Lets call it takestoolong.jsp
It does some processing that takes up to 30-45 seconds. (Yes, I know it should be optimized).
Then it returns. The 30-45 seconds is deemed too long for user experience for obvious reasons. So Akamai and load balancers are brought in so that this time can be reduced by caching the result of the request. At some point however, the jsp return content will change, and the cache will time out. How do you prevent users from again seeing the 30-45 second download time? In particular, how do you configure Akamai so that it does not use IP or other factors but returns the processed result to the Android device/user without the 30+ second delay? How do you configure Akamai for Android devices?
My suggestion would be to isolate takestoolong.jsp from the user altogether, so that they only ever see the cached result....
To do that you'd want a secondary process that makes the request to the takestoolong.jsp page (it could be a simple cron job that hits the service and writes the result to an HTML page) and then point the users (or Akamai) at the server delivering the static fragment of HTML.
that way you can refresh the results without the user seeing a delay and even when the content does change until the moment that the write is committed the user will still see the old content but no delay
[FWIW, I used this approach to deal with a similar issue: a huge, horribly complex SQL query that had to grab data from SQL Server, then run a bunch of sub-queries against a MySQL database and consolidate the response. By using the intermediary output page and relying on IIS and browser caching, the users sometimes had slightly more stale data than was the absolute truth, but they never got exposed to the actual response time of the underlying query.]
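The secondary process described above could look roughly like the following Python sketch, meant to be run from cron. It is an illustration under assumptions, not a drop-in tool: refresh_static_copy is a hypothetical helper, and fetch is whatever function performs the slow request to takestoolong.jsp and returns its body as text. The key point is that the new content is written to a temporary file and swapped in atomically, so readers see either the old page or the new one, never a half-written file.

```python
import os
import tempfile


def refresh_static_copy(fetch, target_path):
    """Fetch fresh content via fetch() and atomically replace target_path."""
    # The slow part (possibly 30-45 s); no user is waiting on it.
    # If it raises, the existing file is left untouched.
    content = fetch()
    directory = os.path.dirname(os.path.abspath(target_path))
    # Write to a temp file in the same directory so the rename stays atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(content)
        # mkstemp creates the file owner-only; make it readable for the server.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, target_path)  # old page is served until this point
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return target_path
```

A cron entry would then call this with a fetch function that requests the JSP, and the web server (or Akamai) would serve only the static file.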
I have developed android apps, and have a web server application which serves REST style JSON, to the apps.
My apps are strongly dependent on those web services, but as traffic got higher, users started complaining about force-close problems. I am not sure, but maybe my server (an AWS small instance) does not answer all requests correctly or in time.
I am planning to retry the web request when a problem arises while getting the JSON response, instead of showing the error/net-connection alert.
I guess there are many developers who integrate apps with web services, so what is good practice for handling network problems?
Or is this frequency of network problems acceptable?
I get about 10-20 problems per day.
I have about 200,000+ web requests per day on an AWS small instance (1.7 GB RAM) dedicated to serving Tomcat. I have analyzed the logs and there is no clue, no error log. Also, the errors are spread out.
You need to start with analyzing the problem, and determine the root cause or root causes of your issues. You always need to take into account that
a network connection might drop
a user switches between 3G and WiFi
the Android device "thinks" it's connected while in fact it's not
Also, be very sceptical when using the Android ConnectivityManager / NetworkInfo. Only trust it when it states that it is not connected. If it is connected, check it yourself (as sometimes, user is on a hotspot and the only connectivity he has is with a login page).
The application needs to handle all these scenarios properly. The way it's presented to the user depends on the use-case (do you want the user to be informed of the error, do you silently ignore it and just retry, ....)
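"Check it yourself" usually means making a small request whose correct answer is known in advance, so that a captive-portal login page is not mistaken for real connectivity. Below is a minimal Python sketch of that idea. It assumes the probe URL http://connectivitycheck.gstatic.com/generate_204, which is expected to answer with an empty HTTP 204; treat that URL, and the helper names http_status and has_real_connectivity, as assumptions, and point the probe at an endpoint you control if you prefer.

```python
import http.client
import urllib.request

# Assumed probe endpoint: expected to return HTTP 204 with an empty body.
PROBE_URL = "http://connectivitycheck.gstatic.com/generate_204"


def http_status(url, timeout=5):
    """Return the HTTP status code for url, or None if the request failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except (OSError, http.client.HTTPException):
        # Covers DNS failures, timeouts, refused connections and HTTP errors.
        return None


def has_real_connectivity(probe=http_status, url=PROBE_URL):
    """True only if the probe returns the exact status we expect (204).

    A captive portal typically answers with a login page (200 after a
    redirect), which this check treats as "not really connected".
    """
    return probe(url) == 204
```

The probe function is injectable so that the decision logic can be exercised without a network, and so that a different transport can be swapped in.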
In terms of retrying webservice connections, there are several ways to implement this :
exponential backoff
periodic rescheduling
event-driven triggering
retry-after moratorium intervals
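Of the strategies listed, exponential backoff is the simplest to sketch. The Python example below is one possible shape, with randomised ("full jitter") delays so that many clients do not retry in lockstep; retry_with_backoff and its parameters are hypothetical names, and the defaults are illustrative rather than recommended values.

```python
import random
import time


def retry_with_backoff(call, max_attempts=5, base_delay=0.5, max_delay=30.0,
                       retry_on=(OSError,), sleep=time.sleep):
    """Call call() until it succeeds or max_attempts is reached.

    Only exceptions listed in retry_on trigger a retry (OSError covers
    socket errors and urllib.error.URLError); anything else propagates.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller decide what to show
            # The ceiling doubles each attempt: 0.5 s, 1 s, 2 s, ... up to max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: wait a random time between 0 and the ceiling.
            sleep(random.uniform(0, ceiling))
```

The sleep function is a parameter so that the retry policy can be tested without actually waiting; on a device you would also run this off the UI thread.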
You need to start by putting sufficient logging both on the client (Android) and on the server (AWS) so that you can analyze the issues and draw the proper conclusions.
I think the answer to your problem lies in the design of your android app.
You need to take into consideration the worst case scenario and redesign your application to take that into account and recover. Dealing with the chaos monkey - jeff atwood.
Personally I never allow an Android app to be in a state where it needs to force close. For any and all network connections, I assume that the connection may be down, lossy, able to retrieve only part of the data, or (finally) up and working correctly.
That way my app will degenerate gracefully. If it needs web access it'll make an attempt in a background thread allowing the user to continue using the app, it will cache previous requests and will retry until it gets a connection or gives a nice toast to the end user.