This is a draft list of what we're thinking about measuring in Etsy's native apps.
Currently we're looking at how to measure these things with Espresso and KIF (and whether each metric is even possible to measure in an automated way). We'd like to build internal dashboards and alerts around regressions in these metrics using automated tests. In the future, we'll want to measure most of these things with RUM too.
- App launch time - how long does it take from tapping the icon to being able to interact with the app?
- Time to complete critical flows - using automated testing, how long does it take a user to finish the checkout flow, etc.? (See the Espresso sketch after this list.)
- Battery usage, including radio usage and GPS usage
- Peak memory allocation
- Frame rate - we need to figure out where we're dropping frames (and introducing scrolling jank). We should be able to dig into render, execute, and draw times.
- Memory leaks - using automated testing, can we find flows or actions that trigger a memory leak?
- An app version of Speed Index - visible completion of the above-the-fold screen over time.
- Time it takes for remote images to appear on the screen
- Time between tapping a link and being able to do something on the next screen
- Average time spent looking at spinners
- API performance
- WebView performance
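As a rough illustration of the "time to complete critical flows" item above, here's a minimal sketch of what an Espresso-based timing test could look like. `CheckoutActivity`, the view IDs, and the log tag are hypothetical placeholders rather than our actual app code, and in CI the elapsed time would feed a dashboard instead of being logged.

```kotlin
import android.os.SystemClock
import android.util.Log
import androidx.test.espresso.Espresso.onView
import androidx.test.espresso.action.ViewActions.click
import androidx.test.espresso.assertion.ViewAssertions.matches
import androidx.test.espresso.matcher.ViewMatchers.isDisplayed
import androidx.test.espresso.matcher.ViewMatchers.withId
import androidx.test.ext.junit.rules.ActivityScenarioRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

@RunWith(AndroidJUnit4::class)
class CheckoutFlowTimingTest {

    // CheckoutActivity and the view IDs below are placeholders.
    @get:Rule
    val activityRule = ActivityScenarioRule(CheckoutActivity::class.java)

    @Test
    fun timeCheckoutFlow() {
        val start = SystemClock.elapsedRealtime()

        // Walk through the critical flow; Espresso waits for the main
        // thread to go idle before each interaction.
        onView(withId(R.id.add_to_cart)).perform(click())
        onView(withId(R.id.cart)).perform(click())
        onView(withId(R.id.checkout)).perform(click())
        onView(withId(R.id.order_confirmation)).check(matches(isDisplayed()))

        val elapsedMs = SystemClock.elapsedRealtime() - start
        // In CI this number would be reported to a dashboard; here we just log it.
        Log.i("PerfTiming", "checkout flow took ${elapsedMs}ms")
    }
}
```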
Hey Lara,
So, a couple more points on getting these statistics (again, talking only about the Android side of things):
To be clear, there's a lot of work here that needs to be done. Eventually, I'd love for these types of things to be rolled into Android's tooling.
In a super awesome happy fun fun world, you'd be able to insert counters into your traceview/batterystats/heapmanager events that would correlate with the UIAutomator events, so that you could say "taking this action caused these performance events to occur."
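One way to approximate that correlation today is to wrap each scripted action in a named trace section, so the action shows up in the same systrace capture as the framework's own events. A minimal sketch, assuming UIAutomator 2; the UI text and resource names are made up:

```kotlin
import android.os.Trace
import androidx.test.platform.app.InstrumentationRegistry
import androidx.test.uiautomator.By
import androidx.test.uiautomator.UiDevice

// Hypothetical helper: label a scripted action so it appears as a named
// slice alongside the system's own trace events.
inline fun tracedAction(label: String, action: () -> Unit) {
    Trace.beginSection(label)
    try {
        action()
    } finally {
        Trace.endSection()
    }
}

fun runTracedFlow() {
    val device = UiDevice.getInstance(InstrumentationRegistry.getInstrumentation())

    // "Search" and "listing_card" are placeholders for whatever the real UI exposes.
    tracedAction("tap_search") {
        device.findObject(By.text("Search")).click()
    }
    tracedAction("open_listing") {
        device.findObject(By.res("com.example.app", "listing_card")).click()
    }
}
```

This doesn't reach into batterystats or the heap tooling, but it at least puts "the action we took" and "the rendering work that followed" on the same timeline.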
From there, your CI tests become pretty straightforward. Design a typical user action flow made up of individual user inputs (or system inputs, if you're using the accelerometer or something). Then, for each tooling system you're interested in gathering stats for, execute that test. A counter is associated with each event in each tool's output, so at the end of it all you can combine everything into a single parsable timeline of performance across the test.
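For the "single parsable timeline" part, the post-processing could be as simple as tagging every sample from every tool with a timestamp and the step counter it was captured under, then merging and sorting. A rough sketch with made-up record fields and values:

```kotlin
// Hypothetical record: one sample from one tooling system (traceview,
// batterystats, gfxinfo, ...), tagged with the scripted step it maps to.
data class PerfEvent(
    val timestampMs: Long,  // when the sample was taken
    val tool: String,       // which tooling system produced it
    val stepCounter: Int,   // which user/system input it correlates with
    val metric: String,     // e.g. "frame_ms", "heap_kb", "battery_uA"
    val value: Double
)

// Merge every tool's samples into one chronological timeline so a single
// pass can answer "what happened right after step N?".
fun buildTimeline(perTool: List<List<PerfEvent>>): List<PerfEvent> =
    perTool.flatten().sortedBy { it.timestampMs }

fun main() {
    val traceview = listOf(PerfEvent(1_000, "traceview", 1, "frame_ms", 18.0))
    val battery = listOf(PerfEvent(1_050, "batterystats", 1, "battery_uA", 220.0))
    buildTimeline(listOf(traceview, battery)).forEach(::println)
}
```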
From there, standard data processing can help you understand flows, trends, and commonalities.