Android Performance Optimization: Best Practices and Tools

Testing Tech Hub 2023-11-28 14:22 1500

This article summarizes the best practices and tools for optimizing Android app performance, covering topics such as render performance, understanding overdraw, VSYNC, GPU rendering, memory management, and battery optimization.

Introduction

In early 2015, Google launched a series of 16 brief videos on Android performance optimization best practices. These 3-5 minute videos aimed to help developers build faster and more efficient Android apps. The course not only explained the fundamental principles of performance issues in the Android system but also demonstrated how to utilize tools to identify performance problems and offered tips for enhancing performance. The primary topics covered include Android's rendering mechanism, memory and garbage collection, and power optimization. Here's a summary of these topics and recommendations.

1. Render Performance

Most of the performance problems users notice, such as lag, come down to rendering performance. Designers want apps to feature more animations, images, and other fashionable elements while still feeling smooth, but the Android system may struggle to complete such complex rendering work in time. The Android system sends out a VSYNC signal every 16ms, which triggers the rendering of the UI. If every frame is rendered in time, the app reaches the 60fps needed for a smooth display. Achieving 60fps therefore means that the app's work for each frame must be completed within 16ms.

If a specific operation takes 24ms, the system won't be able to perform normal rendering when it receives the VSYNC signal, leading to dropped frames. In this case, the user will see the same frame for 32ms.
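
The arithmetic behind this example can be sketched in a few lines of plain Java. FrameBudget and its methods are names made up for illustration, not Android APIs, and the sketch assumes a fixed 60Hz display. With the exact interval of 1000/60 ≈ 16.67ms, a 24ms frame occupies two intervals, so the previous frame stays on screen for about 33ms (32ms when the interval is rounded to 16ms as above).

```java
// Sketch: frame budget arithmetic for a fixed refresh rate.
// FrameBudget and its method names are hypothetical, not Android APIs.
final class FrameBudget {
    // Time between two VSYNC signals, in milliseconds (1000 / refresh rate).
    static double vsyncIntervalMs(int refreshRateHz) {
        return 1000.0 / refreshRateHz;
    }

    // Number of VSYNC intervals consumed when one frame needs workMs of work.
    // Work that overruns an interval has to wait for the following VSYNC.
    static int intervalsUsed(double workMs, int refreshRateHz) {
        return Math.max(1, (int) Math.ceil(workMs / vsyncIntervalMs(refreshRateHz)));
    }

    // How long the previous frame stays on screen while this one is prepared.
    static double onScreenMs(double workMs, int refreshRateHz) {
        return intervalsUsed(workMs, refreshRateHz) * vsyncIntervalMs(refreshRateHz);
    }

    public static void main(String[] args) {
        System.out.println("interval at 60Hz (ms): " + vsyncIntervalMs(60));
        System.out.println("intervals used by 24ms of work: " + intervalsUsed(24, 60));
        System.out.println("previous frame visible for (ms): " + onScreenMs(24, 60));
    }
}
```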

Users may easily notice lag and stutter when the UI performs animations or while scrolling through a ListView, as these operations are relatively complex and susceptible to dropped frames, resulting in a laggy experience. Several factors can cause dropped frames, such as overly complex layouts that can't be rendered within 16ms, too many overlapping drawing elements on the UI, or excessive animation executions. All of these can lead to increased CPU or GPU load.

We can use tools to identify issues, such as HierarchyViewer to check if an Activity's layout is too complex, or enable Show GPU Overdraw and other options within the phone's developer settings for observation. Additionally, TraceView can be used to monitor CPU execution and more quickly pinpoint performance bottlenecks.

2. Understanding Overdraw

Overdraw occurs when a pixel on the screen is drawn multiple times within the same frame. In a multi-layered UI structure, if invisible UI elements are also being drawn, this can cause some pixel areas to be drawn multiple times. This wastes a significant amount of CPU and GPU resources.

In pursuit of more visually impressive designs, it's easy to fall into the trap of using an increasing number of stacked components to achieve these effects. This can easily lead to numerous performance issues. To achieve optimal performance, we must minimize the occurrence of overdraw.

Thankfully, we can enable the Show GPU Overdraw option in the developer settings on the phone to observe the overdraw situation on the UI.

Blue, light green, light red, and dark red represent four increasing levels of overdraw: 1x, 2x, 3x, and 4x or more. Areas shown in their original color have no overdraw. Our goal is to minimize the red overdraw and see more blue areas.

Overdraw can sometimes be caused by excessive overlapping parts in your UI layout or by unnecessary overlapping backgrounds. For example, an Activity may have a background, then the layout inside it has its own background, and the child views each have their own backgrounds. By simply removing unnecessary background images, you can significantly reduce the red overdraw areas and increase the proportion of blue areas. This step can greatly improve the performance of the app.
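
To make the effect of stacked backgrounds concrete, here is a minimal model in plain Java. It is only an illustration under simple assumptions: OverdrawCounter is a hypothetical class, every layer is an opaque rectangle, and the "screen" is a tiny 4x4 grid. It is not how the Show GPU Overdraw overlay is implemented. Removing the redundant window background lowers the average number of paints per pixel from 2.5 to 1.5.

```java
// Sketch: count how many times each pixel is painted when opaque layers stack.
// OverdrawCounter is a hypothetical model, not the GPU overlay's implementation.
final class OverdrawCounter {
    // Each layer is {left, top, right, bottom} in pixels; right and bottom are exclusive.
    static int[][] paintCounts(int width, int height, int[][] layers) {
        int[][] counts = new int[height][width];
        for (int[] r : layers) {
            for (int y = Math.max(0, r[1]); y < Math.min(height, r[3]); y++) {
                for (int x = Math.max(0, r[0]); x < Math.min(width, r[2]); x++) {
                    counts[y][x]++;
                }
            }
        }
        return counts;
    }

    // Average number of paints per pixel over the whole screen.
    static double averagePaints(int width, int height, int[][] layers) {
        long total = 0;
        for (int[] row : paintCounts(width, height, layers)) {
            for (int c : row) {
                total += c;
            }
        }
        return (double) total / (width * height);
    }

    public static void main(String[] args) {
        int[][] withWindowBackground = {
            {0, 0, 4, 4}, // Activity (window) background
            {0, 0, 4, 4}, // root layout background
            {0, 0, 4, 2}, // child view background covering the top half
        };
        int[][] withoutWindowBackground = {
            {0, 0, 4, 4}, // root layout background
            {0, 0, 4, 2}, // child view background
        };
        System.out.println(averagePaints(4, 4, withWindowBackground));
        System.out.println(averagePaints(4, 4, withoutWindowBackground));
    }
}
```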

3. Understanding VSYNC

To comprehend how an app is rendered, we need to understand how the phone hardware operates, and that requires knowing what VSYNC is.

Before explaining VSYNC, we need to understand two related concepts:

Refresh Rate: This represents the number of times the screen is refreshed in one second, which depends on the fixed parameters of the hardware, such as 60Hz.

Frame Rate: This represents the number of frames the GPU draws in one second, such as 30fps or 60fps.

The GPU renders graphic data, and the hardware is responsible for presenting the rendered content on the screen. They continuously work together.

Unfortunately, the refresh rate and frame rate can't always keep the same pace. If the two are out of step, tearing can easily occur: the top and bottom parts of the screen show content from different frames.

The double and triple buffering mechanisms used in image rendering are also relevant here, but the concept is quite complex; please refer to http://source.android.com/devices/graphics/index.html and http://article.yeeyan.org/view/37503/304664.

Generally speaking, having a frame rate higher than the refresh rate is ideal. When the frame rate exceeds 60fps, the frame data generated by the GPU will be held back, waiting for the VSYNC refresh information. This ensures that there is new data to display every time the screen is refreshed. However, we often encounter situations where the frame rate is lower than the refresh rate.

In this case, the content displayed in some frames will be the same as the previous frame. The worst part is when the frame rate suddenly drops from above 60fps to below 60fps, causing lag, jank, hitching, and other unsmooth situations due to dropped frames. This is also the reason for poor user experience.
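
The following plain-Java sketch counts how many screen refreshes in one second have no new frame to show. It assumes frames are produced at a perfectly steady rate, which real apps rarely manage, and RefreshSimulator is a hypothetical name. At 60Hz, a steady 45fps leaves 15 refreshes per second showing a repeated frame, and 30fps leaves 30.

```java
// Sketch: refreshes that repeat the previous frame when the frame rate lags.
// RefreshSimulator is hypothetical and assumes perfectly evenly spaced frames.
final class RefreshSimulator {
    // Counts the refreshes within one second that have no new frame available.
    static int repeatedRefreshes(int refreshRateHz, int frameRateFps) {
        int repeated = 0;
        long shown = 0; // frames that were complete at the previous refresh
        for (int k = 1; k <= refreshRateHz; k++) {
            // Frames finished by the time of the k-th refresh (integer division floors).
            long completed = (long) k * frameRateFps / refreshRateHz;
            if (completed == shown) {
                repeated++; // nothing new: the screen shows the same frame again
            }
            shown = completed;
        }
        return repeated;
    }

    public static void main(String[] args) {
        for (int fps : new int[] {30, 45, 60, 90}) {
            System.out.println(fps + "fps on a 60Hz display: "
                    + repeatedRefreshes(60, fps) + " repeated refreshes per second");
        }
    }
}
```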

4. Tool: Profile GPU Rendering

Performance issues can be challenging, but luckily, we have tools to debug them. Open the developer options on your phone, select Profile GPU Rendering, and choose the On screen as bars option.

After selecting this, the phone screen shows GPU rendering graphs for three areas: the StatusBar, the NavBar, and the Activity area of the active app.

As the interface refreshes, vertical bar graphs will scroll across the screen to represent the time required to render each frame. The higher the bar graph, the longer the rendering time.

There is a green horizontal line in the middle, representing 16ms. We need to ensure that the total time spent on each frame is below this line to avoid lag issues.

Each bar graph consists of three parts: blue represents the time to measure and draw the Display List, red represents the time required for OpenGL to render the Display List, and yellow represents the time the CPU waits for the GPU to process.
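
Reading a bar amounts to adding its three segments and comparing the sum with the 16ms line. The small model below shows that check in plain Java. FrameBar is a hypothetical class and the timings in main are made-up sample values, not measurements.

```java
// Sketch: one bar of the on-screen graph as three timed segments.
// FrameBar is a hypothetical model; the sample timings are invented.
final class FrameBar {
    static final double BUDGET_MS = 16.0; // the green line in the graph

    final double drawMs;   // blue: measure and draw the Display List
    final double renderMs; // red: OpenGL renders the Display List
    final double waitMs;   // yellow: CPU waits for the GPU

    FrameBar(double drawMs, double renderMs, double waitMs) {
        this.drawMs = drawMs;
        this.renderMs = renderMs;
        this.waitMs = waitMs;
    }

    double totalMs() {
        return drawMs + renderMs + waitMs;
    }

    boolean overBudget() {
        return totalMs() > BUDGET_MS;
    }

    // Color of the segment that contributes the most time.
    String largestSegment() {
        if (drawMs >= renderMs && drawMs >= waitMs) {
            return "blue";
        }
        return renderMs >= waitMs ? "red" : "yellow";
    }

    public static void main(String[] args) {
        FrameBar bar = new FrameBar(12, 5, 3);
        System.out.println("total ms: " + bar.totalMs());
        System.out.println("over budget: " + bar.overBudget());
        System.out.println("look first at: " + bar.largestSegment());
    }
}
```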

5. Why 60fps?

We often mention 60fps and 16ms, but do you know why an app's performance is measured by whether it reaches 60fps? This is because the combination of the human eye and the brain cannot perceive screen updates beyond 60fps.

12fps is roughly similar to the frame rate of manually flipping through a book quickly, which is noticeably not smooth enough. 24fps allows the human eye to perceive continuous linear motion, which is actually due to the motion blur effect. 24fps is the frame rate commonly used in film reels, as it is sufficient to support most movie scenes while minimizing costs. However, frame rates below 30fps cannot smoothly display stunning visuals, so 60fps is needed to achieve the desired effect, and anything beyond 60fps is unnecessary.

The performance goal for app development is to maintain 60fps, which means you have only about 16ms (1000ms / 60 ≈ 16.67ms) to process all tasks in each frame.

6. Android, UI, and the GPU

Understanding how Android uses the GPU for screen rendering is essential for grasping performance issues. A practical question to consider is: how is an activity's screen drawn onto the display? How are complex XML layout files recognized and rendered?

Rasterization is the fundamental process for drawing components like buttons, shapes, paths, strings, bitmaps, etc. It breaks these components down into individual pixels for display. This process can be time-consuming, and the GPU's introduction aims to speed up rasterization.

The CPU is responsible for converting UI components into polygons and textures, which are then passed to the GPU for rasterization rendering.

However, transferring data from the CPU to the GPU is costly. Thankfully, OpenGL ES can store textures that need to be rendered in GPU memory, allowing for direct manipulation the next time they are needed. If you update the texture content held by the GPU, the previously saved state will be lost.

In Android, resources provided by themes, such as bitmaps and drawables, are bundled into a unified texture and then sent to the GPU. This means that every time you need to use these resources, they are directly retrieved and rendered from the texture. As UI components become more diverse and abundant, more complex processes are required. For example, when displaying an image, it must first be calculated by the CPU, loaded into memory, and then passed to the GPU for rendering. Text display is even more complicated, as it requires the CPU to convert it into a texture, then hand it over to the GPU for rendering, and finally return to the CPU for drawing individual characters while referencing the content rendered by the GPU. Animation involves an even more complex operation flow.

To ensure a smooth app experience, all CPU and GPU calculations, drawing, rendering, and other operations must be completed within 16ms per frame.

7. Invalidations, Layouts, and Performance

Smooth and sophisticated animations are crucial in app design, as they significantly enhance the user experience. This section will discuss how the Android system handles updates to UI components.

Generally, Android needs to convert XML layout files into objects that the GPU can recognize and render. This is done with the help of the DisplayList, which holds all the data that will be passed to the GPU for rendering on the screen.

When a View needs to be rendered for the first time, a DisplayList is created. To display this View on the screen, the GPU's drawing instructions for rendering are executed. If you need to render this View again due to operations like moving its position, you only need to execute the rendering instruction one more time. However, if you modify some visible components within the View, the previous DisplayList can no longer be used, and a new DisplayList must be created, rendering instructions re-executed, and the screen updated.
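
The caching behavior described above can be modeled in a few lines of plain Java. This is a conceptual sketch only: CachedView is a hypothetical class, and a string stands in for the recorded drawing commands. It does not reflect how Android actually implements DisplayList. Moving the view reuses the cached list, while changing its content forces a rebuild.

```java
// Sketch: a view that caches its drawing commands until its content changes.
// CachedView is a hypothetical model, not Android's DisplayList implementation.
final class CachedView {
    private String content;
    private int x;
    private int y;
    private String displayList; // stands in for the recorded drawing commands
    private int rebuilds = 0;

    CachedView(String content) {
        this.content = content;
    }

    // A position change leaves the recorded commands valid.
    void moveTo(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // A visible content change invalidates the recorded commands.
    void setContent(String content) {
        if (!content.equals(this.content)) {
            this.content = content;
            displayList = null;
        }
    }

    // Rebuilds the commands only when they were invalidated, then "executes" them.
    String render() {
        if (displayList == null) {
            displayList = "draw(" + content + ")";
            rebuilds++;
        }
        return displayList + "@" + x + "," + y;
    }

    int rebuildCount() {
        return rebuilds;
    }

    public static void main(String[] args) {
        CachedView button = new CachedView("OK");
        button.render();
        button.moveTo(10, 20);
        button.render();
        System.out.println("rebuilds after move: " + button.rebuildCount());
        button.setContent("Cancel");
        button.render();
        System.out.println("rebuilds after content change: " + button.rebuildCount());
    }
}
```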

It's important to note that every time the drawing content in a View changes, a series of operations, such as creating a DisplayList, rendering the DisplayList, and updating the screen, will be executed. The performance of this process depends on the complexity of your View, the changes in the View's state, and the rendering pipeline's execution performance. For example, if a Button's size needs to be doubled, the parent View must recalculate and rearrange the positions of other child Views before increasing the Button's size. Modifying a View's size triggers a resizing pass over the entire view hierarchy, and changing a View's position prompts the view hierarchy to recalculate the positions of other Views. If the layout is very complex, this can easily lead to severe performance issues. Minimizing overdraw as much as possible is essential.

We can use the previously introduced Profile GPU Rendering to check rendering performance and the Show GPU view updates option in Developer Options to view update operations. Finally, we can use the HierarchyViewer tool to inspect layouts, making them as flat as possible, removing unnecessary UI components, and reducing the time spent on Measure and Layout.

8. Overdraw, ClipRect, QuickReject

One critical aspect that causes performance issues is excessive and complex drawing operations. We can use tools to detect and fix overdraw issues for standard UI components, but they may not be as effective for highly customized UI components.

One trick that can significantly improve drawing performance is to call a few Canvas APIs. As mentioned earlier, drawing updates for non-visible UI components can cause overdraw. For example, after the Nav Drawer slides out from the foreground visible Activity, if the non-visible UI components inside the Nav Drawer continue to be drawn, this leads to overdraw. To resolve this issue, the Android system tries to minimize overdraw by avoiding drawing completely invisible components. Those non-visible Views inside the Nav Drawer will not be drawn, thus saving resources.

Unfortunately, for overly complex custom Views (those that override the onDraw method), the Android system cannot detect what operations will be performed in onDraw, making it unable to monitor and automatically optimize, and thus unable to prevent overdraw. However, we can use canvas.clipRect() to help the system identify visible areas. This method specifies a rectangular area where only content within this area will be drawn, and other areas will be ignored. This API is great for helping custom Views with multiple overlapping components control the display area. Additionally, the clipRect method can help save CPU and GPU resources, as drawing instructions outside the clipRect area will not be executed, and components with partial content within the rectangular area will still be drawn.

In addition to the clipRect method, we can also use canvas.quickReject() to determine if there is no intersection with a rectangle, thus skipping drawing operations outside the rectangular area. After making these optimizations, we can use the Show GPU Overdraw mentioned earlier to check the results.
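
The idea behind both calls can be illustrated without the Android SDK. The sketch below is a plain-Java model, not real Canvas code: ClipSketch and its Rect are hypothetical stand-ins, and it assumes a row of equally sized, overlapping cards listed back to front, each starting further right than the previous one. Clipping each card to the strip its successor leaves uncovered plays the role of canvas.clipRect(), and skipping cards with nothing visible plays the role of canvas.quickReject().

```java
// Sketch: pixels painted for overlapping cards, with and without clipping.
// ClipSketch and Rect are hypothetical stand-ins, not android.graphics classes.
final class ClipSketch {
    // Minimal rectangle; right and bottom are exclusive.
    static final class Rect {
        final int left, top, right, bottom;

        Rect(int left, int top, int right, int bottom) {
            this.left = left;
            this.top = top;
            this.right = right;
            this.bottom = bottom;
        }

        boolean isEmpty() {
            return right <= left || bottom <= top;
        }

        long area() {
            return isEmpty() ? 0 : (long) (right - left) * (bottom - top);
        }

        Rect intersect(Rect o) {
            return new Rect(Math.max(left, o.left), Math.max(top, o.top),
                    Math.min(right, o.right), Math.min(bottom, o.bottom));
        }
    }

    // Pixels painted when every card is drawn in full (only screen bounds apply).
    static long naivePixels(Rect[] cards, Rect screen) {
        long total = 0;
        for (Rect card : cards) {
            total += card.intersect(screen).area();
        }
        return total;
    }

    // Pixels painted when each card is limited to its visible strip.
    // Assumes cards are ordered back to front and share the same top and bottom.
    static long clippedPixels(Rect[] cards, Rect screen) {
        long total = 0;
        for (int i = 0; i < cards.length; i++) {
            Rect card = cards[i];
            int visibleRight = (i + 1 < cards.length)
                    ? Math.min(card.right, cards[i + 1].left)
                    : card.right;
            Rect clip = new Rect(card.left, card.top, visibleRight, card.bottom)
                    .intersect(screen);
            if (clip.isEmpty()) {
                continue; // nothing visible: skip, as a quickReject check would
            }
            total += clip.area(); // draw only inside the clip, as with clipRect
        }
        return total;
    }

    public static void main(String[] args) {
        Rect screen = new Rect(0, 0, 300, 100);
        Rect[] cards = {
            new Rect(0, 0, 100, 100),
            new Rect(40, 0, 140, 100),
            new Rect(80, 0, 180, 100),
            new Rect(400, 0, 500, 100), // entirely off screen
        };
        System.out.println("naive: " + naivePixels(cards, screen));
        System.out.println("clipped: " + clippedPixels(cards, screen));
    }
}
```

In a real custom View, the same pattern would typically wrap each clipped draw between canvas.save() and canvas.restore() so that one card's clip does not affect the next.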

9. Memory Churn and Performance

Although Android has an automatic memory management mechanism, improper use of memory can still cause serious performance issues. Creating too many objects within the same frame is a matter of particular concern.

Android has a Generational Heap Memory model, where the system performs different GC operations based on different memory data types. For example, recently allocated objects are placed in the Young Generation area, where objects are usually quickly created and soon destroyed and recycled. The GC operation speed in this area is also faster than in the Old Generation area.

In addition to the speed difference, during GC operations, any operation from any thread must pause and wait for the GC operation to complete before other operations can resume.

Generally, a single GC operation does not take up much time, but a large number of continuous GC operations can significantly occupy the frame interval time (16ms). If too many GC operations are performed within the frame interval, the available time for other operations like calculations and rendering will naturally decrease.

There are two reasons for frequent GC execution:

Memory Churn occurs when a large number of objects are created and then quickly released in a short period.

Instantly creating a large number of objects severely occupies the Young Generation memory area. When the remaining space is insufficient, reaching the threshold, GC is triggered. Even if each allocated object occupies very little memory, their accumulation increases the heap pressure, triggering more GC operations of other types. This process can potentially affect the frame rate and make users perceive performance issues.

There is a simple and intuitive way to detect the above problems. If you see multiple memory fluctuations in a short period in Memory Monitor, it is likely that memory churn has occurred.

Additionally, you can use the Allocation Tracker to check whether the same kinds of objects are repeatedly allocated and released from the same call stack in a short period. This is one of the typical signals of memory churn.

Once you have roughly located the problem, the subsequent problem-fixing becomes relatively straightforward. For example, you should avoid allocating objects in for loops, try to move object creation outside the loop body, and pay attention to the onDraw method in custom Views. The onDraw method will be called each time the screen is drawn and during animation execution. Avoid performing complex operations and creating objects in the onDraw method. For situations where object creation is unavoidable, you can consider using an object pool model to solve the frequent creation and destruction problem. However, it is important to note that you need to manually release the objects in the object pool after they are no longer in use.
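
As a minimal sketch of the object pool idea, the plain-Java class below hands out reusable objects instead of allocating a new one on every call. PointPool and Point are hypothetical names chosen for illustration, and the pool is not thread-safe. In the loop in main, 1000 acquire/release cycles result in a single allocation.

```java
// Sketch: a small, single-threaded object pool that avoids per-frame allocation.
// PointPool and Point are hypothetical; this is not an Android library class.
final class PointPool {
    static final class Point {
        float x;
        float y;
    }

    private final java.util.ArrayDeque<Point> free = new java.util.ArrayDeque<>();
    private final int maxSize;
    private int allocations = 0;

    PointPool(int maxSize) {
        this.maxSize = maxSize;
    }

    // Reuses a pooled object when one is available, otherwise allocates.
    Point acquire() {
        Point p = free.poll();
        if (p == null) {
            p = new Point();
            allocations++;
        }
        return p;
    }

    // Callers must hand the object back once they are done with it.
    void release(Point p) {
        p.x = 0;
        p.y = 0; // reset so stale values do not reach the next user
        if (free.size() < maxSize) {
            free.push(p);
        }
    }

    int allocationCount() {
        return allocations;
    }

    public static void main(String[] args) {
        PointPool pool = new PointPool(4);
        for (int i = 0; i < 1000; i++) {
            Point p = pool.acquire();
            p.x = i;
            p.y = i * 2;
            pool.release(p);
        }
        System.out.println("allocations: " + pool.allocationCount());
    }
}
```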

10. Garbage Collection in Android

The JVM's garbage collection mechanism offers significant benefits to developers, as they don't have to constantly deal with object allocation and recycling, allowing them to focus more on higher-level code implementation. Compared to Java, languages like C and C++ have higher execution efficiency, but they require developers to manage object allocation and recycling themselves. However, in a large system, it is inevitable that some objects will be forgotten to be recycled, leading to memory leaks.

The original JVM's garbage collection mechanism has been greatly optimized in Android. Android has a three-level Generation memory model: recently allocated objects are stored in the Young Generation area, and when the object stays in this area for a certain period, it will be moved to the Old Generation, and finally to the Permanent Generation area.

Each level of the memory area has a fixed size, and new objects are continuously allocated to this area. When the total size of these objects is close to the threshold of the memory area, GC operation is triggered to free up space for other new objects.

As mentioned earlier, all threads are paused when GC occurs. The time taken by GC is related to which Generation it is in, with Young Generation having the shortest GC operation time, Old Generation being second, and Permanent Generation being the longest. The duration of the operation is also related to the number of objects in the current Generation. Naturally, traversing 20,000 objects is much slower than traversing 50 objects.

Although Google engineers are trying to shorten the time spent on each GC operation, it is still necessary to pay special attention to performance issues caused by GC. If you accidentally create objects inside the innermost for loop, it will easily trigger GC and cause performance problems. Through Memory Monitor, we can see the memory usage status, and each instant memory reduction is due to a GC operation at that time. If a large number of memory increases and decreases occur in a short period, it is likely that there are performance issues. We can also use the Heap and Allocation Tracker tools to see which objects are allocated in memory at the moment.

11. Performance Cost of Memory Leaks

Although Java has an automatic garbage collection mechanism, it does not mean that there are no memory leak issues in Java, and memory leaks can easily lead to serious performance problems.

Memory leaks refer to objects that are no longer used by the program but cannot be recognized by the GC, causing these objects to remain in memory and occupy valuable memory space. Obviously, this also reduces the available space in each level of the Generation memory area, making GC more likely to be triggered and causing performance issues.
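
One common way such an object stays reachable is a long-lived registry that still holds a reference to it. The plain-Java sketch below models that situation with hypothetical classes: EventBus stands in for a long-lived singleton and Screen for a short-lived, Activity-like object. As long as the bus keeps the listener, the GC cannot reclaim the Screen or its buffer, and unregistering in destroy() removes that last reference.

```java
// Sketch: a long-lived registry keeping a short-lived object reachable.
// LeakSketch, EventBus, and Screen are hypothetical names for illustration.
final class LeakSketch {
    interface Listener {
        void onEvent(String event);
    }

    // Long-lived object, standing in for an app-wide singleton.
    static final class EventBus {
        private final java.util.List<Listener> listeners = new java.util.ArrayList<>();

        void register(Listener l) {
            listeners.add(l);
        }

        void unregister(Listener l) {
            listeners.remove(l);
        }

        int listenerCount() {
            return listeners.size();
        }
    }

    // Short-lived object that owns a large buffer, standing in for an Activity.
    static final class Screen implements Listener {
        final byte[] bitmap = new byte[1024 * 1024];
        private final EventBus bus;

        Screen(EventBus bus) {
            this.bus = bus;
            bus.register(this);
        }

        @Override
        public void onEvent(String event) {
            // react to the event
        }

        // Without this call the bus would keep the Screen (and its buffer) reachable.
        void destroy() {
            bus.unregister(this);
        }
    }

    public static void main(String[] args) {
        EventBus bus = new EventBus();
        Screen screen = new Screen(bus);
        System.out.println("listeners while screen is alive: " + bus.listenerCount());
        screen.destroy();
        System.out.println("listeners after destroy: " + bus.listenerCount());
    }
}
```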

Finding and fixing memory leaks is a tricky task, as you need to be familiar with the executed code, clearly understand how it runs in a specific environment, and then carefully investigate. For example, if you want to know whether the memory occupied by a certain activity in the program is completely released when the activity exits, you first need to use the Heap Tool to obtain a memory snapshot of the current state when the activity is in the foreground. Then, you need to create an almost memory-free blank activity for the previous Activity to jump to, and during the jump to the blank activity, actively call the System.gc() method to ensure that a GC operation is triggered. Finally, if all the memory of the previous activity has been correctly released, there should be no objects from the previous activity in the memory snapshot after the blank activity is launched.

If you find some suspicious objects that have not been released in the memory snapshot of the blank activity, you should use the Allocation Tracker to look closely for the specific suspicious objects. You can start monitoring from the blank activity, launch the observed activity, and then return to the blank activity to end the monitoring. After doing this, you can carefully observe those objects and find the real culprit of the memory leak.

12. Memory Performance

Generally, Android has made many optimizations for garbage collection. Although other tasks are paused during GC operations, in most cases, GC operations are relatively quiet and efficient. However, if our memory usage is improper and causes frequent GC execution, it can lead to significant performance issues.

To find memory performance issues, Android Studio provides tools to help developers.

Memory Monitor: View the memory occupied by the entire app and the moments when GC occurs. A large number of GC operations in a short period is a dangerous signal.

Allocation Tracker: Use this tool to track memory allocation, as mentioned earlier.

Heap Tool: View the current memory snapshot to facilitate comparison analysis of which objects may have leaked. Please refer to the previous case for details.

13. Tool - Memory Monitor

Memory Monitor in Android Studio is a great tool to help us monitor the memory usage of the program.

14. Battery Performance

Battery power is one of the most valuable resources for handheld devices, as most devices need to be constantly charged to maintain continuous use. Unfortunately, for developers, battery optimization is often the last thing they consider. However, it is crucial not to let your app become a major battery consumer.

Purdue University studied the battery consumption of some of the most popular apps and found that on average, only about 30% of the battery power is used by the program's core methods, such as drawing images and arranging layouts. The remaining 70% is used for reporting data, checking location information, and periodically retrieving background ad information. Balancing the power consumption of these two aspects is very important.

There are several measures that can significantly reduce battery consumption:

We should try to minimize the number of times the screen is woken up and the duration of each wake-up. Managing wake-ups with WakeLock lets the device wake up correctly and return to sleep promptly according to its settings.

For some operations that do not need to be executed immediately, such as uploading songs and image processing, they can be performed when the device is charging or has sufficient battery power.

Each network request keeps the wireless radio active for a period of time. We can bundle scattered network requests into a single operation to avoid the excessive battery consumption this causes. For more information on the battery cost of keeping the radio active for network requests, please refer to http://hukai.me/android-training-course-in-chinese/connectivity/efficient-downloads/efficient-network-access.html
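
The bundling of network requests mentioned above can be sketched as a small queue that sends requests in groups. This is a plain-Java model under simple assumptions: RequestBatcher is a hypothetical class, a "send" is only counted rather than performed, and each flush is treated as one radio wake-up. With a batch size of 5, twelve requests cost three wake-ups instead of twelve.

```java
// Sketch: group scattered requests so that one radio wake-up serves many of them.
// RequestBatcher is hypothetical; it counts wake-ups instead of doing real I/O.
final class RequestBatcher {
    private final int batchSize;
    private final java.util.List<String> pending = new java.util.ArrayList<>();
    private int radioWakeUps = 0;
    private int sent = 0;

    RequestBatcher(int batchSize) {
        this.batchSize = batchSize;
    }

    // Queues a request and sends the whole batch once it is full.
    void enqueue(String request) {
        pending.add(request);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Sends everything that is waiting using a single wake-up.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        radioWakeUps++;
        sent += pending.size();
        pending.clear();
    }

    int radioWakeUps() {
        return radioWakeUps;
    }

    int sentCount() {
        return sent;
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        RequestBatcher batcher = new RequestBatcher(5);
        for (int i = 0; i < 12; i++) {
            batcher.enqueue("event-" + i);
        }
        batcher.flush(); // send the remainder, for example when the app goes to background
        System.out.println("requests sent: " + batcher.sentCount());
        System.out.println("radio wake-ups: " + batcher.radioWakeUps());
    }
}
```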

We can find the battery consumption statistics for the corresponding app through the phone settings option. We can also view detailed battery consumption using the Battery Historian Tool.

If we find that our app has excessive battery consumption issues, we can use the JobScheduler API to schedule some tasks, such as processing heavier tasks when the phone is charging or connected to Wi-Fi. For more information about JobScheduler, please refer to http://hukai.me/android-training-course-in-chinese/background-jobs/scheduling/index.html

15. Understanding Battery Drain on Android

Calculating and tracking battery consumption is a challenging and contradictory task, as recording battery consumption itself also consumes power. The only viable solution is to use third-party devices to monitor battery power, which can provide accurate battery consumption data.

The power consumption of a device in standby mode is minimal. For example, with the Nexus 5, turning on airplane mode allows it to be on standby for nearly a month. However, when the screen is lit, various hardware modules need to start working, which requires a lot of power.

After using WakeLock or JobScheduler to wake up the device to handle scheduled tasks, it is essential to return the device to its initial state promptly. Each time the cellular radio is woken up for data transmission, a lot of power is consumed, even more than for operations over Wi-Fi. For more details, please visit http://hukai.me/android-training-course-in-chinese/connectivity/efficient-downloads/efficient-network-access.html

Addressing battery consumption is another significant topic, which will not be expanded upon here.

16. Battery Drain and WakeLocks

Efficiently preserving battery power while constantly prompting users to use your app can be a contradictory choice. However, we can use better methods to balance the two.

Suppose you have a large number of social apps installed on your phone. Even when the phone is in standby mode, it will often be woken up by these apps to check for and synchronize new data. Android progressively shuts down hardware components to extend the phone's standby time: first the screen dims until it turns off, and then the CPU goes to sleep. All of this aims to save valuable battery resources. However, even in this sleep state, most apps will still try to work and will keep waking up the phone.

The simplest way to wake up the phone is to use the PowerManager.WakeLock API to keep the CPU working and prevent the screen from dimming and turning off. This allows the phone to be woken up, perform its tasks, and then return to sleep mode. Acquiring a WakeLock is simple, but releasing it promptly is just as crucial, and improper use of WakeLock can lead to severe errors. For example, the response time of a network request is uncertain, so something that should take only 10 seconds may end up waiting for an hour and wasting battery power. This is why using the wakelock.acquire() method with a timeout parameter is crucial. However, setting a timeout alone does not solve the problem, because we still have to decide how long the timeout should be and when to retry.
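
The acquire-with-timeout and always-release pattern can be sketched without the Android SDK. In the plain-Java model below, FakeWakeLock is a hypothetical stand-in that only tracks state; it mirrors the acquire(timeout), release(), and isHeld() methods of PowerManager.WakeLock but does not implement timeout expiry. The point of the sketch is the try/finally structure, which releases the lock whether the task succeeds or fails.

```java
// Sketch: hold a wake lock only for the duration of a task and always release it.
// FakeWakeLock is a hypothetical stand-in, not android.os.PowerManager.WakeLock.
final class WakeLockSketch {
    static final class FakeWakeLock {
        private boolean held;
        private long lastTimeoutMs;

        // On a real device the timeout acts as a safety net if release is never reached.
        void acquire(long timeoutMs) {
            held = true;
            lastTimeoutMs = timeoutMs;
        }

        void release() {
            held = false;
        }

        boolean isHeld() {
            return held;
        }

        long lastTimeoutMs() {
            return lastTimeoutMs;
        }
    }

    // Runs the task while holding the lock; returns false if the task failed.
    static boolean runWithLock(FakeWakeLock lock, long timeoutMs, Runnable task) {
        lock.acquire(timeoutMs);
        try {
            task.run();
            return true;
        } catch (RuntimeException e) {
            return false; // for example, the network request failed
        } finally {
            if (lock.isHeld()) {
                lock.release(); // release on success and on failure alike
            }
        }
    }

    public static void main(String[] args) {
        FakeWakeLock lock = new FakeWakeLock();
        boolean ok = runWithLock(lock, 10_000L, () -> System.out.println("syncing..."));
        System.out.println("task succeeded: " + ok + ", lock still held: " + lock.isHeld());
    }
}
```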

To solve the above problems, the correct approach might be to use non-precise timers. Generally, we set a time for a specific operation, but dynamically modifying this time may be better. For example, if another program needs to wake up 5 minutes later than the time you set, it is best to wait until that moment and perform the two tasks simultaneously. This is the core working principle of non-precise timers. We can customize scheduled tasks, but if the system detects a better time, it can postpone your task to save battery consumption.
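
A rough model of this coalescing idea is shown below in plain Java. WakeUpCoalescer is a hypothetical class, times are plain integers in minutes, and every task is assumed to tolerate the same delay. This is a simplified greedy sketch, not the scheduling algorithm Android actually uses. With a tolerance of 5 minutes, tasks requested at minutes 0, 2, 5, and 20 need only two wake-ups, at minutes 5 and 25.

```java
// Sketch: merge wake-ups whose requested times fall within a shared tolerance.
// WakeUpCoalescer is a hypothetical model, not Android's scheduling algorithm.
final class WakeUpCoalescer {
    // requestedMin: minutes at which tasks would like to run.
    // toleranceMin: how long every task may be postponed.
    // Returns the minutes at which the device actually wakes up.
    static java.util.List<Integer> wakeUps(int[] requestedMin, int toleranceMin) {
        int[] sorted = requestedMin.clone();
        java.util.Arrays.sort(sorted);
        java.util.List<Integer> result = new java.util.ArrayList<>();
        int i = 0;
        while (i < sorted.length) {
            // Latest moment the earliest waiting task still accepts.
            int fireAt = sorted[i] + toleranceMin;
            result.add(fireAt);
            // Every task requested by then runs in the same wake-up.
            while (i < sorted.length && sorted[i] <= fireAt) {
                i++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] requested = {0, 2, 5, 20};
        System.out.println("exact timers wake the device " + requested.length + " times");
        System.out.println("coalesced wake-ups at minutes " + wakeUps(requested, 5));
    }
}
```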

This is what the JobScheduler API does. It combines the current situation and tasks to find the ideal wake-up time, such as waiting until the device is charging or connected to Wi-Fi or executing tasks together. We can implement many flexible scheduling algorithms through this API.

The Battery Historian tool was released alongside Android 5.0. It shows how often the app was woken up, what woke it up, and how long each wake-up lasted.

Please pay attention to the app's battery consumption. Users can observe high battery-consuming apps through the phone's settings and may decide to uninstall them. Therefore, it is essential to minimize the app's battery consumption.
