I chose front-end high concurrency as the opening topic for two reasons. First, I have been responsible for several ultra-high-concurrency businesses, so I have hands-on experience in this area. Second, unlike business-feature optimization, which requires collaboration and discussion among product, design, and other teams (the logic does not come purely from the front-end developers), high-concurrency optimization is controlled entirely by the front end. As a front-end developer myself, perhaps the insights and perspectives I can draw from it are deeper and more universally applicable.
When it comes to optimization, people usually do two things upon receiving an "optimization indicator" task: analyze the pain points behind the indicator, then find and implement technical solutions that address them. Is this enough? My answer is no. In my understanding, this is only the first level of optimization. Better technology usually yields better results, but the optimization can still be pushed further. So what is missing? In the following, I will elaborate on my optimization ideas step by step. The general optimization approach is the foundation, so let's first look at the basic front-end high-concurrency strategies under that approach.
The core difference between high-concurrency scenarios and ordinary scenarios is the surge in parallel access volume. Therefore, the essence of frontend high-concurrency strategies is to solve the problems brought about by the increase in access volume. So, what problems does the surge in access volume bring?
Let's first take a look at a normal access flow chart for an H5 (HTML5) web page:
Under normal circumstances, the data flow from the user end to the backend is well-balanced, and the user's access volume is within the backend's acceptable range.
However, in high-concurrency scenarios, if no high-concurrency strategies are implemented, the original access flow chart will change to this (the requests from the frontend to the backend in the red area may be rejected by the backend or even crash the backend):
In the chart, we can clearly see the pain point of high concurrency: the imbalance between the two ends of the data flow process. To solve this pain point, we need to bring the two ends back to a balanced state of data flow. This can be approached from two aspects: on one hand, the backend should provide as much carrying capacity as possible (e.g., adding more machines); on the other hand, the frontend should strengthen its ability to streamline and filter requests as the "gatekeeper" between users and the backend.
After strengthening the frontend "gatekeeper" ability, we expect to see the access flow chart like this:
Although the user concurrency is high, under the frontend high-concurrency strategy, the pain point of the imbalance between the two ends is resolved. So, what are these high-concurrency strategies? Let's explore them one by one.
The front-end "gatekeeper" role needs to strengthen two capabilities: streamlining and filtering.
First, let's look at the streamlining technical solutions. If we compare the backend's carrying capacity to a "circle," then the channel between the frontend and the backend is like a water pipe with the width of the circle as its outlet, where the water can be understood as the requests in the H5 web page. In H5, there are actually two such circles: one is the maximum concurrency, and the other is the maximum bandwidth. They correspond to the number of requests and the size of requests in parallel requests. Streamlining these two can provide more "water" in and out while the area of the "circle" remains fixed.
Therefore, in streamlining technical solutions, we need to be able to streamline the number of requests and the size of requests in parallel requests.
1. Streamlining the number of requests
When the number of requests can no longer be streamlined from a logical perspective (e.g., removing some unnecessary requests), we often focus on purely technical solutions.
The current H5 request streamlining solutions are roughly as follows, with the core being: merging.
The chart lists the resource types commonly used in H5 (there are others, such as video and audio, which are not listed one by one). As the chart shows, with current technology the number of requests can be reduced to an extreme degree; in the extreme case, a business can be served with a single request.
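As a minimal sketch of the merging idea (a Node.js build-time step; the file names and sample CSS are hypothetical), a small image can be inlined into CSS as a base64 data URI, so the separate image request disappears entirely:

```javascript
// Inline a small image into CSS as a base64 data URI,
// turning two requests (CSS + image) into one.
function toDataURI(mime, buffer) {
  return `data:${mime};base64,${buffer.toString('base64')}`;
}

function inlineImageInCSS(css, imageUrl, mime, imageBuffer) {
  // Replace every reference to the image URL with its data URI.
  return css.split(`url(${imageUrl})`).join(`url(${toDataURI(mime, imageBuffer)})`);
}

// Hypothetical usage with an in-memory "image":
const css = '.logo { background: url(logo.png); }';
const fakePng = Buffer.from([0x89, 0x50, 0x4e, 0x47]); // PNG magic bytes only
const merged = inlineImageInCSS(css, 'logo.png', 'image/png', fakePng);
console.log(merged); // the background rule now embeds the image data directly
```

The same trade-off discussed later applies: inlining saves a request but makes the merged file larger, so it suits small, first-screen images best.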
2. Streamlining the size of requests
Similarly, when the size of requests can no longer be streamlined from a logical perspective (e.g., removing some unnecessary functions or code), we often focus on purely technical solutions. The current H5 request size streamlining solutions are roughly as follows, with the core being: compression.
As can be seen, with the current technology, the size of each type of resource request can be further compressed and streamlined.
The above discusses the technical solutions for the front-end "gatekeeper" streamlining ability. Now let's look at the filtering ability. Front-end filtering can be understood as adding a mesh to the "gatekeeper" that lets only certain substances through, filtering out those that don't need to enter. There are two broad ways to filter. One is passive: only a fixed amount of water may pass, and the excess is turned away. This strategy generally lives in the backend and is called "overload protection." The other is active: it sacrifices some data freshness. On the front end this is generally called "local caching": when a request is made and the front end already has the content cached, there is no need to reach the server again.
Therefore, in the filtering technical solutions, the frontend can complete this through caching.
1. Cache filtering requests
The current H5 request filtering solutions are roughly as follows, with the core being: caching.
By using specific front-end caching techniques, we can achieve the "filtering" effect by directly obtaining the requests that originally needed to reach the backend from the front-end cache.
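A minimal sketch of this cache-first filtering (the storage backend, key names, and TTL are assumptions; in a browser the storage object would typically be localStorage):

```javascript
// Cache-first lookup: answer from local cache while a fresh copy exists,
// so the request never reaches the backend.
function createCache(storage, now = () => Date.now()) {
  return {
    get(key) {
      const raw = storage[key];
      if (!raw) return null;
      const { value, expires } = JSON.parse(raw);
      if (now() > expires) { delete storage[key]; return null; }
      return value;
    },
    set(key, value, ttlMs) {
      storage[key] = JSON.stringify({ value, expires: now() + ttlMs });
    },
  };
}

// Hypothetical usage with a plain object standing in for localStorage:
const cache = createCache({});
cache.set('/api/config', { color: 'red' }, 60 * 1000);
console.log(cache.get('/api/config')); // served locally: { color: 'red' }
```

Only when `get` returns null does the page fall through to a real network request, which is exactly the "filtering" effect described above.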
After completing the above two steps (analyzing the essential pain points and finding feasible technical solutions), the common practice is to pick the appropriate solutions and apply them to our projects: for merging, merge files of the same type; for compression, compress all uncompressed code; for caching, enable longer HTTP cache lifetimes, use local storage, and use offline packages. Although this overall strategy will be effective to some extent, I believe it is often not enough. To optimize more thoroughly, we need to think more deeply about the solutions and scenarios themselves and adjust the strategy accordingly, which requires the right thinking patterns. In the following, I will discuss some of the thinking patterns I have summarized for deeper optimization, focusing on differential thinking.
Differential thinking emphasizes understanding the technology and scenarios in depth, and then breaking down the technology and scenarios in a differentiated manner to achieve further technical optimization for each differentiated scenario.
From the first two steps - analyzing the essential pain points and finding feasible technical solutions, we learned that high-concurrency response at the frontend technical level can be approached from merging, compression, and caching. A simple principle is that the more thoroughly these strategies are implemented, the more concurrency the frontend can block. However, in reality, we often cannot do this and can only choose a more compromised solution.
For example, considering the impact on page load time, we will not merge an entire H5 project's resources into a single request. The reason is that every purely technical strategy, whatever its advantages, inevitably brings some disadvantages; as the saying goes, all things have advantages and disadvantages. When a disadvantage affects one of the project's core capabilities (such as page load time on the experience side), the solution often will not be adopted after weighing the pros and cons, even if it would improve concurrency capacity. This is why I said earlier that settling for a compromise solution is not thorough enough.
Under the balance of pros and cons, we often choose a compromise solution (such as Strategy 3 in the diagram below):
A more thorough optimization should involve understanding the disadvantages caused by each solution, analyzing the project scenarios based on the impact of the disadvantages, and implementing differentiated strategies according to the tolerance level of the impact of the disadvantages in each scenario. For scenarios that can accept the impact of the disadvantages, use the optimal solution; for scenarios that cannot accept the impact of the disadvantages, use a better solution; and so on, until using a compromise solution. This will achieve a differentiated and refined optimization. The overall strategy after optimization will become similar to the form shown in the diagram below, where the original project simply used the compromise strategy 3. After differential processing, some project modules will use better strategies 1 and 2.
In the following, I will use this differential thinking to further optimize the three strategies obtained from the previous two steps in frontend high concurrency - merging, compression, and caching.
When code merging reaches a certain extent, its disadvantages will gradually become more prominent.
Disadvantages include: a single request being too large, causing an impact on the page's first-screen rendering time; after merging dynamic and static requests (cgi + html), the requirements for cache timeliness will be greatly increased (cache timeliness is determined by the highest requirement among the merged resources, following the barrel principle).
Based on the impact of each disadvantage, I will conduct a differential analysis for specific scenarios below.
1. Differentiated decomposition based on "resource first-screen experience relevance"
Regarding the issue of a single file being too large after merging, which affects the page's first-screen rendering time, we can start from the impact and differentiate the page network resource requests based on the first-screen experience relevance. This way, we can minimize the impact of merging on the user experience.
Simply put, we can divide resources into two parts: high-relevance (first-screen) resources and low-relevance (non-first-screen) resources. Each part is then processed with the merging strategy separately, pushing merging as far as possible to improve concurrency capacity. If a merged file becomes so large that it affects rendering time, subdivide further within that tier, and so on.
For example, CSS, first-screen-rendering JS, and first-screen image resources can be treated as high-relevance resources: embed the images in the CSS as base64, then inline the result into the HTML page so it merges with the page request. Non-first-screen JS and image resources are treated as low-relevance and merged separately. This minimizes the number of concurrent requests without hurting the first-screen rendering time.
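The split described above can be sketched as a simple partition step in the build pipeline (the resource list and the `firstScreen` flag are hypothetical):

```javascript
// Partition resources by first-screen relevance, then merge each group
// separately: high-relevance resources are inlined into the page,
// low-relevance resources go into a separate, later-loaded bundle.
function splitByFirstScreen(resources) {
  return {
    high: resources.filter(r => r.firstScreen),
    low: resources.filter(r => !r.firstScreen),
  };
}

const resources = [
  { url: 'critical.css', firstScreen: true },
  { url: 'render.js', firstScreen: true },
  { url: 'share.js', firstScreen: false },
  { url: 'gallery.js', firstScreen: false },
];

const { high, low } = splitByFirstScreen(resources);
console.log(high.map(r => r.url)); // ['critical.css', 'render.js']
console.log(low.map(r => r.url));  // ['share.js', 'gallery.js']
```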
2. Differentiated decomposition based on "resource timeliness dependency"
For the merging of dynamic and static requests (cgi + html), which affects cache timeliness and leads to higher cache timeliness requirements, we can also start from the impact and differentiate the page requests based on the timeliness dependency.
Simply put, we can divide the pages into two categories - high timeliness requirement pages (uncontrollable entry, no caching) and low timeliness requirement pages (controllable entry, can be updated by modifying the offline package, cacheable). For high timeliness requirement pages, merging dynamic and static requests will not affect this type of page, and cgi and html can be merged for such pages. For low timeliness requirement pages, these pages can be cached (e.g., using an offline package), and cgi and html are not merged.
By adopting the most optimal merging strategy for specific scenarios in a differentiated manner, the optimization effect will be further improved.
Similarly, code compression also has its disadvantages.
Disadvantages include: the higher the degree of compression, the worse the code readability, which makes locating problems in production harder; and although better compression algorithms exist, each algorithm has its own limitations.
1. Differentiated decomposition based on "resource readability dependency"
For the impact on code readability, there are existing code-level solutions, such as a debug-mode switch and sourcemaps. The debug-mode switch is itself a form of differential thinking: it divides usage scenarios by their readability requirements. Online code is a low-readability-requirement scenario and uses extremely compressed code; development/debug code is a high-readability-requirement scenario and uses uncompressed code; the two modes are switched through a parameter.
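A minimal sketch of such a debug-mode switch (the parameter name `debug` and the `.min.js` naming convention are assumptions):

```javascript
// Pick the readable asset in debug mode and the compressed asset otherwise,
// based on a query-string parameter.
function pickScriptUrl(search, name) {
  const debug = /(?:^|[?&])debug=1(?:&|$)/.test(search);
  return debug ? `${name}.js` : `${name}.min.js`;
}

console.log(pickScriptUrl('?debug=1', 'app')); // app.js (readable, for debugging)
console.log(pickScriptUrl('', 'app'));         // app.min.js (compressed, for users)
```

In a browser the first argument would typically be `location.search`.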
2. Differentiated decomposition based on "resource platform support degree"
Each compression algorithm has its limitations (or the formats produced under each algorithm do, such as the various image formats), and those limitations mean some platforms cannot use the result at all. Starting from this impact, we can differentiate network resources by platform support. Rank the algorithms by compression effect, then, from best to worst, check whether the platform supports each one: if it does, use that algorithm (format); if not, fall through to the next.
For example, image formats are diverse, and this variety comes from the different compression algorithms behind each format, each with its own strengths. We cannot simply pick the single most universally supported format; instead we should load images using the differential thinking above. Ranked by compression degree: for platforms that support tpg (called sharpp within the company), request tpg images; for those that do not, check for webp support and request webp where available; where webp is unsupported, continue down the chain. Playing to each format's strengths, use jpg for color-rich images and png for simpler images or those needing an alpha channel, choosing whichever format fits best. Even image dimensions (dimensions are unrelated to compression, but the goal of reducing request size is the same, hence the analogy) can follow this differential thinking: return the most suitable size for the current client's resolution, so that high-resolution clients request high-resolution images and low-resolution clients request low-resolution ones.
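The fallback chain can be sketched as a walk down a preference list (the format names and the shape of the support map are assumptions; in a browser, support would be detected via feature tests):

```javascript
// Walk a format preference list (best compression first) and pick the
// first format the current platform supports.
function pickImageFormat(supported, preference = ['tpg', 'webp', 'jpg']) {
  for (const fmt of preference) {
    if (supported[fmt]) return fmt;
  }
  return 'jpg'; // universal fallback
}

console.log(pickImageFormat({ tpg: false, webp: true, jpg: true }));  // webp
console.log(pickImageFormat({ tpg: false, webp: false, jpg: true })); // jpg
```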
By adopting the most optimal compression strategy for specific scenarios in a differentiated manner, the optimization effect will be further improved.
Similar to the above, caching strategies also have corresponding disadvantages.
Disadvantages include: the longer the cache time, the worse the data accuracy, and there will be a problem where the cached data is still valid but has a significant difference from the latest data.
1. Differentiated decomposition based on "resource timeliness dependency"
Regarding the impact of cache duration on data accuracy, we can grade resources by their timeliness, roughly into controllable-update resources and uncontrollable-update resources. Here, controllable and uncontrollable refer to whether the front-end page can be aware of a resource update in real time. JS, CSS, images, and other resources deployed by front-end developers can all be considered controllable-update resources and given a very long cache time, because version information can be synchronized to the page in real time when they are updated (e.g., by changing the file name or updating a timestamp). Resources that cannot synchronize version information to the page in real time are uncontrollable-update resources; for these, we can further differentiate and grade them based on the business's timeliness requirements for each resource.
Taking the example of QQ avatar resources used in H5 projects in Mobile QQ, in this scenario, the avatar is an uncontrollable update resource for the project. When users modify their avatars using Mobile QQ or PC QQ, H5 projects are unaware of this change, and H5 will not receive real-time update notifications (unless both parties synchronize notifications at the interface level). At this time, if the avatar cache time is set too long, the user's updated avatar will still appear as the old one in the H5 project. However, if the avatar is not cached, it will inevitably cause significant concurrent pressure on the avatar server in high concurrency scenarios. In this case, further differentiated decomposition is needed for this uncontrollable update resource.
For H5 projects with strong social interaction in Mobile QQ (such as Mobile QQ Red Packet, Mobile QQ AA Collection, etc.), there are many avatars, but the timeliness requirements differ. Simply put, we can divide avatars into two categories: high-timeliness and low-timeliness. Some analysis shows that users are most sensitive to changes to their own (owner-state) avatar: if a user modifies their avatar on Mobile QQ or PC QQ and then enters the H5 project to find it unchanged, that is hard to tolerate. So the user's own avatar can be considered high-timeliness. Users care much less whether other users' (guest-state) avatars have changed, so all other avatars can be considered low-timeliness. The strategy, then, is to cache high-timeliness avatars for a shorter time and low-timeliness avatars for a relatively longer time. (This is also easy to implement: the differential logic lives in a front-end check, which appends a timestamp that determines the cache duration.)
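The timestamp trick above can be sketched as follows (the URL shape and the 5-minute/1-day granularities are illustrative assumptions, not the actual Mobile QQ values):

```javascript
// Differentiated cache busting: the owner's avatar URL changes every few
// minutes (short effective cache), guests' avatar URLs change once a day
// (long effective cache). The browser cache stays valid inside each bucket.
function avatarUrl(baseUrl, isOwner, now = Date.now()) {
  const MINUTE = 60 * 1000;
  const DAY = 24 * 60 * MINUTE;
  const bucket = isOwner
    ? Math.floor(now / (5 * MINUTE)) // high-timeliness: refresh every 5 min
    : Math.floor(now / DAY);         // low-timeliness: refresh daily
  return `${baseUrl}?t=${bucket}`;
}

const now = Date.now();
console.log(avatarUrl('https://avatar.example/u/123', true, now));  // owner avatar
console.log(avatarUrl('https://avatar.example/u/456', false, now)); // guest avatar
```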
By adopting the most optimal caching strategy for specific scenarios in a differentiated manner, the optimization effect will be further improved.
Under the guidance of differential thinking, high-concurrency optimization strategies have been further improved. The core idea of this thinking is to balance the advantages and disadvantages of the solution with the actual scenarios. From a general perspective, this thinking is also applicable to many aspects of work and is a universal way of thinking, not just limited to solving the frontend high concurrency issue. At the same time, not all solutions can perfectly solve problems by just adopting differential thinking. Differential thinking is just one of many ways of thinking. In fact, there are many other ways of thinking. An excellent optimization solution is often the final product of multi-dimensional thinking and balancing.
For example, boundary magnification thinking refers to the idea that when we do something, our vision should not only stay within the domain that we can fully control, but we should also enlarge the boundary and consider the solution to the problem from a broader perspective.
For instance, the cache strategy above still has a disadvantage that needs boundary magnification thinking to resolve: current browser caching technology has an inherent limitation in that caching only pays off on repeat visits. The core problem is that cache timing is coupled with the user's first access. As a result, the caching effect is much weaker than expected in some ultra-high-concurrency H5 activities (unlike long-lived business H5 pages, most users visit an activity H5 for the first time).
This cannot be solved at the pure frontend technology level. However, when we enlarge the thinking boundary and consider the platform that carries H5, this disadvantage may be resolved. This is because the core problem to be solved by this disadvantage is the decoupling of resource caching and page access, and the platform side (especially the terminal) has the ability to do this. For example, in Mobile QQ, this solution is called "offline package." Offline packages support passive caching and active caching. The page content can be cached on the user's Mobile QQ client without the user's active access, through pre-downloading or active push. First-time visitors can also directly hit the cache. This can greatly improve the effectiveness of caching. During the Spring Festival, various high-concurrency H5 projects in Mobile QQ used this technology to improve the high-concurrency capabilities of the pages.
Comprehensive logical thinking means that when we do something, our vision should not only focus on the local logic but also see the whole logic. For example, logic has normal states and abnormal states, and we cannot only consider normal states. Logic can be bidirectional or unidirectional, and for bidirectional logic, we cannot only consider the positive direction.
In fact, all the frontend high-concurrency strategies I mentioned earlier (including the diagrams) only considered the positive logic segment of the data flow, that is, the process of data flowing from the user side to the server side. The reverse logic segment of the data flow has not been considered (i.e., the logic situation where the data returns from the server-side to the user side). In high-concurrency scenarios, the reverse logic segment of the data often plays a crucial role in the logic. High-concurrency strategies that do not consider the reverse logic segment of the data flow can only be considered half completed, no matter how well the data is optimized. The following is the entire logic process of data flow (the red part is the reverse logic segment of data flow):
In the reverse logic segment of the data flow, the front end becomes the data recipient, and the received data may be in any of several states that the front end must handle. Under high concurrency, if the backend is overloaded, some responses will be abnormal. At the simplest level we can distinguish two states: success and failure. In the success state, the page displays normally; in the failure state, the experience should be degraded as gracefully as possible rather than becoming completely unavailable.
For example, if a static-resource CDN request fails, the front end can handle the exception by temporarily switching the affected user's static-resource domain to a backup domain (such as the page domain or a dedicated backup domain). This degrades what would have been a white screen into a slower but usable experience, and buys time to expand CDN capacity when the error rate exceeds a threshold.
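A minimal sketch of this failover (the domain names are hypothetical):

```javascript
// On a static-resource load failure, retry once from a backup domain.
const PRIMARY = 'https://cdn.example.com';
const BACKUP = 'https://backup.example.com';

function backupUrl(url) {
  return url.startsWith(PRIMARY) ? BACKUP + url.slice(PRIMARY.length) : url;
}

// In a browser this would typically be wired to an onerror handler, e.g.:
//   script.onerror = () => { script.src = backupUrl(script.src); };
console.log(backupUrl('https://cdn.example.com/js/app.js'));
// -> https://backup.example.com/js/app.js
```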
Of course, if we combine the above differential thinking, we can also differentiate the overall state of the current server (such as the current load level), inform the page of the degree of the current server concurrency through configuration return and other strategies, and the frontend can handle these states differently, gradually downgrading.
For example, when the cgi concurrency exceeds a certain limit, the frontend can consider gradually blocking the entry of some non-core but high-traffic cgi pages, and finally only retain the core cgi entry, thus ensuring that the core functions of the project are not affected by high concurrency.
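This graduated degradation can be sketched as a mapping from a server-reported load level to the set of entries the page keeps open (the load levels and entry names are assumptions for illustration):

```javascript
// Map a server load level (delivered via a configuration response) to the
// cgi entries the page keeps open, shedding non-core traffic first.
function allowedEntries(loadLevel) {
  const all = ['core', 'ranking', 'share', 'history'];
  if (loadLevel >= 3) return ['core'];             // overloaded: core only
  if (loadLevel === 2) return ['core', 'ranking']; // heavy: block high-traffic extras
  return all;                                      // normal: everything open
}

console.log(allowedEntries(1)); // ['core', 'ranking', 'share', 'history']
console.log(allowedEntries(3)); // ['core']
```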
This article is a collection of my thoughts from working as a frontend developer for four years. Based on the optimization point of frontend high-concurrency strategies, I have gradually explained my thoughts on the "technique" and "thinking" aspects to you. I hope that the strategies and thinking mentioned in the article can provide some insights for you. Thank you for your patience in reading!
WeTest Quality Open Platform is the official one-stop testing service platform for game developers. We are a dedicated team of experts with more than ten years of experience in quality management, committed to the highest standards of game development and product quality, and we have tested over 1,000 games.
WeTest integrates cutting-edge tools such as automated testing, compatibility testing, functionality testing, remote device, performance testing and security testing, covering all testing stages of games throughout their entire life cycle.