
Code Refactoring in Practice: Enhancing Maintainability and Scalability

In this article, we will discuss the importance of code refactoring, its principles, and how it can be applied in a real-life example to improve the internal structure of a software module.


Background

Code refactoring and design patterns (which are not closely related; I am forcing them together here) can seem like mysterious things. I have picked up the classic "Design Patterns" by the GoF (the Gang of Four who wrote it) more than once and chewed through it twice, and each time I feel the code is familiar and I could write it freely; yet in actual projects, when it comes time to write code, which pattern am I supposed to use? Cough. I can barely keep up with the PM's requirements, never mind patterns. As for code refactoring: just get the features working. You think the code is too long, messy, complex, disorganized, and incomprehensible? Well, who knows who will have to maintain it tomorrow; if you can't read it, that's not my problem. On the other hand, the books that focus on design patterns and refactoring mostly use Java in their examples, the golden child of OOP, as if these high-end topics were naturally reserved for languages born with a noble pedigree (don't hit me for saying PHP is the best language). As for me, I have been at this for some years as an old C/C++ hand with a toolbox of code: just write your features well, and who cares which design pattern you use?

Prejudice always exists. In fact, design patterns are not new knowledge in the technical world; if anything, they are old. Whether beginners or experienced veterans, developers use them without realizing it: even if you have never read about design patterns, you often feel, quite naturally, that code should be written a certain way. Don't believe it? Flip through the heavier-duty "POSA: Pattern-Oriented Software Architecture" (where "Design Patterns" covers software design patterns, POSA covers architecture-level patterns: five volumes, three to four thousand pages, a classic masterpiece, highly recommended, and it takes a long time to get through), and you are bound to find shadows of code you have written yourself. The real focus of this article, though, is code refactoring, which never goes out of date: as long as programmers have any ambition, refactoring will be part of their daily routine. So the starting point of this article is to use a small refactoring example from actual development to illustrate the value of refactoring for maintainability and scalability, the rhythm of refactoring, and the principle that the language itself hardly matters. The pattern used is the most basic one, and the example is roughly at the level of the introductory samples in the first few pages of a textbook, so experts big and small can skip it :). I also want to prove that C++ is not just a "classier C"! (If there were a vote for the best language, I would give C++ my vote!... ahem, getting carried away.)

Some refactoring theory

The following theory (in bold) is excerpted from the book "Refactoring"; it is somewhat dry, so I will also add some plain-language explanations (not in bold).

What is refactoring?

Refactoring is an adjustment to the internal structure of software, aimed at improving its comprehensibility and reducing its modification costs without changing the observable behavior of the software.

To put it bluntly, this work is not assigned by the product manager and is not driven by new requirements; it is done by programmers with aspirations who spontaneously upgrade and optimize their own code for maintainability and scalability. So whether or not the product manager allocates time for refactoring, this work has to be done.

Why refactor?

Improving software design: many programs are carefully designed at the start, but as the program grows, nobody pays attention to this or that anymore, the code rots, and the design collapses.

Making software easier to understand: during development the code is churned out in a hurry, and a week after it is finished I can no longer understand it myself. Is that really so rare?

Helping find bugs: "The code is done and the tests pass, so I'm not reading it again. You expect me to hunt for bugs myself? Come on, after I finish developing, why would I still look for bugs; what are the testers for?" Those who think this way, please raise your hands~

Improving programming speed: good code makes it easy to extend features, while with bad code you can hardly add a feature even when you want to.

When to refactor?

The rule of three; when adding a feature; when fixing a bug; when reviewing code.

The rule of three is quite interesting: when something keeps bothering you, refactor it the third time it makes you wince. In reality, when looking at the code while extending a feature makes you a little uncomfortable and queasy, and you are not too busy and have some time (or an enlightened product manager has even allocated separate hours for refactoring), don't hesitate: refactor.

The relationship between refactoring and performance

Although refactoring may make software run slower, it also makes performance optimization easier. Except for real-time systems with strict performance requirements, the secret of "writing fast software" in any other case is to write tunable software first, and then tune it until it is fast enough.

The wording excerpted from the book is somewhat vague. To put it more bluntly: performance and elegance are, in most software, at odds with each other. But if everything were decided by ultimate performance, no high-level language would ever have risen above assembly. Besides, is the performance loss caused by refactoring really the root cause of slow software? Usually not. Performance optimization is a topic of its own, and the key is to find the hotspots with the highest cost; optimizing those is the main battle. In practice, the losses introduced by refactoring are usually very small.

A major enemy of software maintenance and extension is "coupling": design coupling, code coupling, header-file coupling, variable coupling, function coupling, data coupling, compilation coupling, and business-logic coupling. This is one of the problems refactoring has to solve. Before joining the Goose Factory (Tencent), I worked on an advertising search engine and a telecom cloud platform at the Wolf Factory and the Wolf Company. The code I touched at the Wolf Factory was not that large and its quality was decent, so I won't dwell on it; but at the Wolf Company, can you imagine a scene where three or four small departments under one large department, from architects with more than ten years of experience down to fresh graduates, rack their brains and try every means, from tooling to sheer manpower, to decouple a piece of software that has been developed for over 10 years and has over 8 million lines of code? It was a magnificent, even tragic, undertaking. So while the code size is still under control, timely and appropriate refactoring is a very effective way to extend the software's lifecycle. The book "Refactoring" turns the standard refactoring techniques into a methodology, but in summary the idea is to merge and to split: merge what should be merged, split what should be split. For many other things, the thinking is much the same.

Examples

Time for the real content. First, a brief introduction to the small example optimized in this article: the source code of the backend ServerCenter module of the stress-testing product on the WeTest platform. (What is WeTest? WeTest, the Tencent Quality Open Platform, is an open testing platform that makes the Goose Factory's excellent testing solutions and tools, built up over more than a decade and honed by thousands of games, available to game developers.) The stress-testing tool even has the slogan "X8, X8, as long as X8, the machines are at your disposal". Sure you don't want to try it?

The ServerCenter module is developed on top of the tsf4j framework from the Interactive Entertainment R&D department. Simply put, it is the main message-forwarding module of the stress-testing backend, forwarding different messages to different downstream modules. At the implementation level, the main logic of the program is a single message-processing function: a very simple idea and a typical message-driven backend. A few characteristics: single process, stateless, shared data kept in external storage such as Redis/DB, and scale-out achieved by adding processes. This architecture is very simple and is determined by the current business scale and characteristics. Backend experts: this article is about code refactoring, so we won't discuss architecture here.

Code V1 version

Following the tapp convention of tsf4j, the ServerCenterApp class is the main logic class of the backend program, and process_pkg is the main message-processing function. Different processor functions are called for different messages. The flow inside each message-processing function is not simple, and further small functions are split out of them.

Because the business is complex and there are many messages to process, the initial core message-processing function process_pkg looked roughly like this:
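(The original code appears in the source article only as a screenshot; below is a minimal sketch of that V1 style, in which the message codes, the packet layout, and the handler bodies are invented for illustration, not taken from the actual WeTest source.)

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// Illustrative message codes; the real module has many more.
enum MsgCmd : uint32_t
{
    CMD_START_TASK  = 1,
    CMD_STOP_TASK   = 2,
    CMD_REPORT_STAT = 3,
};

class ServerCenterApp
{
public:
    // V1: one giant switch handles every message type in a single function.
    int process_pkg(const char* pkg, uint32_t pkg_len)
    {
        if (pkg_len < sizeof(uint32_t))
            return -1;

        uint32_t cmd = 0;
        std::memcpy(&cmd, pkg, sizeof(cmd));   // assume the cmd sits at the head of the packet

        switch (cmd)
        {
        case CMD_START_TASK:
            // ...dozens of lines: validate, forward to the task module, reply...
            break;
        case CMD_STOP_TASK:
            // ...dozens of lines...
            break;
        case CMD_REPORT_STAT:
            // ...dozens of lines...
            break;
        // ...and so on, until the function runs to thousands of lines
        default:
            std::printf("unknown cmd %u\n", cmd);
            break;
        }
        return 0;
    }
};
```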

This can be called the roughest and fastest implementation: one switch/case function takes care of everything, with no file splitting and no elaborate design. The function is developed quickly, and code "productivity" is greatly improved~

However, some time later, every time a new feature was added I had to find an insertion point in this densely packed file of thousands of lines, and the unpleasant feeling kept accumulating. Admittedly, message-processing backend logic is easy to write this way (back at the Wolf Company I once saw a department where a new hire did nothing for the first month but read a message-processing function thousands of lines long, and only after gnawing through it could he start working; sorry, digressing, pulling myself back). But when I imagined the foreseeable future in which I would still be maintaining this code, I just felt uncomfortable all over (to quote Brother Dali; any Bilibili folks here?). Alright, let's refactor it: this is the rule of three from refactoring theory.

Code V2 version

Message-processing code with branches like this is a perfect fit for the simple factory pattern. As the name suggests, the simple factory pattern is very, very simple, so its principle will not be repeated here. Applied to this example, each piece of message-processing logic is abstracted into a product (MsgProcessor), and the main switch/case logic is abstracted into a factory that produces those products (MsgProcessorFactory). Each time, the main logic has the factory create the MsgProcessor for the corresponding message based on its message code, then simply calls its process_pkg to complete the processing, without caring which concrete MsgProcessor it is.

In this way, the main logic of the program is decoupled from the processing of the different business message types. Merging, splitting, decoupling, modularizing: these are both the methods and the goals of refactoring.

1. After refactoring, the core code of ServerCenter looks roughly as sketched below. As long as the MsgProcessor object for the specific message is created through the factory and its process_pkg function is called, the job is done; this framework code is much simpler and more stable, and it is also OOP in action:
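(A hedged sketch of the V2 core; get_cmd() and the rsc_ member are assumed helper names, and MsgProcessor / MsgProcessorFactory are sketched further below.)

```cpp
// V2 core logic: the app no longer knows any concrete message logic,
// it only talks to the factory and the abstract MsgProcessor interface.
int ServerCenterApp::process_pkg(const char* pkg, uint32_t pkg_len)
{
    uint32_t cmd = get_cmd(pkg, pkg_len);            // parse the message code (helper assumed)

    // Ask the factory for the processor that owns this message type.
    MsgProcessor* processor = MsgProcessorFactory::create_processor(cmd, &rsc_);
    if (processor == nullptr)
    {
        // unknown cmd: log and drop the packet
        return -1;
    }

    int ret = processor->process_pkg(pkg, pkg_len);  // the product does the real work
    delete processor;                                // V2 allocates per message; V3 removes this
    return ret;
}
```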

2. The newly added MsgProcessorFactory factory class after refactoring looks roughly like this:
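(A sketch assuming the illustrative message codes and processor names used above; the real factory naturally has one case per supported message.)

```cpp
// Hypothetical MsgProcessorFactory: the V1 switch/case now lives here and only
// decides WHICH processor to build, not HOW the message is handled.
class MsgProcessorFactory
{
public:
    static MsgProcessor* create_processor(uint32_t cmd, ProcessorRsc* rsc)
    {
        switch (cmd)
        {
        case CMD_START_TASK:  return new StartTaskProcessor(rsc);
        case CMD_STOP_TASK:   return new StopTaskProcessor(rsc);
        case CMD_REPORT_STAT: return new ReportStatProcessor(rsc);
        // ...one line per supported message code
        default:              return nullptr;  // caller logs and drops unknown cmds
        }
    }
};
```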

As sketched above, the switch/case logic migrates into the factory, which creates the corresponding MsgProcessor for each different message code.

3. The definition of the MsgProcessor that does the real work is roughly as follows. Since C++ has no interface keyword, an interface-like pure virtual base class is defined using pure virtual functions. The shared resources that used to live in ServerCenter, such as the various connections and the send/receive buffers, are passed to each processor through a ProcessorRsc class for its specific handling:
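(Member names in this sketch are illustrative; the real ProcessorRsc carries whatever connections and buffers ServerCenter used to own directly.)

```cpp
// The shared-resource bundle handed to every processor.
struct ProcessorRsc
{
    // e.g. Redis/DB connection handles, downstream sockets, ...
    char*    send_buf     = nullptr;   // shared send buffer
    uint32_t send_buf_len = 0;
};

// The abstract "product": a pure virtual base class standing in for an interface.
class MsgProcessor
{
public:
    explicit MsgProcessor(ProcessorRsc* rsc) : rsc_(rsc) {}
    virtual ~MsgProcessor() = default;

    // Every concrete processor implements the same entry point the framework calls.
    virtual int process_pkg(const char* pkg, uint32_t pkg_len) = 0;

protected:
    ProcessorRsc* rsc_;   // shared resources injected by the factory
};
```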

The code that actually handled the various messages in the V1 version is split into the various derived classes of MsgProcessor, roughly as sketched below. There may seem to be more files now, but the functional points are split more finely, which makes maintenance and modification comparatively easy. This matters even more once the project really grows, for example when the business is maintained by different project teams or people.
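(One illustrative derived class; the body is a placeholder standing in for the real handling logic.)

```cpp
// One derived class per message type; each stays small and lives in its own file.
class StartTaskProcessor : public MsgProcessor
{
public:
    using MsgProcessor::MsgProcessor;   // inherit the ProcessorRsc* constructor

    int process_pkg(const char* pkg, uint32_t pkg_len) override
    {
        // validate the packet, forward it to the downstream task module via rsc_,
        // build the reply into rsc_->send_buf, ...
        (void)pkg;
        (void)pkg_len;
        return 0;
    }
};
```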

Code V3 version

Is what the V2 version has done enough?

From a pure implementation perspective, the code is already a textbook design pattern. But consider the language and the business scenario: in C++, creating and deleting a processor object for every message may cause noticeable memory fragmentation and jitter. Backend programs in particular tend to prefer having everything planned in advance, such as pre-allocated resources and pools, to keep the system stable and controllable. So here the optimization can be combined with the business scenario: MsgProcessor is only a logical concept, and the single-process model guarantees that class instances can be reused, so there is no need to allocate dynamically each time. The code can therefore be optimized once more, roughly as follows:
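(A sketch of the V3 idea. Function-local statics are one simple way to express "class-level, reused" instances in a single-process server; names are illustrative, and rsc is captured on the first call, which is fine when there is only one shared resource bundle.)

```cpp
// V3: the factory hands out long-lived processor instances instead of
// new/delete per packet, avoiding per-message heap churn.
class MsgProcessorFactory
{
public:
    static MsgProcessor* get_processor(uint32_t cmd, ProcessorRsc* rsc)
    {
        static StartTaskProcessor  start_task(rsc);
        static StopTaskProcessor   stop_task(rsc);
        static ReportStatProcessor report_stat(rsc);

        switch (cmd)
        {
        case CMD_START_TASK:  return &start_task;
        case CMD_STOP_TASK:   return &stop_task;
        case CMD_REPORT_STAT: return &report_stat;
        default:              return nullptr;
        }
    }
};
// The caller no longer deletes the returned pointer, and processors must stay
// stateless (or reset themselves) because they are reused across messages.
```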

Make each processor a class-level static instance that the main program reuses. This change can be understood as a performance optimization carried out after the code structure has been refactored.

Is there anything else that could be optimized? Certainly. One thing every coder should keep in mind is that code optimization is an ongoing process throughout the software lifecycle. Back to this example: from the maintainability perspective, the message cmd values and the processor factory are hard-coded together. That is no problem right now, but let's think bigger: when the business grows enough, the framework and the messages may be maintained by two different groups of people. Could we then use a registration-callback approach, in the spirit of the observer pattern, so that the business side does not have to go find the platform side every time a processor is added (as I have encountered on other projects)? A sketch of that idea follows below. It sounds good, so why not do it now? Because it is not yet necessary. Refactoring and optimizing code and architecture are good things, but they must be combined with the current reality of the project, doing the right thing at the right stage; good architecture is something you evolve into. Many startups, and entrepreneurial teams inside large companies, follow the helpless but realistic path of "get it working -> survive -> grow the business -> optimize code/architecture/performance", regardless of the technology or architecture they start with.
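(A hypothetical registration-based variant, deliberately not adopted in the project: each processor registers itself, so adding a message type no longer means editing the factory owned by the platform team. All names here are illustrative.)

```cpp
#include <cstdint>
#include <map>

class MsgProcessor;   // the interface sketched earlier

// A simple registry the business side can populate without touching platform code.
class MsgProcessorRegistry
{
public:
    static MsgProcessorRegistry& instance()
    {
        static MsgProcessorRegistry reg;
        return reg;
    }
    void register_processor(uint32_t cmd, MsgProcessor* p) { table_[cmd] = p; }
    MsgProcessor* find(uint32_t cmd) const
    {
        auto it = table_.find(cmd);
        return it == table_.end() ? nullptr : it->second;
    }
private:
    std::map<uint32_t, MsgProcessor*> table_;
};

// Business-side code registers its processors at startup, e.g.:
//   MsgProcessorRegistry::instance().register_processor(CMD_START_TASK, &start_task);
```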

From a performance perspective, as the business grows there will be more and more message types to handle, and this switch, sitting on the core path, becomes the hot statement that affects performance. Possible optimizations include moving hot branches to the front of the switch, using an index structure such as a hash to look up processors, or even compiler-level branch optimizations; one possible lookup structure is sketched below. For seasoned coders, when a problem ultimately boils down to performance optimization it is often more intuitive and tractable than other problems: it is just a matter of finding the hotspots and improving them, and there are plenty of tricks available. In short, refactor first, then optimize performance.
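(A sketch of one lookup option under the assumption that message codes are small and dense; kMaxCmd is an invented bound, not a value from the real project.)

```cpp
#include <array>
#include <cstdint>

class MsgProcessor;   // the interface sketched earlier

constexpr uint32_t kMaxCmd = 256;                  // assumed upper bound on cmd values
std::array<MsgProcessor*, kMaxCmd> g_dispatch{};   // filled once at startup, read-only afterwards

// O(1) dispatch on the hot path, replacing the switch entirely.
inline MsgProcessor* lookup_processor(uint32_t cmd)
{
    return (cmd < kMaxCmd) ? g_dispatch[cmd] : nullptr;
}
```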

Misconceptions

A common misconception is captured by a proverb often heard in development: "If all you have is a hammer, everything looks like a nail."

1) Pattern abuse

Although this article is not really about patterns, a common problem in applying them still deserves a mention: pattern abuse. Many people use design patterns for the sake of design patterns, producing code that is neither coherent nor generic and is even harder to maintain. What makes a true expert? No sword in the hand, no sword in the heart; no pattern in the hand, no pattern in the heart. Why no pattern? Because the patterns are already internalized, fully absorbed, no longer bound to their forms, and can be applied easily and precisely.

The patterns used during development should be chosen appropriately for the project and the code size; using an unsuitable pattern for a simple task means twice the effort for half the result. For example, among the architecture-level patterns there is the reflection pattern, which many high-level languages support natively (in Java, reflection is a language feature and has catalyzed excellent frameworks such as Spring). In languages without reflection, the idea of the pattern has to be re-implemented in the native language. For instance, C itself has no reflection, yet implementing a complete reflection framework on some software platforms for large telecom equipment can raise the dynamic loading of module configuration to a high level, which pays off handsomely. But if, dear classmate, all you need is a simple hello world and you still reach for the reflection pattern, frankly, that is just asking for trouble.

2) Refactoring = starting over?

An important point in the definition of refactoring is "an adjustment of the internal structure of software". What does "adjustment" mean? You adjust it, tweak it; you do not simply tear it down and start over. There are two common scenarios for a rewrite. In one, the software is so bad that it cannot get any worse, cannot be carried further, cannot be maintained, and cannot do its job, and the cost of refactoring far exceeds the cost of redoing it, so you start over (this is the correct posture). In the other, some restless guy, for reasons of his own (digging a new hole, hunting for a new job, etc.), keeps whispering to everyone, "Hey, let's redo this thing." That guy is the team's destabilizing factor (this is the wrong posture). Of course, if the team is idle and has plenty of time beyond its normal development tasks, that is another story.

In summary, for a module or system that is already live and carries a certain business volume, starting from scratch carries significant risk; it requires thorough analysis and evaluation, and caution. Experts improve and optimize existing systems the way a skilled doctor heals a patient, rather than cutting the patient down and rebuilding him. Of course, that is also much harder; otherwise, how could "refactoring", just one word, deserve a book of its own and a whole methodology?

 
