Compared with PC games, mobile games must deal with more Chinese mobile carriers and more APN access points (such as wap and net), which complicates the network environment.
The cmwap mobile gateway is a shared exit: many users leave through the same IP, and the TCP timestamps on packets arriving from cmwap are out of order. If tcp_tw_recycle is enabled, packets whose timestamps go "backward" are treated as retransmissions from a recycled TIME_WAIT connection rather than as new requests, and are dropped. Therefore, mobile game servers must set /proc/sys/net/ipv4/tcp_timestamps to 0.
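The failure mode can be illustrated with a toy model. The sketch below is not kernel code: `PerHostTimestampCheck` is a hypothetical class that mimics only the per-source-IP "timestamps must not go backward" rule, and the gateway address and timestamp values are made up for illustration.

```python
class PerHostTimestampCheck:
    """Toy model of a per-source-IP timestamp monotonicity check."""

    def __init__(self):
        self.last_seen = {}  # source IP -> highest TCP timestamp accepted so far

    def accept_syn(self, src_ip, tsval):
        last = self.last_seen.get(src_ip)
        if last is not None and tsval < last:
            return False  # looks like an old duplicate, so it is dropped
        self.last_seen[src_ip] = tsval
        return True


def simulate_shared_gateway():
    check = PerHostTimestampCheck()
    gateway_ip = "203.0.113.10"  # documentation address standing in for a shared exit
    # Two phones behind the same gateway, each with its own timestamp clock.
    attempts = [("phone_a", 900_000), ("phone_b", 120_000), ("phone_a", 900_050)]
    return [(who, check.accept_syn(gateway_ip, ts)) for who, ts in attempts]


if __name__ == "__main__":
    for who, accepted in simulate_shared_gateway():
        print(who, "accepted" if accepted else "dropped")
```

In this model the second phone is rejected only because its clock is behind the first phone's, even though both are legitimate new connections from the server's point of view.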
WeChat and QQ domains use the same VIP. When accessing the same service, under certain circumstances, the client proxy firewall or some mobile gateways see the two domain names resolving to the same target IP and combine the two requests into one connection.
For Layer 7 protocol services, WeChat and QQ domains should apply for different VIPs to solve the problem. This will ensure that the requests are treated separately, improving the overall user experience.
Overload after capacity expansion in Daily Cool Run: because of TGW load imbalance, once a single machine holds more than 22,000 active connections, any further connections that TGW assigns to it cause login failures and degrade the user experience.
TGW allocates new incoming connections by weighted round-robin. As long as the service port on an RS (real server) is available and its weight is normal, TGW keeps assigning new connections to that RS.
Because TGW's mechanism for detecting each RS's real-time connection count cannot yet obtain accurate data, TGW currently has no way to allocate new connections based on how idle each RS is.
Borrowing the idea of weighted least-connections, add an external module that monitors the load of each RS (tentatively, its current connection count). If the load ratio is inconsistent with the weight ratio configured by the business, dynamically adjust the real-time weights until the RS load ratio roughly matches the configured weight ratio. (A deviation within 5% counts as approximately equal, and the weights are no longer adjusted.)
Adjustment start: begin adjusting when the RS1:RS2 load ratio deviates from the configured weight ratio by more than 5%
Adjustment end: stop adjusting when the RS1:RS2 load ratio deviates from the configured weight ratio by 5% or less
Adjustment frequency: every 5 seconds
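A minimal sketch of one adjustment round follows. The text does not give the correction formula, so the proportional rule here (scale each weight by target share divided by observed share, capped by a hypothetical `max_boost`) is an assumption, as are the function name and the dictionary inputs. Configured weights are assumed to be positive.

```python
def adjust_weights(configured, connections, tolerance=0.05, max_boost=2.0):
    """Return real-time weights for one round; unchanged if load is within tolerance."""
    total_weight = sum(configured.values())
    total_conns = sum(connections.get(rs, 0) for rs in configured)
    if total_conns == 0:
        return dict(configured)  # nothing to balance yet

    adjusted = {}
    balanced = True
    for rs, weight in configured.items():
        target_share = weight / total_weight              # share the business asked for
        load_share = connections.get(rs, 0) / total_conns  # share actually observed
        if abs(load_share - target_share) / target_share > tolerance:
            balanced = False
        # Assumed rule: overloaded servers get a smaller weight, idle ones a larger one.
        if load_share == 0:
            factor = max_boost
        else:
            factor = min(max_boost, target_share / load_share)
        adjusted[rs] = max(1, round(weight * factor))

    return dict(configured) if balanced else adjusted


if __name__ == "__main__":
    print(adjust_weights({"rs1": 100, "rs2": 100}, {"rs1": 22000, "rs2": 18000}))
```

An external module would call something like this every 5 seconds and push the result to TGW; how weights are actually pushed is outside this sketch.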
Because mobile users travel, some of them download a mobile game in mainland China and then play it in Hong Kong, Macau, Taiwan, or abroad.
The gaming experience can be negatively impacted by international export bandwidth limitations and the Great Firewall of China (GFW).
In addition to the Shanghai Santong data center, it is necessary to add a Hong Kong access point (Hong Kong TGW directly connected to Shanghai RS).
Promote the expansion of dedicated lines between Hong Kong and Shenzhen.
Carriers may hijack domain names to save inter-network traffic and speed up user access, which can lead to an inability to access the game.
1. WiFi users are covered by the company's domain hijacking monitoring system.
2. 2G/3G users are covered by domain monitoring built into the SDK.
3. Hong Kong users are covered by deployed client monitoring apps that watch the domain names.
4. The SDK provides a fault location tool that guides users to resolve the issue after detecting domain hijacking.
The SDK implements its own DNS functionality and takes over the socket communication of the mobile game.
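A hijack check of the kind described above can be sketched as a comparison between what the local resolver returns and a set of known-good addresses. This is only an illustration: the function names, the `expected_ips` allowlist, and the example hostname are assumptions, not the SDK's actual interface.

```python
import socket


def resolve_with_system_dns(hostname):
    """Resolve via the OS resolver, the path a carrier could tamper with."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return {info[4][0] for info in infos}


def detect_hijack(hostname, expected_ips, resolver=resolve_with_system_dns):
    """Return the unexpected IPs; an empty set means no hijack was detected."""
    resolved = resolver(hostname)
    return set(resolved) - set(expected_ips)


if __name__ == "__main__":
    # Stub resolver so the example runs without network access.
    fake = lambda name: {"192.0.2.99"}
    print(detect_hijack("game.example.com", {"198.51.100.7"}, resolver=fake))
```

Passing the resolver as a parameter keeps the comparison logic testable offline and lets an SDK swap in its own DNS implementation.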
When many mobile users share one mobile gateway exit and network conditions are weak, an attack-protection verification algorithm keyed on source IP + TTL can disrupt mobile game operations. The number of online users fluctuates abnormally while protection is active, and some users cannot log in or must retry several times before succeeding.
1. Separate mobile game protection strategies from PC games. Configure the mobile game defense algorithm strategy to have a duration of 1-6 seconds (PC games typically have a duration of about 3 seconds).
2. Deploy multiple VIPs in large-capacity defense node data centers within the same city.
Network access diagram before optimization: The carrier's international export bandwidth and firewall are the bottlenecks for game access.
Network access diagram after optimization: We bypassed the two bottlenecks of international export bandwidth and firewall through Tencent's TGW access to the carrier, Tencent Shenzhen-Hong Kong dedicated line, IP tunnel technology, and Layer 4 proxy forwarding technology.
Access optimization effect: (single access time, multiple accesses in one game login or interaction)
Average WiFi access time dropped from 724 ms to 198 ms (about 73% lower), and average non-WiFi access time dropped from 1173 ms to 621 ms (about 47% lower).
Current plan risks:
1. The success rate of direct and proxy access in WAP mode is relatively low (China Mobile cmwap, China Unicom uniwap, China Telecom ctwap). WAP users account for about 6% of all users, and cmwap users account for about 13% of China Mobile users.
2. HA proxy only supports IP forwarding. We have an agreement with the game provider to notify us in advance when the IP behind the domain name changes, so that we can switch the forwarding IP. We also run several connectivity monitors:
3. Monitor domain name resolution at minute granularity. If the IP that the domain resolves to is inconsistent with the configured forwarding IP, an alarm is triggered immediately.
4. Forwarding server IP connectivity test. If the forwarding IP cannot be connected, an immediate alarm is triggered.
5. Monitoring of TGW forwarding request count.
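The resolution and connectivity monitors above can be sketched as a single check that returns a list of alarms. The function names, alarm strings, and the `probe` parameter are illustrative assumptions; a real monitor would run this on a schedule and feed an alerting system.

```python
import socket


def tcp_reachable(ip, port, timeout=2.0):
    """Return True if a TCP connection to ip:port succeeds within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_forwarding(forward_ip, port, resolved_ips, probe=tcp_reachable):
    """Compare the configured forwarding IP with DNS results and test connectivity."""
    alarms = []
    if forward_ip not in resolved_ips:
        alarms.append(
            "resolution mismatch: %s not in %s" % (forward_ip, sorted(resolved_ips))
        )
    if not probe(forward_ip, port):
        alarms.append("forwarding IP unreachable: %s:%d" % (forward_ip, port))
    return alarms


if __name__ == "__main__":
    # Stub probe so the example runs without touching the network.
    print(check_forwarding("198.51.100.7", 443, {"198.51.100.8"},
                           probe=lambda ip, port: False))
```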
Legal risks have been documented:
1. Data interaction between Tencent Mainland and Hong Kong through dedicated lines is also subject to national regulatory authorities, and there are no violations.
2. The game's account system is accessed through MSDK. In MSDK, sensitive data such as user account information and relationship chain information have been converted to openID, not directly using user accounts, so there will be no leakage of player accounts and personal privacy information. During the entire data transmission process, MSDK also encrypts the data from the client to the server to prevent tampering and leakage.
For IEG mobile game access, we need to onboard a Japanese game called Monster Strike, which uses TURN (RFC 5766), a relay extension to STUN, for NAT traversal over UDP and TCP. Because of this protocol, using TGW across carriers becomes very difficult (it requires configuring tens of thousands of ports).
We suggest restructuring the deployment to fit TGW's cross-network access: place a TCP-layer proxy between TGW and the TURN server to solve the cross-network access problem and to converge the wide port range.
Original structure access process
The original structure relies on the TURN server for NAT traversal, with traffic relayed through a relay address. In this example, the relay address is the TURN server IP + port 16000.
1. Host user A initiates a connection to the TURN server's 3478 port and applies for resources.
2. The TURN server responds that the TURN server IP + port 16000 can be used as the relay address.
3. Host user A initiates a connection to the TURN server's 16000 port and establishes a TCP connection with the relay address.
4. User B learns the IP and port of host user A (TURN server port 16000) from the app server and initiates a connection. Once the connection succeeds, user A and user B establish a P2P relationship through the relay address.
1. There are hardly any external IPs for the server.
2. The network quality is poor when users from different carriers connect to the same IP.
3. One room per port results in severe resource waste, requiring 400,000 ports for 500,000 online users.
Cross-network proxy solution access process
1. Host user A initiates a connection to TGW and applies for resources.
2. TGW forwards the traffic to the TURN server's 3478 port through an IP tunnel.
3. The TURN server returns the applied relay address port 16000 to TGW.
4. TGW sends the relay address to host user A.
5. Host user A sends the relay address (IP + port) to the app server.
6. Host user A initiates a connection request to TGW, carrying appid, openid/mid, IP, and port.
7. TGW forwards the connection request to the proxy through an IP tunnel. After the connection to the proxy succeeds, the proxy receives the appid, openid, mid (MAC address), IP, and port from host user A for the next step of the connection.
8. The proxy connects to the corresponding TURN server IP and port with the received IP and port.
9. User B queries the required IP and port through the app server.
10. User B initiates a connection request to TGW, carrying appid, openid/mid, IP, and port.
11. TGW forwards the connection request to the proxy through an IP tunnel. After the connection to the proxy succeeds, the proxy receives the appid, uid, IP, and port from user B for the next step of the connection.
12. The proxy connects to the corresponding TURN server port 16000 with the received IP and port.
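The step where the proxy receives appid, openid/mid, IP, and port can be sketched as parsing a small handshake message. The source does not describe the wire format, so the one-line JSON encoding, the field names, and the `Handshake` type below are all hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Handshake:
    appid: str
    user_id: str     # openid, or mid when no openid is sent
    relay_ip: str
    relay_port: int


def parse_handshake(line):
    """Parse the first message a client sends once TGW hands it to the proxy.

    Assumed format: one UTF-8 JSON object with appid, openid or mid, ip, port.
    """
    fields = json.loads(line.decode("utf-8"))
    user_id = fields.get("openid") or fields.get("mid")
    if not user_id:
        raise ValueError("handshake needs openid or mid")
    port = int(fields["port"])
    if not 0 < port < 65536:
        raise ValueError("port out of range")
    return Handshake(str(fields["appid"]), str(user_id), str(fields["ip"]), port)


if __name__ == "__main__":
    raw = b'{"appid": "1000", "openid": "abc", "ip": "10.0.0.5", "port": 16000}'
    print(parse_handshake(raw))
```

After parsing, the proxy would open its own TCP connection to `relay_ip:relay_port` on the TURN server and forward bytes in both directions.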
1. Effectively solves the port resource problem, using only one external port.
2. Effectively solves the network access problem for different carriers, using domain name access.
3. Almost no modifications are required for the game client and server-side, as MSDK provides a low-level encapsulation connect protocol interface.
4. Packet capture, log query, malicious traffic control, and avalanche prevention operations can be performed uniformly in the proxy.
Room dismantling process
There are four situations:
1. Ordinary users exit or disconnect.
In this case, once the proxy detects that the exiting or disconnected user is an ordinary user, it simply tears down the connection it established to the TURN server for that user.
2. Host user exits or disconnects.
In this case, once the proxy detects that the exiting or disconnected user is the host user, it closes every connection to the TURN server that shares the same IP and port.
3. The App server sends a signal to close the room.
The handling method is the same as situation 2.
4. The TURN server disconnects.
Whichever proxy-to-TURN connection the TURN server closes, the proxy only needs to close the corresponding connection between the user and the proxy. In other words, the TURN server manages connections as usual, and the proxy only performs transparent forwarding.
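The four teardown cases can be sketched as bookkeeping over connections grouped by relay address. `RoomTable` and its method names are hypothetical, and the sketch only decides which users' connections to close; the actual socket handling is omitted.

```python
from collections import defaultdict


class RoomTable:
    """Tracks proxy connections grouped by relay address (one room per relay)."""

    def __init__(self):
        self.members = defaultdict(dict)  # (ip, port) -> {user_id: is_host}

    def join(self, relay, user_id, is_host=False):
        self.members[relay][user_id] = is_host

    def on_user_disconnect(self, relay, user_id):
        """Cases 1 and 2: return the user ids whose connections must be closed."""
        room = self.members.get(relay, {})
        if user_id not in room:
            return []
        if room[user_id]:              # host left: tear down the whole room
            return self.close_room(relay)
        del room[user_id]              # ordinary user: only that user's link
        return [user_id]

    def close_room(self, relay):
        """Case 3 (also used by case 2): close everything on this relay address."""
        return sorted(self.members.pop(relay, {}))

    def on_turn_disconnect(self, relay, user_id):
        """Case 4: the TURN server closed one link; close only the matching client."""
        room = self.members.get(relay, {})
        if room.pop(user_id, None) is None:
            return []
        return [user_id]


if __name__ == "__main__":
    table = RoomTable()
    relay = ("10.0.0.5", 16000)
    table.join(relay, "host", is_host=True)
    table.join(relay, "b")
    print(table.on_user_disconnect(relay, "host"))
```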
1. Secure access control list. We need to establish a secure access port list on the proxy to ensure that the access port falls within the whitelist of secure access ports, preventing users from forging requests and sending (for example) requests to the TURN server's 36000, 56000, or any high-risk ports.
2. Port request count limitation. For any TURN server port, we can limit the number of connections to avoid excessive port usage due to attack traffic.
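Both controls can be combined in one admission check. The sketch below assumes a hypothetical `PortGuard` class; the allowed range 16000 to 16099 and the limit of 2 connections per port are placeholders chosen for the example, not values from the deployment.

```python
class PortGuard:
    """Admits a connection only if the port is whitelisted and under its limit."""

    def __init__(self, allowed_ports, max_conns_per_port):
        self.allowed_ports = set(allowed_ports)
        self.max_conns_per_port = max_conns_per_port
        self.active = {}  # port -> number of currently open connections

    def admit(self, port):
        if port not in self.allowed_ports:
            return False  # forged or high-risk port, e.g. 36000 or 56000
        if self.active.get(port, 0) >= self.max_conns_per_port:
            return False  # port already at its connection limit
        self.active[port] = self.active.get(port, 0) + 1
        return True

    def release(self, port):
        """Call when a connection to the port closes."""
        if self.active.get(port, 0) > 0:
            self.active[port] -= 1


if __name__ == "__main__":
    guard = PortGuard(range(16000, 16100), max_conns_per_port=2)
    print(guard.admit(36000), guard.admit(16000))
```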
1. When the user initiates a connection request to TGW, it carries the openid/mid so that the connection can be uniquely identified and tracked, which makes debugging easier, faster, and more accurate.
2. For logs, we need to balance debugging value against log volume. By default, client connections, disconnections, and exceptions such as timeouts and abrupt disconnections are logged, while normal data transmission is not.
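One way to express this logging policy is a filter that passes only connection lifecycle events and tags each record with the user id. The event names, the logger name `proxy.conn`, and the log format are assumptions made for this sketch.

```python
import logging

# Events worth recording by default; normal data transfer is deliberately absent.
LOGGED_EVENTS = {"connect", "disconnect", "timeout", "abrupt_disconnect", "error"}


class EventFilter(logging.Filter):
    """Drop every record whose 'event' attribute is not in LOGGED_EVENTS."""

    def filter(self, record):
        return getattr(record, "event", None) in LOGGED_EVENTS


def make_logger(handler):
    logger = logging.getLogger("proxy.conn")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler.addFilter(EventFilter())
    handler.setFormatter(logging.Formatter("%(event)s user=%(user_id)s %(message)s"))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = make_logger(logging.StreamHandler())
    log.info("relay 10.0.0.5:16000", extra={"event": "connect", "user_id": "u1"})
    log.info("payload forwarded", extra={"event": "data", "user_id": "u1"})  # filtered
```

Carrying the openid/mid in every record lets a single user's session be traced with a plain text search.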