The 2022 Tencent Global Digital Ecosystem Conference has come to a successful conclusion. Under the theme of "Digital-Real Innovation, Advancing Industry Together", the conference focused on the integration of the digital and real economies and explored how fully interconnected digital technologies can support high-quality development of the real economy. The conference featured 29 product- and technology-themed sessions, 18 industry-themed sessions, and 6 ecosystem-themed sessions, where business leaders, customers, and partners jointly summarized experience, built consensus, and promoted the further integration of the digital and real economies.

In June this year, Tencent announced that a large number of its internal self-developed businesses had been fully migrated to the cloud, making it the largest cloud-native practice in China and cumulatively saving more than 3 billion yuan in IT costs, fully demonstrating Tencent Cloud's products, technologies, and comprehensive service capabilities.

The application of cloud-native technology in cloud-computing PaaS has entered deep water. Based on real customer deployments, Tencent Cloud's microservice and middleware products have been deeply optimized in terms of product capability, usability, and operability. This conference featured a dedicated session on microservices and middleware. Starting from best practices in product development, operations, and maintenance, the session explained in detail how enterprises can avoid detours when developing microservices and building cloud-native middleware in the cloud-native era, stay focused on business needs, and accelerate development and innovation. This article collects the highlights of the first talk of the Microservices and Middleware session. Enjoy!

This article analyzes the exploration and challenges of routing, multi-active disaster recovery, grayscale release, and rate limiting under the microservice architecture from the following five aspects:

1. Overview of microservices

2. Best practices in the testing phase

3. Best practices in the release phase

4. Best practices in the production phase

5. Summary of microservice architecture

Microservices overview

The evolution of enterprise architecture

Application layer

The earliest application architecture, i.e. the IT system, was usually a monolithic architecture. As technology developed, service-oriented architecture (SOA) emerged. Today, the most popular architecture is the microservice architecture.

Monolithic architecture:

  • Tightly coupled: the system is complex, and a change in one part affects the whole
  • Repeated reinvention of the wheel
  • A completely closed architecture

SOA architecture includes interface layer, logic layer and data layer:

  • Systems usually integrated via an ESB; loosely coupled
  • Requires centralized, planned downtime for expansion or updates
  • Huge teams and high communication costs

Microservice architecture:

  • DevOps: CI/CD, fully automated
  • Scalability: automatic elastic scaling
  • High availability: upgrade and expansion without interrupting business

Resource layer

As the application layer evolved, the resource layer evolved with it. In the monolithic era, everyone was still on IDC infrastructure — the first stage. The second stage brought cloud computing and virtualization. In the third stage, everyone began to explore containerization. And in the fourth stage, which is now, everyone has begun to explore serverless.

Challenges under the Evolution of Enterprise Architecture

With the microservice architecture in place, everyone wants to implement more scenarios — DevOps, architecture scalability, high availability, multi-environment routing, releases, and so on — and endless challenges follow.

How to do traffic routing?

How to ensure multi-active disaster recovery?

How to implement canary and blue-green releases?

How to achieve full-link grayscale?

How to withstand the traffic peak?

Next, let’s discuss the microservice practice under the above challenges.

Practice of microservices in the testing phase

Test phase: Solve the traffic routing problem of multiple test environments

Pain points

In a microservice system, when multiple teams develop in parallel or multiple systems need joint debugging, deploying the full set of services for every test becomes expensive. How to deploy only the services that changed this time, while other services reuse the baseline environment's resources through dynamic traffic routing, is a major pain point in the testing phase.


This solution delivers three benefits:

1. Saving resource costs: dev/test applications are deployed on demand and discarded after use;

2. Improving R&D efficiency: teams get rid of heavy local configuration work;

3. Enabling cross-environment joint debugging without competing for test environments.

To realize this solution, two components of the microservice architecture are needed: a gateway at the ingress layer, and a service governance framework such as a service mesh.


As shown in the figure below, a test setup usually contains multiple environments, divided into a baseline environment and feature environments. During testing, team 1 wants to test the blue feature environment on the left and team 2 the green feature environment on the right; this can be achieved through the service governance framework.

Implementation plan

1. Instance marking

K8s registration scenario: add annotation to the workload to mark the environment label.

Microservice framework registration scenario: group the instances under the service and use tags to distinguish deployment environments.

2. Traffic coloring

The ingress gateway colors traffic by its characteristics. For example: coloring requests from a specific uin.

3. Traffic routing from the gateway to the backend service

The ingress gateway uses label routing to route dynamically according to the test-environment information in the request.

4. Routing between backend services

The governance center dynamically routes requests to services in different test environments according to the characteristics of the request traffic.
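The four steps above can be sketched as a small label-routing function: instances carry an environment label, the gateway colors each request with a target environment, and routing prefers instances in the matching feature environment, falling back to the baseline when a service was not redeployed there. This is a minimal illustration with hypothetical service names, not Tencent Cloud's actual implementation:

```python
import random

# Instance registry: each instance carries an "env" label (step 1).
INSTANCES = [
    {"service": "svc-b", "addr": "10.0.0.1", "env": "baseline"},
    {"service": "svc-b", "addr": "10.0.0.2", "env": "feature-1"},
    {"service": "svc-c", "addr": "10.0.0.3", "env": "baseline"},
]

def route(service, request_env):
    """Route by the environment tag the gateway colored onto the request
    (steps 2-4): prefer instances in the requested feature environment,
    and fall back to the baseline environment when the service was not
    redeployed in that feature environment."""
    candidates = [i for i in INSTANCES
                  if i["service"] == service and i["env"] == request_env]
    if not candidates:
        candidates = [i for i in INSTANCES
                      if i["service"] == service and i["env"] == "baseline"]
    return random.choice(candidates)

# svc-b was redeployed in feature-1; svc-c reuses the baseline environment.
assert route("svc-b", "feature-1")["addr"] == "10.0.0.2"
assert route("svc-c", "feature-1")["addr"] == "10.0.0.3"
```

Only the changed service needs fresh instances; everything else resolves back to the shared baseline.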

Microservice practice in the release phase

Release Phase: Implementing Canary, Rolling, or Blue-Green Releases

Under the microservice architecture, releases inevitably become an issue.

There are currently three popular release methods: canary release, rolling release, and blue-green release. The three strategies share the same principle: during the release process, thoroughly test the new version first, so that problems in the new version never affect all users. They differ, however, in how they carry out the release.

A canary release upgrades by proportion: release the new version to a small portion of instances first, and if no problems appear, gradually increase the proportion until all traffic reaches the V2 version — this realizes a canary.

A rolling release upgrades one instance (or one batch) first; if the test passes, the remaining instances are upgraded batch by batch until all instances run V2.

A blue-green release divides the instances into two camps, one green and one blue. The running V1 instances form the green camp; new V2 instances are deployed to the blue camp and comprehensively tested. Once the tests pass, the load balancer switches traffic from V1 to V2, achieving a seamless release while ensuring the new version was fully tested in the online environment.

How to implement gray release with a specified traffic ratio

As shown in the figure below, the API gateway and the registry/configuration center implement gray release by traffic ratio. At the ingress layer, the API gateway fronts service A; after service A is upgraded to a V2 version, the gateway shifts 10% of the traffic from V1 to V2, achieving proportional grayscale at the ingress layer. Once the V2 version tests clean, 100% of the traffic is switched to V2.

What about calls between services? In the lower part of the figure, service B calls service C. Upgrade service C to V2 so that service C has V2 instances, then use the registry/configuration center to shift 10% of service B's calls to the V2 instances, achieving proportional grayscale for service-to-service calls.

Implementation plan

1. The gateway adjusts the ratio

Direct a percentage of traffic to the V2 version.

2. The registry/configuration center adjusts the ratio

Direct a percentage of traffic to the V2 version.
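At either layer, proportional gray release reduces to a weighted choice between versions. A minimal sketch of the idea (illustrative only; real gateways and registries implement this with weighted load balancing):

```python
import random

def pick_version(weights):
    """Weighted random choice between versions: e.g. {'v1': 90, 'v2': 10}
    sends roughly 10% of requests to v2."""
    total = sum(weights.values())
    r = random.uniform(0, total)
    acc = 0.0
    for version, weight in weights.items():
        acc += weight
        if r <= acc:
            return version
    return version  # guard against floating-point edge cases

random.seed(0)
hits = sum(pick_version({"v1": 90, "v2": 10}) == "v2" for _ in range(10_000))
assert 800 < hits < 1200  # roughly 10% of traffic lands on v2
```

Shifting to 100% is then just updating the weights to `{"v2": 100}`.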

How to achieve grayscale for specified users, regions, or other conditions

As shown in the figure on the lower right, after the V2 version of ingress-layer service A is released, grayscale conditions are configured on the gateway — for example, a specific parameter in the Header, or a specific parameter in the Path. Requests matching these conditions are routed to the V2 version, so that specific users reach the new version, realizing traffic switching at the ingress layer.

For calls between services, conditional grayscale requires a service governance framework, such as label routing in a service mesh: label service C's V2 instances, and when routing traffic, requests matching the condition are routed to the V2 instances of service C, realizing traffic routing through the service mesh.

Implementation plan

1. The gateway sets conditional routing

Conditionally direct traffic to the V2 version.

2. The service mesh sets conditional routing

Conditionally direct traffic to the V2 version.
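Conditional grayscale at the gateway or in the mesh amounts to matching request attributes against routing rules. A minimal sketch (hypothetical header names, not any specific gateway's rule syntax):

```python
def conditional_route(headers, rules, default="v1"):
    """Return the target version of the first matching rule.
    Each rule is a tuple: (header_name, expected_value, target_version)."""
    for name, expected, version in rules:
        if headers.get(name) == expected:
            return version
    return default

# Hypothetical rules: flagged gray users and Guangzhou traffic go to V2.
RULES = [
    ("x-gray-user", "true", "v2"),
    ("x-region", "guangzhou", "v2"),
]

assert conditional_route({"x-gray-user": "true"}, RULES) == "v2"
assert conditional_route({"x-region": "shanghai"}, RULES) == "v1"
```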

Release phase: realizing full-link grayscale

What does full-link grayscale mean? Before the official version is released, a grayscale environment exists in which all functions can be tested end to end. To ensure that other customers are not affected once the function goes live, it is first released and tested within a small scope.

As shown in the figure below, first distinguish the environments: the production environment is divided into an official environment and a grayscale environment. When a new function is to be released, its instances are deployed to the grayscale environment. The service governance framework then marks those instances with a label identifying them as grayscale instances.

When the gateway routes traffic, it performs grayscale coloring — either dynamic or static — but either way the purpose is to distinguish the traffic. Ordinary traffic enters the official environment; traffic to be grayscaled is marked V2 and enters the grayscale environment. Inside the grayscale environment, when the user center calls the points center, the call returns to the official environment, because the points center has no grayscale instance; the service governance framework routes that traffic accordingly. Access from the official environment into the grayscale environment is realized through the same kind of routing.


  • Full-link isolated traffic lanes
  • End-to-end stable environment
  • One-click traffic switching
  • Observability

Implementation plan

1. Instance marking

K8s registration scenario: add a version label to the Workload by adding Pod labels.

Microservice framework registration scenario: group all instances under the service, and distinguish versions through tags.

2. Traffic coloring

The gateway performs grayscale coloring based on traffic characteristics, via either dynamic or static coloring.

3. Traffic routing from the gateway to the backend service

Through label routing, traffic is forwarded according to the service version information in the request.

4. Routing between backend services and services

Each service on the link can perform dynamic routing according to the characteristics of request traffic.
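The fallback behavior described above — gray traffic stays in the gray lane where a gray instance exists, and returns to the official environment where it does not — can be sketched as follows (hypothetical service names and addresses):

```python
# Registry: version label per instance (step 1); the lane tag travels
# with the request along the whole call chain (step 2).
REGISTRY = {
    "user-center":   {"stable": "10.0.0.1", "gray": "10.0.0.2"},
    "points-center": {"stable": "10.0.1.1"},  # no gray instance deployed
}

def resolve(service, lane):
    """Steps 3-4: route to the gray lane when the service has a gray
    instance; otherwise fall back to the stable (official) environment."""
    instances = REGISTRY[service]
    return instances.get(lane) or instances["stable"]

assert resolve("user-center", "gray") == "10.0.0.2"
assert resolve("points-center", "gray") == "10.0.1.1"  # falls back to stable
assert resolve("user-center", "stable") == "10.0.0.1"
```

Because the lane tag is propagated hop by hop, only the services that actually have a gray deployment diverge from the official path.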

Microservice practice in the production stage

Production stage: Realize multi-active disaster recovery

Multi-active in the same city

Multi-active disaster recovery is a cloud-native architecture solution in which workloads run actively in multiple locations at once.

As we all know, a region usually contains multiple availability zones. As shown in the figure below, there are Availability Zone 1 and Availability Zone 2, and the same service A is deployed in both. Service A reads and writes the underlying database, which is split into primary and standby: the primary database lives in AZ 1 and the standby in AZ 2. The gateway layer distributes traffic proportionally into AZ 1 or AZ 2. Service A in AZ 1 reads and writes the primary database directly; service A in AZ 2 sends only writes to the primary database and serves its reads from the standby. Meanwhile, the primary database replicates data to the standby in real time. This realizes same-city multi-active disaster recovery.
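The read/write split above can be sketched as follows (hypothetical AZ and database names; actual replication is handled by the database itself):

```python
# Primary database in AZ 1, standby (read replica) in AZ 2.
DATABASES = {"primary": "az1-db", "standby": "az2-db"}

def db_for(operation, caller_az):
    """All writes go to the primary; reads from AZ 2 are served by the
    local standby to keep read latency low, while the primary
    replicates to the standby in real time."""
    if operation == "write":
        return DATABASES["primary"]
    return DATABASES["standby"] if caller_az == "az2" else DATABASES["primary"]

assert db_for("write", "az2") == "az1-db"  # cross-AZ write to the primary
assert db_for("read", "az2") == "az2-db"   # local read from the standby
assert db_for("read", "az1") == "az1-db"
```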

In microservices, an application's nodes are usually deployed across different availability zones and registered under the same service, realizing a multi-active registry.

Production stage: multi-active disaster recovery of microservice architecture

In the multi-active disaster recovery under the microservice architecture, in addition to the service instance itself, it also involves related components such as gateways and registration configuration centers.

As shown in the figure below, ingress-layer gateways are usually deployed across availability zones; the red lines separate the different zones. With zones 1, 2, and 3, the gateway runs as three nodes, one deployed to each zone. The same goes for the registry/configuration center: its three nodes are deployed across the three availability zones, and the corresponding services are automatically deployed across them as well. The advantage is that if any availability zone goes down, the services still respond normally, ensuring high availability of the entire architecture.

Production stage: multi-active disaster recovery at the access layer

In the production stage, how to implement multi-active disaster recovery at the access layer?

If you want to implement multi-active at the access layer, you usually need to do several things:

  • Flow proportional switching
  • Intra-city disaster recovery switching
  • Cross-city disaster recovery switching

This requires a capable gateway, such as the popular Kong gateway or Nginx. Such gateways all support switching traffic by ratio: you simply adjust different traffic ratios to different availability zones, either manually or automatically. As shown in the figure below, if the service in Guangzhou Zone 1 goes down, the gateway can switch the traffic ratio over to Guangzhou Zone 2 — same-city disaster recovery switching. If the entire Guangzhou region goes down, all traffic can be switched at the gateway layer to Shanghai Zone 2, achieving cross-city disaster recovery.

Production stage: Realize multi-active disaster recovery and nearby access

How to implement multi-active disaster recovery and nearby access between services?

What does nearby access mean?

For example, a service has two instances, one deployed in Guangzhou and one in Shanghai. If a user in Shanghai accesses the Guangzhou instance, the access link becomes longer and latency increases, so it is best for that user to access the nearby Shanghai instance.

How to access nearby?

Configure the service's region information through the service governance framework. For example, with two instances, the first is configured as the Guangzhou region and the second as the Shanghai region. When a user makes a request, the framework recognizes that a Shanghai user should go to the Shanghai instance, achieving nearby access.
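Nearby access is essentially a region-aware filter over the instance list. A minimal sketch (illustrative region and address names):

```python
INSTANCES = [
    {"addr": "gz-instance", "region": "guangzhou"},
    {"addr": "sh-instance", "region": "shanghai"},
]

def nearby(instances, caller_region):
    """Prefer instances in the caller's own region; if none exist there,
    fall back to the full instance list."""
    local = [i for i in instances if i["region"] == caller_region]
    return local or instances

assert nearby(INSTANCES, "shanghai")[0]["addr"] == "sh-instance"
assert nearby(INSTANCES, "beijing") == INSTANCES  # no local instance: fall back
```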

What about multi-active disaster recovery between services?

The principle is similar to the access layer. As shown in the figure below, the points center in Guangzhou Zone 1 wants to call the activity center, but the activity center is down. The service governance framework usually probes services actively; when it finds the activity center down, it performs same-city disaster recovery switching, routing the points center's calls to the activity center in Guangzhou Zone 2. If all the Guangzhou zones go down, the framework automatically switches the service traffic to the Shanghai zone, achieving cross-city disaster recovery. The game Awakening of Dawn uses this method to realize its cross-city disaster recovery architecture.
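The same-city-first, then cross-city failover logic can be sketched as follows (hypothetical health probe and instance names):

```python
def failover_route(instances, caller_region, healthy):
    """Drop unhealthy instances first, then prefer the caller's region
    among the survivors: a same-city switch when one zone fails, a
    cross-city switch when the whole region fails."""
    alive = [i for i in instances if healthy(i)]
    local = [i for i in alive if i["region"] == caller_region]
    return (local or alive)[0]

INSTANCES = [
    {"addr": "gz-zone1", "region": "guangzhou"},
    {"addr": "gz-zone2", "region": "guangzhou"},
    {"addr": "sh-zone1", "region": "shanghai"},
]

# Guangzhou zone 1 down: same-city switch to zone 2.
assert failover_route(INSTANCES, "guangzhou",
                      lambda i: i["addr"] != "gz-zone1")["addr"] == "gz-zone2"
# All of Guangzhou down: cross-city switch to Shanghai.
assert failover_route(INSTANCES, "guangzhou",
                      lambda i: i["region"] != "guangzhou")["addr"] == "sh-zone1"
```

In a real deployment, `healthy` would be driven by the governance framework's active probing rather than a lambda.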

Production stage: Routing that supports unitized architecture

Pain points:

  • Resources cannot meet the needs of horizontal expansion;
  • The business needs to support SET-based transformation;
  • The business's users are widely distributed geographically, and it hopes to solve the resulting latency problem.

Unitization scheme:

  • Business systems are deployed in data centers across multiple regions.
  • Through SETs, traffic in each region runs in a closed loop within that region.
  • After a SET fails, its service traffic is switched to other SETs.

Implementation plan:

1. Instance marking: All instances under the service are grouped according to units (SETs), and the units (SETs) can be distinguished by labels.

2. Dynamic routing: Dynamic routing of services in different unit modules (SETs) according to request traffic characteristics.

3. Isolation: Strong isolation of service calls between SETs.

How to do it?

Group the instances in the service governance framework. As shown in the figure below, the user center has 6 instances: the 3 on the left are marked as unit 1 and the 3 on the right as unit 2, achieving grouping by instance; the corresponding downstream services are grouped into unit 1 in the same way. The advantage is that throughout the access path, traffic is guaranteed to stay inside its own unit, achieving isolation between units. This process mainly relies on the registry/configuration center and the service governance framework to mark and group instances into units. WeChat Pay uses a unitized architecture to guarantee a financial-grade, safe, and reliable transaction system architecture.
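Unitized (SET) routing pins each user to one unit so their traffic runs closed-loop inside it, with cross-unit switching only on failure. A minimal sketch (hypothetical unit names; real SET keys are usually derived from business identifiers):

```python
UNITS = {
    "set-1": ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
    "set-2": ["10.0.1.1", "10.0.1.2", "10.0.1.3"],
}

def unit_of(user_id: int, units):
    """Deterministically map a user to a unit so all of that user's
    traffic stays inside one SET."""
    names = sorted(units)
    return names[user_id % len(names)]

def route_in_unit(user_id: int, units, failed=()):
    """Route inside the user's own unit; only on SET failure is the
    traffic switched to another unit."""
    unit = unit_of(user_id, units)
    if unit in failed:
        unit = next(n for n in sorted(units) if n not in failed)
    return unit, units[unit]

assert route_in_unit(7, UNITS)[0] == "set-2"                     # user pinned to set-2
assert route_in_unit(7, UNITS, failed=("set-2",))[0] == "set-1"  # switch on SET outage
```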

Production stage: rate-limiting scenarios

Rate-limiting stages:

1. Rate limiting at the access layer

2. Rate limiting of calls between services

Rate-limiting dimensions:

1. Rate limiting by service/interface/label

2. Rate limiting by second, minute, hour, day, etc.

Rate-limiting types:

1. Standalone rate limiting: limits applied at the level of a single called instance; the traffic limit takes effect only for that instance and is not shared.

2. Distributed rate limiting: limits applied across all instances under a service; multiple service instances share the same global traffic quota.

The figure below is a simple architecture diagram of rate limiting at the ingress layer and between services.

Different rate limiting is done at different stages. As shown in the figure below, rate limiting at the ingress layer is usually done by the gateway. Once rate-limiting rules are configured, when a flood of requests triggers them, one of two things usually happens: discard or queue. In attack scenarios, invalid requests are simply discarded; for legitimate traffic, such as panic buying and flash sales, requests are queued before reaching the backend services, ensuring the backend is not overwhelmed. For calls between services — say, the user center calling the points center — when traffic grows large enough to trigger the points center's rules, the points center discards or queues requests according to the rate-limiting rules configured in the service governance framework; service-to-service calls are usually queued at a uniform pace to achieve the limiting effect.
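A standalone rate limiter can be sketched as a token bucket; a distributed limiter follows the same logic but keeps the token count in shared storage (e.g. Redis) so all instances draw from one global quota. This is an illustrative sketch, not Tencent Cloud's implementation:

```python
import time

class TokenBucket:
    """Per-instance (standalone) token-bucket limiter: tokens refill at a
    fixed rate; each request consumes one token or is rejected, at which
    point the caller decides whether to discard or queue the request."""
    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=10, capacity=5)
results = [bucket.allow() for _ in range(10)]
assert all(results[:5])           # burst admitted up to capacity
assert results.count(False) >= 3  # excess requests rejected until refill
```

Queuing at a uniform pace, as described for service-to-service calls, corresponds to retrying rejected requests as tokens refill rather than discarding them.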

Honor of Kings and Tencent Video run many marketing activities every year, and traffic surges during these activities. To protect the backend services and the database from being overwhelmed by such surges, rate-limiting rules must be applied both at the ingress layer and between services; they use the gateway and the service governance framework to realize these rate-limiting scenarios.

Summary of microservice architecture

Typical microservice architecture

After a request comes in from the front end, it enters the gateway. Tencent Cloud mainly uses its cloud-native gateway, which provides CLB load balancing, secure routing, and rate limiting, and forwards requests to backend services. Backend services here mainly means Tencent Cloud microservices: each service has multiple elastic instances to absorb the peaks and valleys of different businesses, automatically scaling out when traffic is high so resources never run short, and scaling in when traffic is low so resources are not wasted.

Managing this many services requires a registry to handle service registration and discovery, a configuration center to manage all service configuration, and a service governance center to realize dynamic routing, rate limiting, and so on — plus underlying distributed tracing and monitoring — to form a complete microservice architecture.

So what Tencent Cloud product supports this typical microservice architecture? The microservice engine TSE.

The microservice engine TSE provides an open-source-enhanced cloud-native gateway, registry and configuration center, service governance platform, and elastic microservices, helping users quickly build a lightweight, highly available, and scalable microservice architecture. TSE is fully compatible with the open-source versions and enhances them in functionality, usability, and maintainability.

  • The cloud-native gateway comes in two flavors, Kong and Nginx, built on those open-source components to cover gateway demands.
  • The registry and configuration center supports the mainstream open-source components: Nacos, Apollo, Consul, and Eureka.
  • The service governance center is Polaris, Tencent's open-source, self-developed framework. Polaris is highly influential inside Tencent: WeChat Pay and Honor of Kings use Polaris as their service governance framework to realize the scenarios described above, such as rate limiting, full-link grayscale, and multi-active disaster recovery.
  • Elastic microservices handle the deployment and operation of the services themselves, realizing serverless elastic scaling of resources.

