Test environment governance has long been an important topic at large companies, because the stability of the test environment directly affects the efficiency of iterative development and testing.

On the whole, the main reasons for the instability of the test environment are as follows:

  1. Changes to the test environment are not final-state changes: a code or configuration release often leaves a service unable to start or breaks a call chain.
  2. Changes are frequent. Developers need joint debugging, testers need iterative testing, and both code and configuration keep changing. Permission control is hard to enforce, which adds to the instability.
  3. Requirements run in parallel. A single application needs several branches deployed at once to test several requirements, so contention and conflicts over test environment resources are common.

Dewu's test environment stability governance has gone through several stages:

  • 2020~2021: Multiple physically isolated environments (based on ECS)

    There are three sets of test environments, T0, T1, and T2. Each is physically isolated, with no resource conflicts or sharing.

    The plan was to use T1 for iterative testing, T0 for integration regression, and T2 for allocation to independent projects. In practice, there were too many parallel business tests and conflicts were frequent. The environments came to be used indiscriminately, with anyone who needed one occupying a whole environment. As a result, no environment was stable, test validity could not be guaranteed, and environment conflicts between parallel projects remained unsolved.

  • 2021~2022: MF full-link container environments (based on containers)

    As the business grew, three test environments could no longer meet demand. So last year Dewu quickly built 10 container-based MF environments to support testing of independent projects.

    The MF environments are built on top of T0: the DB is shared with T0 and all other resources are independent. The aim is that the business only has to keep T0 stable. Every MF environment can quickly sync the latest services and configuration from T0, so an environment is ready to use on demand, which resolves environment conflicts between parallel projects.

    In practice, project environment conflicts were solved, but the MF environments were still quite unstable and the maintenance cost was huge. The main reasons were:

    • T0 stability: not all domains integrate in T0, so the stability of T0 cannot be guaranteed.

    • After an MF environment syncs from T0, it needs a second round of debugging and acceptance for various reasons (missing new services, incomplete or out-of-order configuration, etc.).

    • While an MF environment is in use, changes to basic services (sso, gateway, middleware) cannot be applied to it in time, which affects business testing.

    Therefore, in the second half of 2022, we began trying the coloring environment to solve the environment stability problem.

  • 2022: Coloring environment solution (based on traffic isolation)

    The coloring environment is a solution based on traffic isolation. By passing a traffic mark transparently along the call chain, traffic for the baseline environment and traffic for each coloring environment are isolated from each other, which gives multiple logical environments that support parallel testing without interference.

    Compared with the MF environment, there is no need to maintain multiple full-link environments, so the maintenance cost drops. If every changed service is deployed only in a coloring environment, the baseline environment becomes more stable, which amounts to improving the stability of all environments.

    The rest of this article describes how the coloring environment is built.

2.1 Basic ideas


As shown in the figure below, the original idea was:

  1. A service routes traffic to the corresponding coloring node according to the traffic mark.
  2. If the service has not been added to the coloring environment that the mark refers to, the traffic goes to the baseline environment.
  3. If the service has been added to the coloring environment but is not deployed, or its process is down, the call fails with an error instead of falling back to the baseline environment (so that service failures are not hidden).
  4. DB, MQ, Redis and other middleware are shared across environments to avoid waste.
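
The first three rules can be read as one routing decision. The following is a minimal Python sketch of that decision only, not the actual framework code: `Registry`, `pick_nodes`, and `ColoringNodeUnavailable` are hypothetical names, and the registration center is replaced by in-memory dictionaries.

```python
from dataclasses import dataclass, field


class ColoringNodeUnavailable(Exception):
    """The service belongs to the coloring environment but has no live node."""


@dataclass
class Registry:
    # Hypothetical in-memory stand-in for the registration center.
    baseline: dict = field(default_factory=dict)  # service -> [address, ...]
    members: dict = field(default_factory=dict)   # coloring env -> {services added to it}
    live: dict = field(default_factory=dict)      # (coloring env, service) -> [address, ...]


def pick_nodes(registry, service, coloring_env=None):
    """Return the candidate addresses for one call, following rules 1 to 3."""
    if coloring_env is None:
        # Uncolored traffic always uses the baseline environment.
        return registry.baseline[service]
    if service not in registry.members.get(coloring_env, set()):
        # Rule 2: the service was never added to this coloring environment.
        return registry.baseline[service]
    nodes = registry.live.get((coloring_env, service), [])
    if not nodes:
        # Rule 3: added but not deployed, or the process is down. Fail loudly.
        raise ColoringNodeUnavailable(
            f"{service} has no live node in {coloring_env}"
        )
    # Rule 1: route to the coloring nodes.
    return nodes
```

Keeping membership (`members`) separate from liveness (`live`) is what lets rule 3 be told apart from rule 2: an empty node list alone cannot say whether a fallback would hide a failure.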

Starting from this idea, what has to change to support the coloring environment? The idea breaks down into these questions:

  1. How is the traffic mark transmitted transparently?
  2. How is traffic routed to coloring nodes?
  • How is an RPC call routed to the coloring node?
  • How does an MQ message get consumed by the consumer in the coloring environment?
  3. Once transparent transmission and routing are solved, how does the traffic initiator attach the coloring mark?

2.2 Implementation plan

The solutions below isolate traffic only; the DB data layer is not isolated.

1. How is the traffic mark transmitted transparently?

First, the traffic entry layer places the traffic mark in the x-infr-flowtype field of the HTTP header:

x-infr-flowtype:<CE_ColoringEnv> ##CE_ is a fixed prefix that distinguishes the coloring mark from the stress-test mark

After the traffic reaches the gateway, the mark is passed along the service call chain using the baggage capability of the OpenTracing specification: the coloring mark is read from the header and put into the trace, which carries it downstream.

In this way, the coloring mark is available along the entire call chain.
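
As a rough illustration of this step, the Python sketch below reads the mark from the header, stores it in a baggage-like dictionary, and re-emits it on downstream calls. It is a simplification under stated assumptions: the baggage is a plain dictionary rather than a real tracer's span context, the key name `coloring_env` is hypothetical, and a real OpenTracing tracer would carry baggage in its own carrier format instead of copying the original header.

```python
FLOWTYPE_HEADER = "x-infr-flowtype"  # header name from the article
COLORING_PREFIX = "CE_"              # fixed prefix of coloring marks
BAGGAGE_KEY = "coloring_env"         # hypothetical baggage key


def extract_coloring_mark(headers):
    """Return the coloring mark from the request headers, or None."""
    for name, value in headers.items():
        if name.lower() == FLOWTYPE_HEADER:  # HTTP header names are case-insensitive
            value = value.strip()
            # Values without the CE_ prefix (for example stress-test marks)
            # are not coloring marks.
            return value if value.startswith(COLORING_PREFIX) else None
    return None


def start_trace(headers):
    """Gateway side: build the trace baggage for an incoming request."""
    baggage = {}
    mark = extract_coloring_mark(headers)
    if mark is not None:
        baggage[BAGGAGE_KEY] = mark
    return baggage


def outgoing_headers(baggage):
    """Service side: headers to attach to a downstream call."""
    mark = baggage.get(BAGGAGE_KEY)
    return {FLOWTYPE_HEADER: mark} if mark else {}
```

Any service on the chain can then read the mark from the baggage without knowing which entry point set it.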

2. How is the traffic routed to the coloring nodes?

There are two things to consider:

(1) For an RPC call, once the coloring mark is available, how is the coloring node found? The problem to solve here is how to identify coloring nodes.

(2) For MQ messages, how does the producer send messages carrying the coloring mark, and how does the consumer handle them?


When a service deployment is added to a coloring environment, the coloring mark is injected into the environment variable COLORING_ENV by default.

The container release configuration page adds the COLORING_ENV variable automatically.


At this point, the service can read the COLORING_ENV environment variable at startup. The next step is how the registration center distinguishes coloring nodes.

First, when a service is added to a coloring environment, an entry for the service is added under that environment's coloring field in the registration center, indicating that the service is supposed to have a node in the coloring environment.

The main problem the coloring field solves is this: when a coloring node is down, traffic for the coloring environment can still determine that the service should have a node there, so the failure is reported instead of being hidden from testing.

Coloring field: CE_


Coloring Field Service Node: :80


Second, during service registration, both the service node information and the method registration carry the coloring mark:


At this point, the registration center can identify coloring nodes by their coloring mark, and business services (built on the fusion framework) can route colored traffic using the coloring mark in the trace and the coloring nodes in the registration center.
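
The registration side can be sketched as follows. This is an illustrative Python model, not the real registration center: `ToyRegistry`, `registration_payload`, and the payload field names are hypothetical. Only the COLORING_ENV variable name comes from the article.

```python
import os


def registration_payload(service, address, environ=None):
    """Build what a node might send to the registration center at startup."""
    environ = os.environ if environ is None else environ
    mark = environ.get("COLORING_ENV", "").strip()
    payload = {"service": service, "address": address}
    if mark:
        # Only nodes deployed in a coloring environment carry the mark.
        payload["coloring_env"] = mark
    return payload


class ToyRegistry:
    """Hypothetical in-memory registration center."""

    def __init__(self):
        self.fields = {}  # coloring mark -> services added to that environment
        self.nodes = {}   # (service, coloring mark or None) -> live addresses

    def add_to_coloring_env(self, mark, service):
        # Happens when the service is added to the environment,
        # before any node is deployed.
        self.fields.setdefault(mark, set()).add(service)

    def register(self, payload):
        key = (payload["service"], payload.get("coloring_env"))
        self.nodes.setdefault(key, set()).add(payload["address"])

    def deregister(self, payload):
        key = (payload["service"], payload.get("coloring_env"))
        self.nodes.get(key, set()).discard(payload["address"])

    def expects_coloring_node(self, mark, service):
        return service in self.fields.get(mark, set())

    def live_nodes(self, service, mark=None):
        return sorted(self.nodes.get((service, mark), set()))
```

After a coloring node deregisters, `live_nodes` is empty while `expects_coloring_node` is still true, which is the signal a router needs to report an error rather than fall back to the baseline environment.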

  • MQ transformation–identify and process MQ messages

The goal for MQ is that a message sent by a producer in a coloring environment is consumed only by consumers in that coloring environment. If the coloring environment has no consumer node, the message is consumed by the baseline environment's consumer.

Two approaches were discussed:

The first is topic isolation: each coloring environment communicates over its own topic, so isolation is better and messages are less likely to be lost.

The second shares one topic across all coloring environments. The producer puts the coloring mark on each message, and each coloring environment has its own consumer. On receipt, the consumer compares the coloring mark in the message with its local coloring mark: if they match it consumes the message, and if not it returns ACK directly without running the consumption logic.

The second approach is the one currently used, and the details below are based on it:

Basic process


As the figure shows:

  1. ServiceB_Color1 automatically registers the GID_Color1_Topic consumer group and listens to Topic_A. The Color2 and Color3 environments work the same way.
  2. Messages with Color1 are produced by ServiceA_Color1 and consumed by ServiceB_Color1.
  3. Messages with Color2 are produced by ServiceA_Color2 and consumed by the baseline ServiceB, because ServiceB has no node in the Color2 coloring environment.
  4. The Color3 coloring environment has no ServiceA_Color3 node, so traffic with Color3 reaches the baseline ServiceA. ServiceA then produces messages with Color3, which are consumed by ServiceB_Color3.

Notes for business teams:

When a coloring environment starts, a GID carrying the coloring mark is created automatically. For example, if the original GID is GID_AAA, the automatically created GID is GID__AAA.


Let’s look at the content and processing logic of the message:


As shown in the figure above, a DMQ_ENV_TAG field holding the coloring mark is added to the message properties, and the message is then consumed by the subscription group of the corresponding coloring environment.

Looking at the figure above, it “seems” that every coloring environment has consumed the message. In fact, the other environments return ACK directly without running the consumption logic; the logs show the details.

Code description: the consumer compares the coloring mark msgTag in the Message with the local service's coloring mark envTag to decide whether to run the consumption logic.
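
The comparison can be sketched in Python as below. This is an assumption-laden illustration rather than the DMQ client code: `should_process` and `on_message` are hypothetical names, the string "ACK" stands in for the real acknowledgement, and `colored_consumer_envs` (the set of coloring marks that have a consumer for this topic) is an assumed lookup, since the article does not say how the baseline consumer learns which coloring environments have their own consumer.

```python
ENV_TAG_PROPERTY = "DMQ_ENV_TAG"  # message property named in the article


def should_process(msg_properties, env_tag, colored_consumer_envs):
    """Decide whether this consumer runs the business logic for a message.

    env_tag is the local coloring mark, or None on the baseline environment.
    colored_consumer_envs is the (assumed) set of coloring marks that have
    their own consumer for this topic.
    """
    msg_tag = msg_properties.get(ENV_TAG_PROPERTY)
    if env_tag is not None:
        # Coloring consumer: handle only messages carrying its own mark.
        return msg_tag == env_tag
    # Baseline consumer: handle uncolored messages, plus colored messages
    # whose coloring environment has no consumer of its own.
    return msg_tag is None or msg_tag not in colored_consumer_envs


def on_message(msg_properties, body, env_tag, colored_consumer_envs, handler):
    """Consume one message; mismatched messages are acknowledged and skipped."""
    if should_process(msg_properties, env_tag, colored_consumer_envs):
        handler(body)
    # Every consumer group acknowledges, which is why all groups
    # appear to have consumed the message.
    return "ACK"
```

With coloring consumers in Color1 and Color3 only, a Color2 message is skipped by both of them and handled by the baseline consumer, matching step 3 of the basic process.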


3. How does the traffic entry carry the coloring mark?

With transparent transmission and handling of the coloring mark solved, what remains is how the traffic initiator attaches the mark. In practice this means putting the coloring mark into the x-infr-flowtype field of the HTTP header.

The list of available coloring environments is provided by the release platform, and each traffic entry lets the user choose from it.

In the current rollout, the main traffic entries are roughly as follows:

Attaching the coloring mark at the traffic entry is logically simple, so it is covered here only at the usage level, without technical detail.


At this point the business transformation is essentially complete, and the whole process is clear: how to construct colored traffic, how to transmit the traffic mark transparently, how to identify coloring nodes, and how the key coloring logic runs once a node is identified.

3.1 Implementation path

The coloring project was rolled out in several stages:

  1. Project approval & middleware transformation (April~June)

    This covered infrastructure transformation (unified framework, gateway, registration center, configuration center, timeout center, DMQ, etc.), client transformation, and release platform transformation, followed by basic call-chain verification once the transformation was complete.

  2. Online grayscale & full-link service adaptation (July~August)

    Early July: 5 transaction and middleware-related services upgraded the relevant jar packages and went online for verification, to make sure the coloring transformation would not affect production.

    August: applications across all domains were asked to start upgrading the coloring-related jar packages.

  3. Independent project use (September)

    By the end of September, several independent projects had completed test verification in the coloring environment.

  4. Business iterative use (October~November)

    In October, we began promoting trial use and debugging of the coloring environment across all business lines.

    After the trial ended, use of the coloring environment for version iterations was rolled out gradually.

3.2 Business use effect

Independent projects: all independent projects across all domains have now switched to testing in the coloring environment.

Version iterations: judging from the most recent version iteration, more than 95% of requirements across all domains can be tested in the coloring environment.

The remaining 5% of requirements mainly involve two areas:

  1. Data isolation: a solution already exists and is in use; it covers a small number of requirements.
  2. Front-end coloring: the coloring environment currently addresses back-end coloring, while some scenarios also depend on front-end coloring (with support across multiple front ends). That solution is essentially implemented and will be applied together with back-end coloring.

At this stage, the coloring environment has solved the problems of test environment conflicts and test environment stability, and it costs much less than the previous approach of multiple independent environments. Going forward, Dewu will also try to use coloring for grayscale release in production, and we expect good results.

