It has been several months since the release of Apache Doris 1.1.0. During this period, we rethought and established the community's process for releasing new versions and officially introduced the concept of LTS (Long-Term Support) versions. In the 1.1.x series, no major features are introduced; only bug fixes and stability improvements are provided, in an effort to meet the high stability requirements of more community users. The good news is that this approach has achieved clear results: the latest versions of the 1.1.x series have withstood the test of many users' production environments.

After weighing the version iteration rhythm against user needs, we decided to release many new features in version 1.2, including not only performance optimizations but also many features long awaited by community users. After a long period of development, testing, and tuning, we are happy to announce that Apache Doris 1.2.0 has entered the final release preparation stage and is expected to be available in the first week of December.

For performance, the improvement that community users care about most, we ran multiple standard benchmarks on 1.2.0 RC (Release Candidate) and used versions 1.1.3 and 0.15.0 as comparison references.

According to the tests, in the SSB-Flat wide table scenario the overall performance of version 1.2.0 RC improved by nearly 4 times compared with version 1.1.3 and by nearly 10 times compared with version 0.15.0. In the TPC-H multi-table join scenario, it improved by nearly 3 times compared with version 1.1.3 and by more than 11 times compared with version 0.15.0. Performance has improved greatly across multiple scenarios.

At the same time, we submitted the test results of version 1.2.0 RC to ClickBench, a well-known database benchmark leaderboard. In the latest leaderboard, on the general-purpose instance type (c6a.4xlarge, 500gb gp2), Apache Doris ranked first in data import performance among comparable products, second in Cold Run query performance, and third in Hot Run query performance.

About ClickBench

ClickBench is a performance benchmark leaderboard initiated by ClickHouse, the well-known analytical database. Its test data is taken from a real production environment, covers a variety of data types, and covers typical scenarios such as ad hoc queries and statistical reports, so it reflects how databases perform in production. It has attracted the participation of internationally known databases such as Snowflake, Redshift, Athena, Greenplum, and Druid. The evaluation metrics are the time to import the same dataset on a specific instance type, the storage space occupied, and the SQL execution time, which measure data import performance, data compression ratio, and query performance respectively. For each test item, the best result among all submissions becomes the baseline, and every other result for the same item is compared with the baseline to obtain a ratio. This ratio reflects the gap to the best in the industry. When a new test result beats the current baseline, it automatically becomes the new baseline.

For query performance, each SQL query is timed in both a Hot Run and a Cold Run: the Hot Run time is the shortest of three repeated executions, and the Cold Run time comes from an execution run directly after startup with memory cleared. The final result is the geometric mean, over all SQL queries, of the ratio of execution time to the baseline. ClickBench therefore rewards strong performance across all test scenarios rather than in one or a few, which requires a database to improve across the board.
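The scoring rule described above can be sketched in a few lines of Python. This is only an illustration of the description in this article, not ClickBench's actual scoring code; the function names and all timings are hypothetical.

```python
import math


def hot_run_time(runs):
    """Hot Run time for one query: the shortest of the repeated executions."""
    return min(runs)


def relative_score(times, baseline):
    """Geometric mean of the per-query ratios time / baseline.

    A score of 1.0 means the system matches the baseline on every query;
    larger values mean it is slower on average.
    """
    if len(times) != len(baseline):
        raise ValueError("times and baseline must cover the same queries")
    ratios = [t / b for t, b in zip(times, baseline)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical timings in seconds: three executions for each of three queries.
runs_per_query = [
    [0.30, 0.12, 0.10],
    [1.50, 0.80, 0.90],
    [0.05, 0.04, 0.04],
]
# Hypothetical best-known time for each query across all systems.
baseline = [0.10, 0.40, 0.04]

hot_times = [hot_run_time(runs) for runs in runs_per_query]
score = relative_score(hot_times, baseline)
print(f"relative score: {score:.3f}")
```

Because the geometric mean multiplies the ratios, one very fast query cannot hide a slow one as easily as with a plain average, which matches the leaderboard's emphasis on all-round performance.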

In the results submitted this time, on query performance, Apache Doris, without any tuning, ranked second in Cold Run and third in Hot Run among all products on the same instance type, and achieved the best result on 8 SQL queries, which became new performance baselines. On import performance, Apache Doris ranked first in data writing efficiency among all products on the same instance type: writing 70 GB of data (size before compression) took only 415 s, a single-node write speed of over 170 MB/s. Doris delivers efficient writes alongside top query performance.
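The write speed quoted above follows directly from the data size and the load time. The short Python check below reproduces the arithmetic; the article does not say whether "70G" uses binary (1024) or decimal (1000) units, so both conventions are shown as assumptions, and the helper name is hypothetical.

```python
def throughput_mb_per_s(size_gb, seconds, mb_per_gb=1024):
    """Average write throughput in MB/s for size_gb gigabytes loaded in seconds."""
    return size_gb * mb_per_gb / seconds


# Figures from the article: 70 GB written in 415 s on a single node.
rate_binary = throughput_mb_per_s(70, 415)                   # assumes 1 GB = 1024 MB
rate_decimal = throughput_mb_per_s(70, 415, mb_per_gb=1000)  # assumes 1 GB = 1000 MB

print(f"binary units:  {rate_binary:.1f} MB/s")
print(f"decimal units: {rate_decimal:.1f} MB/s")
```

Only the binary interpretation lands above 170 MB/s, which suggests that is the convention behind the quoted figure.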


Figure 1 Cold Run


Figure 2 Hot Run


Figure 3 Load Time

(Click the link to view)

About SSB

Star Schema Benchmark (SSB) is a lightweight performance benchmark for data warehouse scenarios. SSB provides a simplified star schema dataset based on TPC-H and is mainly used to test the performance of multi-table join queries under a star schema. In addition, the industry usually flattens SSB into a wide table model (hereinafter referred to as SSB-Flat) to test the performance of the query engine.

On all 13 queries of the SSB-Flat wide table model, Apache Doris 1.2.0 outperforms the previous versions, with no performance regression. Overall performance improved by nearly 4 times compared with version 1.1.3 and by nearly 10 times compared with version 0.15.0, and the largest improvement on a single SQL query is nearly 13 times. Under the SSB star schema, the overall performance of Apache Doris 1.2.0 improved by nearly 2 times compared with version 1.1.3 and by nearly 31 times compared with version 0.15.0, and the largest improvement on a single SQL query is nearly 60 times, a major leap in performance.


Figure 4 SSB-Flat wide table model


Figure 5 SSB star schema

(Click the link to view)

About TPC-H

TPC-H is a decision support benchmark. It consists of a suite of business-oriented ad hoc queries and concurrent data modifications, and the queries and the data populating the database have broad industry-wide relevance. The benchmark models a decision support system that examines large volumes of data, executes highly complex queries, and answers key business questions. The performance metric reported by TPC-H is the TPC-H Composite Query-per-Hour Performance Metric (QphH@Size), which reflects multiple aspects of the system's ability to process queries: the database size against which the queries are executed, the query processing power when queries are submitted by a single stream, and the query throughput when queries are submitted by multiple concurrent users.

On the 22 queries of the TPC-H standard test dataset, the overall performance of Apache Doris 1.2.0 improved by nearly 3 times compared with version 1.1.3 and by more than 11 times compared with version 0.15.0, and the largest improvement on a single SQL query is nearly 70 times.


Figure 6 TPCH-100 performance test comparison

(Click the link to view)

The performance test results above show that version 1.2 is the best-performing version since Apache Doris was open sourced, which also makes Apache Doris a new benchmark for OLAP database performance worldwide. This achievement is inseparable from the dedication of all community developers and the trust of all users. It is precisely because of the efforts of all community members that Apache Doris has made such rapid progress. We would like to express our sincerest thanks to all community developers and users.

Of course, performance is not everything a database pursues. Version 1.2 also brings more new features that are yet to be announced. Please look forward to the upcoming Release Note for the complete list; we believe it will be a pleasant surprise for every user who has been waiting. Finally, we hope that more developers and open source enthusiasts will join the Apache Doris community, so that together we can bring this excellent open source project from China to the world and make it a new cornerstone of modern data analysis technology.

# Interactive moments #

Doris Summit 2022 has officially set sail, and the latest development progress and roadmap of Apache Doris will be shared at the Summit. We are now soliciting talk topics from the entire community. If you have good ideas, including but not limited to business best practices, in-depth technical analysis, industry trend interpretation, and data ecosystem solutions, you are welcome to submit a topic, share it with the community, and have in-depth discussions with experts in the field.

Topic collection link:
