Databend v0.8 has been released. Databend is a modern cloud data warehouse written in Rust. It aims to deliver high-performance, elastically scalable real-time data analytics and to unlock the potential of users' data.

According to the announcement, development of Databend v0.8 started on March 28 and has accumulated more than 5,000 commits and 4,600 file changes. Over the past five months, more than 120 community contributors added 420,000 lines of code and deleted 160,000 lines, which is roughly equivalent to rewriting Databend. In this release, the community made significant improvements to the SQL Planner framework and migrated all SQL statements to the new Planner, providing full JOIN and subquery support.

Download link: https://github.com/datafuselabs/databend/releases/tag/v0.8.0-nightly

Major improvements

  • New Planner: JOIN! JOIN! JOIN!

To better support complex SQL queries and improve the user experience, Databend v0.8 introduces a newly designed Planner framework.

Powered by the new Planner, Databend adds JOIN support and efficient subquery support. All subqueries are fully decorrelated before they reach the runtime:


select vip_info.Client_ID, vip_info.Region
    from vip_info
    right join purchase_records
    on vip_info.Client_ID = purchase_records.Client_ID;
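
One common way a planner removes a correlated subquery is to rewrite it as a join, a step usually called decorrelation. The sketch below shows the idea by hand, using Python's built-in sqlite3 module rather than Databend itself. The table names come from the example above; the Amount column, the sample rows, and the rewritten query are assumptions made for illustration and do not show what Databend's planner actually emits.

```python
import sqlite3

# In-memory SQLite database standing in for a real warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vip_info (Client_ID INTEGER, Region TEXT);
    CREATE TABLE purchase_records (Client_ID INTEGER, Amount INTEGER);
    INSERT INTO vip_info VALUES (1, 'East'), (2, 'West'), (3, 'North');
    INSERT INTO purchase_records VALUES (1, 50), (1, 70), (3, 20);
""")

# Correlated form: the inner query refers to the outer row (v.Client_ID).
correlated = """
    SELECT v.Client_ID, v.Region
    FROM vip_info AS v
    WHERE EXISTS (SELECT 1 FROM purchase_records AS p
                  WHERE p.Client_ID = v.Client_ID)
    ORDER BY v.Client_ID
"""

# Hand-decorrelated form: the inner query no longer depends on the outer row,
# so it can be evaluated once and joined.
decorrelated = """
    SELECT v.Client_ID, v.Region
    FROM vip_info AS v
    JOIN (SELECT DISTINCT Client_ID FROM purchase_records) AS p
      ON p.Client_ID = v.Client_ID
    ORDER BY v.Client_ID
"""

rows_correlated = conn.execute(correlated).fetchall()
rows_decorrelated = conn.execute(decorrelated).fetchall()
print(rows_correlated == rows_decorrelated)
```

Both queries return the VIP clients that have at least one purchase. The second form is the one a join-based executor can run without re-evaluating the inner query per outer row.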

  • New Parser: The best Parser ever!

While refactoring the Planner, the Databend community built a new Parser on top of nom (https://github.com/Geal/nom) and pratt, balancing development efficiency and user experience.

The new Parser lets developers design, develop, and test complex SQL syntax in an intuitive way:


COPY
    ~ INTO ~ #copy_unit
    ~ FROM ~ #copy_unit
    ~ ( FILES ~ "=" ~ "(" ~ #comma_separated_list0(literal_string) ~ ")")?
    ~ ( PATTERN ~ "=" ~ #literal_string)?
    ~ ( FILE_FORMAT ~ "=" ~ #options)?
    ~ ( VALIDATION_MODE ~ "=" ~ #literal_string)?
    ~ ( SIZE_LIMIT ~ "=" ~ #literal_u64)?
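
The rule above reads as a sequence of required tokens followed by optional clauses. As a rough illustration of that parser-combinator style, here is a minimal Python sketch that handles a reduced COPY grammar. It is not nom, not Databend's parser, and covers only two of the optional clauses; the helper names (token, keyword, seq, opt, parse_copy) are hypothetical.

```python
import re

def token(pattern):
    """Build a parser that skips whitespace, then matches a regex."""
    rx = re.compile(r"\s*(" + pattern + r")", re.IGNORECASE)
    def parse(text, pos):
        m = rx.match(text, pos)
        return (m.group(1), m.end()) if m else None
    return parse

def keyword(word):
    """A token that must end on a word boundary."""
    return token(re.escape(word) + r"\b")

def seq(*parsers):
    """Run parsers one after another; fail if any of them fails."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse

def opt(parser):
    """Make a parser optional: on failure, consume nothing and yield None."""
    def parse(text, pos):
        result = parser(text, pos)
        return result if result is not None else (None, pos)
    return parse

ident = token(r"[A-Za-z_][A-Za-z0-9_]*")
literal_string = token(r"'[^']*'")
literal_u64 = token(r"\d+")

# Reduced grammar: COPY INTO <ident> FROM <string> (PATTERN = <string>)? (SIZE_LIMIT = <u64>)?
copy_stmt = seq(
    keyword("COPY"), keyword("INTO"), ident, keyword("FROM"), literal_string,
    opt(seq(keyword("PATTERN"), token("="), literal_string)),
    opt(seq(keyword("SIZE_LIMIT"), token("="), literal_u64)),
)

def parse_copy(sql):
    """Return a dict describing the statement, or None if it does not parse."""
    result = copy_stmt(sql, 0)
    if result is None or sql[result[1]:].strip():
        return None
    parts = result[0]
    parsed = {"into": parts[2], "from": parts[4].strip("'")}
    if parts[5]:
        parsed["pattern"] = parts[5][2].strip("'")
    if parts[6]:
        parsed["size_limit"] = int(parts[6][2])
    return parsed

print(parse_copy("COPY INTO t FROM 's3://b/x.csv' SIZE_LIMIT = 10"))
```

Each optional clause is one more `opt(seq(...))` entry, which is what makes grammars written this way easy to extend and test clause by clause.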

The new Parser also gives users specific, accurate error messages:


MySQL [(none)]> select number from numbers(10) as t inner join numbers(30) as t1 using(number);
ERROR 1105 (HY000): Code: 1065, displayText = error:
  --> SQL:1:8
  |
1 | select number from numbers(10) as t inner join numbers(30) as t1 using(number)
  |        ^^^^^^ column reference is ambiguous

Never worry about not knowing where your SQL went wrong again.
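
Messages in this style only need the byte offset and length of the offending token. The following Python sketch shows one way to render such a message; the function name render_error and its parameters are hypothetical, and the layout simply imitates the output shown above rather than reproducing Databend's implementation.

```python
def render_error(sql, start, length, message):
    """Render a caret-underlined error for the span sql[start:start+length]."""
    # Locate the line that contains the span (1-based line and column).
    line_no = sql.count("\n", 0, start) + 1
    line_start = sql.rfind("\n", 0, start) + 1
    line_end = sql.find("\n", start)
    if line_end == -1:
        line_end = len(sql)
    column = start - line_start + 1
    source_line = sql[line_start:line_end]

    # The gutter is as wide as the line number so that the bars line up.
    gutter = " " * len(str(line_no))
    return "\n".join([
        f"{gutter} --> SQL:{line_no}:{column}",
        f"{gutter} |",
        f"{line_no} | {source_line}",
        f"{gutter} | {' ' * (column - 1)}{'^' * length} {message}",
    ])

sql = "select number from numbers(10) as t inner join numbers(30) as t1 using(number)"
report = render_error(sql, sql.index("number"), len("number"), "column reference is ambiguous")
print(report)
```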

For more details, see The New Databend SQL Planner (https://databend.rs/blog/new-planner).

New features

In addition to the newly designed Planner, the Databend community has implemented numerous new features:

COPY enhancements:

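The COPY capability has been greatly enhanced. As the three statements below show, Databend can now load files from an HTTPS URL, load compressed files from S3, and export a table to Azure Blob storage as Parquet: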


COPY 
    INTO ontime200 
    FROM 'https://repo.databend.rs/dataset/stateful/ontime_2006_[200-300].csv' 
    FILE_FORMAT = (TYPE = 'CSV')


COPY 
    INTO ontime200 
    FROM 's3://bucket/dataset/stateful/ontime.csv.gz' 
    FILE_FORMAT = (TYPE = 'CSV' COMPRESSION=AUTO)


COPY 
    INTO 'azblob://bucket/'  
    FROM ontime200
    FILE_FORMAT = (TYPE = 'PARQUET')

Hive support:

Databend v0.8 introduces Multi Catalog and, building on it, adds Hive Metastore support.

Databend can now connect directly to Hive and read data from HDFS:


select * from hive.default.customer_p2 order by c_nation;

Time travel:

Some time ago, the Databend community described the design of the underlying Fuse Engine in the post From Git to Fuse Engine (https://databend.rs/blog/databend-engine). One of its most important features is time travel, which lets us query a table as it was at any point in time.

As of v0.8, this feature is officially available. We can now query a table as of a given timestamp and restore a table that has been dropped:


-- Travel to the time when the last row was inserted
select * from demo at (TIMESTAMP => '2022-06-22 08:58:54.509008'::TIMESTAMP); 
+----------+
| c        |
+----------+
| batch1.1 |
| batch1.2 |
| batch2.1 |
+----------+


DROP TABLE test;

SELECT * FROM test;
ERROR 1105 (HY000): Code: 1025, displayText = Unknown table 'test'.

-- un-drop table
UNDROP TABLE test;

-- check
SELECT * FROM test;
+------+------+
| a    | b    |
+------+------+
|    1 | a    |
+------+------+

Make business data more secure!
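
The timestamp query above can be understood through a toy model: every write produces a new immutable snapshot, and a query "at" some time reads the latest snapshot that is not newer than that time. The Python sketch below illustrates only this idea. The SnapshotTable class, its methods, and every timestamp and row other than those shown in the example above are hypothetical, and the real Fuse Engine is considerably more involved.

```python
from bisect import bisect_right
from datetime import datetime

class SnapshotTable:
    """Toy append-only table that keeps one full snapshot per insert."""

    def __init__(self):
        self._timestamps = []  # commit times, assumed to arrive in increasing order
        self._snapshots = []   # snapshot i holds all rows visible at _timestamps[i]

    def insert(self, at, rows):
        previous = self._snapshots[-1] if self._snapshots else ()
        self._timestamps.append(at)
        self._snapshots.append(previous + tuple(rows))

    def select_at(self, at):
        # Pick the newest snapshot committed at or before the requested time.
        index = bisect_right(self._timestamps, at)
        return list(self._snapshots[index - 1]) if index else []

demo = SnapshotTable()
demo.insert(datetime(2022, 6, 22, 8, 58, 40), ["batch1.1", "batch1.2"])
demo.insert(datetime(2022, 6, 22, 8, 58, 54, 509008), ["batch2.1"])
demo.insert(datetime(2022, 6, 22, 9, 5, 0), ["batch3.1"])

print(demo.select_at(datetime(2022, 6, 22, 8, 58, 54, 509008)))
```

Because old snapshots are never modified, reading the past is just a lookup, and restoring a dropped table amounts to making its latest snapshot reachable again.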

CTE support:

A CTE (Common Table Expression) is frequently used in OLAP workloads. It defines a temporary result set that exists only within the execution scope of a single statement. CTEs allow query fragments to be reused, improve readability, and make complex queries easier to write.

Databend v0.8 reimplements CTEs on top of the new Planner, and users can now use WITH to declare them:


WITH customers_in_quebec
     AS (SELECT customername,
                city
         FROM   customers
         WHERE  province = 'Québec')
SELECT customername
FROM   customers_in_quebec
WHERE  city = 'Montréal'
ORDER  BY customername;
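
To see that a CTE is only a named, statement-scoped result set, the query can be compared with the same logic written as an inline subquery. The sketch below does this with Python's built-in sqlite3 module instead of Databend; the sample rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database with made-up sample data (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customername TEXT, city TEXT, province TEXT);
    INSERT INTO customers VALUES
        ('Alice', 'Montréal', 'Québec'),
        ('Bob', 'Québec City', 'Québec'),
        ('Chloé', 'Montréal', 'Québec'),
        ('Dan', 'Toronto', 'Ontario');
""")

with_cte = """
    WITH customers_in_quebec AS (
        SELECT customername, city FROM customers WHERE province = 'Québec'
    )
    SELECT customername FROM customers_in_quebec
    WHERE city = 'Montréal'
    ORDER BY customername
"""

# Same logic with the CTE body inlined as a derived table.
with_subquery = """
    SELECT customername
    FROM (SELECT customername, city FROM customers WHERE province = 'Québec') AS q
    WHERE city = 'Montréal'
    ORDER BY customername
"""

names_cte = [row[0] for row in conn.execute(with_cte)]
names_subquery = [row[0] for row in conn.execute(with_subquery)]
print(names_cte == names_subquery)
```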

In addition to the features above, Databend v0.8 also supports UDFs, adds the DELETE statement, and further strengthens support for semi-structured data types, along with a large number of SQL statement improvements and new methods.

Quality improvement

Implementing features is only the first step in delivering a product. In Databend v0.8, the community introduced the concept of engineering quality, which evaluates Databend's development quality along three dimensions: users, contributors, and community.

Reassure users:

To let users rely on Databend with confidence, the community added a large number of tests over the past three months. It enriched the stateless test suite with cases extracted from YDB and other projects, added stateful tests on datasets such as ontime and hits, brought SQL Logic Tests online to cover all interfaces, and enabled SQL Fuzz to cover edge cases.

In addition, the community launched Databend Perf (https://perf.databend.rs/) to run continuous performance tests of Databend in a production environment and catch unexpected performance regressions early.

Make contributors comfortable:

Databend is a large Rust project that has been criticized by the community for its build time.

To address this and make contributors more comfortable, the community launched a high-spec, specially tuned self-hosted runner for PR integration tests, and adopted services and tools such as Mergify, mold, and dev-tools to optimize the CI process.

The community also started a new plan to restructure the Databend project, splitting the original huge query crate into multiple sub-crates so that, as far as possible, changing a single line of code no longer means waiting five minutes for a check to run.

Make the community happy:

Databend is both a contributor to and a participant in the open source community. During the development of v0.8, the Databend community established an Upstream First principle: it actively tracks and adopts the latest upstream releases, reports known bugs, contributes its own patches, and opened the issue Tracking issues of upstream first violation (https://github.com/datafuselabs/databend/issues/6926) to keep up with the latest developments.

The Databend community is also actively exploring integration with other open source projects, and has already added integration with or support for third-party tools and drivers such as Vector, sqlalchemy, and clickhouse-driver.

Next steps

Databend v0.8 is a solid foundation. The new Planner makes it easier to implement features and optimizations. In v0.9, we expect to improve the following areas:

  • Query Result Cache
  • JSON Optimization
  • Table Share
  • Processor Profiling
  • Resource Quota
  • Data Caching

For the latest news, see Release proposal: Nightly v0.9 (https://github.com/datafuselabs/databend/issues/7052).

Try it now

Visit the release log and download the latest version to learn more. If you run into problems, please submit feedback through GitHub Issues (https://github.com/datafuselabs/databend/issues).

About Databend

Databend is an open source, elastic, low-cost, next-generation data warehouse that can also run real-time analytics on object storage. We look forward to your attention as we explore cloud-native data warehouse solutions together and build a new generation of open source Data Cloud.
