Apache Druid 24.0.0 released – News Fast Delivery

Apache Druid is a distributed data processing system that supports real-time, multi-dimensional OLAP analysis. It supports both high-speed real-time data ingestion and flexible, real-time multi-dimensional analytical queries. Druid's most common use case is therefore flexible, fast multi-dimensional OLAP analysis on big data.

In addition, Druid has a key feature: it supports timestamp-based pre-aggregation during ingestion and aggregation analysis of data, so users often apply it to time-series data processing and analysis.

Apache Druid 24.0.0 has now been released. This version contains over 300 new features, bug fixes, performance enhancements, documentation improvements, and additional tests from 67 contributors. Here are some of the new features:

Multi-stage query task engine

SQL-based ingestion for Apache Druid uses a distributed multi-stage query architecture that includes a query engine called the multi-stage query task engine (MSQ task engine). The MSQ task engine extends Druid's query capabilities so that you can write queries that reference external data and perform ingestion using SQL INSERT and REPLACE.

As of Druid 24.0.0, SQL-based ingestion using the MSQ task engine is the recommended solution. Alternative ingestion solutions, such as native batch and Hadoop-based ingestion, are still supported.

Refer to:

#12524
#12386
#12523
#12589
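
As a rough illustration of what SQL-based ingestion looks like, the sketch below assembles an INSERT statement that reads external JSON data through the EXTERN table function and wraps it in a request body. The datasource name, URI, and column list are hypothetical, and the assumption that the first column holds an ISO 8601 timestamp string is made only for this example. Such a body would typically be sent to the SQL task endpoint (`/druid/v2/sql/task`); the sketch only builds it and sends nothing.

```python
import json


def build_ingest_request(datasource, uri, columns):
    """Return a request body for an SQL-based ingestion query.

    `columns` is a list of (name, type) pairs describing the external data.
    Assumption for this sketch: the first column is an ISO 8601 timestamp
    string, and there are no single quotes in the URI or column names.
    """
    time_col = columns[0][0]

    # EXTERN takes three JSON strings: input source, input format, row signature.
    input_source = json.dumps({"type": "http", "uris": [uri]})
    input_format = json.dumps({"type": "json"})
    signature = json.dumps([{"name": n, "type": t} for n, t in columns])

    # Parse the timestamp into __time and pass the other columns through.
    select_list = [f'TIME_PARSE("{time_col}") AS __time']
    select_list += [f'"{name}"' for name, _ in columns[1:]]
    select_sql = ", ".join(select_list)

    sql = (
        f'INSERT INTO "{datasource}"\n'
        f"SELECT {select_sql}\n"
        f"FROM TABLE(EXTERN('{input_source}', '{input_format}', '{signature}'))\n"
        f"PARTITIONED BY DAY"
    )
    return {"query": sql}


if __name__ == "__main__":
    # Hypothetical datasource, URI, and schema, for illustration only.
    request = build_ingest_request(
        "events",
        "https://example.com/events.json",
        [("timestamp", "string"), ("page", "string"), ("added", "long")],
    )
    print(request["query"])
```

Replacing INSERT with REPLACE (plus an OVERWRITE clause) would follow the same shape; that variant is left out to keep the sketch short.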

Nested columns

Druid now supports storing nested data structures directly in the newly added COMPLEX<json> column type. COMPLEX<json> columns store a copy of the structured data in JSON format, along with specialized internal columns and indexes for nested literal values (of types STRING, LONG, and DOUBLE).

Refer to:

#12753
#12714
#12920
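
To make the idea of "internal columns for nested literal values" more concrete, here is a small conceptual sketch that walks a nested JSON-like value and lists each literal with its path and a type name. It is not Druid's implementation; the path notation and the choice to skip booleans and nulls are assumptions made for this example.

```python
def flatten_literals(value, path="$"):
    """Yield (path, type_name, literal) for each literal in a nested value.

    Conceptual only: booleans and nulls are skipped in this sketch.
    """
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten_literals(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten_literals(child, f"{path}[{index}]")
    elif isinstance(value, bool):
        # bool is a subclass of int in Python, so it is checked first and skipped.
        return
    elif isinstance(value, int):
        yield path, "LONG", value
    elif isinstance(value, float):
        yield path, "DOUBLE", value
    elif isinstance(value, str):
        yield path, "STRING", value


if __name__ == "__main__":
    # Hypothetical input row, for illustration only.
    row = {"user": {"name": "ann", "age": 31}, "scores": [9.5, 7.0]}
    for entry in flatten_literals(row):
        print(entry)
```

Each distinct path in the output corresponds, conceptually, to one nested field that could be stored and indexed on its own.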

Updated Java support

Java 11 is fully supported, with improved Java 17 support.

#12839

Query engine update

Updated query handling for column indexes and filters

The redesigned column indexes are very flexible and can model a variety of index types. A mechanism was added for building filters that use the updated indexes. It also allows other column implementations to implement the built-in index types, providing adapters so that their indexes can be used by the set of filters Druid currently provides.

#12388

Time filter operator

You can now use the Druid SQL operator TIME_IN_INTERVAL to filter query results based on time. Prefer TIME_IN_INTERVAL over the SQL BETWEEN operator when filtering by time. For more information, see Date and Time Functions.

#12662
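
One reason to prefer an interval-based filter is boundary handling. The sketch below models the difference in plain Python, assuming TIME_IN_INTERVAL treats the interval start as inclusive and the end as exclusive, while BETWEEN is inclusive on both ends. The helper names are hypothetical, and only the `start/end` form of an ISO 8601 interval is handled.

```python
from datetime import datetime


def time_in_interval(ts, interval):
    """Model of a half-open interval check: start <= ts < end.

    `interval` is an ISO 8601 interval in "start/end" form only.
    """
    start_text, end_text = interval.split("/")
    start = datetime.fromisoformat(start_text)
    end = datetime.fromisoformat(end_text)
    return start <= ts < end


def between(ts, low, high):
    """Model of SQL BETWEEN: inclusive on both ends."""
    return low <= ts <= high


if __name__ == "__main__":
    september = "2022-09-01/2022-10-01"
    first_of_october = datetime(2022, 10, 1)

    # The interval check excludes the end instant...
    print(time_in_interval(first_of_october, september))
    # ...while BETWEEN with the same bounds includes it.
    print(between(first_of_october, datetime(2022, 9, 1), datetime(2022, 10, 1)))
```

With half-open intervals, consecutive ranges such as September and October never count the same instant twice.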

Null values and the “in” filter

If the values array contains null, the “in” filter matches null values. This differs from the SQL IN filter, which does not match null values.

For more information, see Query Filters and SQL Data Types.

#12863
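
The difference can be modeled in a few lines of Python, with None standing in for null. This is a sketch of the semantics described above, not Druid code: the native-style filter treats null as an ordinary member of the values array, while the SQL-style predicate follows three-valued logic, where a comparison with null yields UNKNOWN (modeled as None) and a WHERE clause keeps only rows that evaluate to true.

```python
def native_in_filter(value, values):
    """Native-style "in" filter: null matches if the values array contains null."""
    return value in values


def sql_in(value, values):
    """SQL-style IN under three-valued logic.

    Returns True, False, or None (UNKNOWN). A WHERE clause keeps a row
    only when the result is True.
    """
    if value is None:
        return None  # NULL IN (...) is UNKNOWN
    if any(v is not None and v == value for v in values):
        return True
    # No match: UNKNOWN if the list contains NULL, otherwise False.
    return None if any(v is None for v in values) else False


if __name__ == "__main__":
    values = ["a", None]
    print(native_in_filter(None, values))  # the native-style filter matches null
    print(sql_in(None, values))            # SQL IN does not evaluate to true
```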

Virtual columns in search queries

Previously, search queries could only search on dimensions that exist in the data source. Search queries now also support virtual columns as a parameter in the query.

#12720

Optimizing simple MIN/MAX SQL queries on __time

Simple queries such as select max(__time) from ds now run as timeBoundary queries to take advantage of the time dimension ordering within a segment. You can set a feature flag to enable this feature.

#12472
#12491
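
The announcement does not name the feature flag. The sketch below assumes it is the SQL query context parameter `enableTimeBoundaryPlanning`, and builds a request body for the Druid SQL endpoint (`/druid/v2/sql`) with that parameter set. The datasource name is hypothetical, and nothing is sent over the network.

```python
import json


def build_max_time_request(datasource):
    """Return a SQL request body asking for the latest __time in a datasource.

    Assumption: `enableTimeBoundaryPlanning` is the context flag that lets
    the planner turn this query into a timeBoundary query.
    """
    return {
        "query": f'SELECT MAX(__time) FROM "{datasource}"',
        "context": {"enableTimeBoundaryPlanning": True},
    }


if __name__ == "__main__":
    # "ds" is a placeholder datasource name.
    print(json.dumps(build_max_time_request("ds"), indent=2))
```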

String aggregation result

First/last string aggregators now compare based on value only. Previously, the values of the first/last string aggregators were compared first by the __time column and then by value.

If you have an existing query and want to keep using both the __time column and the value, update the query to use ORDER BY MAX(timeCol).

#12773

Jackson serialization

Introduced and implemented new helper functions in JacksonUtils to enable reuse of SerializerProvider objects.

Additionally, backward compatibility for map-based rows in GroupByQueryToolChest is now disabled by default, which eliminates the need to copy the heavyweight ObjectMapper. A new configuration option allows administrators to explicitly enable backward compatibility.

#12468

Updated IPAddress Java library

A new IPAddress Java library dependency has been added to handle IP addresses. The library includes IPv6 support, and the IPv4 functions have been migrated to use it.

#11634

The release also includes many performance improvements. This is a big release; check out the update announcement for more details.
