Over more than ten years of rapid development, open source big data technology has seen a wide range of technologies rise and evolve. How can data processing and visualization be used to draw deep insights into the past, present, and future of open source big data technology from massive data? How can developers be given a useful reference for learning, selecting, and developing technologies in this field? With these questions in mind, the Open Atom Open Source Foundation, X-Lab Open Lab, and Alibaba Open Source Committee jointly initiated the “2022 Open Source Big Data Heat Report” project.

Project Description

The “2022 Open Source Big Data Heat Report” collects relevant public data for correlation analysis, draws a heat map of the big data technology stack from core indicators such as Star, Issue, and open PR, and studies the technical trends of open source big data as it enters a new stage, as well as how the operating model of open source communities boosts those trends. The research proceeds in seven stages: preliminary screening of public data -> technical classification of projects -> expert review -> shortlist announcement and call for corrections -> heat value calculation and correlation analysis -> data insight and project research -> report review.

Data Sources

GitHub and Jira public data from January 2015 to September 2022, including project ID, Star, Issue, open PR, review comment, merged PR, etc.
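This announcement does not state how the heat value is computed from these indicators. Purely as an illustration, the sketch below assumes a simple weighted sum over the fields listed above. The `ProjectActivity` record, the weights, and the function names are all hypothetical and are not the report's method.

```python
from dataclasses import dataclass


@dataclass
class ProjectActivity:
    """Public activity counts for one repository over the reporting window."""
    project: str
    stars: int
    issues: int
    open_prs: int
    review_comments: int
    merged_prs: int


# Hypothetical weights, chosen only for illustration. The report's actual
# weighting is not given in this announcement.
ILLUSTRATIVE_WEIGHTS = {
    "stars": 1.0,
    "issues": 2.0,
    "open_prs": 3.0,
    "review_comments": 4.0,
    "merged_prs": 5.0,
}


def heat_score(activity: ProjectActivity, weights=ILLUSTRATIVE_WEIGHTS) -> float:
    """Return the weighted sum of the activity counts."""
    return sum(w * getattr(activity, name) for name, w in weights.items())


def rank_by_heat(activities):
    """Sort projects from highest to lowest illustrative heat score."""
    return sorted(activities, key=heat_score, reverse=True)


# Made-up sample counts for two placeholder repositories.
sample = [
    ProjectActivity("example/alpha", stars=10, issues=5, open_prs=2,
                    review_comments=1, merged_prs=1),
    ProjectActivity("example/beta", stars=3, issues=1, open_prs=8,
                    review_comments=6, merged_prs=4),
]
ranking = [a.project for a in rank_by_heat(sample)]
```

With these made-up counts, `example/beta` ranks first because the illustrative weights favor pull request and review activity over stars; a different weighting would change the order.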

Data screening

The project initially screened open source big data projects on GitHub that carry at least one of the following Topic Tags:

Topic Tag: big-data, etl, data-ingestion, data-collection, data-pipeline, data-analysis, data-analytics, analytics, data visualization, business-intelligence, data science, data-engineering
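The tag screening above can be sketched as a set membership test. This is a minimal sketch, not the project's actual tooling: the function names are hypothetical, and it assumes a repository's topics are already available as a list of strings. The source writes two tags with spaces ("data visualization", "data science"); since GitHub topic names use hyphens rather than spaces, the sketch normalizes them to the hyphenated form, which is an assumption.

```python
# Topic Tags from the screening list, in hyphenated form (the two tags written
# with spaces in the source are normalized here as an assumption).
SCREENING_TAGS = {
    "big-data", "etl", "data-ingestion", "data-collection", "data-pipeline",
    "data-analysis", "data-analytics", "analytics", "data-visualization",
    "business-intelligence", "data-science", "data-engineering",
}


def normalize_topic(topic: str) -> str:
    """Lowercase a topic and replace spaces with hyphens."""
    return topic.strip().lower().replace(" ", "-")


def passes_tag_screening(repo_topics) -> bool:
    """Return True if at least one repository topic is a screening tag."""
    return any(normalize_topic(t) in SCREENING_TAGS for t in repo_topics)
```

For example, `passes_tag_screening(["java", "etl"])` is true, while `passes_tag_screening(["web", "frontend"])` is false.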

Technical classification

The initially screened projects are classified according to the framework of the modern big data technology stack. The technical categories are:

Data integration, stream processing, data storage, data query and analysis, data development, data scheduling and orchestration, data management/security/middleware, data visualization.

Notes:

  • The data query and analysis category focuses on big data analytics projects and excludes OLTP databases, HTAP databases, and NoSQL databases with OLTP capabilities
  • The data visualization category requires data source connection and processing capabilities and excludes visualization framework and tool projects
  • Data management, security, and middleware have few projects with overlapping functions, so they are grouped into one category
  • This report focuses on the field of big data and excludes projects that integrate big data with AI

Project announcement

The 92 shortlisted projects are announced below. The public review period runs from October 10 to October 16, 2022.

Project names by technical classification

Data integration:

  • airbytehq/airbyte
  • alibaba/DataX
  • apache/camel
  • apache/flume
  • apache/incubator-seatunnel
  • apache/inlong
  • apache/sqoop
  • dbt-labs/dbt-core
  • debezium/debezium
  • ververica/flink-cdc-connectors

Stream processing:

  • apache/beam
  • apache/flink
  • apache/incubator-heron
  • apache/incubator-streampark
  • apache/kafka
  • apache/pulsar
  • apache/samza
  • apache/storm

Data query and analysis:

  • apache/arrow-datafusion
  • apache/calcite
  • apache/cassandra
  • apache/doris
  • apache/drill
  • apache/druid
  • apache/hawq
  • apache/hbase
  • apache/hive
  • apache/impala
  • apache/incubator-kyuubi
  • apache/kylin
  • apache/lucene
  • apache/phoenix
  • apache/pig
  • apache/pinot
  • apache/solr
  • apache/spark
  • apache/tez
  • ClickHouse/ClickHouse
  • duckdb/duckdb
  • elastic/elasticsearch
  • eventql/eventql
  • greenplum-db/gpdb
  • opensearch-project/OpenSearch
  • prestodb/presto
  • StarRocks/starrocks
  • trinodb/trino
  • uber/aresdb

Data storage:

  • apache/avro
  • apache/bookkeeper
  • apache/carbondata
  • apache/hadoop-hdfs
  • apache/hudi
  • apache/iceberg
  • apache/incubator-pegasus
  • apache/kudu
  • apache/ozone
  • apache/parquet-format
  • delta-io/delta
  • hazelcast/hazelcast
  • juicedata/juicefs

Data management/security/middleware:

  • apache/ambari
  • apache/arrow
  • apache/atlas
  • apache/bigtop
  • apache/hadoop
  • apache/knox
  • apache/ranger
  • cube-js/cube.js
  • datahub-project/datahub

Data development:

  • apache/incubator-devlake
  • apache/zeppelin
  • jupyter/notebook
  • pachyderm/pachyderm

Data visualization:

  • apache/superset
  • dataease/dataease
  • edp963/davinci
  • elastic/kibana
  • getredash/redash
  • grafana/grafana
  • keplergl/kepler.gl
  • metabase/metabase
  • shzlw/poli

Data scheduling and orchestration:

  • Alluxio/alluxio
  • apache/airflow
  • apache/dolphinscheduler
  • apache/incubator-linkis
  • apache/nifi
  • apache/oozie
  • apache/zookeeper
  • dagster-io/dagster
  • kestra-io/kestra
  • PrefectHQ/prefect

Supplementary Call for Other Projects

If you follow open source projects and know of a project that is not in the list above but meets the following criteria, you can scan the QR code below to submit it during the public review period.

Participation Criteria:

1. The project is an open source big data project with a clear open source license and complete documentation, and has released a new version within the past half year

2. The project carries at least one of the following Topic Tags on GitHub: big-data, etl, data-ingestion, data-collection, data-pipeline, data-analysis, data-analytics, analytics, data visualization, business-intelligence, data science, data-engineering
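The two criteria above can be expressed as a small eligibility check. This is a sketch under stated assumptions, not the organizers' process: the `Submission` record and its field names are hypothetical, "half a year" is approximated as 183 days, and whether a license is "clear" or documentation is "complete" is taken as an input rather than computed.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumption: "within half a year" is approximated as 183 days.
HALF_YEAR = timedelta(days=183)


@dataclass
class Submission:
    """Hypothetical record describing a submitted project."""
    repo: str
    license_id: Optional[str]      # None if the project has no clear license
    has_complete_docs: bool        # judged by a reviewer, not computed here
    last_release: Optional[date]   # None if the project has no releases
    has_screening_tag: bool        # carries at least one listed Topic Tag


def is_eligible(s: Submission, as_of: date) -> bool:
    """Apply criterion 1 (license, docs, recent release) and criterion 2 (tag)."""
    if not s.license_id or not s.has_complete_docs:
        return False
    if s.last_release is None or as_of - s.last_release > HALF_YEAR:
        return False
    return s.has_screening_tag


# Placeholder submissions evaluated at the submission deadline.
deadline = date(2022, 10, 16)
recent = Submission("example/recent", "Apache-2.0", True, date(2022, 6, 1), True)
stale = Submission("example/stale", "Apache-2.0", True, date(2021, 12, 1), True)
```

Here `is_eligible(recent, deadline)` is true, while `is_eligible(stale, deadline)` is false because the last release is more than half a year before the deadline.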

How to participate:

Scan the QR code above to submit a project
Deadline: 24:00 on October 16, 2022

Release notice

The “Open Source Big Data Heat Report 2022” will be officially released at the Yunqi Conference in November 2022.

Special thanks

  • Co-sponsors: Open Atom Open Source Foundation, X-Lab Open Lab, Alibaba Open Source Committee
  • Strategic cooperation: Open Source China, InfoQ, Alibaba Cloud Developer Community
  • Cooperative media: CSDN, Datafun, SegmentFault
