“Open Source Big Data Heat Report 2022” Finalist Project Announcement

Over more than ten years of rapid development in open source big data technology, we have witnessed the rise and evolution of a wide range of technologies. How can data processing and visualization be used to draw deep insights into the past, present and future of open source big data technology from massive data? How can developers be given a useful reference for learning, selecting and developing technologies in this field? With these questions in mind, the Open Atom Open Source Foundation, X-Lab Open Lab, and Alibaba Open Source Committee jointly initiated the “2022 Open Source Big Data Heat Report” project.

Project Description

The “2022 Open Source Big Data Heat Report” collects relevant public data for correlation analysis, draws a heat map of the big data technology stack from core indicators such as Star, Issue, and open PR, and studies the technical trends of open source big data as it enters a new stage, as well as the boosting effect of the open source community's operating model on those trends. The project research follows 7 stages: preliminary screening of public data -> project technical classification -> expert review -> finalist announcement & solicitation of corrections -> heat value calculation and correlation analysis -> data insight and project research -> report review.

Data Sources

GitHub and Jira public data from January 2015 to September 2022, including project id, Star, Issue, open PR, review comment, merge PR, etc.

Data screening

The preliminary screening selected open source big data projects on GitHub that carry at least one of the following Topic Tags:

Topic Tag: big-data, etl, data-ingestion, data-collection, data-pipeline, data-analysis, data-analytics, analytics, data visualization, business-intelligence, data science, data-engineering
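
The Topic Tag screen above can be sketched in a few lines of Python. This is only an illustration of the filtering idea, not the report's actual tooling: the function names and the sample repositories are made up, and the two tags written with spaces in the list ("data visualization", "data science") are assumed to correspond to hyphenated GitHub topics, since GitHub topic names cannot contain spaces.

```python
# Topic Tags copied from the screening list above. The hyphenated forms of
# "data visualization" and "data science" are an assumption (see lead-in).
SCREENING_TOPICS = frozenset({
    "big-data", "etl", "data-ingestion", "data-collection", "data-pipeline",
    "data-analysis", "data-analytics", "analytics", "data-visualization",
    "business-intelligence", "data-science", "data-engineering",
})


def normalize_topic(topic):
    """Lower-case a topic and join whitespace-separated words with hyphens."""
    return "-".join(topic.strip().lower().split())


def passes_topic_screen(repo_topics):
    """Return True if at least one of the repository's topics is on the list."""
    return any(normalize_topic(t) in SCREENING_TOPICS for t in repo_topics)


# Hypothetical repositories, for illustration only.
candidates = {
    "example-org/stream-engine": ["big-data", "streaming"],
    "example-org/web-framework": ["http", "router"],
}
shortlist = sorted(
    name for name, topics in candidates.items() if passes_topic_screen(topics)
)
print(shortlist)
```

Only the first sample repository carries a tag from the list, so it is the only one that would move on to technical classification.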

Technical classification

The projects from the preliminary screening are classified according to the framework of the modern big data technology stack. Technical categories include:

Data integration, stream processing, data storage, data query and analysis, data development, data scheduling and orchestration, data management/security/middleware, data visualization.

Notes:

The data query and analysis category focuses on big data analytics projects and excludes OLTP databases, HTAP databases, and NoSQL databases with OLTP capabilities.
The data visualization category requires data source connection and processing capabilities and excludes projects that are only visualization frameworks or tools.
The data management, security, and middleware areas each contain few projects with overlapping functions, so they are grouped into one category.
This report focuses on the field of big data and excludes projects that integrate big data with AI.

Project announcement

The 92 shortlisted projects are announced below. The publicity period runs from October 10 to October 16, 2022.

Technical classification	Project name
Data integration	airbytehq/airbyte alibaba/DataX apache/camel apache/flume apache/incubator-seatunnel apache/inlong apache/sqoop dbt-labs/dbt-core debezium/debezium ververica/flink-cdc-connectors
Stream processing	apache/beam apache/flink apache/incubator-heron apache/incubator-streampark apache/kafka apache/pulsar apache/samza apache/storm
Data query and analysis	apache/arrow-datafusion apache/calcite apache/cassandra apache/doris apache/drill apache/druid apache/hawq apache/hbase apache/hive apache/impala apache/incubator-kyuubi apache/kylin apache/lucene apache/phoenix apache/pig apache/pinot apache/solr apache/spark apache/tez ClickHouse/ClickHouse duckdb/duckdb elastic/elasticsearch eventql/eventql greenplum-db/gpdb opensearch-project/OpenSearch prestodb/presto StarRocks/starrocks trinodb/trino uber/aresdb
Data storage	apache/avro apache/bookkeeper apache/carbondata apache/hadoop-hdfs apache/hudi apache/iceberg apache/incubator-pegasus apache/kudu apache/ozone apache/parquet-format delta-io/delta hazelcast/hazelcast juicedata/juicefs
Data management/security/middleware	apache/ambari apache/arrow apache/atlas apache/bigtop apache/hadoop apache/knox apache/ranger cube-js/cube.js datahub-project/datahub
Data development	apache/incubator-devlake apache/zeppelin jupyter/notebook pachyderm/pachyderm
Data visualization	apache/superset dataease/dataease edp963/davinci elastic/kibana getredash/redash grafana/grafana keplergl/kepler.gl metabase/metabase shzlw/poli
Data scheduling and orchestration	Alluxio/alluxio apache/airflow apache/dolphinscheduler apache/incubator-linkis apache/nifi apache/oozie apache/zookeeper dagster-io/dagster kestra-io/kestra PrefectHQ/prefect

Supplementary Call for Other Projects

If you are also an open source enthusiast and a well-known project you know of is missing from the list above but meets the following criteria, you can scan the QR code below to submit it during the publicity period.

Participation Criteria:

1. Open source big data projects with a clear open source license and complete documentation, and with a new version released within the past half year

2. At least one of the following Topic Tags on GitHub: big-data, etl, data-ingestion, data-collection, data-pipeline, data-analysis, data-analytics, analytics, data visualization, business-intelligence, data science, data-engineering
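
As a rough illustration, the two criteria above could be checked mechanically along the following lines. Everything named here is hypothetical: the `Submission` record, its fields, and the sample repository are invented for the sketch, and "half a year" is read as 183 days because the announcement gives no exact figure.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: "half a year" is taken as 183 days.
RELEASE_WINDOW = timedelta(days=183)


@dataclass
class Submission:
    """Hypothetical record of a submitted project (field names are illustrative)."""
    repo: str
    has_license: bool
    has_complete_docs: bool
    last_release: date
    topics: frozenset


def unmet_criteria(sub, allowed_topics, as_of):
    """Return the participation criteria that the submission fails, if any."""
    problems = []
    if not sub.has_license:
        problems.append("no clear open source license")
    if not sub.has_complete_docs:
        problems.append("documentation incomplete")
    if as_of - sub.last_release > RELEASE_WINDOW:
        problems.append("no release within half a year")
    if not sub.topics & allowed_topics:
        problems.append("no qualifying Topic Tag")
    return problems


# A few of the Topic Tags from criterion 2, shortened for brevity.
ALLOWED = frozenset({"big-data", "etl", "data-pipeline", "analytics"})

sub = Submission(
    repo="example-org/example-etl",  # made-up repository
    has_license=True,
    has_complete_docs=True,
    last_release=date(2022, 1, 15),
    topics=frozenset({"etl"}),
)
print(unmet_criteria(sub, ALLOWED, as_of=date(2022, 10, 16)))
```

Returning the list of failed criteria, rather than a bare yes/no, makes it easier to tell a submitter what is missing. In this sample the last release is about nine months before the deadline, so only the release criterion is reported.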

How to participate:

Scan the QR code above to participate in the solicitation
Deadline: 24:00 on October 16, 2022

Release notice

The “Open Source Big Data Heat Report 2022” will be officially released at the Yunqi Conference in November 2022.

Special thanks

Co-sponsors: Open Atom Open Source Foundation, X-Lab Open Lab, Alibaba Open Source Committee
Strategic cooperation: Open Source China, InfoQ, Alibaba Cloud Developer Community
Cooperative media: CSDN, Datafun, SegmentFault
