
AutoMQ x Databend: Cloud Data Warehouse built in Rust with Kafka Ecosystem

Databend is a cloud-native data warehouse written in Rust, designed for cloud architectures and built on object storage. It provides enterprises with a robust big data analytics platform featuring an integrated lakehouse architecture and the separation of compute and storage.

This article outlines the steps to import data from AutoMQ into Databend using bend-ingest-kafka.

Environment Setup

Prepare Databend Cloud and Test Data

First, go to Databend Cloud, launch a Warehouse, and create a database and a test table in the worksheet.


create database automq_db;
create table users (
  id bigint NOT NULL,
  name string NOT NULL,
  ts timestamp,
  status string
);

Prepare AutoMQ and Test Data

Follow the Stand-alone Deployment guide to set up AutoMQ, ensuring there is network connectivity between AutoMQ and Databend.
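As a quick sanity check of that connectivity, you can list topics from the host that will later run the import job. The sketch below assumes the AutoMQ endpoint 10.0.96.4:9092 used in the examples in this article; if the command returns without a connection error, the broker is reachable.


# Connectivity check against the AutoMQ endpoint used in this article (adjust the address to your environment).
./kafka-topics.sh --list --bootstrap-server 10.0.96.4:9092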

Next, create a topic named example_topic in AutoMQ and write test JSON data to it by following the steps below.

Create a Topic

To set up a topic using Apache Kafka® command-line tools, first ensure that you have access to a Kafka environment and the Kafka service is active. Here's an example command to create a topic:


./kafka-topics.sh --create --topic example_topic --bootstrap-server 10.0.96.4:9092 --partitions 1 --replication-factor 1

When executing the command, replace `topic` and `bootstrap-server` with the actual topic name and Kafka server address being used.

Once the topic is created, use the command below to confirm that the topic was successfully established.


./kafka-topics.sh --describe --topic example_topic --bootstrap-server 10.0.96.4:9092

Generate Test Data

Create JSON-formatted test data that matches the table created earlier.


{
  "id": 1,
  "name": "test user",
  "ts": "2023-11-10T12:00:00",
  "status": "active"
}

Write Test Data

Write the test data to the `example_topic` topic using Kafka's command-line tools or programmatically. Here is an example using the command-line tool:


echo '{"id": 1, "name": "测试用户", "timestamp": "2023-11-10T12:00:00", "status": "active"}' | sh kafka-console-producer.sh --broker-list 10.0.96.4:9092 --topic example_topic

When executing the command, replace `topic` and `bootstrap-server` with the actual topic name and Kafka server address being used.

Use the following command to view the data just written to the topic:


sh kafka-console-consumer.sh --bootstrap-server 10.0.96.4:9092 --topic example_topic --from-beginning

Create a bend-ingest-kafka Job

bend-ingest-kafka monitors the Kafka topic and writes the incoming data into a Databend table in batches. It needs to be deployed on a host that can reach both AutoMQ and Databend Cloud before the import job can be started.
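
If the tool is not yet installed, the sketch below shows one common way to obtain it, assuming a local Go toolchain; the module path is taken from the project's repository and should be verified against its README.


# Assumed module path for bend-ingest-kafka; confirm against the project's README.
go install github.com/databendcloud/bend-ingest-kafka@latest

With the binary available, start the import job: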


bend-ingest-kafka --kafka-bootstrap-servers="localhost:9094" --kafka-topic="example_topic" --kafka-consumer-group="Consumer Group" --databend-dsn="https://cloudapp:password@host:443" --databend-table="automq_db.users" --data-format="json" --batch-size=5 --batch-max-interval=30s

When executing the command, replace `kafka-bootstrap-servers` with the actual Kafka server address being used, and `databend-dsn` with your own Databend Cloud connection string.

Parameter Description

databend-dsn

The DSN for connecting to the warehouse, provided by Databend Cloud, is detailed in this document.

batch-size

bend-ingest-kafka accumulates data up to the specified batch-size before initiating a data synchronization.

Validate Data Import

Open the Databend Cloud worksheet and query the automq_db.users table to verify that the data has been synchronized from AutoMQ to the Databend table, for example:
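

select * from automq_db.users;

If everything is configured correctly, the record written to example_topic should appear in the query result.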