Databend
Databend is a next-generation cloud-native data warehouse developed in Rust and designed specifically for cloud architectures. Built on object storage, it separates compute from storage and gives businesses seamless integration of data lakes and warehouses on a single big data analytics platform.
This document will guide you on how to import data from AutoMQ into Databend using bend-ingest-kafka.
Environment Preparation
Prepare Databend Cloud and Test Data
First, visit Databend Cloud to create a Warehouse. Then create the test database and table in the worksheet:
create database automq_db;
create table automq_db.users (
  id bigint NOT NULL,
  name string NOT NULL,
  ts timestamp,
  status string
);
Prepare AutoMQ and Test Data
Refer to Deploy Multi-Nodes Cluster on Linux to set up AutoMQ, making sure there is network connectivity between AutoMQ and Databend.
Quickly create a topic named example_topic in AutoMQ and write test JSON data into it by following these steps.
Create Topic
Use the Apache Kafka command-line tools to create a topic. Ensure you have access to a Kafka environment and that the Kafka service is running. Below is an example command for creating a topic:
./kafka-topics.sh --create --topic example_topic --bootstrap-server 10.0.96.4:9092 --partitions 1 --replication-factor 1
When executing the command, replace topic with your actual topic name and bootstrap-server with your actual Kafka server address.
After creating the topic, you can use the following command to verify whether the topic has been successfully created.
./kafka-topics.sh --describe --topic example_topic --bootstrap-server 10.0.96.4:9092
Generate Test Data
Generate test data in JSON format that matches the table created earlier; the JSON field names must match the table's column names.
{
  "id": 1,
  "name": "Test User",
  "ts": "2023-11-10T12:00:00",
  "status": "active"
}
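If you want more than a single record, a small shell loop can generate a batch of test rows to a file; this is just a convenience sketch (the file name and value patterns are arbitrary), and the output can later be piped into the producer shown in the next step:
# Generate five JSON records whose fields match the automq_db.users columns
for i in $(seq 1 5); do
  echo "{\"id\": $i, \"name\": \"Test User $i\", \"ts\": \"2023-11-10T12:00:0$i\", \"status\": \"active\"}"
done > test_data.jsonl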
Write Test Data
Use Kafka's command-line tools or a Kafka client library to write the test data into the topic named example_topic. Here is an example using the command-line tools:
echo '{"id": 1, "name": "Test User", "ts": "2023-11-10T12:00:00", "status": "active"}' | ./kafka-console-producer.sh --bootstrap-server 10.0.96.4:9092 --topic example_topic
When executing the command, replace topic and bootstrap-server with your actual topic name and Kafka server address.
You can view the recently written topic data using the following command:
./kafka-console-consumer.sh --bootstrap-server 10.0.96.4:9092 --topic example_topic --from-beginning
Create a bend-ingest-kafka Job
bend-ingest-kafka is designed to monitor Kafka and batch-write data into a Databend table. After deploying bend-ingest-kafka, the data import job can be started.
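If bend-ingest-kafka is not installed yet, one common way to deploy it, assuming a local Go toolchain is available (verify the repository path against the official documentation), is:
go install github.com/databendcloud/bend-ingest-kafka@latest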
bend-ingest-kafka --kafka-bootstrap-servers="localhost:9094" --kafka-topic="example_topic" --kafka-consumer-group="Consumer Group" --databend-dsn="https://cloudapp:password@host:443" --databend-table="automq_db.users" --data-format="json" --batch-size=5 --batch-max-interval=30s
When executing the command, replace kafka-bootstrap-servers with your actual Kafka server address and databend-dsn with the DSN of your Databend Cloud warehouse.
Parameter Description
databend-dsn: The DSN provided by Databend Cloud for connecting to the warehouse; see the Databend Cloud documentation for how to obtain it.
batch-size: bend-ingest-kafka gathers data until it reaches the specified batch size, then triggers a data synchronization.
Verify Data Import
Navigate to the Databend Cloud worksheet and query the automq_db.users table to confirm that the data has been synchronized from AutoMQ to the Databend table.
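For example, run a simple query in the worksheet and check that the test record written earlier is returned:
select * from automq_db.users;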
