Apache Doris
Apache Doris is a high-performance, real-time analytical database built on an MPP architecture, known for its ease of use and sub-second response times for queries on massive datasets. It supports both high-concurrency point queries and high-throughput complex analytical workloads. Consequently, Apache Doris is well-suited for use cases such as report analysis, ad-hoc queries, unified data warehouse construction, and data lake federation query acceleration. Users can build applications on top of it, such as user behavior analysis, A/B testing platforms, log search analysis, user profiling, and order analysis.
This document discusses how to import data from AutoMQ into Apache Doris using Apache Doris Routine Load. For detailed information about Routine Load, please refer to the Routine Load Fundamentals document.
Environment Preparation
Prepare Apache Doris and Test Data
Ensure that an operational Apache Doris cluster is available. For demonstration purposes, we have set up a test Apache Doris environment on Linux by following the Docker Deployment for Doris document.
Create a database and test table:
create database automq_db;
CREATE TABLE automq_db.users (
id bigint NOT NULL,
name string NOT NULL,
timestamp string NULL,
status string NULL
) DISTRIBUTED BY hash (id) PROPERTIES ('replication_num' = '1');
Prepare Kafka Command-line Tools
Download the latest TGZ package from AutoMQ Releases and extract it. Assuming the extraction directory is $AUTOMQ_HOME, this article will use the tools in $AUTOMQ_HOME/bin to create topics and generate test data.
Prepare AutoMQ and Test Data
Refer to AutoMQ Deploy Multi-Nodes Cluster on Linux to deploy a functional cluster, ensuring network connectivity between AutoMQ and Apache Doris.
Then create a topic named example_topic in AutoMQ and write a JSON test record to it by following the steps below.
Create Topic
Use the Apache Kafka command-line tool to create the topic. Make sure you have access to the Kafka environment and that the Kafka service is running. Here is an example command for creating a topic:
$AUTOMQ_HOME/bin/kafka-topics.sh --create --topic example_topic --bootstrap-server 127.0.0.1:9092 --partitions 1 --replication-factor 1
When executing the command, replace the topic name and the bootstrap-server value with your own topic and the actual AutoMQ Bootstrap Server address.
After creating the topic, you can use the following command to verify if the topic was successfully created.
$AUTOMQ_HOME/bin/kafka-topics.sh --describe --topic example_topic --bootstrap-server 127.0.0.1:9092
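When the check has to run inside a script rather than be read by eye, the topic list can be tested directly. The following is a minimal sketch, assuming $AUTOMQ_HOME is set as described above; the function name topic_exists is a hypothetical helper, not part of the AutoMQ or Kafka tooling. It relies only on the --list option of kafka-topics.sh.

```shell
# Hypothetical helper: succeed only if the named topic exists on the cluster.
# Usage: topic_exists <bootstrap-server> <topic>
topic_exists() {
  # --list prints one topic name per line; grep -Fx requires an exact,
  # whole-line match so that a longer name containing the topic does not count.
  "$AUTOMQ_HOME/bin/kafka-topics.sh" --list --bootstrap-server "$1" |
    grep -Fxq "$2"
}
```

For example, topic_exists 127.0.0.1:9092 example_topic && echo "topic is ready" prints the message only when the topic is present.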
Generate Test Data
Generate a JSON-formatted test record whose fields correspond to the columns of the table created earlier.
{
  "id": 1,
  "name": "Test User",
  "timestamp": "2023-11-10T12:00:00",
  "status": "active"
}
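The console producer used in the next step sends each input line as a separate message, so a record has to be collapsed to a single line before it is written. As a minimal sketch, the shell function below prints any number of single-line records that match the table columns. The function name gen_users and the "Test User N" naming scheme are illustrative assumptions, and the timestamp is kept fixed so the output is reproducible.

```shell
# Hypothetical helper: print COUNT single-line JSON records (default 3)
# matching the columns id, name, timestamp, status of automq_db.users.
gen_users() {
  count="${1:-3}"
  i=1
  while [ "$i" -le "$count" ]; do
    # One record per line, because the console producer treats each
    # line of its input as one Kafka message.
    printf '{"id": %d, "name": "Test User %d", "timestamp": "2023-11-10T12:00:00", "status": "active"}\n' "$i" "$i"
    i=$((i + 1))
  done
}

gen_users 3
```

The output can be piped into the console producer command from the Write Test Data step in place of the single echo.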
Write Test Data
Use the Kafka command-line tools or a client program to write the test data to the topic named example_topic. Below is an example using the command-line tool:
echo '{"id": 1, "name": "Test User", "timestamp": "2023-11-10T12:00:00", "status": "active"}' | sh $AUTOMQ_HOME/bin/kafka-console-producer.sh --bootstrap-server 127.0.0.1:9092 --topic example_topic
Use the following command to view the data just written to the topic:
sh $AUTOMQ_HOME/bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic example_topic --from-beginning
When executing these commands, replace the topic name and the bootstrap-server value with your own topic and the actual AutoMQ Bootstrap Server address.
Create Routine Load Import Job
In the Apache Doris command line, create a Routine Load job that continuously imports JSON data from the AutoMQ Kafka topic. For the specific parameters of Routine Load, please refer to the Doris Routine Load documentation.
CREATE ROUTINE LOAD automq_db.automq_example_load ON users
COLUMNS(id, name, timestamp, status)
PROPERTIES
(
"format" = "json",
"jsonpaths" = "[\"$.id\",\"$.name\",\"$.timestamp\",\"$.status\"]"
)
FROM KAFKA
(
"kafka_broker_list" = "127.0.0.1:9092",
"kafka_topic" = "example_topic",
"property.kafka_default_offsets" = "OFFSET_BEGINNING"
);
When executing the command, replace kafka_broker_list with the actual AutoMQ Bootstrap Server address being used.
Verify Data Import
First, check the status of the Routine Load job and confirm that its State is RUNNING.
show routine load\G
Then, query the relevant table in the Apache Doris database to confirm that the data has been successfully imported.
select * from automq_db.users;
+------+--------------+---------------------+--------+
| id | name | timestamp | status |
+------+--------------+---------------------+--------+
| 1 | Test User | 2023-11-10T12:00:00 | active |
| 2 | Test User | 2023-11-10T12:00:00 | active |
+------+--------------+---------------------+--------+
2 rows in set (0.01 sec)
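The job status check above can also be scripted. The sketch below assumes the mysql command-line client is installed and that the Doris frontend accepts MySQL-protocol connections at 127.0.0.1:9030 as user root without a password; these connection settings, the DORIS_* variable names, and the function name routine_load_state are assumptions to adjust for your environment. It prints only the State field of the job.

```shell
# Assumed connection settings for the Doris frontend; override as needed.
DORIS_HOST="${DORIS_HOST:-127.0.0.1}"
DORIS_PORT="${DORIS_PORT:-9030}"
DORIS_USER="${DORIS_USER:-root}"

# Hypothetical helper: print the State of the Routine Load job (e.g. RUNNING).
routine_load_state() {
  # --vertical prints one "Field: value" pair per line, like \G in the
  # interactive client; awk then selects the value of the State field.
  mysql -h "$DORIS_HOST" -P "$DORIS_PORT" -u "$DORIS_USER" --vertical \
    -e 'SHOW ROUTINE LOAD FOR automq_db.automq_example_load' |
    awk -F': ' '$1 ~ /^ *State$/ { print $2; exit }'
}
```

For example, [ "$(routine_load_state)" = "RUNNING" ] can gate a follow-up query against the users table.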