Apache Doris
Apache Doris is a high-performance, real-time analytical database based on MPP architecture, known for its ease of use and sub-second response times for queries on massive datasets. It supports both high-concurrency point query scenarios and high-throughput complex analytical scenarios. Consequently, Apache Doris is well-suited for use cases such as report analysis, ad-hoc queries, unified data warehouse construction, and data lake federation query acceleration. Users can build applications on top of it, such as user behavior analysis, A/B testing platforms, log search analysis, user profiling, and order analysis.
This document describes how to use Apache Doris Routine Load to import data from AutoMQ into Apache Doris. For details on how Routine Load works, refer to the Routine Load Basic Principles documentation.
Environment Preparation
Prepare Apache Doris and Test Data
Ensure that a usable Apache Doris cluster is ready. For demonstration purposes, a test Apache Doris environment has been deployed on Linux, as referenced in the Docker Deployment of Doris documentation.
Create a database and test table:
create database automq_db;
CREATE TABLE automq_db.users (
id bigint NOT NULL,
name string NOT NULL,
timestamp string NULL,
status string NULL
) DISTRIBUTED BY hash (id) PROPERTIES ('replication_num' = '1');
Prepare Kafka Command-line Tools
Download the latest TGZ package from AutoMQ Releases and extract it. Assuming the extraction directory is $AUTOMQ_HOME, this article will use the tools in $AUTOMQ_HOME/bin to create topics and generate test data.
Prepare AutoMQ and Test Data
Refer to the AutoMQ Deploy Multi-Nodes Cluster on Linux documentation to deploy a usable cluster, and ensure network connectivity between AutoMQ and Apache Doris.
Follow the steps below to create a topic named example_topic in AutoMQ and write a test JSON record to it.
Create Topic
Use the Apache Kafka command-line tool to create the topic. Make sure you have access to the Kafka environment and that the Kafka service is running. Here is an example command for creating a topic:
$AUTOMQ_HOME/bin/kafka-topics.sh --create --topic example_topic --bootstrap-server 127.0.0.1:9092 --partitions 1 --replication-factor 1
When executing the command, replace the topic name and the bootstrap-server value with the topic you want to create and the actual AutoMQ Bootstrap Server address.
After creating the topic, use the following command to verify that it was created successfully.
$AUTOMQ_HOME/bin/kafka-topics.sh --describe --topic example_topic --bootstrap-server 127.0.0.1:9092
Generate Test Data
Generate JSON-formatted test data whose fields match the columns of the table created earlier.
{
"id": 1,
"name": "Test User",
"timestamp": "2023-11-10T12:00:00",
"status": "active"
}
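The console producer used in the next step reads its input line by line and sends each line as a separate message, so a pretty-printed document like the one above has to be collapsed to a single line before it is written. A minimal shell sketch of that step follows. It assumes the input is valid JSON, where raw newlines only occur between tokens, and compact_json is a hypothetical helper name rather than part of the Kafka tools.

```shell
# Hypothetical helper: collapse a multi-line JSON document read from stdin
# into a single line, so it can be sent as one Kafka message.
# Assumes valid JSON, where raw newlines only appear between tokens.
compact_json() {
  tr -d '\n'   # drop every newline
  echo         # terminate the record with exactly one newline
}

# Example: prints {"id": 1,"name": "Test User"} on one line.
printf '{\n"id": 1,\n"name": "Test User"\n}\n' | compact_json
```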
Write Test Data
Use the Kafka command-line tools or a client program to write the test data to the topic named example_topic. Below is an example using the command-line tool:
echo '{"id": 1, "name": "Test User", "timestamp": "2023-11-10T12:00:00", "status": "active"}' | $AUTOMQ_HOME/bin/kafka-console-producer.sh --bootstrap-server 127.0.0.1:9092 --topic example_topic
Use the following command to view the data just written to the topic:
$AUTOMQ_HOME/bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic example_topic --from-beginning
When executing these commands, replace the topic name and the bootstrap-server value with your topic and the actual AutoMQ Bootstrap Server address.
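To exercise the import with more than one row, it can help to generate several records at once. The sketch below prints one single-line JSON record per row with increasing id values. The helper name gen_users and the "Test User N" names are illustrative assumptions; the field layout follows the test record above.

```shell
# Hypothetical helper: print N single-line JSON records that match
# the columns of the users table (id, name, timestamp, status).
gen_users() {
  count="$1"
  i=1
  while [ "$i" -le "$count" ]; do
    printf '{"id": %d, "name": "Test User %d", "timestamp": "2023-11-10T12:00:00", "status": "active"}\n' "$i" "$i"
    i=$((i + 1))
  done
}

# Print three records, one per line.
gen_users 3
```

The output can be piped into kafka-console-producer.sh in place of the echo command shown above, since the producer sends each input line as one message.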
Create Routine Load Import Job
From the Apache Doris command line, create a Routine Load job that consumes JSON data and continuously imports it from the AutoMQ Kafka topic. For details on the Routine Load parameters, refer to Doris Routine Load.
CREATE ROUTINE LOAD automq_db.automq_example_load ON users
COLUMNS(id, name, timestamp, status)
PROPERTIES
(
"format" = "json",
"jsonpaths" = "[\"$.id\",\"$.name\",\"$.timestamp\",\"$.status\"]"
)
FROM KAFKA
(
"kafka_broker_list" = "127.0.0.1:9092",
"kafka_topic" = "example_topic",
"property.kafka_default_offsets" = "OFFSET_BEGINNING"
);
When executing the command, replace kafka_broker_list with the actual AutoMQ Bootstrap Server address being used.
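Because the broker address differs per environment, one option is to render the statement from variables rather than editing it by hand. The following is a sketch under stated assumptions: render_routine_load is a hypothetical helper, and the job name is qualified with automq_db so the statement does not depend on the current database of the session. It only prints the SQL and does not contact Doris.

```shell
# Hypothetical helper: print the CREATE ROUTINE LOAD statement for a
# given broker list ($1) and topic ($2). Nothing is executed here.
render_routine_load() {
  brokers="$1"
  topic="$2"
  # Quoted heredoc: the static part is emitted verbatim, including the
  # backslashes and dollar signs inside the jsonpaths value.
  cat <<'SQL'
CREATE ROUTINE LOAD automq_db.automq_example_load ON users
COLUMNS(id, name, timestamp, status)
PROPERTIES
(
    "format" = "json",
    "jsonpaths" = "[\"$.id\",\"$.name\",\"$.timestamp\",\"$.status\"]"
)
FROM KAFKA
(
SQL
  # Environment-specific values are substituted here.
  printf '    "kafka_broker_list" = "%s",\n' "$brokers"
  printf '    "kafka_topic" = "%s",\n' "$topic"
  cat <<'SQL'
    "property.kafka_default_offsets" = "OFFSET_BEGINNING"
);
SQL
}

# Example: render the statement for a local broker.
render_routine_load "127.0.0.1:9092" "example_topic"
```

The printed statement can be pasted into the Doris command line, or piped into a MySQL-compatible client connected to the Doris frontend if one is available in your environment.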
Verify Data Import
First, check the status of the Routine Load job to ensure the task is running.
show routine load\G
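In the vertical output, the State field shows whether the job is healthy; a working job is expected to report RUNNING. If the output has been captured to a file or a pipe, the field can be extracted with a short filter. This is a sketch only: routine_load_state is a hypothetical helper, and the sample input below is a shortened, illustrative fragment rather than real Doris output.

```shell
# Hypothetical helper: read the vertical (\G) output of SHOW ROUTINE LOAD
# from stdin and print the value of each State field.
routine_load_state() {
  # Match only the field named exactly "State", ignoring leading padding,
  # so fields such as ReasonOfStateChanged are skipped.
  awk -F': ' '$1 ~ /^ *State$/ { print $2 }'
}

# Example with a shortened, illustrative fragment of \G output.
printf '%s\n' '  Name: automq_example_load' ' State: RUNNING' | routine_load_state
```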
Then, query the relevant table in the Apache Doris database to confirm that the data has been successfully imported.
select * from automq_db.users;
+------+--------------+---------------------+--------+
| id   | name         | timestamp           | status |
+------+--------------+---------------------+--------+
|    1 | Test User    | 2023-11-10T12:00:00 | active |
|    2 | Test User    | 2023-11-10T12:00:00 | active |
+------+--------------+---------------------+--------+
2 rows in set (0.01 sec)