AutoMQ with Apache Doris: Real-Time Data Streaming and Analytics

May 20, 2024
AutoMQ Team
4 min read

Apache Doris is a high-performance, real-time analytical database that leverages an MPP architecture for extremely fast query processing. It delivers sub-second query results even on enormous data volumes. Apache Doris excels at both high-concurrency point queries and high-throughput complex analytical workloads, making it a strong fit for report analysis, ad-hoc queries, unified data warehouse construction, and accelerated federated queries over data lakes. Users can build applications on top of it for user behavior analysis, A/B testing platforms, log search and analysis, user profiling, and order analysis.

This article will explore how to use Apache Doris Routine Load to import data from AutoMQ into Apache Doris. For more information on Routine Load, please refer to the Routine Load Fundamental Principles document.

Environment Setup

Preparing Apache Doris and Test Data

Ensure that a functional Apache Doris cluster is ready. For demonstration purposes, we used the Docker Deployment for Doris document to set up a test environment of Apache Doris on Linux.

Creating databases and test tables:

```sql
CREATE DATABASE automq_db;
CREATE TABLE automq_db.users (
    id bigint NOT NULL,
    name string NOT NULL,
    timestamp string NULL,
    status string NULL
) DISTRIBUTED BY HASH(id) PROPERTIES ('replication_num' = '1');
```

Preparing Kafka command-line tools

Download the latest TGZ package from AutoMQ Releases and extract it. Assume the extraction directory is $AUTOMQ_HOME, and tools from $AUTOMQ_HOME/bin will be used in this article to create topics and generate test data.

Prepare AutoMQ and test data

Refer to the AutoMQ Official Deployment Document to deploy a functional cluster, ensuring network connectivity between AutoMQ and Apache Doris.

Quickly create a topic named example_topic in AutoMQ and write test JSON data to it by following the steps below.

Create Topic

Use the Apache Kafka® command-line tool to create topics, ensuring you have access to a Kafka environment and the Kafka service is running. Here is an example command to create a topic:

```bash
$AUTOMQ_HOME/bin/kafka-topics.sh --create --topic example_topic --bootstrap-server 127.0.0.1:9092 --partitions 1 --replication-factor 1
```

After creating the topic, you can use the following command to verify that the topic has been successfully created.

```shell
$AUTOMQ_HOME/bin/kafka-topics.sh --describe --topic example_topic --bootstrap-server 127.0.0.1:9092
```

Generate test data

Create JSON formatted test data corresponding to the previously mentioned table.

```json
{
  "id": 1,
  "name": "test user",
  "timestamp": "2023-11-10T12:00:00",
  "status": "active"
}
```
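If you want more than a single record, a small shell loop can generate a batch of such records in the same shape as the table above. This is just a sketch: the field values and the `/tmp/users.jsonl` output path are arbitrary choices for this example.

```shell
# Generate a few JSON test records matching the automq_db.users schema.
# /tmp/users.jsonl is an arbitrary output path chosen for this example.
for i in 1 2 3; do
  printf '{"id": %d, "name": "test user %d", "timestamp": "2023-11-10T12:00:00", "status": "active"}\n' "$i" "$i"
done > /tmp/users.jsonl
cat /tmp/users.jsonl
```

The resulting file can then be fed to the console producer in the next step by redirecting it as stdin instead of using echo.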

Write test data

Use Kafka's command-line tools or write the test data programmatically into the topic named example_topic. Here is an example using the command-line tool:

```bash
echo '{"id": 1, "name": "test user", "timestamp": "2023-11-10T12:00:00", "status": "active"}' | $AUTOMQ_HOME/bin/kafka-console-producer.sh --bootstrap-server 127.0.0.1:9092 --topic example_topic
```

To view the data just written into the topic, use the following command:

```shell
$AUTOMQ_HOME/bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic example_topic --from-beginning
```

Create a Routine Load job for importing data

In the Apache Doris command line, set up a Routine Load job that accepts JSON data to continuously import data from an AutoMQ Kafka topic. For details on the Routine Load parameters, please refer to Doris Routine Load.

```sql
CREATE ROUTINE LOAD automq_example_load ON users
COLUMNS(id, name, timestamp, status)
PROPERTIES
(
    "format" = "json",
    "jsonpaths" = "[\"$.id\",\"$.name\",\"$.timestamp\",\"$.status\"]"
)
FROM KAFKA
(
    "kafka_broker_list" = "127.0.0.1:9092",
    "kafka_topic" = "example_topic",
    "property.kafka_default_offsets" = "OFFSET_BEGINNING"
);
```

Validate data import

First, verify the status of the Routine Load job to ensure it is actively running.

```shell
show routine load\G
```

Then, query the relevant tables in the Apache Doris database to confirm that the data has been successfully imported.

```sql
select * from users;
+------+-----------+---------------------+--------+
| id   | name      | timestamp           | status |
+------+-----------+---------------------+--------+
|    1 | test user | 2023-11-10T12:00:00 | active |
+------+-----------+---------------------+--------+
1 row in set (0.01 sec)
```
