AutoMQ x StarRocks: Powering Real-Time Analytics

June 1, 2024
AutoMQ Team
4 min read
AutoMQ x StarRocks: Powering Real-Time Analytics

StarRocks is a cutting-edge analytical data warehouse that leverages technologies such as vectorization, MPP architecture, CBO, intelligent materialized views, and a columnar storage engine with capabilities for real-time updates. It supports multidimensional, real-time, and high-concurrency data analysis.

This article explores how to use StarRocks Routine Load for importing data from AutoMQ into StarRocks. To dive deeper into the essentials of Routine Load, refer to the Routine Load Fundamentals documentation.

Environment Preparation

Set up StarRocks and prepare the test data.

Ensure a StarRocks cluster is readily available. For demonstration purposes, we refer to Using Docker to Deploy StarRocks to install a demo cluster on a Linux machine.

Create a test table for the database and primary key models:

sql
create database automq_db; create table users ( id bigint NOT NULL, name string NOT NULL, timestamp string NULL, status string NULL ) PRIMARY KEY (id) DISTRIBUTED BY HASH(id) PROPERTIES ( "replication_num" = "1", "enable_persistent_index" = "true" );

Prepare AutoMQ and test data

Refer to Stand-alone Deployment to deploy AutoMQ, ensuring network connectivity between AutoMQ and StarRocks.

Quickly create a topic named example_topic in AutoMQ, and write a test JSON data into it by following these steps.

Create Topic

Use the Apache Kafka command line tool to create a topic, ensuring that you have access to a Kafka environment and that the Kafka service is operational. Here is an example command to create a topic:

bash
./kafka-topics.sh --create --topic exampleto_topic --bootstrap-server 10.0.96.4:9092 --partitions 1 --replication-factor 1

After creating the topic, you can use the following command to check if the topic has been successfully created.

bash
./kafka-topics.sh --describe example_topic --bootstrap-server 10.0.96.4:9092

Generate test data

Produce JSON formatted test data, which should correspond to the previously mentioned table.

json
{ "id": 1, "name": "testuser", "timestamp": "2023-11-10T12:00:00", "status": "active" }

Write test data

Use Kafka's command-line tools or a programmatic method to write the test data to a topic named 'example_topic'. Here's how you can do it using the command-line tool:

bash
echo '{"id": 1, "name": "testuser", "timestamp": "2023-11-10T12:00:00", "status": "active"}' | sh kafka-console-producer.sh --broker-list 10.0.96.4:9092 --topic example_topic

To view the data that was just written to the topic, run the following command:

bash
sh kafka-console-consumer.sh --bootstrap-server 10.0.96.4:9092 --topic example_topic --from-beginning

Creating a Routine Load import job

In the StarRocks command line, create a Routine Load job to continuously import data from the AutoMQ Kafka topic.

sql
CREATE ROUTINE LOAD automq_example_load ON users COLUMNS(id, name, timestamp, status) PROPERTIES ( "desired_concurrent_number" = "5", "format" = "json", "jsonpaths" = "[\"$.id\",\"$.name\",\"$.timestamp\",\"$.status\"]" ) FROM KAFKA ( "kafka_broker_list" = "10.0.96.4:9092", "kafka_topic" = "example_topic", "kafka_partitions" = "0", "property.kafka_default_offsets" = "OFFSET_BEGINNING" );

Parameter Description

Data Format

Specify the data format as JSON by setting "format" = "json" in the PROPERTIES clause.

Data Extraction and Transformation

To define the mapping and transformation relationship between the source data and the target table columns, you can use the COLUMNS and jsonpaths parameters. The COLUMNS parameter maps column names to the target table's column names, with the order of columns mirroring that of the source data. The jsonpaths parameter extracts the necessary field data from JSON data, akin to creating new CSV data. The COLUMNS parameter then provisionally names columns based on the sequence of fields specified in jsonpaths. For additional information on data transformation, please refer to the Data Transformation During Import Documentation.

Validate Data Import

Initially, verify the status of the Routine Load import job to confirm that the task is active.

sql
show routine load\G;

Next, examine the related tables in the StarRocks database to verify that the data has been successfully imported.

sql
StarRocks > select * from users; +------+--------------+---------------------+--------+ | id | name | timestamp | status | +------+--------------+---------------------+--------+ | 1 | testuser | 2023-11-10T12:00:00 | active | | 2 | testuser | 2023-11-10T12:00:00 | active | +------+--------------+---------------------+--------+ 2 rows in set (0.01 sec)

Newsletter

Subscribe for the latest on cloud-native streaming data infrastructure, product launches, technical insights, and efficiency optimizations from the AutoMQ team.

Join developers worldwide who leverage AutoMQ's Apache 2.0 licensed platform to simplify streaming data infra. No spam, just actionable content.

I'm not a robot
reCAPTCHA

Never submit confidential or sensitive data (API keys, passwords, credit card numbers, or personal identification information) through this form.