Preface
With the growing demand for real-time data processing, enterprises need more efficient and flexible data integration solutions. AutoMQ [1], a cloud-optimized Kafka messaging system, emerges as an ideal choice for businesses due to its substantial cost benefits and elastic capabilities. By integrating AutoMQ with Airbyte [2] and data warehouses, the data integration process can be further simplified and data analytics capabilities enhanced, enabling efficient real-time data flow and analysis. This empowers enterprises to make informed decisions quickly. This article will guide you through integrating these components.

Overview of AutoMQ
AutoMQ is a cloud-optimized stream processing system that remains fully compatible with Apache Kafka. By offloading storage to object storage, AutoMQ considerably boosts system cost efficiency and flexibility. Specifically, AutoMQ is built on S3Stream, a stream storage library on top of S3, and offloads data to shared cloud storage (EBS and S3), providing low-cost, low-latency, highly available, highly reliable, and virtually unlimited stream storage. Compared to the traditional Shared Nothing architecture, AutoMQ employs a Shared Storage architecture, significantly reducing the complexity of storage and operations while enhancing system scalability and reliability.

The design philosophy and technical benefits of AutoMQ make it an ideal option for replacing existing Kafka clusters. By adopting AutoMQ, businesses can greatly reduce storage costs, streamline operations, and achieve automatic scaling and balancing of clusters, allowing for more efficient responses to changing business demands. In addition, AutoMQ effectively supports high-throughput cold reads and zero-interruption service, ensuring stable operation under varying loads. The storage architecture of AutoMQ is as follows:
Airbyte Overview
Airbyte is a data integration platform designed to simplify and automate the creation and management of data pipelines. It supports a wide variety of source and target systems, enabling users to easily configure data pipelines through a user-friendly web interface or API. Airbyte offers efficient Extract, Transform, Load (ETL) capabilities with built-in scheduling and monitoring mechanisms to ensure the reliability and performance of data pipelines. Its modular design supports custom connectors to meet diverse data integration demands. Airbyte’s major advantages include high scalability and flexibility, allowing users to swiftly adapt to various data sources and target systems. Built-in data normalization and automated scheduling functionalities enhance the efficiency and consistency of data processing. With containerized deployment, Airbyte streamlines installation and scaling, making it apt for enterprise-level data integration and data warehousing. Additionally, its comprehensive connector library and community support make it an excellent tool for data engineers and analysts to efficiently address complex data integration challenges.
Prerequisites
- Data Source: An available AutoMQ node.
- Data Connector: An available Airbyte environment.
- Data Endpoint (Data Warehouse): In this example, I’ve selected a cloud-deployed Databricks [3] cluster.
Quick Deployment
Deploy AutoMQ
Deployment can be done by following the official AutoMQ documentation: Deploy Multi-Nodes Cluster on Linux. Once the cluster is up, you can prepare data either with the Kafka SDK or manually before starting the synchronization process. I have prepared some data in advance, and the status of the AutoMQ nodes can be observed with various visualization tools such as Redpanda Console [5] or Kafdrop [6]. Here, I chose Redpanda Console, where you can see that there are currently 50 topics, each containing 1,000 initial messages.
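For reference, the sample data could be prepared with the Kafka command-line tools shipped with AutoMQ (or any Apache Kafka distribution). The sketch below assumes topics named Topic-0 through Topic-49, a simple JSON message format, and a broker at localhost:9092; all of these are illustrative and should be adapted to your own cluster.

```bash
# Sketch: create 50 topics named Topic-0 .. Topic-49 and write 1000 JSON messages
# to each, using the Kafka CLI tools bundled with AutoMQ. The broker address,
# topic names, and message format below are assumptions for illustration.
BOOTSTRAP="localhost:9092"

for i in $(seq 0 49); do
  bin/kafka-topics.sh --bootstrap-server "$BOOTSTRAP" --create \
    --topic "Topic-$i" --partitions 1 --replication-factor 1

  # Generate 1000 demo messages and pipe them into the console producer
  for n in $(seq 1 1000); do
    echo "{\"topic_id\": $i, \"seq\": $n, \"payload\": \"demo message $n\"}"
  done | bin/kafka-console-producer.sh --bootstrap-server "$BOOTSTRAP" --topic "Topic-$i"
done
```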
Deploying Airbyte
Refer to the official Airbyte documentation: Quickstart | Airbyte [7]. Here, I will use the example of deploying Airbyte on a Linux system.
Environment Preparation
First, you need to install `abctl`, an official setup tool provided by Airbyte that facilitates quick setup of the required Airbyte environment. Note that this tool requires a Docker environment. If you don't have Docker installed, please refer to Docker's installation instructions: Docker Install [8]. You can check your Docker version by running the command `docker version`:
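For example, a quick way to confirm that Docker is installed and that the daemon is reachable:

```bash
# Show the installed Docker client/server versions
docker version

# Confirm the daemon is running by printing its server version
docker info --format '{{.ServerVersion}}'
```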
Preparing the Abctl Tool
To get started with abctl, execute the following commands sequentially. Here, I'm downloading version v0.9.2:
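As a sketch, on Linux (amd64) the binary could be fetched from the abctl GitHub releases roughly as follows; the release asset name and the extracted directory layout are assumptions, so check the releases page for the exact file for your platform.

```bash
# Download abctl v0.9.2 for Linux (amd64) from the GitHub releases page.
# The asset name and extracted path are assumptions; adjust them to match
# what the releases page actually provides for your platform.
curl -LO https://github.com/airbytehq/abctl/releases/download/v0.9.2/abctl-v0.9.2-linux-amd64.tar.gz
tar -xzf abctl-v0.9.2-linux-amd64.tar.gz

# Put the binary on the PATH and verify the installation
sudo mv abctl-v0.9.2-linux-amd64/abctl /usr/local/bin/abctl
abctl version
```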
Deploying the Airbyte Environment
Executing the command `abctl local install` pulls Airbyte's images in Docker and deploys the environment using Helm. Some of the deployment logs are as follows:

Once the installation completes, you can access the Airbyte web UI at http://localhost:8000 with the default credentials:
- Username: `airbyte`
- Password: `password`
If you want to change the username and password, for example to `zhaoxi` and `ktpro123` respectively, you can update the login credentials after installation.
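A minimal sketch using abctl's local credentials subcommand is shown below; the flags and the email-based login are assumptions based on newer abctl releases, so confirm the available options with `abctl local credentials --help` before running it.

```bash
# Sketch only: set a custom password (and, in newer abctl releases, an email-based
# username) for the Airbyte instance. The email value is a hypothetical placeholder;
# verify the supported flags with "abctl local credentials --help".
abctl local credentials --email zhaoxi@example.com --password ktpro123

# Print the current credentials to confirm the change
abctl local credentials
```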

Deploying Databricks
If you do not yet have a Databricks service available, please refer to the official documentation for setup: Google Databricks [9].

Data Synchronization
Add New Data Source
Add AutoMQ as a data source. Thanks to AutoMQ’s full compatibility with Kafka, you can set up an AutoMQ data source using Kafka’s data source template. Navigate via the Airbyte interface’s left sidebar -> Sources -> search Kafka, then fill in basic information such as Bootstrap Servers, Protocol, Topic Pattern, etc.
For the Topic Pattern, I used `Topic-.*` to match all topics with the prefix Topic-. This aligns with the format of my prepared data, so make sure your own topics can be matched as well. After the source is added successfully, we can see the following results, confirming that the data source connection was successful:
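To double-check which topics the pattern will pick up, you can first list the topics on the AutoMQ cluster, for example with the Kafka CLI (the broker address is assumed to be localhost:9092, as in the earlier data-preparation sketch):

```bash
# Show only the topics that the Airbyte Topic Pattern "Topic-.*" would match
bin/kafka-topics.sh --bootstrap-server localhost:9092 --list | grep '^Topic-'
```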

Add Data Destination
We have chosen Databricks as our data destination, although you can select other options if you wish. For a complete list of supported destinations, please visit: Destinations | Airbyte [10]. In the Airbyte interface, go to the sidebar -> Destinations -> Search for Databricks:
- Go to the created Databricks Cluster -> Select Advanced Options -> JDBC/ODBC, and you will find the values for HTTP PATH and Server Hostname.

- In the top right corner of the cluster, select the user -> go to Settings -> choose User -> Developer -> Access Token -> Generate new Token. You will receive a token similar to `dapi8d336faXXXXXXXXXa6aa18a086c0e`.

Initiate Connection and Transfer Data
With both the data source and the data destination ready, we can now establish a connection. In Airbyte's left sidebar, select Connections -> choose the data source and destination -> establish the connection. After connecting successfully, you need to select the data transfer mode; both incremental sync and full sync are available. I opted for full sync:

Verification Results
After successfully transferring the data, we can access the Databricks cluster to review the transfer results: