Introduction
MinIO is a high-performance, distributed object storage system designed to run on standard hardware, offering an excellent cost-performance ratio and broad applicability. It is well suited to high-performance private clouds: its simple yet effective architecture delivers strong performance alongside comprehensive object storage capabilities. MinIO serves traditional workloads such as secondary storage, disaster recovery, and archiving, as well as emerging scenarios like machine learning, big data, private cloud, and hybrid cloud.
Because MinIO is fully compatible with the S3 API, you can deploy an AutoMQ cluster in your private data center to build a streaming system that is fully Kafka-compatible while offering better cost efficiency, extreme scalability, and single-digit-millisecond latency. This article walks you through deploying an AutoMQ cluster on MinIO in your private data center.
Prerequisites
A functioning MinIO environment. If you have not yet set up MinIO, follow the official website guidance for installation.
Prepare five hosts for the AutoMQ cluster. We recommend Linux amd64 hosts with 2 cores and 16GB of RAM, each with two virtual storage volumes. For example:
Role | IP | Node ID | System volume | Data volume |
---|---|---|---|---|
CONTROLLER | 192.168.0.1 | 0 | EBS 20GB | EBS 20GB |
CONTROLLER | 192.168.0.2 | 1 | EBS 20GB | EBS 20GB |
CONTROLLER | 192.168.0.3 | 2 | EBS 20GB | EBS 20GB |
BROKER | 192.168.0.4 | 3 | EBS 20GB | EBS 20GB |
BROKER | 192.168.0.5 | 4 | EBS 20GB | EBS 20GB |
Tips:
Ensure these machines are located within the same subnet and have the capability to communicate with each other.
In non-production settings, it's possible to deploy only one Controller, which will also act as a Broker by default.
Download the latest official binary installation package from AutoMQ Github Releases to install AutoMQ.
Create two object storage buckets on MinIO, named automq-data and automq-ops.
- Configure the AWS CLI with the necessary Access Key and Secret Key by setting environment variables.
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minio-secret-key-CHANGE-ME
- Create the S3 buckets using the AWS CLI.
aws s3api create-bucket --bucket automq-data --endpoint=http://10.1.0.240:9000
aws s3api create-bucket --bucket automq-ops --endpoint=http://10.1.0.240:9000
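As a quick sanity check, you can list the buckets that now exist on MinIO. The endpoint below is the example address used in this guide; substitute your own:

```shell
# List bucket names on the MinIO endpoint to confirm the buckets exist.
# The endpoint address is the example value from this guide.
aws s3api list-buckets --endpoint-url=http://10.1.0.240:9000 \
  --query 'Buckets[].Name'
```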
Install and initiate the AutoMQ cluster.
Step 1: Generate an S3 URL.
AutoMQ includes the automq-kafka-admin.sh tool for quickly starting AutoMQ. Simply provide an S3 URL containing the required endpoint and authentication information to start AutoMQ in one step, with no need to manually generate a cluster ID or format storage.
bin/automq-kafka-admin.sh generate-s3-url \
--s3-access-key=xxx \
--s3-secret-key=yyy \
--s3-region=cn-northwest-1 \
--s3-endpoint=s3.cn-northwest-1.amazonaws.com.cn \
--s3-data-bucket=automq-data \
--s3-ops-bucket=automq-ops
When using MinIO, the following parameter values can be used to generate the S3 URL.
Parameter Name | Default value | Description |
---|---|---|
--s3-access-key | minioadmin | Environment variable MINIO_ROOT_USER |
--s3-secret-key | minio-secret-key-CHANGE-ME | Environment variable MINIO_ROOT_PASSWORD |
--s3-region | us-west-2 | This parameter has no effect in MinIO and can be assigned any value, such as us-west-2. |
--s3-endpoint | http://10.1.0.240:9000 | The endpoint can be retrieved by executing the command `sudo systemctl status minio.service`. |
--s3-data-bucket | automq-data | - |
--s3-ops-bucket | automq-ops | - |
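Substituting the values from the table into the command above, a MinIO-specific invocation would look like this. The credentials and endpoint are the example values from this guide; replace them with your own:

```shell
bin/automq-kafka-admin.sh generate-s3-url \
  --s3-access-key=minioadmin \
  --s3-secret-key=minio-secret-key-CHANGE-ME \
  --s3-region=us-west-2 \
  --s3-endpoint=http://10.1.0.240:9000 \
  --s3-data-bucket=automq-data \
  --s3-ops-bucket=automq-ops
```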
Output result
Once the command is executed, the process will automatically move through the following stages:
Probe the core S3 features with the supplied accessKey and secretKey to verify compatibility between AutoMQ and the object storage.
Generate an s3url from the credential and endpoint details.
Print the startup command for AutoMQ based on the s3url. In that command, replace `--controller-list` and `--broker-list` with the actual CONTROLLER and BROKER addresses for your deployment.
Here are the outcomes:
############ Ping s3 ########################
[ OK ] Write s3 object
[ OK ] Read s3 object
[ OK ] Delete s3 object
[ OK ] Write s3 object
[ OK ] Upload s3 multipart object
[ OK ] Read s3 multipart object
[ OK ] Delete s3 object
############ String of s3url ################
Your s3url is:
s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=xxx&s3-secret-key=yyy&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA
############ Usage of s3url ################
To start AutoMQ, generate the start commandline using s3url.
bin/automq-kafka-admin.sh generate-start-command \
--s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" \
--controller-list="192.168.0.1:9093;192.168.0.2:9093;192.168.0.3:9093" \
--broker-list="192.168.0.4:9092;192.168.0.5:9092"
TIPS: Please replace the controller-list and broker-list with your actual IP addresses.
Step 2: Create a list of startup commands
In the command generated in the previous step, replace the --controller-list and --broker-list parameters with your host information: specifically, the IP addresses of the 3 CONTROLLER hosts and 2 BROKER hosts prepared earlier, with the default ports 9093 (CONTROLLER) and 9092 (BROKER).
bin/automq-kafka-admin.sh generate-start-command \
--s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" \
--controller-list="192.168.0.1:9093;192.168.0.2:9093;192.168.0.3:9093" \
--broker-list="192.168.0.4:9092;192.168.0.5:9092"
Parameter Description
Parameter Name | Required | Description |
---|---|---|
--s3-url | Yes | Generated by the command-line tool bin/automq-kafka-admin.sh generate-s3-url; it encodes authentication details, the cluster ID, and other parameters |
--controller-list | Yes | At least one address is required, serving as the IP:port list of the CONTROLLER hosts. The format is IP1:PORT1;IP2:PORT2;IP3:PORT3 |
--broker-list | Yes | At least one address is required, serving as the IP:port list of the BROKER hosts. The format is IP1:PORT1;IP2:PORT2;IP3:PORT3 |
--controller-only-mode | No | Determines whether the CONTROLLER node serves only the CONTROLLER role. Defaults to false, meaning the deployed CONTROLLER node also acts as a BROKER. |
Output result
After running the command, it will produce the necessary commands to initiate AutoMQ.
############ Start Commandline ##############
To start an AutoMQ Kafka server, please navigate to the directory where your AutoMQ tgz file is located and run the following command.
Before running the command, make sure that Java 17 is installed on your host. You can verify the Java version by executing 'java -version'.
bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker,controller --override node.id=0 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.1:9092,CONTROLLER://192.168.0.1:9093 --override advertised.listeners=PLAINTEXT://192.168.0.1:9092
bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker,controller --override node.id=1 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.2:9092,CONTROLLER://192.168.0.2:9093 --override advertised.listeners=PLAINTEXT://192.168.0.2:9092
bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker,controller --override node.id=2 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.3:9092,CONTROLLER://192.168.0.3:9093 --override advertised.listeners=PLAINTEXT://192.168.0.3:9092
bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker --override node.id=3 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.4:9092 --override advertised.listeners=PLAINTEXT://192.168.0.4:9092
bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker --override node.id=4 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.5:9092 --override advertised.listeners=PLAINTEXT://192.168.0.5:9092
TIPS: Start controllers first and then the brokers.
By default, the node.id automatically starts at 0.
Step 3: Start AutoMQ
To start the cluster, execute the list of commands generated in the previous step on the corresponding CONTROLLER or BROKER hosts in sequence. For example, to start the first CONTROLLER process on 192.168.0.1, run the first command from the generated startup command list.
bin/kafka-server-start.sh --s3-url="s3://s3.cn-northwest-1.amazonaws.com.cn?s3-access-key=XXX&s3-secret-key=YYY&s3-region=cn-northwest-1&s3-endpoint-protocol=https&s3-data-bucket=automq-data&s3-path-style=false&s3-ops-bucket=automq-ops&cluster-id=40ErA_nGQ_qNPDz0uodTEA" --override process.roles=broker,controller --override node.id=0 --override controller.quorum.voters=0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093 --override listeners=PLAINTEXT://192.168.0.1:9092,CONTROLLER://192.168.0.1:9093 --override advertised.listeners=PLAINTEXT://192.168.0.1:9092
Parameter Description
When using the startup command, unspecified parameters will automatically use the default configuration of Apache Kafka®. For new parameters introduced by AutoMQ, AutoMQ's default values will be applied. To modify these defaults, append --override key=value parameters to the end of the command.
Parameter Name | Mandatory | Instructions |
---|---|---|
s3-url | Yes | Generated by the bin/automq-kafka-admin.sh generate-s3-url command-line tool, which includes information such as identity authentication and cluster ID. |
process.roles | Yes | The options are CONTROLLER or BROKER. If a host serves as both CONTROLLER and BROKER, set the value to CONTROLLER,BROKER. |
node.id | Yes | An integer that uniquely identifies the BROKER or CONTROLLER within the Kafka cluster; it must be unique across the cluster. |
controller.quorum.voters | Yes | The hosts participating in the KRaft election, including node id, IP, and port, for example: 0@192.168.0.1:9093,1@192.168.0.2:9093,2@192.168.0.3:9093. |
listeners | Yes | The IP and port to listen on. |
advertised.listeners | Yes | The access address the BROKER advertises to clients. |
log.dirs | No | Directory for storing KRAFT and BROKER metadata. |
s3.wal.path | No | In production, it is recommended to store AutoMQ WAL data on a separately mounted raw device volume. AutoMQ supports writing data to raw devices, which reduces latency and yields better performance. Make sure to configure the correct path for WAL data. |
autobalancer.controller.enable | No | Defaults to false, meaning traffic rebalancing is disabled. When enabled, AutoMQ's auto balancer component automatically migrates partitions to keep overall traffic balanced. |
Tips: If you need to enable continuous traffic rebalancing or run Example: Self-Balancing When Cluster Nodes Change, it is recommended to explicitly specify the parameter --override autobalancer.controller.enable=true when starting the Controller.
Running in the Background
To operate in background mode, append the following snippet at the end of your command:
command > /dev/null 2>&1 &
Data volume path
Use the Linux `lsblk` command to inspect local data volumes; any unpartitioned block device qualifies as a data volume. In the example below, vdb is an unpartitioned raw block device.
vda 253:0 0 20G 0 disk
├─vda1 253:1 0 2M 0 part
├─vda2 253:2 0 200M 0 part /boot/efi
└─vda3 253:3 0 19.8G 0 part /
vdb 253:16 0 20G 0 disk
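If the host has many devices, a small filter over `lsblk` output can list only the unpartitioned disks, i.e. candidates for the data volume. This is a sketch; it assumes machine-readable `lsblk -rno NAME,TYPE,PKNAME` output on stdin:

```shell
# Print block devices of TYPE "disk" that have no partitions.
# Feed it machine-readable lsblk output: lsblk -rno NAME,TYPE,PKNAME
list_raw_disks() {
  awk '
    $2 == "disk" { disks[$1] = 1 }       # remember every whole disk
    $2 == "part" { has_part[$3] = 1 }    # mark its parent as partitioned
    END { for (d in disks) if (!(d in has_part)) print d }
  '
}

# Usage on a live host:
#   lsblk -rno NAME,TYPE,PKNAME | list_raw_disks
```

Run against the example layout above, this prints only vdb, since vda carries three partitions.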
By default, AutoMQ stores metadata and WAL data in the /tmp directory. It's crucial to recognize that if the /tmp directory is mounted on tmpfs, it is unsuitable for production environments.
For production or formal testing environments, adjust the configuration so that the metadata directory (`log.dirs`) and the WAL data directory (`s3.wal.path`, which can point at a raw device for data writes) are moved to suitable locations, for example:
bin/kafka-server-start.sh ...\
--override s3.telemetry.metrics.exporter.type=prometheus \
--override s3.metrics.exporter.prom.host=0.0.0.0 \
--override s3.metrics.exporter.prom.port=9090 \
--override log.dirs=/root/kraft-logs \
--override s3.wal.path=/dev/vdb \
> /dev/null 2>&1 &
Tips:
Please change s3.wal.path to the actual local raw device name. To set up AutoMQ's Write-Ahead-Log (WAL) on local SSD storage, you need to ensure that the specified file path is on an SSD disk with more than 10GB of available space. For instance, --override s3.wal.path=/home/admin/automq-wal.
When deploying AutoMQ in a private data center for production environments, ensure the reliability of the local SSD. For example, you can use RAID technology.
You have now set up an AutoMQ cluster on MinIO: a cost-effective, low-latency Kafka cluster with near-instant elasticity. To learn more about AutoMQ features such as near-instant partition reassignment and self-balancing, please consult the official examples.
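To confirm the cluster is actually serving traffic, a quick smoke test with the standard Kafka CLI tools bundled with AutoMQ can round-trip a message. The broker address below is an example IP from the preparation phase, and the topic name is arbitrary:

```shell
# Create a topic, produce one message, then consume it back.
bin/kafka-topics.sh --create --topic smoke-test --partitions 1 \
  --bootstrap-server 192.168.0.4:9092
echo "hello automq" | bin/kafka-console-producer.sh --topic smoke-test \
  --bootstrap-server 192.168.0.4:9092
bin/kafka-console-consumer.sh --topic smoke-test --from-beginning \
  --max-messages 1 --bootstrap-server 192.168.0.4:9092
```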