Example: Self-Balancing when Cluster Nodes Change
This document shows how to use the Kafka CLI tools to verify AutoMQ's automatic partition reassignment and data balancing during cluster scaling operations. The Kafka CLI tools are run from a Docker image provided by AutoMQ. The verification proceeds as follows:
- Create a topic with 16 partitions and send a balanced load.
- When starting and stopping brokers, observe whether partitions automatically reassign between different brokers.
This automatic data balancing is an inherent feature of AutoMQ, ensuring that data is automatically and evenly distributed across the cluster. By monitoring the distribution of partitions and the load status of brokers, you can verify whether automatic partition balancing is working as expected.
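For example, one way to monitor this is to poll the partition distribution in a loop while brokers are stopped and started. A minimal sketch, assuming the Docker-based deployment and bootstrap addresses used in the first scenario below:
# Poll the partition distribution every 10 seconds while brokers are stopped and started.
while true; do
  docker run --network automq_net automqinc/automq:latest /bin/bash -c \
    "/opt/kafka/kafka/bin/kafka-topics.sh --topic self-balancing-topic --describe --bootstrap-server server1:9092,server2:9092,server3:9092"
  sleep 10
done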
Prerequisites
Before performing the automatic partition data rebalancing test, ensure the following conditions are met:
Complete the installation and deployment of the AutoMQ cluster, for example by following Deploy Multi-Nodes Test Cluster on Docker▸, Deploy Multi-Nodes Cluster on Linux▸, or Deploy Multi-Nodes Cluster on Kubernetes▸.
If you deploy the cluster using Deploy Multi-Nodes Cluster on Linux▸ or Deploy Multi-Nodes Cluster on Kubernetes▸, make sure that autobalancer.controller.enable is set to true when starting the Controller so that automatic data rebalancing is enabled.
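For example, before starting each Controller you can enable the auto balancer in its configuration. A minimal sketch, assuming a properties-style configuration file (the exact file and path depend on your deployment):
autobalancer.controller.enable=true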
Additionally, the host running the test program must meet the following conditions:
- Linux/Mac/Windows Subsystem for Linux
- Docker
If downloading container images is slow, please refer to Docker Hub Mirror Configuration▸.
The following steps are shown for two deployment scenarios:
- Deploy Multi-Nodes Test Cluster on Docker
- Deploy Multi-Nodes Cluster on Linux
If the previous AutoMQ cluster was deployed according to Deploy Multi-Nodes Test Cluster on Docker▸, the cluster bootstrap address will be similar to "server1:9092,server2:9092,server3:9092", and the AutoMQ cluster will be on the "automq_net" Docker network.
Please replace the bootstrap-server address below with the actual cluster address based on your deployment configuration.
Create Topic
docker run --network automq_net automqinc/automq:latest /bin/bash -c "/opt/kafka/kafka/bin/kafka-topics.sh --partitions 16 --create --topic self-balancing-topic --bootstrap-server server1:9092,server2:9092,server3:9092"
View Partition Distribution
docker run --network automq_net automqinc/automq:latest /bin/bash -c "/opt/kafka/kafka/bin/kafka-topics.sh --topic self-balancing-topic --describe --bootstrap-server server1:9092,server2:9092,server3:9092"
Topic: self-balancing-topic TopicId: AjoAB22YRRq7w6MdtZ4hDA PartitionCount: 16 ReplicationFactor: 1 Configs: min.insync.replicas=1,elasticstream.replication.factor=1,segment.bytes=1073741824
Topic: self-balancing-topic Partition: 0 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 1 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 2 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 3 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 4 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 5 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 6 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 7 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 8 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 9 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 10 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 11 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 12 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 13 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 14 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 15 Leader: 2 Replicas: 2 Isr: 2
Start Producer
docker run --network automq_net automqinc/automq:latest /bin/bash -c "/opt/kafka/kafka/bin/kafka-producer-perf-test.sh --topic self-balancing-topic --num-records=1024000 --throughput 5120 --record-size 1024 --producer-props bootstrap.servers=server1:9092,server2:9092,server3:9092 linger.ms=100 batch.size=524288 buffer.memory=134217728 max.request.size=67108864"
Start Consumer
docker run --network automq_net automqinc/automq:1.0.4 /bin/bash -c "/opt/kafka/kafka/bin/kafka-consumer-perf-test.sh --topic self-balancing-topic --show-detailed-stats --timeout 300000 --messages=1024000 --reporting-interval 1000 --bootstrap-server=server1:9092,server2:9092,server3:9092"
Stop the Broker
Stop one of the brokers so that its partitions are reassigned to the other nodes. After stopping it, observe how the producer and consumer recover.
docker stop automq-server3
After stopping, you can see the following logs from the producer:
[2024-04-29 05:00:03,436] WARN [Producer clientId=perf-producer-client] Got error produce response with correlation id 49732 on topic-partition self-balancing-topic-7, retrying (2147483641 attempts left). Error: NOT_LEADER_OR_FOLLOWER (org.apache.kafka.clients.producer.internals.Sender)
[2024-04-29 05:00:03,438] WARN [Producer clientId=perf-producer-client] Received invalid metadata error in produce request on partition self-balancing-topic-7 due to org.apache.kafka.common.errors.NotLeaderOrFollowerException: For requests intended only for the leader, this error indicates that the broker is not the current leader. For requests intended for any replica, this error indicates that the broker is not a replica of the topic partition.. Going to request metadata update now (org.apache.kafka.clients.producer.internals.Sender)
After waiting for several seconds, you can see that production and consumption have returned to normal.
[2024-05-07 11:56:08,920] WARN [Producer clientId=perf-producer-client] Received invalid metadata error in produce request on partition self-balancing-topic-3 due to org.apache.kafka.common.errors.NotLeaderOrFollowerException: For requests intended only for the leader, this error indicates that the broker is not the current leader. For requests intended for any replica, this error indicates that the broker is not a replica of the topic partition.. Going to request metadata update now (org.apache.kafka.clients.producer.internals.Sender)
[2024-05-07 11:56:08,920] WARN [Producer clientId=perf-producer-client] Got error produce response with correlation id 42141 on topic-partition self-balancing-topic-3, retrying (2147483646 attempts left). Error: NOT_LEADER_OR_FOLLOWER (org.apache.kafka.clients.producer.internals.Sender)
[2024-05-07 11:56:08,920] WARN [Producer clientId=perf-producer-client] Received invalid metadata error in produce request on partition self-balancing-topic-3 due to org.apache.kafka.common.errors.NotLeaderOrFollowerException: For requests intended only for the leader, this error indicates that the broker is not the current leader. For requests intended for any replica, this error indicates that the broker is not a replica of the topic partition.. Going to request metadata update now (org.apache.kafka.clients.producer.internals.Sender)
[2024-05-07 11:56:08,588] 25693 records sent, 5138.6 records/sec (5.02 MB/sec), 8.9 ms avg latency, 1246.0 ms max latency.
[2024-05-07 11:56:13,589] 25607 records sent, 5120.4 records/sec (5.00 MB/sec), 1.8 ms avg latency, 44.0 ms max latency.
[2024-05-07 11:56:18,591] 25621 records sent, 5121.1 records/sec (5.00 MB/sec), 1.6 ms avg latency, 10.0 ms max latency.
Re-examine Partition Distribution
After the producer resumes writing, we re-examine the partition distribution and observe that all partitions are now located on broker1. AutoMQ quickly and efficiently completes the reassignment of partitions from the stopped node and rebalances the traffic.
docker run --network automq_net automqinc/automq:latest /bin/bash -c "/opt/kafka/kafka/bin/kafka-topics.sh --topic self-balancing-topic --describe --bootstrap-server server1:9092,server2:9092,server3:9092"
Topic: self-balancing-topic TopicId: AjoAB22YRRq7w6MdtZ4hDA PartitionCount: 16 ReplicationFactor: 1 Configs: min.insync.replicas=1,elasticstream.replication.factor=1,segment.bytes=1073741824
Topic: self-balancing-topic Partition: 0 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 1 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 2 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 3 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 4 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 5 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 6 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 7 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 8 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 9 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 10 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 11 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 12 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 13 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 14 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 15 Leader: 1 Replicas: 1 Isr: 1
Restart the Broker
Restarting automq-server3 triggers the automatic reassignment of partitions. After a few seconds of retries, the producer and consumer can resume operations.
docker start automq-server3
At this point, we confirm the partition distribution again and find that the partitions have successfully undergone automatic reassignment.
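The distribution can be checked with the same describe command used above:
docker run --network automq_net automqinc/automq:latest /bin/bash -c "/opt/kafka/kafka/bin/kafka-topics.sh --topic self-balancing-topic --describe --bootstrap-server server1:9092,server2:9092,server3:9092"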
Topic: self-balancing-topic TopicId: AjoAB22YRRq7w6MdtZ4hDA PartitionCount: 16 ReplicationFactor: 1 Configs: min.insync.replicas=1,elasticstream.replication.factor=1,segment.bytes=1073741824
Topic: self-balancing-topic Partition: 0 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 1 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 2 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 3 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 4 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 5 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 6 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 7 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 8 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 9 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 10 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 11 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 12 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 13 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 14 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 15 Leader: 1 Replicas: 1 Isr: 1
If the previous AutoMQ cluster was set up according to Deploy Multi-Nodes Cluster on Linux▸, the resulting cluster bootstrap address would look like "192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092".
Please substitute the bootstrap-server address below with the real address from your deployed cluster configuration.
Create Topic
docker run automqinc/automq:latest /bin/bash -c "/opt/kafka/kafka/bin/kafka-topics.sh --partitions 16 --create --topic self-balancing-topic --bootstrap-server 192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092"
Check Partition Distribution
docker run automqinc/automq:latest /bin/bash -c "/opt/kafka/kafka/bin/kafka-topics.sh --topic self-balancing-topic --describe --bootstrap-server 192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092"
Topic: self-balancing-topic TopicId: 7mYxhYliSMq-FCrl4pcD5Q PartitionCount: 16 ReplicationFactor: 1 Configs: min.insync.replicas=1,segment.bytes=1073741824
Topic: self-balancing-topic Partition: 0 Leader: 4 Replicas: 4 Isr: 4
Topic: self-balancing-topic Partition: 1 Leader: 0 Replicas: 0 Isr: 0
Topic: self-balancing-topic Partition: 2 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 3 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 4 Leader: 3 Replicas: 3 Isr: 3
Topic: self-balancing-topic Partition: 5 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 6 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 7 Leader: 3 Replicas: 3 Isr: 3
Topic: self-balancing-topic Partition: 8 Leader: 0 Replicas: 0 Isr: 0
Topic: self-balancing-topic Partition: 9 Leader: 4 Replicas: 4 Isr: 4
Topic: self-balancing-topic Partition: 10 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 11 Leader: 3 Replicas: 3 Isr: 3
Topic: self-balancing-topic Partition: 12 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 13 Leader: 4 Replicas: 4 Isr: 4
Topic: self-balancing-topic Partition: 14 Leader: 0 Replicas: 0 Isr: 0
Topic: self-balancing-topic Partition: 15 Leader: 3 Replicas: 3 Isr: 3
Start Producer
docker run automqinc/automq:latest /bin/bash -c "/opt/kafka/kafka/bin/kafka-producer-perf-test.sh --topic self-balancing-topic --num-records=1024000 --throughput 5120 --record-size 1024 --producer-props bootstrap.servers=192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092 linger.ms=100 batch.size=524288 buffer.memory=134217728 max.request.size=67108864"
Start Consumer
docker run automqinc/automq:1.0.4 /bin/bash -c "/opt/kafka/kafka/bin/kafka-consumer-perf-test.sh --topic self-balancing-topic --show-detailed-stats --timeout 300000 --messages=1024000 --reporting-interval 1000 --bootstrap-server=192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092"
Stop the Broker
Stop one of the brokers so that its partitions are reassigned to the remaining brokers. Run the following command on the host of the broker you want to stop, then observe how the producer and consumer recover.
jps | grep Kafka | awk '{print $1}' | xargs kill
When stopping the broker, the producer logs the following:
[2024-04-29 05:00:03,436] WARN [Producer clientId=perf-producer-client] Got error produce response with correlation id 49732 on topic-partition self-balancing-topic-7, retrying (2147483641 attempts left). Error: NOT_LEADER_OR_FOLLOWER (org.apache.kafka.clients.producer.internals.Sender)
[2024-04-29 05:00:03,438] WARN [Producer clientId=perf-producer-client] Received invalid metadata error in produce request on partition self-balancing-topic-7 due to org.apache.kafka.common.errors.NotLeaderOrFollowerException: For requests intended only for the leader, this error indicates that the broker is not the current leader. For requests intended for any replica, this error indicates that the broker is not a replica of the topic partition.. Going to request metadata update now (org.apache.kafka.clients.producer.internals.Sender)
After waiting for several seconds, you can see that production and consumption return to normal.
[2024-05-07 10:59:16,474] 25616 records sent, 5123.2 records/sec (5.00 MB/sec), 1.7 ms avg latency, 16.0 ms max latency.
[2024-05-07 10:59:26,238] WARN [Producer clientId=perf-producer-client] Got error produce response with correlation id 121226 on topic-partition self-balancing-topic-4, retrying (2147483646 attempts left). Error: NOT_LEADER_OR_FOLLOWER (org.apache.kafka.clients.producer.internals.Sender)
[2024-05-07 10:59:26,240] WARN [Producer clientId=perf-producer-client] Received invalid metadata error in produce request on partition self-balancing-topic-4 due to org.apache.kafka.common.errors.NotLeaderOrFollowerException: For requests intended only for the leader, this error indicates that the broker is not the current leader. For requests intended for any replica, this error indicates that the broker is not a replica of the topic partition.. Going to request metadata update now (org.apache.kafka.clients.producer.internals.Sender)
[2024-05-07 10:59:26,241] WARN [Producer clientId=perf-producer-client] Got error produce response with correlation id 121227 on topic-partition self-balancing-topic-4, retrying (2147483646 attempts left). Error: NOT_LEADER_OR_FOLLOWER (org.apache.kafka.clients.producer.internals.Sender)
[2024-05-07 10:59:26,241] WARN [Producer clientId=perf-producer-client] Received invalid metadata error in produce request on partition self-balancing-topic-4 due to org.apache.kafka.common.errors.NotLeaderOrFollowerException: For requests intended only for the leader, this error indicates that the broker is not the current leader. For requests intended for any replica, this error indicates that the broker is not a replica of the topic partition.. Going to request metadata update now (org.apache.kafka.clients.producer.internals.Sender)
[2024-05-07 10:59:21,477] 25599 records sent, 5114.7 records/sec (4.99 MB/sec), 1.8 ms avg latency, 19.0 ms max latency.
[2024-05-07 10:59:26,486] 25667 records sent, 5132.4 records/sec (5.01 MB/sec), 4.8 ms avg latency, 2284.0 ms max latency.
Review Partition Distribution Again
After the producer resumes writing, we review the partition distribution and observe that the partitions previously led by the stopped broker have been redistributed across the remaining brokers. AutoMQ swiftly and automatically reassigns the partitions from the stopped broker and rebalances the traffic.
docker run automqinc/automq:latest /bin/bash -c "/opt/kafka/kafka/bin/kafka-topics.sh --topic self-balancing-topic --describe --bootstrap-server 192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092"
Topic: self-balancing-topic TopicId: 7mYxhYliSMq-FCrl4pcD5Q PartitionCount: 16 ReplicationFactor: 1 Configs: min.insync.replicas=1,segment.bytes=1073741824
Topic: self-balancing-topic Partition: 0 Leader: 4 Replicas: 4 Isr: 4
Topic: self-balancing-topic Partition: 1 Leader: 0 Replicas: 0 Isr: 0
Topic: self-balancing-topic Partition: 2 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 3 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 4 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 5 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 6 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 7 Leader: 0 Replicas: 0 Isr: 0
Topic: self-balancing-topic Partition: 8 Leader: 0 Replicas: 0 Isr: 0
Topic: self-balancing-topic Partition: 9 Leader: 4 Replicas: 4 Isr: 4
Topic: self-balancing-topic Partition: 10 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 11 Leader: 4 Replicas: 4 Isr: 4
Topic: self-balancing-topic Partition: 12 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 13 Leader: 4 Replicas: 4 Isr: 4
Topic: self-balancing-topic Partition: 14 Leader: 0 Replicas: 0 Isr: 0
Topic: self-balancing-topic Partition: 15 Leader: 2 Replicas: 2 Isr: 2
Restart the Broker
Restarting the stopped broker triggers the automatic reassignment of partitions. After several seconds of retries, the producer and consumer resume normal operation.
For details on how to start the broker, refer to Deploy Multi-Nodes Cluster on Linux▸.
At this point, confirming the partition distribution shows that the partitions have already undergone automatic reassignment.
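The same describe command as before can be used to confirm this:
docker run automqinc/automq:latest /bin/bash -c "/opt/kafka/kafka/bin/kafka-topics.sh --topic self-balancing-topic --describe --bootstrap-server 192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092"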
Topic: self-balancing-topic TopicId: 7mYxhYliSMq-FCrl4pcD5Q PartitionCount: 16 ReplicationFactor: 1 Configs: min.insync.replicas=1,segment.bytes=1073741824
Topic: self-balancing-topic Partition: 0 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 1 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 2 Leader: 3 Replicas: 3 Isr: 3
Topic: self-balancing-topic Partition: 3 Leader: 4 Replicas: 4 Isr: 4
Topic: self-balancing-topic Partition: 4 Leader: 0 Replicas: 0 Isr: 0
Topic: self-balancing-topic Partition: 5 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 6 Leader: 4 Replicas: 4 Isr: 4
Topic: self-balancing-topic Partition: 7 Leader: 3 Replicas: 3 Isr: 3
Topic: self-balancing-topic Partition: 8 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 9 Leader: 0 Replicas: 0 Isr: 0
Topic: self-balancing-topic Partition: 10 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 11 Leader: 4 Replicas: 4 Isr: 4
Topic: self-balancing-topic Partition: 12 Leader: 3 Replicas: 3 Isr: 3
Topic: self-balancing-topic Partition: 13 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 14 Leader: 0 Replicas: 0 Isr: 0
Topic: self-balancing-topic Partition: 15 Leader: 1 Replicas: 1 Isr: 1