
Example: Cluster Node Changes Trigger Partition Self-Balancing

This document describes how to use the Kafka CLI tools to verify that AutoMQ supports automatic partition migration and data balancing while the cluster scales in and out. The Kafka CLI tools are run from the Docker image provided by AutoMQ.

  1. Create a Topic with 16 partitions and send a balanced load to it.

  2. Stop and start Brokers, and observe whether partitions automatically migrate between Brokers.

This automatic data balancing is a built-in capability of AutoMQ that keeps data evenly distributed across the cluster. By monitoring partition distribution and Broker load, you can verify that automatic partition balancing works as expected.

Prerequisites

Before running the automatic partition rebalancing test, the following conditions must be met:

An AutoMQ cluster has been installed and deployed. You can install and deploy AutoMQ by referring to the following methods:

tip

If the cluster is deployed via Deploy Multi-Nodes Cluster on Linux▸ or Deploy Multi-Nodes Cluster on Kubernetes▸, make sure autobalancer.controller.enable is set to true when starting the Controllers, otherwise automatic data rebalancing will not be enabled.
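
A minimal sketch of the relevant setting, assuming a Kafka-style properties file is used for the Controller configuration (the file name is illustrative):

# controller.properties (illustrative)
# Enable AutoMQ's built-in auto balancer on the Controller
autobalancer.controller.enable=true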

In addition, the host running the test programs must meet the following requirements:

  • Linux/Mac/Windows Subsystem for Linux

  • Docker

info

If downloading container images is slow, refer to Docker Hub Mirror Acceleration▸.

If the AutoMQ cluster was deployed by following Deploy Multi-Nodes Test Cluster with Docker▸, the cluster Bootstrap address looks like "server1:9092,server2:9092,server3:9092" and the AutoMQ cluster is on the "automq_net" Docker network.

tip

Replace the bootstrap-server addresses below with the actual addresses of your cluster, according to your deployment.

Create a Topic


docker run --network automq_net automqinc/automq:latest /bin/bash -c "/opt/kafka/kafka/bin/kafka-topics.sh --partitions 16 --create --topic self-balancing-topic --bootstrap-server server1:9092,server2:9092,server3:9092"

View the Partition Distribution


docker run --network automq_net automqinc/automq:latest /bin/bash -c "/opt/kafka/kafka/bin/kafka-topics.sh --topic self-balancing-topic --describe --bootstrap-server server1:9092,server2:9092,server3:9092"


Topic: self-balancing-topic TopicId: AjoAB22YRRq7w6MdtZ4hDA PartitionCount: 16 ReplicationFactor: 1 Configs: min.insync.replicas=1,elasticstream.replication.factor=1,segment.bytes=1073741824
Topic: self-balancing-topic Partition: 0 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 1 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 2 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 3 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 4 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 5 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 6 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 7 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 8 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 9 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 10 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 11 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 12 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 13 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 14 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 15 Leader: 2 Replicas: 2 Isr: 2
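
As shown above, the 16 partitions are initially split evenly between broker 1 and broker 2, with 8 partition leaders on each.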

Start the Producer


docker run --network automq_net automqinc/automq:latest /bin/bash -c "/opt/kafka/kafka/bin/kafka-producer-perf-test.sh --topic self-balancing-topic --num-records=1024000 --throughput 5120 --record-size 1024 --producer-props bootstrap.servers=server1:9092,server2:9092,server3:9092 linger.ms=100 batch.size=524288 buffer.memory=134217728 max.request.size=67108864"
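
This command sends 1,024,000 records of 1 KiB each at a target throughput of 5,120 records/sec, i.e. a steady load of roughly 5 MB/s spread across the 16 partitions.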

Start the Consumer


docker run --network automq_net automqinc/automq:1.0.4 /bin/bash -c "/opt/kafka/kafka/bin/kafka-consumer-perf-test.sh --topic self-balancing-topic --show-detailed-stats --timeout 300000 --messages=1024000 --reporting-interval 1000 --bootstrap-server=server1:9092,server2:9092,server3:9092"
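
The consumer prints per-interval statistics (--show-detailed-stats with --reporting-interval 1000) and stops after consuming 1,024,000 messages or when the 300,000 ms timeout expires.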

Stop a Broker

Stop one server so that the partitions on it are migrated to the other nodes. After stopping it, observe how the producer and consumer recover.


docker stop automq-server3
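
Optionally, confirm that the container has stopped (assuming the Brokers run in containers named automq-server1, automq-server2 and automq-server3, as in the Docker deployment referenced above):

docker ps --filter "name=automq-server"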

After the Broker is stopped, the producer prints logs like the following:


[2024-04-29 05:00:03,436] WARN [Producer clientId=perf-producer-client] Got error produce response with correlation id 49732 on topic-partition self-balancing-topic-7, retrying (2147483641 attempts left). Error: NOT_LEADER_OR_FOLLOWER (org.apache.kafka.clients.producer.internals.Sender)
[2024-04-29 05:00:03,438] WARN [Producer clientId=perf-producer-client] Received invalid metadata error in produce request on partition self-balancing-topic-7 due to org.apache.kafka.common.errors.NotLeaderOrFollowerException: For requests intended only for the leader, this error indicates that the broker is not the current leader. For requests intended for any replica, this error indicates that the broker is not a replica of the topic partition.. Going to request metadata update now (org.apache.kafka.clients.producer.internals.Sender)
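
The NOT_LEADER_OR_FOLLOWER warnings indicate that the partition leader is no longer on the Broker the producer contacted; the client requests a metadata update and retries the produce request.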

After waiting a few seconds, production and consumption return to normal:


[2024-05-07 11:56:08,920] WARN [Producer clientId=perf-producer-client] Received invalid metadata error in produce request on partition self-balancing-topic-3 due to org.apache.kafka.common.errors.NotLeaderOrFollowerException: For requests intended only for the leader, this error indicates that the broker is not the current leader. For requests intended for any replica, this error indicates that the broker is not a replica of the topic partition.. Going to request metadata update now (org.apache.kafka.clients.producer.internals.Sender)
[2024-05-07 11:56:08,920] WARN [Producer clientId=perf-producer-client] Got error produce response with correlation id 42141 on topic-partition self-balancing-topic-3, retrying (2147483646 attempts left). Error: NOT_LEADER_OR_FOLLOWER (org.apache.kafka.clients.producer.internals.Sender)
[2024-05-07 11:56:08,920] WARN [Producer clientId=perf-producer-client] Received invalid metadata error in produce request on partition self-balancing-topic-3 due to org.apache.kafka.common.errors.NotLeaderOrFollowerException: For requests intended only for the leader, this error indicates that the broker is not the current leader. For requests intended for any replica, this error indicates that the broker is not a replica of the topic partition.. Going to request metadata update now (org.apache.kafka.clients.producer.internals.Sender)
[2024-05-07 11:56:08,588] 25693 records sent, 5138.6 records/sec (5.02 MB/sec), 8.9 ms avg latency, 1246.0 ms max latency.
[2024-05-07 11:56:13,589] 25607 records sent, 5120.4 records/sec (5.00 MB/sec), 1.8 ms avg latency, 44.0 ms max latency.
[2024-05-07 11:56:18,591] 25621 records sent, 5121.1 records/sec (5.00 MB/sec), 1.6 ms avg latency, 10.0 ms max latency.

View the Partition Distribution Again

After the producer resumes writing, check the partition distribution again: all partitions are now on broker 1. AutoMQ automatically and quickly migrated the partitions off the stopped node and rebalanced the traffic.


docker run --network automq_net automqinc/automq:latest /bin/bash -c "/opt/kafka/kafka/bin/kafka-topics.sh --topic self-balancing-topic --describe --bootstrap-server server1:9092,server2:9092,server3:9092"


Topic: self-balancing-topic TopicId: AjoAB22YRRq7w6MdtZ4hDA PartitionCount: 16 ReplicationFactor: 1 Configs: min.insync.replicas=1,elasticstream.replication.factor=1,segment.bytes=1073741824
Topic: self-balancing-topic Partition: 0 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 1 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 2 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 3 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 4 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 5 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 6 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 7 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 8 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 9 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 10 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 11 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 12 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 13 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 14 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 15 Leader: 1 Replicas: 1 Isr: 1

Start the Broker Again

Restart automq-server3 to trigger automatic partition migration again; after a few seconds of retries, the producer and consumer recover.


docker start automq-server3

Now check the partition distribution again; the partitions have been automatically rebalanced across both Brokers.
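
docker run --network automq_net automqinc/automq:latest /bin/bash -c "/opt/kafka/kafka/bin/kafka-topics.sh --topic self-balancing-topic --describe --bootstrap-server server1:9092,server2:9092,server3:9092"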


Topic: self-balancing-topic TopicId: AjoAB22YRRq7w6MdtZ4hDA PartitionCount: 16 ReplicationFactor: 1 Configs: min.insync.replicas=1,elasticstream.replication.factor=1,segment.bytes=1073741824
Topic: self-balancing-topic Partition: 0 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 1 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 2 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 3 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 4 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 5 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 6 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 7 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 8 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 9 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 10 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 11 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 12 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 13 Leader: 2 Replicas: 2 Isr: 2
Topic: self-balancing-topic Partition: 14 Leader: 1 Replicas: 1 Isr: 1
Topic: self-balancing-topic Partition: 15 Leader: 1 Replicas: 1 Isr: 1