How AutoMQ Achieves Second-Level Partition Reassignments

- When the KRaft Controller receives a partition reassignment command, it constructs the corresponding PartitionChangeRecord and commits it to the KRaft Log layer. This action removes Broker-0 from the Leader Replica list and adds Broker-1 to the Follower Replica list. Broker-0, upon syncing the KRaft Log and detecting the P1 partition change, initiates the partition shutdown process.
- During the partition shutdown, if P1 contains data that has not yet been uploaded to object storage, a forced upload is triggered. In a stable running cluster, this data typically amounts to a few hundred megabytes. Given the burst network bandwidth capabilities provided by current cloud providers, this process usually takes only seconds. Once P1’s data upload is complete, the partition can be safely closed and deleted from Broker-0.
- After Broker-0 completes the partition shutdown, it proactively triggers a leader election. At this point, Broker-1, being the sole replica, is promoted to leader of P1, and the partition recovery process begins.
- During partition recovery, the metadata for P1 is fetched from object storage to restore P1’s checkpoint. Depending on P1’s shutdown state (that is, whether it was shut down cleanly), the corresponding data recovery is then performed.
- At this point, the partition reassignment is complete.
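
The steps above can be sketched as a small simulation. This is only an illustrative model, not AutoMQ's implementation: the `Partition` class, the `reassign` function, the event names, and the bandwidth figure are all hypothetical and chosen to mirror the prose. It also shows why the forced upload dominates the reassignment time: the estimated duration is simply the un-uploaded bytes divided by the available upload bandwidth.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Partition:
    """Toy model of a single-replica partition backed by object storage."""
    name: str
    leader: Optional[str]
    replicas: List[str]
    unuploaded_bytes: int = 0      # data not yet written to object storage
    clean_shutdown: bool = False


def reassign(p: Partition, source: str, target: str,
             upload_bytes_per_s: float) -> Tuple[List[str], float]:
    """Walk through the reassignment steps; return (events, upload seconds)."""
    events: List[str] = []

    # Step 1: the controller commits a change record that removes the
    # source broker and adds the target broker as a replica.
    p.replicas = [b for b in p.replicas if b != source] + [target]
    p.leader = None
    events.append("change_record_committed")

    # Step 2: the source broker force-uploads any remaining data, then
    # closes the partition. The estimate assumes a constant bandwidth.
    upload_seconds = p.unuploaded_bytes / upload_bytes_per_s
    p.unuploaded_bytes = 0
    p.clean_shutdown = True
    events.append("partition_closed_on_source")

    # Step 3: leader election; the target is the only replica left.
    p.leader = p.replicas[0]
    events.append("leader_elected")

    # Step 4: the new leader restores the checkpoint from object storage.
    # The unclean branch is unreachable here because this sketch always
    # lets the upload succeed; it is kept to mirror the prose.
    events.append("recovered_clean" if p.clean_shutdown else "recovered_unclean")
    return events, upload_seconds


if __name__ == "__main__":
    # Assumed figures: 500 MiB pending, 250 MiB/s burst upload bandwidth.
    p1 = Partition("P1", leader="Broker-0", replicas=["Broker-0"],
                   unuploaded_bytes=500 * 1024**2)
    events, seconds = reassign(p1, "Broker-0", "Broker-1", 250 * 1024**2)
    print(p1.leader, events, seconds)
```

With the assumed figures, the estimated upload time is 2.0 seconds, which is consistent with the "few hundred megabytes in seconds" claim; real durations depend on the instance type and the object storage service.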

Significance of Second-Level Partition Reassignment

In a production environment, a Kafka cluster typically serves multiple applications. Fluctuations in application traffic and partition distribution can cause cluster capacity issues or machine hotspots. Kafka operations personnel need to expand the cluster and reassign hotspot partitions to idle nodes to ensure the availability of cluster services. The time taken for partition reassignment determines the efficiency of emergency response and maintenance:

- The shorter the partition reassignment time, the shorter the duration from cluster expansion to meeting capacity demands, and the shorter the service disruption time.
- Faster partition reassignment results in shorter observation times for operations personnel, enabling quicker operational feedback and decision-making for subsequent actions.