AutoMQ Benchmark

AutoMQ, built on an S3 shared storage architecture that separates compute and storage, offers significant benefits such as rapid scalability, cost efficiency, and high performance, all while ensuring 100% compatibility with Apache Kafka. Compared to Apache Kafka, AutoMQ delivers better throughput and a higher maximum throughput during cold reads. This document describes how to conduct performance testing on an AutoMQ cluster.

Prerequisites

Before conducting cluster performance testing, the following conditions must be met:

  1. Complete the installation and deployment of the AutoMQ cluster. You can refer to deployment guides such as Deploy Multi-Nodes Cluster on Linux▸.
  2. Prepare the necessary load generators. It is recommended to create enough load generators within the same VPC network that neither their network throughput nor their CPU becomes a bottleneck in the test scenario.

Test Scenario

Scenario 1: Tail Read

Tail Read, commonly referred to as "hot read," evaluates conditions where the offset difference between the Producer and Consumer is minimal. The essential metrics for hot write and hot read scenarios are as follows:

  • Read/Write Throughput: Real-time read/write scenarios are typical for Kafka. Achieving higher throughput with the same hardware resources implies improved performance.

  • Write and E2E Latency: Given constant network throughput, lower message write latency is desirable. Similarly, reduced E2E latency (the time taken from production to consumption of a message) is advantageous.

In the Tail Read scenario, messages sent by the Producer are consumed immediately after being written to the Broker. Here, messages consumed by the Consumer are retrieved directly from the AutoMQ Log Cache, which eliminates the need to read from object storage, thereby reducing resource consumption.

Scenario 2: Catch-Up Read

Catch-Up Read, commonly referred to as "cold read," evaluates the scenario where the Consumer's consumption offset significantly lags behind the Producer's offset. In this scenario, the Consumer is initially paused and, after a certain number of messages has accumulated, consumption resumes. At this point, messages consumed by the Consumer are fetched from object storage, with the Block Cache handling pre-reading and caching.

In this test scenario, the following metrics are primarily focused on:

  • Is the catch-up read speed fast enough? Determine whether the consumption speed of each Consumer Group exceeds the Producer's write speed; only then can the Consumers catch up with the Producer.

  • Is write throughput affected during catch-up reads? Check whether the Producer's message throughput drops during catch-up reads and whether send latency increases.

Testing with Kafka CLI

Users can perform performance tests using Kafka CLI tools by running kafka-producer-perf-test.sh and kafka-consumer-perf-test.sh.

If you deployed your AutoMQ cluster following the guide Deploy Multi-Nodes Cluster on Linux▸, the cluster's bootstrap address will look something like "192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092".

Please adjust the bootstrap-server addresses below to match your actual cluster configuration.

Create Topic


./kafka-topics.sh --create --topic test-topic --bootstrap-server 192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092
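
The command above creates the topic with the broker's default partition count. If you want to set the partition count explicitly, kafka-topics.sh also accepts a --partitions flag (the value below is illustrative):


./kafka-topics.sh --create --topic test-topic --partitions 16 --bootstrap-server 192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092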

Send Message


./kafka-producer-perf-test.sh --topic test-topic --num-records=1024000 --throughput 5120 --record-size 1024 --producer-props bootstrap.servers=192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092

The parameters linger.ms and batch.size are crucial for performance tuning. You can consult Performance Tuning For Client▸ to adjust these settings according to your specific needs.
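
For example, to benchmark with batching enabled, you can pass these settings directly through --producer-props (the values below are illustrative starting points, not tuned recommendations):


./kafka-producer-perf-test.sh --topic test-topic --num-records=1024000 --throughput 5120 --record-size 1024 --producer-props bootstrap.servers=192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092 linger.ms=10 batch.size=131072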

Consume Message


./kafka-consumer-perf-test.sh --topic test-topic --show-detailed-stats --timeout 300000 --messages=1024000 --reporting-interval 1000 --bootstrap-server=192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092
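
The stock CLI does not orchestrate catch-up reads for you, but you can approximate one: run the producer test first to accumulate a backlog, then start the consumer test afterwards. Using a distinct consumer group per run via the --group option of kafka-consumer-perf-test.sh (the group name below is illustrative) avoids reusing offsets committed by an earlier run:


./kafka-consumer-perf-test.sh --topic test-topic --group catch-up-test-1 --show-detailed-stats --timeout 300000 --messages=1024000 --reporting-interval 1000 --bootstrap-server=192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092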

Testing with automq-perf-test.sh

The AutoMQ team has enhanced the community's Kafka performance testing CLI tools and released a one-click performance testing tool, automq-perf-test.sh. This tool draws on the OpenMessaging Benchmark framework and provides the following advantages:

  • Support for single-process multiple clients to enhance benchmarking density and efficiency: Unlike Apache Kafka's built-in kafka-producer-perf-test.sh and kafka-consumer-perf-test.sh scripts, the automq-perf-test.sh script allows for multiple Producers and Consumers to run within a single process, thereby sending and receiving messages across multiple Topics. This approach better mirrors real-world scenarios and improves ease of use.

  • One-click execution without distributed deployment: In contrast to the OpenMessaging Benchmark testing framework, automq-perf-test.sh eliminates the need for distributed deployment of multiple Workers, enabling you to execute tests with a single click on a single machine. This simplifies deployment and use in testing scenarios where large scale is not required.

  • Provides Catch-Up Read test scenarios: Furthermore, the automq-perf-test.sh script can handle more complex cold read test scenarios by permitting multiple Consumer Groups to start consumption at different offsets. This setup helps prevent cache reuse during cold reads, allowing for performance testing under more rigorous conditions.

  • Open and neutral: This testing script exclusively uses the Apache Kafka Client, which makes it suitable for performance testing against Apache Kafka, MSK, and other streaming systems that support the Kafka protocol.

Scenario 1: Tail Read

The following use case tests the Tail Read performance of AutoMQ. This test case involves:

  • A production and consumption traffic ratio of 1:1.

  • Data is written to a total of 1,280 partitions across 10 topics.

  • Writing 1,600 messages per second, each 51 KiB in size (without any batching), resulting in a write speed of 80 MiB/s.

Note that before executing the script below, you need to replace the --bootstrap-server address with the actual AutoMQ access point address. Also note that --warmup-duration and --test-duration are specified in minutes (see the parameter descriptions below).


KAFKA_HEAP_OPTS="-Xmx12g -Xms12g" ./bin/automq-perf-test.sh \
--bootstrap-server 0.kf-v8tj9bmunqdo1og8.wanshao-for-aws.automq.private:9092,1.kf-v8tj9bmunqdo1og8.wanshao-for-aws.automq.private:9092,2.kf-v8tj9bmunqdo1og8.wanshao-for-aws.automq.private:9092 \
--producer-configs batch.size=0 \
--consumer-configs fetch.max.wait.ms=1000 \
--topics 10 \
--partitions-per-topic 128 \
--producers-per-topic 1 \
--groups-per-topic 1 \
--consumers-per-group 1 \
--record-size 52224 \
--send-rate 1600 \
--warmup-duration 10 \
--test-duration 5 \
--reset

Scenario 2: Catch-Up Read

The following use case tests the Catch-Up Read performance of AutoMQ, structured as follows:

  • The production-to-consumption traffic ratio is 1:3.

  • Data is written to a total of 1,280 partitions across 10 topics.

  • 800 messages, each sized at 64 KiB, are written per second (with no batching), resulting in a write speed of 50 MiB/s.

  • Buffer 600 seconds of data (~30 GiB) before beginning to catch up, then initiate 3 Consumer Groups to start catching up, with each Group's starting point staggered by 30 seconds (~1.5 GiB).

Note that before executing the script below, you need to replace the --bootstrap-server address with the actual AutoMQ access point address. Also note that --backlog-duration is specified in seconds, while --warmup-duration is specified in minutes (see the parameter descriptions below).


KAFKA_HEAP_OPTS="-Xmx12g -Xms12g" ./bin/automq-perf-test.sh \
--bootstrap-server 0.kf-hsd29pri8q5myud5.wanshao-for-aws.automq.private:9092,1.kf-hsd29pri8q5myud5.wanshao-for-aws.automq.private:9092,2.kf-hsd29pri8q5myud5.wanshao-for-aws.automq.private:9092 \
--producer-configs batch.size=0 \
--consumer-configs fetch.max.wait.ms=1000 \
--topics 10 \
--partitions-per-topic 128 \
--producers-per-topic 1 \
--groups-per-topic 3 \
--consumers-per-group 1 \
--record-size 65536 \
--send-rate 800 \
--backlog-duration 600 \
--group-start-delay 30 \
--warmup-duration 5 \
--reset

Startup Parameters:

  • --bootstrap-server: Specifies the initial connection nodes for the Kafka cluster, provided in the format "host1:port1,host2:port2". These addresses are only used for the initial connection to obtain cluster metadata, so you do not need to list every Broker in the cluster; a few running, reachable addresses will suffice.

  • --common-configs: Specifies the common configuration for Kafka Admin Client, Producer, and Consumer, including authentication-related configurations.

  • --topic-configs: Specifies configurations related to topics, such as message retention time.

  • --producer-configs: Specifies configurations related to producers, such as batch size, batch duration, and compression method.

  • --consumer-configs: Specifies configurations related to consumers, such as the maximum size of messages fetched in a single operation.

  • --reset: Determines whether to delete all existing topics in the cluster before executing the benchmark.

  • --topic-prefix: The prefix for topics used during the test.

  • --topics: The number of topics created during testing.

  • --partitions-per-topic: The number of partitions in each topic. The total number of partitions used for testing is calculated as --topics * --partitions-per-topic.

  • --producers-per-topic: The number of producers created for each topic. The total number of producers used for testing is calculated as --topics * --producers-per-topic.

  • --groups-per-topic: The number of consumer groups created for each topic, which also represents the read-write ratio (fan-out) during testing.

  • --consumers-per-group: The number of consumers in each consumer group. The total number of consumers used for testing is calculated as --topics * --groups-per-topic * --consumers-per-group.

  • --record-size: The size of each message sent by the Producer, measured in bytes.

  • --send-rate: The total number of messages sent per second by all Producers. The product of --record-size and --send-rate represents the write throughput during testing.

  • --random-ratio: The proportion of random data in messages, commonly used for testing scenarios where the Producer has enabled compression. The value ranges between 0.0 and 1.0; the larger the value, the more random data in messages, theoretically resulting in poorer compression efficiency. The default value is 0.0, meaning all messages are identical.

  • --random-pool-size: The size of the random message pool. A message is randomly selected from this pool each time a message is sent. This option is only effective when --random-ratio is greater than 0.

  • --backlog-duration: Used in catch-up read test scenarios to control the duration of message accumulation, measured in seconds. The product of --record-size, --send-rate, and --backlog-duration gives the size of the backlog accumulated before catch-up reading begins (see the worked example after this list).

  • --group-start-delay: Used in catch-up read test scenarios to control the interval at which each Consumer Group begins consumption. Measured in seconds, this option can stagger the consumption progress of each Consumer Group to prevent cache reuse, thereby more accurately simulating real catch-up read scenarios.

  • --send-rate-during-catchup: Used in catch-up read test scenarios to control the sending rate of the Producer during the catch-up read period. By default, it mirrors the setting of --send-rate.

  • --warmup-duration: Specifies the duration of the warm-up period before the test, measured in minutes. During the first half of the warm-up period, the Producer's sending rate is gradually increased to the level specified by --send-rate; during the second half, it is held at --send-rate. Metrics collected during the warm-up period are excluded from the final results. To effectively warm up the JVM, it is advised to set --warmup-duration to at least 10 minutes.

  • --test-duration: Indicates the duration of the actual test, measured in minutes. This parameter is applicable only in non-catch-up read test scenarios (when --backlog-duration is less than or equal to 0).

  • --reporting-interval: Determines the frequency for collecting statistics of related metrics during the test, measured in seconds.
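
As a worked example using the Scenario 2 configuration above: --record-size 65536 × --send-rate 800 gives 64 KiB × 800/s = 50 MiB/s of write throughput, and with --backlog-duration 600 the backlog accumulated before catch-up begins is 50 MiB/s × 600 s ≈ 29.3 GiB (the ~30 GiB cited in the scenario description). Likewise, --topics 10 × --partitions-per-topic 128 yields the 1,280 partitions used in both scenarios.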

Performance Optimization and Technical Support

The performance of an AutoMQ cluster is influenced by various factors, including computing resource specifications and kernel parameter tuning. Performance optimization and baseline calibration for production environments can be quite complex.

You can reach out to the AutoMQ team via this form to acquire performance test reports and best practices for performance optimization in production scenarios.