Prometheus Metrics
This article describes the metrics that AutoMQ exposes, helping you better understand the performance and operational status of AutoMQ.
AutoMQ metrics are defined and presented in Prometheus format; if you need another format, you must perform the conversion yourself.
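If a downstream system expects something other than the Prometheus text format, the conversion could be sketched roughly as follows. This is a minimal illustration, not an AutoMQ tool: the function name `parse_samples` and the sample payload are made up for the example, and the parser only handles plain sample lines (it skips comment lines and ignores timestamps).

```python
import json
import re

# One sample line: metric name, optional {label="value",...} set, then the value.
# A trailing timestamp, if present, is ignored.
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces; escape sequences in values are kept as-is.
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Parse Prometheus text-format sample lines into a list of dicts."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append({"name": name, "labels": labels, "value": float(value)})
    return samples


# Made-up payload for illustration only; the values are not real measurements.
payload = """\
# TYPE Kafka_stream_s3_object_count gauge
Kafka_stream_s3_object_count{state="committed"} 42
Kafka_partition_count 7
"""
print(json.dumps(parse_samples(payload), indent=2))
```

The resulting list of dictionaries can then be serialized into whatever format the target system requires.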
General Metrics
Kafka_server_connection_count
The current number of connections established by the node.
- Type: Gauge
Kafka_network_threads_idle_rate
The idle rate of Kafka SocketServer network threads, range: [0, 1.0].
- Type: Gauge
Kafka_io_threads_idle_time_nanoseconds_total
The cumulative idle time of Kafka request handler threads, in nanoseconds. It is the cumulative form of Apache Kafka's native metric RequestHandlerAvgIdlePercent. Taking the derivative with respect to time (also measured in nanoseconds) yields the thread idle rate. Note that on a combined node (acting as both Controller and Broker), the Controller and the Broker each have their own request handlers, so this metric is the sum of both values and the derived thread idle rate can reach at most 2.0.
- Type: Counter
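The derivative described above can be approximated from two scrapes of the counter. The sketch below is illustrative only: the function name and its parameters are hypothetical, and the counter-reset handling is an assumption modeled on how cumulative counters are usually treated.

```python
def io_thread_idle_rate(idle_ns_start, idle_ns_end, t_start_ns, t_end_ns):
    """Approximate the request handler idle rate between two samples.

    idle_ns_*: counter values (cumulative idle time, nanoseconds).
    t_*_ns:    sample timestamps, also in nanoseconds.
    """
    elapsed_ns = t_end_ns - t_start_ns
    if elapsed_ns <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = idle_ns_end - idle_ns_start
    if delta < 0:
        # Assumed counter reset (for example, a node restart):
        # treat the new value as the whole increase.
        delta = idle_ns_end
    return delta / elapsed_ns


# Two hypothetical samples taken 10 seconds apart.
rate = io_thread_idle_rate(0, 9_000_000_000, 0, 10_000_000_000)
print(rate)
```

On a Broker-only or Controller-only node the result falls in [0, 1.0]; on a combined node it can reach 2.0, so divide by 2 if you want a per-role average.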
Controller Metrics
Kafka_controller_active_count
This metric indicates whether the current Controller node is active. A value of 1 indicates an active status, whereas 0 denotes inactivity.
- Type: Gauge
Kafka_broker_active_count
The number of active Brokers in the current cluster.
- Type: Gauge
Kafka_broker_fenced_count
The number of Brokers currently fenced in the cluster.
- Type: Gauge
Kafka_topic_count
Total number of topics in the current cluster.
- Type: Gauge
Kafka_partition_total_count
Total number of partitions in the current cluster.
- Type: Gauge
Kafka_partition_offline_count
Total number of partitions without leaders in the current cluster.
- Type: Gauge
Kafka_stream_auto_balancer_metrics_time_delay_milliseconds
The delay of the AutoBalancer monitoring metrics reported by each Broker node in the cluster. When the delay exceeds a certain threshold, AutoBalancer considers the Broker node out of sync and no longer includes it in partition scheduling.
- Type: Gauge
- Labels:
  - node_id: ID of the node reporting the AutoBalancer monitoring metrics.
Kafka_stream_s3_object_count
The current total number of objects uploaded to the object storage from the cluster, categorized by object state.
- Type: Gauge
- Labels:
  - state: The state of the object, divided into the following three categories:
    - prepared: Objects that are not yet fully written and have not been committed
    - committed: Objects that are fully written and have been committed
    - mark_destroyed: Objects marked for deletion, which will be removed from the object storage after a certain delay
Kafka_stream_s3_object_size_bytes
The total size of objects uploaded to object storage by the current cluster.
- Type: Gauge
Kafka_stream_stream_object_num
The number of StreamObjects uploaded to object storage by the current cluster.
- Type: Gauge
Kafka_stream_stream_set_object_num
The number of StreamSetObjects uploaded to object storage by each broker in the current cluster.
- Type: Gauge
- Labels:
  - node_id: The corresponding broker node ID
Broker Metrics
Kafka_message_count_total
The total number of messages received by the Broker node. Its rate of change over time gives the message throughput.
- Type: Counter
- Labels:
  - topic
Kafka_network_io_bytes_total
The total size of messages received and sent by the Broker node. Its rate of change over time gives the throughput in bytes.
- Type: Counter
- Labels:
  - topic
  - partition
  - direction:
    - in: Indicates receiving messages
    - out: Indicates sending messages
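As an illustration of turning this counter into a throughput figure, the sketch below aggregates the per-series increase between two scrapes by the direction label. The function name and the dictionary layout, keyed by (topic, partition, direction), are assumptions made for the example and not part of AutoMQ.

```python
from collections import defaultdict


def throughput_by_direction(previous, current, interval_seconds):
    """Return bytes per second for each direction between two scrapes.

    previous/current map (topic, partition, direction) -> counter value.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    totals = defaultdict(float)
    for key, value in current.items():
        if key not in previous:
            continue  # new series: no baseline to compute an increase from
        delta = value - previous[key]
        if delta < 0:
            delta = value  # assumed counter reset, e.g. after a restart
        direction = key[2]
        totals[direction] += delta / interval_seconds
    return dict(totals)


# Hypothetical counter values from two scrapes taken 30 seconds apart.
before = {("orders", "0", "in"): 1000, ("orders", "0", "out"): 500}
after = {("orders", "0", "in"): 4000, ("orders", "0", "out"): 2000}
print(throughput_by_direction(before, after, 30))
```

The same delta-over-interval approach applies to the other Counter metrics in this article, such as the message count.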
Kafka_topic_request_count_total
The total number of requests received for each Topic on the Broker node. Only two request types are counted: produce and fetch.
- Type: Counter
- Labels:
  - topic
  - type: Request type
    - produce
    - fetch
Kafka_topic_request_failed_total
The total number of failed requests for each Topic on Broker nodes, including only produce and fetch request types.
- Type: Counter
- Labels:
  - topic
  - type: Request type
    - produce
    - fetch
Kafka_request_count_total
The total number of requests received by Broker nodes.
- Type: Counter
- Labels:
  - type: Request type
  - version: The API version for this type of request.
Kafka_request_error_count_total
The total count of request failures on the Broker node. Note that successful requests are also included in this metric, with the error code for successful requests being NONE.
- Type: Counter
- Labels:
  - type: Request type
  - error: Error code indicating the state of the request, where NONE denotes that the request was successful.
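Because successful requests are counted under the error code NONE, a failure ratio has to exclude that label value explicitly. A minimal sketch, assuming the counter increases over some window have already been grouped by the error label (the function name and the sample error codes are illustrative):

```python
def request_error_ratio(counts_by_error):
    """Share of requests whose error label is anything other than NONE.

    counts_by_error maps the value of the error label to the number of
    requests observed in the window.
    """
    total = sum(counts_by_error.values())
    if total == 0:
        return 0.0  # no traffic in the window
    failed = sum(count for error, count in counts_by_error.items() if error != "NONE")
    return failed / total


# Hypothetical window: 980 successes and 20 failures.
window = {"NONE": 980, "NOT_LEADER_OR_FOLLOWER": 15, "REQUEST_TIMED_OUT": 5}
print(request_error_ratio(window))
```

Summing every label value, including NONE, gives the total request count, which is why the denominator does not need a separate metric.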
Kafka_request_size_bytes_total
The total size of requests received by the Broker node.
- Type: Counter
- Labels:
  - type: Request type
Kafka_request_size_50p(99p/mean/max)_bytes
Size of the requests received by the Broker node, reported as the 50th percentile, 99th percentile, mean, and maximum.
- Type: Gauge
- Labels:
  - type: Request type
Kafka_request_time_milliseconds_total
Total processing time of requests handled by the Broker node.
- Type: Counter
- Labels:
  - type: Request type
Kafka_request_time_50p(99p/mean/max)_milliseconds
Time taken by the Broker node to process requests, reported as the 50th percentile, 99th percentile, mean, and maximum.
- Type: Gauge
- Labels:
  - type: Request type
Kafka_request_queue_time_milliseconds_total
The total time that requests spend in the request queue on the Broker node. The request queue time increases when the Kafka I/O threads are busy.
- Type: Counter
- Labels:
  - type: Request type
Kafka_request_queue_time_50p(99p/mean/max)_milliseconds
Time that requests spend in the request queue on the Broker node, reported as the 50th percentile, 99th percentile, mean, and maximum.
- Type: Gauge
- Labels:
  - type: Request type
Kafka_response_queue_time_milliseconds_total
The total time that responses spend in the response queue on the Broker node. The response queue time increases when the Kafka network threads are busy.
- Type: Counter
- Labels:
  - type: Request type
Kafka_response_queue_time_50p(99p/mean/max)_milliseconds
Time that responses spend in the response queue on the Broker node, reported as the 50th percentile, 99th percentile, mean, and maximum.
- Type: Gauge
- Labels:
  - type: Request type
Kafka_request_queue_size
Request queue size of the Broker node.
- Type: Gauge
Kafka_response_queue_size
Response queue size of the Broker node.
- Type: Gauge
Kafka_purgatory_size
Number of produce or fetch requests waiting in purgatory on the Broker node.
- Type: Gauge
- Labels:
  - type:
    - Produce
    - Fetch
Kafka_partition_count
Number of currently assigned partitions on the Broker node.
- Type: Gauge
Kafka_logs_flush_time_50p(99p/mean/max)_milliseconds
Log flush time of the Broker node, reported as the 50th percentile, 99th percentile, mean, and maximum. In AutoMQ, this corresponds to the Delta WAL flush time.
- Type: Gauge
Kafka_log_end_offset
The maximum logical offset of each partition on the broker node.
- Type: Gauge
- Labels:
  - topic
  - partition
Kafka_log_size
The total size of the messages in each partition on the broker node.
- Type: Gauge
- Labels:
  - topic
  - partition
Kafka_group_commit_offset
The consumption offsets for each consumer group on the corresponding partition. Note that this metric is reported by the broker where the corresponding group coordinator for each consumer group resides.
- Type: Gauge
- Labels:
  - consumer_group
  - topic
  - partition
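A common use of this metric, which this article does not itself define, is estimating consumer lag by subtracting the committed offset from the log end offset of the same partition. The sketch below assumes both metrics have already been collected into dictionaries; the function name and data layout are hypothetical. Since the two metrics may be reported by different brokers and scraped at slightly different moments, the result is an approximation.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Estimate lag per (consumer_group, topic, partition).

    log_end_offsets:   (topic, partition) -> log end offset
    committed_offsets: (consumer_group, topic, partition) -> committed offset
    """
    lag = {}
    for (group, topic, partition), committed in committed_offsets.items():
        end_offset = log_end_offsets.get((topic, partition))
        if end_offset is None:
            continue  # no end offset scraped for this partition
        # Clamp at zero: scrape skew can make the committed offset look ahead.
        lag[(group, topic, partition)] = max(end_offset - committed, 0)
    return lag


# Hypothetical values for illustration.
ends = {("orders", "0"): 1500, ("orders", "1"): 900}
commits = {("billing", "orders", "0"): 1200, ("billing", "orders", "1"): 900}
print(consumer_lag(ends, commits))
```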
Kafka_group_count
The number of consumer groups managed by the group coordinator on the broker node.
- Type: Gauge
Kafka_group_preparing_rebalance_count
The number of consumer groups preparing for rebalance.
- Type: Gauge
Kafka_group_completing_rebalance_count
The number of consumer groups waiting for state assignment by the leader.
- Type: Gauge
Kafka_group_stable_count
The number of consumer groups in the Stable state.
- Type: Gauge
Kafka_group_empty_count
The number of consumer groups without any members but not yet expired.
- Type: Gauge
Kafka_group_dead_count
The number of consumer groups without any members and whose metadata has been removed.
- Type: Gauge
Kafka_stream_upload_size_bytes_total
The total size of data uploaded by broker nodes to object storage.
- Type: Counter
Kafka_stream_download_size_bytes_total
Total size of data downloaded from object storage by Broker nodes.
- Type: Counter
Kafka_stream_network_inbound_usage_bytes_total
Total inbound bandwidth usage of Broker nodes, including message reception and data downloaded from object storage. Its rate of change over time gives the inbound traffic throughput.
- Type: Counter
Kafka_stream_network_outbound_usage_bytes_total
Total outbound bandwidth usage of Broker nodes, including message consumption and data uploaded to object storage. Its rate of change over time gives the outbound traffic throughput.
- Type: Counter
Kafka_stream_network_inbound_available_bandwidth_bytes
Inbound traffic throughput reserved by Broker nodes for cold reads and Compaction. When this value is less than the inbound traffic demand for cold reads and Compaction, corresponding requests are placed in a rate-limiting queue for processing. Normal message sending and receiving traffic is not affected by this rate-limiting. Note that this metric represents only the instantaneous value during sampling and, limited by sampling intervals and rate-limiting implementation specifics, should be used for reference only.
- Type: Gauge
Kafka_stream_network_outbound_available_bandwidth_bytes
Outbound traffic throughput reserved by Broker nodes for cold reads and Compaction. When this value is less than the outbound traffic demand for cold reads and Compaction, corresponding requests are placed in a rate-limiting queue for processing. Normal message sending and receiving traffic is not affected by this rate-limiting. Note that this metric represents only the instantaneous value during sampling and, limited by sampling intervals and rate-limiting implementation specifics, should be used for reference only.
- Type: Gauge
Kafka_stream_network_inbound_limiter_queue_time_50p(99p/mean/max)_nanoseconds
The time that inbound traffic requests from cold reads and Compaction spend in the rate-limiting queue, reported as the 50th percentile, 99th percentile, mean, and maximum.
- Type: Gauge
Kafka_stream_network_outbound_limiter_queue_time_50p(99p/mean/max)_nanoseconds
The time that outbound traffic requests from cold reads and Compaction spend in the rate-limiting queue, reported as the 50th percentile, 99th percentile, mean, and maximum.
- Type: Gauge
Kafka_stream_operation_latency_50p(99p/mean/max)_nanoseconds
The operation duration for each stage of the AutoMQ S3Stream module.
- Type: Gauge
- Labels:
  - operation_type
  - operation_name