Prometheus Metrics

This article describes the metrics that AutoMQ exposes for observability, to help you understand the performance and operational status of AutoMQ.

Info

AutoMQ metrics are defined and presented in Prometheus format; if other protocol formats are needed, you'll need to perform the conversion yourself.
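
As an illustration of such a conversion, the following minimal sketch parses a single sample line of the Prometheus text exposition format into a JSON-friendly dictionary. The function name `parse_sample` and the sample line are hypothetical, and the parser deliberately ignores comment lines, timestamps, and escape sequences inside label values; a production converter would more likely rely on an existing Prometheus client library.

```python
import json
import re

# name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Parse one exposition-format sample line into a plain dict."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return {
        "name": match.group("name"),
        "labels": labels,
        "value": float(match.group("value")),
    }


# Hypothetical sample line; the value is made up for illustration.
sample = parse_sample('Kafka_stream_s3_object_count{state="committed"} 42')
print(json.dumps(sample))
```

The metric name is written here as it appears in this article. Prometheus metric names are case-sensitive, so use the exact name your deployment exports.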

General Metrics

Kafka_server_connection_count

The current number of connections established by the node.

  • Type: Gauge

Kafka_network_threads_idle_rate

The idle rate of Kafka SocketServer network threads, range: [0, 1.0].

  • Type: Gauge

Kafka_io_threads_idle_time_nanoseconds_total

The cumulative idle time of Kafka request handler threads, in nanoseconds. It is the cumulative form of Apache Kafka's native metric RequestHandlerAvgIdlePercent. Taking the derivative with respect to time (measured in nanoseconds) yields the thread idle rate. Note that when a node runs as a combined node (acting as both Controller and Broker), each role has its own request handlers, so this metric is the sum of the Controller and Broker values, and the derived thread idle rate can reach 2.0.

  • Type: Counter
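
The derivation described above can be sketched as follows: take two samples of the counter, divide the increase in idle nanoseconds by the elapsed wall-clock time in nanoseconds, and optionally normalize for a combined node. The function name, the `pool_count` parameter, and the sample values are hypothetical. Dividing by 2 for a combined node is an assumption of this sketch (it averages the Controller and Broker rates) and not something AutoMQ does for you.

```python
NANOS_PER_SECOND = 1_000_000_000


def io_thread_idle_rate(prev_ns, prev_time_s, curr_ns, curr_time_s, pool_count=1):
    """Derive the idle rate from two samples of the cumulative idle-time counter.

    pool_count is 1 for a plain Broker or Controller and 2 for a combined node,
    where the counter is the sum of both request handler pools.
    """
    elapsed_ns = (curr_time_s - prev_time_s) * NANOS_PER_SECOND
    if elapsed_ns <= 0:
        raise ValueError("samples must be in increasing time order")
    return (curr_ns - prev_ns) / elapsed_ns / pool_count


# Hypothetical samples taken 10 seconds apart on a plain Broker node.
print(io_thread_idle_rate(5_000_000_000, 100.0, 13_000_000_000, 110.0))
```

A result close to 0 means the request handler threads were busy for almost the whole interval. The sketch does not handle counter resets, such as a node restart between the two samples.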

Controller Metrics

Kafka_controller_active_count

This metric indicates whether the current Controller node is active. A value of 1 indicates an active status, whereas 0 denotes inactivity.

  • Type: Gauge

Kafka_broker_active_count

The number of active Brokers in the current cluster.

  • Type: Gauge

Kafka_broker_fenced_count

The number of Brokers currently fenced in the cluster.

  • Type: Gauge

Kafka_topic_count

Total number of topics in the current cluster.

  • Type: Gauge

Kafka_partition_total_count

Total number of partitions in the current cluster.

  • Type: Gauge

Kafka_partition_offline_count

Total number of partitions without leaders in the current cluster.

  • Type: Gauge

Kafka_stream_auto_balancer_metrics_time_delay_milliseconds

The delay of the AutoBalancer monitoring metrics reported by each Broker node in the cluster. When the delay exceeds a certain threshold, AutoBalancer considers the Broker node out of sync and no longer includes it in partition scheduling.

  • Type: Gauge

  • Labels:

    • node_id: Node ID reporting AutoBalancer monitoring metrics.

Kafka_stream_s3_object_count

The current total number of objects uploaded to the object storage from the cluster, categorized by object state.

  • Type: Gauge

  • Labels:

    • state: The state of the object, divided into the following three categories:

      • prepared: Objects that are not yet fully written and have not been committed

      • committed: Objects that are fully written and have been committed

      • mark_destroyed: Objects marked for deletion, which will be removed from the object storage after a certain delay

Kafka_stream_s3_object_size_bytes

The total size of objects uploaded to object storage by the current cluster.

  • Type: Gauge

Kafka_stream_stream_object_num

The number of StreamObjects uploaded to object storage by the current cluster.

  • Type: Gauge

Kafka_stream_stream_set_object_num

The number of StreamSetObjects uploaded to object storage by each broker in the current cluster.

  • Type: Gauge

  • Labels:

    • node_id: The corresponding broker node ID

Broker Metrics

Kafka_message_count_total

The total number of messages received by the Broker node. Its rate of change over time gives the message throughput.

  • Type: Counter

  • Labels:

    • topic

Kafka_network_io_bytes_total

The total size of messages received and sent by the Broker node. Its rate of change over time gives the throughput in bytes.

  • Type: Counter

  • Labels:

    • topic

    • partition

    • direction:

      • in: Indicates receiving messages

      • out: Indicates sending messages
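
A throughput value can be derived from any of these counters by dividing the increase between samples by the elapsed time. The sketch below shows one way to do this for a single label combination, for example one topic, partition, and direction. The function name and sample values are hypothetical, and the reset handling is a simplification of what Prometheus's own `rate()` function does.

```python
def counter_rate(samples):
    """Average per-second increase of a counter.

    samples is a list of (timestamp_seconds, counter_value) pairs in time order.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero, e.g. after a node restart.
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")
    return increase / elapsed


# Hypothetical samples of the counter with direction="in", 15 seconds apart.
inbound = [(0, 0), (15, 1500), (30, 4500)]
print(counter_rate(inbound))  # 150.0 bytes per second
```

The same calculation applies to the other Counter metrics in this article, such as the message count and the object storage upload and download sizes.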

Kafka_topic_request_count_total

The total number of requests received for each Topic on the Broker node. Only two request types are counted: produce and fetch.

  • Type: Counter

  • Labels:

    • topic

    • type: Request Type

      • produce

      • fetch

Kafka_topic_request_failed_total

The total number of failed requests for each Topic on Broker nodes, including only produce and fetch request types.

  • Type: Counter

  • Labels:

    • topic

    • type: Request Type

      • produce

      • fetch

Kafka_request_count_total

The total number of requests received by Broker nodes.

  • Type: Counter

  • Labels:

    • type: Request Type

    • version: The API version for this type of request.

Kafka_request_error_count_total

The total number of request errors on the Broker node, by error code. Note that successful requests are also counted in this metric, under the error code NONE.

  • Type: Counter

  • Labels:

    • type: Type of request.

    • error: Error code indicating the state of the request, where NONE denotes that the request was successful.
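
Because successful requests are counted under the error code NONE, a failure ratio has to exclude that label value explicitly. The following minimal sketch assumes you have already computed the increase of the counter per error code over some time window. The function name and the example error codes and numbers are illustrative only.

```python
def request_failure_ratio(increase_by_error):
    """Fraction of requests that failed within a time window.

    increase_by_error maps the value of the error label to the increase of
    the counter over the window.
    """
    total = sum(increase_by_error.values())
    if total == 0:
        return 0.0
    # NONE marks successful requests, so it must not count as a failure.
    failed = sum(n for code, n in increase_by_error.items() if code != "NONE")
    return failed / total


# Hypothetical increases over a 5-minute window for one request type.
window = {"NONE": 990, "NOT_LEADER_OR_FOLLOWER": 10}
print(request_failure_ratio(window))
```

Summing the counter across all error codes without this filter gives the total request count, not the number of failures.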

Kafka_request_size_bytes_total

The total size of requests received by the Broker node.

  • Type: Counter

  • Labels:

    • type: Request type

Kafka_request_size_50p(99p/mean/max)_bytes

Size of the requests received by the Broker node, reported as the 50th percentile, 99th percentile, mean, and maximum.

  • Type: Gauge

  • Labels:

    • type: Request type

Kafka_request_time_milliseconds_total

Total processing time of requests handled by the Broker node.

  • Type: Counter

  • Labels:

    • type: Request type

Kafka_request_time_50p(99p/mean/max)_milliseconds

Time taken by broker nodes to process requests, reported as the 50th percentile, 99th percentile, mean, and maximum.

  • Type: Gauge

  • Labels:

    • type: Request type

Kafka_request_queue_time_milliseconds_total

The total time that requests spend in the request queue on the broker node. Request queue time increases when the Kafka I/O threads are busy.

  • Type: Counter

  • Labels:

    • type: Request type

Kafka_request_queue_time_50p(99p/mean/max)_milliseconds

Queue time for broker node requests, reported as the 50th percentile, 99th percentile, mean, and maximum.

  • Type: Gauge

  • Labels:

    • type: Request Type

Kafka_response_queue_time_milliseconds_total

The total time that responses spend in the response queue on the Broker node. Response queue time increases when the Kafka network threads are busy.

  • Type: Counter

  • Labels:

    • type: Request Type

Kafka_response_queue_time_50p(99p/mean/max)_milliseconds

The Broker node's response queue time, reported as the 50th percentile, 99th percentile, mean, and maximum.

  • Type: Gauge

  • Labels:

    • type: Request Type

Kafka_request_queue_size

Request queue size of the Broker node.

  • Type: Gauge

Kafka_response_queue_size

Response queue size of the Broker node.

  • Type: Gauge

Kafka_purgatory_size

Number of produce or fetch requests waiting in purgatory on the Broker node.

  • Type: Gauge

  • Labels:

    • type:

      • Produce

      • Fetch

Kafka_partition_count

Number of currently assigned partitions on the Broker node.

  • Type: Gauge

Kafka_logs_flush_time_50p(99p/mean/max)_milliseconds

Log flush time of the Broker node, which in AutoMQ corresponds to the Delta WAL flush time, reported as the 50th percentile, 99th percentile, mean, and maximum.

  • Type: Gauge

Kafka_log_end_offset

The maximum logical offset of each partition on the broker node.

  • Type: Gauge

  • Labels:

    • topic

    • partition

Kafka_log_size

The total message size of each partition on the broker node.

  • Type: Gauge

  • Labels:

    • topic

    • partition

Kafka_group_commit_offset

The committed consumption offset of each consumer group for each partition. Note that this metric is reported by the broker that hosts the group coordinator of the consumer group.

  • Type: Gauge

  • Labels:

    • consumer_group

    • topic

    • partition
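
One common use of this metric is estimating consumer lag by combining it with Kafka_log_end_offset. The sketch below assumes that Kafka_log_end_offset follows Apache Kafka's log end offset convention, so that lag is the end offset minus the committed offset. The function name and the topic names and offsets are hypothetical. Because the two metrics may be reported by different brokers (partition owner versus group coordinator), the inputs are expected to be collected across the whole cluster.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Estimate per-partition lag for a single consumer group.

    Both arguments map (topic, partition) to an offset.
    """
    lag = {}
    for key, committed in committed_offsets.items():
        end = log_end_offsets.get(key)
        if end is None:
            # No end offset scraped for this partition yet; skip it.
            continue
        # Samples are taken at different instants, so clamp at zero.
        lag[key] = max(end - committed, 0)
    return lag


# Hypothetical values for one consumer group.
end_offsets = {("orders", 0): 120, ("orders", 1): 80}
committed = {("orders", 0): 100, ("orders", 1): 80, ("orders", 2): 5}
print(consumer_lag(end_offsets, committed))
```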

Kafka_group_count

The number of consumer groups managed by the group coordinator on the broker node.

  • Type: Gauge

Kafka_group_preparing_rebalance_count

The number of consumer groups preparing for rebalance.

  • Type: Gauge

Kafka_group_completing_rebalance_count

The number of consumer groups waiting for state assignment by the leader.

  • Type: Gauge

Kafka_group_stable_count

The number of consumer groups in the Stable state.

  • Type: Gauge

Kafka_group_empty_count

The number of consumer groups without any members but not yet expired.

  • Type: Gauge

Kafka_group_dead_count

The number of consumer groups without any members and whose metadata has been removed.

  • Type: Gauge

Kafka_stream_upload_size_bytes_total

The total size of data uploaded by broker nodes to object storage.

  • Type: Counter

Kafka_stream_download_size_bytes_total

Total size of data downloaded from object storage by Broker nodes.

  • Type: Counter

Kafka_stream_network_inbound_usage_bytes_total

Total inbound bandwidth usage of Broker nodes, including message reception and data downloaded from object storage. By differentiating with respect to time, you obtain inbound traffic throughput.

  • Type: Counter

Kafka_stream_network_outbound_usage_bytes_total

Total outbound bandwidth usage of Broker nodes, including message consumption and data uploaded to object storage. By differentiating with respect to time, you obtain outbound traffic throughput.

  • Type: Counter

Kafka_stream_network_inbound_available_bandwidth_bytes

Inbound traffic throughput reserved by Broker nodes for cold reads and Compaction. When this value is less than the inbound traffic demand for cold reads and Compaction, corresponding requests are placed in a rate-limiting queue for processing. Normal message sending and receiving traffic is not affected by this rate-limiting. Note that this metric represents only the instantaneous value during sampling and, limited by sampling intervals and rate-limiting implementation specifics, should be used for reference only.

  • Type: Gauge

Kafka_stream_network_outbound_available_bandwidth_bytes

Outbound traffic throughput reserved by Broker nodes for cold reads and Compaction. When this value is less than the outbound traffic demand for cold reads and Compaction, corresponding requests are placed in a rate-limiting queue for processing. Normal message sending and receiving traffic is not affected by this rate-limiting. Note that this metric represents only the instantaneous value during sampling and, limited by sampling intervals and rate-limiting implementation specifics, should be used for reference only.

  • Type: Gauge

Kafka_stream_network_inbound_limiter_queue_time_50p(99p/mean/max)_nanoseconds

The time that inbound traffic requests from cold reads and Compaction spend waiting in the rate-limiting queue.

  • Type: Gauge

Kafka_stream_network_outbound_limiter_queue_time_50p(99p/mean/max)_nanoseconds

The time that outbound traffic requests from cold reads and Compaction spend waiting in the rate-limiting queue.

  • Type: Gauge

Kafka_stream_operation_latency_50p(99p/mean/max)_nanoseconds

The operation duration for each stage of the AutoMQ S3Stream module.

  • Type: Gauge

  • Labels:

    • operation_type

    • operation_name