Prometheus Metrics
This document provides detailed observability metrics for AutoMQ, enabling you to better understand its performance and operational status.
AutoMQ metrics are defined and presented in the Prometheus format. If other protocol formats are needed, independent conversion is required.
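Converting to another format starts with reading the Prometheus text exposition format. The following is a minimal sketch, not an AutoMQ utility: `parse_samples` is a hypothetical helper that turns sample lines into dictionaries, which can then be serialized (for example with `json.dumps`). It assumes simple sample lines, leaves escape sequences inside label values as they are, and ignores optional timestamps.

```python
import re

# One sample line of the Prometheus text exposition format looks like:
#   metric_name{label="value",...} 12.5
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Parse sample lines into dicts, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # HELP/TYPE comments and blank lines carry no sample
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append({"name": name, "labels": labels, "value": float(value)})
    return samples
```

Calling `json.dumps(parse_samples(scrape_text))` on a scrape body would give a JSON array with one object per sample.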
General Metrics
Kafka_server_connection_count
The current number of connections established by the node.
- Type: Gauge
Kafka_network_threads_idle_rate
The idle rate of Kafka SocketServer network threads, in the range [0, 1.0].
- Type: Gauge
Kafka_io_threads_idle_time_nanoseconds_total
The cumulative idle time of Kafka request handler threads. It corresponds to the Apache Kafka native metric RequestHandlerAvgIdlePercent, which expresses the same information as a percentage. Taking the rate of change of this counter over time gives the thread idle rate. Note that when a node functions as both a Controller and a Broker, each with its own request handler, this metric is the combined value of both, so the maximum idle rate derived this way is 2.0.
- Type: Counter
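As a rough illustration of deriving the idle rate from this counter, the sketch below divides the increase in idle nanoseconds between two samples by the elapsed wall-clock time. The function name and its arguments are hypothetical, and the sketch assumes the counter is scaled so that a fully idle request handler pool accumulates one second of idle time per second.

```python
NANOS_PER_SECOND = 1_000_000_000


def io_thread_idle_rate(prev_value, prev_time_s, curr_value, curr_time_s):
    """Idle rate between two samples of the idle-time counter.

    Values are cumulative idle nanoseconds; times are UNIX seconds.
    """
    elapsed = curr_time_s - prev_time_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    if curr_value < prev_value:
        # The counter went backwards, e.g. after a restart: count from zero.
        delta = curr_value
    else:
        delta = curr_value - prev_value
    return delta / (elapsed * NANOS_PER_SECOND)
```

On a combined Controller and Broker node the result can reach 2.0, as described above, so divide by two if a per-role fraction is wanted.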
Controller Metrics
Kafka_controller_active_count
Indicates whether the current Controller node is active. A value of 1 means it is active, and 0 means it is inactive.
- Type: Gauge
Kafka_broker_active_count
The number of active Brokers in the current cluster.
- Type: Gauge
Kafka_broker_fenced_count
The number of Brokers that are fenced in the current cluster.
- Type: Gauge
Kafka_topic_count
Total number of topics in the current cluster.
- Type: Gauge
Kafka_partition_total_count
Total number of partitions in the current cluster.
- Type: Gauge
Kafka_partition_offline_count
Total number of partitions without leaders in the current cluster.
- Type: Gauge
Kafka_stream_auto_balancer_metrics_time_delay_milliseconds
The delay with which each broker node in the cluster reports AutoBalancer monitoring metrics. When this delay exceeds a certain threshold, the AutoBalancer treats the broker node as out-of-sync and excludes it from partition reassignment.
- Type: Gauge
- Labels:
  - node_id: The ID of the node reporting AutoBalancer monitoring metrics.
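The out-of-sync check described above can be approximated on the monitoring side. This is only a sketch: `out_of_sync_nodes` is a hypothetical helper, and the threshold is a caller-supplied assumption, since the value the AutoBalancer actually uses is not stated here.

```python
def out_of_sync_nodes(delay_ms_by_node, threshold_ms):
    """Return the node IDs whose reporting delay exceeds threshold_ms, sorted."""
    return sorted(
        node for node, delay in delay_ms_by_node.items() if delay > threshold_ms
    )
```

For example, `out_of_sync_nodes({"0": 500, "1": 90000}, 60000)` reports only node `"1"`.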
Kafka_stream_s3_object_count
The current total number of objects uploaded to object storage by the cluster, categorized by object state.
- Type: Gauge
- Labels:
  - State: Object state, divided into the following three categories:
    - Prepared: Objects that have not yet completed writing and have not been committed
    - Committed: Objects that have completed writing and have been committed
    - Mark_destroyed: Objects marked for deletion, which will be removed from object storage after a certain delay
Kafka_stream_s3_object_size_bytes
The total size of objects uploaded to object storage by the current cluster.
- Type: Gauge
Kafka_stream_stream_object_num
The number of StreamObjects uploaded to object storage by the current cluster.
- Type: Gauge
Kafka_stream_stream_set_object_num
The number of StreamSetObjects uploaded to object storage by each Broker within the current cluster.
- Type: Gauge
- Labels:
  - node_id: The corresponding Broker node ID
Broker Metrics
Kafka_message_count_total
The total number of messages received by the Broker node. Its rate of change over time gives the message count throughput.
- Type: Counter
- Labels:
  - topic
Kafka_network_io_bytes_total
The total size of messages received and sent by the Broker node. Its rate of change over time gives the message size throughput.
- Type: Counter
- Labels:
  - topic
  - partition
  - direction:
    - "in": indicates incoming messages
    - "out": indicates outgoing messages
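To make the throughput calculation concrete, the sketch below takes two scrapes of this counter and returns bytes per second for each direction, summed over all topics and partitions. The function and the shape of its inputs are hypothetical. A series that is missing from the earlier scrape is treated as starting from zero, which is a simplification.

```python
from collections import defaultdict


def throughput_by_direction(prev, curr, elapsed_s):
    """Bytes per second per direction between two scrapes.

    prev and curr map (topic, partition, direction) -> counter value.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    totals = defaultdict(float)
    for key, value in curr.items():
        before = prev.get(key, 0.0)
        # A smaller value than before means the counter was reset.
        delta = value - before if value >= before else value
        totals[key[2]] += delta
    return {direction: total / elapsed_s for direction, total in totals.items()}
```

The same pattern applies to the other Counter metrics in this document whose rate of change is of interest.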
Kafka_topic_request_count_total
The total number of requests received for each topic on the Broker node, including only produce and fetch request types.
- Type: Counter
- Labels:
  - topic
  - type: Request Type
    - produce
    - fetch
Kafka_topic_request_failed_total
The total number of request failures for each topic on the Broker node, including only produce and fetch request types.
- Type: Counter
- Labels:
  - topic
  - type: Request Type
    - produce
    - fetch
Kafka_request_count_total
The total number of requests received by the Broker node.
- Type: Counter
- Labels:
  - type: Request Type
  - version: The API version for the request of this type
Kafka_request_error_count_total
The total number of requests on the Broker node, broken down by error code. Note that successful requests are counted as well, with the error code NONE, so failed requests are those whose error code is not NONE.
- Type: Counter
- Labels:
  - type: Request Type
  - error: Error code, with NONE indicating a successful request
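Because successful requests appear under the error code NONE, an error ratio has to exclude that series from the numerator. A minimal sketch, assuming the caller has already computed the increase of each series over some window (the helper name and input shape are hypothetical):

```python
def request_error_ratio(increase_by_error):
    """Fraction of requests that failed within a window.

    increase_by_error maps error code -> counter increase over the window.
    """
    total = sum(increase_by_error.values())
    if total == 0:
        return 0.0  # no requests in the window
    failed = sum(
        count for error, count in increase_by_error.items() if error != "NONE"
    )
    return failed / total
```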
Kafka_request_size_bytes_total
The total size of requests received by the Broker node.
- Type: Counter
- Labels:
  - type: Request Type
Kafka_request_size_50p(99p/mean/max)_bytes
The size of requests received by Broker nodes, represented by different percentiles.
- Type: Gauge
- Labels:
  - type: Request Type
Kafka_request_time_milliseconds_total
The total time taken by Broker nodes to process requests.
- Type: Counter
- Labels:
  - type: Request Type
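One possible use of this counter is to estimate the mean processing time over a window by dividing its increase by the increase in Kafka_request_count_total for the same request type. Pairing the two metrics this way is an assumption of this sketch rather than something stated above, and the request count would first need to be summed across its version label.

```python
def mean_request_time_ms(prev_time_ms, curr_time_ms, prev_count, curr_count):
    """Mean processing time per request between two scrapes, in milliseconds.

    Returns None when no requests were handled in the window.
    """
    handled = curr_count - prev_count
    if handled <= 0:
        return None
    return (curr_time_ms - prev_time_ms) / handled
```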
Kafka_request_time_50p(99p/mean/max)_milliseconds
Processing time for broker node requests, shown in various percentiles.
- Type: Gauge
- Labels:
  - type: Request Type
Kafka_request_queue_time_milliseconds_total
Total request queue time for broker nodes. When Kafka IO threads are busy, it results in longer request queue times.
- Type: Counter
- Labels:
  - type: Request Type
Kafka_request_queue_time_50p(99p/mean/max)_milliseconds
Request queue time for broker nodes, illustrated in different percentiles.
- Type: Gauge
- Labels:
  - type: Request Type
Kafka_response_queue_time_milliseconds_total
Total response queue time for Broker nodes. When Kafka network threads are busy, it results in longer response queue times.
- Type: Counter
- Labels:
  - type: Request Type
Kafka_response_queue_time_50p(99p/mean/max)_milliseconds
Broker node response queue time is represented by different percentiles.
- Type: Gauge
- Labels:
  - type: Request Type
Kafka_request_queue_size
The request queue size for the broker node.
- Type: Gauge
Kafka_response_queue_size
The response queue size for the broker node.
- Type: Gauge
Kafka_purgatory_size
The number of requests in the produce or fetch purgatory on the broker node.
- Type: Gauge
- Labels:
  - type:
    - Produce
    - Fetch
Kafka_partition_count
The number of partitions currently assigned to the broker node.
- Type: Gauge
Kafka_logs_flush_time_50p(99p/mean/max)_milliseconds
The log flush time of the broker node; in AutoMQ, this indicates the flush time of Delta WAL, expressed by different percentiles.
- Type: Gauge
Kafka_log_end_offset
The maximum logical offset for each partition on the broker node.
- Type: Gauge
- Labels:
  - topic
  - partition
Kafka_log_size
The message size for each partition on the broker node.
- Type: Gauge
- Labels:
  - topic
  - partition
Kafka_group_commit_offset
The consumption offset for each Consumer Group on the corresponding partition; note that this metric is reported by the Broker where the Group Coordinator for each Consumer Group resides.
- Type: Gauge
- Labels:
  - consumer_group
  - topic
  - partition
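A common use of this metric is to estimate consumer lag by subtracting the committed offset from Kafka_log_end_offset for the same topic and partition. The sketch below is illustrative only: `consumer_lag` is a hypothetical helper, and because the two metrics may be reported by different Brokers (the partition's Broker and the Group Coordinator's Broker), it assumes the samples have already been collected across nodes.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag for each consumer group.

    log_end_offsets: (topic, partition) -> log end offset
    committed_offsets: (consumer_group, topic, partition) -> committed offset
    """
    lag = {}
    for (group, topic, partition), committed in committed_offsets.items():
        end = log_end_offsets.get((topic, partition))
        if end is None:
            continue  # no end offset seen for this partition
        # Scrapes are not simultaneous, so clamp small negative values to 0.
        lag[(group, topic, partition)] = max(0, end - committed)
    return lag
```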
Kafka_group_count
The number of Consumer Groups managed by the Group Coordinator located on the Broker node.
- Type: Gauge
Kafka_group_preparing_rebalance_count
The number of Consumer Groups that are preparing to rebalance.
- Type: Gauge
Kafka_group_completing_rebalance_count
Number of Consumer Groups waiting for state assignment from the Leader.
- Type: Gauge
Kafka_group_stable_count
Number of Consumer Groups in a Stable state.
- Type: Gauge
Kafka_group_empty_count
Number of Consumer Groups with no members but not yet expired.
- Type: Gauge
Kafka_group_dead_count
Number of Consumer Groups with no members and metadata already removed.
- Type: Gauge
Kafka_stream_upload_size_bytes_total
Total size of data uploaded by Broker nodes to object storage.
- Type: Counter
Kafka_stream_download_size_bytes_total
The total size of data downloaded from object storage by the Broker node.
- Type: Counter
Kafka_stream_network_inbound_usage_bytes_total
The total inbound bandwidth usage of the Broker node, including received messages and data downloaded from object storage. Its rate of change over time gives the inbound throughput.
- Type: Counter
Kafka_stream_network_outbound_usage_bytes_total
The total outbound bandwidth usage of the Broker node, including consumed messages and data uploaded to object storage. Its rate of change over time gives the outbound throughput.
- Type: Counter
Kafka_stream_network_inbound_available_bandwidth_bytes
Inbound throughput reserved for cold reads and Compaction on the Broker node. When this value is less than what cold-read and Compaction inbound traffic requires, the corresponding requests wait in the rate limiting queue, so normal message sending and receiving traffic is not affected. Note that this metric is only the instantaneous value at the time of sampling. Because of the sampling interval and the rate limiting strategy implementation, treat it as a reference only.
- Type: Gauge
Kafka_stream_network_outbound_available_bandwidth_bytes
Outbound throughput reserved for cold reads and Compaction on the Broker node. When this value is less than what cold-read and Compaction outbound traffic requires, the corresponding requests wait in the rate limiting queue, so normal message sending and receiving traffic is not affected. Note that this metric is only the instantaneous value at the time of sampling. Because of the sampling interval and the rate limiting strategy implementation, treat it as a reference only.
- Type: Gauge
Kafka_stream_network_inbound_limiter_queue_time_50p(99p/mean/max)_nanoseconds
The queuing time for incoming requests for cold reads and Compaction in the rate limiting queue during execution.
- Type: Gauge
Kafka_stream_network_outbound_limiter_queue_time_50p(99p/mean/max)_nanoseconds
The queuing time for outgoing requests for cold reads and Compaction in the rate limiting queue during execution.
- Type: Gauge
Kafka_stream_operation_latency_50p(99p/mean/max)_nanoseconds
The operational duration of each stage in the AutoMQ S3Stream module.
- Type: Gauge
- Labels:
  - operation_type
  - operation_name
Kafka_stream_cert_expiry_timestamp_milliseconds
This metric shows the expiration UNIX timestamp of the TLS certificate, measured in milliseconds.
- Type: Gauge
- Labels:
  - instance: Instance ID.
  - job: Task identifier.
  - host_name: System hostname.
  - cert_subject: Certificate subject.
  - cert_type: Certificate type, where server_cert represents a server certificate, and truststore_cert represents a CA certificate.
Kafka_stream_cert_days_remaining
This metric indicates the number of days remaining until the expiration of the TLS certificate from the current moment.
- Type: Gauge
- Labels:
  - instance: Instance ID.
  - job: Task identifier.
  - host_name: System hostname.
  - cert_subject: Certificate subject.
  - cert_type: Certificate type, where server_cert represents a server certificate, and truststore_cert represents a CA certificate.
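The two certificate metrics are related by simple arithmetic: the days remaining can be recomputed from the expiry timestamp and the current time. The sketch below shows that relationship and a basic "expiring soon" filter. The helper names and the 30-day default are hypothetical, and how AutoMQ itself rounds the days-remaining value is not specified here, so the result is left as a fraction.

```python
MILLIS_PER_DAY = 24 * 60 * 60 * 1000


def days_remaining(expiry_timestamp_ms, now_ms):
    """Days until expiry as a float; negative if already expired."""
    return (expiry_timestamp_ms - now_ms) / MILLIS_PER_DAY


def expiring_soon(expiry_ms_by_subject, now_ms, warn_days=30):
    """Return (cert_subject, days_left) pairs below warn_days, soonest first."""
    pairs = [
        (subject, days_remaining(expiry_ms, now_ms))
        for subject, expiry_ms in expiry_ms_by_subject.items()
    ]
    return sorted((p for p in pairs if p[1] < warn_days), key=lambda p: p[1])
```

In practice `now_ms` would come from something like `int(time.time() * 1000)`.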