JD Case Study - AutoMQ

"AutoMQ's cloud-native architecture perfectly aligns with JD's strategy of running core infrastructure on Kubernetes. By offloading Kafka's data durability to CubeFS, we not only solved the severe storage and network redundancy issues inherent in traditional architectures but also achieved true second-level elasticity. This allows us to effortlessly handle e-commerce traffic floods while significantly reducing infrastructure costs."

Hou Zhong, Cloud Native Architect for Kafka @ JD.com

The Challenge

JD's streaming platform serves over 1,400 business lines and processes trillion scale records everyday. As the company aimed to migrate its infrastructure from physical machines to Kubernetes, the legacy "Shared-Nothing" architecture of Apache Kafka created severe inefficiencies:

The "Double Redundancy" Trap: JD utilizes CubeFS—its proprietary, highly reliable, S3-compatible object storage—as the underlying storage layer. Since legacy Kafka requires its own 3-replica ISR mechanism for durability, and CubeFS enforces its own internal 3-replica consistency, a single write resulted in 9 actual copies of data. This wasted ~67% of storage space and saturated network bandwidth with unnecessary replication traffic.
Inflexible Scaling for E-commerce Peaks: Scaling stateful Kafka brokers on Kubernetes was a high-risk operation requiring hours of data rebalancing. During massive promotional events like "618" or "Double 11," the team could not scale out fast enough to meet traffic surges.
Stalled Containerization: With a fleet size exceeding 8,000 nodes, the complexity of managing stateful storage on Kubernetes slowed down the migration of the core message queue clusters to a cloud-native architecture.

Why AutoMQ

JD.com chose AutoMQ to decouple storage from compute, treating CubeFS as the primary source of truth.

True Cloud-Native Storage Offloading: AutoMQ adopts a genuinely cloud-native architecture by offloading data durability entirely to JD's CubeFS. Unlike legacy Kafka, which relies on the ISR (In-Sync Replicas) mechanism for data safety, AutoMQ eliminates the need for inter-broker replication. Data is written once to the shared storage, relying on CubeFS's inherent durability. This architectural shift addresses the "double redundancy" issue, reducing the total data footprint from 9 copies to just 3 (managed solely by the storage layer) and slashing storage costs by over 50%.
Seconds-Level Elasticity via HPA: By removing local state, AutoMQ brokers function as stateless compute units. This allows JD to utilize Kubernetes Horizontal Pod Autoscalers (HPA) triggered by real-time broker monitoring metrics. Partition reassignment—which previously took hours—now involves only metadata updates and completes in seconds, enabling the cluster to dynamically "breathe" with traffic load.
100% Compatibility: Despite the architectural shift, AutoMQ remained 100% compatible with the Kafka protocol. This was critical for JD's ecosystem, allowing over 1,400 existing applications to migrate seamlessly without a single line of code change.

The Results

Accelerating Cloud-Native Transformation at Scale

The introduction of AutoMQ has been a catalyst for JD.com's infrastructure modernization, proving its stability during the world's largest online shopping festivals.

Metrics in Production

200+

AutoMQ pods (16 vCPU, 64 GB RAM)

250 TB

Data under management

40 GiB/s

Peak throughput during 618 / Double 11 events

Accelerated Containerization: AutoMQ's stateless design significantly lowered the barrier for migrating Kafka to Kubernetes. This acceleration has led to the successful containerization of 4,000+ Pods (standardized on 8 vCPU, 64 GB RAM specs), moving a substantial portion of JD's massive 8,000-node fleet to a manageable, cloud-native environment.
Battle-Tested at 40 GiB/s: The new architecture seamlessly handles extreme throughput demands, processing over 40 GiB/s during peak events. It has run stably in production for nearly two years, supporting critical business lines during high-pressure sales events.
From Hours to Seconds: Cluster scaling efficiency improved dramatically. By leveraging HPA and stateless brokers, scaling operations that used to take hours now finish in seconds. This agility eliminated the need for massive over-provisioning, directly contributing to a 33%+ reduction in network bandwidth costs and significant storage savings.

Ready for 40 GiB/s scale?

Discover how AutoMQ can power your Kubernetes-native streaming infrastructure at any scale. Get a personalized demo and see the difference.

Try Demo Talk to an Expert

Case Study

Why JD.com Replaces Kafka with AutoMQ at 40 GiB/s Scale

The Challenge

Why AutoMQ

The Results

Accelerating Cloud-Native Transformation at Scale

Metrics in Production

Ready for 40 GiB/s scale?

See More Customer Stories

Grab

Tencent

LG U+