"Our biggest concern was how to grow capacity without the operational headaches of traditional Kafka. Adding a broker used to mean hours of partition reassignment and nervous monitoring. AutoMQ completely changed this by decoupling storage to COS. Scaling our 100+ broker cluster is now instantaneous, allowing us to confidently face any traffic surge while cutting our total cost in half."
Infrastructure Team, Tencent Music
The Challenge
As a global leader in music streaming with platforms like QQ Music, Kugou Music, and Kuwo Music, Tencent Music Entertainment (TME) processes massive amounts of real-time data daily. With peak traffic often exceeding 1 GiB/s during major promotional events and concerts, the team faced critical infrastructure challenges:
- Scaling Bottlenecks: Traditional Kafka's stateful architecture meant that adding brokers required painful, time-consuming partition reassignment. Scaling the 100+ broker cluster was a high-risk, manual operation often taking hours, creating operational anxiety before major traffic events.
- Spiraling Costs: The legacy architecture demanded persistent high-performance local disks attached to each broker. This tight coupling of compute and storage led to massive over-provisioning: capacity had to be sized for peak loads, so expensive resources sat idle most of the time.
- Operational Complexity: Managing stateful brokers at scale required constant capacity planning, intricate rebalancing procedures, and specialized operational expertise that distracted engineering teams from building new features.
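To see why reassignment on a stateful cluster can take hours, a back-of-the-envelope estimate helps. The sketch below is illustrative only: the data volume and throttle rate are hypothetical figures, not TME's actual numbers, and it assumes the copy rate is the only limiting factor.

```python
def reassignment_hours(data_to_move_gib: float, copy_rate_mib_s: float) -> float:
    """Estimate how long it takes to copy partition data to a new broker.

    data_to_move_gib: partition data that must be copied (hypothetical value).
    copy_rate_mib_s:  sustained copy rate, typically throttled so that
                      reassignment does not starve live traffic (hypothetical value).
    """
    if copy_rate_mib_s <= 0:
        raise ValueError("copy rate must be positive")
    seconds = data_to_move_gib * 1024 / copy_rate_mib_s
    return seconds / 3600


# Hypothetical example: 2 TiB of partition data copied at 100 MiB/s.
hours = reassignment_hours(data_to_move_gib=2048, copy_rate_mib_s=100)
print(f"Estimated reassignment time: {hours:.1f} hours")
```

The estimate grows linearly with the amount of data held on local disks, which is why the cost of adding a broker rises as the cluster retains more data.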
Why AutoMQ
TME chose AutoMQ to fundamentally re-architect their streaming infrastructure for the cloud era.
- True Cloud-Native Decoupling: AutoMQ offloads all data persistence to Tencent Cloud Object Storage (COS). The brokers become stateless compute units, eliminating the need for expensive, attached block storage. This directly addresses the cost issue by allowing independent scaling of compute and storage.
- Instantaneous Elasticity: With brokers no longer holding data locally, cluster scaling becomes a metadata operation. Adding or removing nodes takes seconds, not hours. This agility allows the team to scale proactively and confidently, even minutes before a major live-streaming event.
- Seamless Compatibility: Despite the radical architectural change, AutoMQ maintained 100% Kafka protocol compatibility. TME's existing producers, consumers, and tooling worked without modification, significantly de-risking the migration.
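The difference between the two scaling models can be sketched with a toy model. This is not AutoMQ's implementation: the `Partition` class, the round-robin placement, and the sizes are all hypothetical, chosen only to show that with shared storage a rebalance changes partition ownership without copying data.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    size_gib: float  # retained data for this partition (hypothetical figure)
    owner: str       # broker currently serving the partition


def rebalance(partitions: list[Partition], brokers: list[str]) -> list[Partition]:
    """Reassign partitions round-robin and return those whose owner changed.

    Round-robin is a simplification; real balancers try to minimize movement.
    """
    moved = []
    for i, partition in enumerate(partitions):
        target = brokers[i % len(brokers)]
        if partition.owner != target:
            partition.owner = target
            moved.append(partition)
    return moved


def gib_copied_local_disk(moved: list[Partition]) -> float:
    """Stateful brokers: every moved partition's data is copied to its new owner."""
    return sum(p.size_gib for p in moved)


def gib_copied_shared_storage(moved: list[Partition]) -> float:
    """Stateless brokers: data stays in object storage, only ownership changes."""
    return 0.0


# Six partitions of 50 GiB each, spread over two brokers; then a third broker joins.
partitions = [Partition(f"p{i}", 50.0, f"b{i % 2}") for i in range(6)]
moved = rebalance(partitions, ["b0", "b1", "b2"])
print(f"{len(moved)} partitions changed owner")
print(f"local disks: {gib_copied_local_disk(moved)} GiB copied")
print(f"shared storage: {gib_copied_shared_storage(moved)} GiB copied")
```

In this model the same ownership change happens in both cases; only the stateful variant pays a data-copy cost proportional to partition size.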
The Results
A New Era for Music Streaming Infrastructure
The migration to AutoMQ has delivered transformative results for TME's streaming platform.
Key Achievements
- Total Cost of Ownership reduction: 50%
- Peak traffic handled stably: over 1 GiB/s
- Brokers in a single cluster: 100+
- Scaling time: seconds (down from hours)
- Cost Halved: By decoupling storage to COS and eliminating the need for expensive attached disks, TME achieved a 50% reduction in total cost of ownership. Compute resources are now right-sized rather than over-provisioned.
- Eliminated Scaling Anxiety: The 100+ broker cluster can now be scaled in seconds. The team no longer dreads traffic events; they can respond in real time or proactively adjust capacity based on anticipated demand.
- Rock-Solid Stability: The new architecture has run stably in production, handling peak traffic exceeding 1 GiB/s during major events. The operational burden of managing stateful Kafka is gone, freeing up the team to focus on delivering better experiences for millions of music fans.

