Qianxun SI Efficiently Processes Tens of Billions of Messages Daily Using AutoMQ, Replacing Alibaba Cloud Kafka

About Qianxun SI

Qianxun SI is a space-time intelligence infrastructure company founded in August 2015. The company combines foundational positioning data from the BeiDou Satellite System (compatible with GPS, GLONASS, and Galileo), more than 5,000 globally distributed GNSS satellite- and ground-based augmentation stations, self-developed positioning algorithms, and a large-scale internet service platform to provide centimeter-level positioning, millimeter-level sensing, and nanosecond-level timing services. In October 2019, the company completed a Series A financing round of 1 billion RMB, with a valuation exceeding 13 billion RMB.

Business Background

Since its establishment in 2015, Qianxun SI has used the BeiDou Satellite System and its own positioning algorithms to provide industry solutions in fields such as water conservancy, mining, smart cities, and smart transportation. The large volumes of data produced by internal positioning base stations and hardware terminal devices, together with the corresponding monitoring, trace, and log information, are distributed via Apache Kafka to downstream consumers for further analysis and processing. With rapid business growth, the daily volume has reached tens of billions of messages.

Initially, Qianxun SI used Kafka. As the number of hardware terminals grew and the business developed rapidly, data traffic increased sharply and the problems with Kafka became increasingly severe. AutoMQ's architecture offloads durability to shared storage such as object storage and block storage, which significantly reduces costs compared to Kafka while also offering rapid elasticity and easier maintenance. This made AutoMQ the best choice for cost optimization and architectural improvement. Because AutoMQ is 100% compatible with Apache Kafka, Qianxun SI could migrate without adjusting its existing architecture or the tooling built around Apache Kafka, resolving a series of long-standing pain points. The following diagram illustrates the data platform architecture of Qianxun SI after adopting AutoMQ:

Qianxun SI writes three main categories of data to AutoMQ:

System Monitoring Data: A component similar to falcon_agent sends machine monitoring data directly to AutoMQ. This is primarily virtual machine system monitoring data such as disk, CPU, and network metrics. Using this data, Qianxun SI's downstream monitoring and alerting systems can promptly identify anomalies in virtual machines.
User Data: User data is mainly sent to AutoMQ via Kafka Producer or Logtail. The Logtail data originates from hundreds of millions of terminals served by Qianxun SI, including industry drones, surveying equipment, and vehicle-mounted positioning terminals. The write throughput is 600 MB/s, with over 10 billion messages written daily. In addition to general user data, there are two critical data pipelines with a write throughput of 150 MB/s, which must guarantee low latency and no data loss so that the accuracy of critical reports is not affected. Once ingested, user data is analyzed by other applications to produce important reports such as user behavior analysis, daily active user trends, key customer analysis, and base station data analysis. These reports support enterprise decision-making and help improve the service levels of Qianxun SI's products.
Application Trace/Metric Data: Trace and metric information from applications is sent to AutoMQ through the SkyWalking agent and is used for real-time monitoring, alerting, and issue diagnosis.

Why Choose AutoMQ

For Qianxun SI, the primary goal was to quickly reduce costs and address Kafka's elasticity issues. The cost-related pain points of Kafka are mainly reflected in the following aspects:

High Storage Costs: Kafka relies on Apache Kafka's ISR multi-replica mechanism to ensure data durability. With three replicas on SSD cloud disks, the cost per unit of storage reaches up to 3.x RMB (approx. 0.5 USD) per GB per month. In the cloud, EBS already provides high durability through its own multiple replicas, so replicating again with Apache Kafka's ISR mechanism wastes additional storage space. In addition, EBS is a high-performance, highly durable block storage service with a relatively high unit storage cost.
Lack of Separation Between Storage and Compute Leading to Extra Costs: Kafka follows a coupled storage and compute architecture. Scaling out to support higher write throughput means scaling compute and storage together, which is a poor fit for Qianxun SI's application scenarios. All of Qianxun SI's data is first stored in various types of databases, and Kafka retains data for only 2 hours. When Kafka's capacity is insufficient for the write traffic, expanding the cluster adds storage along with compute, and most of that storage goes unused. We still have to pay for this unused storage, so costs escalate substantially as write traffic increases.
Lack of Elasticity: Because Kafka integrates storage and compute, scaling operations require partition data to be replicated. This consumes significant disk and network I/O and is time-consuming, which prevents rapid cluster scaling. As the business grows, Qianxun SI's Kafka cluster inevitably runs into capacity constraints and has to be expanded to handle higher traffic. During Kafka scaling, partition data must be reassigned to new nodes, a time-intensive process that requires manual intervention and incurs high operational costs.
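
To make the elasticity point concrete, the following is a rough back-of-the-envelope sketch of how long it takes to copy partition data onto newly added brokers. The 600 MB/s write rate, 2-hour retention, and three replicas come from this article; the broker counts and the aggregate copy bandwidth are hypothetical values chosen only for illustration, and the model assumes sustained traffic and perfectly even data distribution.

```python
def retained_gb(write_mb_per_s: float, retention_hours: float, replicas: int = 1) -> float:
    """Data held in the cluster in decimal GB (1 GB = 1000 MB)."""
    return write_mb_per_s * retention_hours * 3600 * replicas / 1000


def rebalance_minutes(total_gb: float, brokers_before: int, brokers_added: int,
                      copy_mb_per_s: float) -> float:
    """Minutes needed to copy an even share of the data onto the new brokers."""
    brokers_after = brokers_before + brokers_added
    moved_gb = total_gb * brokers_added / brokers_after
    return moved_gb * 1000 / copy_mb_per_s / 60


# From the article: 600 MB/s writes, 2 h retention, 3 replicas.
total = retained_gb(600, 2, replicas=3)

# ASSUMPTIONS (not from the article): 9 brokers growing to 12,
# and 1000 MB/s of aggregate bandwidth available for copying.
minutes = rebalance_minutes(total, brokers_before=9, brokers_added=3, copy_mb_per_s=1000)

print(f"data held in cluster: {total:.0f} GB")
print(f"estimated copy time:  {minutes:.0f} minutes")
```

The estimate grows linearly with the amount of retained data, and the copy competes with production traffic for the same disks and network, which is why such operations tend to be scheduled for off-peak hours.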

After thoroughly researching AutoMQ, we found that its storage architecture can significantly reduce our Kafka cloud bill while providing strong elasticity, effectively addressing Qianxun SI's current pain points:

Significantly Reduced Compute and Storage Costs: AutoMQ stores all data in object storage, priced at 0.12 CNY (approx. 0.017 USD) per GB per month, a substantial cost advantage over Kafka's SSD-based multi-replica storage. In addition, AutoMQ's shared storage architecture makes the compute-layer Brokers stateless. Combined with AutoMQ's built-in self-balancing capability, this lets us scale the compute layer independently, quickly, safely, and automatically. We can therefore downsize a cluster during off-peak periods or when its traffic drops, saving costs.
More Economical Separation of Compute and Storage: Unlike Kafka, AutoMQ completely decouples the compute and storage layers. Storage is consumed on demand, while the compute layer can be scaled independently, quickly, and safely according to the required traffic. This flexibility translates directly into cost savings. Qianxun SI's traffic model is dominated by high write traffic with short retention times, so AutoMQ allows us to scale the compute layer without increasing storage costs, offering significant savings compared to Kafka.
Extreme Elasticity Frees Up Operations: The cost of Qianxun SI's data infrastructure comes not only from cloud resources and services but also from human resources. AutoMQ resolves Kafka's elasticity issues by offloading data durability to cloud storage, which eliminates the need to move partition data during scaling. Only metadata changes are required, so partitions can be reassigned in seconds. Its built-in self-balancing component also balances traffic during scaling, avoiding data skew. Previously, scaling the Kafka cluster required tens of minutes and manual traffic redirection to complete partition reassignment. To ensure smooth scaling, we had to prepare contingency plans and keep the entire team on standby during off-peak hours. With AutoMQ, cluster scaling becomes a low-risk, automated, routine operation, which significantly reduces the manpower required and makes the entire process safer, faster, and more reliable.
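
The storage price gap can be illustrated with a small calculation. This is only a sketch under stated assumptions: it treats 600 MB/s as a sustained rate, uses the 2-hour retention and the 0.12 CNY object storage price from this article, and, because the article gives the Kafka SSD price only as "3.x", takes 3.0 CNY as a placeholder lower bound. It ignores provisioned disk headroom, object storage request fees, and any other storage a deployment may need.

```python
def retained_gb(write_mb_per_s: float, retention_hours: float) -> float:
    """Logical data retained at any moment, in decimal GB (1 GB = 1000 MB)."""
    return write_mb_per_s * retention_hours * 3600 / 1000


def monthly_cost_cny(gb: float, price_cny_per_gb_month: float) -> float:
    """Monthly storage cost for a given retained volume."""
    return gb * price_cny_per_gb_month


# From the article: object storage price per GB per month.
OBJECT_STORAGE_CNY_PER_GB = 0.12

# ASSUMPTION: the article only says "3.x" (three replicas on SSD included);
# 3.0 is a placeholder lower bound, not a quoted price.
KAFKA_SSD_CNY_PER_GB = 3.0

gb = retained_gb(600, 2)
kafka_cost = monthly_cost_cny(gb, KAFKA_SSD_CNY_PER_GB)
object_cost = monthly_cost_cny(gb, OBJECT_STORAGE_CNY_PER_GB)

print(f"retained data:          {gb:.0f} GB")
print(f"SSD multi-replica cost: {kafka_cost:.0f} CNY/month")
print(f"object storage cost:    {object_cost:.1f} CNY/month")
print(f"ratio:                  {kafka_cost / object_cost:.0f}x")
```

With short retention the absolute storage bill is modest either way. The larger effect described above is that a coupled architecture forces extra disks to be provisioned every time compute is added.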

Another important reason for choosing AutoMQ is its 100% compatibility with Apache Kafka. Qianxun SI has built a large number of applications and data infrastructure around Apache Kafka. Thanks to AutoMQ's full compatibility, our overall data platform architecture and the upstream and downstream data infrastructure could migrate without any modifications, greatly reducing migration difficulty and risk.

Implementation of AutoMQ at Qianxun SI

Thanks to AutoMQ's 100% compatibility with Apache Kafka, the entire migration process was very smooth. Testing showed that components such as SkyWalking and Logtail were fully compatible with AutoMQ and required no modifications. After completing the PoC, we migrated from Kafka to AutoMQ with a phased traffic cutover:

Flume performs dual-write, writing to both the old Kafka cluster and the new AutoMQ cluster.
All consumption and delivery systems switch to the new AutoMQ cluster.
The old Logtail client on the servers is gradually replaced with the new client, which points to AutoMQ.
Once the switch is complete, the old Flume and Kafka deployments are decommissioned.
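
The dual-write step can be sketched in a few lines. This is not the Flume configuration used in the migration; it is a minimal, library-free illustration of the pattern, in which the class name `DualWriter` and the two send callables are hypothetical. In practice each callable would wrap a producer pointed at the respective cluster.

```python
from typing import Callable, List, Tuple

# Hypothetical send signature: (topic, payload) -> None
Send = Callable[[str, bytes], None]


class DualWriter:
    """Forward every record to the old cluster and mirror it to the new one."""

    def __init__(self, send_old: Send, send_new: Send) -> None:
        self.send_old = send_old
        self.send_new = send_new
        self.mirror_failures: List[Tuple[str, bytes, str]] = []

    def write(self, topic: str, payload: bytes) -> None:
        # The old cluster remains the source of truth until consumers switch,
        # so an error here propagates to the caller.
        self.send_old(topic, payload)
        try:
            self.send_new(topic, payload)
        except Exception as exc:
            # Keep failed mirror writes for later replay instead of
            # failing the production write path.
            self.mirror_failures.append((topic, payload, repr(exc)))


# Usage with in-memory lists standing in for the two clusters.
old_cluster: List[Tuple[str, bytes]] = []
new_cluster: List[Tuple[str, bytes]] = []

writer = DualWriter(
    send_old=lambda topic, payload: old_cluster.append((topic, payload)),
    send_new=lambda topic, payload: new_cluster.append((topic, payload)),
)
writer.write("user-data", b"position-fix")
print(old_cluster == new_cluster)
```

Keeping the old cluster authoritative during this phase means consumers can be moved over only after both clusters have been shown to receive the same data, and the old path can be removed last.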

Benefits and Outlook

Overall, AutoMQ is a next-generation Kafka with significant advantages in cost, performance, and elasticity. After deploying AutoMQ, costs are expected to decrease by over 50%.
