
Architecture

Apache Kafka was developed in 2011 for traditional data centers, employing a classic Shared Nothing architecture to solve horizontal scalability challenges. Over time, it has evolved into a Tiered Storage architecture to capitalize on the cost advantages of cloud storage. Today, AutoMQ introduces a Shared Storage architecture that fully leverages the benefits of cloud-native technologies. Compared to Apache Kafka, AutoMQ provides tenfold better cost efficiency and a hundredfold improvement in operational efficiency.

Shared Nothing Architecture

The Shared Nothing architecture, a hallmark of Apache Kafka, addresses scalability issues in distributed storage software within traditional data center environments via a compute-storage integration approach. Kafka utilizes an ISR-based replication mechanism to ensure data reliability and system availability. As cloud computing has matured, the demand for business elasticity has grown, revealing the limitations of the classic Shared Nothing architecture. Scaling out Apache Kafka’s Broker nodes involves extensive data replication to complete reassignment, which typically takes several hours. Additionally, Apache Kafka relies on three-replica replication. When deployed on the cloud, users are presented with two storage options:
  • Selecting cloud storage like EBS for Brokers is costly. EBS’s inherent three-replica mechanism, coupled with ISR replication, results in data being stored nine times, leading to significant waste of storage space.
  • Opting for local disks as the storage medium for Brokers offers more manageable costs, but users face high operational expenses, which negates the benefits of cloud adoption.
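The ninefold figure in the first option follows from multiplying the two replication layers. A minimal sketch of that arithmetic, assuming a Kafka replication factor of 3 and three copies kept by the block storage service, as stated above; the function names are hypothetical and exist only for illustration:

```python
def physical_copies(kafka_replication_factor: int, storage_replicas: int) -> int:
    """Copies of each byte when a replicated log sits on replicated volumes."""
    if kafka_replication_factor < 1 or storage_replicas < 1:
        raise ValueError("both replication counts must be at least 1")
    # Each Kafka replica is a full copy, and the storage service
    # replicates every one of those copies again.
    return kafka_replication_factor * storage_replicas


def raw_storage_gib(logical_gib: float,
                    kafka_replication_factor: int = 3,
                    storage_replicas: int = 3) -> float:
    """Raw capacity consumed for a given amount of logical data."""
    return logical_gib * physical_copies(kafka_replication_factor, storage_replicas)


if __name__ == "__main__":
    print(physical_copies(3, 3))   # 9
    print(raw_storage_gib(100.0))  # 900.0
```

Under these assumptions, 100 GiB of logical data occupies 900 GiB of raw capacity.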

Tiered Storage Architecture

As cloud computing has matured and expanded in scale, the initial beneficiary has been object storage. Featuring low storage costs and a pay-as-you-go model, object storage has driven significant evolution in storage software towards the Tiered Storage architecture. As the name suggests, this architecture introduces a secondary storage layer. Here, data is asynchronously transferred to the secondary storage after being initially stored in the primary storage. This approach can partially leverage the cost advantages of object storage while mitigating the elasticity issues of the Shared Nothing architecture. However, the Tiered Storage architecture doesn’t fundamentally address the challenges of Apache Kafka for several reasons:
  • While space usage for primary storage can be decreased, the scope of this reduction varies depending on the scenario, necessitating a thorough capacity evaluation. Additionally, the high EBS costs related to ISR replication are not fully alleviated.
  • The problem of slow scalability persists. Scaling out or in still requires reassigning the data held in primary storage; Tiered Storage only shortens the process, potentially from several tens of hours to a few hours.
In simple terms, the primary storage in the Tiered Storage architecture isn’t fundamentally different from a Shared Nothing architecture; aside from space reductions, the partition storage layout on the file system and the ISR replication mechanism remain the same.
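To see why Tiered Storage shortens reassignment without eliminating it, a rough back-of-the-envelope estimate may help. The sketch below assumes reassignment time is dominated by copying the data that still lives in primary storage; the data sizes and the 100 MiB/s copy rate are hypothetical values chosen for illustration, not measurements:

```python
TIB = 1024 ** 4
MIB = 1024 ** 2


def reassignment_hours(bytes_to_move: float, throughput_bytes_per_s: float) -> float:
    """Hours needed to copy the data a reassignment must move."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return bytes_to_move / throughput_bytes_per_s / 3600


# Hypothetical broker: 10 TiB of partition data, copied at 100 MiB/s.
shared_nothing = reassignment_hours(10 * TIB, 100 * MIB)

# Hypothetical tiered setup: only 1 TiB remains in primary storage,
# the rest already sits in object storage and does not need copying.
tiered = reassignment_hours(1 * TIB, 100 * MIB)

if __name__ == "__main__":
    print(f"shared nothing: {shared_nothing:.1f} h")  # about 29.1 h
    print(f"tiered storage: {tiered:.1f} h")          # about 2.9 h
```

The time shrinks in proportion to the data left in primary storage, but it never reaches zero as long as some partition data stays on the Broker.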

Shared Storage Architecture

AutoMQ’s Shared Storage architecture entirely replaces the Apache Kafka storage layer, innovatively offloading data to cloud storage, which makes Brokers stateless. AutoMQ’s shared storage architecture comprises both WAL storage and object storage, with all data being stored in near real-time into object storage. In this architecture:
  • Object storage serves as the primary data repository, offering elastic, pay-as-you-go, and low-cost data storage solutions.
  • Due to the high latency and low IOPS characteristics of object storage, AutoMQ introduces a WAL storage layer to enhance data write efficiency and reduce IOPS consumption.
  • WAL storage can utilize multiple storage services across different cloud platforms. Options include Regional EBS services with multi-AZ disaster recovery, file storage services like AWS EFS and FSx, and even object storage serving simultaneously as WAL and main data storage.
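The division of labor between the WAL and object storage described above can be sketched as follows. This is a simplified illustration, not S3Stream's actual API: the class names, the in-memory stand-ins, and the record-count flush threshold are all assumptions, and the upload is done synchronously here for brevity whereas a real system would upload in the background.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ObjectStore:
    """In-memory stand-in for an S3-compatible bucket; counts PUT requests."""
    objects: Dict[str, bytes] = field(default_factory=dict)
    put_requests: int = 0

    def put(self, key: str, body: bytes) -> None:
        self.objects[key] = body
        self.put_requests += 1


@dataclass
class WalBackedLog:
    """Appends go to a low-latency WAL first, then reach object storage in batches."""
    store: ObjectStore
    flush_threshold: int = 4  # records per uploaded object (hypothetical value)
    wal: List[bytes] = field(default_factory=list)
    next_object_id: int = 0

    def append(self, record: bytes) -> None:
        # The record is treated as durable once it is in the WAL, so the
        # producer could be acknowledged here without waiting for a PUT.
        self.wal.append(record)
        if len(self.wal) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Upload all pending WAL records as one object, then trim the WAL."""
        if not self.wal:
            return
        key = f"segment-{self.next_object_id:08d}"
        self.store.put(key, b"".join(self.wal))  # one request for many records
        self.next_object_id += 1
        self.wal.clear()


if __name__ == "__main__":
    store = ObjectStore()
    log = WalBackedLog(store=store)
    for i in range(10):
        log.append(f"record-{i};".encode())
    log.flush()
    print(store.put_requests)  # 3 PUTs for 10 records
```

Batching is what reduces IOPS consumption: ten appends cost three object storage requests in this sketch instead of ten.
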
AutoMQ encapsulates these storage modules into a custom-developed streaming storage library, S3Stream, and replaces Apache Kafka’s native Log storage with it, resulting in completely stateless Broker nodes. This enables features such as partition reassignment within seconds, automatic scaling, and continuous traffic rebalancing. To support these features, AutoMQ integrates several Controller components within the kernel, such as the Auto Scaling and Auto Balancing components, which are responsible for cluster scaling and traffic rebalancing, respectively.

Note that the open-source version of AutoMQ supports only S3-compatible storage services as WAL storage. The open-source AutoMQ is therefore a cloud-native Kafka implementation that relies solely on object storage, which simplifies the deployment architecture. In contrast, the commercial version of AutoMQ offers different WAL options across various cloud providers to support a wider range of workloads.
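
Because Brokers hold no partition data in this architecture, moving a partition amounts to changing which Broker owns it. The sketch below illustrates that idea with a toy greedy balancer; it is not AutoMQ's Auto Balancing algorithm, and the function names, traffic figures, and selection rule are all hypothetical.

```python
from typing import Dict, List


def reassign(owners: Dict[str, int], partition: str, target: int) -> Dict[str, int]:
    """Return a new ownership map; no partition data is copied."""
    if partition not in owners:
        raise KeyError(partition)
    updated = dict(owners)
    updated[partition] = target
    return updated


def broker_load(owners: Dict[str, int], traffic: Dict[str, float],
                brokers: List[int]) -> Dict[int, float]:
    """Sum per-partition traffic for every broker, including empty ones."""
    load = {b: 0.0 for b in brokers}
    for partition, broker in owners.items():
        load[broker] += traffic[partition]
    return load


def rebalance_once(owners: Dict[str, int], traffic: Dict[str, float],
                   brokers: List[int]) -> Dict[str, int]:
    """Move one partition from the busiest broker to the idlest one."""
    load = broker_load(owners, traffic, brokers)
    src = max(load, key=load.get)
    dst = min(load, key=load.get)
    gap = load[src] - load[dst]
    # Only consider moves that leave the target less loaded than the source was.
    candidates = [p for p, b in owners.items() if b == src and traffic[p] < gap]
    if not candidates:
        return owners
    # Pick the partition whose traffic is closest to half the gap.
    best = min(candidates, key=lambda p: abs(traffic[p] - gap / 2))
    return reassign(owners, best, dst)


if __name__ == "__main__":
    owners = {"t-0": 0, "t-1": 0, "t-2": 0, "t-3": 1}
    traffic = {"t-0": 30.0, "t-1": 20.0, "t-2": 10.0, "t-3": 10.0}
    brokers = [0, 1, 2]  # broker 2 has just joined and owns nothing
    after = rebalance_once(owners, traffic, brokers)
    print(broker_load(after, traffic, brokers))
```

In this toy model, scaling out is just adding a Broker ID to the list and letting the balancer update the ownership map, which is why the operation can complete without waiting for data to be copied.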

Reference

[1]. Kafka ISR Replication Mechanism: https://kafka.apache.org/documentation/#replication
[2]. WAL Wiki: https://en.wikipedia.org/wiki/Write-ahead_logging