Apache Kafka to Diskless Kafka | Storage Architecture Evolution

Kafka's storage architecture has always carried the assumptions of the infrastructure around it. The original design made excellent sense for clusters built from machines with attached disks: append records to local logs, replicate those logs across brokers, and let consumers replay from durable partitions. That model made storage a first-class part of the broker rather than an external dependency.

The pressure started to show when the same architecture moved into cloud infrastructure. Local disks became provisioned block volumes. Broker replacement became a data movement event. Retention collided with expensive hot storage. Replication still protected the log, but it also multiplied storage, network, and operational work in an environment where object storage and elastic compute had changed the economics.

Evolution of Kafka Storage Architecture

The history of Kafka storage is not a story of old designs being wrong. It is a story of each generation solving the bottleneck that mattered most at the time, then exposing the next one.

Generation 1: Local Disk Logs Made Kafka Fast and Predictable

Classic Kafka was built around the local append-only log. A partition had a leader replica on one broker and follower replicas on other brokers. Producers wrote to the leader, followers fetched records, and consumers read from the retained log using offsets. The storage model was clear: a broker owned the data on its disks, and the cluster distributed partitions across brokers for throughput and fault tolerance.

That design solved several hard problems at once:

  • High-throughput sequential I/O. Kafka leaned into append-heavy workloads, page cache behavior, and batching instead of treating storage as random database writes.
  • Durability through replication. The cluster could survive broker failures because committed records existed on more than one broker.
  • Operational transparency. Engineers could reason about storage by inspecting brokers, partitions, log segments, and replica lag.
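The leader/follower log model described above can be sketched as a toy data structure. This is a minimal illustration of the classic storage model, not Kafka's implementation: a leader appends records and hands back offsets, followers copy everything past their own end offset, and consumers replay from any retained offset.

```python
from dataclasses import dataclass, field

@dataclass
class Partition:
    """Toy model of a classic Kafka partition: an append-only log owned
    by a leader broker and copied record-for-record by follower replicas."""
    leader_log: list = field(default_factory=list)                  # broker-local log
    follower_logs: list = field(default_factory=lambda: [[], []])   # two follower replicas

    def produce(self, record: bytes) -> int:
        """Append to the leader; return the record's offset."""
        self.leader_log.append(record)
        return len(self.leader_log) - 1

    def replicate(self) -> None:
        """Each follower fetches every record past its current end offset."""
        for log in self.follower_logs:
            log.extend(self.leader_log[len(log):])

    def consume(self, offset: int, max_records: int = 100) -> list:
        """Consumers replay the retained log starting from an offset."""
        return self.leader_log[offset: offset + max_records]
```

Note that durability here depends entirely on the copies living on broker-owned logs, which is exactly the coupling the rest of the article examines.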

The trade-off was that data and compute were tightly coupled. Adding brokers often meant partition reassignment and replica movement. Replacing a broker meant rebuilding local state. Increasing retention meant carrying more data on broker-attached storage. These trade-offs were manageable in data-center environments where storage locality was a performance advantage and machine-to-machine replication was part of the cluster's normal cost model.

Cloud deployments changed that context. A cloud instance is meant to be replaceable, but a Kafka broker with local partition state is not fully replaceable until its data has been copied somewhere else. Object storage offers durable, low-cost retention, but classic Kafka keeps retained data on broker storage until operators add another layer. The architecture still works, yet the operating environment has moved.

Generation 2: Tiered Storage Split Hot Data From Historical Data

Tiered storage emerged as a practical response to the retention problem. Instead of keeping all log segments on broker-local disks, Kafka can keep recent data locally and offload older segments to remote storage. Apache Kafka's tiered storage work, introduced in KIP-405 and the related documentation, adds a remote log tier so clusters can extend retention without sizing every broker for the full historical footprint.

That shift matters because retention and compute no longer have to grow in lockstep. A team can keep a smaller hot set on broker storage while older segments live in object storage or another remote tier. If historical segments are already remote, rebuilding a broker does not require copying every retained byte from peer brokers.
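The offload policy behind this can be sketched in a few lines. This is a hedged illustration of the KIP-405 idea, not Kafka's remote log manager: the class and method names are invented, and `remote_store` is a stand-in for object storage.

```python
class TieredLog:
    """Toy sketch of tiered storage: closed segments older than a local
    retention window are copied to a remote tier, then dropped locally."""

    def __init__(self, local_retention_s: float):
        self.local_retention_s = local_retention_s
        self.local_segments = []   # list of (created_at, segment_name)
        self.remote_store = {}     # segment_name -> "uploaded" (stand-in for object storage)

    def roll_segment(self, name: str, created_at: float) -> None:
        """A closed segment joins the local tier."""
        self.local_segments.append((created_at, name))

    def offload(self, now: float) -> None:
        """Move segments past local retention to the remote tier."""
        keep = []
        for created_at, name in self.local_segments:
            if now - created_at > self.local_retention_s:
                self.remote_store[name] = "uploaded"  # durable remote copy first
            else:
                keep.append((created_at, name))       # hot set stays broker-local
        self.local_segments = keep
```

The point of the sketch is the asymmetry: the active segment always lands locally first, which is why tiered storage changes retention economics without changing the broker's hot-path statefulness.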

Architecture Generations Comparison

Tiered storage is a major architectural step, but it is still not the same as diskless Kafka. The broker remains stateful for the hot path. Leaders still write to local logs, followers still replicate active partition state, and operators still need to think about disk sizing, hot data placement, local recovery, and the boundary between local and remote reads.

The distinction is easy to blur, so it is worth making explicit:

Architecture generation | Where active writes land | What remote storage changes | What remains coupled
Local disk Kafka | Broker-local disks | Nothing by default | Compute, hot storage, replication, recovery
Tiered storage Kafka | Broker-local disks first | Older segments move to a remote tier | Active log, broker state, hot storage planning
Diskless Kafka | Shared storage path rather than durable broker disks | Storage becomes the primary durable layer | Brokers still serve Kafka protocol, but durable state moves out

Tiered storage therefore solves a real pain: long retention on expensive broker storage. It does not fully remove the operational shape of stateful brokers. If your main problem is historical retention, tiered storage can be enough. If your main problem is elastic compute, fast broker replacement, and replicated hot disks, you are still dealing with the original coupling.

Cloud Infrastructure Changed the Cost Assumptions

Kafka's first storage model assumed that the cluster itself should own replication and retention. In a data center, that was a reasonable boundary. Disks were attached to servers, and the streaming system could not assume a highly durable, elastic, object-storage substrate underneath it.

In cloud infrastructure, the substrate is different. Object storage is no longer an archival side system used after the primary path finishes its work. It is a durable storage primitive with its own replication, lifecycle controls, and elastic capacity model. Compute instances are easier to add and remove than the data attached to them. Replication is not an invisible internal wire; it appears in the architecture and in the bill.

This creates a mismatch between the old storage model and the cloud operating model:

  • Replication duplicates work already present in the infrastructure. Kafka still needs stream-level durability semantics, but application-layer replica copying can overlap with durability that object storage already provides.
  • Elastic compute is slowed by state movement. A broker cannot disappear cleanly if it owns partition data that must be rebuilt elsewhere.
  • Retention competes with hot-path resources. Long-lived data consumes capacity on systems that also serve active producers and consumers.
  • Scaling events become storage events. What looks like adding compute can trigger partition reassignment, network traffic, and operational risk.
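The replication-cost point lends itself to back-of-envelope arithmetic. All prices below are illustrative placeholders chosen only to show the shape of the comparison, not quotes from any provider, and real deployments add request, transfer, and cross-AZ charges on both sides.

```python
# Rough monthly storage cost for 100 TB of retained stream data.
retained_tb = 100
block_price_per_tb = 80.0    # assumed $/TB-month for provisioned block volumes
object_price_per_tb = 21.0   # assumed $/TB-month for object storage
replication_factor = 3       # classic Kafka keeps three full copies on brokers

# Broker-attached storage pays for every replica; object storage
# bills one logical copy and handles durability internally.
classic_cost = retained_tb * replication_factor * block_price_per_tb
object_cost = retained_tb * object_price_per_tb

print(f"replicated broker disks: ${classic_cost:,.0f}/month")
print(f"object storage:          ${object_cost:,.0f}/month")
```

Under these assumed prices the replicated-disk footprint is more than 10x the object-storage figure, which is the economic pressure the bullet list describes.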

None of this means classic Kafka is obsolete. It means the design center has shifted. A storage architecture designed for local disks can run in the cloud, but it often carries data-center assumptions into an environment where shared durable storage and elastic compute are first-class services.

Generation 3: Diskless Kafka Moves Durable State Out of Brokers

Diskless Kafka pushes the separation further than tiered storage. Instead of treating remote storage as a colder extension of broker disks, the architecture moves durable log state out of broker-local disks. Brokers continue to speak Kafka protocol to producers and consumers, but they are no longer the long-term owners of durable partition data in the same way classic brokers are.

The key architectural move is the split between the serving layer and the durable storage layer. Producers and consumers still interact with Kafka APIs. Brokers still coordinate reads, writes, metadata, and protocol behavior. The difference is that durable data is written through a shared storage path and persisted in cloud storage rather than being anchored to local broker volumes.

That changes the failure and scaling model:

  • Scaling brokers can focus more on throughput and connection handling than on moving durable partition bytes.
  • Retention can follow the economics of object storage instead of the economics of broker-attached disks.
  • The system can preserve Kafka compatibility while replacing the storage ownership model under the API.

This is why diskless Kafka is more than "tiered storage with a more aggressive offload policy." Tiered storage keeps the broker-local log as the active source of truth and uses remote storage as an additional tier. Diskless architecture makes shared storage part of the active design, so the broker becomes closer to stateless compute from an operational perspective.

Cloud-Native Kafka End State

What Each Stage Solved, and What It Left Behind

The cleanest way to evaluate Kafka storage evolution is to ask what bottleneck each generation optimized for. Classic Kafka optimized for predictable throughput on attached disks. Tiered storage optimized for retention economics. Diskless Kafka optimizes for cloud-native elasticity and durable storage separation.

That does not make the choice universal. A small cluster with stable traffic and short retention may not need a fully diskless architecture. A regulated environment with strict storage placement requirements may evaluate the trade-offs differently. A platform team running many large clusters across cloud regions will feel the storage-compute coupling earlier because every broker lifecycle event becomes part of the operational burden. That is why the conversation has moved from "Can Kafka use object storage?" to "Which parts of Kafka should still be stateful?" Once object storage is treated as a durable stream-storage substrate, the architectural question shifts from offloading old bytes to redefining broker responsibility.

AutoMQ as a Production Implementation of the Diskless Direction

AutoMQ fits into this evolution as a production implementation of diskless Kafka rather than as a separate streaming API. The goal is to preserve Kafka protocol compatibility for applications while changing the storage architecture underneath: brokers become more stateless, durable data moves to shared cloud storage, and the cost model follows cloud-native storage more closely than replicated broker disks.

That distinction is important. Many teams do not want a different event streaming API; they want the Kafka ecosystem without the operational drag that comes from tightly coupled broker storage. The compatibility layer protects existing producers, consumers, connectors, and operational knowledge. The storage layer changes where the expensive and failure-sensitive parts of the system live: broker compute can become replaceable while durable stream data lives outside a single broker's disk.

Choosing a Storage Architecture for Your Stack

The practical question is not whether local disk Kafka, tiered storage, or diskless Kafka is "right." The better question is which constraint is shaping your platform decisions. If your retention window is short, traffic is steady, and operational maturity is high, classic Kafka may remain a reasonable fit. If retention is growing faster than hot traffic, tiered storage addresses a specific part of the problem.

Diskless Kafka becomes more compelling when the bottleneck is the storage ownership model itself. Teams usually reach that point when they need frequent scaling, fast broker recovery, long retention, and lower operational coupling between compute and data. At that stage, the old broker identity starts to look heavy: a broker is not only a process serving traffic; it is also a container for durable state that the platform must preserve, move, and repair.

Use these checks as a decision frame:

  • Retention pressure: Are you buying hot broker storage mainly to keep historical data available?
  • Recovery pressure: Does broker replacement require long data rebuilds or careful partition movement?
  • Elasticity pressure: Does adding or removing brokers trigger operational work out of proportion to the compute change?
  • Cost pressure: Are replicated disks and cross-node data movement a major part of the Kafka footprint?
  • Compatibility pressure: Do applications need Kafka APIs even if the storage layer changes?
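The checklist can be read as a simple tally. The thresholds below are an arbitrary illustration of "if most answers are yes", not a published decision rule.

```python
def diskless_fit(answers: dict) -> str:
    """Toy decision frame over the five pressure checks above.
    `answers` maps each pressure name to True (yes) or False (no)."""
    pressures = ["retention", "recovery", "elasticity", "cost", "compatibility"]
    yes = sum(bool(answers.get(p)) for p in pressures)
    if yes >= 4:                                      # most answers are yes
        return "evaluate diskless seriously"
    if yes >= 2:                                      # a specific pain, not the whole model
        return "tiered storage may cover the gap"
    return "classic Kafka likely remains a fit"
```

Treat the output as a conversation starter: the qualitative weight of each pressure matters more than the count.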

If most answers are yes, the diskless direction deserves serious evaluation. It aligns Kafka's storage architecture with the infrastructure many teams are already using: elastic compute in front of durable shared storage.

Kafka's storage history started with an elegant local log. Tiered storage extended that log into lower-cost historical capacity. Diskless Kafka asks whether durable stream data should belong to brokers at all. For teams running Kafka on cloud infrastructure, that is the real evolution: keeping the Kafka API while letting the storage model catch up with the cloud.
