Kafka architecture reviews used to begin with brokers, partitions, and replication factor. Those still matter, but the more revealing question is now about storage: where does durable log data live, and how tightly is it bound to broker compute? The answer shapes disk capacity, recovery time, scaling operations, retention cost, and whether Kafka behaves like elastic cloud infrastructure or a carefully balanced stateful system.
This article uses a three-architecture frame: local broker disks, tiered storage, and shared object storage. It is an analytical frame, not an official industry standard or a Kafka project taxonomy. The point is to separate storage contracts that are often compressed into phrases such as "Kafka on object storage" or "Kafka with remote storage."
Why Kafka Storage Architecture Matters
Kafka is a log system, so storage is not an implementation detail hidden behind the broker. The storage model defines the write path, the read path, durability assumptions, replica placement, failure recovery, and day-two operations. A cluster that stores authoritative log data on broker-local disks has a different operational shape from a cluster that offloads older segments to a remote tier, and both differ from a system that treats object storage as the primary shared durable layer.
The distinction is especially important in cloud environments. Compute instances are replaceable, availability zones are failure domains, and object storage services are designed for massive durable capacity. Traditional Kafka can run well there, but its storage model was built around brokers owning local logs. That ownership brings predictable behavior and maturity, but it becomes a source of friction when teams need rapid elasticity, long retention, or fast recovery without moving large volumes of data.
For architects, the concrete question is: if a broker is removed, what durable data has to be rebuilt, copied, or reassigned before the cluster returns to a healthy state? The answer reveals more than a vendor label.
Architecture 1: Local Broker Disks
The traditional Kafka storage architecture places partition replicas on broker-local log directories. Producers write to the leader replica for a partition, follower replicas fetch from the leader, and durability depends on Kafka replication, leader election, and the in-sync replica set. This mature model works well when workload shape is stable and the operating team can plan capacity ahead of time. Hot reads benefit from local disk and page cache behavior, replication is explicit, and failure domains can be reasoned about through broker placement, rack awareness, replication factor, and controller metadata.
The tradeoff is coupling. Broker compute and durable storage scale together because each broker owns local replicas. If retention increases, local disk pressure increases. If the cluster adds brokers for throughput, operators may need partition reassignment and replica movement. If a broker or volume fails, recovery depends on Kafka's replica model and surviving in-sync replicas. This becomes expensive when the environment expects cloud-style elasticity.
Local disk Kafka also pushes teams to over-provision for traffic, retention, burst headroom, compaction, reassignment, and safety margins. In cloud deployments, that can mean paying for broker resources sized around retained history rather than active throughput.
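The sizing pressure can be made concrete with back-of-the-envelope arithmetic. The sketch below is a rough model rather than a sizing tool: the function name, the `headroom` parameter, and the sample numbers are assumptions introduced for illustration, and it ignores compaction, index files, and uneven partition placement.

```python
def local_disk_per_broker_gb(ingest_mb_per_s, retention_hours,
                             replication_factor, brokers, headroom=0.3):
    """Estimate local disk per broker when brokers hold all retained data.

    headroom is the fraction of each disk kept free for bursts,
    reassignment, and safety margin (an assumed planning parameter).
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    # Logical retained data, before replication.
    retained_gb = ingest_mb_per_s * 3600 * retention_hours / 1024
    # Every replica is a full copy on some broker's local disk.
    replicated_gb = retained_gb * replication_factor
    per_broker_gb = replicated_gb / brokers
    # Provision enough that the target headroom stays free.
    return per_broker_gb / (1 - headroom)


# Hypothetical workload: 100 MB/s ingest, 72 h retention, RF 3, 6 brokers.
estimate = local_disk_per_broker_gb(100, 72, 3, 6)
```

With these sample inputs the estimate comes to roughly 18,000 GB per broker, and it scales linearly with retention: doubling the retention window doubles the disk each broker has to carry, regardless of whether active throughput changed.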
Architecture 2: Tiered Storage
Kafka tiered storage changes the retention side of the problem. In Apache Kafka's remote log storage model, completed log segments can be moved to a remote tier while active and recent data remain on broker-local storage. The local tier serves the hot path, while the remote tier extends retention without requiring every retained byte to stay on broker disks. For workloads dominated by long retention rather than frequent resizing, this is often the most direct improvement.
Tiered storage should not be read as full storage-compute separation. The broker may still own active local replicas. Producer acknowledgment, leadership, replication, and tail reads are still shaped by the local Kafka write path. Adding brokers can still involve data placement. Broker replacement can still depend on active replica state. Tiered storage moves older closed segments away from broker disks; it does not automatically make brokers stateless compute nodes.
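As an illustration of how the local and remote tiers are split, the sketch below builds topic-level settings using Apache Kafka's `remote.storage.enable`, `retention.ms`, and `local.retention.ms` configs. The helper names and the sample hours are assumptions made for this example. It also assumes the brokers already have remote log storage enabled (`remote.log.storage.system.enable`) with a remote storage plugin configured.

```python
MS_PER_HOUR = 3600 * 1000


def tiered_topic_config(total_retention_hours, local_retention_hours):
    """Build topic-level configs for a tiered-storage topic (illustrative helper)."""
    if local_retention_hours > total_retention_hours:
        raise ValueError("local retention cannot exceed total retention")
    return {
        "remote.storage.enable": "true",
        # Total retention across the local and remote tiers.
        "retention.ms": str(total_retention_hours * MS_PER_HOUR),
        # How long closed segments also stay on broker-local disk.
        "local.retention.ms": str(local_retention_hours * MS_PER_HOUR),
    }


def approx_local_fraction(config):
    """Rough share of the retention window that remains on broker disks."""
    return int(config["local.retention.ms"]) / int(config["retention.ms"])


# Hypothetical topic: 7 days of total retention, 6 hours kept locally.
config = tiered_topic_config(168, 6)
```

For this hypothetical topic, only about 3.6% of the retention window stays on broker disks. The figure is approximate because the active segment always remains local and segments become eligible for the remote tier only after they are closed, which is also why the broker still behaves as the owner of the hot write path.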
The nuance matters because "Kafka object storage" can describe several designs. A tiered-storage cluster may use object storage as a remote retention tier, while a shared-storage Kafka system may use it as the primary durable foundation. Similar cloud services can sit behind very different failure and scaling behavior.
Architecture 3: Shared Object Storage
Shared object storage starts from a different question: can durable stream data be separated from broker identity? In this model, brokers still serve Kafka clients, handle protocol requests, maintain runtime state, cache hot data, and participate in coordination. The architectural shift is that authoritative durable data is stored in a shared layer rather than permanently pinned to one broker's local disk. This is the storage architecture behind many "stateless Kafka" discussions. Brokers are not stateless in every runtime sense, but durable log ownership no longer depends on a particular broker disk surviving or being rebuilt.
Shared storage can make scaling and recovery more cloud-native because broker changes become closer to compute scheduling. A new broker can be assigned work over shared durable data, warm cache, and serve traffic without first copying the full historical log from another broker.
The hard part is correctness. Object storage by itself is not a Kafka storage engine. A shared-storage implementation needs a precise write path, fencing model, metadata layer, cache strategy, and read path. It must answer when an acknowledged record is durable, how stale writers are blocked, how readers find the right objects, and how the system behaves when object storage latency rises.
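To make "how stale writers are blocked" less abstract, here is a minimal sketch of generic epoch-based fencing. It is a toy in-memory model with hypothetical names (`FencedLog`, `StaleWriterError`), not the mechanism of any particular product; a real system would keep the epoch in a strongly consistent metadata store and enforce it on the shared storage write path.

```python
class StaleWriterError(Exception):
    """Raised when a writer with an outdated epoch tries to append."""


class FencedLog:
    """Toy shared log that accepts appends only from the current owner epoch."""

    def __init__(self):
        self.epoch = 0      # current ownership epoch
        self.records = []   # stands in for durable shared storage

    def acquire(self):
        """Grant ownership to a new writer; all earlier epochs become stale."""
        self.epoch += 1
        return self.epoch

    def append(self, writer_epoch, record):
        """Append a record and return its offset, rejecting stale writers."""
        if writer_epoch != self.epoch:
            raise StaleWriterError(
                f"writer epoch {writer_epoch} != current epoch {self.epoch}"
            )
        self.records.append(record)
        return len(self.records) - 1


log = FencedLog()
old_owner = log.acquire()
log.append(old_owner, "a")
new_owner = log.acquire()  # e.g. a replacement broker takes over
# log.append(old_owner, "b") would now raise StaleWriterError
log.append(new_owner, "c")
```

The design point is that ownership is decided by the metadata layer, not by whichever broker still believes it is the leader. A broker that was partitioned away and comes back cannot corrupt the log, because its epoch no longer matches.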
Comparing the Write Path
The write path is where architecture labels become real. In local-disk Kafka, the leader broker appends to its local log, followers replicate from the leader, and producer acknowledgment depends on the configured acknowledgment level and in-sync replicas. Tiered storage adds a remote lifecycle path after segments are closed, but the immediate write path still lands in the local Kafka log. Shared storage has to define a different durability contract, often combining a write-ahead path, cache, object storage layout, and metadata coordination so brokers can acknowledge records without treating local disks as the permanent home of the log.
| Architecture | Primary durable write target | Broker-local disk role | Main design strength |
|---|---|---|---|
| Local broker disks | Local partition replicas | Authoritative log storage | Mature behavior and predictable Kafka operations |
| Tiered storage | Local active log, remote closed segments | Hot data and active replicas | Longer retention with less local disk pressure |
| Shared object storage | Shared storage layer behind Kafka-compatible brokers | Cache or runtime acceleration | Storage-compute separation and faster compute replacement |
The practical review question is simple: after a produce request is acknowledged, what must survive for that record to remain durable? If the answer is the broker-local replica set, you are evaluating a local or tiered model. If the answer is the shared storage contract plus metadata and fencing, you are evaluating a shared-storage model.
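For the local and tiered models, the answer to that question is governed by the producer `acks` setting and the topic's `min.insync.replicas`. The sketch below encodes those rules as a small decision function. The function name and the returned labels are assumptions for illustration, and the model deliberately ignores timeouts, retries, and unclean leader election.

```python
def produce_outcome(acks, isr_size, min_insync_replicas):
    """Classify how a produce request is acknowledged in replica-based Kafka.

    acks follows the producer config values "0", "1", "all" (or "-1").
    isr_size counts the in-sync replicas, including the leader.
    """
    if acks == "0":
        # Fire and forget: the producer does not wait for any broker response.
        return "no_ack_expected"
    if acks == "1":
        # Only the leader's local log has the record when the ack is sent.
        return "acked_by_leader"
    if acks in ("all", "-1"):
        if isr_size < min_insync_replicas:
            # The leader refuses the write rather than under-replicate it.
            return "rejected_not_enough_replicas"
        # Acked once every current in-sync replica has the record.
        return "acked_after_isr_replication"
    raise ValueError(f"unsupported acks value: {acks!r}")
```

Under this model, a record acknowledged with `acks=all` survives as long as at least one in-sync replica's local log survives. A shared-storage design has to state an equivalent rule in terms of its own write path instead of broker-local replicas.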
Comparing the Read Path
Kafka reads are not all equal. Tail reads near the newest offsets are latency-sensitive and often served from memory or local hot data, while historical reads may be backfills, incident replay, audit jobs, or downstream rebuilds. In local-disk Kafka, reads come from broker-local logs and page cache, but long retention means the cluster must keep historical segments on broker storage. In tiered storage, recent reads still use the local tier, while older reads can be served from remote storage through Kafka.
In shared-storage Kafka, reads usually combine cache with object storage access through a storage layer. A well-designed system avoids sending every fetch directly to object storage, because raw object APIs are not the same as a low-latency stream read path. Cache, prefetch, batching, index metadata, and storage layout become core design choices.
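The role of the cache can be sketched as a read-through layer in front of object storage. This is a deliberately small LRU model with hypothetical names (`ReadThroughCache`, `fetch_object`); production systems add prefetch, batching, and index metadata on top, and nothing here reflects a specific vendor's implementation.

```python
from collections import OrderedDict


class ReadThroughCache:
    """LRU cache that falls back to an object-store fetch on a miss."""

    def __init__(self, fetch_object, capacity):
        self.fetch_object = fetch_object  # callable: key -> bytes (assumed)
        self.capacity = capacity          # max number of cached blocks
        self.blocks = OrderedDict()       # key -> data, oldest first
        self.misses = 0                   # count of object-store round trips

    def read(self, key):
        if key in self.blocks:
            # Hit: mark the block as most recently used.
            self.blocks.move_to_end(key)
            return self.blocks[key]
        # Miss: pay the object-store latency once, then keep the block.
        self.misses += 1
        data = self.fetch_object(key)
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            # Evict the least recently used block.
            self.blocks.popitem(last=False)
        return data


# A dict stands in for the object store in this example.
store = {"seg-0": b"old", "seg-1": b"mid", "seg-2": b"new"}
cache = ReadThroughCache(store.__getitem__, capacity=2)
for key in ["seg-2", "seg-2", "seg-1", "seg-2"]:
    cache.read(key)
```

In this run, four reads cost only two object-store fetches because tail reads keep hitting the same hot blocks. A replay that walks old segments in order would miss on every block, which is the case where prefetch and storage layout start to matter.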
This is why teams should test read behavior with real traffic patterns. A system can look attractive on retention cost and still disappoint if consumer catch-up, replay, or fan-out reads are not planned for. Conversely, a shared-storage design can be strong when it uses object storage for durable capacity while keeping hot-path reads close to compute through cache.
Scaling and Recovery
Scaling exposes the core difference between retention optimization and storage-compute separation. With local broker disks, adding capacity often leads to partition reassignment and data movement. Tiered storage can reduce how much historical data remains local, but if active replicas still live on broker disks, scaling remains partly a data placement problem. Shared storage aims to move less durable history during compute changes; scaling becomes more about assigning ownership, fencing stale writers, warming cache, and ensuring network and storage service capacity.
For recovery, the same pattern holds. Local-disk Kafka depends on surviving replicas and rebuilding desired replica placement. Tiered storage can help with older segments, but active local state still matters. Shared storage should let a healthy broker resume responsibility after metadata and fencing checks, while local cache loss is treated as a performance event rather than durable data loss.
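A crude way to compare the three models is to estimate how much durable data must be copied before a replacement or newly added broker is fully caught up. The function below is an assumption-heavy sketch: the name, the architecture labels, and the idea of expressing the local tier as a `hot_window_fraction` are introduced here for illustration, and the shared-storage case is modeled as zero durable copy even though cache warm-up and metadata checks still take time.

```python
def durable_gb_to_copy(architecture, retained_gb_on_broker, hot_window_fraction):
    """Estimate durable data copied between brokers for one broker change.

    retained_gb_on_broker: data the broker would hold if everything were local.
    hot_window_fraction: share of retained data kept on the local tier (0..1).
    """
    if not 0 <= hot_window_fraction <= 1:
        raise ValueError("hot_window_fraction must be in [0, 1]")
    if architecture == "local":
        # All retained replicas live on broker disks and must be rebuilt.
        return float(retained_gb_on_broker)
    if architecture == "tiered":
        # Closed segments are already remote; only the local tier is copied.
        return retained_gb_on_broker * hot_window_fraction
    if architecture == "shared":
        # Durable data stays in shared storage; the broker only warms its cache.
        return 0.0
    raise ValueError(f"unknown architecture: {architecture!r}")
```

For a hypothetical broker responsible for 2,000 GB of retained data with a 5% hot window, the model gives 2,000 GB, about 100 GB, and 0 GB respectively. The numbers are not predictions; they show which quantity a failure or scale-out drill should actually measure for each architecture.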
Where AutoMQ Fits
AutoMQ fits into the shared-storage architecture category. It is Kafka-compatible at the protocol layer, so existing Kafka clients, Kafka Connect, Kafka Streams, and operational patterns can remain relevant. Its storage architecture uses S3Stream as a shared streaming storage layer on object storage, while brokers are designed as a compute layer rather than the permanent home of durable Kafka logs.
That placement is important. AutoMQ is not best understood as "Kafka with a remote archive." It is closer to a Kafka-compatible compute layer over a shared object-storage-backed stream store. Local resources can still be used for cache and runtime efficiency, but durable log ownership is moved away from broker-local disks.
For enterprise architects, the relevant evaluation is not whether object storage appears in the diagram. It is whether the system preserves Kafka semantics while changing the operational contract around storage. In AutoMQ's case, examine S3Stream, stateless broker behavior, write-ahead durability, cache design, object storage layout, and broker replacement under production-shaped traffic. Tiered storage is a strong answer when the main question is retention. AutoMQ's shared-storage design is aimed at cases where the main question is storage-compute separation: elastic scaling, faster broker replacement, less local data movement, and a cloud storage cost structure based on object storage.
Decision Matrix
No storage architecture is universally right. The decision depends on workload stability, retention, latency targets, recovery objectives, and the operational model your platform team wants to own.
Use local broker disks when your Kafka estate is stable, your team is comfortable with Kafka operations, low-latency tail reads dominate, and retention is manageable within broker capacity.
Use tiered storage when retention is the main pressure. If teams need longer replay windows or compliance retention but do not need to redesign broker ownership, tiered storage can be a pragmatic step. It keeps recent data local while moving older completed segments to a remote tier.
Use shared storage when elasticity and recovery dominate the decision. If scaling, broker replacement, disk management, and data movement are the recurring pain points, a Kafka-compatible shared-storage architecture deserves evaluation. It asks harder questions about storage-layer correctness, but it can better match cloud-native infrastructure assumptions.
| Decision factor | Local broker disks | Tiered storage | Shared object storage |
|---|---|---|---|
| Best fit | Stable clusters and familiar Kafka operations | Long retention with familiar broker model | Elastic cloud-native Kafka design |
| Retention pressure | Broker disks carry full retained history | Older closed segments move remote | Object storage is primary durable layer |
| Scaling behavior | Often data-movement-heavy | Less historical pressure, active data still matters | Compute assignment over shared durable data |
| Broker recovery | Replica election and rebuild paths | Remote history helps, active replicas still matter | Replace compute after fencing and metadata checks |
| Operational complexity | Known Kafka complexity | Kafka plus remote tier operations | Storage-layer correctness plus cloud storage operations |
| Key proof to demand | Reassignment and failure drills | Remote fetch and metadata consistency tests | Produce durability, fencing, cache, and recovery tests |
The cleanest proof is a workload-shaped drill. Define your hot window, retention window, target recovery time, consumer replay behavior, and scale-out scenario. Then measure what moves during failure and scaling: durable data, remote segments, metadata, cache, or compute assignments.
FAQ
What is Kafka storage architecture?
Kafka storage architecture describes where Kafka log data is written, retained, replicated, read, and recovered. The main patterns are broker-local disks, tiered storage with a remote tier for older segments, and shared storage where durable log data is separated from broker identity.
Is Kafka tiered storage the same as shared storage?
No. Kafka tiered storage usually keeps active and recent data on broker-local disks while moving older completed segments to remote storage. Shared storage treats the shared layer as the primary durable home of stream data, allowing brokers to behave more like replaceable compute.
Does Kafka object storage mean brokers are stateless?
Not automatically. Object storage can be used as a remote retention tier or as a primary shared storage layer. Brokers become more stateless only when durable log ownership is separated from broker-local disks and protected by correct metadata, fencing, and write-path semantics.
When should an enterprise choose local broker disks?
Local broker disks are a good fit when clusters are stable, the team has strong Kafka operations experience, retention is moderate, and predictable tail-read performance is more important than rapid elasticity.
When is tiered storage enough?
Tiered storage is often enough when the main problem is retention growth. It helps teams keep longer history without sizing broker disks for the entire retention window, while preserving the familiar Kafka operating model.
Where does AutoMQ fit in Kafka storage architecture?
AutoMQ fits in the shared-storage category. It provides Kafka-compatible brokers over S3Stream, a shared streaming storage layer on object storage, so durable Kafka log data is not permanently tied to broker-local disks.