Kafka Tiered Storage Architecture: What It Solves and What It Does Not

Kafka tiered storage is one of the most practical improvements to the Kafka storage model because it attacks a real operational problem: retention keeps growing faster than broker disks. Teams want longer replay windows, audit history, and backfill capacity, but traditional Kafka asks broker-local storage to carry that history.

Tiered storage changes that pressure point. Instead of keeping every retained segment on broker disks for the full retention window, Kafka can keep hot data locally and move older completed log segments to a remote storage tier. That remote tier may be object storage, HDFS, or another implementation provided through Kafka's remote storage interfaces. The result is not magical, but it is useful: broker disks can be sized around recent data and operational headroom rather than all historical data.

The mistake is treating tiered storage as the same thing as stateless Kafka. It is not. Tiered storage reduces local retention pressure; it does not automatically make brokers pure compute nodes, remove active local replicas, or eliminate data movement during every scaling and recovery scenario. That boundary matters when architects compare Kafka tiered storage with shared-storage Kafka systems such as AutoMQ.

[Figure: Kafka Tiered Storage Data Path]

What Kafka Tiered Storage Is

Apache Kafka's tiered storage, specified in KIP-405, is usually described in terms of remote log storage. The basic idea is straightforward: Kafka logs are split into segments, and once a segment is no longer active, it can be copied to a remote tier. Kafka then manages metadata so future fetches can locate records even if the corresponding segment is no longer present on the broker's local disk.

This model follows Kafka's access pattern. Producers append to the end of the log, many consumers read near the tail, and older data is often read for replay, compliance, incident recovery, training jobs, or rebuilding downstream state. Tiered storage lets the storage lifecycle reflect that difference.

The architecture has three important pieces:

  • Local hot data. Active log segments and recent records remain on broker-local storage. This local tier is still important for produce throughput, tail reads, page cache behavior, and replica coordination.
  • Remote segments. Completed segments can be copied to remote storage and later removed from the local tier according to local retention rules. Total topic retention can remain longer than local retention.
  • Remote log metadata. Kafka needs metadata to map topic partitions, offsets, leader epochs, and segment locations to the correct remote objects or files.

That combination makes tiered storage a storage lifecycle feature, not merely a backup feature. Consumers can still fetch older records through Kafka APIs, and operators can reason about retention without keeping all retained bytes on every broker that owns a replica. But the Kafka cluster still has to preserve ordering, durability, leader ownership, and fetch semantics above that remote tier.
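The relationship between local and total retention described above can be sketched as a small configuration check. The property names `remote.storage.enable`, `retention.ms`, and `local.retention.ms` are Apache Kafka topic-level settings; the helper function, its messages, and the example values are hypothetical and only illustrate the idea.

```python
DAY_MS = 24 * 60 * 60 * 1000

def check_tiered_topic_config(config: dict) -> list:
    """Return a list of problems found in a topic config; empty means none found.

    Hypothetical helper: it only inspects three topic-level properties and
    assumes the broker-side remote storage plugins are already configured.
    """
    problems = []
    if config.get("remote.storage.enable") != "true":
        problems.append("remote.storage.enable is not 'true'; segments stay local")

    total = int(config.get("retention.ms", -1))        # -1 means unlimited retention
    local = int(config.get("local.retention.ms", -2))  # -2 means "use retention.ms"

    if local == -2:
        problems.append("local.retention.ms is unset; local retention equals total retention")
    elif total != -1 and local > total:
        problems.append("local.retention.ms exceeds retention.ms")
    return problems

# Example values (assumptions): 30 days total retention, 2 days on broker disks.
topic_config = {
    "remote.storage.enable": "true",
    "retention.ms": str(30 * DAY_MS),
    "local.retention.ms": str(2 * DAY_MS),
}
print(check_tiered_topic_config(topic_config))
```

A check like this is most useful in topic provisioning pipelines, where a missing local retention setting would silently keep the full history on broker disks.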

How the Remote Log Path Works

The produce path normally starts on the broker leader for a partition. Producers send records, the leader appends them to the local log, and followers replicate according to Kafka's replication model. Tiered storage does not remove this active write path. It adds a second lifecycle path for segments after they are closed and eligible for remote storage.

Once a segment is copied to the remote tier and the relevant metadata is available, Kafka can delete the local copy when local retention allows it. This is where broker disk pressure improves: the cluster reserves local disk for the local retention window plus operational headroom, not the entire retained history.
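The sizing effect can be made concrete with some rough arithmetic. The sketch below is a simplified model, not a capacity planning tool: the ingest rate, retention windows, replication factor, and 30% headroom are all assumed values, and it ignores the active segment, indexes, and the delay between a segment closing and its upload completing.

```python
def local_disk_bytes(ingest_bytes_per_sec: float, retained_hours: float,
                     replication_factor: int, headroom: float = 0.3) -> float:
    """Approximate cluster-wide local disk needed for a given local window."""
    raw = ingest_bytes_per_sec * retained_hours * 3600 * replication_factor
    return raw * (1.0 + headroom)

INGEST = 50e6  # assumed 50 MB/s of produced data across the cluster

# Without tiering, broker disks hold the full 30-day retention window.
without_tiering = local_disk_bytes(INGEST, retained_hours=30 * 24, replication_factor=3)

# With tiering, broker disks hold only an assumed 2-day local window.
with_tiering = local_disk_bytes(INGEST, retained_hours=2 * 24, replication_factor=3)

print(f"local disk without tiering: {without_tiering / 1e12:.1f} TB")
print(f"local disk with tiering:    {with_tiering / 1e12:.1f} TB")
```

In this model the local footprint scales with the ratio of local retention to total retention, which is why shortening the local window has such a direct effect on disk sizing.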

Fetch behavior then depends on where the requested offset lives. A consumer reading recent offsets may be served from the local log or page cache. A consumer replaying older offsets may cause Kafka to read from the remote tier. This is why tiered storage is a good fit for workloads where historical reads exist but do not dominate the hot path.
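The routing decision described above can be sketched as a lookup against the local log start offset and a sorted list of remote segment metadata. This is an illustration of the idea only, not Kafka's internal implementation: `RemoteSegment`, `route_fetch`, and the `location` strings are hypothetical names.

```python
from bisect import bisect_right
from dataclasses import dataclass

@dataclass(frozen=True)
class RemoteSegment:
    base_offset: int   # first offset in the segment
    end_offset: int    # last offset in the segment (inclusive)
    location: str      # hypothetical object key or file path

def route_fetch(offset: int, local_start_offset: int, remote_segments: list):
    """Decide where a fetch for `offset` would be served from.

    `remote_segments` must be sorted by base_offset. Offsets at or above
    `local_start_offset` are served locally, even if a remote copy exists.
    """
    if offset >= local_start_offset:
        return ("local", None)
    bases = [s.base_offset for s in remote_segments]
    i = bisect_right(bases, offset) - 1   # last segment starting at or before offset
    if i >= 0 and remote_segments[i].end_offset >= offset:
        return ("remote", remote_segments[i].location)
    return ("out_of_range", None)

segments = [RemoteSegment(0, 99, "seg-0"), RemoteSegment(100, 199, "seg-100")]
print(route_fetch(175, 150, segments))  # recent offset, still on local disk
print(route_fetch(120, 150, segments))  # older offset, only in the remote tier
```

The overlap between local and remote ranges in the example is deliberate: a segment can exist in both tiers until local retention removes the local copy.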

The remote path is not a free background detail. It introduces several design surfaces that teams need to test:

  • Remote read latency. Historical replay may be slower or more variable than local disk reads, depending on the remote storage system and network path.
  • Metadata correctness. Segment metadata must remain consistent across leader changes, deletion, retention, and fetch requests.
  • Storage permissions and lifecycle. Object lifecycle rules, bucket policies, encryption settings, and access paths become part of Kafka operations.
  • Failure modes. Remote storage throttling, elevated latency, or temporary unavailability can affect historical fetches and segment lifecycle tasks.

Those concerns make tiered storage an architecture choice that should be validated with the same seriousness as replication factor, partition count, and broker sizing.
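One way to start that validation is a back-of-the-envelope replay estimate before running a real test. The model below is an assumption, not a description of how Kafka schedules remote reads: it charges one first-byte latency per remote segment, spreads that cost across parallel fetches, and adds a transfer cost bounded by an assumed sustained throughput.

```python
import math

def estimate_replay_seconds(total_bytes: int, segment_bytes: int,
                            first_byte_latency_s: float,
                            throughput_bytes_per_s: float,
                            parallel_fetches: int = 1) -> float:
    """Rough replay time for data that lives only in the remote tier."""
    segments = math.ceil(total_bytes / segment_bytes)
    latency_cost = segments * first_byte_latency_s / parallel_fetches
    transfer_cost = total_bytes / throughput_bytes_per_s
    return latency_cost + transfer_cost

GIB = 1024 ** 3
# Assumed workload: 10 GiB backlog, 1 GiB segments, 0.5 s first-byte latency,
# 1 GiB/s sustained remote read throughput.
serial = estimate_replay_seconds(10 * GIB, GIB, 0.5, GIB)
parallel = estimate_replay_seconds(10 * GIB, GIB, 0.5, GIB, parallel_fetches=5)
print(f"serial: {serial:.1f} s, parallel: {parallel:.1f} s")
```

The estimate is only a starting point. Measured numbers from a production-shaped replay should replace every input before they inform recovery time objectives.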

What It Solves Well

Tiered storage is strongest when the dominant pain is retention. Many Kafka clusters have a mismatch between hot data and retained data. The hot window may be measured in hours or a few days, while retention requirements stretch much longer. Without tiering, broker storage must absorb both. That can lead to larger disks, more brokers than traffic requires, longer maintenance operations, and awkward retention compromises.

Tiered storage gives platform teams a better set of tradeoffs:

| Problem | How Tiered Storage Helps | What Still Needs Design |
| --- | --- | --- |
| Long retention | Older closed segments can live outside broker disks | Remote storage lifecycle and metadata consistency |
| Broker disk pressure | Local retention can be shorter than total retention | Hot data, page cache, and active replica capacity |
| Historical replay | Older offsets can remain fetchable through Kafka | Remote fetch latency and throughput planning |
| Cost control | Local disk growth can slow as retention grows | Request costs, network path, and storage class choices |

The most credible case for tiered storage is not that it makes Kafka effortless. It is that it lets teams stop using broker-local disks as the only place where retained history can live. For data infrastructure teams under pressure to keep more history without multiplying broker disk capacity, that is a meaningful improvement.

It also preserves the familiar Kafka operating model. Topics, partitions, offsets, consumers, and client APIs remain Kafka, so operators do not need to introduce a new event streaming abstraction to application teams.

What It Does Not Solve

The limit appears when the problem is not retention but broker-data coupling. Kafka's traditional design binds partition replicas to broker-local log directories. Even with tiered storage enabled, brokers may still be responsible for active local replicas and recent data. If a cluster needs to rebalance hot partitions, replace brokers, or scale around traffic spikes, local state can still shape the operation.

This is where the phrase "Kafka stores data in object storage" becomes too vague. A tiered-storage Kafka cluster may store older segments in object storage, while a shared-storage Kafka architecture may treat object storage as the primary durable layer. Both involve remote storage, but the broker contract is different.

[Figure: Tiered Storage vs Shared Storage]

Tiered storage does not automatically provide:

  • Stateless brokers. Brokers can still own active local log state. A broker process may be replaceable in an operational sense, but the architecture has not necessarily removed durable log ownership from broker-local storage.
  • Data-movement-free scaling. Adding brokers may still require partition reassignment, local catch-up, or replica movement for hot data and balance.
  • Instant broker recovery. Remote historical segments help with retention, but active data recovery still depends on Kafka's replication and leader election model.
  • A new write durability model. Producer acknowledgments are still tied to the active log and replication path, not to a remote object store becoming the primary write substrate.

That distinction changes capacity planning, failure drills, and the operational cost of cluster resizing. If the main problem is "we need to retain more data," tiered storage is directly relevant. If the main problem is "every broker operation turns into a data movement project," tiered storage may leave the core coupling in place.
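The difference can be illustrated by estimating how many bytes a partition reassignment has to copy between brokers. The sketch assumes that, with tiered storage, a new replica only needs to catch up on the portion of each partition still held locally, while without it the full retained log moves. The function name and the byte counts are hypothetical.

```python
def reassignment_bytes(partitions: list, tiered: bool) -> int:
    """Bytes copied between brokers when the given partitions are reassigned."""
    return sum(p["local_bytes"] if tiered else p["total_bytes"] for p in partitions)

TIB = 1024 ** 4
GIB = 1024 ** 3

# Assumed partitions being moved to a newly added broker.
moving = [
    {"total_bytes": 2 * TIB, "local_bytes": 150 * GIB},
    {"total_bytes": 1 * TIB, "local_bytes": 80 * GIB},
]

print(reassignment_bytes(moving, tiered=False) / GIB, "GiB without tiering")
print(reassignment_bytes(moving, tiered=True) / GIB, "GiB with tiering")
```

Under these assumptions tiering shrinks the copy substantially, but the result is never zero: the hot local window still has to move, which is the coupling this section describes.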

Tiered Storage vs Shared Storage

Shared-storage Kafka starts from a different question: what if durable stream data is separated from broker identity? In that model, brokers still serve Kafka clients, handle protocol requests, manage runtime state, and cache data. The difference is that persistent log data is not treated as something permanently owned by one broker's local disk.

That is why shared storage is often associated with stateless or more stateless brokers. Stateless does not mean no state exists; brokers still have connections, metadata, assigned partitions, buffers, cache, and metrics. It means the broker should not be the long-lived durable home of a partition's authoritative log data.

The hard part is correctness. A shared-storage system needs a precise write path: when does an acknowledgment become durable, how are stale writers fenced, how are leader epochs handled, and how do readers find the correct data after ownership changes? Object storage alone does not answer those questions. The streaming storage layer above it does.
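To make the fencing question concrete, here is a minimal in-memory sketch of epoch-based writer fencing. It is a generic illustration and does not represent AutoMQ's or S3Stream's implementation: `SharedLog`, `acquire`, and `FencedError` are hypothetical names, and a real system would need atomic conditional operations in the storage layer rather than a Python object.

```python
class FencedError(Exception):
    """Raised when a writer with a stale epoch tries to append."""

class SharedLog:
    def __init__(self):
        self._epoch = 0      # epoch of the current owner
        self._records = []   # stands in for durable shared storage

    def acquire(self) -> int:
        """Take ownership; every earlier epoch becomes stale."""
        self._epoch += 1
        return self._epoch

    def append(self, epoch: int, record: bytes) -> int:
        """Append a record and return its offset, rejecting stale writers."""
        if epoch != self._epoch:
            raise FencedError(f"epoch {epoch} is stale; current is {self._epoch}")
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int) -> bytes:
        return self._records[offset]

log = SharedLog()
old_owner = log.acquire()
log.append(old_owner, b"written by first owner")

new_owner = log.acquire()   # ownership moves, for example after a broker failure
try:
    log.append(old_owner, b"late write from the old owner")
except FencedError as err:
    print("rejected:", err)
log.append(new_owner, b"written by new owner")
```

The important property is that the rejected write never reaches the log, so readers see the same sequence of records no matter which broker served them before or after the ownership change.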

AutoMQ fits here as a Kafka-compatible system that goes beyond tiered storage. AutoMQ uses object storage as the primary storage foundation through S3Stream, its shared streaming storage layer, while brokers are designed to be more stateless. Durable data is separated from broker-local disks, and local resources are used for runtime needs such as caching rather than as the permanent home of Kafka log history. The architectural goal is less data movement during scaling, recovery, and broker replacement.

This is the decision boundary: tiered storage is a retention architecture, while shared storage is a broker ownership architecture. A team may need one, the other, or both.

When Tiered Storage Is Enough

Tiered storage is enough when retention is the main problem and the existing Kafka operating model is otherwise acceptable. If your clusters are stable, scaling events are planned, broker replacement is manageable, and application teams mainly need longer replay windows, tiered storage can be the right move. It lets you improve storage economics without changing the Kafka mental model your teams already use.

It is also a good fit when historical reads are important but not latency-critical. A consumer that replays old data for recovery or offline processing can often tolerate a different performance profile than a consumer reading the tail of a topic. In that case, keeping recent data local and older data remote matches the workload.

When to Evaluate Shared-Storage Kafka

Evaluate shared-storage Kafka when the pain has moved beyond retention:

  • Elasticity pressure. You need to add or remove broker capacity frequently, and data movement dominates scaling time.
  • Recovery pressure. Broker replacement, disk loss, or replica restoration creates operational risk or long maintenance windows.
  • Cloud cost pressure. You are paying for broker resources sized around storage history rather than current traffic.
  • Platform simplification. You want Kafka compatibility but a storage model closer to cloud object storage and replaceable compute.

The cleanest evaluation starts with a failure drill. Pick a production-shaped workload, define the hot data window, trigger broker loss, run historical replay, and resize the cluster. Watch what moves: metadata, cache, remote objects, local replicas, or all of them.

Practical Evaluation Checklist

Before enabling tiered storage or choosing a shared-storage platform, turn the architecture into questions your team can test. The storage medium matters, but the system behavior matters more.

| Question | Why It Matters |
| --- | --- |
| Which data stays local, and for how long? | Local retention controls disk pressure, hot read behavior, and operational headroom. |
| What happens when a consumer fetches old offsets? | Remote reads affect replay time, backfill throughput, and recovery expectations. |
| How is remote log metadata stored and recovered? | Metadata errors can make valid data hard to locate or unsafe to serve. |
| Does broker replacement copy durable data? | This reveals whether the architecture changed broker ownership or only retention. |
| Where does producer durability happen? | Acknowledgment semantics define the real write path, not the storage marketing label. |
| How does the system behave under remote storage latency? | Object storage and network variance must be part of the failure model. |

For AutoMQ specifically, the relevant questions are about the shared-storage contract: how S3Stream stores durable stream data, how brokers are fenced, how cache affects read latency, and how scaling behaves when brokers are added or removed.

The Architectural Boundary

Kafka tiered storage deserves a fair reading. It solves a real problem, and for many teams that problem is urgent. Long retention on broker-local disks is expensive, operationally heavy, and often mismatched with how data is actually read. Moving older segments to a remote tier is a practical improvement.

But the words "remote storage" do not settle the architecture. Ask what remains local, what is authoritative, what happens on broker failure, and whether scaling still depends on moving durable data between broker disks. Those questions separate a retention feature from a storage-compute separation model.

If your Kafka estate mainly needs longer retention, start with tiered storage and validate remote fetch behavior under your workload. If your Kafka estate needs less data movement, faster broker replacement, and a more cloud-native storage model, evaluate shared-storage Kafka architectures such as AutoMQ, where object storage is the primary durable foundation rather than only the cold side of a tier.

FAQ

What is Kafka tiered storage?

Kafka tiered storage is an architecture that lets Kafka keep recent log data on broker-local storage while moving older completed log segments to a remote storage tier. It is designed to reduce local disk pressure and make longer retention more practical.

Is Kafka tiered storage the same as remote log storage?

They are closely related. Remote log storage is the mechanism Kafka uses to store log segments outside broker-local disks, while tiered storage describes the broader local-plus-remote storage lifecycle.

Is Kafka tiered storage stateless?

Not by itself. Tiered storage can reduce how much historical data stays on broker disks, but brokers may still own active local replicas and hot log data. Stateless Kafka requires a deeper separation between broker compute and durable log storage.

Does tiered storage improve Kafka cost?

It can improve the cost profile when retention drives broker disk sizing. Teams still need to account for remote storage capacity, requests, network paths, replay behavior, and operational overhead.

When should teams evaluate shared-storage Kafka instead?

Evaluate shared-storage Kafka when the main problem is broker-data coupling rather than retention alone. Common triggers include frequent scaling, slow broker replacement, expensive data movement, and a desire to run Kafka-compatible workloads on object storage as the primary durable layer.

How does AutoMQ go beyond tiered storage?

AutoMQ is Kafka-compatible but uses S3Stream to place durable stream data on object storage as the primary storage foundation. Brokers are designed to be more stateless, so scaling and recovery can involve less broker-local data movement than traditional Kafka designs.
