Kafka Tiered Storage vs Diskless Kafka | Cloud Cost Guide

Moving older Kafka data to object storage can lower the cost of long retention. That part is not controversial. The harder question is whether it lowers the Kafka bill that actually bothers your platform team: broker compute held for peaks, primary storage kept for active partitions, cross-zone replication, slow rebalancing, and capacity that cannot be removed after traffic falls.

That is where "Kafka tiered storage vs diskless Kafka" gets interesting. Both architectures use remote or object storage, so procurement discussions often group them together. Architecturally, they attack different cost centers. Tiered storage reduces pressure from retained historical log segments. Diskless Kafka changes the relationship between brokers and durable data, which can affect scaling, replacement, and data movement.

[Figure: Cost Driver Coverage Matrix]

The practical answer is not "always tiered storage" or "always diskless." It is more useful to ask which part of your Kafka cloud cost is out of proportion. If the expensive part is old data that is rarely read, tiered storage may be enough. If the expensive part is broker-local ownership of active data, replicas, and rebalancing, object storage has to become part of the primary storage architecture rather than only a cold tier.

The Cost Problem Each Architecture Tries to Solve

Traditional Kafka was built around a shared-nothing model: each broker stores local log segments for the partitions it owns, and durability comes from replication across brokers. In a cloud environment, that model turns storage, compute, and network behavior into one planning problem. You cannot size retention without thinking about broker disks. You cannot add or remove capacity without thinking about partition movement. You cannot reason about durability without thinking about replica traffic.

Tiered storage starts with a narrower diagnosis. Kafka's own KIP-405 describes a two-tier model where the local tier remains on broker disks and the remote tier stores completed log segments in systems such as S3 or HDFS. The goal is to keep older data available without making every broker carry the full retention window locally.

Diskless Kafka starts with a broader diagnosis. KIP-1150, accepted on March 2, 2026, says diskless topics are meant to avoid using broker disks as the primary durable storage for user data, while still preserving Kafka semantics through follow-up implementation KIPs. That does not mean there are literally no disks anywhere. It means the attached broker disk becomes less central to durability and operations.

The distinction matters because cloud cost is not one line item:

  • Cold retention cost is the cost of storing older segments for replay, audit, recovery, or delayed consumers.
  • Hot-path cost is the cost of serving active writes and reads with enough broker compute, memory, network, and primary storage headroom.
  • Replication and zone cost comes from copying data across failure domains and serving clients across network boundaries.
  • Elasticity cost is the capacity you keep because scale-in, reassignment, or recovery is too slow to trust during normal operations.
  • Operational cost is the engineering time and risk created by broker replacement, partition movement, disk pressure, and maintenance windows.

Tiered storage mostly targets cold retention and some rebalancing pressure from old segments. Diskless Kafka targets the broker-local statefulness that sits behind several of the other cost buckets. That is why the same object storage bill can mean two very different architectural outcomes.

How Tiered Storage Works

Apache Kafka tiered storage keeps Kafka's local log model and adds a remote tier for completed segments. In the Kafka 4.2 documentation, the local tier is the same broker-local disk tier Kafka already uses, while the remote tier stores completed log segments in external systems such as HDFS or S3. Topic-level controls such as remote.storage.enable, local.retention.ms, local.retention.bytes, retention.ms, and retention.bytes decide how long data remains local and how long it remains retained overall.
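
As a rough illustration of how those topic-level controls relate, the sketch below builds a set of overrides for a tiered topic and checks that the local window does not exceed the overall window. The helper name `tiered_topic_config` is hypothetical and is not a Kafka API. Only the configuration keys come from the text above, and the sketch assumes positive millisecond values rather than Kafka's special sentinel values.

```python
from typing import Dict, Optional


def tiered_topic_config(
    retention_ms: int,
    local_retention_ms: int,
    retention_bytes: Optional[int] = None,
    local_retention_bytes: Optional[int] = None,
) -> Dict[str, str]:
    """Build topic-level overrides for a tiered topic (illustrative helper, not a Kafka API)."""
    # The local tier is meant to hold a subset of the overall retention window.
    if local_retention_ms > retention_ms:
        raise ValueError("local.retention.ms should not exceed retention.ms")
    if (
        retention_bytes is not None
        and local_retention_bytes is not None
        and local_retention_bytes > retention_bytes
    ):
        raise ValueError("local.retention.bytes should not exceed retention.bytes")

    config = {
        "remote.storage.enable": "true",
        "retention.ms": str(retention_ms),
        "local.retention.ms": str(local_retention_ms),
    }
    # Size-based limits are optional and only emitted when provided.
    if retention_bytes is not None:
        config["retention.bytes"] = str(retention_bytes)
    if local_retention_bytes is not None:
        config["local.retention.bytes"] = str(local_retention_bytes)
    return config


DAY_MS = 24 * 60 * 60 * 1000

# Example: keep 30 days overall, but only 1 day on broker disks.
audit_topic = tiered_topic_config(retention_ms=30 * DAY_MS, local_retention_ms=1 * DAY_MS)
```

The resulting dictionary could be passed to whatever topic-management tooling you already use. Tiered storage also has to be enabled on the brokers themselves, which this sketch does not cover.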

That design is valuable because many Kafka clusters carry far more historical data than their hot working set requires. A fraud analytics topic, audit topic, or event replay buffer may need weeks or months of retention, while most consumers read near the tail. Tiered storage lets the operator shorten local retention while keeping a longer remote retention window.

It solves a concrete set of problems:

  • Long retention no longer has to equal large broker disks. Completed segments can move to remote storage after local retention rules allow it.
  • Delayed consumers get a larger replay window. Applications can still fetch older records through Kafka APIs instead of switching to a separate archive pipeline.
  • Some recovery and reassignment work can shrink. KIP-405 notes that reducing local data can reduce the amount copied during recovery and rebalancing.
  • Primary storage pressure becomes easier to control. Operators can tune local retention around the hot working set instead of the full retention policy.
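
A back-of-the-envelope sketch can make the disk-sizing effect concrete. The numbers below (50 MiB/s ingest, 7 days of retention, 6 hours of local retention, replication factor 3) are assumptions chosen for illustration. The model ignores compression, index files, active-segment size, and free-space headroom, so treat it as a starting point rather than a sizing tool.

```python
def retained_gib(ingest_mib_per_s: float, retention_hours: float) -> float:
    """Logical data retained for a window, in GiB, before replication."""
    return ingest_mib_per_s * 3600 * retention_hours / 1024


def local_disk_gib(
    ingest_mib_per_s: float, local_retention_hours: float, replication_factor: int
) -> float:
    """Broker-local footprint: every replica keeps the local window on disk."""
    return retained_gib(ingest_mib_per_s, local_retention_hours) * replication_factor


INGEST_MIB_PER_S = 50      # assumed steady write rate
RETENTION_HOURS = 7 * 24   # assumed overall retention
LOCAL_HOURS = 6            # assumed local retention once tiering is enabled
REPLICATION_FACTOR = 3     # assumed replication factor

# Without tiering, the full window lives on broker disks for every replica.
classic_local = local_disk_gib(INGEST_MIB_PER_S, RETENTION_HOURS, REPLICATION_FACTOR)

# With tiering, brokers keep only the local window; the remote tier holds the rest.
tiered_local = local_disk_gib(INGEST_MIB_PER_S, LOCAL_HOURS, REPLICATION_FACTOR)
tiered_remote = retained_gib(INGEST_MIB_PER_S, RETENTION_HOURS)

print(f"classic local disk: {classic_local:,.2f} GiB")  # 88,593.75 GiB
print(f"tiered local disk:  {tiered_local:,.2f} GiB")   # 3,164.06 GiB
print(f"tiered remote tier: {tiered_remote:,.2f} GiB")  # 29,531.25 GiB
```

The remote figure is deliberately conservative: it counts the full retention window once and does not subtract data that is still held locally.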

The boundary matters as much as the benefit. Tiered storage does not remove the local tier. Active segments are still part of broker operation. Leaders, followers, replication, client traffic, page cache behavior, and local retention still matter. A tiered Kafka cluster can be less storage-heavy than a fully local-retention cluster, but it is not a stateless compute fleet.

How Diskless Kafka Works

Diskless Kafka uses object or shared storage as part of the primary durability model, not only as a cold archive for completed segments. The Apache Kafka community's KIP-1150 frames diskless topics as a separate topic type with a distinct ingestion engine. The KIP also states that diskless is not "no disks" in a literal sense: broker disk may still be used for metadata, temporary buffering, or cache. The change is that broker-local disk is no longer the primary durable store for user data.

AutoMQ is an example of this broader diskless/shared-storage direction. AutoMQ's S3Stream documentation describes a storage layer that offloads Kafka's log storage to cloud storage and uses WAL storage for acceleration and fault recovery. Its stateless broker documentation explains that brokers become stateless because Kafka's storage layer is offloaded to cloud storage through S3Stream.

[Figure: Data Path Comparison]

That shift changes the operational cost model. In local-disk Kafka, a broker is both compute and a place where durable partition data lives. In a diskless architecture, brokers are closer to a compute layer over shared storage. Replacing a broker, scaling out, or scaling in no longer has to mean moving the durable history owned by that broker in the same way.
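
To see why the amount of broker-owned data matters during replacement, the sketch below estimates how long it takes to copy a broker's local data at a fixed replication throughput. The data sizes and the 256 MiB/s copy rate are assumptions for illustration. The diskless case is modeled as having no durable user data to copy, which is a simplification that ignores cache warm-up and metadata work.

```python
def copy_hours(local_gib: float, copy_mib_per_s: float) -> float:
    """Hours needed to copy a broker's local data at a sustained copy rate."""
    if copy_mib_per_s <= 0:
        raise ValueError("copy_mib_per_s must be positive")
    return local_gib * 1024 / copy_mib_per_s / 3600


COPY_MIB_PER_S = 256  # assumed sustained reassignment/recovery throughput

# Assumed per-broker data that must be rebuilt when a broker is replaced.
scenarios = {
    "classic (full retention local)": 9000.0,  # GiB
    "tiered (local window only)": 900.0,       # GiB
    "diskless (no durable local data)": 0.0,   # GiB, simplifying assumption
}

for name, gib in scenarios.items():
    print(f"{name}: {copy_hours(gib, COPY_MIB_PER_S):.1f} h")
```

Under these assumed inputs the estimate comes out to roughly 10 hours, 1 hour, and 0 hours respectively. Real recovery time also depends on throttling, partition count, and how much traffic the cluster is serving at the time.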

The tradeoff is not magic. Shared storage introduces its own engineering work: write path design, WAL behavior, object layout, metadata management, cache strategy, and read amplification control all matter. KIP-1150 also keeps the door open for latency and cost tradeoffs by topic, rather than pretending one mode fits every workload. The serious comparison is therefore mechanism against mechanism, not object storage as a slogan.

Cost Drivers Compared

The most useful cost model separates where bytes live from who must move them. A tiered architecture can make older bytes less expensive to retain, but active bytes still move through brokers and replicas. A diskless architecture changes the durable placement of those bytes, so the savings hypothesis extends beyond retention into elasticity and data movement.

| Cost driver | Tiered storage impact | Diskless Kafka impact |
| --- | --- | --- |
| Cold retention | Strong fit. Older completed segments move to remote storage and can be retained longer. | Strong fit when object storage is also the durable repository for retained data. |
| Hot storage | Partial. Active and local-retention data still need broker-local storage. | Broader. Broker disks are not the primary durable store for user data, though cache and WAL design still matter. |
| Replicas | Partial. Kafka still relies on broker roles and local hot-path behavior. | Broader. Shared storage can reduce how much durable user data must be replicated through broker disks, depending on implementation. |
| Cross-AZ traffic | Workload-dependent. Remote storage helps old segments, but hot replication and client paths remain part of the design. | Potentially broader. KIP-1150 explicitly targets inter-zone transfer cost, but the final effect depends on the implementation and cloud network topology. |
| Rebalancing | Improved for old data that does not need to be copied back to brokers. | Broader. Stateless or lower-state brokers reduce data movement during replacement, scale-out, and scale-in. |
| Over-provisioning | Limited if the main reason for headroom is hot traffic and slow scale-in. | Better fit when capacity can be added and removed without large data migration windows. |
| API and retrieval costs | Must be modeled. Remote storage operations, retrieval, and request patterns matter. | Must be modeled more carefully because object storage is closer to the main data path. |

Public cloud object storage pricing reinforces the same point. AWS S3, Azure Blob Storage, and Google Cloud Storage pricing pages all break cost into more than stored GiB: operations, retrieval, data transfer, storage class behavior, and replication can all matter. A Kafka architecture that writes many small objects, reads cold data frequently, or fans out historical scans has a different object storage profile from one that writes compacted larger objects and mostly serves tail reads from cache.
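
The effect of object size on request volume can be sketched with simple arithmetic. The ingest rate, the two object sizes, and especially the per-request price below are placeholders, not quotes from any provider's price list. The model counts upload requests only and ignores reads, listings, retrieval fees, and data transfer.

```python
import math

SECONDS_PER_30_DAYS = 30 * 24 * 3600


def monthly_put_requests(ingest_mib_per_s: float, object_size_mib: float) -> int:
    """Count the upload requests needed for 30 days of ingest at a given object size."""
    if object_size_mib <= 0:
        raise ValueError("object_size_mib must be positive")
    total_mib = ingest_mib_per_s * SECONDS_PER_30_DAYS
    return math.ceil(total_mib / object_size_mib)


def request_cost(requests: int, price_per_1000: float) -> float:
    """Request charges for a given request count and per-1,000 price."""
    return requests / 1000 * price_per_1000


# Hypothetical price per 1,000 upload requests; substitute your provider's current rate.
PLACEHOLDER_PUT_PRICE = 0.005

small_objects = monthly_put_requests(50, object_size_mib=1)   # many small objects
large_objects = monthly_put_requests(50, object_size_mib=64)  # fewer, larger objects

print(small_objects, request_cost(small_objects, PLACEHOLDER_PUT_PRICE))
print(large_objects, request_cost(large_objects, PLACEHOLDER_PUT_PRICE))
```

Both cases store the same number of bytes, yet the request count differs by a factor of 64. That gap is the part of the bill a stored-GiB comparison never shows.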

That is why cost math that rests only on "object storage has a lower unit price" is dangerous. The storage unit price may look attractive, but the Kafka bill changes only if the architecture also reduces the expensive behavior around that storage. For tiered storage, the strongest case is retained cold data. For diskless Kafka, the case depends on whether statelessness and shared storage reduce broker capacity, replica traffic, and operational data movement enough to outweigh the added object storage path complexity.

Which Architecture Fits Your Workload

A stable Kafka cluster with long retention is a good tiered storage candidate. The workload writes steadily, consumers usually read near the tail, old data is needed for replay or compliance, and the team mainly wants to stop sizing broker disks around the oldest retained byte. In that situation, tiered storage is a focused answer to a focused problem.

A bursty or frequently resized cluster points in a different direction. If the pain is that brokers must remain over-provisioned because scale-in is hard, or that partition movement dominates maintenance planning, then reducing cold retention cost will not fix the architectural bottleneck. The durable log is still tied closely enough to broker placement that operations remain stateful.
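
The cost of slow scale-in can be approximated by comparing broker-hours for a fleet pinned at peak size with a fleet that follows load. The daily load profile, the 50 MiB/s per-broker capacity, and the three-broker floor below are assumptions for illustration. The model ignores scaling lag and safety margins.

```python
import math
from typing import Sequence


def brokers_needed(load_mib_per_s: float, per_broker_mib_per_s: float, min_brokers: int = 3) -> int:
    """Brokers required for a load level, never dropping below an availability floor."""
    return max(min_brokers, math.ceil(load_mib_per_s / per_broker_mib_per_s))


def broker_hours_static(hourly_load: Sequence[float], per_broker_mib_per_s: float) -> int:
    """Fleet sized once for the peak hour and held there all day."""
    peak = max(hourly_load)
    return brokers_needed(peak, per_broker_mib_per_s) * len(hourly_load)


def broker_hours_elastic(hourly_load: Sequence[float], per_broker_mib_per_s: float) -> int:
    """Fleet resized every hour to match that hour's load."""
    return sum(brokers_needed(load, per_broker_mib_per_s) for load in hourly_load)


# Assumed day: 20 quiet hours at 40 MiB/s and a 4-hour burst at 400 MiB/s.
daily_load = [40.0] * 20 + [400.0] * 4
PER_BROKER_MIB_PER_S = 50.0  # assumed sustainable throughput per broker

static_hours = broker_hours_static(daily_load, PER_BROKER_MIB_PER_S)    # 8 brokers * 24 h = 192
elastic_hours = broker_hours_elastic(daily_load, PER_BROKER_MIB_PER_S)  # 20 * 3 + 4 * 8 = 92
print(static_hours, elastic_hours)
```

The gap between the two totals is the capacity that exists only because scale-in cannot be trusted. It is the part of the bill that cheaper cold retention does not address.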

Workload Fit Table

Use the table below as a first-pass filter before building a detailed cost model:

| Workload signal | Usually points to tiered storage | Usually points to diskless Kafka |
| --- | --- | --- |
| Long retention, rare historical reads | Yes | Also possible, but may be more architecture change than needed |
| CDC topics with predictable throughput | Often yes | Consider if broker replacement and scale-in are recurring pain |
| Bursty traffic with idle valleys | Limited | Stronger fit because elasticity affects cost directly |
| High fanout or frequent replay | Depends on cold-read behavior and broker capacity | Depends on cache, object layout, and remote fetch design |
| Strict low-latency hot path | Often safer because classic Kafka local path remains primary | Evaluate carefully; KIP-1150 itself calls out latency and cost tradeoffs |
| Frequent partition reassignment | Some improvement for old segments | Stronger fit if the main cost is data movement and stateful operations |

The decision rule is simple but useful: choose tiered storage when old data is the cost problem; evaluate diskless Kafka when broker-local ownership is the cost problem. Many real clusters have both problems, but one is usually the reason the bill or the operations queue keeps getting worse.

A Practical Cost Checklist

Before comparing products or architectures, put your cost model into separate columns. That keeps the decision from collapsing into a single object storage price and makes hidden assumptions visible to SRE, FinOps, and architecture teams.

Start with the hot path:

  • What write throughput must the cluster absorb during normal operation and peak operation?
  • How much read fanout is served near the tail versus from older offsets?
  • How much broker headroom is kept because scaling or reassignment takes too long?
  • Which traffic crosses availability zones, regions, VPC boundaries, or cloud network billing boundaries?

Then model retention separately:

  • How long must data remain queryable through Kafka APIs?
  • What portion of retained data is read after it leaves the hot working set?
  • Do topics use delete-based retention, log compaction, or a mix of both?
  • How do object storage requests, retrievals, minimum storage duration, and lifecycle rules affect the workload?

Finally, model operations:

  • How often do brokers fail, get replaced, or get patched?
  • How often does the team move partitions to rebalance hot spots?
  • How much duplicate capacity is run during maintenance or migration windows?
  • What is the business cost of keeping extra capacity because scale-in is slow?

If the retention column dominates, tiered storage deserves priority. If the hot path and operations columns dominate, diskless Kafka deserves a serious evaluation. AutoMQ's pricing calculator can be useful for that second category because it asks for workload shape rather than only retained GiB, but the result should still be checked against current cloud pricing and your own traffic patterns.
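
The three-column comparison can be written down as a small first-pass filter. The `CostColumns` type, the `first_evaluation` function, and the 60% dominance threshold are all hypothetical choices for illustration. The text only says that one column "dominates," so the cutoff is an assumption you should tune to your own model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostColumns:
    """Monthly cost estimates, in any single consistent currency."""
    hot_path: float
    retention: float
    operations: float


def first_evaluation(costs: CostColumns, dominance: float = 0.6) -> str:
    """Suggest which architecture to examine first, based on which column dominates."""
    total = costs.hot_path + costs.retention + costs.operations
    if total <= 0:
        raise ValueError("cost columns must sum to a positive amount")
    retention_share = costs.retention / total
    if retention_share >= dominance:
        return "prioritize tiered storage"
    # Hot path and operations together make up the rest of the bill.
    if 1 - retention_share >= dominance:
        return "evaluate diskless Kafka"
    return "model both in detail"


print(first_evaluation(CostColumns(hot_path=10, retention=70, operations=20)))
print(first_evaluation(CostColumns(hot_path=50, retention=10, operations=40)))
print(first_evaluation(CostColumns(hot_path=30, retention=50, operations=20)))
```

The filter only orders the evaluation. It does not replace the detailed model, and a result near the threshold is a signal to examine both columns closely.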

Where AutoMQ Fits in the Discussion

AutoMQ should not be introduced as "Kafka, but with lower object storage prices." That misses the architecture. The relevant point is that AutoMQ uses S3Stream to separate storage from compute while keeping Kafka protocol compatibility. Its stateless broker model changes the broker from a durable data owner into a compute layer over shared storage, with WAL used for acceleration and recovery.

That makes AutoMQ a diskless Kafka example for teams whose cost pain is tied to elasticity:

  • Scale-in after peaks. Capacity can be removed without treating every broker as a storage evacuation project.
  • Broker replacement. A failed or interrupted node is less likely to become a long data reconstruction event.
  • Traffic balancing. Storage-compute separation gives the platform more freedom to move work without duplicating retained data.
  • Retention plus elasticity. Object storage is not only a cold tier; it is part of the durable storage foundation.

There are still workloads where classic Kafka plus tiered storage is the cleaner answer. Low-latency topics with stable traffic and long but rarely read retention may not need a new storage engine. The point of the comparison is to avoid using the same phrase, "Kafka on object storage," for two different engineering decisions.

FAQ

Is Kafka tiered storage the same as diskless Kafka?

No. Kafka tiered storage adds a remote tier for completed log segments while keeping a local broker tier. Diskless Kafka changes the primary durability model so broker-local disks are not the main durable store for user data. Both may use object storage, but they solve different cost and operations problems.

Does tiered storage lower Kafka cloud cost?

It can lower cost when retained historical data is a major driver and old data is not read constantly. It is less likely to solve costs caused by broker compute, hot traffic, cross-zone replication, slow scale-in, or frequent reassignment. Those costs remain tied to the active Kafka path.

Does diskless Kafka always cost less than tiered storage?

No. Diskless Kafka changes more of the storage architecture, so it can help broader elasticity and data movement problems, but it also depends on object layout, WAL design, cache behavior, request patterns, and latency requirements. A stable cluster with simple long-retention needs may be better served by tiered storage.

What changed with KIP-1150?

KIP-1150, accepted on March 2, 2026, records Apache Kafka community agreement on the need for diskless topics and their end-user requirements. It does not by itself deliver the full implementation. The KIP explicitly points to follow-up KIPs for core implementation details.

When should I choose tiered storage?

Choose tiered storage when your main pain is retained historical data: long replay windows, audit retention, delayed consumers, or broker disk pressure from closed segments. It is a focused way to reduce local storage pressure without changing the entire Kafka operating model.

When should I evaluate AutoMQ or another diskless Kafka architecture?

Evaluate diskless Kafka when the cost problem is tied to broker-local ownership: over-provisioning for bursts, slow scale-in, partition movement, broker replacement, or the operational cost of stateful brokers. AutoMQ is one example of this architecture because it uses S3Stream and stateless brokers while preserving Kafka protocol compatibility.
