Amazon MSK tiered storage is useful, especially when Kafka retention has grown beyond what broker-local disks were meant to carry. It lets older log segments move from primary broker storage into a lower-cost remote tier, while applications keep using Kafka APIs. For teams that need longer replay windows, audit retention, or a safety buffer for delayed consumers, that is a real improvement over sizing every broker around the oldest byte in the cluster.
The confusion starts when "data goes to S3" becomes shorthand for "Kafka is now elastic." Tiered storage changes where closed historical segments can live. It does not remove the active log from the broker path, erase partition ownership, or turn brokers into stateless compute. A tiered MSK cluster can reduce retention pressure and speed up some rebalancing scenarios, but it still has broker-local state, hot-path capacity planning, and operational boundaries that matter during scale-out, scale-in, failure recovery, and cold reads.
That distinction is not academic. It decides whether tiered storage is enough for your workload, or whether you are really looking for a diskless Kafka architecture where durable log storage is no longer bound to broker-local disks.
What Amazon MSK Tiered Storage Actually Solves
Amazon MSK describes tiered storage for Standard brokers as a low-cost storage tier that can scale to virtually unlimited storage. In the MSK model, streaming data first lands in a performance-optimized primary storage tier. When topic retention and segment rules make data eligible, MSK moves older data into the lower-cost tier. AWS also notes that applications may see higher latency for the first bytes when reading from tiered storage, before sequential reads behave more like reads from the primary tier.
That design targets a specific pain: retention growth. Traditional Kafka makes retention an expensive broker sizing problem because each broker owns local log segments for its assigned partitions. If teams want longer retention, they often add broker storage, increase broker count, or both. Tiered storage changes that equation for older closed segments.
It helps most when the workload has these characteristics:
- Long retention dominates storage footprint. Compliance logs, event replay buffers, and delayed processing windows can keep data around long after it leaves the hot write/read path.
- Cold reads are occasional. Reprocessing old data through Kafka APIs is valuable, but it is not the same as constantly serving historical scans at hot-path latency.
- Primary storage can be kept smaller. Local retention settings become a control point, so the broker does not need to keep every retained segment on local disk.
- Rebalancing does not need to move all historical bytes. AWS states that partitions can rebalance faster because data on secondary storage does not require replication across broker disks.
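To make the retention effect concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a steady ingest rate and that primary storage holds either the full retention window (no tiering) or only the local retention window (with tiering), multiplied by the replication factor. The `RetentionPlan` type, the function name, and all input numbers are hypothetical illustrations, not values from AWS. The estimate ignores active segments that have not rolled yet and any lag in copying segments to the remote tier.

```python
from dataclasses import dataclass


@dataclass
class RetentionPlan:
    ingest_mb_per_sec: float      # average produce rate across the cluster
    replication_factor: int       # copies kept on broker disks
    total_retention_hours: float  # retention.ms expressed in hours
    local_retention_hours: float  # local.retention.ms expressed in hours


def primary_storage_gb(plan: RetentionPlan, tiered: bool) -> float:
    """Estimate how much data must sit on broker disks, in GB."""
    # With tiering, only the local retention window stays on primary storage.
    hours = plan.local_retention_hours if tiered else plan.total_retention_hours
    megabytes = plan.ingest_mb_per_sec * 3600 * hours * plan.replication_factor
    return megabytes / 1024


# Illustrative inputs only: 10 MB/s, RF 3, 30 days total, 1 day local.
plan = RetentionPlan(
    ingest_mb_per_sec=10,
    replication_factor=3,
    total_retention_hours=720,
    local_retention_hours=24,
)
print(f"without tiering: {primary_storage_gb(plan, tiered=False):,.0f} GB")
print(f"with tiering:    {primary_storage_gb(plan, tiered=True):,.0f} GB")
```

In this simplified model the primary footprint shrinks in proportion to the ratio of local retention to total retention. Broker compute and hot-path throughput needs are untouched by the calculation, which is the point the rest of this article develops.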
This is why tiered storage deserves a fair evaluation. It is not a gimmick bolted onto Kafka; it directly addresses a real retention cost and operations problem. The mistake is asking it to solve a different problem: making Kafka brokers stateless.
What Tiered Storage Does Not Change
Apache Kafka's tiered storage documentation describes two tiers: a local tier on Kafka brokers and a remote tier for completed log segments in external storage such as S3 or HDFS. That sentence is the key. Tiered storage adds a remote tier; it does not delete the local tier.
In MSK, the active log segment is not eligible for tiering. AWS documents that Kafka copies closed log segments to tiered storage, and active segments remain outside that copy path until segment size or segment roll time closes them. The broker still receives writes, serves hot reads, owns partitions, manages leadership, participates in replication, and depends on primary storage for the current working set.
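Because only closed segments can be copied, the time a segment stays active sets a floor on how soon data can reach the remote tier. The sketch below estimates that floor from the topic settings `segment.bytes` and `segment.ms`, assuming a constant per-partition write rate. The function name and the example rates are hypothetical, and real roll behavior involves more conditions than this simplification covers.

```python
def seconds_until_roll(segment_bytes: int, segment_ms: int,
                       partition_write_bytes_per_sec: float) -> float:
    """Approximate how long an active segment stays open.

    A segment closes when it reaches segment.bytes or when segment.ms
    elapses, whichever happens first. Only then is it a candidate for
    copying to the remote tier.
    """
    time_limit = segment_ms / 1000
    if partition_write_bytes_per_sec <= 0:
        # An idle partition can only roll on the time limit.
        return time_limit
    size_limit = segment_bytes / partition_write_bytes_per_sec
    return min(size_limit, time_limit)


ONE_GIB = 1024 ** 3
SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000

# A busy partition rolls on size; a quiet one waits for the time limit.
busy = seconds_until_roll(ONE_GIB, SEVEN_DAYS_MS, 1024 * 1024)
quiet = seconds_until_roll(ONE_GIB, SEVEN_DAYS_MS, 100)
print(f"busy partition rolls after ~{busy:.0f} s")
print(f"quiet partition rolls after ~{quiet:.0f} s")
```

Low-throughput partitions are the ones to watch: with large segments and a long roll time, their data can sit on primary storage far longer than the local retention setting alone would suggest.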
The remaining constraints show up in four places:
| Area | What tiered storage improves | What still remains |
|---|---|---|
| Retention | Older closed segments can move to lower-cost remote storage | Active and local-retention data still need primary broker storage |
| Scaling | Historical bytes may not all move during reassignment | Partition ownership, hot traffic, and local state still have to rebalance |
| Failure recovery | Less old data may need reconstruction on broker disks | The cluster still depends on broker roles, leaders, replicas, and local hot data |
| Cold reads | Applications can replay older data through Kafka APIs | Remote reads can add latency and still flow through Kafka behavior and broker capacity limits |
AWS also lists concrete MSK constraints: tiered storage applies only to provisioned mode clusters, is documented for Standard brokers, does not support t3.small, does not support compacted topics, does not support multiple log directories on a broker, has a minimum low-cost storage retention period, and, once enabled, can be disabled for individual topics but not for the whole cluster. These are not reasons to reject the feature. They are reasons to treat it as a retention architecture, not a universal elasticity layer.
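One way to keep these constraints from being forgotten is a small pre-flight check run against a planned cluster description. The sketch below is a hypothetical helper, not an AWS API: `ClusterSpec`, `TopicSpec`, and their fields are invented for illustration, and the rules mirror only the constraints listed above. It does not check the minimum low-cost retention period, because that value should be read from the current AWS documentation.

```python
from dataclasses import dataclass, field


@dataclass
class TopicSpec:
    name: str
    cleanup_policy: str = "delete"  # value of cleanup.policy


@dataclass
class ClusterSpec:
    mode: str                  # "provisioned" or "serverless"
    broker_type: str           # "standard" or "express"
    instance_type: str         # for example "kafka.m5.large"
    log_dirs_per_broker: int = 1
    topics: list[TopicSpec] = field(default_factory=list)


def tiered_storage_blockers(cluster: ClusterSpec) -> list[str]:
    """Return reasons the plan conflicts with the constraints listed above."""
    blockers = []
    if cluster.mode != "provisioned":
        blockers.append("cluster is not in provisioned mode")
    if cluster.broker_type != "standard":
        blockers.append("tiered storage is documented for Standard brokers")
    if cluster.instance_type.endswith("t3.small"):
        blockers.append("t3.small brokers are not supported")
    if cluster.log_dirs_per_broker > 1:
        blockers.append("multiple log directories per broker are not supported")
    for topic in cluster.topics:
        # cleanup.policy can be a comma-separated list such as "compact,delete".
        policies = [p.strip() for p in topic.cleanup_policy.split(",")]
        if "compact" in policies:
            blockers.append(f"topic {topic.name} is compacted")
    return blockers


cluster = ClusterSpec(
    mode="provisioned",
    broker_type="standard",
    instance_type="kafka.t3.small",
    topics=[TopicSpec("orders"), TopicSpec("user-profiles", "compact")],
)
for reason in tiered_storage_blockers(cluster):
    print(reason)
```

An empty result means only that none of the listed constraints were hit. It is not a substitute for rechecking the current AWS documentation.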
The operational question is therefore sharper than "does MSK use S3?" A better question is "which bytes and which broker responsibilities still remain local after tiered storage is enabled?"
Tiered Storage vs Diskless Kafka Architecture
Local-disk Kafka, tiered-storage Kafka, and diskless/shared-storage Kafka solve different parts of the storage problem. They can look similar in procurement conversations because all three may mention Kafka compatibility, object storage, and cloud economics. Architecturally, they are not the same.
Traditional Kafka is a shared-nothing system. Each broker owns local log segments for its partitions, and durability comes from replication across brokers. That model works well when disks are attached to servers and capacity changes are slow, but it couples compute placement with durable log placement.
Tiered storage reduces that coupling for older closed segments. The broker-local tier remains the primary path for active data, while remote storage becomes the retention tier. This makes long retention more practical, but the partition still has a broker-local life. The leader still writes locally first, hot reads still depend on broker resources, and scaling events still care about where active partitions live.
Diskless or shared-storage Kafka changes the premise. In this model, durable log storage is designed around shared storage from the beginning, and brokers are closer to stateless compute nodes. AutoMQ takes this path by replacing Kafka's local-disk log storage with S3Stream, while preserving Kafka protocol compatibility. Its documentation describes S3Stream as a shared streaming storage layer that writes data through a WAL for acceleration and fault recovery, then stores data in object storage as the durable repository.
That changes what a broker means. In AutoMQ's stateless broker model, persistent data is offloaded to shared storage instead of being owned by broker-local disks. The practical benefit is not "S3 is cheaper" as a slogan. The benefit is that broker replacement, scale-out, scale-in, and traffic balancing no longer have to copy the same durable log data as traditional local-disk Kafka.
| Architecture | Durable log placement | Broker statefulness | Best fit |
|---|---|---|---|
| Local-disk Kafka | Broker-local disks with replicated partitions | High | Predictable clusters where local storage and static capacity are acceptable |
| MSK tiered storage | Active/local data on brokers, older closed segments in remote tier | Medium | Long retention, replay windows, and reduced historical storage pressure |
| Diskless/shared-storage Kafka | Durable data in shared storage, brokers as compute layer | Lower | Elastic workloads, frequent scaling, fast replacement, and storage-compute separation |
The point is not that every MSK tiered storage user should move to diskless Kafka. Many should not. The point is that tiered storage and diskless Kafka answer different questions. Tiered storage asks, "How do we keep older Kafka data without overloading broker disks?" Diskless Kafka asks, "Why should durable Kafka data be bound to broker disks at all?"
Cost and Scaling Implications
Tiered storage can reduce the storage cost of long retention, but it does not make the whole Kafka bill disappear. MSK pricing still depends on the selected broker configuration, primary storage, provisioned throughput where applicable, data transfer patterns, monitoring choices, and tiered storage usage. Because AWS pricing varies by region and date, production cost models should use the current AWS pricing page rather than copied numbers from an article.
The cost model should separate hot path from cold retention:
- Hot path cost comes from broker compute, primary storage, replication behavior, network paths, and the throughput headroom needed to absorb peaks.
- Cold retention cost comes from retained historical data, remote storage, retrieval behavior, and the frequency of replay or backfill.
- Migration or rebalancing cost comes from running excess capacity, moving partitions, operational time, and risk windows while the cluster catches up.
Tiered storage mostly attacks the second bucket. It can also reduce some movement of old data during partition reassignment, but it does not remove the first and third buckets. If the cost pain is "we retain months of data, but rarely read old segments," tiered storage may be a strong fit. If the pain is "we over-provision brokers for burst traffic, scale-in is hard, and rebalancing active partitions takes too long," the answer may require a different storage architecture.
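The three buckets can be kept apart in a simple model so it is obvious which one a change actually moves. The sketch below is a minimal illustration: every price is a parameter the reader must supply from the current AWS pricing page, the example values are placeholders rather than real prices, and the model assumes tiered storage is billed by stored and retrieved volume. It deliberately omits data transfer, provisioned throughput, and monitoring charges.

```python
from dataclasses import dataclass


@dataclass
class MonthlyCostInputs:
    broker_count: int
    broker_hourly_price: float       # placeholder, use current pricing
    primary_storage_gb: float
    primary_gb_month_price: float    # placeholder, use current pricing
    tiered_storage_gb: float
    tiered_gb_month_price: float     # placeholder, use current pricing
    retrieved_gb: float              # cold data read back during the month
    retrieval_gb_price: float        # placeholder, use current pricing
    extra_broker_hours_for_rebalance: float  # surplus capacity during moves


def cost_buckets(c: MonthlyCostInputs,
                 hours_per_month: float = 730.0) -> dict[str, float]:
    """Split a monthly estimate into hot path, cold retention, and movement."""
    hot = (c.broker_count * c.broker_hourly_price * hours_per_month
           + c.primary_storage_gb * c.primary_gb_month_price)
    cold = (c.tiered_storage_gb * c.tiered_gb_month_price
            + c.retrieved_gb * c.retrieval_gb_price)
    movement = c.extra_broker_hours_for_rebalance * c.broker_hourly_price
    return {"hot_path": hot, "cold_retention": cold, "movement": movement}


# Placeholder numbers chosen for readability, not taken from any price list.
inputs = MonthlyCostInputs(
    broker_count=3, broker_hourly_price=1.0,
    primary_storage_gb=1000, primary_gb_month_price=0.1,
    tiered_storage_gb=10000, tiered_gb_month_price=0.01,
    retrieved_gb=500, retrieval_gb_price=0.002,
    extra_broker_hours_for_rebalance=48,
)
for bucket, amount in cost_buckets(inputs).items():
    print(f"{bucket}: {amount:.2f}")
```

Running the model before and after a proposed change shows whether the savings land in cold retention, where tiered storage acts, or would have to come from the hot path and movement buckets, where it mostly does not.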
This is where shared storage becomes relevant. AutoMQ's architecture overview frames the difference as shared-nothing, tiered storage, and shared storage. In shared storage, object storage is not only a cold tier for older data. It is the durable storage foundation, with WAL storage used to handle write efficiency and recovery. That is a deeper design change than adding remote retention to an otherwise local-log architecture.
When Tiered Storage Is Enough
MSK tiered storage is often enough when the cluster is stable and the main problem is retention. A team that already runs MSK well, has predictable traffic, uses delete-retention topics, and wants longer replay windows can get meaningful value from tiered storage without replatforming Kafka. The feature keeps Kafka APIs intact, reduces pressure to size brokers for all retained bytes, and gives delayed consumers a larger safety buffer.
It is also a reasonable choice when operational simplicity means staying inside MSK. If your team values AWS-managed Kafka operations, does not need aggressive scale-in, and can live within the documented tiered storage constraints, the right move may be to tune local retention, segment rolling, monitoring, and replay expectations rather than adopt a new platform.
The checklist should be practical:
- Are the topics using `cleanup.policy=delete` rather than compaction?
- Is the workload compatible with the documented MSK version, broker, and mode requirements?
- Is old-data replay occasional enough that remote-read latency is acceptable?
- Can the team monitor remote storage lag, local storage pressure, cold-read latency, and consumer behavior?
- Is the main cost driver retained bytes rather than broker compute, hot traffic, or slow elasticity?
If those answers are mostly yes, tiered storage is not a half measure. It is a focused solution to a focused problem.
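For a topic that passes the checklist, the tuning mostly comes down to a handful of topic-level settings. The sketch below assembles them as a plain dictionary using the Kafka topic configuration names `remote.storage.enable`, `retention.ms`, `local.retention.ms`, and `segment.bytes`. The helper name and the example values are hypothetical, and how the settings are applied (CLI, admin client, or infrastructure tooling) is left to the reader. Check the current MSK documentation for minimum retention values before using real numbers.

```python
def tiered_topic_config(total_retention_ms: int, local_retention_ms: int,
                        segment_bytes: int) -> dict[str, str]:
    """Build topic-level settings for a delete-retention topic with tiering."""
    if local_retention_ms > total_retention_ms:
        raise ValueError("local retention cannot exceed total retention")
    if segment_bytes <= 0:
        raise ValueError("segment size must be positive")
    # Kafka topic configs are passed as strings.
    return {
        "cleanup.policy": "delete",
        "remote.storage.enable": "true",
        "retention.ms": str(total_retention_ms),
        "local.retention.ms": str(local_retention_ms),
        "segment.bytes": str(segment_bytes),
    }


DAY_MS = 24 * 60 * 60 * 1000

# Illustrative values: 30 days total, 1 day on primary storage, 512 MiB segments.
config = tiered_topic_config(
    total_retention_ms=30 * DAY_MS,
    local_retention_ms=1 * DAY_MS,
    segment_bytes=512 * 1024 * 1024,
)
for key, value in config.items():
    print(f"{key}={value}")
```

Keeping the local window explicit, rather than letting it default to the total retention, is what makes primary storage a deliberate sizing decision instead of a side effect.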
When You Need Diskless Kafka Instead
Diskless Kafka becomes the more interesting option when the problem is not retention alone. Teams usually reach this point after they have already tuned retention and broker sizing, yet still struggle with slow scaling, hot partitions, underutilized capacity, failure replacement, or the operational cost of moving data during reassignment.
A diskless evaluation is worth doing when these signals appear:
- Scale-in matters as much as scale-out. Cloud cost control often requires removing capacity after a peak, not only adding capacity before one.
- Partition movement is a recurring operational event. If reassignment windows shape maintenance plans, the storage model is already affecting operations.
- Broker replacement should not be a storage recovery project. Stateless compute is easier to schedule, replace, and automate than nodes that own persistent log data.
- Retention and hot traffic both create pressure. Tiered storage helps cold data, but hot working-set pressure still needs broker resources.
- The platform team wants Kafka compatibility without inheriting all local-disk Kafka constraints. This is the architectural opening for AutoMQ and similar shared-storage designs.
The cleanest decision rule is this: choose tiered storage when old data is the problem; evaluate diskless Kafka when broker-local ownership is the problem. Those problems can overlap, but they are not identical.
Sources
- Amazon MSK tiered storage for Standard brokers
- How log segments are copied to tiered storage for an Amazon MSK topic
- Amazon MSK pricing
- Apache Kafka tiered storage documentation
- KIP-405: Kafka Tiered Storage
- AutoMQ architecture overview
- AutoMQ S3Stream overview
- AutoMQ stateless broker
FAQ
Does Amazon MSK tiered storage store Kafka data in S3?
Amazon MSK documentation describes tiered storage as a lower-cost remote tier for older Kafka data. Data is first stored in primary broker storage and is moved once topic retention and segment rules make it eligible. The feature is compatible with the APIs introduced by KIP-405, but the active local tier remains part of the architecture.
Does tiered storage make MSK brokers stateless?
No. Tiered storage moves eligible closed segments to remote storage, but brokers still own active partition work, local hot data, leadership, replication behavior, and client traffic. It reduces retention pressure; it does not turn brokers into stateless compute nodes.
Is MSK tiered storage good for reducing Kafka cost?
It can help when retained historical data is a major cost driver, especially if old data is read infrequently. It is less likely to solve costs driven by broker compute, hot traffic peaks, over-provisioned capacity, or frequent scaling events. Always model costs with current AWS pricing for your region and workload.
What are important MSK tiered storage limitations to check?
As of the AWS documentation checked on May 20, 2026, MSK tiered storage applies only to provisioned mode clusters, is documented for Standard brokers, does not support compacted topics, does not support multiple log directories, does not support t3.small, and has operational constraints around disabling tiered storage. Recheck the AWS docs before relying on this list, because service scope can change.
How is diskless Kafka different from tiered storage Kafka?
Tiered storage Kafka keeps a local broker tier for active data and moves older closed segments to remote storage. Diskless or shared-storage Kafka designs make shared storage the durable log foundation, so brokers do not own persistent data in the same way. AutoMQ uses S3Stream and stateless brokers to follow that shared-storage model while keeping Kafka compatibility.
When should a team choose tiered storage instead of AutoMQ?
Tiered storage may be the right answer when the team is already happy with MSK operations and mainly needs longer retention at lower storage pressure. AutoMQ becomes more relevant when the problem is broader: elastic scaling, fast broker replacement, scale-in, storage-compute separation, or reducing operational dependence on broker-local disks.