MSK Tiered Storage Pricing and Tradeoffs

Kafka retention has a way of turning from a product requirement into a storage problem. A team starts by keeping a few days of data for replay, incident recovery, or downstream backfills. Then analytics teams ask for a longer window, fraud models need reprocessing, and compliance wants more history available. On Amazon MSK, that requirement lands first on broker-attached storage and then on the monthly bill.

Amazon MSK tiered storage is designed for exactly this pain. AWS describes it as a low-cost storage tier for MSK Standard brokers that moves older log data out of primary broker storage after topic retention conditions are met. That can make long retention practical without adding brokers or provisioning large local disks for every extra day of history. But tiered storage changes storage line items; it does not turn Kafka into a fully elastic shared-storage system.

MSK Tiered Storage Cost Flow

Why Kafka Retention Gets Expensive on MSK

Apache Kafka stores records in append-only log segments on broker storage. KIP-405, the Apache Kafka proposal for tiered storage, states the cost problem directly: cluster storage is driven by topics, partitions, message rate, and especially retention period. When the retention window grows, broker-local data grows with it.

That is operationally awkward because Kafka brokers are not pure compute nodes. A broker owns partition replicas, serves reads and writes, and stores log segments. If storage is the bottleneck, traditional Kafka-style scaling often means adding brokers even when the cluster does not need proportional CPU or memory.

MSK Standard broker pricing reflects that shape. AWS charges for active broker instance usage, provisioned primary storage, optional provisioned storage throughput, and, when tiered storage is enabled, low-cost tier storage plus retrieval from that tier. AWS also notes that standard data transfer charges apply for data transferred in and out of MSK clusters. The exact numbers vary by region and can change, so use the official AWS MSK pricing page for live rates.

For a high-retention workload, the practical cost model usually separates into these surfaces:

| Pricing surface | What changes | What still needs scrutiny |
| --- | --- | --- |
| Broker hours | Usually unchanged directly | Broker count, partitions, throughput, and hot reads |
| Primary storage | Can shrink with a shorter hot window | Active segments, bursts, and headroom |
| Low-cost tier storage | New surface for older data | Retained volume, topic coverage, deletion lifecycle |
| Low-cost tier retrieval | New surface for old reads | Backfills, replay jobs, and recovery |
| Data transfer | Not eliminated by tiering | VPC, AZ, region, PrivateLink, and egress paths |

Tiered storage is therefore not a single "turn it on and the bill drops" setting. It is a retention architecture choice with a pricing model attached.
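
One way to keep the pricing surfaces honest is to model them as separate line items. The sketch below is a minimal illustration, not an AWS calculator: the `Rates` and `Usage` classes and the `monthly_cost` function are hypothetical names, every rate is a placeholder you would fill in from the live AWS MSK pricing page, and optional provisioned storage throughput is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Placeholder unit prices; take real values from the live AWS MSK pricing page."""
    broker_hour: float        # per broker-hour
    primary_gb_month: float   # per GB-month of provisioned primary storage
    tiered_gb_month: float    # per GB-month stored in the low-cost tier
    retrieval_gb: float       # per GB read back from the low-cost tier
    transfer_gb: float        # per GB of billable data transfer


@dataclass
class Usage:
    """Monthly usage assumptions for one cluster."""
    brokers: int
    hours: float
    primary_gb: float
    tiered_gb: float
    retrieved_gb: float
    transferred_gb: float


def monthly_cost(rates: Rates, usage: Usage) -> dict[str, float]:
    """Return one entry per pricing surface, plus a total."""
    lines = {
        "broker_hours": usage.brokers * usage.hours * rates.broker_hour,
        "primary_storage": usage.primary_gb * rates.primary_gb_month,
        "tiered_storage": usage.tiered_gb * rates.tiered_gb_month,
        "tiered_retrieval": usage.retrieved_gb * rates.retrieval_gb,
        "data_transfer": usage.transferred_gb * rates.transfer_gb,
    }
    lines["total"] = sum(lines.values())
    return lines
```

Running the same function twice, once with the current all-primary layout and once with a shorter hot window plus tiered volume and retrieval, shows which line items move and which stay flat.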

How MSK Tiered Storage Works at a High Level

MSK tiered storage keeps recent data in a performance-optimized primary storage tier and moves older data to a low-cost tier. AWS says applications can connect to a tiered-storage-enabled cluster through the same client process they use for a cluster without tiered storage. The feature is compatible with APIs introduced in KIP-405, Kafka's remote log storage architecture.

The key idea from KIP-405 is a two-tier log. Kafka keeps local storage for latency-sensitive tail reads and ships completed log segments to a remote tier such as object storage. Local retention can be shorter, while remote retention can be much longer. When a consumer reads older offsets, Kafka serves that data from the remote tier through the broker path.
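
The two-tier split can be sketched with the topic-level settings that KIP-405 introduced in Apache Kafka: `remote.storage.enable`, `local.retention.ms`, and `retention.ms`. The helper names below are hypothetical, and the tier lookup is a deliberate simplification: real retention works on whole log segments, and the active segment always stays local. Check the MSK documentation for the values your cluster version accepts.

```python
DAY_MS = 24 * 60 * 60 * 1000


def tiered_topic_config(local_days: int, total_days: int) -> dict[str, str]:
    """Build topic-level settings for a two-tier log (KIP-405 config names)."""
    if not 0 < local_days <= total_days:
        raise ValueError("local retention must be positive and no longer than total retention")
    return {
        "remote.storage.enable": "true",
        "local.retention.ms": str(local_days * DAY_MS),  # hot window on broker storage
        "retention.ms": str(total_days * DAY_MS),        # full window across both tiers
    }


def serving_tier(record_age_ms: int, config: dict[str, str]) -> str:
    """Simplified view of which tier serves a record of a given age.

    Ignores segment granularity, so treat the boundaries as approximate.
    """
    if record_age_ms > int(config["retention.ms"]):
        return "expired"
    if record_age_ms > int(config["local.retention.ms"]):
        return "remote"
    return "local"


config = tiered_topic_config(local_days=2, total_days=30)
for age_days in (1, 10, 31):
    print(age_days, "days old ->", serving_tier(age_days * DAY_MS, config))
```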

That split is useful because most Kafka reads are near the tail, but not all of them. Incident recovery, backfill, and late consumer recovery often need older offsets. Tiered storage keeps those offsets in Kafka longer without forcing every broker to hold the full retention window locally.

AWS calls out an important latency characteristic: the first few bytes read from tiered storage can see increased latency; sequential reads after that can be similar to primary storage. Tiered storage is attractive for long retention and sequential recovery. It is less attractive for sparse historical reads where every lookup is latency-sensitive.
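
A toy model makes the sequential-versus-sparse difference concrete. It assumes each separate fetch from the remote tier pays one fixed first-byte penalty and then streams at a steady rate. Both parameters and the function name are hypothetical and would need to be measured on a real cluster; AWS does not publish them as fixed numbers.

```python
def historical_read_seconds(
    total_bytes: float,
    fetch_count: int,
    first_byte_penalty_s: float,
    throughput_bytes_per_s: float,
) -> float:
    """Estimated time: one first-byte penalty per separate fetch, plus streaming time."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return fetch_count * first_byte_penalty_s + total_bytes / throughput_bytes_per_s


# Hypothetical numbers for illustration only.
replay = historical_read_seconds(
    100e9, fetch_count=1, first_byte_penalty_s=0.2, throughput_bytes_per_s=100e6
)
lookups = historical_read_seconds(
    100e6, fetch_count=10_000, first_byte_penalty_s=0.2, throughput_bytes_per_s=100e6
)
print(f"sequential replay: {replay:.0f}s, sparse lookups: {lookups:.0f}s")
```

In this model the replay time is dominated by streaming, while the sparse workload is dominated by the per-fetch penalty even though it moves far less data.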

What to Verify in MSK Tiered Storage Pricing

The right pricing review starts with the live AWS pages, not a copied number in a deck. For MSK tiered storage, calculate the hot local window and cold remote window independently.

Local Storage

Local storage does not disappear. It becomes the hot tier. AWS's MSK documentation says primary storage keeps streaming data until topic retention limits move data into the low-cost tier. The local tier must still absorb current writes, segment rolling, recovery behavior, and active reads.

That creates two sizing questions. How much data must remain local to protect tail latency and common recovery paths? How much headroom does the cluster need during traffic bursts or consumer catch-up? A too-aggressive local window can move too many operational reads into the remote tier.
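
The two sizing questions can be turned into a rough per-broker estimate. This sketch assumes evenly balanced partitions, a steady compressed ingest rate, and decimal units (1 GB = 1000 MB). The function name and the default headroom fraction are illustrative assumptions, not AWS guidance.

```python
def primary_storage_per_broker_gb(
    ingest_mb_per_s: float,
    local_retention_hours: float,
    replication_factor: int,
    broker_count: int,
    headroom_fraction: float = 0.3,
) -> float:
    """Rough primary storage per broker for the hot window, including headroom."""
    if broker_count <= 0 or replication_factor <= 0:
        raise ValueError("broker_count and replication_factor must be positive")
    if headroom_fraction < 0:
        raise ValueError("headroom_fraction must not be negative")
    # Data written during the local retention window, before replication.
    hot_gb = ingest_mb_per_s * 3600 * local_retention_hours / 1000
    # Every replica keeps its own local copy of the hot window.
    replicated_gb = hot_gb * replication_factor
    per_broker_gb = replicated_gb / broker_count
    # Extra room for bursts, segment rolling, and consumer catch-up.
    return per_broker_gb * (1 + headroom_fraction)


print(primary_storage_per_broker_gb(50, local_retention_hours=24,
                                    replication_factor=3, broker_count=6))
```

Shrinking `local_retention_hours` lowers the estimate linearly, which is the saving tiering offers; the headroom term is the part that should not be tuned away without testing bursts.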

EBS pricing is a useful reference point because Amazon EBS charges for provisioned capacity, and some volume types add charges for provisioned IOPS or throughput above the baseline. MSK abstracts broker management, but primary storage is still provisioned capacity for the broker tier, paid for whether or not it is full, while tiered storage is billed on usage in the MSK pricing model.

Remote Storage

The remote tier is where long retention becomes more economical. AWS's MSK pricing page lists low-cost tier storage as a per-GB-month dimension for Standard brokers with tiered storage. It also lists a per-GB retrieval dimension. So the remote-tier question is not only "how much history do we keep?" but also "how often do we read that history?"

For a topic written continuously, retained bytes grow roughly with ingest rate multiplied by retention time, adjusted for compression and deletion. A high-level model can stay simple:

```plaintext
remote retained data ~= compressed ingest rate x remote retention window
remote retrieval data ~= historical read volume x replay frequency
```

Model this per topic group. A security audit topic with long retention and rare reads behaves differently from a clickstream topic replayed every week. Both may benefit from tiering, but their retrieval risk is different.
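
The per-topic-group model can be written out in a few lines. This is a steady-state sketch of the two approximations above. The `TopicGroup` class is a hypothetical structure, and the audit and clickstream numbers are invented to show the contrast, not measurements.

```python
from dataclasses import dataclass


@dataclass
class TopicGroup:
    name: str
    compressed_ingest_gb_per_day: float
    remote_retention_days: float
    replay_gb: float           # historical data read per replay
    replays_per_month: float


def remote_retained_gb(group: TopicGroup) -> float:
    """Steady-state volume in the remote tier: ingest rate x retention window."""
    return group.compressed_ingest_gb_per_day * group.remote_retention_days


def monthly_retrieval_gb(group: TopicGroup) -> float:
    """Volume read back from the remote tier: read volume x replay frequency."""
    return group.replay_gb * group.replays_per_month


# Invented workloads for illustration only.
groups = [
    TopicGroup("security-audit", 50, 365, replay_gb=100, replays_per_month=0.5),
    TopicGroup("clickstream", 200, 30, replay_gb=1000, replays_per_month=4),
]
for g in groups:
    print(g.name, remote_retained_gb(g), monthly_retrieval_gb(g))
```

With these inputs the audit group stores more but retrieves little, while the clickstream group stores less and retrieves far more, so the two land on different pricing dimensions.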

S3 pricing is a useful comparison point, but MSK tiered storage is priced as an MSK feature, not as a bucket you manage directly. AWS S3 pricing separates storage, requests, retrieval for certain storage classes, and data transfer, so read patterns still matter.

Reads and Network Costs

The most common mistake is modeling tiered storage as a pure storage discount and ignoring reads. Backfill-heavy teams should estimate how often consumers read offsets outside the hot tier. Historical reads may be rare during normal operation and intense during incidents, so test both conditions.

Network charges also deserve a separate line. AWS's MSK pricing page says standard AWS data transfer charges apply for data transferred in and out of MSK clusters. AWS EC2 pricing documentation is the authoritative place to verify regional data transfer rules. If consumers sit in different VPCs, accounts, or regions, tiered storage does not make that traffic free.

The pricing checklist should include these questions:

  • Which topics actually need long Kafka retention, and which can keep shorter retention with a data lake copy?
  • Does the local retention window cover normal consumer recovery?
  • How often do backfills read old offsets, and are those reads sequential or sparse?
  • Are any topics compacted, transactional, or affected by documented MSK limits?
  • Which clients or connectors read across VPC, AZ, account, or region boundaries?

Feature Limits and Operational Tradeoffs

MSK tiered storage has documented constraints. AWS states that tiered storage applies only to provisioned mode clusters. It requires Apache Kafka version 3.6.0 or higher, or 2.8.2.tiered. It is not available for t3.small broker size or in AWS GovCloud (US) Regions. AWS also documents limitations around JBOD, compacted topics, and cleanup policy changes.

Those constraints matter because Kafka estates are rarely uniform. A platform team may have delete-retention topics, compacted metadata topics, transactional consumers, and connector topics. Tiered storage may fit a subset well while being inappropriate for others.
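
The documented constraints lend themselves to a pre-flight check. The sketch below encodes only the limits named above. `ClusterInfo` and the function names are hypothetical, the version parsing is a simplification, and compacted topics are excluded conservatively so they get a manual review against the current AWS documentation.

```python
from dataclasses import dataclass


@dataclass
class ClusterInfo:
    kafka_version: str   # e.g. "3.6.0" or "2.8.2.tiered"
    broker_type: str     # e.g. "kafka.m5.large"
    provisioned: bool    # True for provisioned mode clusters
    region: str          # e.g. "us-east-1"


def version_supports_tiering(kafka_version: str) -> bool:
    """True for 2.8.2.tiered, or for versions at or above 3.6."""
    if kafka_version == "2.8.2.tiered":
        return True
    parts = kafka_version.split(".")
    try:
        major, minor = int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        return False  # unknown format: treat as unsupported and check manually
    return (major, minor) >= (3, 6)


def cluster_blockers(cluster: ClusterInfo) -> list[str]:
    """List the documented reasons a cluster cannot use tiered storage."""
    blockers = []
    if not cluster.provisioned:
        blockers.append("cluster is not in provisioned mode")
    if not version_supports_tiering(cluster.kafka_version):
        blockers.append("Kafka version is below 3.6.0 and is not 2.8.2.tiered")
    if cluster.broker_type.endswith("t3.small"):
        blockers.append("t3.small broker size is not supported")
    if cluster.region.startswith("us-gov-"):
        blockers.append("AWS GovCloud (US) Regions are not supported")
    return blockers


def eligible_topics(cleanup_policies: dict[str, str]) -> list[str]:
    """Keep topics whose cleanup.policy is exactly 'delete'; review the rest by hand."""
    return sorted(
        name for name, policy in cleanup_policies.items() if policy.strip() == "delete"
    )
```

A mixed estate would pass its topic-to-policy map through `eligible_topics` and treat everything filtered out as a candidate for shorter retention or a separate design.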

The operational model also changes observability. AWS notes that kafka-log-dirs cannot report tiered storage data size and only reports primary storage log segments. Teams need metrics for remote read throughput, remote errors, and local storage pressure.

There is also a rollback consideration. AWS documents cases where tiered storage can be disabled for a topic but not re-enabled for that same topic. Treat the change as an architecture migration for selected topics, with a test plan and explicit acceptance criteria.

Tiered Storage vs Shared Storage Kafka

Tiered storage is often described as "Kafka on object storage," but that phrase hides an important distinction. In tiered storage, Kafka still writes first to broker-local primary storage. Older completed segments are copied to the remote tier. The broker remains a storage owner for the hot part of the log.

Shared storage Kafka starts from a different premise. Durable data is placed in shared storage as the primary repository, and brokers move closer to stateless compute. The broker still serves the Kafka protocol, but it no longer owns the full durable dataset on local disks.

The architectural difference shows up in failure and scaling paths. With tiered storage, reducing local data can reduce recovery and rebalancing time because less data must be copied between brokers. KIP-405 makes that benefit explicit. But the local tier is still part of the storage architecture. With shared storage, broker replacement and partition movement can avoid large broker-to-broker data migration.

Neither pattern is universally better. Tiered storage is a conservative extension of Kafka's existing log model. It fits when the main problem is long retention and most hot-path traffic stays near the tail. Shared storage is the category to evaluate when the recurring pain also includes slow scaling, broker recovery, replica movement, over-provisioned compute, and cross-zone replication economics.

| Question | MSK tiered storage | Shared storage Kafka |
| --- | --- | --- |
| Primary write path | Broker storage first, then remote tier | Shared storage is primary durable design |
| Best cost target | Long retention | Storage, elasticity, recovery, and data movement |
| Read pattern fit | Tail local; older reads remote | Implementation-dependent |
| Scaling implication | Less local data reduces movement | Brokers can be closer to stateless compute |
| Operational change | Extension for eligible MSK topics | Architecture evaluation and migration |

If the pain is "we need 30 or 90 days of Kafka retention," MSK tiered storage may be the most direct AWS-native move. If the pain is "Kafka capacity is permanently overbuilt because scaling and recovery are data-movement events," tiered storage may only reduce part of the problem.

Where AutoMQ Fits

The architecture-level alternative is object-storage-backed shared storage Kafka. AutoMQ is one implementation of that category: a Kafka-compatible cloud-native streaming platform that replaces Kafka's broker-local log storage with S3Stream, using object storage as the primary data repository and a WAL layer for write persistence.

AutoMQ is not a drop-in answer for every MSK cluster. MSK is a managed AWS service with clear operational advantages, and tiered storage is useful for the right retention-heavy workloads. AutoMQ becomes relevant when the cost conversation moves beyond retention into Kafka's shape: brokers sized for peak, local replicas, rebalancing windows that discourage scale-in, and recovery paths that copy data across compute nodes.

In that scenario, the question changes from "Can we store cold data elsewhere?" to "Why should brokers own durable data at all?" Object-storage-backed shared storage answers by making durable storage a cloud service responsibility and letting Kafka-compatible brokers focus on traffic. That is a deeper change than MSK tiered storage, so evaluate it with a workload model, migration plan, and compatibility tests.

Retention Cost Questions

A practical evaluation sequence is straightforward: enable tiered storage for eligible high-retention topics; measure storage reduction, retrieval, latency, and network cost; then decide whether the remaining cost comes from storage retention or from the broker-local architecture. If the remaining cost is still dominated by always-on brokers, slow scaling, and replica movement, a shared-storage Kafka option such as AutoMQ deserves a side-by-side model.

MSK tiered storage can cut Kafka cost when retention is the problem. It is not enough when the expensive part is Kafka's broker-local ownership of durable state. Separating those two problems leads to better pricing decisions.

For the architecture track, review AutoMQ's shared storage documentation or contact the AutoMQ team through the AutoMQ contact page.

FAQ

Does MSK tiered storage reduce Kafka cost?

It can reduce cost when the main driver is long retention on broker primary storage. It moves older data into a low-cost tier so retention can grow without expanding local broker storage in the same way. It does not automatically reduce broker hours, retrieval charges, or data transfer.

How is MSK tiered storage priced?

AWS lists tiered storage pricing for Standard brokers using separate dimensions for low-cost tier storage and retrieval from that tier, alongside broker instance and primary storage charges. Rates vary by region, so use the live AWS MSK pricing page.

Is MSK tiered storage the same as S3-backed Kafka?

No. MSK tiered storage keeps recent data in broker primary storage and moves older completed segments to a remote tier. S3-backed shared storage Kafka uses object storage as part of the primary durable storage design, which changes broker scaling and recovery behavior more deeply.

What workloads benefit most from MSK tiered storage?

Retention-heavy workloads with mostly tail reads and occasional sequential replay are the strongest fit, such as audit streams and event history for recovery.

What should I test before enabling tiered storage?

Test topic eligibility, local retention settings, historical read latency, retrieval volume, observability, and rollback behavior. Also verify Kafka version, broker mode, compacted topics, cleanup policy, and regional availability.

When should I evaluate AutoMQ instead?

Evaluate AutoMQ when the main issue is not only retention cost but the structural cost of broker-local storage: slow scaling, permanent headroom, large recovery movement, and replica-related cloud infrastructure cost. AutoMQ preserves Kafka compatibility while using object-storage-backed shared storage, so it belongs in architecture-level cost comparisons.
