AWS Kafka Pricing Explained: MSK, Storage, Traffic, and Hidden Cost Drivers

The first surprise in an AWS Kafka bill is that Kafka is rarely billed as one thing. A team searches for AWS Kafka pricing, lands on the Amazon MSK pricing page, and sees broker hours, storage, Serverless throughput, MSK Connect, Replicator, tiered storage, and private connectivity. Then the AWS bill arrives with lines from EC2, EBS, S3, VPC, NAT Gateway, PrivateLink, and Data Transfer. None of those lines is wrong. Kafka's architecture spreads cost across compute, disk, replication, networking, and operations.

That is why a useful pricing model starts with the workload, not the service name. Kafka cost follows four questions: how much data is written, how long it is retained, how many copies exist, and where every byte crosses a network boundary. Amazon MSK removes much of the operational burden of running Kafka, but replication factor, multi-AZ placement, long retention, remote consumers, and connector fleets can change the bill more than a modest broker-rate difference.

[Figure: AWS Kafka cost stack]

What People Mean by AWS Kafka Pricing

"Kafka on AWS" can mean several deployment models. Amazon MSK Provisioned runs managed Apache Kafka clusters with broker and storage pricing. MSK Serverless changes the model toward cluster capacity and data throughput. MSK Connect and MSK Replicator add managed integration and replication charges. Self-managed Kafka on EC2 exposes EC2, EBS, S3, VPC, and data transfer pricing directly. BYOC and shared-storage Kafka platforms introduce another model: the control plane may be managed, while compute and storage still live in the customer's cloud account.

The pricing comparison gets distorted when teams compare one line item against another. Broker hours are easy to see, so they become the anchor. But for production Kafka, broker hours are usually the beginning of the model:

  • Broker compute sets the fixed cost floor.
  • Storage and retention turn ingress into a durable footprint multiplied by replicas, retention windows, compaction, and remote tiers.
  • Data transfer appears when clients, brokers, connectors, or replicated clusters cross AZ, VPC, Region, or internet boundaries.
  • Managed services around Kafka, such as MSK Connect, Replicator, NAT Gateway, and PrivateLink, can be correct architectural choices while still becoming meaningful cost lines.
  • Operations cost shows up as engineering time, rebalancing windows, partition movement, and incident risk. It may not be on the AWS invoice, but it belongs in the TCO model.

AWS pricing pages change by Region, instance family, and service option. For that reason, this article avoids hard-coding price figures. Use the current AWS pricing pages and AWS Pricing Calculator for final numbers.

The Main Cost Components of Amazon MSK

Amazon MSK pricing is not a single meter. The official MSK pricing page separates Provisioned clusters, Serverless, MSK Connect, MSK Replicator, broker storage, tiered storage, and optional private connectivity. The important FinOps move is to map each AWS meter to a Kafka behavior.

| Cost component | Where it comes from | What changes it |
|---|---|---|
| Broker compute | MSK broker hours, or EC2 for self-managed Kafka | Broker count, instance type, headroom, partition count |
| Primary storage | MSK broker storage, or EBS volumes | Retention, replication factor, compaction, headroom |
| Storage throughput | MSK provisioned throughput, or EBS throughput/IOPS | Write rate, read amplification, catch-up reads |
| Tiered or object storage | MSK tiered storage, S3 for self-managed/tiered/shared storage designs | Long retention, historical reads, retrieval patterns |
| Data transfer | AWS standard data transfer, PrivateLink, NAT Gateway, cross-Region movement | Client placement, AZ locality, replication, consumers, connectors |
| Managed add-ons | MSK Connect, MSK Replicator, private connectivity | Worker scale, data processed, replication topology |

Broker Compute

For MSK Provisioned, broker compute is billed by broker instance usage, with rates depending on broker type, size, and Region. AWS distinguishes Standard brokers and Express brokers on the MSK pricing page. Express broker pricing adds meters for storage used and for data written to the brokers, while Standard brokers keep the broker-hour plus provisioned-storage model. For self-managed Kafka, the equivalent line is EC2 usage plus the software and operations you run around it.

Broker compute is not driven by average ingress alone. Kafka brokers need CPU, memory, disk throughput, and network throughput for producers, consumers, replication, compaction, and recovery. A cluster sized for average write traffic can still be underpowered if consumer fan-out is high or partition leadership is uneven.
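To make that concrete, here is a minimal sketch of per-broker network load. It assumes followers and consumers both fetch from partition leaders and that follower replicas are spread evenly across brokers. The function name, the `skew` parameter, and the six-broker example are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BrokerLoad:
    inbound_mib_s: float   # producer writes plus replica copies received
    outbound_mib_s: float  # replica copies served plus consumer reads


def busiest_broker_load(ingress_mib_s: float, replication_factor: int,
                        consumer_fanout: float, broker_count: int,
                        skew: float = 1.0) -> BrokerLoad:
    """Estimate network load on the broker holding the most leaders.

    skew is a hypothetical knob: 1.0 means leadership is perfectly even,
    1.5 means the busiest broker leads 50% more traffic than average.
    """
    if broker_count < 1 or replication_factor < 1:
        raise ValueError("broker_count and replication_factor must be >= 1")
    leader_share = ingress_mib_s / broker_count * skew
    # Follower copies arrive regardless of how leadership is distributed.
    follower_in = ingress_mib_s / broker_count * (replication_factor - 1)
    inbound = leader_share + follower_in
    # A leader serves every follower and every consumer group.
    outbound = leader_share * ((replication_factor - 1) + consumer_fanout)
    return BrokerLoad(inbound, outbound)


even = busiest_broker_load(100, 3, 2, 6)
uneven = busiest_broker_load(100, 3, 2, 6, skew=1.5)
print(f"even:   in={even.inbound_mib_s:.1f} out={even.outbound_mib_s:.1f} MiB/s")
print(f"uneven: in={uneven.inbound_mib_s:.1f} out={uneven.outbound_mib_s:.1f} MiB/s")
```

In this sketch, the outbound side of the busiest broker grows with both fan-out and leadership skew even though average ingress per broker is unchanged.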

Storage and Retention

MSK Standard broker storage is billed by provisioned GB-months. Express broker storage is billed based on storage used, according to AWS's pricing description. Standard brokers can also carry a separate charge for provisioned storage throughput. Self-managed Kafka adds EBS volume, IOPS, throughput, and snapshot pricing. Tiered or shared storage adds S3 storage, request, retrieval, and data transfer considerations.

The architectural reason storage cost grows quickly is Kafka's local log design. With replication factor 3, the same record is stored on three brokers before overhead, compaction effects, and free-space headroom. This is how Kafka provides availability in a shared-nothing design. It also means retention policy is a financial control, not housekeeping.

Data Transfer

AWS's MSK pricing page says customers are not charged by MSK for data transfer used for replication between brokers or between metadata nodes and brokers, while standard AWS data transfer charges still apply for data transferred in and out of MSK clusters. Internal broker replication is not the same thing as client traffic, cross-VPC access, PrivateLink, NAT Gateway paths, internet egress, or cross-Region replication.

The hidden pattern is locality. Producers in one AZ writing to leaders in another AZ, consumers reading across AZs, connectors outside the cluster VPC, or analytics jobs in a different Region can all move large volumes through AWS networking meters.

Connect and Replication Services

MSK Connect and MSK Replicator reduce the work of operating connector fleets and replication pipelines. They still deserve rows in the pricing model. Connector cost depends on worker capacity, task parallelism, and data volume. Replication cost depends on placement, throughput, topic selection, and cross-Region or cross-account networking.

Why Kafka Architecture Changes the Bill

Kafka was designed around brokers that own local partitions. That design is elegant in a data center: add brokers, spread partitions, keep replicas on separate machines, and recover by copying data to healthy nodes. In the cloud, the same design runs on metered compute, block storage, and network boundaries. A natural data-center choice becomes a pricing multiplier when every retained byte and transferred byte has a meter attached.

[Figure: Replication factor cost amplification]

The multiplier starts with replication factor. A topic with replication factor 3 turns 1 TiB of logical retained data into roughly 3 TiB of broker log data before overhead and safety headroom. If the cluster spans three AZs, replicas are usually placed across zones for availability. That is good engineering, but the pricing model must track where clients are located and how traffic moves around the Region.

Partition movement adds another layer. When a Kafka cluster scales out, rebalances, replaces a broker, or changes partition assignment, data moves between brokers. During that window, the cluster consumes compute, disk throughput, and network capacity for background work that does not represent new business data.
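A rough way to size that background work is sketched below. It assumes data is spread evenly before and after a scale-out and that the reassignment moves only what the new brokers need, so the moved fraction is (added brokers) / (brokers after). The `aggregate_mib_s` parameter is a hypothetical cluster-wide copy rate, for example whatever a replication throttle allows; it is not a Kafka setting name.

```python
from dataclasses import dataclass

MIB_PER_TIB = 1024 * 1024


@dataclass(frozen=True)
class RebalanceEstimate:
    moved_tib: float  # replicated data copied to the new brokers
    hours: float      # wall-clock time at the assumed aggregate copy rate


def scale_out_rebalance(replicated_tib: float, brokers_before: int,
                        brokers_after: int,
                        aggregate_mib_s: float) -> RebalanceEstimate:
    """Estimate data moved and time taken when brokers are added."""
    if not 0 < brokers_before < brokers_after:
        raise ValueError("expected 0 < brokers_before < brokers_after")
    if aggregate_mib_s <= 0:
        raise ValueError("aggregate_mib_s must be positive")
    moved_fraction = (brokers_after - brokers_before) / brokers_after
    moved_tib = replicated_tib * moved_fraction
    hours = moved_tib * MIB_PER_TIB / aggregate_mib_s / 3600
    return RebalanceEstimate(moved_tib, hours)


# Hypothetical example: 8 TiB of replicated log, growing from 6 to 8 brokers.
estimate = scale_out_rebalance(8.0, 6, 8, aggregate_mib_s=100.0)
print(f"{estimate.moved_tib:.1f} TiB moved in about {estimate.hours:.1f} hours")
```

Real reassignments compete with live traffic, so treat the result as a lower bound on the window rather than a forecast.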

Here is an illustrative workload model. It is not a price quote and should not be used without current pricing pages.

| Assumption | Illustrative value |
|---|---|
| Region | US East (N. Virginia) |
| Deployment | Amazon MSK Provisioned, Standard brokers |
| Broker type | kafka.m7g.large class used only as an example; verify current availability and pricing |
| AZ count | 3 |
| Replication factor | 3 |
| Logical ingress | 100 MiB/s sustained |
| Consumer egress | 2x read fan-out, colocated when possible |
| Retention | 72 hours primary retention |
| Tiered storage | Off in base case; evaluate separately for longer retention |

At 100 MiB/s, logical ingress is about 8.24 TiB/day in binary units (100 MiB/s × 86,400 s ÷ 1,048,576 MiB per TiB). With 72 hours of retention, the logical retained footprint is about 24.7 TiB before Kafka overhead. With replication factor 3, broker log storage is about 74.2 TiB before headroom. A cautious plan then adds margin for segment rolling, compaction, bursts, and recovery.
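The footprint arithmetic for this workload can be reproduced with a short script. This is a sketch in binary units throughout; the function name and the `headroom` parameter are illustrative, and the 30% headroom in the last call is a hypothetical margin, not a recommendation.

```python
MIB_PER_TIB = 1024 * 1024
SECONDS_PER_DAY = 86_400


def retained_footprint_tib(ingress_mib_s: float, retention_hours: float,
                           replication_factor: int,
                           headroom: float = 0.0) -> dict:
    """Return the daily, logical, replicated, and provisioned footprint in TiB."""
    daily = ingress_mib_s * SECONDS_PER_DAY / MIB_PER_TIB
    logical = daily * retention_hours / 24
    replicated = logical * replication_factor
    provisioned = replicated * (1 + headroom)
    return {
        "daily_ingress": daily,
        "logical_retained": logical,
        "replicated": replicated,
        "provisioned": provisioned,
    }


base = retained_footprint_tib(100, 72, 3)
for name, tib in base.items():
    print(f"{name:>17}: {tib:6.2f} TiB")
# daily_ingress is about 8.24, logical_retained about 24.72,
# replicated about 74.16 (provisioned equals replicated with no headroom)

with_margin = retained_footprint_tib(100, 72, 3, headroom=0.30)
print(f"provisioned with 30% headroom: {with_margin['provisioned']:.2f} TiB")
```

Changing `retention_hours` scales every downstream number linearly, which makes the effect of a retention change easy to see before any price is applied.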

The same workload can produce different networking outcomes. AZ-aware producers and consumers reduce cross-boundary bytes. Centralized analytics VPCs, PrivateLink paths, or remote Regions increase them.
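A simple sketch of that difference follows. It counts bytes that cross an AZ boundary, not charges; which meter applies to them depends on the path and should be checked against current AWS pricing. It assumes that, without locality, a client's chance of talking to a broker in its own AZ is 1 / az_count. The `producer_local` and `consumer_local` overrides are hypothetical parameters for the fraction of traffic kept in-zone, for example through Kafka's rack-aware follower fetching where the platform supports it.

```python
MIB_PER_TIB = 1024 * 1024
SECONDS_PER_DAY = 86_400


def cross_az_tib_per_day(ingress_mib_s: float, consumer_fanout: float,
                         az_count: int, producer_local=None,
                         consumer_local=None) -> dict:
    """Estimate client bytes per day that cross an AZ boundary, in TiB."""
    if az_count < 1:
        raise ValueError("az_count must be at least 1")
    default_local = 1 / az_count  # no locality: any AZ is equally likely
    p_local = default_local if producer_local is None else producer_local
    c_local = default_local if consumer_local is None else consumer_local
    daily_tib = ingress_mib_s * SECONDS_PER_DAY / MIB_PER_TIB
    return {
        "producer": daily_tib * (1 - p_local),
        "consumer": daily_tib * consumer_fanout * (1 - c_local),
    }


unaware = cross_az_tib_per_day(100, 2, 3)
zone_aware_reads = cross_az_tib_per_day(100, 2, 3, consumer_local=1.0)
print(f"no locality:      {unaware}")
print(f"zone-aware reads: {zone_aware_reads}")
```

With the workload above and no locality, consumer reads account for roughly twice the cross-AZ bytes of producer writes, so read placement is usually the first thing worth modeling.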

How Shared Storage Kafka Changes the Cost Model

The deeper cost question is whether brokers should be long-lived storage owners. Traditional Kafka couples compute and local persistent storage: a broker is both a network server and the place where partition replicas live. Tiered storage moves older data to object storage, but the primary log and ISR replication model still shape the cluster. Shared-storage Kafka changes the premise more directly: brokers become closer to stateless compute, while the durable log lives in shared cloud storage.

AutoMQ is an example of this architectural category. It is Kafka-compatible, but its storage layer is built around shared object storage and a WAL layer rather than broker-owned local logs. AutoMQ's public architecture documentation describes S3Stream as the replacement for Kafka's native log storage, with object storage as the primary repository and stateless brokers. The cost implication is that scaling compute no longer has to mean copying large local partition replicas from broker to broker.

[Figure: MSK vs shared storage cost drivers]

This does not make cost disappear. It moves the questions. Instead of asking how much broker storage must be provisioned per broker for peak retention, the model asks how much object storage is retained, what WAL option is used, what request and retrieval patterns look like, and how much compute is needed for active traffic.

The network story also changes. In a shared-storage design, the system can reduce the need for application-level cross-AZ replica copying because durability is delegated to cloud storage primitives. AutoMQ documentation also describes approaches for reducing inter-zone traffic through storage architecture and zone-aware access. The exact result depends on deployment mode, workload shape, WAL choice, and client placement, so model it rather than assume it.

MSK vs Shared Storage Cost Modeling Checklist

Use the same workload assumptions across every option. Changing retention, consumer fan-out, or replication policy while comparing platforms creates a tidy spreadsheet and a bad decision.

| Modeling question | MSK / broker-local Kafka | Shared-storage Kafka such as AutoMQ |
|---|---|---|
| Fixed compute floor | Brokers sized for throughput, failover, and storage ownership | Compute sized mainly for active traffic |
| Retained data | Broker-attached primary storage; optional tiered storage | Object storage as primary repository, plus WAL |
| Byte multiplier | Replication factor, headroom, compaction, local logs | Object storage footprint, WAL footprint, metadata |
| Scaling behavior | Partition reassignment moves local data | Stateless brokers reduce large partition-copy operations |
| Network blind spots | AZ mismatch, VPC boundaries, cross-Region replication | Client placement, object storage requests, WAL paths |
| Operating surface | Kafka capacity, brokers, storage, upgrades, rebalances | Platform control plane plus cloud resources |

Start with the AWS MSK pricing page for broker and MSK-specific meters. Add EC2 and EBS if self-managed Kafka is on the table. Add S3 if tiered storage, object storage, or shared-storage Kafka is in scope. Add EC2 data transfer, VPC, NAT Gateway, and PrivateLink pages for the network paths your architecture actually uses. Then add engineering time, because slow rebalances and emergency scaling events are real costs even when they do not appear as a service SKU.

A Practical Way to Build the Estimate

For each candidate architecture, calculate:

  1. Logical data volume: ingress MiB/s x seconds per day x retention days.
  2. Storage multiplier: replication factor, primary storage policy, tiered storage, object storage, WAL, and free-space headroom.
  3. Compute floor: minimum broker or compute-node count across AZs, then peak headroom.
  4. Read amplification: consumer fan-out, catch-up reads, reprocessing, and connector reads.
  5. Network boundaries: AZ, VPC, Region, internet, PrivateLink, NAT Gateway, and peering paths.
  6. Managed service meters: MSK Connect, MSK Replicator, private connectivity, monitoring, backup, and support.
  7. Operational exposure: partition movement time, recovery behavior, upgrade windows, and on-call load.
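The first few steps can be sketched as a small quantities-then-rates model. Everything here is illustrative: the `Workload` fields, the line-item names, and the 730-hour month are assumptions, and no prices are included. Rates must come from the current AWS pricing pages for your Region, and a rate may legitimately be zero. Network-boundary and managed-service rows are left out for brevity and would be added as further entries.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400
HOURS_PER_MONTH = 730  # assumed monthly convention; adjust if yours differs


@dataclass(frozen=True)
class Workload:
    """One frozen set of assumptions, reused for every candidate architecture."""
    ingress_mib_s: float
    retention_days: float
    replication_factor: int
    consumer_fanout: float
    headroom: float  # free-space margin as a fraction, e.g. 0.25


def monthly_quantities(w: Workload, broker_count: int) -> dict:
    """Turn workload assumptions into billable-looking quantities."""
    logical_gib = w.ingress_mib_s * SECONDS_PER_DAY * w.retention_days / 1024
    ingress_gib = w.ingress_mib_s * 3600 * HOURS_PER_MONTH / 1024
    return {
        "broker_hours": broker_count * HOURS_PER_MONTH,
        "storage_gib_month": logical_gib * w.replication_factor * (1 + w.headroom),
        "ingress_gib": ingress_gib,
        "consumer_read_gib": ingress_gib * w.consumer_fanout,
    }


def price(quantities: dict, rates: dict) -> dict:
    """Multiply each quantity by a caller-supplied rate; refuse to guess."""
    missing = sorted(set(quantities) - set(rates))
    if missing:
        raise KeyError(f"no rate supplied for: {missing}")
    lines = {name: qty * rates[name] for name, qty in quantities.items()}
    lines["total"] = sum(lines.values())
    return lines


workload = Workload(ingress_mib_s=100, retention_days=3,
                    replication_factor=3, consumer_fanout=2, headroom=0.25)
for name, qty in monthly_quantities(workload, broker_count=6).items():
    print(f"{name:>18}: {qty:,.1f}")
```

Keeping quantities separate from rates means the same `Workload` can be priced against several architectures, and a missing rate raises an error instead of silently contributing zero to the total.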

That sequence prevents a common mistake: optimizing the broker line before understanding why it is large. Sometimes the answer is a different instance family. Sometimes it is shorter hot retention, AZ-aware clients, or a shared-storage architecture because the real driver is the coupling between compute, local replicas, and retained data.

If your current AWS bill has a Kafka-shaped mystery in it, build the model from bytes rather than SKUs. AutoMQ provides a Kafka-compatible shared-storage path for teams evaluating a different cost structure. The useful next step is a workload-specific calculation using your Region, retention, ingress, fan-out, AZ layout, and recovery requirements. Start with the AutoMQ website and pair it with AWS's current pricing pages before making the platform call.

FAQ

Is Amazon MSK pricing the same thing as AWS Kafka pricing?

Not exactly. Amazon MSK is AWS's managed Apache Kafka service, so MSK pricing is central. But Kafka on AWS may also include EC2, EBS, S3, VPC networking, NAT Gateway, PrivateLink, MSK Connect, MSK Replicator, monitoring, and operational labor.

Does MSK charge for broker-to-broker replication traffic?

AWS states on the MSK pricing page that customers are not charged for data transfer used for replication between brokers or between metadata nodes and brokers. Standard AWS data transfer charges still apply for data transferred in and out of MSK clusters.

Why does replication factor affect cost so much?

Replication factor multiplies retained broker log data. With replication factor 3, each logical record is stored on three brokers before overhead and headroom. Plan storage and recovery capacity from replicated bytes, not logical ingress alone.

Is MSK tiered storage the same as shared-storage Kafka?

No. Tiered storage offloads older data while the primary Kafka log still depends on broker-owned storage. Shared-storage Kafka changes ownership more deeply by placing the durable log in shared cloud storage and making brokers closer to stateless compute.

When should a team evaluate AutoMQ instead of tuning MSK?

Evaluate shared-storage Kafka when retained data is large relative to active throughput, scaling involves heavy partition movement, cross-AZ replication pressure is material, or the team wants Kafka compatibility with a cloud-native storage model.
