KIP-1150 is an Apache Kafka Improvement Proposal that introduces diskless topics — a new storage mode where partition data is written directly to object storage (like Amazon S3) instead of local disks. Accepted by the Kafka community on March 2, 2026, with 9 binding votes and 5 non-binding votes, it represents the first time the Apache Kafka community has officially endorsed object storage as the future of Kafka's data layer.
That endorsement matters. For years, a growing number of Kafka-compatible platforms — AutoMQ, WarpStream, and others — have been building on the premise that local disk replication is the wrong storage model for the cloud. KIP-1150 settles the directional debate: the community agrees. But agreeing on the destination and arriving there are very different things. The Apache Kafka community's consensus-driven development model, while one of open source's greatest strengths, means that major architectural changes take years — sometimes many years — to reach production readiness.
So what does KIP-1150 actually propose, what are the engineering challenges ahead, and — based on the community's own track record — when might diskless topics be production-ready?
What KIP-1150 Proposes: The Technical Vision
Diskless Topics: A New Storage Mode
At its core, KIP-1150 introduces a new type of topic that bypasses Kafka's traditional storage model entirely. Instead of writing data to local disks and replicating it across brokers in different availability zones, diskless topics write data directly to object storage. The proposal envisions a cluster where both classic (disk-based) and diskless topics coexist — operators choose the storage mode per topic based on their latency and cost requirements.
The name "diskless" is slightly misleading. Brokers still use small amounts of local disk for KRaft metadata, caching, and short-term staging. What disappears is the use of local disks as the primary, durable store for partition data. Object storage (S3, GCS, Azure Blob) becomes the source of truth.
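Because the storage mode is chosen per topic, enabling it should look like any other topic-level config. The sketch below is illustrative only: the `diskless.enable` flag follows the naming floated in the proposal, and the final config name and semantics may change as the implementation KIPs are refined.

```shell
# Create a diskless topic alongside classic topics in the same cluster.
# Illustrative: the topic-level flag shown here may change before the
# implementation KIPs (KIP-1163/KIP-1164) are finalized.
bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic clickstream \
  --partitions 12 \
  --config diskless.enable=true
```

Classic topics in the same cluster would keep their replication factor as before; only topics with the diskless flag would route partition data to object storage.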
What Changes Architecturally
The implications ripple through several layers of Kafka's design:
- Storage layer: Data durability shifts from broker-to-broker replication to object storage's built-in redundancy (eleven nines of durability for S3). This eliminates the need for `replication.factor=3` and the cross-AZ network traffic it generates.
- Broker role: Brokers become lighter. Without terabytes of local partition data, they move closer to stateless compute nodes — easier to scale, replace, and rebalance.
- Metadata model: Diskless topics require a different coordination mechanism. KIP-1150 delegates the detailed design to sub-proposals: KIP-1163 (Diskless Core) covers the produce and consume paths, while KIP-1164 defines the coordination layer.
What KIP-1150 Does Not Settle
It's important to understand what the vote on March 2, 2026 actually approved. KIP-1150 is what the community calls a "meta KIP" or "motivational KIP" — it establishes directional consensus on what Kafka should do, not how it should do it. The Aiven team behind the proposal described this strategy explicitly: they split the proposal to isolate the "do we want this?" question from the technical implementation details.
This means several critical questions remain open:
- How exactly will the produce path handle the latency gap between local disk writes and S3 HTTP round-trips?
- What consistency guarantees will diskless topics provide under failure scenarios?
- How will the batch coordinator — the component that tracks which data lives where in object storage — be implemented and scaled?
- What happens to compacted topics, transactions, and exactly-once semantics in a diskless world?
These aren't minor details. They're the engineering challenges that will determine whether diskless topics actually work in production — and how long it takes to get there.
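To make the batch coordinator question concrete, here is a minimal Python sketch of the core index such a component must maintain: a mapping from partition offset ranges to object-storage keys. Everything here (class names, layout, the append-only commit model) is illustrative; KIP-1164 leaves the real design open, including durability, leadership, compaction, and concurrent commits.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class BatchEntry:
    base_offset: int   # first offset covered by this object
    last_offset: int   # last offset covered by this object
    object_key: str    # where the batch lives in object storage


class BatchCoordinator:
    """Toy index from partition offsets to object-storage keys.

    Illustrative only: a real coordinator must survive failures and
    coordinate many producers; this sketch only shows the lookup problem.
    """

    def __init__(self):
        self._entries: list[BatchEntry] = []  # kept sorted by base_offset

    def commit(self, base_offset: int, last_offset: int, object_key: str) -> None:
        # Assumes batches are committed in offset order, so append suffices.
        self._entries.append(BatchEntry(base_offset, last_offset, object_key))

    def locate(self, offset: int):
        # Find the last entry whose base_offset <= offset, then check range.
        idx = bisect_right([e.base_offset for e in self._entries], offset) - 1
        if idx >= 0 and self._entries[idx].last_offset >= offset:
            return self._entries[idx].object_key
        return None


coord = BatchCoordinator()
coord.commit(0, 99, "s3://bucket/topic-0/00000000.batch")
coord.commit(100, 249, "s3://bucket/topic-0/00000100.batch")
print(coord.locate(150))  # prints the key of the object holding offset 150
```

Even this toy version hints at the scaling question: every consumer fetch needs a lookup like `locate()`, so the coordinator sits on the hot path for both produce and consume.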
The Long Road from KIP to Production
Most articles about KIP-1150 stop at explaining the proposal. But for engineering teams making technology decisions, the more important question is: when can I actually use this? The Apache Kafka community's own history provides a sobering answer.
Case Study: KRaft (KIP-500)
KIP-500 — the proposal to replace ZooKeeper with a self-managed metadata quorum — is the most relevant precedent. It was a fundamental architectural change driven by a major vendor (Confluent), with strong community support from day one.
| Milestone | Date | Time from Proposal |
|---|---|---|
| KIP-500 proposed | Late 2019 | — |
| Early Access (Kafka 2.8) | April 2021 | ~1.5 years |
| Marked stable (Kafka 3.3) | Late 2022 | ~3 years |
| ZK migration path (Kafka 3.5/3.6) | 2023 | ~4 years |
| ZooKeeper fully removed (Kafka 4.0) | March 2025 | ~5.5 years |
From proposal to the point where teams could confidently run KRaft in production without ZooKeeper: roughly five and a half years. And this was a project with Confluent's full engineering weight behind it, a clear migration path, and broad community consensus from the start.
Case Study: Tiered Storage (KIP-405)
Tiered Storage is an even more instructive comparison because it touches the same layer that KIP-1150 targets: Kafka's storage architecture.
| Milestone | Date | Time from Proposal |
|---|---|---|
| KIP-405 first draft | December 2018 | — |
| KIP accepted | February 2021 | ~2 years 3 months |
| Early Access (Kafka 3.6) | October 2023 | ~5 years |
| GA release (Kafka 3.9) | November 2024 | ~6 years |
Nearly six years from first draft to GA — and the GA release still carries significant limitations. Compacted topics aren't supported. Only one remote partition's data can be served per fetch request, which can limit client throughput. The feature requires KRaft mode. These are the kinds of rough edges that take additional release cycles to smooth out.
The Aiven team behind KIP-1150 acknowledged this history directly. In their blog post announcing the acceptance, they noted that KIP-405's long timeline was a reference point for their strategy of splitting KIP-1150 into a meta-KIP and separate implementation KIPs.
Why KIP-1150 May Take Even Longer
KIP-1150 faces challenges that neither KRaft nor Tiered Storage encountered:
The scope of change is larger. KRaft replaced one metadata system with another — a significant but bounded change. Tiered Storage added a new storage tier alongside the existing one. KIP-1150 introduces an entirely different storage paradigm that must coexist with the classic model. Local disk replication and object storage direct-write are fundamentally different approaches to data durability, and they demand correspondingly different metadata models.
Community consensus is still forming. The acceptance of KIP-1150 triggered something unprecedented in Kafka's history: three competing proposals (KIP-1150, KIP-1176, KIP-1183) addressing the same problem space simultaneously. While the authors of KIP-1183 (from Slack) eventually dropped their proposal and endorsed KIP-1150, this level of competing activity signals that the community hasn't fully converged on implementation details. KIP-1176, which proposes extending Tiered Storage to handle active segments, represents a meaningfully different architectural approach.
The implementation KIPs are still under discussion. KIP-1163 (Diskless Core) and KIP-1164 (Diskless Coordinator) — the proposals that define how diskless topics actually work — have not yet been accepted. The community is still debating the produce path, consume path, and coordination mechanisms. These discussions will take months, possibly years, before reaching consensus.
Based on the KRaft and Tiered Storage precedents, a reasonable projection for KIP-1150's timeline looks something like this: implementation KIPs accepted by late 2027, early access in a Kafka release around 2028–2029, and production-grade GA no earlier than 2029–2031. This is not a criticism of the community — it's the natural pace of consensus-driven development for a project used by thousands of organizations worldwide. Getting it right matters more than getting it fast.
The Aiven Question: Who's Driving KIP-1150?
KIP-1150's primary champion is Aiven, a company whose core business is managed open-source data services, with Apache Kafka as a major revenue driver. Aiven has been transparent about their motivations — they want Kafka to remain competitive in the cloud era, and they see diskless topics as essential to that goal.
Aiven has also built Inkless, an implementation of KIP-1150 as a fork of Apache Kafka. The repository is active: as of late April 2026, it shows recent commits (the latest just days ago), 583+ pull requests, 10 releases (the most recent being Inkless 0.37 in March 2026), and 90 GitHub stars. The Inkless team has stated explicitly that the fork is intended to be temporary — a proving ground for the diskless concept that will be deleted once the feature is merged upstream.
That said, the commercial dynamics are worth considering. Aiven's business model is built on managing infrastructure that customers find complex and expensive to run themselves. Diskless Kafka, if it dramatically reduces operational complexity and cost, could erode the value proposition of managed Kafka services — including Aiven's own. This doesn't mean Aiven's commitment to KIP-1150 is insincere; their blog posts and engineering investment suggest genuine conviction. But it does mean the pace of upstream contribution may be influenced by factors beyond pure engineering capacity. And the relatively modest community engagement with the Inkless repository (90 stars, 9 forks) compared to the scale of the ambition leaves open the question of how broad the contributor base will be when the heavy implementation work begins.
It remains to be seen whether Aiven will sustain the multi-year engineering investment required to shepherd KIP-1150 through the full Apache community process — from implementation KIP acceptance through code review, testing, and multiple release cycles.
AutoMQ: Diskless Kafka in Production Today
KIP-1150 validates the direction. But if your team needs the cost savings and operational benefits of diskless Kafka now — not in 2029 or 2031 — waiting for the community implementation isn't the only option.
AutoMQ has been running diskless Kafka in production since 2024. Built on the Apache Kafka codebase (not a protocol-compatible reimplementation), AutoMQ replaced only the lowest-level storage implementation with a cloud-native engine that writes directly to object storage. Everything above the storage layer — the wire protocol, consumer group semantics, exactly-once delivery, Kafka Connect, Schema Registry compatibility — remains unchanged.
The architectural approach differs from KIP-1150 in a key way: AutoMQ introduces a pluggable WAL (Write-Ahead Log) layer that absorbs writes at low latency before asynchronously flushing to S3. This means teams can choose their latency profile per deployment:
- S3 WAL (default): ~500ms end-to-end latency, lowest cost, maximum elasticity. Suitable for log aggregation, batch ETL, and data lake ingestion.
- Regional EBS or NFS WAL: sub-10ms latency with Multi-AZ durability. Suitable for real-time analytics, fraud detection, and microservice communication.
KIP-1150, by contrast, is still working through how to handle the latency gap between local disk writes and object storage round-trips — one of the hardest unsolved problems in the proposal.
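The WAL idea can be sketched in a few lines: absorb producer writes into a buffer, acknowledge quickly, and ship the buffer to object storage once it is big enough or old enough. This is a toy model — the `flush_fn` callback stands in for an S3 multipart upload, and real WAL implementations provide durability guarantees before acknowledging — but it shows the core trade-off: larger batches mean fewer S3 requests at the cost of higher end-to-end latency.

```python
import time


class WalBuffer:
    """Minimal sketch of a write-ahead buffer that absorbs producer
    writes and flushes them to object storage in batches.

    Illustrative only: real WALs make writes durable before acking and
    flush asynchronously; this sketch flushes inline for clarity.
    """

    def __init__(self, flush_fn, max_bytes=8 * 1024 * 1024, max_age_s=0.25):
        self.flush_fn = flush_fn      # e.g. an S3 upload, stubbed here
        self.max_bytes = max_bytes    # size threshold for a flush
        self.max_age_s = max_age_s    # age threshold for a flush
        self._buf = []
        self._size = 0
        self._first_write = None

    def append(self, record: bytes) -> None:
        if self._first_write is None:
            self._first_write = time.monotonic()
        self._buf.append(record)
        self._size += len(record)
        # Flush when the batch is big enough or has waited long enough.
        if (self._size >= self.max_bytes
                or time.monotonic() - self._first_write >= self.max_age_s):
            self.flush()

    def flush(self) -> None:
        if self._buf:
            self.flush_fn(b"".join(self._buf))  # one object per batch
            self._buf, self._size, self._first_write = [], 0, None


uploads = []
wal = WalBuffer(uploads.append, max_bytes=10)
wal.append(b"abcdef")   # 6 bytes: below threshold, stays buffered
wal.append(b"ghijkl")   # 12 bytes total: crosses 10, triggers a flush
```

Tuning `max_bytes` and `max_age_s` is exactly the latency/cost dial the two AutoMQ WAL profiles above expose, and the dial KIP-1150's produce-path discussion still has to settle for upstream Kafka.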
What This Means in Practice
The production results speak to what diskless Kafka delivers when the engineering challenges are solved:
- 100% Kafka protocol compatibility: AutoMQ passes 387 Apache Kafka test cases and tracks upstream within a two-month code gap. Existing Kafka clients, Strimzi Operator, Connect, and Schema Registry work without modification.
- Seconds-level scaling: Partition reassignment is a metadata-only operation. Grab reduced their rebalancing time from 6+ hours to under 1 minute.
- Zero cross-AZ traffic cost: All data sharing happens through regional storage services. No broker-to-broker replication across availability zones.
- Production-proven at scale: JD.com runs 13 trillion messages per day across 8,000+ nodes. Grab, Tencent, LG U+, HubSpot, and others have validated the architecture in production.
The Cost Difference
For a concrete comparison, consider a workload with 300 MiB/s average write throughput, 2× read fanout, and 72-hour retention on AWS (Multi-AZ):
| Platform | Monthly Cost |
|---|---|
| Apache Kafka (self-managed) | ~$103,195 |
| AutoMQ (BYOC) | ~$21,804 |
That's a 79% cost reduction, driven primarily by eliminating cross-AZ replication traffic and replacing 3× EBS replication with single-copy S3 storage. The gap widens at higher throughput because cross-AZ traffic — Kafka's single largest cost driver — scales linearly with throughput in traditional Kafka but stays near zero with AutoMQ.
Cost data generated using the AutoMQ pricing calculator based on AWS us-east-1 pricing.
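To see where most of that gap comes from, here is a back-of-envelope Python estimate of classic Kafka's cross-AZ replication traffic alone for this workload. The traffic model (replication factor 3 across three AZs, so two cross-AZ copies of every write) and the combined $0.02/GB cross-AZ rate are stated assumptions for illustration, not quoted AWS prices, and GiB/GB are treated interchangeably for a rough figure.

```python
# Back-of-envelope: classic Kafka's cross-AZ replication traffic cost.
# Assumptions: 300 MiB/s writes, RF=3 spread over 3 AZs (2 cross-AZ
# copies), ~$0.02/GB combined for cross-AZ transfer, 30-day month.
MIB = 1024 ** 2
GIB = 1024 ** 3

write_mib_s = 300                 # average producer throughput
cross_az_copies = 2               # RF=3 across 3 AZs -> 2 cross-AZ replicas
price_per_gb = 0.02               # $/GB, both directions combined (assumed)
seconds_per_month = 30 * 24 * 3600

replicated_gb = write_mib_s * MIB * cross_az_copies * seconds_per_month / GIB
monthly_cost = replicated_gb * price_per_gb
print(f"~${monthly_cost:,.0f}/month for cross-AZ replication traffic alone")
# → ~$30,375/month
```

Under these assumptions, replication traffic alone accounts for a substantial share of the ~$81k/month difference in the table; cross-AZ consumer reads (without rack-aware fetching) and 3× EBS storage add the rest.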
Don't Wait — The Future Is Already Here
KIP-1150's acceptance is a milestone worth celebrating. It means the Apache Kafka community — the stewards of the most widely deployed streaming platform in the world — has officially recognized that object storage is the future of Kafka's data layer. The directional debate is over.
But the engineering work is just beginning. If KRaft and Tiered Storage are any guide, production-ready diskless topics in upstream Apache Kafka are still years away. Your Kafka infrastructure bill, meanwhile, arrives every month.
The vision that KIP-1150 describes — brokers decoupled from storage, zero cross-AZ replication, elastic scaling without data migration — is already running in production. AutoMQ has been delivering these capabilities since 2024, built on the same Apache Kafka codebase, with the same protocol compatibility, under the Apache License 2.0.
- Explore the architecture: AutoMQ Diskless Engine
- Calculate your savings: AutoMQ Pricing Calculator
- Try it yourself: AutoMQ on GitHub (Apache 2.0, 9.7K+ stars)
- Start a free trial: AutoMQ Cloud — 14-day free trial, no credit card required
KIP-1150 confirmed the direction. AutoMQ is already there.