KIP-1150 turns a familiar Kafka buying question into a roadmap question. Platform teams are asking when broker-local disks stop being the default durability boundary for selected topics, what operational evidence will exist at each milestone, and how much architectural commitment is reasonable before the implementation details settle.
A storage roadmap can influence contract timing, migration sequencing, FinOps assumptions, and vendor shortlists long before a feature is ready for broad production use. If the team treats "diskless topics" as a single checkbox, it will miss the harder questions: which workloads can tolerate the emerging latency profile, which Kafka semantics must be proven topic by topic, and which operating model remains stable if the roadmap shifts.
Why Teams Search for KIP-1150
The search intent behind KIP-1150 is practical. Buyers are trying to understand whether Apache Kafka's storage model is moving toward a lower-cost cloud architecture, and whether they should wait, pilot, or choose a Kafka-compatible shared-storage platform now. The pain is already visible: retained data consumes broker-attached storage, replicas move bytes between zones, and recovery or rebalancing can turn storage placement into an operations project.
KIP-1150, Diskless Topics, is marked Accepted in the Apache Kafka wiki. It defines the motivation and end-user requirements for a diskless topic type where local broker disks are not the primary durable store for user records. Implementation details belong in follow-up KIPs, which is why buyers need a roadmap lens. An accepted direction is useful, but it is not a release vehicle, a managed service commitment, or a completed migration guide.
The buyer discipline is to separate category direction from procurement evidence. Category direction says the ecosystem recognizes a real cloud cost and elasticity problem. Procurement evidence says a specific implementation meets your compatibility, latency, governance, and operations requirements.
The Roadmap Is a Sequence of Evidence
A serious roadmap review starts by asking what must become observable before adoption. In classic Kafka, topic durability, broker ownership, replication, and local log storage are tightly coupled. Diskless topics change that coupling by moving the durable data path toward shared or object storage while keeping Kafka clients and topic behavior recognizable.
For roadmap planning, the sequence is more useful than a binary yes/no answer:
- Design acceptance: the community or vendor has described the user-visible goals, non-goals, and compatibility expectations. This is where KIP-1150 sits for upstream Apache Kafka.
- Core mechanics: the implementation defines how writes are acknowledged, how offsets are assigned, how batches are indexed, how readers find data, and how failures are repaired.
- Operational surface: metrics, alerts, quotas, admin APIs, upgrades, and recovery procedures are documented enough for SRE ownership.
- Workload coverage: transactions, compaction, Kafka Streams state stores, high fan-out reads, cold replay, retention, and deletion are either supported or explicitly out of scope.
- Production evidence: release notes, managed service guarantees, reference architectures, and failure-mode guidance exist for the exact deployment pattern the buyer will use.
This sequence prevents a common mistake: assuming that an accepted architecture direction has the same risk profile as a supported production feature.
Coexistence Is the First Buyer Question
KIP-1150 describes diskless topics as a topic-level capability rather than a claim that every Kafka workload must move at once. That creates an important roadmap question: will your estate run classic, tiered, and diskless topic classes side by side, or will you prefer a platform where the storage architecture is consistently shared-storage-first?
Mixed topic classes can reduce migration risk because teams can move append-heavy or long-retention topics first and keep latency-sensitive or semantically complex topics on the familiar path. The cost is operational complexity. Runbooks must explain which topic class uses which durability path, which metrics matter, which limitations apply, and which rollback procedure belongs to each class.
A shared-storage-first platform makes the model more uniform. The buyer evaluates one storage architecture across the estate rather than a special topic type. The trade-off is that the platform must prove the full Kafka surface earlier, because there is less room to route difficult topics back to a classic local-disk path.
| Roadmap choice | Buyer benefit | Buyer risk to manage |
|---|---|---|
| Mixed classic and diskless topics | Incremental adoption by workload class | More runbook branches and topic-policy rules |
| Diskless only for long retention | Early storage-cost relief with limited scope | Hot-path latency and replay assumptions may differ by topic |
| Shared-storage-first platform | Consistent operating model and independent storage scaling | Requires broad compatibility and failure testing up front |
| Wait for upstream maturity | Less early implementation risk | Opportunity cost if current Kafka costs and recovery limits are already material |
The table is a forcing function. Buyers should identify which operating model they are willing to own before treating the roadmap as a feature list.
Cost Claims Need Workload Classes
Diskless topic discussions often start with cost because cloud block storage, replica placement, and inter-zone traffic are visible on a bill. The economic intuition is sound: when Kafka replicas are tied to broker disks, retained data and recovery movement can scale in expensive ways. Object storage can change the storage curve, and a design that reduces broker-to-broker replication can change the network curve.
Be skeptical of any roadmap argument that presents one universal savings number. The cost result depends on topic class, write rate, retention, fan-out, cache behavior, region, object storage request pattern, and network topology.
Build the model around workload classes, not cluster averages:
| Workload class | Roadmap question | Cost model focus |
|---|---|---|
| Append-heavy ingestion | Can acknowledged writes recover cleanly after broker or zone failure? | Replication traffic, retained storage, WAL capacity, object writes |
| Long-retention audit log | Can cold replay and deletion governance remain predictable? | Object storage, lifecycle policy, read request cost, restore time |
| Analytics fan-out | Can many consumers read without destabilizing the cache path? | Cache hit rate, remote read latency, cross-zone reads |
| Compacted state topic | Are tombstones, key churn, and restore semantics supported? | Compaction work, metadata indexing, storage amplification |
| Transactional pipeline | Are idempotency, fencing, and transaction markers preserved? | Operational risk first, cost second |
FinOps can still ask for a storage and network forecast, but the platform team should refuse to collapse semantically different topics into one number. Cost savings that apply only to some topic classes are still valuable; they become dangerous only when presented as estate-wide assumptions.
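One way to keep the forecast split by workload class is a small model like the sketch below. Every number in it is a placeholder: the unit prices, the replication factor, the number of cross-zone copies, and the two example classes are illustrative assumptions, not published rates or measurements. The model also leaves out request charges, cache, and WAL capacity. It only shows the shape of the calculation: one replicated-disk estimate and one shared-storage estimate per class, reported side by side with no estate-wide total.

```python
from dataclasses import dataclass

# Hypothetical unit prices (USD). Replace with your provider's current rates.
BLOCK_GB_MONTH = 0.10    # assumed block storage price per GB-month
OBJECT_GB_MONTH = 0.02   # assumed object storage price per GB-month
CROSS_ZONE_GB = 0.02     # assumed inter-zone transfer price per GB
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30


@dataclass
class WorkloadClass:
    name: str
    write_mb_per_s: float   # average produce rate
    retention_days: float   # how long records are kept


def retained_gb(w: WorkloadClass) -> float:
    """Logical data held at steady state (1 GB = 1000 MB)."""
    return w.write_mb_per_s * SECONDS_PER_DAY * w.retention_days / 1000


def written_gb_per_month(w: WorkloadClass) -> float:
    return w.write_mb_per_s * SECONDS_PER_DAY * DAYS_PER_MONTH / 1000


def replicated_disk_cost(w, replication_factor=3, cross_zone_copies=2):
    """Monthly estimate when every replica sits on broker-attached disks."""
    storage = retained_gb(w) * replication_factor * BLOCK_GB_MONTH
    network = written_gb_per_month(w) * cross_zone_copies * CROSS_ZONE_GB
    return storage + network


def shared_storage_cost(w):
    """Monthly estimate assuming one logical copy in object storage and no
    broker-to-broker replication traffic; request, cache, and WAL costs
    are omitted."""
    return retained_gb(w) * OBJECT_GB_MONTH


def forecast(classes):
    """One row per workload class; deliberately no estate-wide total."""
    return {
        w.name: {
            "replicated_disk": round(replicated_disk_cost(w), 2),
            "shared_storage": round(shared_storage_cost(w), 2),
        }
        for w in classes
    }


if __name__ == "__main__":
    classes = [
        WorkloadClass("append-heavy ingestion", write_mb_per_s=10, retention_days=1),
        WorkloadClass("long-retention audit log", write_mb_per_s=1, retention_days=365),
    ]
    for name, row in forecast(classes).items():
        print(name, row)
```

With these placeholder inputs the ingestion class is dominated by replication traffic and the audit class by retained storage, which is the reason to keep the rows separate.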
Latency Is a Product Requirement, Not a Footnote
Object storage is durable and elastic, but Kafka buyers do not buy durability alone. They buy a streaming contract that includes produce latency, fetch latency, consumer lag behavior, replay performance, and predictable degradation under failure. Diskless topic roadmaps therefore need explicit latency classes.
The practical split is between the acknowledgment path and the read path. On writes, the buyer needs to know which local, WAL, metadata, and object storage operations complete before producer success. On reads, the buyer needs to know when data comes from memory, local cache, staged storage, or object storage, and how that path changes for lagging consumers or cold replay.
Do not reduce the review to p50 latency. The workloads that break trust are usually at the edge: a consumer group that falls hours behind, a broker that fails during a burst, a cache that turns over during a large replay, or a topic with high fan-out after an incident.
A diskless topic is not production-ready because the average write path looks acceptable. It is ready when the failure and replay paths are measurable, documented, and owned.
That sentence belongs in the buyer's evaluation plan. It keeps latency from being treated as a benchmark slide instead of an operating contract.
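As a minimal sketch of reporting beyond p50, the snippet below summarizes latency samples per scenario with a median, a tail percentile, and the worst case. The scenario names and sample values are invented for illustration, and the percentile uses the simple nearest-rank method on whatever samples a load test records; a real evaluation would use far more samples and the measurement tooling the team already trusts.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(scenarios):
    """Report median, tail, and worst case for each named scenario."""
    return {
        name: {
            "p50": percentile(values, 50),
            "p99": percentile(values, 99),
            "max": max(values),
        }
        for name, values in scenarios.items()
    }


if __name__ == "__main__":
    # Made-up fetch latencies in milliseconds, for illustration only.
    scenarios = {
        "steady state": [4, 5, 5, 6, 5, 4, 6, 5, 5, 7],
        "cold replay after lag": [5, 6, 5, 7, 6, 5, 250, 6, 5, 410],
    }
    for name, row in summarize(scenarios).items():
        print(name, row)
```

In the invented data the two scenarios have nearly the same median, and only the tail columns show that the replay path behaves differently.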
Migration Risk Is a Roadmap Dependency
Migration planning should begin before the roadmap reaches production maturity, because buyers need time to classify topics, owners, clients, and rollback procedures. Waiting for a release announcement and then asking which topics can move leaves too little time for rollback design.
Capture the details that change risk:
- Client surface: client libraries, protocol versions, producer settings, consumer group behavior, admin tooling, schemas, ACLs, quotas, and observability integrations.
- Semantic dependencies: ordering assumptions, compaction, transactions, idempotent producers, retention, deletion, replay windows, and stream-processing state.
- Operational dependencies: dashboards, alert thresholds, backup assumptions, broker replacement, incident ownership, and cloud IAM boundaries.
- Rollback dependencies: offset handling, mirrored data, dual-write behavior, topic deletion, backfill procedures, and the owner who can approve a backout.
This inventory informs contract timing. If the first useful workload class is append-heavy ingestion, the buyer may be able to pilot earlier. If the business value depends on compacted topics or transactional pipelines, the roadmap must provide stronger semantic proof before procurement can treat the option as a near-term replacement.
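The inventory can be kept as structured records so that the pilot list falls out of the data instead of a meeting. The sketch below is one hypothetical shape: the field names, the three semantic flags, and the classification labels are assumptions chosen for illustration, and a real inventory would carry many more of the attributes listed above.

```python
from dataclasses import dataclass


@dataclass
class TopicRecord:
    name: str
    owner: str
    compacted: bool = False
    transactional: bool = False
    streams_state: bool = False
    rollback_owner: str = ""   # who can approve a backout


def semantic_blockers(topic):
    """Semantic dependencies that need stronger proof before a move."""
    flags = {
        "compaction": topic.compacted,
        "transactions": topic.transactional,
        "stream-processing state": topic.streams_state,
    }
    return [label for label, present in flags.items() if present]


def classify(topic):
    """Sort a topic into a rough migration bucket."""
    blockers = semantic_blockers(topic)
    if blockers:
        return "needs semantic proof: " + ", ".join(blockers)
    if not topic.rollback_owner:
        return "blocked: no rollback owner"
    return "early pilot candidate"


if __name__ == "__main__":
    inventory = [
        TopicRecord("clickstream.raw", "ingest-team", rollback_owner="ingest-lead"),
        TopicRecord("orders.state", "orders-team", compacted=True, transactional=True),
        TopicRecord("audit.events", "security-team"),
    ]
    for topic in inventory:
        print(topic.name, "->", classify(topic))
```

Checking the rollback owner as part of classification keeps the backout question from being deferred until after a topic has already been scheduled to move.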
Procurement Should Ask for Architecture Evidence
Cloud streaming procurement often overweights commercial packaging and underweights architecture evidence. Diskless topics make that imbalance risky. A buyer needs to know not only whether the service supports Kafka APIs, but also how storage durability, network boundaries, and recovery ownership work.
Ask each candidate architecture for the same artifacts:
| Evidence area | What buyers should request |
|---|---|
| Compatibility | Kafka protocol scope, supported admin APIs, client versions, transaction and compaction status |
| Durability | Acknowledgment contract, WAL behavior, object storage commit model, metadata consistency |
| Recovery | Broker loss, zone impairment, object storage throttling, cache rebuild, and restore procedures |
| Cost | Storage classes, request patterns, cross-zone paths, retained data model, scaling assumptions |
| Governance | IAM roles, encryption, audit logs, deletion behavior, data residency, lifecycle policy |
| Operations | Metrics, alerts, quotas, upgrade process, runbooks, SLO boundaries |
If a candidate cannot explain a row, mark it as a dependency rather than an objection. Dependencies can be acceptable in a pilot. They are not acceptable as unstated assumptions in a production migration.
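The dependency rule can be made mechanical. In the sketch below, the area names mirror the rows of the table, while the stage names, the return values, and the idea of storing evidence as free-text notes are assumptions made for illustration. An empty or missing note counts as an open dependency: a pilot may proceed with open dependencies recorded, a production migration may not.

```python
EVIDENCE_AREAS = (
    "compatibility",
    "durability",
    "recovery",
    "cost",
    "governance",
    "operations",
)


def open_dependencies(evidence):
    """Areas for which the candidate has supplied no explanation."""
    return [area for area in EVIDENCE_AREAS if not evidence.get(area, "").strip()]


def readiness(evidence, stage):
    """Return (decision, open dependencies) for a 'pilot' or 'production' stage."""
    deps = open_dependencies(evidence)
    if stage == "pilot":
        # Dependencies are acceptable in a pilot as long as they are written down.
        return ("proceed", deps)
    if stage == "production":
        return ("proceed", deps) if not deps else ("hold", deps)
    raise ValueError(f"unknown stage: {stage!r}")


if __name__ == "__main__":
    # Hypothetical notes gathered from one candidate.
    candidate = {
        "compatibility": "protocol scope and client versions documented",
        "durability": "acknowledgment contract described",
        "recovery": "",
        "cost": "storage classes and cross-zone paths listed",
    }
    print(readiness(candidate, "pilot"))
    print(readiness(candidate, "production"))
```

Returning the list of open dependencies in both cases keeps them visible in the pilot report, so they cannot turn into unstated assumptions later.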
How AutoMQ Fits the Evaluation
Once the roadmap questions are clear, AutoMQ is a relevant implementation to test: it is a Kafka-compatible, cloud-native streaming platform built around a Shared Storage architecture, in which shared storage rather than broker-local disk is the primary durable home for data.
Evaluate AutoMQ with the same scorecard. Bring a representative topic class, keep existing Kafka clients where possible, configure realistic retention, run broker and zone failure drills, observe read behavior during lag and replay, and check whether network traffic stays within the intended boundaries. AutoMQ's Kafka compatibility documentation, BYOC and Software deployment options, and zero cross-AZ traffic guidance give teams concrete material to test rather than a future-state promise.
This is where AutoMQ can be useful for buyers deciding whether to wait. If the immediate pain is storage elasticity, broker recovery, or cross-zone traffic, a shared-storage Kafka-compatible platform gives the team something measurable now. If the requirement is to adopt upstream Apache Kafka diskless topics when they mature, AutoMQ can still serve as a comparison point.
A Roadmap Decision Worksheet
The final output of a buyer review should be a decision worksheet, not a generic "diskless Kafka strategy." A worksheet records what is known, what remains uncertain, and what would trigger the next step.
Use this structure for each topic class:
| Field | Decision content |
|---|---|
| Workload class | Throughput, retention, latency sensitivity, consumer fan-out, semantic dependencies |
| Current pain | Storage growth, cross-zone traffic, recovery time, scaling friction, operational toil |
| Candidate path | Wait for upstream, adopt managed diskless topics, pilot shared-storage platform, keep classic Kafka |
| Required evidence | Compatibility, latency class, failure recovery, governance, cost model, SRE ownership |
| Revisit trigger | Release milestone, cost threshold, incident pattern, workload growth, roadmap change |
The worksheet should also state what will not move. A team may decide to pilot diskless architecture for audit and ingestion topics while keeping transactional or compacted topics on a proven path until stronger evidence exists. That partial decision is often more valuable than a broad strategy that nobody can execute.
KIP-1150 makes the Kafka storage roadmap more explicit, but it does not remove the buyer's responsibility to validate architecture. Strong teams will not ask "is diskless ready?" in the abstract. They will ask which topic class is ready, which evidence is missing, and which operating model they are willing to own. To test a shared-storage implementation against that worksheet, start with the AutoMQ Cloud deployment path and bring one representative workload through compatibility, latency, recovery, and cost validation.
References
- Apache Kafka KIP-1150: Diskless Topics: https://cwiki.apache.org/confluence/display/KAFKA/KIP-1150%3A%2BDiskless%2BTopics
- Apache Kafka KIP-1163: Diskless Core: https://cwiki.apache.org/confluence/display/KAFKA/KIP-1163%3A%2BDiskless%2BCore
- Apache Kafka documentation, design and storage: https://kafka.apache.org/documentation/#design_storage
- Apache Kafka documentation, Tiered Storage: https://kafka.apache.org/documentation/#tiered_storage
- AWS EC2 On-Demand Pricing, data transfer: https://aws.amazon.com/ec2/pricing/on-demand/
- AWS S3 Pricing: https://aws.amazon.com/s3/pricing/
- Amazon MSK Developer Guide, Tiered Storage: https://docs.aws.amazon.com/msk/latest/developerguide/msk-tiered-storage.html
- AutoMQ documentation, What Is AutoMQ: https://docs.automq.com/automq/what-is-automq/overview?utm_source=blog&utm_medium=reference&utm_campaign=gs100-0015
- AutoMQ documentation, S3Stream Shared Streaming Storage: https://docs.automq.com/automq/architecture/s3stream-shared-streaming-storage/overview?utm_source=blog&utm_medium=reference&utm_campaign=gs100-0015
- AutoMQ documentation, Compatibility with Apache Kafka: https://docs.automq.com/automq/what-is-automq/compatibility-with-apache-kafka?utm_source=blog&utm_medium=reference&utm_campaign=gs100-0015
- AutoMQ documentation, Eliminate Inter-Zone Traffic: https://docs.automq.com/automq-cloud/eliminate-inter-zone-traffics/overview?utm_source=blog&utm_medium=reference&utm_campaign=gs100-0015
FAQ
Is KIP-1150 a production feature in Apache Kafka?
KIP-1150 is marked Accepted and defines the direction and end-user requirements for Diskless Topics. Buyers should still verify the status of implementation KIPs, Apache Kafka releases, and managed service support before treating it as a production feature.
Should buyers wait for upstream diskless topics?
Waiting can make sense when the organization strongly prefers upstream Apache Kafka and the current cost or recovery pain is manageable. If storage growth, cross-zone traffic, or broker recovery already creates material business pressure, it is reasonable to test Kafka-compatible shared-storage platforms while tracking the upstream roadmap.
Are diskless topics the same as Tiered Storage?
No. Tiered Storage moves older, closed log segments to remote storage, while the active segment of each partition still lives on broker-local disks. Diskless topics shift the primary durability model for selected topic data toward shared or object storage, which changes the write, read, recovery, and operations questions.
Which workloads should be evaluated first?
Append-heavy ingestion, audit logs, and long-retention topics are often better first candidates because their behavior is easier to validate. Topics that depend on transactions, compaction, Kafka Streams state stores, or strict tail latency need deeper evidence before migration.
How should AutoMQ be compared with a diskless topic roadmap?
Use the same buyer worksheet. Test Kafka client compatibility, produce and fetch latency, broker failure, zone impairment, replay, retention, governance, cross-zone traffic, and rollback. AutoMQ is most relevant when the buyer wants Kafka-compatible streaming with shared storage and independent compute and storage scaling.
