Migrate from Confluent to Kafka-Compatible Shared Storage: What Changes and What Should Not

The hardest question in a Confluent migration is rarely whether another platform can store events. It is whether the migration changes the contract your applications already depend on. Producers expect the Kafka protocol. Consumers expect partition ordering, committed offsets, group rebalances, and familiar error handling. Platform teams expect Admin APIs and operational runbooks that still feel like Kafka rather than a new streaming system wearing a Kafka label.

That concern is healthy. A migration from Confluent to Kafka-compatible shared storage should not ask application teams to rewrite producer or consumer logic. It should change the infrastructure model underneath Kafka: where durable data lives, how brokers recover, how partitions move, and how capacity is scaled.

Figure: What changes vs what stays the same

Kafka-compatible shared storage is attractive because it draws a clean line. Above the line, topics, partitions, offsets, clients, and consumer groups should remain recognizable. Below the line, broker-local disks are no longer the center of the durability model. Object storage becomes the primary data repository, a write-ahead log absorbs the write path, and brokers become easier to replace because less durable state is trapped inside each node.

What should stay the same after migration

The first migration principle is simple: preserve the Kafka contract before changing the operational substrate. Apache Kafka clients use a binary protocol over TCP, fetch metadata from brokers, and route produce or fetch requests to partition leaders. Kafka's protocol documentation describes how clients and servers negotiate supported API versions, which is the basis for rolling upgrades and mixed client fleets.
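The version negotiation described above can be sketched as a range intersection. This is a minimal illustration, not the Kafka client implementation: it assumes each side advertises an inclusive (min, max) version range per API, and the API names and ranges in the usage note are hypothetical placeholders for what you would collect from your own clients and target brokers.

```python
from typing import Dict, List, Optional, Tuple

# (min_version, max_version), both inclusive. Values are supplied by the caller.
VersionRange = Tuple[int, int]


def negotiate_version(client: VersionRange, broker: VersionRange) -> Optional[int]:
    """Return the highest version both sides support, or None if the ranges do not overlap."""
    low = max(client[0], broker[0])
    high = min(client[1], broker[1])
    return high if low <= high else None


def unsupported_apis(client_apis: Dict[str, VersionRange],
                     broker_apis: Dict[str, VersionRange]) -> List[str]:
    """List APIs the client uses that the broker cannot serve at any shared version."""
    missing = []
    for name, client_range in client_apis.items():
        broker_range = broker_apis.get(name)
        if broker_range is None or negotiate_version(client_range, broker_range) is None:
            missing.append(name)
    return sorted(missing)
```

For example, `negotiate_version((0, 9), (3, 11))` returns `9`, while `negotiate_version((0, 2), (3, 11))` returns `None`. Running `unsupported_apis` across every client version in production gives a concrete list to review before any traffic moves.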

In practice, a Confluent migration plan should treat the following behaviors as compatibility gates rather than nice-to-have checks:

  • Producers should continue to publish to the same topic and partition model, including key-based partitioning assumptions that preserve ordering within a partition.
  • Consumers should continue to use the same consumer group concepts, rebalance behavior, committed offsets, and reset policies.
  • Admin workflows should continue to create topics, inspect metadata, update configurations, and observe cluster state through Kafka-compatible interfaces.
  • Existing client libraries should remain viable unless the team is already planning a separate client upgrade for support or security reasons.
  • Schema, connector, and governance dependencies should be handled explicitly because they may sit beside Kafka rather than inside the Kafka broker API itself.
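The key-based partitioning assumption in the first gate can be made concrete with a small sketch. It uses CRC32 as a stand-in hash purely for illustration. Kafka's Java client hashes key bytes with murmur2 by default, and other client libraries may default to a different hash, so this is a model of the property rather than of any specific client. The function names are hypothetical.

```python
import zlib
from typing import Iterable, List


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (CRC32 stand-in, not Kafka's murmur2)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key) % num_partitions


def keys_that_move(keys: Iterable[bytes], old_count: int, new_count: int) -> List[bytes]:
    """Return the keys whose partition changes when the partition count changes."""
    return [k for k in keys if partition_for(k, old_count) != partition_for(k, new_count)]
```

The property worth testing is that the same key lands on the same partition before and after migration. That holds only if the target topic keeps the same partition count and the producers keep the same partitioner. With an unchanged count, `keys_that_move` returns an empty list. With a changed count, some keys typically move, and per-key ordering across the cutover is no longer guaranteed.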

This is where teams sometimes blur two different migrations. Moving from Confluent can involve the Kafka data plane, networking, identity, Schema Registry, Connect, Flink, and observability. A Kafka-compatible shared-storage platform mainly addresses the broker and storage architecture. The clean migration plan inventories surrounding services separately, then tests the Kafka compatibility surface with real clients before moving traffic.

Offsets deserve special attention because they are where compatibility becomes operationally visible. Confluent's consumer documentation describes how a consumer starts from an offset reset policy when there is no committed position, commits offsets as it reads, and resumes from the last committed offset after reassignment or restart. Those semantics decide whether a cutover causes duplicate processing, missed records, or a controlled replay.
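A minimal sketch of that decision, assuming the standard `auto.offset.reset` values `earliest`, `latest`, and `none`. The function name and the log boundaries are illustrative, and the real consumer applies this logic per partition inside the client.

```python
from typing import Optional


def starting_offset(committed: Optional[int], reset_policy: str,
                    log_start: int, log_end: int) -> int:
    """Decide where a consumer starts reading one partition.

    committed: last committed offset for the group, or None if there is none.
    log_start / log_end: first available offset and the next offset to be written.
    """
    # A committed position inside the available range is always honored.
    if committed is not None and log_start <= committed <= log_end:
        return committed
    # Otherwise (no commit, or the committed offset no longer exists) the policy decides.
    if reset_policy == "earliest":
        return log_start
    if reset_policy == "latest":
        return log_end
    # With "none" the consumer surfaces an error instead of picking a position.
    raise LookupError("no valid committed offset and reset policy is 'none'")
```

During a cutover, a group with no committed offsets on the target cluster and `latest` skips everything already written there, while `earliest` replays from the start of the log. The outcome depends on which case applies, so it is worth verifying per consumer group.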

What changes below the Kafka API

If everything above the API line stayed exactly the same and nothing underneath changed, there would be little reason to migrate. The value of Kafka-compatible shared storage comes from changing the broker's relationship with durable data. Traditional Kafka and many Kafka-derived services place the replicated log on broker-local storage. Brokers own local log segments that must be replicated, retained, rebalanced, replaced, and recovered.

Shared storage changes that ownership model. Brokers still serve Kafka requests, coordinate leaders, and handle reads and writes, but durable log data is no longer coupled as tightly to each broker's local disk. Object storage provides the elastic persistence layer, while a WAL absorbs the low-latency write path before data is organized into object storage.

That difference changes the cost and failure domains of a Kafka platform:

| Migration dimension | Broker-local Kafka model | Kafka-compatible shared-storage model |
| --- | --- | --- |
| Durable state | Log segments live on broker-attached storage and are replicated between brokers. | Durable data is externalized into shared storage, with a WAL for the write path. |
| Broker replacement | Replacing a broker can require replica catch-up and data movement. | Brokers can be replaced with less local state reconstruction. |
| Scaling | Adding brokers often triggers partition reassignment and storage balancing work. | Scaling focuses more on compute capacity and traffic placement. |
| Cost model | Storage, replication, and network movement are tied to broker sizing. | Object storage economics become central, while broker capacity can track workload pressure. |
| Operational focus | Disk fullness, replica movement, and broker recovery dominate many runbooks. | WAL health, object storage behavior, cache efficiency, and stateless broker orchestration matter more. |

The table is not saying local-disk Kafka is obsolete. It is saying the operational center of gravity moves: disk fullness, replica movement, and broker recovery matter less, while WAL health, object storage behavior, and cache efficiency matter more.

Broker-local storage vs shared object storage

In a broker-local design, data durability comes from replication across brokers. That model made sense when Kafka was built for clusters where disks were close to compute and cross-node replication was a natural way to survive machine failure. In cloud environments, the broker often runs on rented compute, attached block storage, and metered networking, while the log still behaves as if every broker should own a slice of durable data.

Figure: Broker-local storage vs shared storage

Shared storage pulls the durable layer away from broker identity. Object storage becomes the main repository because it is elastic and designed for massive durability and retention. The catch is that object storage does not behave like a low-latency local disk, so the write path still has to protect latency and durability.

This is where AutoMQ fits as a Kafka-compatible shared-storage implementation. AutoMQ replaces Kafka's native log storage layer with S3Stream, a stream storage library that offloads Kafka log storage to object storage and adds WAL storage capabilities. Its architecture combines object storage as the primary repository, WAL storage for efficient writes, and stateless broker nodes above that storage layer. The important point for migration planning is the boundary: the Kafka API stays above it, while the storage engine below it changes.

The WAL is the piece that keeps the architecture honest. Without it, "put every Kafka write directly into object storage" would run into the latency and IOPS profile of object storage APIs. With it, writes can be durably staged and organized before being uploaded to object storage. AutoMQ's documentation describes data being written to the WAL first and uploaded to S3 storage almost in real time.
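The write-then-upload flow can be illustrated with a toy in-memory model. This is not AutoMQ's or S3Stream's implementation: the class name, the batch-size trigger, and the dictionary standing in for object storage are all assumptions made for the sketch.

```python
from typing import Dict, List


class ToySharedStorageLog:
    """Toy model: records are staged in a WAL, then uploaded in batches as objects."""

    def __init__(self, upload_batch_size: int) -> None:
        self.upload_batch_size = upload_batch_size
        self.wal: List[bytes] = []                  # staged records, not yet uploaded
        self.objects: Dict[int, List[bytes]] = {}   # base offset -> uploaded batch
        self.next_offset = 0

    def append(self, record: bytes) -> int:
        """Stage a record in the WAL and return its offset."""
        offset = self.next_offset
        self.wal.append(record)
        self.next_offset += 1
        if len(self.wal) >= self.upload_batch_size:
            self.upload()
        return offset

    def upload(self) -> None:
        """Move the staged WAL contents into 'object storage' as one object."""
        if not self.wal:
            return
        base = self.next_offset - len(self.wal)
        self.objects[base] = self.wal
        self.wal = []

    def read(self, offset: int) -> bytes:
        """Serve a read from uploaded objects first, then from the WAL tail."""
        for base, batch in self.objects.items():
            if base <= offset < base + len(batch):
                return batch[offset - base]
        wal_base = self.next_offset - len(self.wal)
        if wal_base <= offset < self.next_offset:
            return self.wal[offset - wal_base]
        raise IndexError(f"offset {offset} is not in the log")
```

The point of the model is the shape of the write path: an append is acknowledged once it is in the WAL, object storage receives fewer and larger writes, and offsets stay continuous across both tiers. A real system also has to handle WAL durability, recovery after broker loss, and caching, none of which this sketch attempts.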

Partition movement and stateless scaling

Partition reassignment is one of the places where architects feel the difference most directly. In broker-local Kafka, moving partition ownership is tied to moving or rebuilding replica data. Even when the tooling is mature, the operation consumes disk, network, and time, especially for topics with large retained logs.

Shared storage changes the question from "how do we copy enough data to make this broker safe?" to "how do we move serving responsibility while durable data remains externalized?" It does not remove leader election, cache warm-up, metadata propagation, or traffic balancing. It does reduce the broker-local durable state that must be reconstructed before a node can be useful.
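A back-of-the-envelope calculation shows why the first question is expensive. The figures below are hypothetical (2 TiB of retained log for one replica, copied at a 100 MiB/s throttle) and the function is a simple lower bound, not a model of any particular reassignment tool.

```python
TIB = 1024 ** 4
MIB = 1024 ** 2


def replica_copy_seconds(retained_bytes: int, throttle_bytes_per_sec: int) -> float:
    """Lower bound on the time to copy one replica's retained log at a fixed throttle."""
    if throttle_bytes_per_sec <= 0:
        raise ValueError("throttle must be positive")
    return retained_bytes / throttle_bytes_per_sec


# Hypothetical inputs: 2 TiB retained, 100 MiB/s replication throttle.
seconds = replica_copy_seconds(2 * TIB, 100 * MIB)
print(f"{seconds / 3600:.1f} hours")  # prints "5.8 hours"
```

In a shared-storage design this copy term largely drops out of the estimate, because the retained log is not rebuilt on the new broker. What remains is the work listed above (leader election, cache warm-up, metadata propagation, and traffic balancing), which this sketch does not model and which should be measured rather than assumed.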

Figure: Partition movement without broker-local data copy

This is also why the word "stateless" needs precision. A stateless broker in this context still participates in the Kafka data plane and still needs CPU, memory, network, cache, configuration, credentials, and observability. The narrower and more useful claim is that durable log ownership is no longer anchored primarily to the broker's local disk.

For migration decision makers, the implication is clear. Test compatibility at the Kafka API layer, then test load and failure behavior in the changed storage layer: object storage latency, WAL pressure, broker loss, and hot partitions.

Migration risks to validate before cutover

A good Confluent replacement plan separates semantic risk from operational risk. Semantic risk asks whether the same application behavior is preserved. Operational risk asks whether the new platform can carry the workload under real failure, retention, security, and cost constraints. Mixing the two leads to vague acceptance tests.

Start with a compatibility test plan that uses your real client mix. Confluent documents client options across Java, C/C++, Python, Go, .NET, and JavaScript, and its support pages distinguish between client versions, platform versions, and support windows. "Kafka-compatible" still requires knowing the client versions in production.

The validation plan should include five checks:

  • Produce and consume with the same client libraries, authentication mechanism, compression settings, idempotence settings, and batching profile used in production.
  • Verify topic creation, partition count, replication-related configuration behavior, retention policies, and Admin API workflows.
  • Migrate or mirror data in a way that preserves offset expectations, then test consumer restart, rebalance, replay, and reset scenarios.
  • Exercise failure paths: broker restart, broker replacement, object storage latency, WAL pressure, and network interruption.
  • Compare observability: consumer lag, broker request latency, produce and fetch errors, storage latency, cache behavior, and operational alerts.

The offset portion should not be treated as a spreadsheet exercise. If consumers move to a new cluster, committed positions may need to be translated, mirrored, reset, or intentionally replayed. Dual-write, mirror, and pause-and-cutover approaches each need an explicit authority for consumer progress and rollback.
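One way to reason about translation is a checkpoint lookup. The sketch below assumes the mirroring process records occasional (source offset, target offset) pairs for each partition. The checkpoint format and function name are hypothetical, and a real mirroring tool has its own mechanism that should be used instead of hand-rolled logic.

```python
import bisect
from typing import List, Optional, Tuple


def translate_offset(checkpoints: List[Tuple[int, int]],
                     source_committed: int) -> Optional[int]:
    """Translate a committed source offset into a safe target offset.

    checkpoints: (source_offset, target_offset) pairs sorted by source offset.
    Returns the target offset of the latest checkpoint at or before the
    committed position, or None if no checkpoint covers it.
    """
    source_offsets = [source for source, _ in checkpoints]
    index = bisect.bisect_right(source_offsets, source_committed) - 1
    if index < 0:
        return None  # nothing known this early: fall back to a reset or replay decision
    return checkpoints[index][1]
```

The lookup is deliberately conservative: it rounds down to the nearest known checkpoint, so a consumer may reprocess some records after cutover but does not skip any. With checkpoints `[(100, 0), (200, 95), (300, 190)]`, a committed source offset of 250 translates to target offset 95. Whether that replay is acceptable depends on how idempotent the downstream processing is.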

There is also a service boundary risk. Confluent often sits inside a larger operating model: Schema Registry, Connect, stream processing, private networking, identity, audit, quotas, and support processes. The Kafka-compatible broker migration is a foundation, not a complete inventory of every adjacent dependency.

How AutoMQ implements Kafka-compatible shared storage

AutoMQ's architecture is useful to examine because it shows what "Kafka-compatible shared storage" means as an implementation, not just a category. AutoMQ keeps the Kafka compute layer visible to applications while replacing the broker-local log storage model with S3Stream. Its shared storage architecture uses object storage as the primary data repository, a WAL layer for efficient writes, and stateless brokers above that storage layer.

The most important design choice is the separation between API compatibility and storage mechanics. Applications still interact through Kafka clients and Kafka concepts. Underneath, S3Stream implements stream storage APIs such as append, fetch, trim, and position management over object storage. That lets AutoMQ preserve the Kafka-facing model while moving durable data into cloud storage primitives.

This changes operations in three ways. Broker replacement becomes less dominated by local log recovery. Scaling can focus more on compute and traffic because storage capacity is no longer expanded mainly by adding broker disks. The cost model shifts toward object storage retention and WAL choices rather than overprovisioning broker-attached storage.

There are trade-offs. Object storage behavior becomes part of the streaming system's behavior, and the WAL layer becomes critical infrastructure. Shared storage reduces one class of Kafka operations work, but it introduces a different architecture that should be tested with production-shaped workloads.

That is the right standard. The goal is not to pretend migration is effortless. The goal is to avoid rewriting Kafka applications while changing the parts of the platform that create long-term cloud friction.

When shared storage is a good Confluent replacement path

Kafka-compatible shared storage is strongest when a team wants to keep Kafka as the application contract but change the economics and operations of the platform. It fits organizations with large retained streams, frequent scaling pressure, uneven cluster utilization, or painful broker recovery and reassignment windows.

It is less compelling when the primary reason for using Confluent is a managed ecosystem service rather than the Kafka data plane itself. If the key dependency is managed connectors, governance workflows, hosted stream processing, or support, the business case has to include those adjacent services.

The decision framework is therefore not "Confluent or AutoMQ" in the abstract. It is a more concrete architecture question:

| If your current pain is... | Shared storage helps when... | Validate carefully... |
| --- | --- | --- |
| Broker scaling takes too long | Durable data does not need to be rebuilt on every new broker. | Partition leadership, cache warm-up, and rebalance impact. |
| Storage cost grows faster than traffic | Retention can move toward object storage economics. | WAL configuration, object storage requests, and read amplification. |
| Recovery windows are operationally expensive | Brokers carry less durable local state. | Failure drills and SLO behavior under degraded dependencies. |
| Application teams fear migration work | Kafka clients and semantics remain the contract. | Real client versions, offsets, security, and ecosystem dependencies. |

A successful migration keeps the familiar parts boring. Producers produce. Consumers commit offsets. Operators inspect topics and lag. The interesting changes happen below that surface: data lands in shared storage, brokers become easier to replace, and capacity planning starts to look less like a disk-placement puzzle.

For teams evaluating this path, the most useful next step is a proof of compatibility rather than a feature checklist. Pick representative topics, keep the same client libraries, mirror enough data to test offsets and replay, and run broker failure drills before discussing production cutover.

FAQ

Does migrating from Confluent to shared storage require application rewrites?

It should not require rewriting producer or consumer logic if the target platform is genuinely Kafka-compatible and your application depends on standard Kafka APIs. You still need to validate client versions, authentication, serializers, offset behavior, Admin API usage, and surrounding services such as Schema Registry or Connect.

Are consumer offsets preserved automatically?

Not automatically in every migration pattern. Offsets are part of the consumer group state and must be handled deliberately through mirroring, translation, reset, replay, or a controlled cutover. The right method depends on whether you dual-write, mirror topics, pause traffic, or move consumers in phases.

Is shared storage the same as Kafka tiered storage?

No. Tiered storage usually keeps the broker-local log as the hot path and offloads older segments to remote storage. Kafka-compatible shared storage changes the primary storage architecture more deeply by externalizing durable log storage and making brokers less tied to local disks.

What should be tested first?

Start with the application contract: produce, consume, commit offsets, rebalance, reset offsets, administer topics, and observe lag with your actual clients. Then test the changed infrastructure model: broker replacement, WAL pressure, object storage behavior, cache efficiency, and failure recovery.

Where does AutoMQ fit in this migration category?

AutoMQ is a Kafka-compatible shared-storage implementation. It keeps Kafka-facing APIs and concepts while replacing Kafka's native log storage layer with S3Stream, which combines object storage, WAL storage, and stateless brokers to change the operational and cost model underneath Kafka.
