Kafka on Ceph: Using S3-Compatible Storage for Private Cloud Kafka

Private cloud Kafka teams often meet Ceph through a practical infrastructure question: can the storage platform already used by OpenShift and internal services also support Kafka? The answer depends less on the word "Ceph" than on which Ceph interface Kafka is touching. Ceph block volumes, Ceph RGW object storage, and a Kafka-compatible shared-storage layer built on S3-compatible APIs are three different architectures with different failure behavior.

That distinction matters because Kafka is not a generic stateful application. Broker-local logs sit on the produce path, influence follower replication, determine recovery time, and shape how operators scale clusters. A Ceph-backed PersistentVolumeClaim can make broker storage easier to provision in Kubernetes, but it does not turn Kafka into a stateless service. Ceph RGW can provide S3-compatible object storage, but Apache Kafka does not become shared-storage Kafka merely because an object endpoint exists. A storage-compute separated system must redesign the log ownership model around shared storage, fencing, caching, and metadata.

[Figure: Kafka and Ceph architecture options]

Three Patterns Hidden Behind "Kafka on Ceph"

The first pattern is the most common in OpenShift and private Kubernetes environments: each Kafka broker receives a Ceph RBD-backed block volume through a storage class. Kafka still writes its local log directories. Replication still happens through Kafka ISR. Ceph is underneath the volume, not part of Kafka's log protocol.

The second pattern uses Ceph RGW as S3-compatible object storage. This is relevant for Kafka tiered storage, backup pipelines, data lake exports, or any Kafka-adjacent component that can write objects. Apache Kafka tiered storage, as defined in KIP-405, offloads completed log segments to a remote tier while keeping local storage for active data. That can reduce local retention pressure, but broker disks remain operationally important.

The third pattern is shared-storage Kafka. In this model, durable log data is externalized into a shared storage layer, and brokers behave more like replaceable compute nodes. AutoMQ fits this category: it is Kafka-compatible at the protocol and ecosystem layer, while its storage layer is designed around shared object storage. In a private cloud, Ceph RGW can be evaluated as the S3-compatible object storage candidate for that shared layer.

The patterns sound adjacent because they can all include Ceph. They are not interchangeable:

| Pattern | Ceph interface | Kafka storage model | What changes most |
| --- | --- | --- | --- |
| Broker local logs on Ceph block | RBD / CSI block volume | Shared-nothing Kafka with broker-local replicas | Kubernetes volume provisioning and disk operations |
| Tiered or adjacent object storage | RGW S3-compatible API | Kafka local hot tier plus remote object tier | Retention and historical reads |
| Kafka-compatible shared storage | RGW as object storage candidate | Durable log state externalized from brokers | Broker replacement, scaling, and storage ownership |

Architects should decide which question they are solving before selecting a pattern. "We need OpenShift to provision disks for brokers" points to Ceph block. "We need longer retention without sizing every broker for cold data" points to tiered storage or object offload. "We need brokers to scale and recover without moving partition data between local disks" points to shared storage.

Ceph Block Volumes for Broker Local Logs

Running Kafka broker logs on Ceph block volumes is conceptually straightforward. OpenShift schedules broker pods. The CSI driver attaches persistent block volumes and mounts them into the pods. Kafka initializes its log directories and writes segments as it would on any mounted disk. Existing Kafka tools and operational mental models remain familiar.

This familiarity is the main advantage. Teams can keep Apache Kafka's replication model, topic configuration, controller behavior, and consumer semantics. Storage teams can expose standardized classes with replication, encryption, snapshots, and capacity policies. Platform teams can automate broker replacement because a pod can reattach its volume, subject to Kubernetes and storage-layer constraints.

The tradeoff is that the Kafka broker is still stateful. A partition replica is still a broker-local asset, even if the block device is network-backed. Rebalancing partitions still means copying Kafka data through Kafka replication. A broker volume issue still creates Kafka replica recovery work. Scaling storage independently from compute remains hard because the volume is sized and attached to a broker identity.

Performance analysis also changes. Kafka's local log path rewards predictable latency, sequential throughput, page cache efficiency, and fast fsync behavior. A Ceph RBD volume adds a networked storage path between the broker and OSDs. That path can be excellent when the Ceph cluster, network, and client tuning are designed for it, but it is not the same failure or latency profile as local NVMe. Kafka teams should test produce latency, follower fetch lag, controller operations during broker restart, and consumer catch-up under the same noisy-neighbor conditions expected in production.

Failure domains need special care. Kafka may place replicas across racks, zones, or worker pools. Ceph may replicate blocks across OSDs, hosts, racks, or rooms according to CRUSH rules. If those two placement systems are planned independently, the platform can accidentally concentrate risk. For example, Kafka may believe three replicas are rack-diverse while their underlying Ceph replicas share a power domain or top-of-rack network segment. The safe design is to align Kubernetes topology, Kafka broker placement, Ceph CRUSH hierarchy, and network redundancy as one capacity model.
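The correlation risk described above can be checked mechanically once both placement views are written down. The following is a minimal sketch, assuming each broker's volume can be labeled with the broadest failure domain it ultimately depends on. That is a simplification, since an RBD image's objects are spread across many OSDs. The function name, the partition names, and the `power-*` labels are hypothetical.

```python
from collections import defaultdict


def correlated_partitions(partition_replicas, broker_domain):
    """Return partitions whose replicas share an underlying failure domain.

    partition_replicas: {partition: [broker_id, ...]}
    broker_domain: {broker_id: label of the storage-side failure domain}
    Both mappings are illustrative inputs, not something Kafka or Ceph exports
    in this shape.
    """
    flagged = {}
    for partition, brokers in partition_replicas.items():
        by_domain = defaultdict(list)
        for broker in brokers:
            by_domain[broker_domain[broker]].append(broker)
        # A domain that backs more than one replica is a correlated risk.
        shared = {d: bs for d, bs in by_domain.items() if len(bs) > 1}
        if shared:
            flagged[partition] = shared
    return flagged


# Hypothetical inventory: Kafka sees four distinct brokers, but brokers 1 and 2
# sit on storage that shares one power domain.
partition_replicas = {"orders-0": [1, 2, 3], "orders-1": [2, 3, 4]}
broker_domain = {1: "power-A", 2: "power-A", 3: "power-B", 4: "power-C"}

result = correlated_partitions(partition_replicas, broker_domain)
```

In this made-up inventory only `orders-0` is flagged, because two of its replicas depend on `power-A`. The hard part in practice is producing an honest `broker_domain` mapping from the CRUSH hierarchy and the physical layout.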

Ceph RGW as Object Storage for Kafka-Adjacent Data

Ceph RGW exposes object storage with an S3-compatible API. For private cloud teams, this is attractive because it offers an internal object endpoint without sending Kafka data to a public cloud service. It also gives platform teams a common API for backups, analytics exports, tiered data, and cloud-native services running on OpenShift.

For Apache Kafka tiered storage, the key point is that the object store usually receives closed segments rather than replacing the hot local log path. KIP-405 describes remote log storage as a way to move older data to a remote tier while local storage continues to serve latency-sensitive tail reads. That architectural boundary is important when sizing brokers: the object store can reduce long-retention pressure, but it does not remove the need for local disk throughput, local retention headroom, and replica recovery planning.
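A rough sizing calculation makes that boundary concrete. The sketch below assumes a steady ingress rate, that every replica keeps the local retention window on broker disks (the window controlled by the `local.retention.ms` topic setting), and that the remote tier holds roughly one logical copy of the full `retention.ms` window. The function name, the 30% headroom, and all input numbers are illustrative assumptions, not recommendations.

```python
def tiered_storage_sizing(ingress_mb_s, replication_factor,
                          local_retention_h, total_retention_h,
                          headroom=0.3):
    """Estimate storage footprints in MB under simple steady-state assumptions."""
    local_seconds = local_retention_h * 3600
    total_seconds = total_retention_h * 3600

    # Local tier: every replica keeps the hot window on broker disks.
    local_mb = ingress_mb_s * local_seconds * replication_factor
    # Remote tier: one logical copy of the full retention window. Object-store
    # replication or erasure-coding overhead is extra and not counted here.
    remote_mb = ingress_mb_s * total_seconds
    # Baseline without tiering: every replica keeps the full window locally.
    untiered_local_mb = ingress_mb_s * total_seconds * replication_factor

    return {
        "local_mb": local_mb,
        "local_mb_with_headroom": local_mb * (1 + headroom),
        "remote_mb": remote_mb,
        "untiered_local_mb": untiered_local_mb,
    }


# Hypothetical workload: 50 MB/s ingress, RF 3, 6 h local window, 7 d total.
plan = tiered_storage_sizing(
    ingress_mb_s=50,
    replication_factor=3,
    local_retention_h=6,
    total_retention_h=168,
)
```

With these inputs the local tier shrinks from about 90.7 TB to about 3.2 TB before headroom, yet it does not drop to zero. That remaining local footprint still has to absorb the full produce and replication throughput.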

Object gateway performance depends on a different set of bottlenecks than block volumes. The RGW layer must handle S3 requests, authentication, bucket namespace operations, object metadata, and object payload movement into the Ceph cluster. Kafka-like workloads can create many objects, frequent reads during backfill, and bursts when retention or compaction policies shift data. Operators should measure request rates, object size distribution, gateway CPU, gateway horizontal scaling, bucket index behavior, and the latency seen by the brokers or Kafka-adjacent components that issue the S3 requests.

Network topology can dominate the outcome. A broker writing to a local block volume over one storage network is not equivalent to a broker uploading objects through RGW over an application network, through load balancers, and then into OSD networks. The gateway becomes an explicit part of the data path. In private clouds, that may be a benefit because teams can isolate networks and keep traffic internal, but it also means the architecture needs clear bandwidth reservations and observability across brokers, RGW pods or nodes, monitors, managers, OSDs, and client-facing endpoints.

Operational complexity is also different. The Kafka team owns topic retention, broker health, and consumer impact. The Ceph team owns pools, placement groups, OSD capacity, CRUSH policy, RGW scaling, bucket lifecycle, and cluster recovery. The OpenShift team owns scheduling, storage classes, node labels, ingress or service routing, and certificate management. Kafka on Ceph works best when these teams share SLOs instead of handing incidents across a thin API boundary.

Ceph as a Shared-Storage Backend Candidate

Shared-storage Kafka is the pattern that changes the most. Instead of treating Ceph as a virtual disk for each broker, the Kafka-compatible system treats object storage as the durable substrate for log data. Brokers process Kafka protocol requests, cache hot data, and coordinate leadership, but persistent data is designed to be accessible through shared storage under strict ownership control.

This is where AutoMQ enters the architecture discussion naturally. AutoMQ is a Kafka-compatible shared-storage layer that separates compute from storage and uses S3-compatible object storage as its durable storage foundation. In a private cloud, Ceph RGW can be evaluated as one candidate S3-compatible layer for AutoMQ, alongside other object storage systems that meet the required latency, throughput, consistency, and operational expectations.

The evaluation is not "does Ceph support S3?" It is more specific:

  • Can the RGW endpoint sustain the write and read patterns produced by the storage layer, including object creation, fetch, metadata access, and retries?
  • Are object durability, bucket configuration, lifecycle rules, authentication, TLS, and endpoint availability compatible with the Kafka availability target?
  • Does the storage network have enough headroom for produce traffic, catch-up reads, recovery, and background compaction without starving client traffic?
  • Can the platform isolate failure domains so broker compute, gateway nodes, and OSD placement do not collapse under the same incident?
  • Are metrics, logs, and alerts available at the Kafka layer, object gateway layer, and Ceph cluster layer?
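The network headroom question in the list above reduces to a simple budget once peak rates are estimated. This is a minimal sketch, assuming all listed flows can peak at the same time on one shared link. The function name, the 20% reserve, and the per-flow figures are hypothetical placeholders to be replaced with measured values.

```python
def network_headroom(link_gbps, flows_gbps, reserve_fraction=0.2):
    """Compare summed peak demand against a link with a safety reserve."""
    usable = link_gbps * (1 - reserve_fraction)
    demand = sum(flows_gbps.values())
    return {
        "usable_gbps": usable,
        "demand_gbps": demand,
        "headroom_gbps": usable - demand,
        "fits": demand <= usable,
    }


# Hypothetical peak estimates for a 25 Gbps storage link.
flows = {
    "produce_upload": 6,
    "catch_up_reads": 4,
    "recovery": 5,
    "background_compaction": 2,
}
budget = network_headroom(link_gbps=25, flows_gbps=flows)
```

Summing peaks is deliberately pessimistic. The point is to see whether recovery traffic alone can push the link past its reserve, which is the situation where client traffic starts to starve.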

The benefit, when the storage substrate is proven, is architectural. Broker replacement no longer has to mean rebuilding the authoritative log from a failed machine. Scaling compute no longer has to be coupled to expanding broker-local disks. Partition movement can become a metadata and ownership operation rather than a bulk data copy. For private cloud architects, that is the reason to consider a Kafka-compatible shared-storage layer instead of stopping at RBD-backed broker volumes.

Failure Domain Planning

Ceph has its own placement intelligence through CRUSH. Kafka has its own placement logic through partition assignment, rack awareness, and replica sets. OpenShift adds node pools, zones, taints, topology spread constraints, and storage class behavior. A reliable Kafka-on-Ceph design aligns all three views.

[Figure: Ceph failure domain planning for Kafka]

Start from the blast radius you are willing to tolerate. If a rack switch fails, should Kafka remain available? If one group of RGW instances fails, should object-backed reads or writes continue through another gateway? If a Ceph host is drained, should Kafka broker latency remain inside the target envelope? These questions force the design to name failure domains explicitly rather than assuming replication at multiple layers automatically composes into availability.

For block-backed Kafka, check whether Kafka replica placement and Ceph replica placement are correlated. For RGW-backed object storage, check gateway load balancing, multisite or zone design where applicable, bucket placement, and OSD recovery behavior. For shared-storage Kafka, add the storage-layer metadata and fencing path: a broker failover is safe only if stale writers are prevented and readers can discover the authoritative log position.
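The fencing requirement can be illustrated with a generic epoch check. This is an in-memory sketch of the general technique, not AutoMQ's or Apache Kafka's implementation. The class and method names are hypothetical, and a real system would need the epoch comparison to be atomic at the metadata or storage layer.

```python
class StaleWriterError(Exception):
    """Raised when a writer holding an old epoch tries to append."""


class FencedLog:
    """In-memory stand-in for a shared log that rejects stale writers."""

    def __init__(self):
        self.epoch = 0
        self.records = []

    def acquire(self):
        """A new owner bumps the epoch, which fences every earlier owner."""
        self.epoch += 1
        return self.epoch

    def append(self, writer_epoch, record):
        """Append only if the caller still holds the current epoch."""
        if writer_epoch != self.epoch:
            raise StaleWriterError(
                f"epoch {writer_epoch} is fenced; current epoch is {self.epoch}"
            )
        self.records.append(record)
        return len(self.records) - 1  # offset of the appended record

    def end_offset(self):
        """Authoritative log position that readers can discover."""
        return len(self.records)


log = FencedLog()
old_owner = log.acquire()
log.append(old_owner, b"a")

new_owner = log.acquire()  # failover: ownership moves to another broker
try:
    log.append(old_owner, b"late write")
    stale_write_rejected = False
except StaleWriterError:
    stale_write_rejected = True

log.append(new_owner, b"b")
```

The late write from the old owner is refused, so the log contains only records from whoever held the current epoch at write time. When evaluating a shared-storage system, the question is where this check lives and what guarantees the object store and metadata path must provide for it to hold.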

The most dangerous design is the one that counts replicas without mapping dependencies. Three Kafka replicas on three worker nodes can still depend on the same storage network. A replicated Ceph pool can still experience degraded performance during OSD recovery. Multiple RGW endpoints can still share a load balancer or certificate path. Availability planning must draw the dependency graph, not only the component count.
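Drawing the dependency graph can start as a small script. The sketch below assumes dependencies are recorded as a plain adjacency mapping and reports everything that all replicas transitively depend on. The node names are hypothetical.

```python
def transitive_deps(graph, node):
    """Collect every dependency reachable from node (excluding node itself)."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for dep in graph.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def shared_dependencies(graph, replicas):
    """Return dependencies common to all replicas: candidate single points of failure."""
    dep_sets = [transitive_deps(graph, r) for r in replicas]
    return set.intersection(*dep_sets) if dep_sets else set()


# Hypothetical layout: three brokers on three workers and racks,
# all attached to the same storage network.
graph = {
    "broker-1": ["worker-1", "storage-net"],
    "broker-2": ["worker-2", "storage-net"],
    "broker-3": ["worker-3", "storage-net"],
    "worker-1": ["rack-a"],
    "worker-2": ["rack-b"],
    "worker-3": ["rack-c"],
    "storage-net": ["tor-switch-1"],
}

shared = shared_dependencies(graph, ["broker-1", "broker-2", "broker-3"])
```

Here the replica count looks healthy, but the intersection contains `storage-net` and `tor-switch-1`. The same approach extends to load balancers, certificate authorities, and RGW endpoints once they are added as nodes.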

Performance and Network Questions to Ask Early

Kafka performance testing on Ceph should be workload-shaped. A benchmark that only measures sequential write throughput on an empty volume does not represent a broker under producer bursts, follower replication, consumer catch-up, compaction, controller events, and storage recovery. Likewise, an object-store benchmark that only uploads large objects may miss the metadata and request-rate behavior of Kafka-adjacent storage.

The evaluation should include:

  • Tail latency for produce acknowledgments under steady load and during Ceph recovery.
  • Follower fetch lag and ISR stability when one broker or one storage node is degraded.
  • Consumer catch-up reads from historical data, especially when many consumers replay at once.
  • RGW request latency, error rates, throttling behavior, and gateway saturation.
  • Cross-zone or cross-rack traffic paths between brokers, gateways, and OSDs.
  • Operational events such as broker restart, pod rescheduling, OSD backfill, gateway rollout, certificate rotation, and bucket permission changes.

These tests are not bureaucracy. They reveal whether the architecture has hidden coupling. If a Ceph recovery event causes Kafka ISR churn, the block-volume pattern needs better isolation or tuning. If RGW saturation slows historical reads but not hot writes, tiered storage may still be acceptable with clear SLOs. If shared-storage metadata operations depend on an overloaded object path, the platform needs capacity isolation before production.
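One way to turn such a test into a pass or fail signal is to compare tail latency between a steady run and a run taken during a storage event. The sketch below uses a nearest-rank percentile and a ratio threshold. The function names, the 2x budget, and the synthetic samples are assumptions for illustration, not measured results.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def tail_latency_check(baseline_ms, degraded_ms, p=99, max_ratio=2.0):
    """Compare a tail percentile between a steady run and a degraded run."""
    base = percentile(baseline_ms, p)
    degraded = percentile(degraded_ms, p)
    return {
        "baseline_ms": base,
        "degraded_ms": degraded,
        "ratio": degraded / base,
        "within_budget": degraded <= base * max_ratio,
    }


# Synthetic samples: produce-ack latencies in ms during steady load, and the
# same workload during a simulated recovery where every request is 3x slower.
steady = list(range(1, 101))
during_recovery = [x * 3 for x in steady]

report = tail_latency_check(steady, during_recovery)
```

With these synthetic inputs the check fails, because p99 triples against a 2x budget. The threshold itself should come from the SLOs agreed between the Kafka, Ceph, and OpenShift teams.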

Choosing the Right Kafka-on-Ceph Pattern

[Figure: Ceph Kafka evaluation matrix]

Use Ceph block volumes when the priority is to run conventional Kafka on OpenShift with familiar broker-local logs and centralized storage operations. This is often the lowest-change path for teams that already operate Kafka and want Kubernetes-native provisioning. Treat it as stateful Kafka on networked block storage, not as storage-compute separation.

Use Ceph RGW with Kafka tiered storage or adjacent pipelines when the priority is long retention, internal object storage, and historical replay economics. This can be valuable for audit, observability, and analytics workloads. Keep the hot path sizing honest: local broker disks, page cache, and Kafka replication still matter.

Evaluate AutoMQ with Ceph RGW when the goal is Kafka-compatible shared storage in a private cloud. This path is most relevant when broker elasticity, faster replacement, storage-compute separation, and object-storage economics are strategic requirements. The decision should be backed by tests of the Ceph S3-compatible layer under the expected Kafka workload, not by a generic object-store checklist.

The architecture choice is therefore not "Kafka on Ceph: yes or no." It is a choice of where Kafka's durable log ownership lives. If ownership remains inside broker-local log directories, Ceph is providing infrastructure storage. If ownership moves into a Kafka-compatible shared storage layer, Ceph RGW becomes part of the streaming storage substrate. That is a deeper design decision, and it deserves a deeper evaluation.

FAQ

Can Kafka use Ceph block storage for broker logs?

Yes, Kafka can run with broker log directories placed on Ceph-backed block volumes when the Kubernetes, storage, and Kafka configurations are designed for that workload. The important caveat is architectural: Kafka still behaves as broker-local, shared-nothing Kafka. Ceph changes how volumes are provisioned and protected, but Kafka replicas, ISR behavior, reassignment, and broker recovery remain part of the Kafka operating model.

Is Ceph RGW enough to make Kafka stateless?

No. Ceph RGW provides an S3-compatible object API, but stateless Kafka requires a storage layer that is designed to externalize durable log state, coordinate ownership, fence stale writers, and serve reads correctly through broker changes. Object storage is a substrate, not the complete architecture.

When should a private cloud team evaluate AutoMQ with Ceph?

Evaluate AutoMQ with Ceph RGW when the platform goal is Kafka protocol compatibility plus shared-storage architecture in a private cloud. The strongest signals are frequent broker scaling, expensive partition movement, strict recovery objectives, and a desire to use an internal S3-compatible object storage layer. The evaluation should include RGW performance, network isolation, failure-domain mapping, and operational ownership across Kafka, Ceph, and OpenShift teams.