Kafka Log Pipeline at 2.2 Billion Messages a Day: How LG U+ Made Brokers Stateless with AutoMQ

Log pipelines are easy to underestimate because the first version usually looks boring. Agents collect application logs, Kafka buffers the stream, search and analytics systems consume it, and platform teams move on. Then the pipeline becomes the place production systems depend on during deployments, incidents, security analysis, and debugging. At that point, a Kafka cluster is no longer a passive buffer. It is part of the operational nervous system.

That is the useful lens for reading LG U+'s AutoMQ story. According to the public LG U+ customer case, the Korean telecommunications company runs a log pipeline that processes 2.2 billion messages per day. The pipeline connects Fluentd, a Kafka-compatible streaming layer, OpenSearch, and Sumo Logic, and it works across AWS and private network boundaries. The question behind this case is not "how do I run Kafka on ECS?" The better question is: what happens when the brokers in a large log pipeline stop being the machines that own the data?

Why Log Pipelines Expose Kafka's Stateful Operating Model

Traditional Kafka operations inherit a strong assumption: brokers are compute and storage together. A broker accepts traffic, serves reads, participates in replication, and owns local log segments. That model is battle-tested, but it makes operational change heavier than platform teams often want. Replacing a broker is not the same as replacing a stateless service instance, because the broker's identity is tied to the partitions and data it carries.

Log pipelines make that coupling visible because their traffic pattern is continuous and operationally sensitive. Logs may not be the core business transaction, but they become critical the moment something breaks. Capacity changes, rolling upgrades, and recovery workflows still have to preserve a consistent stream. When brokers hold durable local state, every infrastructure operation has to ask a storage question before it can answer a compute question.

The practical pain shows up in a few recurring places:

Broker replacement carries data context. A failed broker is not only a container or VM to restart; the cluster has to consider partition placement and log state.
Rolling updates are constrained by data ownership. Teams can automate deployment steps, but the safe pace still depends on preserving availability and catching up replicas.
Capacity changes can trigger operational work. Adding compute is easy in cloud infrastructure; redistributing broker-owned state turns a small scaling request into a planned operation.
Hybrid environments narrow the margin for improvisation. Private network access and downstream integrations make ad hoc routing changes less attractive.

This is why "stateless Kafka brokers" is more than a diagram label. Broker lifecycle starts to look like compute lifecycle again: brokers can be managed as interchangeable service instances while durable log data lives in shared storage.
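
To make the scaling point concrete, here is a toy model in Python of how much log data has to be copied when brokers are added. It is a sketch under loud assumptions: perfectly even balance across brokers, and hypothetical numbers (6 brokers, 60 TB of retained logs, 2 brokers added) that do not come from the LG U+ case. The shared-storage function returns zero because, in this simplified model, durable data is not copied between brokers; it is not a measured AutoMQ result.

```python
TB = 10**12  # decimal terabyte, used only to keep the example numbers readable


def bytes_moved_broker_local(total_bytes: int, brokers_before: int, brokers_added: int) -> int:
    """Toy model: bytes copied so that every broker ends with an equal share.

    Assumes data is evenly balanced before and after the change.
    """
    if brokers_before <= 0 or brokers_added < 0:
        raise ValueError("brokers_before must be positive and brokers_added non-negative")
    brokers_after = brokers_before + brokers_added
    # Each new broker must receive 1/brokers_after of the total retained data.
    return total_bytes * brokers_added // brokers_after


def bytes_moved_shared_storage(total_bytes: int, brokers_before: int, brokers_added: int) -> int:
    """Toy model: durable log data stays in shared storage, so none is copied between brokers."""
    return 0


# Hypothetical cluster: 6 brokers holding 60 TB in total, scaled out by 2 brokers.
moved_local = bytes_moved_broker_local(60 * TB, brokers_before=6, brokers_added=2)
moved_shared = bytes_moved_shared_storage(60 * TB, brokers_before=6, brokers_added=2)
print(f"broker-local model: {moved_local // TB} TB copied")
print(f"shared-storage model: {moved_shared // TB} TB copied")
```

In this model the broker-local cluster copies a quarter of its retained data to absorb two new brokers, which is the kind of work that turns a small scaling request into a planned operation.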

LG U+'s 2.2-Billion-Message Daily Log Pipeline

LG U+ is a major telecommunications operator, so its log pipeline supports an environment where operational visibility has to scale with customer-facing systems. The public case states that the workload processes 2.2 billion messages per day and uses Kafka as the central streaming layer between Fluentd and downstream analysis systems. Fluentd collects and forwards log data, while OpenSearch and Sumo Logic consume the stream for search, monitoring, and analytics.
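
For a sense of scale, the daily figure can be converted into an average per-second rate. The 2.2 billion figure comes from the public case; the peak-to-average ratio of 3 below is purely hypothetical, since the case does not publish LG U+'s traffic shape.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_messages_per_second(messages_per_day: int) -> float:
    """Average rate if the daily volume were spread evenly over 24 hours."""
    return messages_per_day / SECONDS_PER_DAY


def peak_messages_per_second(messages_per_day: int, peak_to_average: float) -> float:
    """Peak rate under an assumed peak-to-average ratio."""
    return average_messages_per_second(messages_per_day) * peak_to_average


DAILY_MESSAGES = 2_200_000_000   # from the public LG U+ case
ASSUMED_PEAK_TO_AVERAGE = 3.0    # hypothetical, for illustration only

average = average_messages_per_second(DAILY_MESSAGES)
peak = peak_messages_per_second(DAILY_MESSAGES, ASSUMED_PEAK_TO_AVERAGE)
print(f"average: {average:,.0f} msg/s")       # average: 25,463 msg/s
print(f"assumed peak: {peak:,.0f} msg/s")     # assumed peak: 76,389 msg/s
```

An average of roughly 25,000 messages per second is well within Kafka's range, which is why the interesting part of the case is how the pipeline is operated rather than raw throughput.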

That combination describes a familiar production shape. The log agents are already deployed, the consumers are already chosen, and Kafka's value is the compatibility layer between them. A replacement that asks the team to rewrite producers or rebuild consumers would create its own migration risk.

Public case detail	What it tells a Kafka operator
2.2 billion messages per day	Operational mechanics matter, not only raw throughput.
Fluentd integration	Existing log collection paths needed to keep working.
OpenSearch and Sumo Logic	Downstream search and observability systems depended on Kafka-compatible delivery.
Terraform and ECS deployment	Infrastructure change was expected to be repeatable.

The table stays within public facts. It does not assume LG U+'s topic count, partition count, retention policy, or private architecture details. The evidence is enough to study the architecture decision: keep the Kafka interface without making broker-local state the center of operations.

Keeping Existing Log Tools Compatible

Log pipelines rarely fail because one component cannot be improved in isolation. They fail because the edges are sticky: producers sit across application teams, consumers belong to different groups, and every integration has assumptions. A clean-room streaming system may look elegant, but the migration path is where engineering reality pushes back.

In LG U+'s case, AutoMQ's Kafka compatibility was central because the surrounding tools could remain part of the pipeline. Fluentd could continue forwarding log events into a Kafka-compatible interface, and OpenSearch and Sumo Logic could remain downstream consumers. The important change was below that contract: AutoMQ separates compute from storage, using object storage as the durable data layer.

That distinction keeps the story grounded. AutoMQ did not remove the need to design producer throughput, consumer behavior, retention, or monitoring. It changed the ownership boundary: brokers serve Kafka traffic while durable data lives outside their lifecycle.

This is also where AutoMQ differs from a narrow "tiered storage" story. Tiered storage can move older data away from broker disks, but the broker-local primary log often remains. AutoMQ's shared storage architecture is designed around stateless brokers: the Kafka-facing compute layer can change without treating local disk as the durable center.

Stateless Brokers as Interchangeable Compute Resources

The phrase "stateless broker" can sound too neat until you connect it to maintenance work. A log pipeline owner cares about what happens when a broker task is replaced, a service is updated, or capacity moves. In a broker-local Kafka model, those events can drag storage state into the change window. In a stateless model, the broker is closer to a compute worker that can leave and rejoin while the durable log remains in shared storage.

AutoMQ's stateless broker documentation describes that mechanism as storage-compute separation for Kafka-compatible workloads. S3Stream stores Kafka log data in shared storage, while brokers handle protocol serving and active compute. Operational workflows can focus more on service health and less on where historical data sits.

For LG U+, that matched the shape of a large log pipeline. A broker instance should matter because it is healthy, reachable, and useful to the running service, not because it carries yesterday's log segments. Teams still need to validate latency, failure behavior, network access, monitoring, and recovery. The benefit is that hard state moves to a durable shared layer instead of broker-local disks.

Rolling Updates, Terraform, and Operational Control

The public LG U+ case includes deployment details that are easy to overread as the headline: AWS ECS, Terraform, a private VPC environment, and ECS Circuit Breaker. They matter as supporting evidence. The broader lesson is that LG U+ treated the Kafka log pipeline as infrastructure to deploy, check, roll forward, and roll back with clear controls.

Terraform gave the team a way to define repeatable infrastructure. ECS provided the container deployment substrate. AWS documents ECS deployment circuit breaker as a mechanism that can detect failed deployments and optionally roll them back. For a log pipeline, this guardrail matters because failed deployments can interrupt the data path engineers use to understand other failures.
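
As a rough illustration of that guardrail, the sketch below builds an ECS deployment configuration with the circuit breaker and automatic rollback enabled. The field names follow the `deploymentConfiguration` structure of the ECS CreateService and UpdateService APIs. The cluster name, service name, and percentage values are hypothetical; LG U+'s actual settings are not part of the public case.

```python
import json


def deployment_configuration(rollback_on_failure: bool = True) -> dict:
    """Build an ECS deploymentConfiguration mapping with the circuit breaker enabled."""
    return {
        "deploymentCircuitBreaker": {
            "enable": True,                   # stop a deployment that cannot reach steady state
            "rollback": rollback_on_failure,  # optionally return to the last completed deployment
        },
        # Illustrative rolling-update bounds, not LG U+'s values.
        "minimumHealthyPercent": 100,
        "maximumPercent": 200,
    }


update_request = {
    "cluster": "log-pipeline-cluster",       # hypothetical name
    "service": "kafka-compatible-broker",    # hypothetical name
    "deploymentConfiguration": deployment_configuration(),
}
print(json.dumps(update_request, indent=2))
```

With boto3, a mapping like this could be passed as keyword arguments to the ECS client's `update_service` call. In Terraform, the corresponding setting is the `deployment_circuit_breaker` block on the `aws_ecs_service` resource.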

The stateless-broker architecture makes these controls more meaningful. If broker tasks are tied to local durable data, a deployment controller can restart containers but cannot remove the stateful consequences. If broker tasks are interchangeable compute resources, health checks, deployment stages, circuit breakers, and rollback policies line up better with the service model.

That alignment is the engineering story. LG U+ had a large Kafka-compatible log pipeline and needed operational clarity: integrations should keep working, brokers should be replaceable, and deployment automation should have a clean failure model. ECS and Terraform were tools in that model, not the reason the architecture is interesting.

A Checklist for Large Kafka Log Pipelines

LG U+'s workload is specific, but the checklist applies to many teams running Kafka for logs, events, or observability data. The starting point is not vendor selection. It is a map of where state lives and how much operational work that state creates.

Ask these questions before the next expansion or deployment redesign:

Which parts of the pipeline require Kafka compatibility? Identify producers, consumers, connectors, security settings, and operational tools that assume Kafka protocol behavior.
Where does durable log data live during normal operation? If the answer is broker-local disk, every lifecycle workflow has to account for broker-owned state.
What happens when a broker is replaced during peak log traffic? A good design should make the expected behavior boring enough to rehearse.
Can deployment automation detect and reverse a bad rollout? Circuit breakers and rollback policies are useful when the service architecture gives them a clear unit to replace.
Can the team test replay, catch-up, and failure inside real network boundaries? Hybrid and private VPC deployments need verification where the actual connectivity constraints exist.
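
The catch-up question in particular can be turned into numbers during a rehearsal. The sketch below computes per-partition consumer lag from end offsets and committed offsets, then estimates how long a consumer group needs to drain a backlog. The function names and sample values are hypothetical, and treating a missing committed offset as 0 is a simplifying assumption; real behavior depends on the consumer's offset reset policy.

```python
from typing import Dict, Optional


def partition_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    """Lag per partition: log end offset minus committed offset, floored at zero.

    Assumption: a partition with no committed offset is treated as committed at 0.
    """
    return {p: max(end - committed.get(p, 0), 0) for p, end in end_offsets.items()}


def estimated_catch_up_seconds(
    total_lag: int, consume_rate: float, produce_rate: float
) -> Optional[float]:
    """Seconds to drain the backlog, or None if consumers cannot outpace producers."""
    drain_rate = consume_rate - produce_rate
    if drain_rate <= 0:
        return None
    return total_lag / drain_rate


# Hypothetical snapshot taken after replacing a broker during a rehearsal.
lag = partition_lag({0: 1500, 1: 900}, {0: 1200, 1: 900})
print(lag)  # {0: 300, 1: 0}

# Hypothetical rates in messages per second.
print(estimated_catch_up_seconds(300_000, consume_rate=40_000, produce_rate=25_000))  # 20.0
```

Recording these two values before and after a broker replacement gives the team a concrete definition of "boring": lag rises briefly and drains within a known window.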

These questions are deliberately operational. Kafka log pipelines do not become reliable because an architecture diagram says "cloud-native." They become reliable when ordinary maintenance work stops being scary.

What Other Kafka Teams Can Learn from LG U+

The strongest part of the LG U+ case is its restraint. The public facts describe a practical workload: 2.2 billion log messages per day, familiar log and observability tools, hybrid cloud constraints, and infrastructure-as-code deployment. The architectural move was to keep the Kafka-facing contract while changing the broker's relationship to state.

That pattern is relevant for teams evaluating Kafka log pipelines or stateless Kafka brokers because it reframes the decision. Kafka remains a strong interface for buffering, fan-out, and compatibility across a messy ecosystem of agents and consumers. The question is whether the traditional broker-local storage model is still the right operational unit for a cloud log pipeline.

AutoMQ's answer is to preserve Kafka compatibility while separating broker compute from durable storage. In LG U+'s case, that made brokers easier to treat as replaceable infrastructure resources and allowed the deployment workflow to lean on Terraform, ECS, health checks, and rollback controls. For teams with similar pressure, the lesson is to make broker state visible in the design review and ask whether it belongs on the broker at all.

The next time a log pipeline looks healthy because dashboards are green, look at the maintenance path. If replacing a broker still feels like moving part of the data platform by hand, the pipeline is showing where the next reliability problem will come from.

FAQ

What did LG U+ use AutoMQ for?

LG U+ used AutoMQ in a Kafka-compatible log pipeline that processes 2.2 billion messages per day, according to AutoMQ's public customer case. The pipeline includes Fluentd for log collection and downstream systems such as OpenSearch and Sumo Logic.

Why are stateless Kafka brokers useful for log pipelines?

Stateless brokers reduce the coupling between broker lifecycle and durable log data. In a large log pipeline, that can make broker replacement, rolling updates, and capacity changes easier to reason about.

Did LG U+ have to replace Fluentd, OpenSearch, or Sumo Logic?

The public case presents AutoMQ as part of a Kafka-compatible pipeline with Fluentd, OpenSearch, and Sumo Logic. It does not state that LG U+ replaced those tools, so compatibility with the existing log pipeline is the safe reading.

Is the main lesson about running Kafka on AWS ECS?

No. ECS, Terraform, private VPC deployment, and ECS Circuit Breaker are supporting details. The main lesson is broader: a high-volume Kafka log pipeline becomes easier to operate when brokers are stateless and durable data is separated from broker lifecycle.

How is AutoMQ different from Kafka tiered storage?

Tiered storage usually keeps Kafka's broker-local primary log model while moving older segments to remote storage. AutoMQ uses shared storage as the durable data layer for Kafka-compatible streaming, which changes broker operations because brokers no longer need to own local durable log state.

What should teams validate before moving a Kafka log pipeline to stateless brokers?

Teams should test producer compatibility, consumer catch-up behavior, latency, object storage access, private network routing, rollback behavior, monitoring, and recovery. LG U+'s public case is a useful reference, but each production log pipeline has its own traffic shape and constraints.
