Skip to Main Content

Kafka MirrorMaker 2(MM2): Usages & Best Practices

Overview

Kafka MirrorMaker 2 (MM2) represents a significant evolution in Kafka's cross-cluster replication capabilities. Introduced in Kafka 2.4.0 through KIP-382, MM2 addresses the limitations of its predecessor by leveraging the Kafka Connect framework to provide a robust, scalable solution for data replication across Kafka clusters. This comprehensive guide explores MM2's architecture, configuration, and best practices for optimal implementation.

Core Architecture and Components

MirrorMaker 2 is built on the Kafka Connect framework, which provides a distributed, fault-tolerant foundation for data replication. Unlike the original MirrorMaker that used a simple consumer-producer pair, MM2 employs specialized connectors to handle different aspects of the replication process.

The Four Essential Connectors

MM2's architecture consists of four primary connectors, each serving a distinct purpose in the replication workflow:

These connectors work together to create a comprehensive replication system that not only transfers data but also maintains configuration consistency and enables consumer migration between clusters.

Key Features

Automated Topic Management

MM2 automatically detects new topics and partitions in the source cluster, eliminating the need for manual intervention when adding new topics. It also synchronizes topic configurations between clusters, ensuring consistent settings across environments.

Consumer Group Offset Translation

One of MM2's most valuable features is its ability to translate consumer group offsets between clusters. This enables seamless consumer migration during failover scenarios, maintaining exactly-once semantics for consumer groups.

Flexible Replication Topologies

MM2 supports various replication topologies, including:

  • Active-passive for disaster recovery

  • Active-active for bidirectional replication

  • Hub-and-spoke for data aggregation

  • Multi-region for geographic distribution

Common Use Cases

Disaster Recovery

MM2 enables organizations to implement robust disaster recovery strategies by continuously replicating data to a backup cluster. In the event of a primary cluster failure, applications can seamlessly switch to the backup cluster with minimal disruption since MM2 maintains consumer offset mappings between clusters.

Geographic Replication

For organizations with global operations, MM2 facilitates replicating data across multiple regions, improving data locality and reducing access times for geographically distributed applications. This follows the "Best Practice: Consume from remote, produce to local" pattern as noted in the Instaclustr blog [1].

Data Isolation

MM2 allows organizations to create separate environments for different purposes such as testing, development, or compliance requirements. Each environment can have its own Kafka cluster with necessary data replicated from production.

Data Aggregation

For organizations with multiple Kafka clusters, MM2 can aggregate data into a central cluster for analytics, reporting, or other purposes, enabling centralized processing while maintaining decentralized production environments.

Implementation Guide

Configuration and Deployment

Basic Configuration Parameters

A typical MM2 configuration file includes the following essential parameters:


# Specify cluster aliases
clusters = source, destination

# Connection information
source.bootstrap.servers = source-kafka-1:9092,source-kafka-2:9092
destination.bootstrap.servers = dest-kafka-1:9092,dest-kafka-2:9092

# Enable replication flow
source->destination.enabled = true

# Topics to replicate (regex)
source->destination.topics = topic-pattern-.*

# Replication factor settings
replication.factor = 3
checkpoints.topic.replication.factor = 1
heartbeats.topic.replication.factor = 1
offset-syncs.topic.replication.factor = 1


Deployment Methods

MM2 can be deployed using several methods:

  1. Using the connect-mirror-maker.sh script included in the Kafka distribution:

./bin/connect-mirror-maker.sh ./config/mirror-maker.properties

  1. Using container orchestration platforms like Kubernetes with operators such as Strimzi that provide a KafkaMirrorMaker2 custom resource.

  2. Through managed services like Aiven or Instaclustr that offer MirrorMaker 2 as a managed service.

Performance Tuning for High Throughput

When dealing with high-throughput topics, default MM2 configurations often prove insufficient. Based on recommendations from Klarrio [2] and other sources, the following parameters require careful tuning:

For replication across high-latency networks, larger batch sizes become essential. Research shows that with a 100ms round-trip time, small batches severely limit throughput. For optimal performance over WAN connections, batch sizes should be at least 32MiB.

Client Configuration Override in Kafka Connect

Because MM2 is built on Kafka Connect, configuring client settings requires special attention:

  1. First, enable configuration overrides in Connect's

connector.client.config.override.policy=All

  1. Then use the correct prefixes for client configurations:

    • For source mode: producer.override.\* and source.consumer.\*

    • For sink mode: consumer.override.\* and sink.producer.\*

For example, to configure the consumer in source mode:


source.consumer.fetch.max.bytes=50000000
source.consumer.max.partition.fetch.bytes=50000000

Known Issues and Limitations

Offset Translation Issues

MM2 may translate offsets incorrectly if LAG shows negative on the target cluster. According to Aiven's documentation [3], this issue is expected to be resolved in version 3.3.0.

Configuration Synchronization Limitations

MM2 doesn't fully replicate all topic configurations:

  • It always sets min.insync.replicas = 1 in destination topics, regardless of source configuration

  • The replication factor of target topics may not match the source cluster's configuration

Topic Naming

By default, MM2 prefixes source cluster names to replicated topics (e.g., "source.topic-name"). While this helps prevent conflicts in complex replication topologies, it may not be desired in all cases.

Best Practices

Deployment Architecture

  • Deploy MM2 in the target data center to minimize latency for the critical consumer path [4]

  • Run MM2 in a separate cluster from your Kafka brokers to isolate resource usage

  • Allocate sufficient resources based on the volume of data being replicated

Security Configuration

  • Implement proper security credentials for both source and target clusters

  • For the source cluster, configure read-only access to protect source data

  • For Kerberos-enabled clusters, properly configure JAAS

Monitoring and Management

  • Implement JMX monitoring for MM2 using tools like Prometheus and Grafana

  • Monitor key metrics including:

    • Replication latency

    • Consumer lag

    • Worker task states

    • Task errors

Topic Management

  • Use carefully designed topic patterns to include only necessary topics

  • Consider using blacklist patterns to exclude internal topics

  • For high-priority topics, consider dedicated MM2 instances with specific topic patterns

Failover Testing

  • Regularly test failover scenarios to ensure recovery procedures work as expected

  • Validate consumer offset translation during failover tests

  • Document and automate failover procedures for operational teams

Comparison with Alternatives

Several alternatives to MM2 exist for cross-cluster replication:

Conclusion

Kafka MirrorMaker 2 significantly improved over its predecessor, offering robust functionality for cross-cluster data replication through the Kafka Connect framework. Its ability to maintain topic configurations, preserve consumer offsets, and support complex replication topologies makes it suitable for various use cases from disaster recovery to geo-replication.

While MM2 requires careful configuration and tuning for high-throughput scenarios, particularly over high-latency networks, it provides a flexible open-source solution for Kafka cluster replication. By following the best practices outlined in this guide and properly addressing known limitations, organizations can successfully implement MM2 for their Kafka replication needs.

If you find this content helpful, you might also be interested in our product AutoMQ. AutoMQ is a cloud-native alternative to Kafka by decoupling durability to S3 and EBS. 10x Cost-Effective. No Cross-AZ Traffic Cost. Autoscale in seconds. Single-digit ms latency. AutoMQ now is source code available on github. Big Companies Worldwide are Using AutoMQ. Check the following case studies to learn more:

References:

  1. Apache Kafka MirrorMaker 2 in Practice

  2. Kafka MirrorMaker Overview

  3. MirrorMaker2 Configuration Exercise

  4. Cluster Switching with Conduktor Gateway

  5. Data Migration Guide

  6. Kafka Alternatives: Understanding MirrorMaker

  7. How to Pass Apache Kafka MirrorMaker2 Config for the Producer

  8. Kafka MirrorMaker 2 Theory

  9. Known Issues - Kafka MirrorMaker

  10. Kafka MirrorMaker2 Implementation Analysis

  11. Mirror Maker 2.0 Performance Tuning & Optimisation

  12. Use MirrorMaker 2.0 with Apache Kafka - Azure HDInsight

  13. Setup Bidirectional Kafka MirrorMaker2 Replication

  14. Use Kafka MirrorMaker 2.0 with Event Hubs

  15. Mirroring High Throughput Topics with Kafka MirrorMaker 2

  16. MirrorMaker 2 Performance Tuning

  17. Mastering Kafka Migration with MirrorMaker 2

  18. Configuring Failover in Conduktor Gateway

  19. Create MirrorMaker Source Connector

  20. Kafka MirrorMaker 2.0 Guide - Azure Docs

  21. Getting Up to Speed with MirrorMaker 2

  22. Kafka Multi-cluster and MirrorMaker

  23. Create MirrorMaker Checkpoint Connector

  24. Understanding Kafka MirrorMaker 2.0

  25. Introducing MirrorMaker 2.0

  26. Scale MirrorMaker2 to Run Multiple Instances

  27. Kafka Administration Guide

  28. Create MirrorMaker Heartbeat Connector

  29. Kafka MirrorMaker2 Topics Replication Factor

  30. Use Kafka MM2 to Synchronize Data Across Clusters

  31. How to Setup a Dedicated Mirror Maker 2 Cluster

  32. Kafka Topics CLI Tutorial

  33. MirrorMaker2 Issue #3207 - Strimzi

  34. Unveiling MirrorMaker2 Principles

  35. Demystifying Kafka MirrorMaker 2

  36. MirrorMaker 2.0 Tutorial

  37. Complete Guide to Kafka MirrorMaker

  38. Optimizing Kafka Performance

  39. Kafka Users Mailing List - MirrorMaker Discussion