Table Topic Integration with AWS S3 Table

AutoMQ Table Topic writes streaming data directly into Iceberg tables in the data lake, removing the need to configure and maintain ETL pipelines. This article walks through integrating Table Topic with an AWS S3 Table Bucket in an AWS environment.

Prerequisites

To use the AutoMQ Table Topic feature in an AWS environment, the following conditions must be met:

Version Constraint: The AutoMQ instance version must be >= 1.4.1.
Instance Constraint: The Table Topic feature must be enabled when the AutoMQ instance is created; it cannot be enabled after the instance exists.
Resource Requirements: On AWS, Table Topic can use either AWS Glue or an AWS S3 Table Bucket as the Data Catalog. This article uses an S3 Table Bucket.

Operation Steps

Step 1: Create an S3 Table Bucket

To integrate AutoMQ with an S3 Table Bucket, first create a Table Bucket in the AWS S3 console. Create it in the same AWS region as the AutoMQ deployment. This Table Bucket is referenced when configuring the integration in Step 2.
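The same-region requirement can be checked before moving on. The sketch below is a minimal illustration, assuming the Table Bucket ARN follows the layout `arn:aws:s3tables:<region>:<account-id>:bucket/<bucket-name>`; the function names and the sample ARN are hypothetical and not part of AutoMQ or AWS tooling.

```python
import re

# Assumed ARN layout for an S3 Table Bucket:
#   arn:aws:s3tables:<region>:<account-id>:bucket/<bucket-name>
# Bucket names are assumed to be 3-63 characters of lowercase letters,
# digits, and hyphens, starting and ending with a letter or digit.
_TABLE_BUCKET_ARN = re.compile(
    r"^arn:aws[a-z-]*:s3tables:(?P<region>[a-z0-9-]+):"
    r"(?P<account>\d{12}):bucket/(?P<name>[a-z0-9][a-z0-9-]{1,61}[a-z0-9])$"
)


def table_bucket_region(arn: str) -> str:
    """Return the region encoded in a Table Bucket ARN, or raise ValueError."""
    match = _TABLE_BUCKET_ARN.match(arn)
    if match is None:
        raise ValueError(f"not an S3 Table Bucket ARN: {arn!r}")
    return match.group("region")


def same_region(arn: str, automq_region: str) -> bool:
    """True if the Table Bucket lives in the AutoMQ deployment region."""
    return table_bucket_region(arn) == automq_region


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    sample = "arn:aws:s3tables:us-east-1:123456789012:bucket/automq-lake"
    print(same_region(sample, "us-east-1"))
```

Running this check against the ARN shown in the S3 console catches a region mismatch before the integration is created.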

Step 2: Create S3 Table Catalog Integration

Before using Table Topic, create an S3Table Catalog integration in the AutoMQ console to record the Catalog information:

Select Create S3Table Catalog Integration and fill in the following information:
1. Name: Enter a unique name for the integration configuration.
2. Deployment Configuration: Choose the deployment configuration that the integration belongs to. Instances created later must use the same deployment configuration.
3. Warehouse: Enter the S3 Table Bucket that stores the data lake's table data. This bucket is intended for long-term data storage.

After you enter the Warehouse parameter, AutoMQ generates the IAM Policy required to access the bucket and displays the IAM Role used by the AutoMQ instance. In the AWS IAM console, grant that role access using the generated policy as a reference.
Once the authorization is in place, you can finish creating the S3Table Catalog integration.
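For orientation, the sketch below shows the general shape such a policy might take. It is not the policy AutoMQ generates: the action list is an assumption about which `s3tables` permissions a table writer typically needs, and the function name and sample ARN are hypothetical. Always use the policy displayed in the AutoMQ console as the source of truth.

```python
import json


def table_bucket_policy(table_bucket_arn: str) -> dict:
    """Build an illustrative IAM policy document for one Table Bucket.

    The actions below are an assumed set for creating namespaces and tables
    and for reading and writing table data; the policy shown in the AutoMQ
    console may differ.
    """
    actions = [
        "s3tables:GetTableBucket",
        "s3tables:ListNamespaces",
        "s3tables:CreateNamespace",
        "s3tables:GetNamespace",
        "s3tables:ListTables",
        "s3tables:CreateTable",
        "s3tables:GetTable",
        "s3tables:GetTableMetadataLocation",
        "s3tables:UpdateTableMetadataLocation",
        "s3tables:GetTableData",
        "s3tables:PutTableData",
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AutoMQTableTopicSketch",
                "Effect": "Allow",
                "Action": actions,
                # The bucket itself plus every table inside it.
                "Resource": [table_bucket_arn, f"{table_bucket_arn}/table/*"],
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical ARN, for illustration only.
    arn = "arn:aws:s3tables:us-east-1:123456789012:bucket/automq-lake"
    print(json.dumps(table_bucket_policy(arn), indent=2))
```

Comparing the generated policy against a sketch like this can help when reviewing which resources the IAM Role is allowed to touch.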

Step 3: Create an AutoMQ Instance and Enable the Table Topic Feature

To stream data into the lake, the Table Topic feature must be enabled when the instance is created, so turn it on in the instance configuration during creation.

Note:

Enabling Table Topic on an AutoMQ instance does not turn every Topic into a stream table. Each Topic must still be configured individually to stream its data into the lake.

The Table Topic setting is fixed at creation time. Once the instance is created, it cannot be changed.

Step 4: Create Topic and Configure Stream Table

Once Table Topic is enabled on the AutoMQ instance, you can configure stream tables when creating a Topic. The steps are as follows:

Open the instance created in Step 3, find the Topic list, and click Create Topic.
In the Topic creation configuration, enable Table Topic conversion and configure the following parameters:
1. Namespace: Isolates different Iceberg tables and corresponds to the Database in the Data Catalog. It is recommended to set this parameter according to the business domain the Topic belongs to.
2. Schema Constraint Type: Specifies whether Topic messages must conform to a schema. Select Schema to enforce schema constraints: message schemas must be registered with the AutoMQ built-in SchemaRegistry, every message sent afterward must strictly conform to the registered schema, and Table Topic uses the schema's fields to populate the Iceberg table. Select Schemaless if messages have no explicit schema; in that case the message Key and Value together populate the Iceberg table.

Click Confirm to create a Topic that supports streaming to tables.
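The two console parameters above can also be thought of as topic-level settings. The sketch below validates them and assembles a configuration map; the key names (`automq.table.topic.enable`, `automq.table.topic.namespace`, `automq.table.topic.schema.type`) are assumptions based on AutoMQ's topic configuration naming and should be verified against the documentation for your AutoMQ version. The function name is hypothetical.

```python
from typing import Dict

# The two Schema Constraint Type choices described above.
VALID_SCHEMA_TYPES = {"schema", "schemaless"}


def table_topic_configs(namespace: str, schema_type: str) -> Dict[str, str]:
    """Assemble topic-level settings for a stream table.

    Key names are assumed, not confirmed by this article; the console
    form remains the authoritative way to configure a Topic.
    """
    if not namespace:
        raise ValueError("namespace must not be empty")
    if schema_type not in VALID_SCHEMA_TYPES:
        raise ValueError(f"schema_type must be one of {sorted(VALID_SCHEMA_TYPES)}")
    return {
        "automq.table.topic.enable": "true",
        "automq.table.topic.namespace": namespace,
        "automq.table.topic.schema.type": schema_type,
    }


if __name__ == "__main__":
    # Hypothetical namespace named after a business domain.
    print(table_topic_configs("orders_domain", "schemaless"))
```

Validating the schema type up front mirrors the console's two-option selector and avoids creating a Topic with an unsupported value.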

Step 5: Produce Messages and Query Iceberg Table Data in Real-Time

After configuring the AutoMQ instance and creating Table Topics, you can test data production and query data in the Iceberg table.

Open the Topic details page, go to the Produce Messages tab, enter a test message Key and Value, and send the message.
Open the AWS S3 console to view the Iceberg database and tables written by AutoMQ.

Click Query Table from Athena to query the table data in the Table Bucket with AWS Athena. You can observe how AutoMQ converts Kafka messages into table records in real time. Other query engines can also be used for analysis and computation.
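The same query can be issued programmatically. The sketch below is a minimal illustration using the boto3 Athena client, assuming the Table Bucket is exposed to Athena as a catalog named `s3tablescatalog/<bucket-name>` and that the Topic's namespace appears as the database. The helper names, bucket name, and output location are hypothetical.

```python
import re

# Namespace and table names are assumed to use lowercase letters,
# digits, and underscores only.
_IDENTIFIER = re.compile(r"^[a-z0-9_]+$")


def athena_catalog(table_bucket_name: str) -> str:
    """Catalog name under which Athena is assumed to expose the Table Bucket."""
    return f"s3tablescatalog/{table_bucket_name}"


def athena_preview_query(namespace: str, table: str, limit: int = 10) -> str:
    """Build a SELECT that previews the first rows of an Iceberg table."""
    for name in (namespace, table):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    return f'SELECT * FROM "{namespace}"."{table}" LIMIT {int(limit)}'


def run_preview(bucket: str, namespace: str, table: str,
                output_s3_uri: str, region: str) -> str:
    """Start the preview query in Athena and return its execution ID."""
    import boto3  # imported lazily so the helpers above work without it

    athena = boto3.client("athena", region_name=region)
    response = athena.start_query_execution(
        QueryString=athena_preview_query(namespace, table),
        QueryExecutionContext={
            "Catalog": athena_catalog(bucket),
            "Database": namespace,
        },
        ResultConfiguration={"OutputLocation": output_s3_uri},
    )
    return response["QueryExecutionId"]


if __name__ == "__main__":
    print(athena_preview_query("orders_domain", "orders"))
```

`run_preview` needs AWS credentials and an S3 location for query results; the two builder functions have no external dependencies and can be used on their own to compose the query text.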
