
Table Topic Integration with AWS Glue

AutoMQ Table Topic supports integration with Iceberg, enabling streaming data lake ingestion, analytics, and querying while eliminating the need for ETL configuration and maintenance. This article introduces how to configure Table Topic and Glue integration in an AWS environment.

Prerequisites

To use the AutoMQ Table Topic feature in an AWS environment, the following conditions must be met:

  • Version Constraints: AutoMQ instance version >= 1.4.1 is required.

  • Instance Constraints: The Table Topic feature must be enabled when the AutoMQ instance is created. It cannot be enabled after the instance has been created.

  • Resource Requirements: On AWS, Table Topic can use either AWS Glue or an Amazon S3 table bucket as the Data Catalog.

Steps for Operation

Step 1: Create Glue Catalog Integration

Before using Table Topic, you need to go to the AutoMQ console to create a Glue Catalog integration for recording catalog information. The operation instructions are as follows:

  1. Log in to the AutoMQ console and click the Integration menu.
  2. Select Create Glue Catalog Integration and provide the following information:

    1. Name: Provide a unique name for the integration configuration.

    2. Deployment Configuration: Choose the deployment configuration to which the integration belongs. It must match the deployment configuration selected later when creating the instance.

    3. Warehouse: Enter the object storage bucket that the data lake uses for long-term data storage.

  3. After you fill in the Warehouse parameter, AutoMQ generates the IAM Policy required to access the bucket and displays the IAM Role used by the AutoMQ instance. Use this Policy in your cloud provider's IAM console to grant the authorization.

  4. Once authorization is complete, click to create the Glue Catalog integration.
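
For orientation, the sketch below shows the general shape such a policy might take. It is not the policy AutoMQ generates: the statement layout, the list of actions, and the helper name `build_warehouse_policy` are assumptions chosen for illustration. Always use the Policy displayed in the AutoMQ console as the authoritative version.

```python
import json


def build_warehouse_policy(bucket: str) -> dict:
    """Build an illustrative IAM policy for a warehouse bucket plus Glue catalog access.

    Assumption: the action lists below are a plausible minimum for writing
    Iceberg tables to S3 and registering them in Glue. The policy shown in
    the AutoMQ console may differ and takes precedence.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read and write data and metadata files inside the bucket.
                "Sid": "WarehouseObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                # List the bucket itself.
                "Sid": "WarehouseBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Create and update databases and tables in the Glue Data Catalog.
                "Sid": "GlueCatalog",
                "Effect": "Allow",
                "Action": [
                    "glue:GetDatabase",
                    "glue:CreateDatabase",
                    "glue:GetTable",
                    "glue:CreateTable",
                    "glue:UpdateTable",
                ],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    # "my-warehouse-bucket" is a placeholder bucket name.
    print(json.dumps(build_warehouse_policy("my-warehouse-bucket"), indent=2))
```

Printing the result as JSON makes it easy to compare against the Policy shown in the console before attaching it to the IAM Role.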

Step 2: Create an AutoMQ Instance and Enable the Table Topic Feature

The Table Topic feature must be enabled when the instance is created; only then can data be streamed into the data lake. Follow the notes below when configuring the instance during creation:

Note:

Enabling Table Topic on the AutoMQ instance does not turn on table conversion for every Topic by default. Streaming data into the data lake must still be configured per Topic, as needed.

If you need to use Table Topic, you must enable it when creating the instance. Once the instance is created, this configuration cannot be changed.

Step 3: Create Topic and Configure Transfer Tables

After enabling the Table Topic feature in the AutoMQ instance, you can configure transfer tables as needed when creating Topics. The specific operations are as follows:

  1. Enter the instance from Step 2, go to the Topic list, and click Create Topic.

  2. In the configuration for creating a Topic, enable Table Topic conversion and configure the following parameters:

    1. Namespace: The namespace isolates different Iceberg tables, similar to a Database in the Data Catalog. Choose a value that fits your business requirements.

    2. Schema Constraints Type: Set whether Topic messages must comply with Schema constraints. If 'Schema' is selected, Schema constraints are enabled and the message Schema must be registered with the AutoMQ built-in SchemaRegistry. Subsequent messages must strictly adhere to the Schema, and Table Topic uses the Schema fields to populate the Iceberg table. If 'Schemaless' is selected, the message content has no explicit Schema constraints. In this case, the message Key and Value are each written as a single field in the Iceberg table.

  3. Click Confirm to create a Topic that supports the transfer table.
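
To make the difference between the two Schema Constraints Types concrete, here is a small conceptual sketch of how one message might map to a table row in each mode. It does not reproduce AutoMQ's implementation: the function `to_row`, the column names `key` and `value`, and the use of JSON in place of a schema registered with the SchemaRegistry are all assumptions for illustration.

```python
import json


def to_row(key: bytes, value: bytes, schema_fields=None) -> dict:
    """Map one Kafka message to a table row (conceptual illustration only).

    schema_fields=None models 'Schemaless'; a list of field names models
    'Schema'. JSON decoding stands in for a registered schema here.
    """
    if schema_fields is None:
        # Schemaless: Key and Value are stored whole, one column each.
        return {"key": key, "value": value}

    # Schema: the value is decoded and each declared field becomes a column.
    record = json.loads(value)
    missing = [name for name in schema_fields if name not in record]
    if missing:
        # Messages must strictly adhere to the schema.
        raise ValueError(f"message is missing schema fields: {missing}")
    return {name: record[name] for name in schema_fields}


if __name__ == "__main__":
    message_value = b'{"order_id": 1, "status": "paid"}'
    print(to_row(b"k1", message_value))                          # schemaless row
    print(to_row(b"k1", message_value, ["order_id", "status"]))  # schema row
```

In the Schemaless case the query engine sees opaque key and value columns, so any parsing happens at query time. In the Schema case each field is directly queryable as its own column.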

Step 4: Produce Messages and Query Iceberg Table Data in Real Time

After completing the AutoMQ instance configuration and Table Topic creation, you can test data production and query data in the Iceberg table.

  1. Click to enter the Topic details, go to the Produce Messages tab, input the test message Key and message Value, and send the message.

  2. Go to the AWS Glue Console to view the Iceberg database and tables written by AutoMQ.

  3. Click View Data and use Amazon Athena to query the tables in Glue. You will see AutoMQ converting Kafka messages into the corresponding table records in real time. You can also use other query engines for analysis and computation.
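
The same check can be scripted instead of using the console. The following is a minimal sketch that assumes the `boto3` package is installed and AWS credentials are configured. The database, table, and S3 output location in the usage example are placeholders, and the helper names `build_preview_query` and `run_athena_query` are hypothetical.

```python
import time


def build_preview_query(database: str, table: str, limit: int = 10) -> str:
    """Build a simple preview query for a table registered in Glue."""
    return f'SELECT * FROM "{database}"."{table}" LIMIT {int(limit)}'


def run_athena_query(sql: str, output_location: str, poll_seconds: float = 1.0) -> list:
    """Run a query in Athena and return the result rows as lists of strings.

    The first returned row contains the column headers.
    """
    import boto3  # imported here so the query builder works without boto3

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")

    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    return [[col.get("VarCharValue") for col in row["Data"]] for row in rows]


if __name__ == "__main__":
    # Placeholder names: replace with your namespace, Topic table, and bucket.
    sql = build_preview_query("my_namespace", "my_topic", limit=5)
    for row in run_athena_query(sql, "s3://my-athena-results/preview/"):
        print(row)
```

Running the script again after producing more messages should show the new records, which is a quick way to confirm that the Topic is being written to the Iceberg table.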