Load data using Kafka connector
StarRocks provides a self-developed connector, the Apache Kafka® connector (StarRocks Connector for Apache Kafka®), which continuously consumes messages from Kafka and loads them into StarRocks. The Kafka connector guarantees at-least-once semantics.
The Kafka connector can seamlessly integrate with Kafka Connect, which allows StarRocks to integrate more tightly with the Kafka ecosystem. It is a wise choice if you want to load real-time data into StarRocks. Compared with Routine Load, the Kafka connector is recommended in the following scenarios:
- The source data is in a format such as Protobuf, rather than JSON, CSV, or Avro.
- You need to customize data transformation, for example, for Debezium-formatted CDC data.
- You need to load data from multiple Kafka topics.
- You need to load data from Confluent Cloud.
- You need finer control over load batch sizes, parallelism, and other parameters to achieve a balance between load speed and resource utilization.
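To illustrate the kind of tuning the last point refers to, a Kafka Connect sink configuration might look like the sketch below. The `name`, `connector.class`, `tasks.max`, `topics`, and converter keys are standard Kafka Connect settings; the StarRocks-specific keys (cluster address, credentials, buffer-flush thresholds) are shown as plausible examples and should be verified against the connector documentation for your version.

```properties
name=starrocks-sink
# Standard Kafka Connect settings.
connector.class=com.starrocks.connector.kafka.StarRocksSinkConnector
tasks.max=4                      # parallelism: number of sink tasks
topics=orders,payments           # load from multiple topics
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# StarRocks-specific settings (key names assumed for illustration;
# verify against your connector version).
starrocks.http.url=fe_host:8030
starrocks.database.name=example_db
starrocks.username=user1
starrocks.password=xxxxxx
bufferflush.maxbytes=67108864    # flush once 64 MB is buffered
bufferflush.intervalms=10000     # or at least every 10 seconds
```

Raising `tasks.max` increases parallelism across topic partitions, while the buffer-flush thresholds trade load latency against the number of load requests sent to StarRocks.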
Preparations
Set up Kafka environment
Both self-managed Apache Kafka clusters and Confluent Cloud are supported.
- For a self-managed Apache Kafka cluster, make sure that you have deployed the Apache Kafka cluster and the Kafka Connect cluster, and have created topics.
- For Confluent Cloud, make sure that you have a Confluent account and have created clusters and topics.
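For a self-managed cluster, topics can be created with the `kafka-topics.sh` tool that ships with Apache Kafka. The bootstrap server address, topic name, and partition counts below are placeholders; adjust them for your environment.

```shell
# Create the topic that the load job will consume from.
kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic test_topic \
  --partitions 3 \
  --replication-factor 1

# Verify that the topic exists.
kafka-topics.sh --describe \
  --bootstrap-server localhost:9092 \
  --topic test_topic
```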
Install Kafka connector
Install the Kafka connector into the Kafka Connect cluster:
- Self-managed Kafka cluster:
- Download and unzip starrocks-kafka-connector-1.0.0.tar.gz.
- Copy the extracted directory to the libs directory of Kafka, and then restart Kafka Connect so that it picks up the new JAR files.
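The two steps above might look like the following on a Linux host. The download URL and the Kafka installation path are placeholders; substitute the actual release location for your connector version and your own directory layout.

```shell
# Download the connector release.
# (URL is illustrative; use the actual release location.)
wget https://example.com/starrocks-kafka-connector-1.0.0.tar.gz

# Unzip the archive.
tar -xzf starrocks-kafka-connector-1.0.0.tar.gz

# Copy the extracted directory to the libs directory of Kafka.
# ($KAFKA_HOME is assumed to point at your Kafka installation.)
cp -r starrocks-kafka-connector-1.0.0 "$KAFKA_HOME"/libs/

# Restart Kafka Connect so that it reads the new JAR files.
# (The exact restart command depends on how you run Kafka Connect.)
```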
- Confluent Cloud:
NOTE
The Kafka connector is not currently available on Confluent Hub. You need to upload the compressed file to Confluent Cloud.