Spark connector

User guide:

Naming format of the JAR file: starrocks-spark-connector-${spark_version}_${scala_version}-${connector_version}.jar
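To illustrate the naming format, the following Python sketch builds a JAR file name from its three version components and parses a name back into them. The helper names are hypothetical and are not part of the connector. The sketch assumes that the Spark version is given in major.minor form (for example, 3.4), as in the version requirements table.

```python
import re

# Mirrors: starrocks-spark-connector-${spark_version}_${scala_version}-${connector_version}.jar
_JAR_PATTERN = re.compile(
    r"^starrocks-spark-connector-(?P<spark>[^_]+)_(?P<scala>[^-]+)-(?P<connector>.+)\.jar$"
)


def connector_jar_name(spark_version: str, scala_version: str, connector_version: str) -> str:
    """Build a JAR file name that follows the documented naming format (hypothetical helper)."""
    return f"starrocks-spark-connector-{spark_version}_{scala_version}-{connector_version}.jar"


def parse_connector_jar_name(name: str) -> tuple:
    """Split a JAR file name back into (spark_version, scala_version, connector_version)."""
    match = _JAR_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a Spark connector JAR name: {name!r}")
    return match.group("spark"), match.group("scala"), match.group("connector")


name = connector_jar_name("3.4", "2.12", "1.1.1")
print(name)                            # starrocks-spark-connector-3.4_2.12-1.1.1.jar
print(parse_connector_jar_name(name))  # ('3.4', '2.12', '1.1.1')
```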

Methods to obtain the JAR file:

  • Directly download the Spark connector JAR file from the Maven Central Repository.
  • Add the Spark connector as a dependency in your Maven project's pom.xml file and download it. For specific instructions, see the user guide.
  • Compile the source code into a Spark connector JAR file. For specific instructions, see the user guide.

Version requirements:

Spark connector | Spark            | StarRocks     | Java | Scala
1.1.1           | 3.2, 3.3, or 3.4 | 2.5 and later | 8    | 2.12
1.1.0           | 3.2, 3.3, or 3.4 | 2.5 and later | 8    | 2.12
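The compatibility rules in the table can be expressed as a simple lookup. The sketch below is a hypothetical helper, not part of the connector: it transcribes the two table rows, compares Spark and Scala versions by their major.minor prefix, treats "2.5 and later" as a minimum StarRocks version, and does not check the Java version.

```python
# Hypothetical compatibility helper; rows transcribed from the version requirements table.
REQUIREMENTS = {
    "1.1.1": {"spark": {"3.2", "3.3", "3.4"}, "min_starrocks": (2, 5), "scala": "2.12"},
    "1.1.0": {"spark": {"3.2", "3.3", "3.4"}, "min_starrocks": (2, 5), "scala": "2.12"},
}


def major_minor(version: str) -> str:
    """Return the 'major.minor' prefix of a dotted version string, e.g. '3.4.1' -> '3.4'."""
    return ".".join(version.split(".")[:2])


def is_supported(connector: str, spark: str, starrocks: str, scala: str) -> bool:
    req = REQUIREMENTS.get(connector)
    if req is None:
        return False  # connector version not listed in the table
    # Compare StarRocks as a (major, minor) tuple; pad a bare major version with 0.
    parts = [int(p) for p in starrocks.split(".")[:2]]
    sr_version = tuple(parts + [0] * (2 - len(parts)))
    return (
        major_minor(spark) in req["spark"]
        and sr_version >= req["min_starrocks"]
        and major_minor(scala) == req["scala"]
    )


print(is_supported("1.1.1", "3.4.1", "3.1.2", "2.12.17"))  # True
print(is_supported("1.1.1", "3.1.3", "3.1.2", "2.12.17"))  # False: Spark 3.1 is not listed
```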

Release notes

1.1

1.1.1

This release mainly includes some features and improvements for loading data to StarRocks.

NOTICE

Upgrading the Spark connector to this version involves some changes that you should take note of. For details, see Upgrade Spark connector.

Features

  • The sink supports retrying. #61
  • Support loading data into BITMAP and HLL columns. #67
  • Support loading ARRAY-type data. #74
  • Support flushing data based on the number of buffered rows. #78
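Retrying in the sink (#61) and flushing by the number of buffered rows (#78) combine naturally in a buffered writer. The following Python sketch shows only the general technique; it is not the connector's implementation, and the class name, its parameters, and the choice to retry on OSError are assumptions for illustration.

```python
import time
from typing import Callable, List


class BufferedSink:
    """Hypothetical sink that flushes by buffered row count and retries failed loads."""

    def __init__(self, send: Callable[[List[str]], None], max_buffered_rows: int,
                 max_retries: int = 3, retry_interval_s: float = 0.0) -> None:
        self._send = send  # performs one load request for a batch of rows
        self._max_buffered_rows = max_buffered_rows
        self._max_retries = max_retries
        self._retry_interval_s = retry_interval_s
        self._buffer: List[str] = []

    def write(self, row: str) -> None:
        self._buffer.append(row)
        # Flush as soon as the number of buffered rows reaches the threshold.
        if len(self._buffer) >= self._max_buffered_rows:
            self.flush()

    def flush(self) -> None:
        if not self._buffer:
            return
        batch = list(self._buffer)  # copy, so the sent batch is unaffected by clear()
        for attempt in range(self._max_retries + 1):
            try:
                self._send(batch)
                break
            except OSError:
                if attempt == self._max_retries:
                    raise  # out of retries; the rows stay buffered
                time.sleep(self._retry_interval_s)
        self._buffer.clear()


# Usage: a sender that fails once, to exercise the retry path.
sent_batches: List[List[str]] = []
failures_left = [1]


def flaky_send(batch: List[str]) -> None:
    if failures_left[0] > 0:
        failures_left[0] -= 1
        raise ConnectionError("simulated transient failure")  # a subclass of OSError
    sent_batches.append(batch)


sink = BufferedSink(flaky_send, max_buffered_rows=2, max_retries=3)
for row in ["a", "b", "c"]:
    sink.write(row)
sink.flush()  # flush the remaining partial batch
print(sent_batches)  # [['a', 'b'], ['c']]
```

Keeping the rows buffered when all retries fail lets the caller decide whether to abort the task or try again later.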

Improvements

  • Remove unnecessary dependencies to make the Spark connector JAR file lightweight. #55 #57
  • Replace fastjson with Jackson. #58
  • Add the missing Apache license header. #60
  • Do not package the MySQL JDBC driver in the Spark connector JAR file. #63
  • Support configuring the timezone parameter, and make the connector compatible with the Spark Java 8 datetime API. #64
  • Optimize the row-to-string converter to reduce CPU usage. #68
  • Allow the value of the starrocks.fe.http.url parameter to include an http scheme. #71
  • Implement the BatchWrite#useCommitCoordinator interface so that the connector can run on Databricks 13.1. #79
  • Add hints about checking privileges and parameters to the error log. #81
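With #71, the starrocks.fe.http.url value may carry an http scheme. A hypothetical normalization helper that accepts either form might look like the sketch below. The function name and the rule of rejecting any scheme other than http are assumptions for illustration, not documented connector behavior.

```python
from urllib.parse import urlsplit


def normalize_fe_http_url(value: str) -> str:
    """Hypothetical helper: accept 'host:port' or 'http://host:port'; return 'http://host:port'."""
    value = value.strip().rstrip("/")
    if "://" not in value:
        value = "http://" + value  # no scheme given: assume plain HTTP
    parts = urlsplit(value)
    if parts.scheme != "http":
        raise ValueError(f"unsupported scheme {parts.scheme!r} in {value!r}")
    if not parts.netloc:
        raise ValueError(f"missing host in {value!r}")
    return f"http://{parts.netloc}"


print(normalize_fe_http_url("127.0.0.1:8030"))         # http://127.0.0.1:8030
print(normalize_fe_http_url("http://127.0.0.1:8030/"))  # http://127.0.0.1:8030
```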

Bug fixes

  • Parse escape characters in the CSV-related parameters column_separator and row_delimiter. #85
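The fix in #85 concerns delimiter values such as \t or \x01 that are written as escape sequences. A minimal sketch of such unescaping follows, assuming that only \t, \n, \r, \\ and two-digit \xHH hex escapes need to be supported; the connector's actual set of escapes may differ, and the function name is hypothetical.

```python
import re

# Hypothetical escape table; the connector's supported set may differ.
_SIMPLE_ESCAPES = {"t": "\t", "n": "\n", "r": "\r", "\\": "\\"}
_ESCAPE_RE = re.compile(r"\\(x[0-9A-Fa-f]{2}|[tnr\\])")


def unescape_delimiter(raw: str) -> str:
    """Turn escape sequences such as a literal backslash-t or backslash-x01 into real characters."""
    def replace(match):
        token = match.group(1)
        if token.startswith("x"):
            return chr(int(token[1:], 16))  # two-digit hex escape, e.g. \x01
        return _SIMPLE_ESCAPES[token]
    return _ESCAPE_RE.sub(replace, raw)


print(repr(unescape_delimiter(r"\t")))    # '\t'
print(repr(unescape_delimiter(r"\x01")))  # '\x01'
print(repr(unescape_delimiter(",")))      # ','
```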

Doc

  • Refactor the docs. #66
  • Add examples of loading data into BITMAP and HLL columns. #70
  • Add examples of Spark applications written in Python. #72
  • Add examples of loading ARRAY-type data. #75
  • Add examples for performing partial updates and conditional updates on Primary Key tables. #80

1.1.0

Features

  • Support loading data into StarRocks.

1.0

Features

  • Support unloading data from StarRocks.