SSB Benchmark
SSB Flat-table Benchmarking
Star schema benchmark (SSB) is designed to test basic performance metrics of OLAP database products. SSB uses a star schema test set that is widely applied in academia and industry. For more information, see the paper Star Schema Benchmark. ClickHouse flattens the star schema into a wide flat table and rewrites the SSB into a single-table benchmark. For more information, see Star schema benchmark of ClickHouse. This test compares the performance of StarRocks, Apache Druid, and ClickHouse against SSB single-table datasets.
Test conclusion
- Among the 13 queries performed on SSB standard datasets, StarRocks has an overall query performance 2.1x that of ClickHouse and 8.7x that of Apache Druid.
- After Bitmap Indexing of StarRocks is enabled, the performance is 1.3x compared to when this feature is disabled. The overall performance of StarRocks is 2.8x that of ClickHouse and 11.4x that of Apache Druid.
Test preparation
Hardware
Machine | 3 cloud hosts |
---|---|
CPU | 16-Core Intel (R) Xeon (R) Platinum 8269CY CPU @2.50GHz Cache size: 36608 KB |
Memory | 64 GB |
Network bandwidth | 5 Gbit/s |
Disk | ESSD |
Software
StarRocks, Apache Druid, and ClickHouse are deployed on hosts of the same configurations.
- StarRocks: one FE and three BEs. The FE can be separately or hybrid deployed with BEs.
- ClickHouse: three nodes with distributed tables
- Apache Druid: three nodes. One is deployed with Master Servers and Data Servers, one is deployed with Query Servers and Data Servers, and the third is deployed only with Data Servers.
Kernel version: Linux 3.10.0-1160.59.1.el7.x86_64
OS version: CentOS Linux release 7.9.2009
Software version: StarRocks Community Version 3.0, ClickHouse 23.3, Apache Druid 25.0.0
Test data and results
Test data
Table | Record | Description |
---|---|---|
lineorder | 600 million | Lineorder fact table |
customer | 3 million | Customer dimension table |
part | 1.4 million | Parts dimension table |
supplier | 200 thousand | Supplier dimension table |
dates | 2,556 | Date dimension table |
lineorder_flat | 600 million | lineorder flat table |
Test results
The following table shows the performance test results on thirteen queries. The unit of query latency is ms. ClickHouse vs StarRocks
in the table header means using the query response time of ClickHouse to divide the query response time of StarRocks. A larger value indicates better performance of StarRocks.
StarRocks-3.0 | StarRocks-3.0-index | ClickHouse-23.3 | ClickHouse vs StarRocks | Druid-25.0.0 | Druid vs StarRocks | |
---|---|---|---|---|---|---|
Q1.1 | 33 | 30 | 48 | 1.45 | 430 | 13.03 |
Q1.2 | 10 | 10 | 15 | 1.50 | 270 | 27.00 |
Q1.3 | 23 | 30 | 14 | 0.61 | 820 | 35.65 |
Q2.1 | 186 | 116 | 301 | 1.62 | 760 | 4.09 |
Q2.2 | 156 | 50 | 273 | 1.75 | 920 | 5.90 |
Q2.3 | 73 | 36 | 255 | 3.49 | 910 | 12.47 |
Q3.1 | 173 | 233 | 398 | 2.30 | 1080 | 6.24 |
Q3.2 | 120 | 80 | 319 | 2.66 | 850 | 7.08 |
Q3.3 | 123 | 30 | 227 | 1.85 | 890 | 7.24 |
Q3.4 | 13 | 16 | 18 | 1.38 | 750 | 57.69 |
Q4.1 | 203 | 196 | 469 | 2.31 | 1230 | 6.06 |
Q4.2 | 73 | 76 | 160 | 2.19 | 1020 | 13.97 |
Q4.3 | 50 | 36 | 148 | 2.96 | 820 | 16.40 |
sum | 1236 | 939 | 2645 | 2.14 | 10750 | 8.70 |
Test procedure
For more information about how to create a ClickHouse table and load data to the table, see ClickHouse official doc. The following sections describe data generation and data loading of StarRocks.
Generate data
Download the ssb-poc toolkit and compile it.
wget https://starrocks-public.oss-cn-zhangjiakou.aliyuncs.com/ssb-poc-1.0.zip
unzip ssb-poc-1.0.zip
cd ssb-poc-1.0/
make && make install
cd output/
After the compilation, all the related tools are installed to the output
directories and the following operations are all performed under this directory.
First, generate data for SSB standard dataset scale factor=100
.
sh bin/gen-ssb.sh 100 data_dir
Create table schema
- Modify the configuration file
conf/starrocks.conf
and specify the cluster address. Pay special attention tomysql_host
andmysql_port
. - Run the following command to create a table:
sh bin/create_db_table.sh ddl_100
Query data
sh bin/benchmark.sh ssb-flat
Enable Bitmap Indexing
StarRocks performs better with Bitmap Indexing enabled. If you want to test the performance of StarRocks with Bitmap Indexing enabled, especially on Q2.2, Q2.3, and Q3.3, you can create Bitmap Indexes for all STRING columns.
-
Create another
lineorder_flat
table and create Bitmap Indexes.sh bin/create_db_table.sh ddl_100_bitmap_index
-
Add the following configuration to the
be.conf
file of all BEs and restart the BEs for the configurations to take effect.bitmap_max_filter_ratio=1000
-
Run the data loading script.
sh bin/flat_insert.sh data_dir
After data is loaded, wait for data version compaction to complete and then perform 4.4 again to query the data after Bitmap Indexing is enabled.
You can view the progress of data version compaction by running select CANDIDATES_NUM from information_schema.be_compactions
. For the three BE nodes, the following results show compaction is completed:
mysql> select CANDIDATES_NUM from information_schema.be_compactions;
+----------------+
| CANDIDATES_NUM |
+----------------+
| 0 |
| 0 |
| 0 |
+----------------+
3 rows in set (0.01 sec)
Test SQL and table creation statements
Test SQL
--Q1.1
SELECT sum(lo_extendedprice * lo_discount) AS `revenue`
FROM lineorder_flat
WHERE lo_orderdate >= '1993-01-01' and lo_orderdate <= '1993-12-31'
AND lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25;
--Q1.2
SELECT sum(lo_extendedprice * lo_discount) AS revenue FROM lineorder_flat
WHERE lo_orderdate >= '1994-01-01' and lo_orderdate <= '1994-01-31'
AND lo_discount BETWEEN 4 AND 6 AND lo_quantity BETWEEN 26 AND 35;
--Q1.3
SELECT sum(lo_extendedprice * lo_discount) AS revenue
FROM lineorder_flat
WHERE weekofyear(lo_orderdate) = 6
AND lo_orderdate >= '1994-01-01' and lo_orderdate <= '1994-12-31'
AND lo_discount BETWEEN 5 AND 7 AND lo_quantity BETWEEN 26 AND 35;
--Q2.1
SELECT sum(lo_revenue), year(lo_orderdate) AS year, p_brand
FROM lineorder_flat
WHERE p_category = 'MFGR#12' AND s_region = 'AMERICA'
GROUP BY year, p_brand
ORDER BY year, p_brand;
--Q2.2
SELECT
sum(lo_revenue), year(lo_orderdate) AS year, p_brand
FROM lineorder_flat
WHERE p_brand >= 'MFGR#2221' AND p_brand <= 'MFGR#2228' AND s_region = 'ASIA'
GROUP BY year, p_brand
ORDER BY year, p_brand;
--Q2.3
SELECT sum(lo_revenue), year(lo_orderdate) AS year, p_brand
FROM lineorder_flat
WHERE p_brand = 'MFGR#2239' AND s_region = 'EUROPE'
GROUP BY year, p_brand
ORDER BY year, p_brand;
--Q3.1
SELECT
c_nation,
s_nation,
year(lo_orderdate) AS year,
sum(lo_revenue) AS revenue FROM lineorder_flat
WHERE c_region = 'ASIA' AND s_region = 'ASIA' AND lo_orderdate >= '1992-01-01'
AND lo_orderdate <= '1997-12-31'
GROUP BY c_nation, s_nation, year
ORDER BY year ASC, revenue DESC;
--Q3.2
SELECT c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue
FROM lineorder_flat
WHERE c_nation = 'UNITED STATES' AND s_nation = 'UNITED STATES'
AND lo_orderdate >= '1992-01-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_city, s_city, year
ORDER BY year ASC, revenue DESC;
--Q3.3
SELECT c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue
FROM lineorder_flat
WHERE c_city in ( 'UNITED KI1' ,'UNITED KI5') AND s_city in ('UNITED KI1', 'UNITED KI5')
AND lo_orderdate >= '1992-01-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_city, s_city, year
ORDER BY year ASC, revenue DESC;
--Q3.4
SELECT c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue
FROM lineorder_flat
WHERE c_city in ('UNITED KI1', 'UNITED KI5') AND s_city in ('UNITED KI1', 'UNITED KI5')
AND lo_orderdate >= '1997-12-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_city, s_city, year
ORDER BY year ASC, revenue DESC;
--Q4.1
SELECT year(lo_orderdate) AS year, c_nation, sum(lo_revenue - lo_supplycost) AS profit
FROM lineorder_flat
WHERE c_region = 'AMERICA' AND s_region = 'AMERICA' AND p_mfgr in ('MFGR#1', 'MFGR#2')
GROUP BY year, c_nation
ORDER BY year ASC, c_nation ASC;
--Q4.2
SELECT year(lo_orderdate) AS year,
s_nation, p_category, sum(lo_revenue - lo_supplycost) AS profit
FROM lineorder_flat
WHERE c_region = 'AMERICA' AND s_region = 'AMERICA'
AND lo_orderdate >= '1997-01-01' and lo_orderdate <= '1998-12-31'
AND p_mfgr in ( 'MFGR#1' , 'MFGR#2')
GROUP BY year, s_nation, p_category
ORDER BY year ASC, s_nation ASC, p_category ASC;
--Q4.3
SELECT year(lo_orderdate) AS year, s_city, p_brand,
sum(lo_revenue - lo_supplycost) AS profit
FROM lineorder_flat
WHERE s_nation = 'UNITED STATES'
AND lo_orderdate >= '1997-01-01' and lo_orderdate <= '1998-12-31'
AND p_category = 'MFGR#14'
GROUP BY year, s_city, p_brand
ORDER BY year ASC, s_city ASC, p_brand ASC;
Table creation statements
Default lineorder_flat
table
The following statement matches the current cluster size and data size (three BEs, scale factor = 100). If your cluster has more BE nodes or larger data size, you can adjust the number of buckets, create the table again, and load data again to achieve better test results.
CREATE TABLE `lineorder_flat` (
`LO_ORDERDATE` date NOT NULL COMMENT "",
`LO_ORDERKEY` int(11) NOT NULL COMMENT "",
`LO_LINENUMBER` tinyint(4) NOT NULL COMMENT "",
`LO_CUSTKEY` int(11) NOT NULL COMMENT "",
`LO_PARTKEY` int(11) NOT NULL COMMENT "",
`LO_SUPPKEY` int(11) NOT NULL COMMENT "",
`LO_ORDERPRIORITY` varchar(100) NOT NULL COMMENT "",
`LO_SHIPPRIORITY` tinyint(4) NOT NULL COMMENT "",
`LO_QUANTITY` tinyint(4) NOT NULL COMMENT "",
`LO_EXTENDEDPRICE` int(11) NOT NULL COMMENT "",
`LO_ORDTOTALPRICE` int(11) NOT NULL COMMENT "",
`LO_DISCOUNT` tinyint(4) NOT NULL COMMENT "",
`LO_REVENUE` int(11) NOT NULL COMMENT "",
`LO_SUPPLYCOST` int(11) NOT NULL COMMENT "",
`LO_TAX` tinyint(4) NOT NULL COMMENT "",
`LO_COMMITDATE` date NOT NULL COMMENT "",
`LO_SHIPMODE` varchar(100) NOT NULL COMMENT "",
`C_NAME` varchar(100) NOT NULL COMMENT "",
`C_ADDRESS` varchar(100) NOT NULL COMMENT "",
`C_CITY` varchar(100) NOT NULL COMMENT "",
`C_NATION` varchar(100) NOT NULL COMMENT "",
`C_REGION` varchar(100) NOT NULL COMMENT "",
`C_PHONE` varchar(100) NOT NULL COMMENT "",
`C_MKTSEGMENT` varchar(100) NOT NULL COMMENT "",
`S_NAME` varchar(100) NOT NULL COMMENT "",
`S_ADDRESS` varchar(100) NOT NULL COMMENT "",
`S_CITY` varchar(100) NOT NULL COMMENT "",
`S_NATION` varchar(100) NOT NULL COMMENT "",
`S_REGION` varchar(100) NOT NULL COMMENT "",
`S_PHONE` varchar(100) NOT NULL COMMENT "",
`P_NAME` varchar(100) NOT NULL COMMENT "",
`P_MFGR` varchar(100) NOT NULL COMMENT "",
`P_CATEGORY` varchar(100) NOT NULL COMMENT "",
`P_BRAND` varchar(100) NOT NULL COMMENT "",
`P_COLOR` varchar(100) NOT NULL COMMENT "",
`P_TYPE` varchar(100) NOT NULL COMMENT "",
`P_SIZE` tinyint(4) NOT NULL COMMENT "",
`P_CONTAINER` varchar(100) NOT NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`LO_ORDERDATE`, `LO_ORDERKEY`)
COMMENT "OLAP"
PARTITION BY date_trunc('year', `LO_ORDERDATE`)
DISTRIBUTED BY HASH(`LO_ORDERKEY`) BUCKETS 48
PROPERTIES ("replication_num" = "1");
lineorder_flat
table with Bitmap Indexes
CREATE TABLE `lineorder_flat` (
`LO_ORDERDATE` date NOT NULL COMMENT "",
`LO_ORDERKEY` int(11) NOT NULL COMMENT "",
`LO_LINENUMBER` tinyint(4) NOT NULL COMMENT "",
`LO_CUSTKEY` int(11) NOT NULL COMMENT "",
`LO_PARTKEY` int(11) NOT NULL COMMENT "",
`LO_SUPPKEY` int(11) NOT NULL COMMENT "",
`LO_ORDERPRIORITY` varchar(100) NOT NULL COMMENT "",
`LO_SHIPPRIORITY` tinyint(4) NOT NULL COMMENT "",
`LO_QUANTITY` tinyint(4) NOT NULL COMMENT "",
`LO_EXTENDEDPRICE` int(11) NOT NULL COMMENT "",
`LO_ORDTOTALPRICE` int(11) NOT NULL COMMENT "",
`LO_DISCOUNT` tinyint(4) NOT NULL COMMENT "",
`LO_REVENUE` int(11) NOT NULL COMMENT "",
`LO_SUPPLYCOST` int(11) NOT NULL COMMENT "",
`LO_TAX` tinyint(4) NOT NULL COMMENT "",
`LO_COMMITDATE` date NOT NULL COMMENT "",
`LO_SHIPMODE` varchar(100) NOT NULL COMMENT "",
`C_NAME` varchar(100) NOT NULL COMMENT "",
`C_ADDRESS` varchar(100) NOT NULL COMMENT "",
`C_CITY` varchar(100) NOT NULL COMMENT "",
`C_NATION` varchar(100) NOT NULL COMMENT "",
`C_REGION` varchar(100) NOT NULL COMMENT "",
`C_PHONE` varchar(100) NOT NULL COMMENT "",
`C_MKTSEGMENT` varchar(100) NOT NULL COMMENT "",
`S_NAME` varchar(100) NOT NULL COMMENT "",
`S_ADDRESS` varchar(100) NOT NULL COMMENT "",
`S_CITY` varchar(100) NOT NULL COMMENT "",
`S_NATION` varchar(100) NOT NULL COMMENT "",
`S_REGION` varchar(100) NOT NULL COMMENT "",
`S_PHONE` varchar(100) NOT NULL COMMENT "",
`P_NAME` varchar(100) NOT NULL COMMENT "",
`P_MFGR` varchar(100) NOT NULL COMMENT "",
`P_CATEGORY` varchar(100) NOT NULL COMMENT "",
`P_BRAND` varchar(100) NOT NULL COMMENT "",
`P_COLOR` varchar(100) NOT NULL COMMENT "",
`P_TYPE` varchar(100) NOT NULL COMMENT "",
`P_SIZE` tinyint(4) NOT NULL COMMENT "",
`P_CONTAINER` varchar(100) NOT NULL COMMENT "",
index bitmap_lo_orderpriority (lo_orderpriority) USING BITMAP,
index bitmap_lo_shipmode (lo_shipmode) USING BITMAP,
index bitmap_c_name (c_name) USING BITMAP,
index bitmap_c_address (c_address) USING BITMAP,
index bitmap_c_city (c_city) USING BITMAP,
index bitmap_c_nation (c_nation) USING BITMAP,
index bitmap_c_region (c_region) USING BITMAP,
index bitmap_c_phone (c_phone) USING BITMAP,
index bitmap_c_mktsegment (c_mktsegment) USING BITMAP,
index bitmap_s_region (s_region) USING BITMAP,
index bitmap_s_nation (s_nation) USING BITMAP,
index bitmap_s_city (s_city) USING BITMAP,
index bitmap_s_name (s_name) USING BITMAP,
index bitmap_s_address (s_address) USING BITMAP,
index bitmap_s_phone (s_phone) USING BITMAP,
index bitmap_p_name (p_name) USING BITMAP,
index bitmap_p_mfgr (p_mfgr) USING BITMAP,
index bitmap_p_category (p_category) USING BITMAP,
index bitmap_p_brand (p_brand) USING BITMAP,
index bitmap_p_color (p_color) USING BITMAP,
index bitmap_p_type (p_type) USING BITMAP,
index bitmap_p_container (p_container) USING BITMAP
) ENGINE=OLAP
DUPLICATE KEY(`LO_ORDERDATE`, `LO_ORDERKEY`)
COMMENT "OLAP"
PARTITION BY date_trunc('year', `LO_ORDERDATE`)
DISTRIBUTED BY HASH(`LO_ORDERKEY`) BUCKETS 48
PROPERTIES ("replication_num" = "1");