# StarRocks Requirements

This guide covers the requirements for using StarRocks as a CDC sink with EZ-CDC.
## Version Requirements
| Requirement | Minimum | Recommended |
|---|---|---|
| StarRocks Version | 2.5 | 3.0+ |
| Stream Load | Enabled | - |
## Architecture Overview

A StarRocks cluster consists of Frontend (FE) nodes, which manage metadata and plan queries, and Backend (BE) nodes, which store data and execute queries. EZ-CDC talks to both: the FE over the MySQL protocol for metadata and DDL, and the BE over HTTP for Stream Load ingestion.
## Network Requirements
EZ-CDC workers need access to:
| Component | Port | Protocol | Purpose |
|---|---|---|---|
| Frontend (FE) | 9030 | MySQL | Metadata queries, DDL |
| Backend (BE) | 8040 | HTTP | Stream Load (data ingestion) |
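Before configuring EZ-CDC, it can help to confirm both ports are reachable from a worker host. A minimal sketch using only the Python standard library (the `starrocks-fe`/`starrocks-be` hostnames are placeholders for your cluster's addresses):

```python
import socket

def check_port(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder hostnames; substitute your FE/BE addresses.
for host, port, purpose in [
    ("starrocks-fe", 9030, "MySQL protocol (metadata, DDL)"),
    ("starrocks-be", 8040, "HTTP Stream Load"),
]:
    ok = check_port(host, port)
    print(f"{host}:{port} ({purpose}): {'reachable' if ok else 'UNREACHABLE'}")
```

This only verifies TCP connectivity; authentication and Stream Load behavior are checked separately below.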
### Firewall Rules

```hcl
# Allow MySQL protocol to FE
resource "aws_security_group_rule" "starrocks_fe" {
  type                     = "ingress"
  from_port                = 9030
  to_port                  = 9030
  protocol                 = "tcp"
  source_security_group_id = var.ezcdc_worker_sg_id
  security_group_id        = var.starrocks_sg_id
}

# Allow HTTP to BE (Stream Load)
resource "aws_security_group_rule" "starrocks_be" {
  type                     = "ingress"
  from_port                = 8040
  to_port                  = 8040
  protocol                 = "tcp"
  source_security_group_id = var.ezcdc_worker_sg_id
  security_group_id        = var.starrocks_sg_id
}
```
## User Permissions

### Create CDC User

Connect to the StarRocks FE over the MySQL protocol:

```bash
mysql -h starrocks-fe -P 9030 -u root
```

Then create the user and grant privileges:

```sql
-- Create user
CREATE USER 'ezcdc_user' IDENTIFIED BY 'your_password';

-- Grant table permissions
GRANT SELECT, INSERT, UPDATE, DELETE ON analytics.* TO 'ezcdc_user';
GRANT ALTER ON analytics.* TO 'ezcdc_user';
GRANT CREATE ON analytics.* TO 'ezcdc_user';

-- Required for Stream Load
GRANT LOAD ON analytics.* TO 'ezcdc_user';
```
### Required Privileges

| Privilege | Purpose |
|---|---|
| SELECT | Check table structure |
| INSERT | Load data via Stream Load |
| UPDATE | Update existing rows |
| DELETE | Delete rows |
| ALTER | Modify table schema |
| CREATE | Create new tables |
| LOAD | Use Stream Load |
## Table Requirements

### Table Engine

Use Primary Key tables for CDC (recommended):

```sql
CREATE TABLE orders (
    id BIGINT NOT NULL,
    customer_id BIGINT,
    total DECIMAL(10, 2),
    status VARCHAR(50),
    created_at DATETIME,
    _cdc_updated_at DATETIME,
    _cdc_deleted BOOLEAN DEFAULT "false"
)
PRIMARY KEY (id)
DISTRIBUTED BY HASH(id) BUCKETS 8
PROPERTIES (
    "replication_num" = "1"
);
```
Primary Key tables:
- Support UPSERT operations
- Enable UPDATE and DELETE
- Best for CDC workloads
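When loading into a Primary Key table, Stream Load can carry a per-row `__op` flag (0 = upsert, 1 = delete) so deletes propagate correctly. A rough sketch of mapping CDC events to Stream Load JSON rows — the input event shape here is an assumption for illustration, not the EZ-CDC wire format, and depending on the load format a `columns` header mapping `__op` may also be needed:

```python
import json

def cdc_to_stream_load_rows(events):
    """Map CDC events to Stream Load JSON rows for a Primary Key table.

    __op = 0 requests an upsert, __op = 1 a delete (StarRocks convention).
    """
    rows = []
    for ev in events:
        row = dict(ev["data"])
        row["__op"] = 1 if ev["op"] == "delete" else 0
        rows.append(row)
    return json.dumps(rows)

payload = cdc_to_stream_load_rows([
    {"op": "insert", "data": {"id": 1, "status": "new"}},
    {"op": "update", "data": {"id": 1, "status": "paid"}},
    {"op": "delete", "data": {"id": 2}},
])
print(payload)
```

Inserts and updates both become upserts (`__op` = 0), which is why Primary Key tables are a natural fit for CDC.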
### Alternative: Duplicate Key Tables

For append-only workloads:

```sql
CREATE TABLE events (
    event_id BIGINT,
    event_type VARCHAR(50),
    payload JSON,
    created_at DATETIME
)
DUPLICATE KEY (event_id)
DISTRIBUTED BY HASH(event_id) BUCKETS 8;
```
## Stream Load Configuration

### Verify Stream Load

Check that Stream Load is working:

```bash
# Test the Stream Load endpoint
curl -X PUT \
  -H "Expect: 100-continue" \
  -H "label:test_$(date +%s)" \
  -H "format: json" \
  -H "strip_outer_array: true" \
  -T test_data.json \
  -u ezcdc_user:password \
  http://starrocks-be:8040/api/analytics/test_table/_stream_load
```
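The same request can be built programmatically. A sketch using only the Python standard library that constructs (but does not send) the equivalent PUT request — hostname, database, table, and credentials are placeholders:

```python
import base64
import time
import urllib.request

def build_stream_load_request(be_host, db, table, data: bytes, user, password):
    """Build (but do not send) a Stream Load PUT request."""
    url = f"http://{be_host}:8040/api/{db}/{table}/_stream_load"
    auth = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, data=data, method="PUT")
    req.add_header("Authorization", f"Basic {auth}")
    req.add_header("Expect", "100-continue")
    req.add_header("label", f"ezcdc_{int(time.time())}")  # unique label for deduplication
    req.add_header("format", "json")
    req.add_header("strip_outer_array", "true")
    return req

req = build_stream_load_request("starrocks-be", "analytics", "test_table",
                                b'[{"id": 1}]', "ezcdc_user", "password")
print(req.get_method(), req.full_url)
# To execute: urllib.request.urlopen(req), then check the "Status" field of the JSON response.
```

Each load needs a unique label; StarRocks uses it to deduplicate retried batches.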
### Stream Load Limits

| Parameter | Default | Description |
|---|---|---|
| streaming_load_max_mb | 10240 | Max batch size (MB) |
| streaming_load_max_batch_size_mb | 100 | Single batch max (MB) |
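Batches sent by the CDC pipeline should stay under the per-batch limit. A rough sketch of size-based chunking, greedily packing serialized rows until the byte budget is reached (the helper name and row shapes are illustrative, not part of EZ-CDC):

```python
import json

def split_batches(rows, max_bytes=100 * 1024 * 1024):
    """Greedily pack JSON rows into batches, each under max_bytes when serialized."""
    batches, current, size = [], [], 0
    for row in rows:
        encoded = len(json.dumps(row).encode()) + 1  # +1 for the separator
        if current and size + encoded > max_bytes:
            batches.append(current)
            current, size = [], 0
        current.append(row)
        size += encoded
    if current:
        batches.append(current)
    return batches

# Toy demo: a 50-byte limit forces one row per batch.
batches = split_batches([{"id": i, "status": "x" * 10} for i in range(5)], max_bytes=50)
print([len(b) for b in batches])
```

A batch that exceeds the limit on its own is still emitted as a singleton; in practice such rows would need a higher server-side limit.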
## Performance Tuning

### Backend (BE) Settings

For high-throughput CDC:

```ini
# be.conf
streaming_load_rpc_max_alive_time_sec = 1200
load_process_max_memory_limit_percent = 50
```
### Frontend (FE) Settings

```ini
# fe.conf
stream_load_default_timeout_second = 600
```
## Verification Script

```sql
-- 1. Check StarRocks version
SELECT current_version();

-- 2. Verify user permissions
SHOW GRANTS FOR 'ezcdc_user';

-- 3. Check databases
SHOW DATABASES;

-- 4. Test table creation
CREATE DATABASE IF NOT EXISTS ezcdc_test;
USE ezcdc_test;
CREATE TABLE test_table (
    id BIGINT NOT NULL
)
PRIMARY KEY (id)
DISTRIBUTED BY HASH(id) BUCKETS 1;

-- 5. Clean up
DROP TABLE test_table;
DROP DATABASE ezcdc_test;
```
## Common Issues

### "Failed to connect to FE"

Cause: Network or authentication issue.

Solutions:
- Check that security groups allow port 9030
- Verify the username and password
- Try connecting manually with the `mysql` client
### "Stream Load timeout"

Cause: Large batch or slow BE.

Solutions:
- Reduce the batch size in EZ-CDC
- Increase `stream_load_default_timeout_second`
- Check BE resource utilization
### "Memory limit exceeded"

Cause: Batch too large for available memory.

Solutions:
- Reduce the batch size
- Increase BE memory
- Increase `load_process_max_memory_limit_percent`