Sinks Overview
Sinks are the destination systems where EZ-CDC writes the captured changes. EZ-CDC transforms CDC events into the appropriate format for each sink and loads them efficiently.
Supported Sinks
| Sink | Loading Method | Status | Documentation |
|---|---|---|---|
| StarRocks | Stream Load HTTP API | ✅ Stable | Setup Guide |
| ClickHouse | HTTP Interface | 🔜 Coming Soon | - |
| Snowflake | Staged Batch Load | 🔜 Planned | - |
| Apache Kafka | Producer API | 🔜 Planned | - |
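As an illustration of the Stream Load method listed for StarRocks, the sketch below builds (but does not send) a Stream Load HTTP request. The host, database, table, and credential values are placeholders; EZ-CDC's internal client may differ.

```python
import base64
import json
import urllib.request

def build_stream_load_request(fe_host, db, table, rows, user, password):
    """Build (but do not send) a StarRocks Stream Load request for a batch of rows."""
    url = f"http://{fe_host}:8030/api/{db}/{table}/_stream_load"
    body = json.dumps(rows).encode("utf-8")
    auth = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, data=body, method="PUT")
    req.add_header("Authorization", f"Basic {auth}")
    req.add_header("format", "json")             # send the batch as a JSON array
    req.add_header("strip_outer_array", "true")  # one row per array element
    req.add_header("Expect", "100-continue")     # required by Stream Load's redirect flow
    return req

req = build_stream_load_request("fe.example.com", "analytics", "users",
                                [{"id": 1, "name": "alice"}], "cdc_user", "secret")
```

Stream Load accepts a batch in a single `PUT`, which is why batching (below) matters for throughput.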
How Sinks Work
Data Flow
Changes are captured from the source, transformed into the sink's expected format, batched, and then loaded into the destination.
Batching Strategy
EZ-CDC batches events for efficient loading:
| Parameter | Default | Description |
|---|---|---|
| Batch Size | 10,000 | Maximum events per batch |
| Flush Interval | 5 seconds | Maximum time before flush |
Batches are flushed when either limit is reached.
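The dual-trigger flush described above (size limit or time limit, whichever comes first) can be sketched as follows; class and parameter names are illustrative, not EZ-CDC's actual API:

```python
import time

class Batcher:
    """Accumulate events; flush when either the size limit or the flush interval is hit."""

    def __init__(self, max_size=10_000, flush_interval=5.0, now=time.monotonic):
        self.max_size = max_size
        self.flush_interval = flush_interval
        self.now = now  # injectable clock, useful for testing
        self.events = []
        self.last_flush = now()

    def add(self, event):
        """Add an event; return the flushed batch if a limit was reached, else None."""
        self.events.append(event)
        if self.should_flush():
            return self.flush()
        return None

    def should_flush(self):
        return (len(self.events) >= self.max_size
                or self.now() - self.last_flush >= self.flush_interval)

    def flush(self):
        batch, self.events = self.events, []
        self.last_flush = self.now()
        return batch
```

With the defaults above, a quiet stream still flushes every 5 seconds, while a busy stream flushes as soon as 10,000 events accumulate.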
Exactly-Once Delivery
EZ-CDC ensures exactly-once delivery through:
- Checkpointing: LSN position saved after successful writes
- Idempotent Writes: Using primary keys for upserts
- Transaction Alignment: Respecting source transaction boundaries
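The interaction of the first two mechanisms can be sketched in miniature: upserts keyed by primary key make a batch safe to replay, and the LSN checkpoint is recorded only after the batch lands. The event shape (`op`, `id`, `row`, `lsn`) is an assumption for illustration.

```python
def apply_batch(table, checkpoint, batch):
    """Idempotent apply: upsert each event by primary key, then record the LSN.

    Replaying the same batch after a crash leaves the table in the same state.
    """
    for event in batch:
        if event["op"] == "delete":
            table.pop(event["id"], None)      # deleting twice is harmless
        else:
            table[event["id"]] = event["row"]  # insert and update are the same upsert
    checkpoint["lsn"] = batch[-1]["lsn"]       # saved only after a successful write
```

If the process dies between the write and the checkpoint, the batch is replayed from the old LSN on restart, and idempotence makes the replay a no-op.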
Sink Capabilities
Different sinks have different capabilities:
| Capability | StarRocks | ClickHouse | Snowflake |
|---|---|---|---|
| Streaming | ✅ | ✅ | ❌ |
| Batch | ✅ | ✅ | ✅ |
| Upserts | ✅ | ✅ | ✅ |
| Deletes | ✅ (soft) | ✅ | ✅ |
| Schema Evolution | ✅ | ✅ | ✅ |
Loading Models
| Model | Description | Best For |
|---|---|---|
| Streaming | Continuous micro-batches | Real-time analytics |
| Staged Batch | Periodic large batches via staging | Data warehouses |
Table Management
Automatic Table Creation
EZ-CDC can automatically create destination tables:
- Reads schema from source
- Maps types to destination
- Creates table with primary key
- Adds audit columns
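The four steps above can be sketched as a DDL generator. The type map is a subset of the table in the Type Mapping section; the function name and the `STRING` fallback for unmapped types are illustrative assumptions.

```python
TYPE_MAP = {"integer": "INT", "bigint": "BIGINT", "numeric": "DECIMAL",
            "text": "STRING", "boolean": "BOOLEAN", "timestamp": "DATETIME"}

def create_table_ddl(table, columns, primary_key):
    """Sketch: build a StarRocks CREATE TABLE from a source schema description.

    columns is a list of (name, source_type) pairs read from the source.
    """
    cols = [f"{name} {TYPE_MAP.get(pg_type, 'STRING')}" for name, pg_type in columns]
    cols += ["_cdc_updated_at DATETIME", "_cdc_deleted BOOLEAN"]  # audit columns
    return (f"CREATE TABLE {table} (\n  " + ",\n  ".join(cols) + "\n) "
            f"PRIMARY KEY ({', '.join(primary_key)})")

ddl = create_table_ddl("users", [("id", "bigint"), ("name", "text")], ["id"])
```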
Audit Columns
EZ-CDC adds metadata columns to track changes:
| Column | Type | Description |
|---|---|---|
| _cdc_updated_at | TIMESTAMP | When the row was last modified |
| _cdc_deleted | BOOLEAN | Soft delete flag (true = deleted) |
For example, to query only rows that have not been soft-deleted:

```sql
SELECT
  id,
  name,
  _cdc_updated_at,
  _cdc_deleted
FROM users
WHERE _cdc_deleted = false;
```
Schema Evolution
When source schema changes:
- EZ-CDC detects new/modified columns
- Adds columns to destination (if supported)
- Continues replication
**Note:** Column removal and type changes may require manual intervention.
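The detect-and-add step can be sketched as a schema diff that emits `ADD COLUMN` statements; the function name and schema representation (column name → destination type) are assumptions for illustration:

```python
def evolution_ddl(table, source_schema, dest_schema):
    """Emit ADD COLUMN statements for columns the destination is missing."""
    added = {name: typ for name, typ in source_schema.items()
             if name not in dest_schema}
    return [f"ALTER TABLE {table} ADD COLUMN {name} {typ}"
            for name, typ in added.items()]

stmts = evolution_ddl("users",
                      {"id": "INT", "email": "STRING"},  # source now has email
                      {"id": "INT"})                     # destination does not yet
```

Columns present in the destination but gone from the source are deliberately left alone, matching the note above.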
Type Mapping
EZ-CDC maps source types to destination types:
PostgreSQL → StarRocks
| PostgreSQL | StarRocks |
|---|---|
integer | INT |
bigint | BIGINT |
numeric | DECIMAL |
varchar(n) | VARCHAR(n) |
text | STRING |
boolean | BOOLEAN |
timestamp | DATETIME |
timestamptz | DATETIME |
date | DATE |
jsonb | JSON |
uuid | VARCHAR(36) |
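The table above can be expressed as a lookup with one special case: parameterized `varchar(n)` keeps its length. The `STRING` fallback for types not in the table is an assumption, not documented behavior.

```python
import re

# PostgreSQL → StarRocks mapping, transcribed from the table above
PG_TO_STARROCKS = {
    "integer": "INT", "bigint": "BIGINT", "numeric": "DECIMAL",
    "text": "STRING", "boolean": "BOOLEAN", "timestamp": "DATETIME",
    "timestamptz": "DATETIME", "date": "DATE", "jsonb": "JSON",
    "uuid": "VARCHAR(36)",
}

def map_type(pg_type):
    """Map a PostgreSQL type name to its StarRocks equivalent."""
    m = re.fullmatch(r"varchar\((\d+)\)", pg_type)
    if m:
        return f"VARCHAR({m.group(1)})"  # length is preserved
    return PG_TO_STARROCKS.get(pg_type, "STRING")  # fallback: assumption
```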
Adding a Sink
Via Portal
- Go to Datasources → New Datasource
- Select sink type (e.g., StarRocks)
- Enter connection details
- Test connection
- Save
Best Practices
1. Size Destination Appropriately
Ensure your sink can handle the write throughput:
- StarRocks: Size based on expected events/second
- Snowflake: Size warehouse for batch frequency
2. Monitor Write Performance
Track metrics:
- Events written per second
- Batch write latency
- Failed writes
3. Use Appropriate Data Types
Choose efficient types:
- Use `INT` instead of `BIGINT` when possible
- Use `VARCHAR(n)` instead of `STRING` for bounded strings