Sinks Overview
Sinks are the destination systems where EZ-CDC writes the captured changes. EZ-CDC transforms CDC events into the appropriate format for each sink and loads them efficiently.
Supported Sinks
| Sink | Loading Method | Status | Documentation |
|---|---|---|---|
| StarRocks | Stream Load HTTP API | ✅ Stable | Setup Guide |
| ClickHouse | HTTP Interface | 🔜 Coming Soon | - |
| Snowflake | Staged Batch Load | 🔜 Planned | - |
| Apache Kafka | Producer API | 🔜 Planned | - |
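As an illustration of the Stream Load method listed for StarRocks, the sketch below builds (but does not send) a Stream Load HTTP request. The host, database, table, and credential values are placeholders; EZ-CDC's internal client may differ.

```python
import base64
import json
import urllib.request

def build_stream_load_request(fe_host, db, table, rows, user, password):
    """Build (but do not send) a StarRocks Stream Load request for a batch of rows."""
    url = f"http://{fe_host}:8030/api/{db}/{table}/_stream_load"
    body = json.dumps(rows).encode("utf-8")
    auth = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, data=body, method="PUT")
    req.add_header("Authorization", f"Basic {auth}")
    req.add_header("format", "json")             # send the batch as a JSON array
    req.add_header("strip_outer_array", "true")  # one row per array element
    req.add_header("Expect", "100-continue")     # required by Stream Load's redirect flow
    return req

req = build_stream_load_request("fe.example.com", "analytics", "users",
                                [{"id": 1, "name": "alice"}], "cdc_user", "secret")
```

Stream Load accepts a batch in a single `PUT`, which is why batching (below) matters for throughput.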
How Sinks Work
Data Flow
Changes are captured from the source, transformed into the sink's expected format, batched, and then loaded into the destination.
Batching Strategy
EZ-CDC batches events for efficient loading:
| Parameter | Default | Description |
|---|---|---|
| Batch Size | 10,000 | Maximum events per batch |
| Flush Interval | 5 seconds | Maximum time before flush |
Batches are flushed when either limit is reached.
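The dual-trigger flush described above (size limit or time limit, whichever comes first) can be sketched as follows; class and parameter names are illustrative, not EZ-CDC's actual API:

```python
import time

class Batcher:
    """Accumulate events; flush when either the size limit or the flush interval is hit."""

    def __init__(self, max_size=10_000, flush_interval=5.0, now=time.monotonic):
        self.max_size = max_size
        self.flush_interval = flush_interval
        self.now = now  # injectable clock, useful for testing
        self.events = []
        self.last_flush = now()

    def add(self, event):
        """Add an event; return the flushed batch if a limit was reached, else None."""
        self.events.append(event)
        if self.should_flush():
            return self.flush()
        return None

    def should_flush(self):
        return (len(self.events) >= self.max_size
                or self.now() - self.last_flush >= self.flush_interval)

    def flush(self):
        batch, self.events = self.events, []
        self.last_flush = self.now()
        return batch
```

With the defaults above, a quiet stream still flushes every 5 seconds, while a busy stream flushes as soon as 10,000 events accumulate.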
Exactly-Once Delivery
EZ-CDC ensures exactly-once delivery through:
- Checkpointing: LSN position saved after successful writes
- Idempotent Writes: Using primary keys for upserts
- Transaction Alignment: Respecting source transaction boundaries
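The interaction of the first two mechanisms can be sketched in miniature: upserts keyed by primary key make a batch safe to replay, and the LSN checkpoint is recorded only after the batch lands. The event shape (`op`, `id`, `row`, `lsn`) is an assumption for illustration.

```python
def apply_batch(table, checkpoint, batch):
    """Idempotent apply: upsert each event by primary key, then record the LSN.

    Replaying the same batch after a crash leaves the table in the same state.
    """
    for event in batch:
        if event["op"] == "delete":
            table.pop(event["id"], None)      # deleting twice is harmless
        else:
            table[event["id"]] = event["row"]  # insert and update are the same upsert
    checkpoint["lsn"] = batch[-1]["lsn"]       # saved only after a successful write
```

If the process dies between the write and the checkpoint, the batch is replayed from the old LSN on restart, and idempotence makes the replay a no-op.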
Sink Capabilities
Different sinks have different capabilities:
| Capability | StarRocks | ClickHouse | Snowflake |
|---|---|---|---|
| Streaming | ✅ | ✅ | ❌ |
| Batch | ✅ | ✅ | ✅ |
| Upserts | ✅ | ✅ | ✅ |
| Deletes | ✅ (soft) | ✅ | ✅ |
| Schema Evolution | ✅ | ✅ | ✅ |
Loading Models
| Model | Description | Best For |
|---|---|---|
| Streaming | Continuous micro-batches | Real-time analytics |
| Staged Batch | Periodic large batches via staging | Data warehouses |
Table Management
Automatic Table Creation
EZ-CDC can automatically create destination tables:
- Reads schema from source
- Maps types to destination
- Creates table with primary key
- Adds audit columns
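The four steps above can be sketched as a DDL generator. The type map is a subset of the table in the Type Mapping section; the function name and the `STRING` fallback for unmapped types are illustrative assumptions.

```python
TYPE_MAP = {"integer": "INT", "bigint": "BIGINT", "numeric": "DECIMAL",
            "text": "STRING", "boolean": "BOOLEAN", "timestamp": "DATETIME"}

def create_table_ddl(table, columns, primary_key):
    """Sketch: build a StarRocks CREATE TABLE from a source schema description.

    columns is a list of (name, source_type) pairs read from the source.
    """
    cols = [f"{name} {TYPE_MAP.get(pg_type, 'STRING')}" for name, pg_type in columns]
    cols += ["_cdc_updated_at DATETIME", "_cdc_deleted BOOLEAN"]  # audit columns
    return (f"CREATE TABLE {table} (\n  " + ",\n  ".join(cols) + "\n) "
            f"PRIMARY KEY ({', '.join(primary_key)})")

ddl = create_table_ddl("users", [("id", "bigint"), ("name", "text")], ["id"])
```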
Audit Columns
EZ-CDC adds metadata columns to track changes:
| Column | Type | Description |
|---|---|---|
| _cdc_updated_at | TIMESTAMP | When the row was last modified |
| _cdc_deleted | BOOLEAN | Soft delete flag (true = deleted) |
For example, to query only rows that have not been soft-deleted:

```sql
SELECT
  id,
  name,
  _cdc_updated_at,
  _cdc_deleted
FROM users
WHERE _cdc_deleted = false;
```
Schema Evolution
When source schema changes:
- EZ-CDC detects new/modified columns
- Adds columns to destination (if supported)
- Continues replication
**Note:** Column removal and type changes may require manual intervention.
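The detect-and-add step can be sketched as a schema diff that emits `ADD COLUMN` statements; the function name and schema representation (column name → destination type) are assumptions for illustration:

```python
def evolution_ddl(table, source_schema, dest_schema):
    """Emit ADD COLUMN statements for columns the destination is missing."""
    added = {name: typ for name, typ in source_schema.items()
             if name not in dest_schema}
    return [f"ALTER TABLE {table} ADD COLUMN {name} {typ}"
            for name, typ in added.items()]

stmts = evolution_ddl("users",
                      {"id": "INT", "email": "STRING"},  # source now has email
                      {"id": "INT"})                     # destination does not yet
```

Columns present in the destination but gone from the source are deliberately left alone, matching the note above.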
Type Mapping
EZ-CDC maps source types to destination types:
PostgreSQL → StarRocks
| PostgreSQL | StarRocks |
|---|---|
integer | INT |
bigint | BIGINT |
numeric | DECIMAL |
varchar(n) | VARCHAR(n) |
text | STRING |
boolean | BOOLEAN |
timestamp | DATETIME |
timestamptz | DATETIME |
date | DATE |
jsonb | JSON |
uuid | VARCHAR(36) |
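The table above can be expressed as a lookup with one special case: parameterized `varchar(n)` keeps its length. The `STRING` fallback for types not in the table is an assumption, not documented behavior.

```python
import re

# PostgreSQL → StarRocks mapping, transcribed from the table above
PG_TO_STARROCKS = {
    "integer": "INT", "bigint": "BIGINT", "numeric": "DECIMAL",
    "text": "STRING", "boolean": "BOOLEAN", "timestamp": "DATETIME",
    "timestamptz": "DATETIME", "date": "DATE", "jsonb": "JSON",
    "uuid": "VARCHAR(36)",
}

def map_type(pg_type):
    """Map a PostgreSQL type name to its StarRocks equivalent."""
    m = re.fullmatch(r"varchar\((\d+)\)", pg_type)
    if m:
        return f"VARCHAR({m.group(1)})"  # length is preserved
    return PG_TO_STARROCKS.get(pg_type, "STRING")  # fallback: assumption
```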
Adding a Sink
Via Portal
- Go to Datasources → New Datasource
- Select sink type (e.g., StarRocks)
- Enter connection details
- Test connection
- Save
Best Practices
1. Size Destination Appropriately
Ensure your sink can handle the write throughput:
- StarRocks: Size based on expected events/second
- Snowflake: Size warehouse for batch frequency
2. Monitor Write Performance
Track metrics:
- Events written per second
- Batch write latency
- Failed writes
3. Use Appropriate Data Types
Choose efficient types:
- Use `INT` instead of `BIGINT` when possible
- Use `VARCHAR(n)` instead of `STRING` for bounded strings