Sinks Overview

Sinks are the destination systems where EZ-CDC writes the captured changes. EZ-CDC transforms CDC events into the appropriate format for each sink and loads them efficiently.

Supported Sinks

| Sink | Loading Method | Status | Documentation |
|---|---|---|---|
| StarRocks | Stream Load HTTP API | ✅ Stable | Setup Guide |
| ClickHouse | HTTP Interface | 🔜 Coming Soon | - |
| Snowflake | Staged Batch Load | 🔜 Planned | - |
| Apache Kafka | Producer API | 🔜 Planned | - |

How Sinks Work

Data Flow

CDC Events → Transform (type mapping) → Batching (buffering, checkpoint) → Sink Adapter (Stream Load HTTP client, retry logic) → StarRocks

  • Batching: configurable batch size and flush interval
  • Sink Adapter: schema validation and error handling
  • Load into StarRocks: batched HTTP Stream Load

Batching Strategy

EZ-CDC batches events for efficient loading:

| Parameter | Default | Description |
|---|---|---|
| Batch Size | 10,000 | Maximum events per batch |
| Flush Interval | 5 seconds | Maximum time before flush |

Batches are flushed when either limit is reached.
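The size-or-time flush rule can be sketched as follows. This is a minimal illustration of the batching behavior described above; `BatchBuffer` and `sink.write` are hypothetical names, not part of the EZ-CDC API:

```python
import time

class BatchBuffer:
    """Buffers CDC events; flushes when either limit is reached (sketch)."""

    def __init__(self, sink, max_events=10_000, flush_interval=5.0):
        self.sink = sink
        self.max_events = max_events        # default: 10,000 events
        self.flush_interval = flush_interval  # default: 5 seconds
        self.events = []
        self.last_flush = time.monotonic()

    def add(self, event):
        self.events.append(event)
        if self.should_flush():
            self.flush()

    def should_flush(self):
        # Flush on whichever limit is hit first: size OR elapsed time.
        return (len(self.events) >= self.max_events
                or time.monotonic() - self.last_flush >= self.flush_interval)

    def flush(self):
        if self.events:
            self.sink.write(self.events)   # one batched load request
            self.events = []
        self.last_flush = time.monotonic()
```

In practice the time-based flush would run on a background timer so a slow source still flushes within the interval.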

Exactly-Once Delivery

EZ-CDC ensures exactly-once delivery through:

  1. Checkpointing: LSN position saved after successful writes
  2. Idempotent Writes: Using primary keys for upserts
  3. Transaction Alignment: Respecting source transaction boundaries
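These three mechanisms compose into a write-then-checkpoint loop: because writes are primary-key upserts, replaying a batch whose LSN was never saved is harmless. A minimal sketch, where all names (`checkpoint_store`, `read_batches`, `upsert`) are hypothetical rather than EZ-CDC internals:

```python
def replicate(source, sink, checkpoint_store):
    """Checkpoint-after-write loop (illustrative sketch).

    Writes are idempotent upserts keyed on the primary key, so a batch
    replayed after a crash (before its LSN was saved) cannot duplicate rows.
    """
    lsn = checkpoint_store.load()             # resume from last committed LSN
    for batch, batch_end_lsn in source.read_batches(from_lsn=lsn):
        sink.upsert(batch)                    # idempotent: keyed on primary key
        checkpoint_store.save(batch_end_lsn)  # advance only after a durable write
```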

Sink Capabilities

Different sinks have different capabilities:

| Capability | StarRocks | ClickHouse | Snowflake |
|---|---|---|---|
| Streaming | | | |
| Batch | | | |
| Upserts | | | |
| Deletes | ✅ (soft) | | |
| Schema Evolution | | | |

Loading Models

| Model | Description | Best For |
|---|---|---|
| Streaming | Continuous micro-batches | Real-time analytics |
| Staged Batch | Periodic large batches via staging | Data warehouses |

Table Management

Automatic Table Creation

EZ-CDC can automatically create destination tables:

  1. Reads schema from source
  2. Maps types to destination
  3. Creates table with primary key
  4. Adds audit columns
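The four steps above amount to DDL generation. A hedged sketch, where the trimmed `TYPE_MAP` and the function name are assumptions for illustration, not EZ-CDC internals:

```python
# Subset of the PostgreSQL → StarRocks mapping, for illustration only.
TYPE_MAP = {"integer": "INT", "text": "STRING", "timestamptz": "DATETIME"}

def create_table_ddl(table, columns, primary_key):
    """Build a CREATE TABLE statement from a source schema.

    columns: list of (name, postgres_type) pairs read from the source.
    """
    cols = [f"{name} {TYPE_MAP[pg_type]}" for name, pg_type in columns]
    cols += ["_cdc_updated_at DATETIME",  # audit columns added by the tool
             "_cdc_deleted BOOLEAN"]
    return (f"CREATE TABLE {table} (\n  " + ",\n  ".join(cols) + "\n) "
            f"PRIMARY KEY ({', '.join(primary_key)});")
```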

Audit Columns

EZ-CDC adds metadata columns to track changes:

| Column | Type | Description |
|---|---|---|
| _cdc_updated_at | TIMESTAMP | When the row was last modified |
| _cdc_deleted | BOOLEAN | Soft delete flag (true = deleted) |

For example, to query only live (non-deleted) rows:

```sql
SELECT
  id,
  name,
  _cdc_updated_at,
  _cdc_deleted
FROM users
WHERE _cdc_deleted = false;
```

Schema Evolution

When source schema changes:

  1. EZ-CDC detects new/modified columns
  2. Adds columns to destination (if supported)
  3. Continues replication
Note: Column removal and type changes may require manual intervention.
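The detection step can be sketched as a schema diff: emit additive DDL for new columns and flag everything else for manual review. An illustrative sketch only; the function name and the placeholder table name `t` are hypothetical:

```python
def schema_diff(source_cols, dest_cols):
    """Compare column dicts {name: type}.

    Returns (ddl, manual): DDL statements for additive changes, and a list
    of changes that need manual intervention (removals, type changes).
    """
    ddl, manual = [], []
    for name, typ in source_cols.items():
        if name not in dest_cols:
            ddl.append(f"ALTER TABLE t ADD COLUMN {name} {typ};")
        elif dest_cols[name] != typ:
            manual.append(f"type change: {name}")
    for name in dest_cols:
        # Audit columns exist only at the destination; ignore them.
        if name not in source_cols and not name.startswith("_cdc_"):
            manual.append(f"column removed: {name}")
    return ddl, manual
```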

Type Mapping

EZ-CDC maps source types to destination types:

PostgreSQL → StarRocks

| PostgreSQL | StarRocks |
|---|---|
| integer | INT |
| bigint | BIGINT |
| numeric | DECIMAL |
| varchar(n) | VARCHAR(n) |
| text | STRING |
| boolean | BOOLEAN |
| timestamp | DATETIME |
| timestamptz | DATETIME |
| date | DATE |
| jsonb | JSON |
| uuid | VARCHAR(36) |
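The table above can be expressed as a simple lookup, with the parameterized `varchar(n)` case handled by pattern matching. An illustrative sketch, not EZ-CDC's internal representation:

```python
import re

PG_TO_STARROCKS = {
    "integer": "INT", "bigint": "BIGINT", "numeric": "DECIMAL",
    "text": "STRING", "boolean": "BOOLEAN", "timestamp": "DATETIME",
    "timestamptz": "DATETIME", "date": "DATE", "jsonb": "JSON",
    "uuid": "VARCHAR(36)",
}

def map_type(pg_type):
    # varchar(n) keeps its declared length in the destination.
    m = re.fullmatch(r"varchar\((\d+)\)", pg_type)
    if m:
        return f"VARCHAR({m.group(1)})"
    return PG_TO_STARROCKS[pg_type]
```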

Adding a Sink

Via Portal

  1. Go to Datasources → New Datasource
  2. Select sink type (e.g., StarRocks)
  3. Enter connection details
  4. Test connection
  5. Save

Best Practices

1. Size Destination Appropriately

Ensure your sink can handle the write throughput:

  • StarRocks: Size based on expected events/second
  • Snowflake: Size warehouse for batch frequency

2. Monitor Write Performance

Track metrics:

  • Events written per second
  • Batch write latency
  • Failed writes

3. Use Appropriate Data Types

Choose efficient types:

  • Use INT instead of BIGINT when possible
  • Use VARCHAR(n) instead of STRING for bounded strings

Next Steps