Table Selection

This guide covers how to select and configure tables for CDC replication.

Discovering Tables

When you create a job, EZ-CDC discovers the available tables from your source:

Available Tables:
┌────────────────────┬─────────────┬─────────────┬──────────┐
│ Table              │ Rows (est.) │ Primary Key │ Status   │
├────────────────────┼─────────────┼─────────────┼──────────┤
│ public.orders      │     245,000 │ id          │ ✓ Ready  │
│ public.order_items │   1,200,000 │ id          │ ✓ Ready  │
│ public.customers   │      50,000 │ id          │ ✓ Ready  │
│ public.products    │       5,000 │ id          │ ✓ Ready  │
│ public.audit_logs  │  10,000,000 │ id          │ ⚠ Large  │
│ public.sessions    │     500,000 │ (none)      │ ⚠ No PK  │
└────────────────────┴─────────────┴─────────────┴──────────┘
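Discovery boils down to a catalog query against the source. A minimal sketch of the equivalent query, assuming a PostgreSQL source (the REPLICA IDENTITY commands later in this guide imply one):

-- Estimated row counts come from planner statistics, not a full COUNT(*).
SELECT n.nspname AS schema_name,
       c.relname AS table_name,
       c.reltuples::bigint AS estimated_rows,
       (con.conname IS NOT NULL) AS has_primary_key
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
LEFT JOIN pg_constraint con ON con.conrelid = c.oid AND con.contype = 'p'
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema');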

Selecting Tables

Basic Selection

Select tables to replicate:

{
  "tables": [
    "public.orders",
    "public.order_items",
    "public.customers"
  ]
}

Schema-Qualified Names

Always use fully-qualified names:

✓ public.orders
✓ sales.transactions
✗ orders (ambiguous)

Pattern Matching (Future)

Match tables by pattern:

{
  "table_patterns": [
    "public.*",       // All tables in public schema
    "sales.order_*",  // Tables starting with order_
    "!*._archive"     // Exclude archive tables
  ]
}

Table Requirements

Primary Keys

Tables should have primary keys:

┌─────────┬─────────────────────────────────────────────┐
│ Has PK? │ Behavior                                    │
├─────────┼─────────────────────────────────────────────┤
│ Yes     │ Updates/Deletes use PK to identify rows     │
│ No      │ All columns used (REPLICA IDENTITY FULL)    │
└─────────┴─────────────────────────────────────────────┘

For tables without primary keys:

-- Option 1: Add a primary key
ALTER TABLE sessions ADD PRIMARY KEY (session_id);

-- Option 2: Use REPLICA IDENTITY FULL
ALTER TABLE sessions REPLICA IDENTITY FULL;
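Either change can be verified from the catalog; relreplident reads 'f' once FULL is in effect:

-- 'd' = default (primary key), 'f' = FULL, 'n' = nothing, 'i' = index
SELECT relname, relreplident
FROM pg_class
WHERE relname = 'sessions';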

Supported Column Types

┌───────────┬──────────────────────────────┬───────────┐
│ Category  │ Types                        │ Status    │
├───────────┼──────────────────────────────┼───────────┤
│ Numeric   │ int, bigint, decimal, float  │ ✓         │
│ String    │ varchar, text, char          │ ✓         │
│ Temporal  │ date, timestamp, timestamptz │ ✓         │
│ Boolean   │ boolean                      │ ✓         │
│ JSON      │ json, jsonb                  │ ✓         │
│ UUID      │ uuid                         │ ✓         │
│ Binary    │ bytea                        │ ✓         │
│ Arrays    │ integer[], text[], etc.      │ ⚠ Limited │
│ Geometric │ point, polygon, etc.         │           │
│ Custom    │ User-defined types           │           │
└───────────┴──────────────────────────────┴───────────┘
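To audit an existing schema for the limited categories above, a catalog query like this works on PostgreSQL (information_schema reports 'ARRAY' for array columns and 'USER-DEFINED' for custom types):

SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'public'
  AND data_type IN ('ARRAY', 'USER-DEFINED');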

Table Considerations

Large Tables

For tables with millions of rows:

⚠ Table public.audit_logs has 10,000,000 rows

Initial snapshot will:
- Take approximately 15 minutes
- Generate significant source load
- Require 2GB+ transfer

Options:

  1. Include with longer timeout
  2. Snapshot during off-peak hours
  3. Use snapshot_mode: never if the data already exists in the sink (see the sketch below)
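For option 3, a minimal sketch, assuming snapshot_mode is accepted at the job level alongside the table list:

{
  "tables": ["public.audit_logs"],
  "snapshot_mode": "never"
}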

High-Write Tables

For tables with heavy write load:

⚠ Table public.events has high write volume (1000+ writes/sec)

Consider:
- Larger batch size
- More frequent flushes
- Dedicated job
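As a sketch of the first two points, assuming job-level keys (batch_size mirrors the per-table setting planned later in this guide; flush_interval_ms is a hypothetical key used for illustration):

{
  "tables": ["public.events"],
  "batch_size": 10000,       // larger batches amortize per-batch overhead
  "flush_interval_ms": 500   // hypothetical key: flush often to bound lag
}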

Tables Without Primary Keys

⚠ Table public.sessions has no primary key

Updates and deletes will:
- Include all columns in messages
- Be less efficient
- Potentially cause issues with some sinks

Recommendation: Add a primary key or use REPLICA IDENTITY FULL.

Column Selection (Future)

Select specific columns:

{
  "tables": [
    {
      "name": "public.customers",
      "columns": ["id", "name", "email", "created_at"]
    }
  ]
}

Excluding Tables

By Not Selecting

Simply omit the table from the tables list.

By Pattern (Future)

{
  "exclude_tables": [
    "*_backup",
    "*_archive",
    "temp_*"
  ]
}

Schema Discovery

Tables are automatically discovered when you select a source in the portal. The portal displays:

  • Schema and table names
  • Estimated row counts
  • Column information (name, type, nullable, primary key)
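The same column details can be cross-checked directly on a PostgreSQL source:

SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_schema = 'public'
  AND table_name = 'orders'
ORDER BY ordinal_position;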

Adding Tables to a Running Job

Currently, adding tables requires:

  1. Stop the job
  2. Modify table selection
  3. Restart the job

The new table will be snapshotted and then streamed.
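For example, extending the selection from earlier; on restart, the new table is snapshotted before streaming resumes:

{
  "tables": [
    "public.orders",
    "public.order_items",
    "public.customers",
    "public.products"   // newly added: snapshotted, then streamed
  ]
}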

Future Feature

Hot-adding tables without stopping the job is planned.

Removing Tables

To remove a table:

  1. Stop the job
  2. Remove table from selection
  3. Restart the job

Note: Data already replicated remains in the sink.

Table-Specific Settings (Future)

Configure per-table settings:

{
  "tables": [
    {
      "name": "public.orders",
      "batch_size": 5000,
      "priority": "high"
    },
    {
      "name": "public.audit_logs",
      "batch_size": 50000,
      "priority": "low"
    }
  ]
}

Best Practices

1. Start Small

Begin with a few critical tables:

{
  "tables": ["public.orders", "public.customers"]
}

Expand after validating replication.

2. Group Related Tables

Keep related tables in the same job:

Job: sales-data
- orders
- order_items
- customers

Job: inventory-data
- products
- inventory
- warehouses
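Expressed as table selections, assuming one config per job:

// Job: sales-data
{
  "tables": ["public.orders", "public.order_items", "public.customers"]
}

// Job: inventory-data
{
  "tables": ["public.products", "public.inventory", "public.warehouses"]
}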

3. Separate High-Volume Tables

Isolate high-write tables:

Job: events-realtime (dedicated)
- events (10K writes/sec)

Job: core-data
- users
- accounts
- settings

4. Monitor Table Lag

Watch per-table lag to identify bottlenecks.
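Per-table lag comes from EZ-CDC's own metrics; on a PostgreSQL source, overall slot lag can also be checked directly (a sketch, assuming a logical replication slot):

-- Bytes of WAL not yet confirmed by the slot; steady growth signals a bottleneck.
SELECT slot_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS lag
FROM pg_replication_slots;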

Next Steps