Table Selection
This guide covers how to select and configure tables for CDC replication.
Discovering Tables
When creating a job, EZ-CDC discovers available tables from your source:
Available Tables:
┌────────────────────┬──────────────┬─────────────┬────────────┐
│ Table              │ Rows (est.)  │ Primary Key │ Status     │
├────────────────────┼──────────────┼─────────────┼────────────┤
│ public.orders      │ 245,000      │ id          │ ✓ Ready    │
│ public.order_items │ 1,200,000    │ id          │ ✓ Ready    │
│ public.customers   │ 50,000       │ id          │ ✓ Ready    │
│ public.products    │ 5,000        │ id          │ ✓ Ready    │
│ public.audit_logs  │ 10,000,000   │ id          │ ⚠ Large    │
│ public.sessions    │ 500,000      │ (none)      │ ⚠ No PK    │
└────────────────────┴──────────────┴─────────────┴────────────┘
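The row counts are estimates taken from source statistics. If you want to cross-check what the portal shows, a catalog query along these lines returns similar information for a PostgreSQL source (a sketch; the portal's own discovery query may differ):
-- Cross-check discovery output against the source catalog (PostgreSQL).
SELECT n.nspname AS table_schema,
       c.relname AS table_name,
       c.reltuples::bigint AS estimated_rows,
       EXISTS (
         SELECT 1 FROM pg_constraint p
         WHERE p.conrelid = c.oid AND p.contype = 'p'
       ) AS has_primary_key
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY 1, 2;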
Selecting Tables
Basic Selection
Select tables to replicate:
{
  "tables": [
    "public.orders",
    "public.order_items",
    "public.customers"
  ]
}
Schema-Qualified Names
Always use fully-qualified names:
✓ public.orders
✓ sales.transactions
✗ orders (ambiguous)
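If you're unsure whether an unqualified name is ambiguous, you can check for table names that exist in more than one schema (a PostgreSQL sketch):
-- Find table names that appear in more than one schema.
SELECT c.relname AS table_name, count(*) AS schema_count
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
GROUP BY c.relname
HAVING count(*) > 1;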
Pattern Matching (Future)
Match tables by pattern:
{
  "table_patterns": [
    "public.*",       // All tables in public schema
    "sales.order_*",  // Tables starting with order_
    "!*._archive"     // Exclude archive tables
  ]
}
Table Requirements
Primary Keys
Tables should have primary keys:
| Has PK? | Behavior |
|---|---|
| Yes | Updates/Deletes use PK to identify rows |
| No | All columns used (REPLICA IDENTITY FULL) |
For tables without primary keys:
-- Option 1: Add a primary key
ALTER TABLE sessions ADD PRIMARY KEY (session_id);
-- Option 2: Use REPLICA IDENTITY FULL
ALTER TABLE sessions REPLICA IDENTITY FULL;
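After running either option, you can confirm the table's replica identity from the catalog (a PostgreSQL sketch; 'd' means default/primary key, 'f' means full):
-- Check the replica identity setting of the sessions table.
SELECT relname, relreplident
FROM pg_class
WHERE relname = 'sessions' AND relkind = 'r';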
Supported Column Types
| Category | Types | Status |
|---|---|---|
| Numeric | int, bigint, decimal, float | ✓ |
| String | varchar, text, char | ✓ |
| Temporal | date, timestamp, timestamptz | ✓ |
| Boolean | boolean | ✓ |
| JSON | json, jsonb | ✓ |
| UUID | uuid | ✓ |
| Binary | bytea | ✓ |
| Arrays | integer[], text[], etc. | ⚠ Limited |
| Geometric | point, polygon, etc. | ✗ |
| Custom | User-defined types | ✗ |
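To review a table's column types before selecting it, a standard information_schema query works (shown here for public.orders; substitute your own schema and table):
-- List column names, types, and nullability for one table.
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_schema = 'public'
  AND table_name   = 'orders'
ORDER BY ordinal_position;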
Table Considerations
Large Tables
For tables with millions of rows:
⚠ Table public.audit_logs has 10,000,000 rows
Initial snapshot will:
- Take approximately 15 minutes
- Generate significant source load
- Require 2GB+ transfer
Options:
- Include it with a longer snapshot timeout
- Snapshot during off-peak hours
- Use snapshot_mode: never if the data already exists in the sink
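To gauge how much data an initial snapshot would move, you can check a table's on-disk size first (a PostgreSQL sketch; the amount actually transferred will differ from the on-disk size):
-- On-disk size of a large table, including indexes and TOAST data.
SELECT pg_size_pretty(pg_total_relation_size('public.audit_logs')) AS total_size;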
High-Write Tables
For tables with heavy write load:
⚠ Table public.events has high write volume (1000+ writes/sec)
Consider:
- Larger batch size
- More frequent flushes
- Dedicated job
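To see which tables carry the heaviest write load, the statistics counters give a quick approximation (a PostgreSQL sketch; counters accumulate since the last stats reset, so this is not a per-second rate):
-- Rank tables by accumulated writes (inserts + updates + deletes).
SELECT schemaname, relname,
       n_tup_ins + n_tup_upd + n_tup_del AS total_writes
FROM pg_stat_user_tables
ORDER BY total_writes DESC
LIMIT 10;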
Tables Without Primary Keys
⚠ Table public.sessions has no primary key
Updates and deletes will:
- Include all columns in messages
- Be less efficient
- Potentially cause issues with some sinks
Recommendation: Add a primary key or use REPLICA IDENTITY FULL.
Column Selection (Future)
Select specific columns:
{
  "tables": [
    {
      "name": "public.customers",
      "columns": ["id", "name", "email", "created_at"]
    }
  ]
}
Excluding Tables
By Not Selecting
Simply don't include the table in the tables list.
By Pattern (Future)
{
  "exclude_tables": [
    "*_backup",
    "*_archive",
    "temp_*"
  ]
}
Schema Discovery
Tables are automatically discovered when you select a source in the portal. The portal displays:
- Schema and table names
- Estimated row counts
- Column information (name, type, nullable, primary key)
Adding Tables to a Running Job
Currently, adding tables requires:
- Stop the job
- Modify table selection
- Restart the job
The new table will be snapshotted and then streamed.
Hot-adding tables without stopping the job is planned.
Removing Tables
To remove a table:
- Stop the job
- Remove table from selection
- Restart the job
Note: Data already replicated remains in the sink.
Table-Specific Settings (Future)
Configure per-table settings:
{
  "tables": [
    {
      "name": "public.orders",
      "batch_size": 5000,
      "priority": "high"
    },
    {
      "name": "public.audit_logs",
      "batch_size": 50000,
      "priority": "low"
    }
  ]
}
Best Practices
1. Start Small
Begin with a few critical tables:
{
  "tables": ["public.orders", "public.customers"]
}
Expand after validating replication.
2. Group Related Tables
Keep related tables in the same job:
Job: sales-data
- orders
- order_items
- customers
Job: inventory-data
- products
- inventory
- warehouses
3. Separate High-Volume Tables
Isolate high-write tables:
Job: events-realtime (dedicated)
- events (10K writes/sec)
Job: core-data
- users
- accounts
- settings
4. Monitor Table Lag
Watch per-table lag to identify bottlenecks.
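Per-table lag is surfaced by EZ-CDC itself. At the source level you can also watch overall replication-slot lag, assuming a PostgreSQL source (a sketch):
-- WAL not yet confirmed by the consumer of each logical replication slot.
SELECT slot_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS lag
FROM pg_replication_slots
WHERE slot_type = 'logical';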