Worker Infrastructure
This guide covers the infrastructure components that run in your AWS account as part of an EZ-CDC deployment.
Components Overview
EC2 Instances
Worker Agent
Each EC2 instance runs the worker agent, a service that:
- Registers with the control plane
- Polls for assigned jobs
- Spawns and manages dbmazz daemons
- Reports health and metrics
- Handles graceful shutdown
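The agent runs as a systemd service (it is enabled and started by the bootstrap script shown later in this guide). A quick way to check it on a live worker, for example over an SSM Session Manager shell, is sketched below; the worker-agent unit name matches the bootstrap script, and the rest is standard systemd tooling.

```bash
# Inspect the worker agent on an instance (e.g. via SSM Session Manager)
systemctl status worker-agent --no-pager

# Tail recent agent output from the journal
journalctl -u worker-agent --since "15 minutes ago" --no-pager
```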
Instance Specifications
| Instance Type | vCPU | Memory | Network | Recommended Jobs |
|---|---|---|---|---|
| t3.medium | 2 | 4 GB | Up to 5 Gbps | 1-3 |
| t3.large | 2 | 8 GB | Up to 5 Gbps | 3-5 |
| c6i.large | 2 | 4 GB | Up to 12.5 Gbps | 3-5 |
| c6i.xlarge | 4 | 8 GB | Up to 12.5 Gbps | 5-10 |
Resource Usage
Each dbmazz daemon uses approximately:
| Resource | Usage |
|---|---|
| Memory | ~5 MB |
| CPU | Less than 5% idle, 10-25% active |
| Disk | Minimal (logs only) |
Auto Scaling Group
Workers run in an Auto Scaling Group for reliability:
resource "aws_autoscaling_group" "workers" {
name = "ez-cdc-${var.deployment_name}"
min_size = var.min_workers
max_size = var.max_workers
desired_capacity = var.desired_workers
vpc_zone_identifier = var.subnet_ids
launch_template {
id = aws_launch_template.worker.id
version = "$Latest"
}
health_check_type = "EC2"
health_check_grace_period = 300
instance_refresh {
strategy = "Rolling"
preferences {
min_healthy_percentage = 50
}
}
tag {
key = "Name"
value = "ez-cdc-${var.deployment_name}-worker"
propagate_at_launch = true
}
}
Scaling Behavior
| Event | Action |
|---|---|
| Instance unhealthy | ASG replaces automatically |
| Scale out needed | New instance launched |
| Scale in | Drains jobs before termination |
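The ASG definition above does not show how scale-in draining is coordinated. One common pattern is a termination lifecycle hook that pauses instance termination long enough for the agent to finish or hand off its jobs; the sketch below is an assumption about how you might wire that up, not necessarily the mechanism EZ-CDC uses. The hook name and timeout are placeholders.

```hcl
# Hedged sketch: pause scale-in terminations so jobs can drain first.
resource "aws_autoscaling_lifecycle_hook" "drain" {
  name                   = "ez-cdc-${var.deployment_name}-drain"
  autoscaling_group_name = aws_autoscaling_group.workers.name
  lifecycle_transition   = "autoscaling:EC2_INSTANCE_TERMINATING"
  heartbeat_timeout      = 300        # seconds allowed for draining
  default_result         = "CONTINUE" # terminate anyway if no completion signal arrives
}
```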
Launch Template
The launch template defines instance configuration:
resource "aws_launch_template" "worker" {
name_prefix = "ez-cdc-${var.deployment_name}"
image_id = var.worker_ami_id
instance_type = var.instance_type
iam_instance_profile {
arn = aws_iam_instance_profile.worker.arn
}
network_interfaces {
associate_public_ip_address = false
security_groups = [aws_security_group.worker.id]
}
block_device_mappings {
device_name = "/dev/xvda"
ebs {
volume_size = var.volume_size
volume_type = "gp3"
encrypted = true
}
}
user_data = base64encode(templatefile("${path.module}/user_data.sh", {
control_plane_url = var.control_plane_url
metrics_auth_token = var.metrics_auth_token
deployment_id = var.deployment_id
}))
metadata_options {
http_tokens = "required" # IMDSv2 only
}
}
Bootstrap Process
When a worker instance starts:
```bash
#!/bin/bash
# user_data.sh - Worker bootstrap script

# 1. Download worker-agent binary from S3
aws s3 cp s3://ez-cdc-releases/worker-agent/latest/worker-agent /usr/local/bin/
chmod +x /usr/local/bin/worker-agent

# 2. Download dbmazz binary
aws s3 cp s3://ez-cdc-releases/dbmazz/latest/dbmazz /usr/local/bin/
chmod +x /usr/local/bin/dbmazz

# 3. Configure environment (IMDSv2 requires a session token for metadata calls)
mkdir -p /etc/ez-cdc
TOKEN=$(curl -s -X PUT http://169.254.169.254/latest/api/token \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
cat > /etc/ez-cdc/config.env << EOF
CONTROL_PLANE_URL=${control_plane_url}
METRICS_AUTH_TOKEN=${metrics_auth_token}
DEPLOYMENT_ID=${deployment_id}
WORKER_ID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-id)
EOF

# 4. Start worker-agent service
systemctl enable worker-agent
systemctl start worker-agent
```
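The script assumes a worker-agent systemd unit already exists, typically baked into the worker AMI. If you build a custom AMI, a minimal unit might look like the following; the EnvironmentFile path matches the config written above, while the remaining options are illustrative assumptions.

```bash
# Hypothetical worker-agent unit for custom AMIs; the stock AMI ships its own.
cat > /etc/systemd/system/worker-agent.service << 'EOF'
[Unit]
Description=EZ-CDC worker agent
After=network-online.target
Wants=network-online.target

[Service]
EnvironmentFile=/etc/ez-cdc/config.env
ExecStart=/usr/local/bin/worker-agent
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
```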
Security Group
Workers use a security group with no inbound rules:
resource "aws_security_group" "worker" {
name = "ez-cdc-${var.deployment_name}-worker"
description = "EZ-CDC Worker - Egress only"
vpc_id = var.vpc_id
# NO INBOUND RULES
# Control plane
egress {
description = "HTTPS to control plane"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
# PostgreSQL sources
egress {
description = "PostgreSQL"
from_port = 5432
to_port = 5432
protocol = "tcp"
cidr_blocks = var.postgres_cidrs
}
# StarRocks sinks
egress {
description = "StarRocks HTTP"
from_port = 8040
to_port = 8040
protocol = "tcp"
cidr_blocks = var.starrocks_cidrs
}
egress {
description = "StarRocks MySQL"
from_port = 9030
to_port = 9030
protocol = "tcp"
cidr_blocks = var.starrocks_cidrs
}
}
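The postgres_cidrs and starrocks_cidrs variables scope egress to your own database networks. How you populate them depends on your VPC layout; the values below are placeholders that only show the expected shape.

```hcl
# terraform.tfvars (illustrative values only)
postgres_cidrs  = ["10.20.0.0/16"] # network range of your PostgreSQL sources
starrocks_cidrs = ["10.30.0.0/16"] # network range of your StarRocks cluster
```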
IAM Instance Profile
Workers need an IAM instance profile that lets them download release binaries from S3 and register with SSM Session Manager:
resource "aws_iam_role" "worker" {
name = "ez-cdc-${var.deployment_name}-worker"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ec2.amazonaws.com"
}
}]
})
}
resource "aws_iam_role_policy" "worker" {
name = "ez-cdc-worker-policy"
role = aws_iam_role.worker.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = ["s3:GetObject"]
Resource = ["arn:aws:s3:::ez-cdc-releases/*"]
},
{
Effect = "Allow"
Action = [
"ssm:UpdateInstanceInformation",
"ssmmessages:*",
"ec2messages:*"
]
Resource = "*"
}
]
})
}
resource "aws_iam_instance_profile" "worker" {
name = "ez-cdc-${var.deployment_name}-worker"
role = aws_iam_role.worker.name
}
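If you prefer AWS managed policies over the inline SSM statement, attaching AmazonSSMManagedInstanceCore to the role grants the same Session Manager permissions:

```hcl
# Alternative to the inline ssm/ssmmessages/ec2messages statement above.
resource "aws_iam_role_policy_attachment" "worker_ssm" {
  role       = aws_iam_role.worker.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}
```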
Monitoring
CloudWatch Metrics
Workers emit metrics to CloudWatch:
| Metric | Description |
|---|---|
| CPUUtilization | CPU usage percentage |
| MemoryUtilization | Memory usage percentage |
| DiskUtilization | Disk usage percentage |
| NetworkIn/Out | Network traffic |
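You can pull any of these on demand with the CloudWatch CLI. The example below reads CPU utilization for a single worker; memory and disk figures are not part of the default AWS/EC2 namespace, so query whichever namespace your deployment publishes them under. The instance ID is a placeholder and the date commands use GNU syntax.

```bash
# Average CPU utilization for one worker over the last hour
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0abc123def456 \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 \
  --statistics Average
```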
CloudWatch Logs
Logs are sent to CloudWatch Logs:
```
/ez-cdc/workers/{deployment-name}/{instance-id}/worker-agent.log
/ez-cdc/workers/{deployment-name}/{instance-id}/dbmazz-{job-id}.log
```
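With AWS CLI v2 you can follow these from a terminal. The exact split between log group and log stream depends on how your deployment names them, so treat the group name below as an assumption and confirm it with aws logs describe-log-groups.

```bash
# Follow worker-agent output for a deployment
# (group name assumed to mirror the paths above)
aws logs tail /ez-cdc/workers/production --follow --since 1h
```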
SSM Session Manager
Connect to workers without SSH:
```bash
aws ssm start-session --target i-0abc123def456
```
Instance Access
No SSH Required
Workers don't need SSH access:
- Use SSM Session Manager for shell access
- Logs available in CloudWatch
- Metrics in EZ-CDC portal
If SSH is Needed
For debugging, you can enable SSH:
- Add a key pair to the launch template
- Add an inbound rule for port 22 to the worker security group
- Connect via a bastion host or public IP
Enabling SSH increases attack surface. Use SSM Session Manager instead when possible.
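If you do open SSH, the change stays within the launch template and security group. A minimal sketch follows, assuming both are managed in the same Terraform module; the key name and admin CIDR are placeholders.

```hcl
# Illustrative only: key name and admin CIDR are placeholders.
resource "aws_launch_template" "worker" {
  # ... existing configuration ...
  key_name = "ez-cdc-debug-key"
}

resource "aws_security_group_rule" "worker_ssh" {
  type              = "ingress"
  description       = "Temporary SSH for debugging"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  cidr_blocks       = ["203.0.113.0/24"] # your bastion or admin network
  security_group_id = aws_security_group.worker.id
}
```

Remove the rule again when you are done debugging.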
Upgrades
Binary Upgrades
Workers check for upgrades via the control plane:
- Control plane signals upgrade available
- Worker downloads new binary from S3
- Worker restarts with new version
- Jobs are re-claimed automatically
Rolling Updates
For instance-level updates:
```bash
aws autoscaling start-instance-refresh \
  --auto-scaling-group-name ez-cdc-production \
  --preferences '{"MinHealthyPercentage": 50}'
```
Cost Optimization
Spot Instances
For non-critical workloads, use Spot instances:
resource "aws_launch_template" "worker" {
# ...
instance_market_options {
market_type = "spot"
spot_options {
max_price = "0.05"
}
}
}
Spot capacity can be reclaimed by AWS with only a two-minute warning. Use Spot only for fault-tolerant, non-critical workloads.
Reserved Instances
For production workloads, consider Reserved Instances for cost savings.