Worker Infrastructure

This guide covers the infrastructure components that run in your AWS account as part of an EZ-CDC deployment.

Components Overview

EC2 Instances

Worker Agent

Each EC2 instance runs the worker agent, a service that:

  • Registers with the control plane
  • Polls for assigned jobs
  • Spawns and manages dbmazz daemons
  • Reports health and metrics
  • Handles graceful shutdown

Instance Specifications

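| Instance Type | vCPU | Memory | Network | Recommended Jobs |
|---|---|---|---|---|
| t3.medium | 2 | 4 GB | Up to 5 Gbps | 1-3 |
| t3.large | 2 | 8 GB | Up to 5 Gbps | 3-5 |
| c6i.large | 2 | 4 GB | Up to 12.5 Gbps | 3-5 |
| c6i.xlarge | 4 | 8 GB | Up to 12.5 Gbps | 5-10 |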

Resource Usage

Each dbmazz daemon uses approximately:

| Resource | Usage |
|---|---|
| Memory | ~5 MB |
| CPU | Less than 5% idle, 10-25% active |
| Disk | Minimal (logs only) |
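
To check these figures on a running worker, the per-process numbers reported by `ps` can be summed. The sketch below is illustrative only: it assumes the daemons run under the process name `dbmazz` (as the binary name suggests) and a Linux `ps` from procps. The helper name `summarize_usage` is hypothetical.

```shell
#!/usr/bin/env bash
# Sum memory and CPU across processes.
# Reads "RSS_KiB CPU_PERCENT" pairs from stdin, one process per line.
summarize_usage() {
  awk '
    NF >= 2 { n++; rss += $1; cpu += $2 }
    END { printf "daemons=%d rss_mb=%.1f cpu_pct=%.1f\n", n, rss / 1024, cpu }
  '
}

# Usage on a worker (assumes the process name is "dbmazz"):
#   ps -C dbmazz -o rss= -o pcpu= | summarize_usage
```

Note that `ps` reports CPU as an average over each process's lifetime, so short bursts of activity will look smaller than the "active" range above.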

Auto Scaling Group

Workers run in an Auto Scaling Group for reliability:

resource "aws_autoscaling_group" "workers" {
  name                = "ez-cdc-${var.deployment_name}"
  min_size            = var.min_workers
  max_size            = var.max_workers
  desired_capacity    = var.desired_workers
  vpc_zone_identifier = var.subnet_ids

  launch_template {
    id      = aws_launch_template.worker.id
    version = "$Latest"
  }

  health_check_type         = "EC2"
  health_check_grace_period = 300

  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 50
    }
  }

  tag {
    key                 = "Name"
    value               = "ez-cdc-${var.deployment_name}-worker"
    propagate_at_launch = true
  }
}

Scaling Behavior

| Event | Action |
|---|---|
| Instance unhealthy | ASG replaces automatically |
| Scale out needed | New instance launched |
| Scale in | Drains jobs before termination |

Launch Template

The launch template defines instance configuration:

resource "aws_launch_template" "worker" {
  name_prefix   = "ez-cdc-${var.deployment_name}"
  image_id      = var.worker_ami_id
  instance_type = var.instance_type

  iam_instance_profile {
    arn = aws_iam_instance_profile.worker.arn
  }

  network_interfaces {
    associate_public_ip_address = false
    security_groups             = [aws_security_group.worker.id]
  }

  block_device_mappings {
    device_name = "/dev/xvda"
    ebs {
      volume_size = var.volume_size
      volume_type = "gp3"
      encrypted   = true
    }
  }

  user_data = base64encode(templatefile("${path.module}/user_data.sh", {
    control_plane_url  = var.control_plane_url
    metrics_auth_token = var.metrics_auth_token
    deployment_id      = var.deployment_id
  }))

  metadata_options {
    http_tokens = "required" # IMDSv2 only
  }
}

Bootstrap Process

When a worker instance starts, its user data script runs these steps:

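#!/bin/bash
# user_data.sh - Worker bootstrap script
set -euo pipefail

# 1. Download worker-agent binary from S3
aws s3 cp s3://ez-cdc-releases/worker-agent/latest/worker-agent /usr/local/bin/
chmod +x /usr/local/bin/worker-agent

# 2. Download dbmazz binary
aws s3 cp s3://ez-cdc-releases/dbmazz/latest/dbmazz /usr/local/bin/
chmod +x /usr/local/bin/dbmazz

# 3. Configure environment
# The launch template enforces IMDSv2, so request a session token first;
# an unauthenticated (IMDSv1) metadata request would be rejected.
IMDS_TOKEN=$(curl -s -X PUT http://169.254.169.254/latest/api/token \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")

# Make sure the config directory exists before writing to it
mkdir -p /etc/ez-cdc
cat > /etc/ez-cdc/config.env << EOF
CONTROL_PLANE_URL=${control_plane_url}
METRICS_AUTH_TOKEN=${metrics_auth_token}
DEPLOYMENT_ID=${deployment_id}
WORKER_ID=$(curl -s -H "X-aws-ec2-metadata-token: $IMDS_TOKEN" http://169.254.169.254/latest/meta-data/instance-id)
EOF

# 4. Start worker-agent service
systemctl enable worker-agent
systemctl start worker-agent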
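
After bootstrap, it can be useful to confirm that the environment file contains every value the script writes. The following is a minimal sketch, assuming the `KEY=value` format shown above; the function name `check_config` is hypothetical and not part of EZ-CDC.

```shell
#!/usr/bin/env bash
# Verify that each expected key is present with a non-empty value.
# Returns 0 when all keys are set, 1 otherwise.
check_config() {
  local file=$1 key missing=0
  for key in CONTROL_PLANE_URL METRICS_AUTH_TOKEN DEPLOYMENT_ID WORKER_ID; do
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "missing or empty: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_config /etc/ez-cdc/config.env && echo "config OK"
```

An empty `WORKER_ID`, for example, would indicate that the instance metadata request failed during bootstrap.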

Security Group

Workers use a security group with no inbound rules:

resource "aws_security_group" "worker" {
  name        = "ez-cdc-${var.deployment_name}-worker"
  description = "EZ-CDC Worker - Egress only"
  vpc_id      = var.vpc_id

  # NO INBOUND RULES

  # Control plane
  egress {
    description = "HTTPS to control plane"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # PostgreSQL sources
  egress {
    description = "PostgreSQL"
    from_port   = 5432
    to_port     = 5432
    protocol    = "tcp"
    cidr_blocks = var.postgres_cidrs
  }

  # StarRocks sinks
  egress {
    description = "StarRocks HTTP"
    from_port   = 8040
    to_port     = 8040
    protocol    = "tcp"
    cidr_blocks = var.starrocks_cidrs
  }

  egress {
    description = "StarRocks MySQL"
    from_port   = 9030
    to_port     = 9030
    protocol    = "tcp"
    cidr_blocks = var.starrocks_cidrs
  }
}
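
Because the security group is egress-only, one way to confirm the rules is to test outbound connections from a worker, for example in an SSM session. The following is a minimal sketch using bash's `/dev/tcp` pseudo-device and coreutils `timeout`. The helper name `check_port` and the hostnames in the usage comment are placeholders, not part of EZ-CDC.

```shell
#!/usr/bin/env bash
# Try to open a TCP connection to HOST:PORT, giving up after 3 seconds.
# Prints the result and returns 0 if the connection succeeds, 1 otherwise.
check_port() {
  local host=$1 port=$2
  if timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "reachable: $host:$port"
  else
    echo "unreachable: $host:$port"
    return 1
  fi
}

# Usage (placeholder hostnames):
#   check_port postgres.internal.example 5432
#   check_port starrocks.internal.example 8040
#   check_port starrocks.internal.example 9030
```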

IAM Instance Profile

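Workers need permissions to download release binaries from S3 and to communicate with AWS Systems Manager (for Session Manager access):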

resource "aws_iam_role" "worker" {
  name = "ez-cdc-${var.deployment_name}-worker"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy" "worker" {
  name = "ez-cdc-worker-policy"
  role = aws_iam_role.worker.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Action   = ["s3:GetObject"]
        Resource = ["arn:aws:s3:::ez-cdc-releases/*"]
      },
      {
        Effect = "Allow"
        Action = [
          "ssm:UpdateInstanceInformation",
          "ssmmessages:*",
          "ec2messages:*"
        ]
        Resource = "*"
      }
    ]
  })
}

resource "aws_iam_instance_profile" "worker" {
  name = "ez-cdc-${var.deployment_name}-worker"
  role = aws_iam_role.worker.name
}

Monitoring

CloudWatch Metrics

Workers emit metrics to CloudWatch:

| Metric | Description |
|---|---|
| CPUUtilization | CPU usage percentage |
| MemoryUtilization | Memory usage percentage |
| DiskUtilization | Disk usage percentage |
| NetworkIn/Out | Network traffic |

CloudWatch Logs

Logs are sent to CloudWatch Logs:

/ez-cdc/workers/{deployment-name}/{instance-id}/worker-agent.log
/ez-cdc/workers/{deployment-name}/{instance-id}/dbmazz-{job-id}.log
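
The log paths follow a fixed pattern, so they can be built from the deployment name, instance ID, and job ID. A small sketch follows; the function names are hypothetical. How these paths map onto CloudWatch log groups and streams is not specified here, so the usage comment only lists groups by prefix.

```shell
#!/usr/bin/env bash
# Build the worker-agent log path for a deployment and instance.
agent_log_path() {
  printf '/ez-cdc/workers/%s/%s/worker-agent.log\n' "$1" "$2"
}

# Build the dbmazz log path for a deployment, instance, and job.
job_log_path() {
  printf '/ez-cdc/workers/%s/%s/dbmazz-%s.log\n' "$1" "$2" "$3"
}

# Usage:
#   agent_log_path production i-0abc123def456
#   aws logs describe-log-groups --log-group-name-prefix /ez-cdc/workers/production
```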

SSM Session Manager

Connect to workers without SSH:

aws ssm start-session --target i-0abc123def456
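
To find the instance ID to connect to, running workers can be looked up by the `Name` tag that the Auto Scaling Group applies (`ez-cdc-<deployment>-worker`). A minimal sketch; the helper name `list_workers` is hypothetical, and the caller is assumed to have permission to call `ec2:DescribeInstances`.

```shell
#!/usr/bin/env bash
# Print the instance IDs of running workers for a deployment.
list_workers() {
  local deployment=$1
  aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=ez-cdc-${deployment}-worker" \
              "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].InstanceId' \
    --output text
}

# Usage: open a session on the first running worker of "production"
#   target=$(list_workers production | tr '\t' '\n' | head -n 1)
#   aws ssm start-session --target "$target"
```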

Instance Access

No SSH Required

Workers don't need SSH access:

  • Use SSM Session Manager for shell access
  • Logs available in CloudWatch
  • Metrics in EZ-CDC portal

If SSH is Needed

For debugging, you can enable SSH:

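  1. Add a key pair to the launch template
  2. Add an inbound rule for port 22 to the worker security group
  3. Connect via a bastion host or a public IP (the launch template above does not assign public IPs by default)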

Warning: Enabling SSH increases the attack surface. Use SSM Session Manager instead whenever possible.

Upgrades

Binary Upgrades

Workers check for upgrades via the control plane:

  1. Control plane signals upgrade available
  2. Worker downloads new binary from S3
  3. Worker restarts with new version
  4. Jobs are re-claimed automatically
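
Steps 2 and 3 can be sketched as a download to a temporary file followed by an atomic rename, so that a failed download never leaves a partial binary in place. This is an illustration under stated assumptions, not the agent's actual upgrade code: the function name `install_binary` is hypothetical, and the S3 path in the usage comment is the one used by the bootstrap script.

```shell
#!/usr/bin/env bash
# Download a binary from S3 and swap it into place atomically.
# Returns non-zero and leaves the destination untouched if the download fails.
install_binary() {
  local url=$1 dest=$2 tmp
  # Create the temp file next to the destination so the rename stays
  # on one filesystem.
  tmp=$(mktemp "${dest}.XXXXXX") || return 1
  if aws s3 cp "$url" "$tmp" && chmod 755 "$tmp"; then
    mv -f "$tmp" "$dest"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Usage:
#   install_binary s3://ez-cdc-releases/worker-agent/latest/worker-agent \
#     /usr/local/bin/worker-agent && systemctl restart worker-agent
```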

Rolling Updates

For instance-level updates:

aws autoscaling start-instance-refresh \
  --auto-scaling-group-name ez-cdc-production \
  --preferences '{"MinHealthyPercentage": 50}'
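
An instance refresh runs asynchronously, so it helps to poll its status until it finishes. A minimal sketch using `aws autoscaling describe-instance-refreshes`; the function name `wait_for_refresh` and the default 30-second interval are illustrative choices.

```shell
#!/usr/bin/env bash
# Poll the most recent instance refresh of an Auto Scaling Group until it
# reaches a terminal state. Returns 0 only if the refresh succeeded.
wait_for_refresh() {
  local asg=$1 interval=${2:-30} status
  while :; do
    status=$(aws autoscaling describe-instance-refreshes \
      --auto-scaling-group-name "$asg" \
      --max-records 1 \
      --query 'InstanceRefreshes[0].Status' \
      --output text) || return 1
    echo "instance refresh: $status"
    case "$status" in
      Successful) return 0 ;;
      Failed|Cancelled|RollbackFailed|RollbackSuccessful) return 1 ;;
    esac
    sleep "$interval"
  done
}

# Usage:
#   wait_for_refresh ez-cdc-production
```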

Cost Optimization

Spot Instances

For non-critical workloads, use Spot instances:

resource "aws_launch_template" "worker" {
  # ...

  instance_market_options {
    market_type = "spot"
    spot_options {
      max_price = "0.05"
    }
  }
}

Caution: Spot instances can be interrupted. Use them only for fault-tolerant workloads.
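
EC2 announces a Spot interruption through the instance metadata service about two minutes before reclaiming the instance. The sketch below shows how a script on the worker could detect that notice over IMDSv2. The function name `spot_interruption_pending` is hypothetical, and whether the worker agent performs a similar check itself is not stated here.

```shell
#!/usr/bin/env bash
IMDS=http://169.254.169.254/latest

# Return 0 if a Spot interruption notice has been issued for this instance,
# 2 if the metadata token could not be obtained, and non-zero otherwise.
spot_interruption_pending() {
  local token
  token=$(curl -sf -X PUT "$IMDS/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60") || return 2
  # The endpoint answers 404 until a notice exists; with -f, curl then
  # exits non-zero, which becomes this function's return status.
  curl -sf -H "X-aws-ec2-metadata-token: $token" \
    "$IMDS/meta-data/spot/instance-action" >/dev/null
}

# Usage:
#   if spot_interruption_pending; then echo "interruption notice received"; fi
```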

Reserved Instances

For production workloads, consider Reserved Instances for cost savings.
