AWS Deployment
Deploy the FailZero agent on AWS using EC2 or ECS/Fargate.
EC2
1. Create IAM Role
# Create trust policy
cat > trust-policy.json << 'EOF'
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "ec2.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
EOF
# Create role
aws iam create-role \
--role-name failzero-agent \
--assume-role-policy-document file://trust-policy.json
2. Attach IAM Policies
# Create policy for DR operations
cat > failzero-policy.json << 'EOF'
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"rds:PromoteReadReplica",
"rds:DescribeDBInstances",
"rds:ModifyDBInstance"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"route53:ChangeResourceRecordSets",
"route53:ListHostedZones",
"route53:GetHostedZone"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"autoscaling:UpdateAutoScalingGroup",
"autoscaling:DescribeAutoScalingGroups"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"secretsmanager:GetSecretValue"
],
"Resource": "arn:aws:secretsmanager:*:*:secret:failzero-*"
}
]
}
EOF
aws iam put-role-policy \
--role-name failzero-agent \
--policy-name failzero-dr-operations \
--policy-document file://failzero-policy.json
These examples use wildcard resources ("Resource": "*") for simplicity. In production, scope each statement to the specific resource ARNs the agent needs to manage.
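As a rough illustration of scoping, the sketch below writes a narrower policy file in which the mutating RDS and Route 53 actions are limited to one replica and one hosted zone. The replica identifier, hosted zone ID, and output file name are placeholder assumptions, not values defined by FailZero. The read-only list and describe calls are left on "*" here for simplicity, and the Auto Scaling and Secrets Manager statements are omitted for brevity.

```shell
#!/bin/sh
# Hypothetical values: replace with your own account, region, replica, and zone.
ACCOUNT_ID="123456789012"
REGION="us-east-1"
REPLICA_ID="my-replica"          # assumed RDS read replica identifier
HOSTED_ZONE_ID="Z0000000EXAMPLE" # assumed Route 53 hosted zone ID

# Unquoted EOF so the shell expands the variables above into the ARNs.
cat > failzero-policy-scoped.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["rds:PromoteReadReplica", "rds:ModifyDBInstance"],
      "Resource": "arn:aws:rds:${REGION}:${ACCOUNT_ID}:db:${REPLICA_ID}"
    },
    {
      "Effect": "Allow",
      "Action": ["rds:DescribeDBInstances", "route53:ListHostedZones"],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": ["route53:ChangeResourceRecordSets", "route53:GetHostedZone"],
      "Resource": "arn:aws:route53:::hostedzone/${HOSTED_ZONE_ID}"
    }
  ]
}
EOF
```

The resulting file could then be passed to aws iam put-role-policy as file://failzero-policy-scoped.json in place of the wildcard version.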
3. Create Instance Profile
aws iam create-instance-profile \
--instance-profile-name failzero-agent
aws iam add-role-to-instance-profile \
--instance-profile-name failzero-agent \
--role-name failzero-agent
4. Launch EC2 Instance
# Store token in Secrets Manager
aws secretsmanager create-secret \
--name failzero-agent-token \
--secret-string "fzat_your_token"
# Create user data script
cat > user-data.sh << 'EOF'
#!/bin/bash
yum update -y
yum install -y docker
systemctl start docker
systemctl enable docker
# Fetch token from Secrets Manager
TOKEN=$(aws secretsmanager get-secret-value \
--region us-east-1 \
--secret-id failzero-agent-token \
--query SecretString \
--output text)
docker run -d \
--name failzero-agent \
--restart unless-stopped \
-e FAILZERO_AGENT_TOKEN="$TOKEN" \
-e FAILZERO_API_URL=https://api.failzero.io \
-e PROVIDER_TYPE=aws \
-e AWS_ACCOUNT_ID=123456789012 \
-e AWS_REGION=us-east-1 \
failzero/agent:latest
EOF
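# Launch instance (AMI IDs are region-specific; replace the image ID with a current Amazon Linux AMI for your region)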
aws ec2 run-instances \
--image-id ami-0c55b159cbfafe1f0 \
--instance-type t3.small \
--iam-instance-profile Name=failzero-agent \
--user-data file://user-data.sh \
--tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=failzero-agent}]'
ECS/Fargate
1. Create Task Execution Role
# Trust policy for ECS
cat > ecs-trust-policy.json << 'EOF'
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "ecs-tasks.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
EOF
# Create execution role
aws iam create-role \
--role-name failzero-agent-execution \
--assume-role-policy-document file://ecs-trust-policy.json
# Attach managed policy
aws iam attach-role-policy \
--role-name failzero-agent-execution \
--policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
# Create task role (for DR operations)
aws iam create-role \
--role-name failzero-agent-task \
--assume-role-policy-document file://ecs-trust-policy.json
# Attach DR policy (failzero-policy.json from step 2 of the EC2 section)
aws iam put-role-policy \
--role-name failzero-agent-task \
--policy-name failzero-dr-operations \
--policy-document file://failzero-policy.json
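One detail worth checking: values listed under "secrets" in a task definition are fetched by the execution role, and the AmazonECSTaskExecutionRolePolicy managed policy does not include Secrets Manager access. A minimal sketch of granting it follows; the file name, policy name, and function name are illustrative assumptions, and the resource pattern mirrors the one used in the DR policy.

```shell
#!/bin/sh
# Inline policy letting the execution role read the agent token secret.
# The file name and policy name are illustrative assumptions.
cat > execution-secrets-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["secretsmanager:GetSecretValue"],
      "Resource": "arn:aws:secretsmanager:*:*:secret:failzero-*"
    }
  ]
}
EOF

# Wrapped in a function so the policy file can be reviewed before any AWS call is made.
attach_execution_secrets_policy() {
  aws iam put-role-policy \
    --role-name failzero-agent-execution \
    --policy-name failzero-read-token \
    --policy-document file://execution-secrets-policy.json
}
```

Calling attach_execution_secrets_policy once the file looks right applies the policy to the execution role.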
2. Create Task Definition
{
"family": "failzero-agent",
"networkMode": "awsvpc",
"requiresCompatibilities": ["FARGATE"],
"cpu": "256",
"memory": "512",
"executionRoleArn": "arn:aws:iam::ACCOUNT_ID:role/failzero-agent-execution",
"taskRoleArn": "arn:aws:iam::ACCOUNT_ID:role/failzero-agent-task",
"containerDefinitions": [
{
"name": "failzero-agent",
"image": "failzero/agent:latest",
"essential": true,
"environment": [
{"name": "FAILZERO_API_URL", "value": "https://api.failzero.io"},
{"name": "PROVIDER_TYPE", "value": "aws"},
{"name": "AWS_ACCOUNT_ID", "value": "123456789012"},
{"name": "AWS_REGION", "value": "us-east-1"}
],
"secrets": [
{
"name": "FAILZERO_AGENT_TOKEN",
"valueFrom": "arn:aws:secretsmanager:us-east-1:ACCOUNT_ID:secret:failzero-agent-token"
}
],
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/ecs/failzero-agent",
"awslogs-region": "us-east-1",
"awslogs-stream-prefix": "ecs"
}
}
}
]
}
Save the JSON above as task-definition.json, replacing ACCOUNT_ID with your AWS account ID, then register the task definition:
aws ecs register-task-definition \
--cli-input-json file://task-definition.json
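The task definition uses ACCOUNT_ID as a placeholder in three ARNs. One possible way to fill it in is a small sed helper, sketched below; the helper name and the template file name used in the usage note are assumptions made for illustration. The pattern matches only the colon-delimited placeholder so that the AWS_ACCOUNT_ID environment variable name is left alone.

```shell
#!/bin/sh
# render_task_definition ACCOUNT_ID TEMPLATE OUTPUT
# Replaces the ACCOUNT_ID placeholder inside ARNs in TEMPLATE and writes OUTPUT.
render_task_definition() {
  account_id="$1"; template="$2"; output="$3"
  # An AWS account ID is exactly 12 digits; reject anything else early.
  case "$account_id" in
    ""|*[!0-9]*) echo "invalid account ID: $account_id" >&2; return 1 ;;
  esac
  if [ "${#account_id}" -ne 12 ]; then
    echo "invalid account ID: $account_id" >&2
    return 1
  fi
  # Match ":ACCOUNT_ID:" so "AWS_ACCOUNT_ID" (an env var name) is not rewritten.
  sed "s/:ACCOUNT_ID:/:${account_id}:/g" "$template" > "$output"
}
```

Since aws sts get-caller-identity --query Account --output text prints the account ID of the current credentials, a call might look like render_task_definition "$(aws sts get-caller-identity --query Account --output text)" task-definition.template.json task-definition.json.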
3. Create ECS Service
# Create cluster (if needed)
aws ecs create-cluster --cluster-name failzero
# Create log group
aws logs create-log-group --log-group-name /ecs/failzero-agent
# Create service
aws ecs create-service \
--cluster failzero \
--service-name failzero-agent \
--task-definition failzero-agent \
--desired-count 1 \
--launch-type FARGATE \
--network-configuration "awsvpcConfiguration={subnets=[subnet-xxx],securityGroups=[sg-xxx],assignPublicIp=ENABLED}"
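Replace subnet-xxx and sg-xxx with your VPC subnet and security group IDs. The security group must allow outbound HTTPS (port 443). assignPublicIp=ENABLED suits a public subnet; in a private subnet, set it to DISABLED and route outbound traffic through a NAT gateway.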
IAM Permissions
Minimum Required
| Service | Actions | Purpose |
|---|---|---|
| RDS | rds:PromoteReadReplica, rds:DescribeDBInstances | Promote replicas |
| Route53 | route53:ChangeResourceRecordSets | Update DNS records |
Optional (Based on DR Plan)
| Service | Actions | Purpose |
|---|---|---|
| Auto Scaling | autoscaling:UpdateAutoScalingGroup | Scale compute |
| ECS | ecs:UpdateService | Scale containers |
| Secrets Manager | secretsmanager:GetSecretValue | Read secrets |
| S3 | s3:GetObject, s3:PutObject | Backup operations |
| SNS | sns:Publish | Notifications |
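Whether a role actually allows the minimum actions can be checked before a failover is needed. The sketch below uses aws iam simulate-principal-policy for that; the helper names are illustrative assumptions, and the caller's own credentials need permission to run the simulation.

```shell
#!/bin/sh
# Build the ARN of an IAM role from an account ID and a role name.
role_arn() {
  printf 'arn:aws:iam::%s:role/%s\n' "$1" "$2"
}

# Simulate the minimum required actions against a role and print one
# "action decision" pair per line, as reported by IAM.
# Usage: check_min_permissions ACCOUNT_ID ROLE_NAME
check_min_permissions() {
  aws iam simulate-principal-policy \
    --policy-source-arn "$(role_arn "$1" "$2")" \
    --action-names \
      rds:PromoteReadReplica \
      rds:DescribeDBInstances \
      route53:ChangeResourceRecordSets \
    --query 'EvaluationResults[].[EvalActionName,EvalDecision]' \
    --output text
}
```

For example, check_min_permissions 123456789012 failzero-agent-task would evaluate the ECS task role, and any action not reported as allowed points to a gap in the attached policy.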
Verify Deployment
EC2
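# Connect to instance (requires SSM Agent on the instance and the AmazonSSMManagedInstanceCore policy on the instance role)
aws ssm start-session --target i-xxxxxxxxxxxxx
# Check Docker logs
sudo docker logs failzero-agent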
ECS
# List tasks
aws ecs list-tasks --cluster failzero --service-name failzero-agent
# View logs
aws logs tail /ecs/failzero-agent --follow
Expected output:
[Agent] Starting FailZero Agent...
[Agent] Registering with FailZero API...
[Agent] Registered successfully for organization: your-org
[Agent] Agent started successfully
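To make this check scriptable, the logs can be searched for the registration and startup lines. A small sketch, assuming the message text matches the expected output above; the function name is hypothetical.

```shell
#!/bin/sh
# Reads agent log text on stdin and succeeds only if both the
# registration line and the final startup line are present.
agent_started() {
  logs=$(cat)
  for marker in 'Registered successfully' 'Agent started successfully'; do
    if ! printf '%s\n' "$logs" | grep -q "$marker"; then
      echo "missing log line: $marker" >&2
      return 1
    fi
  done
}
```

On EC2 this could be used as docker logs failzero-agent 2>&1 | agent_started, and on ECS by piping the output of aws logs tail /ecs/failzero-agent (without --follow) into the same function.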
Troubleshooting
Permission denied errors:
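- Verify the IAM role and policy are attached correctly
- Check that the task role (not the execution role) has the DR permissions
- Ensure the role that retrieves the token can call secretsmanager:GetSecretValue (on ECS, secrets in the task definition are fetched by the execution role)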
Cannot reach API:
- Verify security group allows outbound HTTPS (port 443)
- Check the NAT gateway if running in a private subnet
- Ensure VPC endpoints or an internet gateway are configured
Task keeps restarting:
- Check CloudWatch logs for error messages
- Verify Secrets Manager secret exists and is accessible
- Confirm AWS_ACCOUNT_ID and AWS_REGION are correct