GCP Deployment

Deploy the FailZero agent on Google Cloud Platform using Compute Engine or GKE.

Compute Engine

1. Create Service Account

# Create service account
gcloud iam service-accounts create failzero-agent \
  --display-name="FailZero Agent"

# Build the service account email from your project ID
PROJECT_ID="your-project-id"
SA_EMAIL="failzero-agent@${PROJECT_ID}.iam.gserviceaccount.com"

2. Grant IAM Permissions

The agent needs permissions to execute failover operations:
PROJECT_ID="your-project-id"
SA_EMAIL="failzero-agent@${PROJECT_ID}.iam.gserviceaccount.com"

# Cloud SQL (database promotion)
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:${SA_EMAIL}" \
  --role="roles/cloudsql.admin"

# Cloud DNS (DNS updates)
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:${SA_EMAIL}" \
  --role="roles/dns.admin"

# Compute Engine (instance group scaling)
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:${SA_EMAIL}" \
  --role="roles/compute.instanceAdmin.v1"

# Secret Manager (reading secrets)
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:${SA_EMAIL}" \
  --role="roles/secretmanager.secretAccessor"
Grant only the permissions needed for your DR plan. These are examples for common failover operations.
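
The bindings above differ only in the role, so they can be generated from a list. The following is a minimal sketch of that idea, not part of the FailZero tooling: print_bindings is a hypothetical helper name, and it prints the gcloud commands instead of running them so the list can be reviewed first.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of FailZero): print one
# `gcloud projects add-iam-policy-binding` command per role.
# Usage: print_bindings PROJECT_ID SA_EMAIL ROLE [ROLE...]
print_bindings() {
  local project_id="$1" sa_email="$2"
  shift 2
  local role
  for role in "$@"; do
    printf 'gcloud projects add-iam-policy-binding %s --member="serviceAccount:%s" --role="%s"\n' \
      "$project_id" "$sa_email" "$role"
  done
}

# Example (review the output, then pipe it to bash to apply):
#   print_bindings "$PROJECT_ID" "$SA_EMAIL" roles/cloudsql.admin roles/dns.admin
```

Passing only the roles your DR plan needs keeps the grant list explicit and easy to audit.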

3. Create VM Instance

gcloud compute instances create failzero-agent \
  --zone=us-west1-a \
  --machine-type=e2-small \
  --service-account=$SA_EMAIL \
  --scopes=cloud-platform \
  --image-family=cos-stable \
  --image-project=cos-cloud \
  --metadata=startup-script='#!/bin/bash
docker run -d \
  --name failzero-agent \
  --restart unless-stopped \
  -e FAILZERO_AGENT_TOKEN=fzat_your_token \
  -e FAILZERO_API_URL=https://api.failzero.io \
  -e PROVIDER_TYPE=gcp \
  -e GCP_PROJECT_ID=your-project \
  failzero/agent:latest'
Replace fzat_your_token with your actual agent token. Instance metadata is visible to anyone who can view the instance, so for production store the token in Secret Manager instead:
# Create secret for agent token
echo -n "fzat_your_actual_token" | \
  gcloud secrets create failzero-agent-token \
  --data-file=-

# Grant access to service account
gcloud secrets add-iam-policy-binding failzero-agent-token \
  --member="serviceAccount:${SA_EMAIL}" \
  --role="roles/secretmanager.secretAccessor"
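Then update the startup script to fetch the token from Secret Manager. Container-Optimized OS does not include gcloud, so this version runs it from the Google Cloud CLI container image:
#!/bin/bash
# gcloud is not preinstalled on Container-Optimized OS, so run it from a container.
# The container picks up the VM service account through the metadata server.
TOKEN=$(docker run --rm gcr.io/google.com/cloudsdktool/google-cloud-cli:slim \
  gcloud secrets versions access latest \
  --secret=failzero-agent-token --project=your-project)

docker run -d \
  --name failzero-agent \
  --restart unless-stopped \
  -e FAILZERO_AGENT_TOKEN="$TOKEN" \
  -e FAILZERO_API_URL=https://api.failzero.io \
  -e PROVIDER_TYPE=gcp \
  -e GCP_PROJECT_ID=your-project \
  failzero/agent:latest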
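
If the secret lookup fails, TOKEN ends up empty and the agent container starts without usable credentials. A small guard before docker run can catch that early. This is a sketch under one assumption: that real tokens start with fzat_, as the placeholders in this guide suggest. looks_like_agent_token is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeed only if the value looks like an agent token.
# Assumption: tokens start with "fzat_" followed by at least one character,
# as in the placeholders used in this guide.
looks_like_agent_token() {
  local token="$1"
  [[ "$token" == fzat_?* ]]
}

# Example use in the startup script, before `docker run`:
#   looks_like_agent_token "$TOKEN" || { echo "agent token missing or malformed" >&2; exit 1; }
```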

GKE (Kubernetes)

1. Create Kubernetes Secret

kubectl create secret generic failzero-agent \
  --from-literal=token=fzat_your_token

2. Deploy Agent

apiVersion: apps/v1
kind: Deployment
metadata:
  name: failzero-agent
spec:
  replicas: 1
  selector:
    matchLabels:
      app: failzero-agent
  template:
    metadata:
      labels:
        app: failzero-agent
    spec:
      serviceAccountName: failzero-agent
      containers:
        - name: agent
          image: failzero/agent:latest
          env:
            - name: FAILZERO_AGENT_TOKEN
              valueFrom:
                secretKeyRef:
                  name: failzero-agent
                  key: token
            - name: FAILZERO_API_URL
              value: "https://api.failzero.io"
            - name: PROVIDER_TYPE
              value: "gcp"
            - name: GCP_PROJECT_ID
              value: "your-project"
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "256Mi"
              cpu: "200m"
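The Deployment above references the failzero-agent Kubernetes service account. Create it and link it to the GCP service account with Workload Identity before applying the manifest:
# Enable Workload Identity on cluster
gcloud container clusters update YOUR_CLUSTER \
  --workload-pool=YOUR_PROJECT.svc.id.goog

# On Standard clusters, existing node pools must also use the GKE metadata server
gcloud container node-pools update YOUR_NODE_POOL \
  --cluster=YOUR_CLUSTER \
  --workload-metadata=GKE_METADATA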

# Create Kubernetes service account
kubectl create serviceaccount failzero-agent

# Bind to GCP service account
gcloud iam service-accounts add-iam-policy-binding \
  failzero-agent@YOUR_PROJECT.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:YOUR_PROJECT.svc.id.goog[default/failzero-agent]"

# Annotate Kubernetes service account
kubectl annotate serviceaccount failzero-agent \
  iam.gke.io/gcp-service-account=failzero-agent@YOUR_PROJECT.iam.gserviceaccount.com
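
The --member value in the binding above hardcodes the default namespace. If the agent runs in a different namespace, the member string must change to match, or the pod will not be able to impersonate the GCP service account. The sketch below builds that string from its parts; wi_member is a hypothetical helper name used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the Workload Identity member string for a
# Kubernetes service account.
# Usage: wi_member PROJECT_ID NAMESPACE KSA_NAME
wi_member() {
  local project_id="$1" namespace="$2" ksa_name="$3"
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$project_id" "$namespace" "$ksa_name"
}

# Example:
#   gcloud iam service-accounts add-iam-policy-binding "$SA_EMAIL" \
#     --role="roles/iam.workloadIdentityUser" \
#     --member="$(wi_member YOUR_PROJECT failzero failzero-agent)"
```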

IAM Permissions

Minimum Required

Resource Type | Role                 | Purpose
Cloud SQL     | roles/cloudsql.admin | Promote replicas
Cloud DNS     | roles/dns.admin      | Update DNS records

Optional (Based on DR Plan)

Resource Type  | Role                               | Purpose
Compute Engine | roles/compute.instanceAdmin.v1     | Scale instance groups
Secret Manager | roles/secretmanager.secretAccessor | Read secrets
Cloud Storage  | roles/storage.admin                | Backup operations
Pub/Sub        | roles/pubsub.publisher             | Notifications
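
To compare the roles in the tables above with what the service account actually holds, one option is to list the granted roles with gcloud and filter for the ones that are missing. The sketch below shows the filtering half; missing_roles is a hypothetical helper name, and the gcloud invocation in the comment assumes PROJECT_ID and SA_EMAIL are set as in the Compute Engine steps.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each required role that is absent from the
# granted roles supplied on stdin (one role per line).
# Usage: <granted roles> | missing_roles ROLE [ROLE...]
missing_roles() {
  local granted role
  granted="$(cat)"
  for role in "$@"; do
    # -x: whole-line match, -F: fixed string, so roles/dns.admin
    # does not match roles/dns.administrator or similar.
    grep -qxF -- "$role" <<<"$granted" || printf '%s\n' "$role"
  done
}

# Example:
#   gcloud projects get-iam-policy "$PROJECT_ID" \
#     --flatten="bindings[].members" \
#     --filter="bindings.members:serviceAccount:${SA_EMAIL}" \
#     --format="value(bindings.role)" \
#     | missing_roles roles/cloudsql.admin roles/dns.admin
```

Empty output means every role passed as an argument is already bound at the project level.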

Verify Deployment

Compute Engine

# SSH into the VM
gcloud compute ssh failzero-agent --zone=us-west1-a

# Check Docker logs
docker logs failzero-agent

GKE

# Check pod status
kubectl get pods -l app=failzero-agent

# View logs
kubectl logs -l app=failzero-agent
For either deployment type, the logs should show a successful registration:
[Agent] Starting FailZero Agent...
[Agent] Registering with FailZero API...
[Agent] Registered successfully for organization: your-org
[Agent] Agent started successfully
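
For scripted checks, the same verification can be reduced to searching the logs for the registration line. This is a minimal sketch that assumes the log wording matches the expected output shown above; agent_registered is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical check: read agent logs on stdin and succeed only if the
# registration line is present.
# Assumption: the agent logs "[Agent] Registered successfully ..." as shown
# in the expected output.
agent_registered() {
  grep -qF '[Agent] Registered successfully'
}

# Examples:
#   kubectl logs -l app=failzero-agent | agent_registered && echo "agent registered"
#   docker logs failzero-agent 2>&1 | agent_registered && echo "agent registered"
```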

Troubleshooting

Permission denied errors:
  • Verify IAM roles are assigned to the service account
  • Check the service account is attached to the VM/pod
  • Ensure Workload Identity is configured correctly (GKE)
Cannot reach API:
  • Check firewall rules allow outbound HTTPS (port 443)
  • Verify VPC allows egress to api.failzero.io
Agent not registering:
  • Confirm token is correct and not expired
  • Check logs for specific error messages
