GCP Deployment

Deploy the FailZero agent on Google Cloud Platform using Compute Engine or GKE.

Compute Engine

1. Create Service Account

# Create service account
gcloud iam service-accounts create failzero-agent \
  --display-name="FailZero Agent"

# Set the service account email (replace YOUR_PROJECT with your project ID)
SA_EMAIL="failzero-agent@YOUR_PROJECT.iam.gserviceaccount.com"

2. Grant IAM Permissions

The agent needs IAM roles that allow it to execute failover operations:

PROJECT_ID="your-project-id"
SA_EMAIL="failzero-agent@${PROJECT_ID}.iam.gserviceaccount.com"

# Cloud SQL (database promotion)
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:${SA_EMAIL}" \
  --role="roles/cloudsql.admin"

# Cloud DNS (DNS updates)
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:${SA_EMAIL}" \
  --role="roles/dns.admin"

# Compute Engine (instance group scaling)
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:${SA_EMAIL}" \
  --role="roles/compute.instanceAdmin.v1"

# Secret Manager (reading secrets)
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:${SA_EMAIL}" \
  --role="roles/secretmanager.secretAccessor"

Grant only the roles your DR plan needs. The bindings above are examples for common failover operations.
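
To keep grants limited to what a DR plan needs, the bindings above can be driven from a single list of roles. The following is a minimal sketch, not an official FailZero script: `grant_roles` and `DRY_RUN` are hypothetical names introduced here, and the role list is only an example. With `DRY_RUN=1` (the default in this sketch) it prints the commands for review instead of running them.

```shell
#!/bin/sh
# Sketch: grant a chosen set of roles to the agent service account.
# grant_roles and DRY_RUN are illustrative names, not part of FailZero or gcloud.

PROJECT_ID="your-project-id"
SA_EMAIL="failzero-agent@${PROJECT_ID}.iam.gserviceaccount.com"

# Trim this list to the roles your DR plan requires.
ROLES="roles/cloudsql.admin roles/dns.admin"

grant_roles() {
  for role in $ROLES; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Dry run: print the command instead of executing it.
      echo "gcloud projects add-iam-policy-binding $PROJECT_ID" \
        "--member=serviceAccount:${SA_EMAIL} --role=${role}"
    else
      gcloud projects add-iam-policy-binding "$PROJECT_ID" \
        --member="serviceAccount:${SA_EMAIL}" \
        --role="$role"
    fi
  done
}

grant_roles
```

Once the printed commands look right, run the script again with `DRY_RUN=0` to apply the bindings.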

3. Create VM Instance

gcloud compute instances create failzero-agent \
  --zone=us-west1-a \
  --machine-type=e2-small \
  --service-account=$SA_EMAIL \
  --scopes=cloud-platform \
  --image-family=cos-stable \
  --image-project=cos-cloud \
  --metadata=startup-script='#!/bin/bash
docker run -d \
  --name failzero-agent \
  --restart unless-stopped \
  -e FAILZERO_AGENT_TOKEN=fzat_your_token \
  -e FAILZERO_API_URL=https://api.failzero.io \
  -e PROVIDER_TYPE=gcp \
  -e GCP_PROJECT_ID=your-project \
  failzero/agent:latest'

Replace fzat_your_token with your actual agent token. Instance metadata is readable by anyone who can view the instance, so for production keep the token in Secret Manager instead of metadata.

Store sensitive values in Secret Manager:
# Create secret for agent token
echo -n "fzat_your_actual_token" | \
  gcloud secrets create failzero-agent-token \
  --data-file=-

# Grant access to service account
gcloud secrets add-iam-policy-binding failzero-agent-token \
  --member="serviceAccount:${SA_EMAIL}" \
  --role="roles/secretmanager.secretAccessor"
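
Update the startup script to fetch the token from Secret Manager. Container-Optimized OS does not ship the gcloud CLI on the host, so this version runs gcloud from the google-cloud-cli container image:

#!/bin/bash
# Read the token at boot using the VM's service account credentials
TOKEN=$(docker run --rm gcr.io/google.com/cloudsdktool/google-cloud-cli:slim \
  gcloud secrets versions access latest \
  --secret=failzero-agent-token --project=your-project)

docker run -d \
  --name failzero-agent \
  --restart unless-stopped \
  -e FAILZERO_AGENT_TOKEN="$TOKEN" \
  -e FAILZERO_API_URL=https://api.failzero.io \
  -e PROVIDER_TYPE=gcp \
  -e GCP_PROJECT_ID=your-project \
  failzero/agent:latest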
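
Step 3 passes the startup script inline through `--metadata`, which gets harder to maintain as the script grows. One option is to save the script above to a file and pass it with gcloud's `--metadata-from-file` flag. The sketch below assumes the script is saved as `startup.sh`; that filename and the helpers `check_startup_script` and `create_agent_vm` are hypothetical names introduced for illustration. It syntax-checks the file first, since a broken startup script otherwise only shows up after the VM boots.

```shell
#!/bin/sh
# Sketch: pass the startup script from a file instead of inline metadata.
# check_startup_script, create_agent_vm and startup.sh are illustrative names.

SA_EMAIL="failzero-agent@your-project-id.iam.gserviceaccount.com"

# Succeeds only if the file exists and parses as a bash script.
check_startup_script() {
  [ -f "$1" ] && bash -n "$1"
}

# Creates the VM with the same flags as step 3, reading the script from a file.
create_agent_vm() {
  check_startup_script "$1" || { echo "invalid startup script: $1" >&2; return 1; }
  gcloud compute instances create failzero-agent \
    --zone=us-west1-a \
    --machine-type=e2-small \
    --service-account="$SA_EMAIL" \
    --scopes=cloud-platform \
    --image-family=cos-stable \
    --image-project=cos-cloud \
    --metadata-from-file=startup-script="$1"
}
```

Call `create_agent_vm startup.sh` to create the instance once the script passes the check.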

GKE (Kubernetes)

1. Create Kubernetes Secret

kubectl create secret generic failzero-agent \
  --from-literal=token=fzat_your_token

2. Deploy Agent

apiVersion: apps/v1
kind: Deployment
metadata:
  name: failzero-agent
spec:
  replicas: 1
  selector:
    matchLabels:
      app: failzero-agent
  template:
    metadata:
      labels:
        app: failzero-agent
    spec:
      serviceAccountName: failzero-agent
      containers:
        - name: agent
          image: failzero/agent:latest
          env:
            - name: FAILZERO_AGENT_TOKEN
              valueFrom:
                secretKeyRef:
                  name: failzero-agent
                  key: token
            - name: FAILZERO_API_URL
              value: "https://api.failzero.io"
            - name: PROVIDER_TYPE
              value: "gcp"
            - name: GCP_PROJECT_ID
              value: "your-project"
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "256Mi"
              cpu: "200m"
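
Use Workload Identity for secure credential management. The Deployment above references the failzero-agent Kubernetes service account, so create and bind it before applying the manifest: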
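# Enable Workload Identity on the cluster
gcloud container clusters update YOUR_CLUSTER \
  --workload-pool=YOUR_PROJECT.svc.id.goog

# Existing node pools on Standard clusters also need the GKE metadata server
gcloud container node-pools update YOUR_NODE_POOL \
  --cluster=YOUR_CLUSTER \
  --workload-metadata=GKE_METADATA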

# Create Kubernetes service account
kubectl create serviceaccount failzero-agent

# Bind to GCP service account
gcloud iam service-accounts add-iam-policy-binding \
  failzero-agent@YOUR_PROJECT.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:YOUR_PROJECT.svc.id.goog[default/failzero-agent]"

# Annotate Kubernetes service account
kubectl annotate serviceaccount failzero-agent \
  iam.gke.io/gcp-service-account=failzero-agent@YOUR_PROJECT.iam.gserviceaccount.com
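
The `--member` value in the binding above embeds the project, the Kubernetes namespace, and the Kubernetes service account name. The commands assume the `default` namespace; if the agent runs elsewhere, the member string must change to match or the binding will not apply to the pod. A small helper can build the string consistently. This is a sketch only: `wi_member` is a hypothetical function, not a gcloud or kubectl command.

```shell
#!/bin/sh
# Sketch: build the Workload Identity member string for a Kubernetes service account.
# wi_member is an illustrative helper, not a gcloud or kubectl command.

# Usage: wi_member PROJECT_ID NAMESPACE KSA_NAME
wi_member() {
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$1" "$2" "$3"
}

PROJECT_ID="your-project-id"
NAMESPACE="default"
KSA_NAME="failzero-agent"

# Prints: serviceAccount:your-project-id.svc.id.goog[default/failzero-agent]
wi_member "$PROJECT_ID" "$NAMESPACE" "$KSA_NAME"
```

The result can be passed to the binding command as `--member="$(wi_member "$PROJECT_ID" "$NAMESPACE" "$KSA_NAME")"`.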

IAM Permissions

Minimum Required

| Resource Type | Role | Purpose |
| --- | --- | --- |
| Cloud SQL | roles/cloudsql.admin | Promote replicas |
| Cloud DNS | roles/dns.admin | Update DNS records |

Optional (Based on DR Plan)

| Resource Type | Role | Purpose |
| --- | --- | --- |
| Compute Engine | roles/compute.instanceAdmin.v1 | Scale instance groups |
| Secret Manager | roles/secretmanager.secretAccessor | Read secrets |
| Cloud Storage | roles/storage.admin | Backup operations |
| Pub/Sub | roles/pubsub.publisher | Notifications |

Verify Deployment

Compute Engine

# SSH into the VM
gcloud compute ssh failzero-agent --zone=us-west1-a

# Check Docker logs
docker logs failzero-agent

GKE

# Check pod status
kubectl get pods -l app=failzero-agent

# View logs
kubectl logs -l app=failzero-agent

Expected output:
[Agent] Starting FailZero Agent...
[Agent] Registering with FailZero API...
[Agent] Registered successfully for organization: your-org
[Agent] Agent started successfully
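
The expected output above can also be checked from a script, for example in a deployment pipeline. The following is a minimal sketch: `agent_registered` is a hypothetical helper, and it assumes the agent logs contain the "Registered successfully" text shown in the expected output.

```shell
#!/bin/sh
# Sketch: succeed only if agent logs read from stdin show a successful registration.
# agent_registered is an illustrative helper; the matched text is taken from
# the expected output shown above.
agent_registered() {
  grep -q "Registered successfully"
}

# Example usage (uncomment the line that matches your deployment):
# kubectl logs -l app=failzero-agent | agent_registered && echo "agent registered"
# docker logs failzero-agent 2>&1 | agent_registered && echo "agent registered"
```

The function exits non-zero when the line is missing, so it can gate a later step on registration.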

Troubleshooting

Permission denied errors:
  • Verify IAM roles are assigned to the service account
  • Check the service account is attached to the VM/pod
  • Ensure Workload Identity is configured correctly (GKE)
Cannot reach API:
  • Check firewall rules allow outbound HTTPS (port 443)
  • Verify VPC allows egress to api.failzero.io
Agent not registering:
  • Confirm token is correct and not expired
  • Check logs for specific error messages
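
For the permission checks above, it can help to compare the roles actually bound to the service account against the ones the DR plan needs. The sketch below is an illustration only: `missing_roles` is a hypothetical helper, and the commented gcloud query lists project-level bindings only, so roles granted on individual resources will not appear.

```shell
#!/bin/sh
# Sketch: report which required roles are missing for a service account.
# missing_roles is an illustrative helper, not part of gcloud.

# Reads granted roles (one per line) on stdin and prints each required role
# (given as arguments) that is not among them.
missing_roles() {
  granted="$(cat)"
  for role in "$@"; do
    printf '%s\n' "$granted" | grep -qxF -- "$role" || printf '%s\n' "$role"
  done
}

# Example usage, assuming PROJECT_ID and SA_EMAIL are set as in step 2:
# gcloud projects get-iam-policy "$PROJECT_ID" \
#   --flatten="bindings[].members" \
#   --filter="bindings.members:serviceAccount:${SA_EMAIL}" \
#   --format="value(bindings.role)" |
#   missing_roles roles/cloudsql.admin roles/dns.admin
```

Empty output means every role passed as an argument is bound at the project level.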

Next Steps

  • Example Plans: GCP-specific DR configurations
  • CLI Reference: Deploy and manage plans