GCP Deployment
Deploy the FailZero agent on Google Cloud Platform using Compute Engine or GKE.
Compute Engine
1. Create Service Account
# Create service account
gcloud iam service-accounts create failzero-agent \
--display-name="FailZero Agent"
# Record the service account email (replace YOUR_PROJECT with your project ID)
SA_EMAIL="failzero-agent@YOUR_PROJECT.iam.gserviceaccount.com"
2. Grant IAM Permissions
The agent needs permissions to execute failover operations:
PROJECT_ID="your-project-id"
SA_EMAIL="failzero-agent@${PROJECT_ID}.iam.gserviceaccount.com"
# Cloud SQL (database promotion)
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:${SA_EMAIL}" \
--role="roles/cloudsql.admin"
# Cloud DNS (DNS updates)
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:${SA_EMAIL}" \
--role="roles/dns.admin"
# Compute Engine (instance group scaling)
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:${SA_EMAIL}" \
--role="roles/compute.instanceAdmin.v1"
# Secret Manager (reading secrets)
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:${SA_EMAIL}" \
--role="roles/secretmanager.secretAccessor"
Grant only the permissions needed for your DR plan. These are examples for common failover operations.
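One way to keep the grants limited to what your DR plan needs is to drive them from a short role list. The following is a minimal sketch, not part of the FailZero tooling: the `ROLES` list, the `DRY_RUN` variable, and the `grant_roles` function are hypothetical names introduced here. With `DRY_RUN=1` (the default) it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: grant a trimmed role list to the agent's service account.
# DRY_RUN, ROLES and grant_roles are illustrative names, not FailZero settings.
PROJECT_ID="${PROJECT_ID:-your-project-id}"
SA_EMAIL="failzero-agent@${PROJECT_ID}.iam.gserviceaccount.com"
DRY_RUN="${DRY_RUN:-1}"

# Keep only the roles your DR plan actually uses.
ROLES="roles/cloudsql.admin roles/dns.admin"

grant_roles() {
  for role in $ROLES; do
    if [ "$DRY_RUN" = "1" ]; then
      # Print the command instead of running it.
      echo "gcloud projects add-iam-policy-binding $PROJECT_ID --member=serviceAccount:$SA_EMAIL --role=$role"
    else
      gcloud projects add-iam-policy-binding "$PROJECT_ID" \
        --member="serviceAccount:${SA_EMAIL}" \
        --role="$role"
    fi
  done
}

grant_roles
```

Review the printed commands, then rerun with `DRY_RUN=0` to apply them.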
3. Create VM Instance
gcloud compute instances create failzero-agent \
--zone=us-west1-a \
--machine-type=e2-small \
--service-account=$SA_EMAIL \
--scopes=cloud-platform \
--image-family=cos-stable \
--image-project=cos-cloud \
--metadata=startup-script='#!/bin/bash
docker run -d \
--name failzero-agent \
--restart unless-stopped \
-e FAILZERO_AGENT_TOKEN=fzat_your_token \
-e FAILZERO_API_URL=https://api.failzero.io \
-e PROVIDER_TYPE=gcp \
-e GCP_PROJECT_ID=your-project \
failzero/agent:latest'
Replace fzat_your_token with your actual agent token. Metadata values are stored in plain text and can be read by anyone who can view the instance, so use Secret Manager in production.
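A small guard can catch the case where the placeholder token was never replaced. This is a sketch only: `check_token` is a hypothetical helper, and the `fzat_` prefix is inferred from the example token above rather than from a documented token format.

```shell
# check_token TOKEN: succeeds only if TOKEN looks like a real value.
# Assumption: real tokens start with fzat_, as the example token suggests.
check_token() {
  case "$1" in
    ""|fzat_your_token|fzat_your_actual_token)
      echo "token is empty or still a placeholder" >&2
      return 1
      ;;
    fzat_*)
      return 0
      ;;
    *)
      echo "token does not start with fzat_" >&2
      return 1
      ;;
  esac
}
```

Call it before creating the VM, for example `check_token "$FAILZERO_AGENT_TOKEN" || exit 1`.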
4. Using Secret Manager (Recommended)
Store sensitive values in Secret Manager:
# Create secret for agent token
echo -n "fzat_your_actual_token" | \
gcloud secrets create failzero-agent-token \
--data-file=-
# Grant access to service account
gcloud secrets add-iam-policy-binding failzero-agent-token \
--member="serviceAccount:${SA_EMAIL}" \
--role="roles/secretmanager.secretAccessor"
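Update the startup script to fetch the token from Secret Manager. The script calls gcloud, which Container-Optimized OS (cos-stable) does not include, so run it on an image that ships the Google Cloud CLI: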
#!/bin/bash
TOKEN=$(gcloud secrets versions access latest --secret=failzero-agent-token)
docker run -d \
--name failzero-agent \
--restart unless-stopped \
-e FAILZERO_AGENT_TOKEN="$TOKEN" \
-e FAILZERO_API_URL=https://api.failzero.io \
-e PROVIDER_TYPE=gcp \
-e GCP_PROJECT_ID=your-project \
failzero/agent:latest
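Startup scripts can run before networking or credentials are fully ready, so the secret fetch may fail on the first attempt. A retry wrapper is one way to handle that. The `retry` function below is a hypothetical helper, not something the agent or gcloud provides.

```shell
# retry MAX DELAY CMD...: run CMD until it succeeds, at most MAX times,
# sleeping DELAY seconds between attempts. Returns 1 if every attempt fails.
retry() {
  max="$1"
  delay="$2"
  shift 2
  attempt=1
  while :; do
    "$@" && return 0
    [ "$attempt" -ge "$max" ] && return 1
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Usage in the startup script (commented out so this sketch has no side effects):
# TOKEN=$(retry 5 3 gcloud secrets versions access latest --secret=failzero-agent-token) || exit 1
```

Exiting when the fetch fails keeps the container from starting with an empty token.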
GKE (Kubernetes)
1. Create Kubernetes Secret
kubectl create secret generic failzero-agent \
--from-literal=token=fzat_your_token
2. Deploy Agent
apiVersion: apps/v1
kind: Deployment
metadata:
  name: failzero-agent
spec:
  replicas: 1
  selector:
    matchLabels:
      app: failzero-agent
  template:
    metadata:
      labels:
        app: failzero-agent
    spec:
      serviceAccountName: failzero-agent
      containers:
        - name: agent
          image: failzero/agent:latest
          env:
            - name: FAILZERO_AGENT_TOKEN
              valueFrom:
                secretKeyRef:
                  name: failzero-agent
                  key: token
            - name: FAILZERO_API_URL
              value: "https://api.failzero.io"
            - name: PROVIDER_TYPE
              value: "gcp"
            - name: GCP_PROJECT_ID
              value: "your-project"
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "256Mi"
              cpu: "200m"
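To apply the manifest above and wait for the rollout, something like the following should work. It is a sketch that assumes the manifest was saved as `failzero-agent.yaml`; the filename and the `deploy_agent` wrapper are hypothetical.

```shell
# deploy_agent [MANIFEST]: apply the Deployment and wait for it to become ready.
# Assumption: the manifest was saved locally as failzero-agent.yaml.
deploy_agent() {
  kubectl apply -f "${1:-failzero-agent.yaml}" &&
    kubectl rollout status deployment/failzero-agent --timeout=120s
}

# deploy_agent failzero-agent.yaml
```

`kubectl rollout status` returns non-zero if the Deployment is not ready within the timeout, so the function is safe to use in a script.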
3. Workload Identity (Recommended)
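Use Workload Identity so the pod authenticates as the GCP service account without exported keys. The Deployment above sets serviceAccountName: failzero-agent, so create that Kubernetes service account before applying it: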
# Enable Workload Identity on cluster
gcloud container clusters update YOUR_CLUSTER \
--workload-pool=YOUR_PROJECT.svc.id.goog
# Create Kubernetes service account
kubectl create serviceaccount failzero-agent
# Bind to GCP service account
gcloud iam service-accounts add-iam-policy-binding \
failzero-agent@YOUR_PROJECT.iam.gserviceaccount.com \
--role="roles/iam.workloadIdentityUser" \
--member="serviceAccount:YOUR_PROJECT.svc.id.goog[default/failzero-agent]"
# Annotate Kubernetes service account
kubectl annotate serviceaccount failzero-agent \
iam.gke.io/gcp-service-account=failzero-agent@YOUR_PROJECT.iam.gserviceaccount.com
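The `--member` value in the binding is easy to mistype, particularly if the agent runs outside the `default` namespace. The helper below builds it from its three parts. `wi_member` is a hypothetical function name; the string format is the one used in the binding command above.

```shell
# wi_member PROJECT NAMESPACE KSA_NAME
# Prints the Workload Identity member string:
#   serviceAccount:PROJECT.svc.id.goog[NAMESPACE/KSA_NAME]
wi_member() {
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$1" "$2" "$3"
}

wi_member YOUR_PROJECT default failzero-agent
# prints: serviceAccount:YOUR_PROJECT.svc.id.goog[default/failzero-agent]
```

If you deploy the agent into another namespace, pass that namespace here and use the same one when creating the Kubernetes service account.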
IAM Permissions
Minimum Required
| Resource Type | Role | Purpose |
|---|---|---|
| Cloud SQL | roles/cloudsql.admin | Promote replicas |
| Cloud DNS | roles/dns.admin | Update DNS records |
Optional (Based on DR Plan)
| Resource Type | Role | Purpose |
|---|---|---|
| Compute Engine | roles/compute.instanceAdmin.v1 | Scale instance groups |
| Secret Manager | roles/secretmanager.secretAccessor | Read secrets |
| Cloud Storage | roles/storage.admin | Backup operations |
| Pub/Sub | roles/pubsub.publisher | Notifications |
Verify Deployment
Compute Engine
# SSH into the VM
gcloud compute ssh failzero-agent --zone=us-west1-a
# Check Docker logs
docker logs failzero-agent
GKE
# Check pod status
kubectl get pods -l app=failzero-agent
# View logs
kubectl logs -l app=failzero-agent
Expected output:
[Agent] Starting FailZero Agent...
[Agent] Registering with FailZero API...
[Agent] Registered successfully for organization: your-org
[Agent] Agent started successfully
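For scripted checks, you can look for the registration line instead of reading the logs by eye. This sketch assumes the log wording matches the expected output above; `agent_registered` is a hypothetical helper.

```shell
# agent_registered: reads agent log text on stdin and succeeds if the
# registration line from the expected output is present.
agent_registered() {
  grep -q 'Registered successfully'
}

# Compute Engine (run on the VM):
#   docker logs failzero-agent 2>&1 | agent_registered && echo "agent registered"
# GKE:
#   kubectl logs -l app=failzero-agent | agent_registered && echo "agent registered"
```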
Troubleshooting
Permission denied errors:
- Verify IAM roles are assigned to the service account
- Check the service account is attached to the VM/pod
- Ensure Workload Identity is configured correctly (GKE)
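To confirm which roles the service account actually holds, you can list its project-level bindings and compare them with the roles your plan requires. A minimal sketch follows; `missing_roles` is a hypothetical helper, and the gcloud command in the usage comment assumes `PROJECT_ID` and `SA_EMAIL` are set as in the Compute Engine steps.

```shell
# missing_roles REQUIRED_ROLE...
# Reads granted roles (one per line) on stdin and prints each required role
# that is not in that list.
missing_roles() {
  granted=$(cat)
  for role in "$@"; do
    printf '%s\n' "$granted" | grep -qxF "$role" || echo "$role"
  done
}

# Usage:
# gcloud projects get-iam-policy "$PROJECT_ID" \
#   --flatten="bindings[].members" \
#   --filter="bindings.members:serviceAccount:${SA_EMAIL}" \
#   --format="value(bindings.role)" |
#   missing_roles roles/cloudsql.admin roles/dns.admin
```

No output means every required role is bound at the project level. Roles granted on individual resources, such as the secret-level binding shown earlier, will not appear in this listing.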
Cannot reach API:
- Check firewall rules allow outbound HTTPS (port 443)
- Verify VPC allows egress to api.failzero.io
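A quick way to test outbound HTTPS from the VM or from a debug pod is a curl probe. This is a sketch that assumes curl is available where you run it; `check_egress` is a hypothetical helper. Any HTTP response counts as reachable, since only connection-level failures indicate a firewall or routing problem.

```shell
# check_egress HOST: succeeds if an HTTPS request to HOST completes within
# 10 seconds. The HTTP status code is ignored; only connection failures
# (DNS, timeout, refused, TLS) make curl exit non-zero.
check_egress() {
  if curl -sS -o /dev/null --max-time 10 "https://$1"; then
    echo "egress to $1:443 OK"
  else
    echo "cannot reach $1:443" >&2
    return 1
  fi
}

# check_egress api.failzero.io
```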
Agent not registering:
- Confirm token is correct and not expired
- Check logs for specific error messages
Next Steps