---
project: pbs-production-migration-test
type: project-plan
status: active
tags:
  - pbs
  - production
  - docker
  - traefik
  - ansible
  - staging
  - wordpress
---

# Production Migration Test Runbook

## Objective

Validate full disaster recovery capability by restoring production to a new Linode, cutting over DNS, and confirming the site runs cleanly on fresh infrastructure.

## Pre-Migration (Day Before)

- [ ] Post maintenance notice on Instagram (Jenny)
- [ ] Add site banner or maintenance page notification
- [ ] Confirm Linode Backups are enabled on production
- [ ] Verify Cloudflare dashboard access
- [ ] Have SSH keys ready for new server access
- [ ] Notify Jenny of maintenance window and estimated downtime (30-60 min)

## Migration Steps (Early Morning)

### 1. Final Production Snapshot

- [ ] Log into Linode dashboard
- [ ] Navigate to production Linode → Backups tab
- [ ] Take manual snapshot, label: `pre-migration-YYYY-MM-DD`
- [ ] Wait for snapshot to complete (watch progress in dashboard)

### 2. Spin Up New Linode from Snapshot

- [ ] Linode dashboard → Create → From Backup
- [ ] Select the snapshot just taken
- [ ] Choose same or equivalent plan (2GB Linode)
- [ ] Same region as current production
- [ ] Note the new server's IP address
- [ ] Wait for provisioning to complete

### 3. Verify New Server

- [ ] SSH into new server using noted IP
- [ ] Run `docker ps --format "table {{.Names}}\t{{.Status}}"` and confirm all containers are running
- [ ] Check WordPress responds: `curl -H "Host: plantbasedsoutherner.com" http://localhost:80`
- [ ] Verify MySQL is healthy: `docker exec mysql-container mysqladmin ping`
- [ ] Spot check wp-content files exist in the WordPress volume

### 4. DNS Cutover

- [ ] Log into Cloudflare dashboard
- [ ] Navigate to plantbasedsoutherner.com DNS settings
- [ ] Update A record to new Linode IP address
- [ ] Cloudflare proxied traffic should cut over within minutes
- [ ] Verify: browse to https://plantbasedsoutherner.com from phone
- [ ] Check multiple pages: homepage, a recipe, member login
- [ ] Test PBS-API health endpoint
- [ ] Verify n8n is accessible and workflows are intact

### 5. Post-Cutover Validation

- [ ] Check Uptime Kuma monitors are green
- [ ] Verify Portainer accessible
- [ ] Confirm SSL certificate is valid (green lock)
- [ ] Test Instagram automation: trigger a test comment if possible
- [ ] Check Google Chat alerts are still routing

## Decision Point

After validation, choose one:

**Option A: Keep New Server**

- [ ] Update Linode dashboard labels/tags
- [ ] Update any hardcoded IPs in Ansible inventory
- [ ] Run updated Ansible against new server to layer staging changes
- [ ] Destroy or archive the old production Linode
- [ ] Update SSH config with new server IP

**Option B: Revert to Original**

- [ ] Flip Cloudflare DNS back to original Linode IP
- [ ] Verify site works on original server
- [ ] Destroy the temporary test Linode
- [ ] Document any issues found for future reference

## Rollback Plan

If anything goes wrong at any step:

1. Flip Cloudflare DNS back to original production IP
2. Original server is untouched and still running
3. Destroy the test Linode
4. Site downtime should be under 30 minutes total

## Post-Migration

- [ ] Remove maintenance notice / site banner
- [ ] Post on Instagram that site is back up
- [ ] Document outcome and any lessons learned
- [ ] Update Tech Wiki with any new findings

## Key Details

- **Production domain:** plantbasedsoutherner.com
- **DNS provider:** Cloudflare (proxied)
- **Current hosting:** Linode 2GB
- **Estimated downtime:** 30-60 minutes
- **Best window:** Early morning, low traffic

## Lessons Learned

*(Fill in after completing the test)*

---

Created: 2025-03-25
Updated: 2025-03-25
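## Appendix: Verification Script Sketch

The Step 3 server checks and the Steps 4-5 public checks could be bundled into one script so every run uses the same commands. This is a minimal sketch and is not part of the existing tooling. The script name, the `wordpress` container name, the `/var/www/html/wp-content/uploads` path (the official WordPress image layout), and the `PBS_API_HEALTH_URL` / `N8N_URL` variables are all hypothetical and should be adjusted to match the real stack. Only `mysql-container` and the `docker`/`curl` invocations come from the runbook itself.

```shell
#!/usr/bin/env bash
# verify-migration.sh: HYPOTHETICAL helper bundling the runbook checks.
#   ./verify-migration.sh server            Step 3, run ON the new Linode
#   ./verify-migration.sh public [/path..]  Steps 4-5, run from a laptop/phone
# ASSUMPTIONS: "mysql-container" is taken from the runbook; the WordPress
# container name and its /var/www/html layout are guesses. Override via env.
set -u

DOMAIN="${DOMAIN:-plantbasedsoutherner.com}"
MYSQL_CONTAINER="${MYSQL_CONTAINER:-mysql-container}"
WP_CONTAINER="${WP_CONTAINER:-wordpress}"   # hypothetical container name
FAILURES=0

pass() { printf 'PASS  %s\n' "$1"; }
fail() { printf 'FAIL  %s\n' "$1"; FAILURES=$((FAILURES + 1)); }

# Reads "name status..." lines on stdin and prints each container name whose
# status does not start with "Up" or whose healthcheck says "(unhealthy)".
find_unhealthy() {
  local name status
  while read -r name status || [[ -n "${name:-}" ]]; do
    [[ -z "${name:-}" ]] && continue
    if [[ "$status" != Up* || "$status" == *"(unhealthy)"* ]]; then
      printf '%s\n' "$name"
    fi
  done
}

# 2xx/3xx counts as responding (Traefik may redirect HTTP to HTTPS).
is_ok_code() { [[ "$1" =~ ^[23][0-9]{2}$ ]]; }

# check_http LABEL URL [extra curl args...]; curl prints 000 on no response.
check_http() {
  local label="$1" url="$2" code
  shift 2
  code=$(curl -s --max-time 15 -o /dev/null -w '%{http_code}' "$@" "$url")
  if is_ok_code "$code"; then pass "$label ($code)"; else fail "$label ($code)"; fi
}

check_server() {
  local listing bad
  # -a so stopped containers are reported instead of silently omitted.
  if ! listing=$(docker ps -a --format '{{.Names}} {{.Status}}'); then
    fail "docker ps"
  elif [[ -z "$listing" ]]; then
    fail "no containers found"
  else
    bad=$(printf '%s\n' "$listing" | find_unhealthy)
    if [[ -z "$bad" ]]; then pass "all containers up"; else fail "not up/healthy: ${bad//$'\n'/ }"; fi
  fi
  check_http "WordPress via localhost" "http://localhost:80" -H "Host: $DOMAIN"
  # mysqladmin ping exits 0 whenever the server is running.
  if docker exec "$MYSQL_CONTAINER" mysqladmin ping >/dev/null 2>&1; then
    pass "mysqladmin ping"; else fail "mysqladmin ping"; fi
  if docker exec "$WP_CONTAINER" test -d /var/www/html/wp-content/uploads; then
    pass "wp-content/uploads present"; else fail "wp-content/uploads missing"; fi
}

check_public() {
  local path url
  # curl verifies TLS by default, so a bad certificate shows up as 000.
  for path in / "$@"; do
    check_http "https://$DOMAIN$path" "https://$DOMAIN$path"
  done
  # These URLs are not recorded in the runbook; they are skipped if unset.
  for url in "${PBS_API_HEALTH_URL:-}" "${N8N_URL:-}"; do
    [[ -n "$url" ]] && check_http "$url" "$url"
  done
}

summary() { echo "$FAILURES check(s) failed"; exit $(( FAILURES > 0 )); }

case "${1:-}" in
  server) check_server; summary ;;
  public) shift; check_public "$@"; summary ;;
  *) echo "usage: $0 server | public [/path ...]" >&2 ;;
esac
```

For example, run `./verify-migration.sh server` over SSH on the new Linode before touching DNS. After the cutover, run `./verify-migration.sh public /some-recipe/ /login/` (both paths are placeholders). A non-zero exit means at least one check failed, which is the cue to consult the Rollback Plan. Because Cloudflare proxies the domain, the public checks confirm that the site responds but not which origin server answered. Dashboard-only items such as Uptime Kuma, Portainer, and Google Chat alerts still need a manual look.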