| project | type | status | tags |
|---|---|---|---|
| pbs-production-migration-test | project-plan | active | |
Production Migration Test Runbook
Objective
Validate full disaster recovery capability by restoring production to a new Linode, cutting over DNS, and confirming the site runs cleanly on fresh infrastructure.
Pre-Migration (Day Before)
- Post maintenance notice on Instagram (Jenny)
- Add site banner or maintenance page notification
- Confirm Linode Backups are enabled on production
- Verify Cloudflare dashboard access
- Have SSH keys ready for new server access
- Notify Jenny of maintenance window and estimated downtime (30-60 min)
Migration Steps (Early Morning)
1. Final Production Snapshot
- Log into Linode dashboard
- Navigate to production Linode → Backups tab
- Take manual snapshot, label: `pre-migration-YYYY-MM-DD`
- Wait for snapshot to complete (watch progress in dashboard)
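Generating the label instead of typing it keeps the date format consistent between runs. A minimal sketch, assuming a POSIX shell with `date` available; `snapshot_label` is a hypothetical helper name, not part of any Linode tooling:

```shell
#!/bin/sh
# Build the snapshot label used in step 1: pre-migration-YYYY-MM-DD.
# snapshot_label is a hypothetical helper name.
snapshot_label() {
    # %Y-%m-%d yields an ISO-style date such as 2025-03-25
    printf 'pre-migration-%s\n' "$(date +%Y-%m-%d)"
}

# Print the label so it can be pasted into the Linode dashboard.
snapshot_label
```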
2. Spin Up New Linode from Snapshot
- Linode dashboard → Create → From Backup
- Select the snapshot just taken
- Choose same or equivalent plan (2GB Linode)
- Same region as current production
- Note the new server's IP address
- Wait for provisioning to complete
3. Verify New Server
- SSH into new server using noted IP
- Run `docker ps --format "table {{.Names}}\t{{.Status}}"` and confirm all containers are running
- Check WordPress responds: `curl -H "Host: plantbasedsoutherner.com" http://localhost:80`
- Verify MySQL is healthy: `docker exec mysql-container mysqladmin ping`
- Spot check wp-content files exist in the WordPress volume
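Reading the container table by eye is easy to get wrong when several services are running. The sketch below filters the same `docker ps` fields down to the containers that need attention. It assumes the tab-separated output of `docker ps -a --format '{{.Names}}\t{{.Status}}'`; `not_running` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the names of containers that are not up, or that report "unhealthy".
# Reads tab-separated "name<TAB>status" lines on stdin, the shape produced by:
#   docker ps -a --format '{{.Names}}\t{{.Status}}'
# not_running is a hypothetical helper name.
not_running() {
    awk -F '\t' '$2 !~ /^Up/ || $2 ~ /unhealthy/ { print $1 }'
}

# Usage on the new server (-a also lists stopped containers):
#   docker ps -a --format '{{.Names}}\t{{.Status}}' | not_running
```

Empty output means every container reports an "Up" status without a failing health check.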
4. DNS Cutover
- Log into Cloudflare dashboard
- Navigate to plantbasedsoutherner.com DNS settings
- Update A record to new Linode IP address
- Cloudflare proxied traffic should cut over within minutes
- Verify: browse to https://plantbasedsoutherner.com from phone
- Check multiple pages — homepage, a recipe, member login
- Test PBS-API health endpoint
- Verify n8n is accessible and workflows are intact
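The page checks in this step can be repeated quickly from a terminal as well as from a phone. A minimal sketch, assuming `curl` is installed; `smoke_check` is a hypothetical helper, and the paths in the usage line are placeholders to replace with the real homepage, recipe, and member login URLs. Only HTTP 200 counts as a pass here, so a page that redirects will show as a failure unless the check is adjusted.

```shell
#!/bin/sh
# Request each path on the site and flag any that do not return HTTP 200.
# smoke_check is a hypothetical helper; paths passed to it are placeholders.
smoke_check() {
    base=$1
    shift
    failed=0
    for path in "$@"; do
        # -s silent, -o discard the body, -w print only the status code
        code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 15 "$base$path")
        if [ "$code" = "200" ]; then
            echo "OK   $code $path"
        else
            echo "FAIL $code $path"
            failed=1
        fi
    done
    # Return non-zero if any path failed, so the result can gate later steps.
    return "$failed"
}

# Usage (paths are assumptions, not taken from the site):
#   smoke_check https://plantbasedsoutherner.com / /recipes/ /wp-login.php
```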
5. Post-Cutover Validation
- Check Uptime Kuma monitors are green
- Verify Portainer accessible
- Confirm SSL certificate is valid (green lock)
- Test Instagram automation — trigger a test comment if possible
- Check Google Chat alerts are still routing
Decision Point
After validation, choose one:
Option A: Keep New Server
- Update Linode dashboard labels/tags
- Update any hardcoded IPs in Ansible inventory
- Run updated Ansible against new server to layer staging changes
- Destroy or archive the old production Linode
- Update SSH config with new server IP
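Stale references to the old IP are easy to miss when updating the Ansible inventory and SSH config by hand. The sketch below lists files that still contain it. It assumes a `grep` that supports `-r` (GNU and BSD both do); `find_old_ip` is a hypothetical helper, and the IP and paths in the usage lines are placeholders.

```shell
#!/bin/sh
# List files under a path that still mention the old server IP.
# find_old_ip is a hypothetical helper; the usage values are placeholders.
find_old_ip() {
    old_ip=$1
    target=$2
    # -r recurse, -l file names only, -F literal string,
    # -w whole-word match so 203.0.113.10 does not match 203.0.113.100
    grep -rlwF -- "$old_ip" "$target"
}

# Usage (placeholder values):
#   find_old_ip 203.0.113.10 ./inventory
#   find_old_ip 203.0.113.10 ~/.ssh/config
```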
Option B: Revert to Original
- Flip Cloudflare DNS back to original Linode IP
- Verify site works on original server
- Destroy the temporary test Linode
- Document any issues found for future reference
Rollback Plan
If anything goes wrong at any step:
- Flip Cloudflare DNS back to original production IP
- Original server is untouched and still running
- Destroy the test Linode
- Site downtime should be under 30 minutes total
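If the dashboard is slow or unreachable during a rollback, the same A record change can in principle be made through the Cloudflare v4 API. The sketch below is an alternative to the dashboard steps above, not part of the tested procedure. `CF_API_TOKEN`, `CF_ZONE_ID`, and `CF_RECORD_ID` are assumed environment variables, the function names are hypothetical, and the request shape should be checked against the current Cloudflare API documentation before relying on it.

```shell
#!/bin/sh
# Sketch: point the A record back at the original server via the Cloudflare API.
# CF_API_TOKEN, CF_ZONE_ID and CF_RECORD_ID are assumed environment variables;
# a_record_payload and set_a_record are hypothetical helper names.

# Build the JSON body; only the record content (the IP) is changed.
a_record_payload() {
    printf '{"content":"%s"}' "$1"
}

# Send a PATCH for the existing DNS record with the given IP.
set_a_record() {
    curl -sS -X PATCH \
        "https://api.cloudflare.com/client/v4/zones/$CF_ZONE_ID/dns_records/$CF_RECORD_ID" \
        -H "Authorization: Bearer $CF_API_TOKEN" \
        -H "Content-Type: application/json" \
        --data "$(a_record_payload "$1")"
}

# Usage (placeholder IP standing in for the original production server):
#   set_a_record 203.0.113.10
```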
Post-Migration
- Remove maintenance notice / site banner
- Post on Instagram that site is back up
- Document outcome and any lessons learned
- Update Tech Wiki with any new findings
Key Details
- Production domain: plantbasedsoutherner.com
- DNS provider: Cloudflare (proxied)
- Current hosting: Linode 2GB
- Estimated downtime: 30-60 minutes
- Best window: Early morning, low traffic
Lessons Learned
(Fill in after completing the test)
Created: 2025-03-25 Updated: 2025-03-25