---
project: pbs-production-migration-test
type: project-plan
status: active
tags:
- pbs
- production
- docker
- traefik
- ansible
- staging
- wordpress
---
# Production Migration Test Runbook
## Objective
Validate full disaster recovery capability by restoring production to a new
Linode, cutting over DNS, and confirming the site runs cleanly on fresh
infrastructure.
## Pre-Migration (Day Before)
- [ ] Post maintenance notice on Instagram (Jenny)
- [ ] Add site banner or maintenance page notification
- [ ] Confirm Linode Backups are enabled on production
- [ ] Verify Cloudflare dashboard access
- [ ] Have SSH keys ready for new server access
- [ ] Notify Jenny of maintenance window and estimated downtime (30-60 min)
## Migration Steps (Early Morning)
### 1. Final Production Snapshot
- [ ] Log into Linode dashboard
- [ ] Navigate to production Linode → Backups tab
- [ ] Take manual snapshot, label: `pre-migration-YYYY-MM-DD`
- [ ] Wait for snapshot to complete (watch progress in dashboard)
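
The snapshot label can be generated instead of typed, which avoids date typos. This is a minimal sketch assuming a POSIX shell; `snapshot_label` is a hypothetical helper name, not part of any Linode tooling.

```shell
# Print a snapshot label in the pre-migration-YYYY-MM-DD format.
# snapshot_label is a hypothetical helper name used only for illustration.
snapshot_label() {
    # date +%Y-%m-%d prints the current local date as YYYY-MM-DD
    printf 'pre-migration-%s\n' "$(date +%Y-%m-%d)"
}
```

Run `snapshot_label` and paste the result into the label field in the dashboard.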
### 2. Spin Up New Linode from Snapshot
- [ ] Linode dashboard → Create → From Backup
- [ ] Select the snapshot just taken
- [ ] Choose same or equivalent plan (2GB Linode)
- [ ] Same region as current production
- [ ] Note the new server's IP address
- [ ] Wait for provisioning to complete
### 3. Verify New Server
- [ ] SSH into new server using noted IP
- [ ] Run `docker ps -a --format "table {{.Names}}\t{{.Status}}"` and confirm every container shows an `Up` status (`-a` also lists stopped containers, which plain `docker ps` omits)
- [ ] Check WordPress responds: `curl -H "Host: plantbasedsoutherner.com" http://localhost:80`
- [ ] Verify MySQL is healthy: `docker exec mysql-container mysqladmin ping`
- [ ] Spot check wp-content files exist in the WordPress volume
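
The container check can be scripted so that a stopped container is hard to miss. The sketch below is one possible approach, not an established part of this runbook: `not_running` is a hypothetical helper that reads `name<TAB>status` lines and reports anything whose status does not begin with `Up`. It only looks at that prefix, so a container reported as `Up ... (unhealthy)` would still pass.

```shell
# Read "name<TAB>status" lines on stdin and print every container whose
# status does not start with "Up" (for example "Exited (1) 3 minutes ago").
# Exit status is 1 if any such container was found, 0 otherwise.
# not_running is a hypothetical helper name.
not_running() {
    awk -F '\t' '$2 !~ /^Up/ { print $1 ": " $2; bad = 1 } END { exit bad + 0 }'
}

# Intended use on the new server (requires Docker; -a includes stopped containers):
#   docker ps -a --format '{{.Names}}\t{{.Status}}' | not_running
```

No output and a zero exit status mean every listed container reports `Up`.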
### 4. DNS Cutover
- [ ] Log into Cloudflare dashboard
- [ ] Navigate to plantbasedsoutherner.com DNS settings
- [ ] Update A record to new Linode IP address
- [ ] Confirm traffic has cut over to the new server (with a proxied record, the change should take effect within minutes)
- [ ] Verify: browse to https://plantbasedsoutherner.com from phone
- [ ] Check multiple pages: homepage, a recipe, member login
- [ ] Test PBS-API health endpoint
- [ ] Verify n8n is accessible and workflows are intact
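
The page checks above can be complemented by a quick scripted pass over a few URLs. This is a sketch under assumptions: `status_ok` and `check_pages` are hypothetical helper names, and the recipe and login paths in the usage line are placeholders to be replaced with real ones.

```shell
# Return 0 when an HTTP status code is 2xx or 3xx, 1 otherwise.
status_ok() {
    case "$1" in
        2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Fetch each path under a base URL and print OK or FAIL with the status code.
# Returns 1 if any path failed.
# Usage: check_pages BASE_URL PATH [PATH ...]
check_pages() {
    base="$1"; shift
    rc=0
    for path in "$@"; do
        # -s silences progress output, -o /dev/null discards the body,
        # -w '%{http_code}' prints only the status code (000 means no response).
        code="$(curl -s -o /dev/null --max-time 15 -w '%{http_code}' "$base$path")"
        if status_ok "$code"; then
            echo "OK   $code $path"
        else
            echo "FAIL $code $path"
            rc=1
        fi
    done
    return "$rc"
}

# Example (placeholder paths):
#   check_pages https://plantbasedsoutherner.com / /RECIPE-PATH/ /LOGIN-PATH/
```

A scripted pass only confirms that pages respond; it does not replace looking at them in a browser.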
### 5. Post-Cutover Validation
- [ ] Check Uptime Kuma monitors are green
- [ ] Verify Portainer accessible
- [ ] Confirm SSL certificate is valid (green lock)
- [ ] Test Instagram automation — trigger a test comment if possible
- [ ] Check Google Chat alerts are still routing
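
The certificate check can also be done from a terminal. The sketch below assumes `openssl` is installed; `cert_enddate` and `cert_valid_for_days` are hypothetical helper names. Because the DNS record is proxied, the certificate seen this way is most likely the one Cloudflare serves at its edge rather than the one on the Linode itself.

```shell
# Print the notAfter (expiry) line of the certificate served for a hostname.
# cert_enddate is a hypothetical helper name.
cert_enddate() {
    host="$1"
    # s_client opens the TLS connection (-servername sends SNI);
    # x509 -noout -enddate prints the certificate's expiry date.
    openssl s_client -connect "$host:443" -servername "$host" </dev/null 2>/dev/null \
        | openssl x509 -noout -enddate
}

# Return 0 if the served certificate stays valid for at least N more days (default 7).
# cert_valid_for_days is a hypothetical helper name.
cert_valid_for_days() {
    host="$1"
    days="${2:-7}"
    # -checkend takes seconds and fails if the certificate expires within that window.
    openssl s_client -connect "$host:443" -servername "$host" </dev/null 2>/dev/null \
        | openssl x509 -noout -checkend "$((days * 86400))" >/dev/null
}

# Example:
#   cert_enddate plantbasedsoutherner.com
#   cert_valid_for_days plantbasedsoutherner.com 14 && echo "certificate OK"
```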
## Decision Point
After validation, choose one:
**Option A: Keep New Server**
- [ ] Update Linode dashboard labels/tags
- [ ] Update any hardcoded IPs in Ansible inventory
- [ ] Run updated Ansible against new server to layer staging changes
- [ ] Destroy or archive the old production Linode
- [ ] Update SSH config with new server IP
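
Before editing the Ansible inventory and SSH config, it can help to list every place the old IP still appears. This is a minimal sketch assuming a `grep` that supports `-r` and `-w` (GNU and BSD grep both do); `find_hardcoded_ip` is a hypothetical helper name, and the address in the example is a documentation-range placeholder.

```shell
# List every line under a directory that contains the old IP address.
# Usage: find_hardcoded_ip OLD_IP DIR
# find_hardcoded_ip is a hypothetical helper name.
find_hardcoded_ip() {
    old_ip="$1"
    dir="$2"
    # -r recurses, -n prints line numbers, -F treats the IP as a literal string
    # (so dots do not match arbitrary characters), and -w requires a whole-word
    # match so 203.0.113.10 does not also match 203.0.113.100.
    grep -rnwF -- "$old_ip" "$dir"
}

# Example (placeholder IP and directory):
#   find_hardcoded_ip 203.0.113.10 ./ansible
```

Each output line has the form `file:line:content`, which gives a checklist of places to update.
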
**Option B: Revert to Original**
- [ ] Flip Cloudflare DNS back to original Linode IP
- [ ] Verify site works on original server
- [ ] Destroy the temporary test Linode
- [ ] Document any issues found for future reference
## Rollback Plan
If anything goes wrong at any step:
1. Flip Cloudflare DNS back to original production IP
2. Confirm the site loads from the original server (it stays untouched and running throughout the test)
3. Destroy the test Linode
4. Site downtime should be under 30 minutes total
## Post-Migration
- [ ] Remove maintenance notice / site banner
- [ ] Post on Instagram that site is back up
- [ ] Document outcome and any lessons learned
- [ ] Update Tech Wiki with any new findings
## Key Details
- **Production domain:** plantbasedsoutherner.com
- **DNS provider:** Cloudflare (proxied)
- **Current hosting:** Linode 2GB
- **Estimated downtime:** 30-60 minutes
- **Best window:** Early morning, low traffic
## Lessons Learned
*(Fill in after completing the test)*
---
Created: 2025-03-25
Updated: 2025-03-25