project: pbs-production-migration-test
type: project-plan
status: active
tags: pbs, production, docker, traefik, ansible, staging, wordpress

Production Migration Test Runbook

Objective

Validate full disaster recovery capability by restoring production to a new Linode, cutting over DNS, and confirming the site runs cleanly on fresh infrastructure.

Pre-Migration (Day Before)

  • Post maintenance notice on Instagram (Jenny)
  • Add site banner or maintenance page notification
  • Confirm Linode Backups are enabled on production
  • Verify Cloudflare dashboard access
  • Have SSH keys ready for new server access
  • Notify Jenny of maintenance window and estimated downtime (30-60 min)

Migration Steps (Early Morning)

1. Final Production Snapshot

  • Log into Linode dashboard
  • Navigate to production Linode → Backups tab
  • Take manual snapshot, label: pre-migration-YYYY-MM-DD
  • Wait for snapshot to complete (watch progress in dashboard)
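
To keep the snapshot label consistent with the pre-migration-YYYY-MM-DD convention above, the label can be generated rather than typed. This is a minimal sketch; the function name snapshot_label is hypothetical and not part of any Linode tooling.

```shell
# Build the snapshot label used in step 1.
# With no argument, today's date (local time) is used.
snapshot_label() {
  local day="${1:-$(date +%F)}"
  printf 'pre-migration-%s\n' "$day"
}
```

Calling snapshot_label with no argument prints the label for today, ready to paste into the dashboard's label field.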

2. Spin Up New Linode from Snapshot

  • Linode dashboard → Create → From Backup
  • Select the snapshot just taken
  • Choose same or equivalent plan (2GB Linode)
  • Same region as current production
  • Note the new server's IP address
  • Wait for provisioning to complete

3. Verify New Server

  • SSH into new server using noted IP
  • Run docker ps -a --format "table {{.Names}}\t{{.Status}}" and confirm every container shows an Up status (-a also lists stopped containers, which plain docker ps hides)
  • Check WordPress responds: curl -H "Host: plantbasedsoutherner.com" http://localhost:80
  • Verify MySQL is healthy: docker exec mysql-container mysqladmin ping
  • Spot check wp-content files exist in the WordPress volume
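
The checks in this step could be bundled into one script run on the new server. The sketch below is illustrative only: the function names are hypothetical, mysql-container is the placeholder name used above and may differ from the real container name, and any one-off container that exited on purpose would be reported as not running.

```shell
#!/usr/bin/env bash
# Sketch of the step 3 checks, meant to be run on the new server.
# MYSQL_CONTAINER defaults to the runbook's placeholder name (an assumption).
MYSQL_CONTAINER="${MYSQL_CONTAINER:-mysql-container}"
SITE_HOST="plantbasedsoutherner.com"

# Reads "name|status" lines on stdin and prints the names of containers
# that are not "Up" or that report an unhealthy health check.
not_running() {
  awk -F'|' '$2 !~ /^Up/ || $2 ~ /unhealthy/ { print $1 }'
}

verify_server() {
  local down
  # -a includes stopped containers, which plain `docker ps` would hide.
  down="$(docker ps -a --format '{{.Names}}|{{.Status}}' | not_running)"
  if [ -n "$down" ]; then
    echo "Not running or unhealthy: $down" >&2
    return 1
  fi
  # -f makes curl fail on HTTP 4xx/5xx; a redirect to HTTPS still passes.
  curl -fsS -o /dev/null -H "Host: $SITE_HOST" http://localhost:80 || return 1
  # mysqladmin ping exits 0 when the server process is answering.
  docker exec "$MYSQL_CONTAINER" mysqladmin ping || return 1
  echo "All checks passed"
}
```

Running verify_server after logging in gives a single pass/fail result before moving on to the DNS cutover.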

4. DNS Cutover

  • Log into Cloudflare dashboard
  • Navigate to plantbasedsoutherner.com DNS settings
  • Update A record to new Linode IP address
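  • Because the record is proxied, visitors keep resolving Cloudflare's edge IPs, so traffic should reach the new origin within minutes without waiting on DNS TTLs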
  • Verify: browse to https://plantbasedsoutherner.com from phone
  • Check multiple pages: homepage, a recipe, member login
  • Test PBS-API health endpoint
  • Verify n8n is accessible and workflows are intact
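
The page checks above can be complemented by a quick scripted pass with curl. This is a rough sketch, not a replacement for browsing the site from a phone: the function names are hypothetical, and the example paths in the final comment are placeholders because the real recipe, login, and PBS-API health URLs are not listed in this runbook.

```shell
# Fetch one URL and print its final HTTP status code (redirects are followed).
http_status() {
  curl -sS -L -o /dev/null -w '%{http_code}' --max-time 15 "$1"
}

# Succeeds for 2xx status codes only.
is_ok_status() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: smoke_check BASE_URL PATH [PATH...]
smoke_check() {
  local base="$1"; shift
  local path code failed=0
  for path in "$@"; do
    code="$(http_status "$base$path")" || code="000"
    if is_ok_status "$code"; then
      echo "OK   $code $path"
    else
      echo "FAIL $code $path"
      failed=1
    fi
  done
  return "$failed"
}

# Example (paths other than "/" are hypothetical placeholders):
# smoke_check https://plantbasedsoutherner.com / /recipes/example-recipe/ /login/
```

The function returns non-zero if any page fails, which makes it easy to rerun a few times while the cutover settles.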

5. Post-Cutover Validation

  • Check Uptime Kuma monitors are green
  • Verify Portainer is accessible
  • Confirm SSL certificate is valid (green lock)
  • Test Instagram automation — trigger a test comment if possible
  • Check Google Chat alerts are still routing
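
The certificate check can also be done from a terminal with openssl. This is a sketch under assumptions: the function names and the 14-day default are hypothetical, and because the domain is proxied, the certificate inspected here is the one Cloudflare presents to visitors rather than the one on the origin server.

```shell
CERT_HOST="plantbasedsoutherner.com"

# Convert whole days to seconds (openssl's -checkend takes seconds).
days_to_seconds() {
  echo $(( $1 * 86400 ))
}

# Succeeds if the certificate served for CERT_HOST is still valid
# N days from now (default 14). Also prints the notAfter date.
cert_valid_for() {
  local days="${1:-14}"
  openssl s_client -connect "${CERT_HOST}:443" -servername "$CERT_HOST" \
      </dev/null 2>/dev/null \
    | openssl x509 -noout -enddate -checkend "$(days_to_seconds "$days")"
}
```

Running cert_valid_for 30 would flag a certificate that expires within the next 30 days.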

Decision Point

After validation, choose one:

Option A: Keep New Server

  • Update Linode dashboard labels/tags
  • Update any hardcoded IPs in Ansible inventory
  • Run updated Ansible against new server to layer staging changes
  • Destroy or archive the old production Linode
  • Update SSH config with new server IP
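
Swapping the old IP for the new one in the Ansible inventory and SSH config can be scripted to avoid missing an occurrence. The following is a minimal sketch: replace_ip is a hypothetical helper, the addresses and file paths in the example are placeholders, and it is worth reviewing the result (for example with a diff) before running Ansible.

```shell
# Replace every occurrence of one IPv4 address with another in a file.
# Dots in the old address are escaped so they match literally, and the
# match must not be part of a longer address (192.0.2.1 vs 192.0.2.10).
replace_ip() {
  local old="$1" new="$2" file="$3" pattern tmp rc
  pattern="$(printf '%s' "$old" | sed 's/\./\\./g')"
  tmp="$(mktemp)" || return 1
  sed -E "s/(^|[^0-9.])${pattern}([^0-9.]|\$)/\1${new}\2/g" "$file" > "$tmp" \
    && cat "$tmp" > "$file"   # overwrite in place, keeping file permissions
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Example (placeholder addresses and hypothetical paths):
# replace_ip 192.0.2.1 198.51.100.7 inventory/hosts.ini
# replace_ip 192.0.2.1 198.51.100.7 ~/.ssh/config
```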

Option B: Revert to Original

  • Flip Cloudflare DNS back to original Linode IP
  • Verify site works on original server
  • Destroy the temporary test Linode
  • Document any issues found for future reference

Rollback Plan

If anything goes wrong at any step:

  1. Flip Cloudflare DNS back to original production IP
  2. Confirm the original server, which stays untouched and running throughout the test, is serving the site again
  3. Destroy the test Linode
  4. Site downtime should be under 30 minutes total
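
If the dashboard is slow or unavailable during a rollback, the A record could also be flipped through the Cloudflare API. This is a sketch under assumptions, not the procedure this runbook prescribes: CF_API_TOKEN, CF_ZONE_ID, and CF_RECORD_ID are environment variables you would need to set yourself, the function names are hypothetical, and the IP in the example is a documentation placeholder. Recording the original production IP before the cutover makes this step quicker.

```shell
# Build the JSON body for an A record update (proxied stays enabled).
dns_payload() {
  printf '{"type":"A","name":"%s","content":"%s","proxied":true}\n' "$1" "$2"
}

# Point the A record at the given IP.
# Assumes CF_API_TOKEN, CF_ZONE_ID and CF_RECORD_ID are set in the environment.
set_a_record() {
  local ip="$1"
  curl -fsS -X PATCH \
    "https://api.cloudflare.com/client/v4/zones/${CF_ZONE_ID}/dns_records/${CF_RECORD_ID}" \
    -H "Authorization: Bearer ${CF_API_TOKEN}" \
    -H "Content-Type: application/json" \
    --data "$(dns_payload plantbasedsoutherner.com "$ip")"
}

# Rollback example (203.0.113.10 stands in for the original production IP):
# set_a_record 203.0.113.10
```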

Post-Migration

  • Remove maintenance notice / site banner
  • Post on Instagram that site is back up
  • Document outcome and any lessons learned
  • Update Tech Wiki with any new findings

Key Details

  • Production domain: plantbasedsoutherner.com
  • DNS provider: Cloudflare (proxied)
  • Current hosting: Linode 2GB
  • Estimated downtime: 30-60 minutes
  • Best window: Early morning, low traffic

Lessons Learned

(Fill in after completing the test)


Created: 2025-03-25 Updated: 2025-03-25