project: production-deploy-march-2026
type: session-notes
status: completed
tags: production, docker, traefik, ansible, cloudflare, gitea, n8n
created: 2026-03-29
updated: 2026-03-29
path: PBS/Tech/Projects/

Production Deploy - March 29, 2026

Overview

Deployed infrastructure updates to production Linode via Ansible, including new container configurations, database schema updates, and Gitea setup. Used Cloudflare Worker for maintenance mode during deploy.

Accomplishments

Cloudflare Worker Maintenance Page

  • Created pbs-maintain Worker with PBS-branded maintenance page (Sunnie themed)
  • Added IP bypass so Travis can still reach the site while the maintenance page is shown to the public
  • Worker URL: https://pbs-maintain.tjherbranson.workers.dev
  • Toggle on/off by adding/removing Workers Route for plantbasedsoutherner.com/*
  • Worker uses CF-Connecting-IP header to check allowed IPs

n8n Document Pipeline Fix

  • Identified race condition: two emails processed simultaneously caused Gitea ref lock errors
  • Root cause: Gitea Contents API creates a commit on every file create/update — two simultaneous API calls create competing commits on main
  • Fix: Added Loop node before Gitea create/update nodes to serialize file processing
  • Key distinction: Loop node serializes items through the same path; Split In Batches chunks data — these are different nodes with different behaviors
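
The fix itself lives in the n8n workflow, but the idea behind it (let only one Gitea write run at a time) can be sketched outside n8n. The block below is a minimal Python illustration, not the workflow that was deployed: `SerializedWriter` and the `write_file` callable are hypothetical names, and `write_file` stands in for whatever performs the Contents API create/update call.

```python
import threading
from typing import Callable, Iterable, Tuple

# Hypothetical item shape for illustration: (repository path, file content).
FileItem = Tuple[str, str]


class SerializedWriter:
    """Funnel write requests through one lock so only one runs at a time.

    Each write to the Contents API creates a commit on the target branch,
    so overlapping writes compete for the same ref. Holding a lock around
    the call means the next write starts only after the previous one ends.
    """

    def __init__(self, write_file: Callable[[str, str], None]) -> None:
        # write_file is an assumed callable that performs one create/update.
        self._write_file = write_file
        self._lock = threading.Lock()

    def write(self, path: str, content: str) -> None:
        """Write one file; concurrent callers wait their turn."""
        with self._lock:
            self._write_file(path, content)

    def write_all(self, items: Iterable[FileItem]) -> None:
        """Write several files strictly one after another."""
        for path, content in items:
            self.write(path, content)
```

Two emails arriving together would each call `write`, and the second call would simply block until the first commit has finished, which is roughly the effect the Loop node has on items inside the workflow.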

Production Deploy Process

  • Cloned live Linode as backup before changes
  • Ran Ansible playbook against the clone
  • Deployed MySQL schema updates via phpMyAdmin (copy/paste with IF NOT EXISTS)
  • Updated DNS in Cloudflare to point to new clone server
  • Clone became the new production server; original kept as rollback backup

Container Fixes During Deploy

  • pbs-api healthcheck: Replaced the curl-based healthcheck with a Python one (urllib.request), since curl is not installed in the container
  • Missing README.md: pbs-api build failed because pyproject.toml referenced a README.md that didn't exist — created empty file
  • MySQL memory limits: Added deploy block with 768M limit and tuning flags to compose
  • WordPress memory limit: Added 2000M deploy limit
  • Portainer stale container reference: Restarted Portainer to clear container IDs it had cached from before the deploy
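
A urllib.request-based healthcheck along the lines of the pbs-api fix above might look like the following. This is a sketch under assumptions rather than the script that was deployed: the function names, the default timeout, and any URL passed in (for example a `/health` path) are hypothetical.

```python
import http.client
import urllib.error
import urllib.request


def check_health(url: str, timeout: float = 3.0) -> bool:
    """Return True if a GET to url answers with a 2xx status.

    urlopen raises HTTPError for 4xx/5xx responses and URLError or OSError
    for connection problems, so every failure mode maps to False.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return False


def exit_code(url: str) -> int:
    """Map the check to the convention a container healthcheck expects:
    0 for healthy, 1 for unhealthy."""
    return 0 if check_health(url) else 1
```

A healthcheck command would then end the process with `sys.exit(exit_code(url))`. Only the standard library is used, so nothing extra has to be installed in the image.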

Gitea Production Setup

  • Added DNS record for gitea.plantbasedsoutherner.com in Cloudflare
  • Waited for DNS propagation before Traefik could issue Let's Encrypt cert
  • Removed healthcheck from Gitea container (the healthcheck returns 404 until the setup wizard completes, and Traefik won't route to unhealthy containers)
  • Completed setup wizard and created admin user
  • Deferred: setting LANDING_PAGE = login in app.ini (future task, listed under Still To Do)
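
Waiting for DNS before Traefik requests a certificate can be checked rather than guessed. The sketch below asks the local resolver whether a hostname currently resolves and, optionally, whether it points at an expected address. The function names are hypothetical, and the caveats matter: a local lookup only shows what this machine's resolver sees, which may differ from what Let's Encrypt sees, and a Cloudflare-proxied record resolves to Cloudflare edge addresses rather than the server's own IP.

```python
import socket
from typing import Callable, Set


def resolved_addresses(hostname: str,
                       resolver: Callable = socket.getaddrinfo) -> Set[str]:
    """Return the set of IP addresses the resolver reports for hostname.

    An empty set means the name does not resolve yet. The resolver
    argument exists so the lookup can be swapped out when testing.
    """
    try:
        results = resolver(hostname, None)
    except socket.gaierror:
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return {entry[4][0] for entry in results}


def points_at(hostname: str, expected_ip: str,
              resolver: Callable = socket.getaddrinfo) -> bool:
    """True if hostname currently resolves to expected_ip."""
    return expected_ip in resolved_addresses(hostname, resolver)
```

For example, `resolved_addresses("gitea.plantbasedsoutherner.com")` returning a non-empty set would suggest the record is visible from that machine, at which point recreating the container to retry certificate issuance is more likely to succeed.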

WordPress Staging Redirect Issue

  • After Ansible deploy, site redirected to staging domain
  • Root cause: One Traefik router label had the staging URL; WordPress picked it up and wrote it to the wp_options table
  • Fix: Updated siteurl and home in wp_options via phpMyAdmin, flushed Redis, purged Cloudflare cache
  • Lesson: WordPress can auto-update wp_options URLs based on incoming request hostname

Key Learnings

  • Cloudflare Worker IP bypass: return fetch(request) re-fetches through Cloudflare's network, so WordPress sees a Cloudflare edge IP instead of your real IP — can trigger Wordfence lockouts
  • Cloudflare caches 301 redirects: Always purge Cloudflare cache after fixing redirect issues
  • Gitea API creates implicit commits: Every file create/update via the Contents API is a git commit — serialize multiple file operations to avoid ref lock errors
  • Ansible staging drift: Fixes applied directly on staging don't make it back to Ansible automatically — fix on staging to unblock, but immediately update Ansible too
  • DNS propagation for Let's Encrypt: Traefik can't issue certs until DNS propagates — recreate the container (docker compose up -d --force-recreate) to trigger a retry without restarting all of Traefik
  • Wildcard DNS records: *.staging in Cloudflare catches all subdomains under staging automatically — convenient for staging, avoid on production
  • IF NOT EXISTS MySQL warning: The "less efficient" warning about index generation during table creation is negligible for small schemas — keep the safety of IF NOT EXISTS

Still To Do

  • n8n and database configuration on production
  • Set Gitea landing page to login in app.ini
  • Configure Gitea email settings (deferred)
  • Add Gitea healthcheck back with wget-based check after setup is stable
  • Delete old Linode backup server after stability verification (~1 week)
  • Continue work on per-container Ansible playbook
  • Update Ansible with any fixes applied directly to production during this session

Sent from Jenny & Travis