From bb0699b792a7186d310f5db900c373978326c1f3 Mon Sep 17 00:00:00 2001
From: herbygitea
Date: Sun, 29 Mar 2026 23:49:16 +0000
Subject: [PATCH] Create production-deploy-march-2026.md via n8n

---
 .../Projects/production-deploy-march-2026.md | 116 ++++++++++++++++++
 1 file changed, 116 insertions(+)
 create mode 100644 PBS/Tech/Projects/production-deploy-march-2026.md

diff --git a/PBS/Tech/Projects/production-deploy-march-2026.md b/PBS/Tech/Projects/production-deploy-march-2026.md
new file mode 100644
index 0000000..a71c5c8
--- /dev/null
+++ b/PBS/Tech/Projects/production-deploy-march-2026.md
@@ -0,0 +1,116 @@
+---
+project: production-deploy-march-2026
+type: session-notes
+status: completed
+tags:
+  - production
+  - docker
+  - traefik
+  - ansible
+  - cloudflare
+  - gitea
+  - n8n
+created: 2026-03-29
+updated: 2026-03-29
+path: PBS/Tech/Projects/
+---
+
+# Production Deploy - March 29, 2026
+
+## Overview
+Deployed infrastructure updates to production Linode via Ansible, including
+new container configurations, database schema updates, and Gitea setup.
+Used Cloudflare Worker for maintenance mode during deploy.
+
+## Accomplishments
+
+### Cloudflare Worker Maintenance Page
+- Created `pbs-maintain` Worker with PBS-branded maintenance page (Sunnie
+themed)
+- Added IP bypass so Travis can access the site while maintenance page is
+active for public
+- Worker URL: https://pbs-maintain.tjherbranson.workers.dev
+- Toggle on/off by adding/removing Workers Route for `
+plantbasedsoutherner.com/*`
+- Worker uses `CF-Connecting-IP` header to check allowed IPs
+
+### n8n Document Pipeline Fix
+- Identified race condition: two emails processed simultaneously caused
+Gitea ref lock errors
+- Root cause: Gitea Contents API creates a commit on every file
+create/update — two simultaneous API calls create competing commits on
+`main`
+- Fix: Added Loop node before Gitea create/update nodes to serialize file
+processing
+- Key distinction: Loop node serializes items through the same path; Split
+In Batches chunks data — these are different nodes with different behaviors
+
+### Production Deploy Process
+- Cloned live Linode as backup before changes
+- Ran Ansible playbook against the clone
+- Deployed MySQL schema updates via phpMyAdmin (copy/paste with `IF NOT
+EXISTS`)
+- Updated DNS in Cloudflare to point to new clone server
+- Clone became the new production server; original kept as rollback backup
+
+### Container Fixes During Deploy
+- **pbs-api healthcheck**: Replaced curl-based healthcheck with Python
+(`urllib.request`) since curl not installed in container
+- **Missing README.md**: pbs-api build failed because `pyproject.toml`
+referenced a README.md that didn't exist — created empty file
+- **MySQL memory limits**: Added deploy block with 768M limit and tuning
+flags to compose
+- **WordPress memory limit**: Added 2000M deploy limit
+- **Portainer stale container reference**: Restarted Portainer to clear
+cached container IDs from pre-deploy
+
+### Gitea Production Setup
+- Added DNS record for `gitea.plantbasedsoutherner.com` in Cloudflare
+- Waited for DNS propagation before Traefik could issue Let's Encrypt cert
+- Removed healthcheck from Gitea container (healthcheck returns 404 before
+setup wizard completes, Traefik won't route to unhealthy containers)
+- Completed setup wizard and created admin user
+- Set `LANDING_PAGE = login` in app.ini (future task)
+
+### WordPress Staging Redirect Issue
+- After Ansible deploy, site redirected to staging domain
+- Root cause: One Traefik router label had staging URL, WordPress picked it
+up and wrote it to `wp_options` table
+- Fix: Updated `siteurl` and `home` in `wp_options` via phpMyAdmin, flushed
+Redis, purged Cloudflare cache
+- Lesson: WordPress can auto-update `wp_options` URLs based on incoming
+request hostname
+
+## Key Learnings
+
+- **Cloudflare Worker IP bypass**: `return fetch(request)` re-fetches
+through Cloudflare's network, so WordPress sees a Cloudflare edge IP
+instead of your real IP — can trigger Wordfence lockouts
+- **Cloudflare caches 301 redirects**: Always purge Cloudflare cache after
+fixing redirect issues
+- **Gitea API creates implicit commits**: Every file create/update via the
+Contents API is a git commit — serialize multiple file operations to avoid
+ref lock errors
+- **Ansible staging drift**: Fixes applied directly on staging don't make
+it back to Ansible automatically — fix on staging to unblock, but
+immediately update Ansible too
+- **DNS propagation for Let's Encrypt**: Traefik can't issue certs until
+DNS propagates — recreate the container (`docker compose up -d
+--force-recreate`) to trigger a retry without restarting all of Traefik
+- **Wildcard DNS records**: `*.staging` in Cloudflare catches all
+subdomains under staging automatically — convenient for staging, avoid on
+production
+- **`IF NOT EXISTS` MySQL warning**: The "less efficient" warning about
+index generation during table creation is negligible for small schemas —
+keep the safety of `IF NOT EXISTS`
+
+## Still To Do
+- [ ] n8n and database configuration on production
+- [ ] Set Gitea landing page to login in `app.ini`
+- [ ] Configure Gitea email settings (deferred)
+- [ ] Add Gitea healthcheck back with wget-based check after setup is stable
+- [ ] Delete old Linode backup server after stability verification (~1 week)
+- [ ] Continue work on per-container Ansible playbook
+- [ ] Update Ansible with any fixes applied directly to production during
+this session
+...sent from Jenny & Travis
\ No newline at end of file
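
Reviewer sketch for the pbs-api healthcheck item under "Container Fixes During Deploy": the notes say the curl-based check was replaced with Python's `urllib.request`, but the actual script is not part of this patch. The version below is only an illustration of that idea. The URL, port, path, and timeout are assumptions, and `check` is a hypothetical name.

```python
"""Healthcheck sketch: report 0 if the service answers with a 2xx, else 1."""
import http.client
import urllib.error
import urllib.request

# Hypothetical endpoint; the notes do not record pbs-api's real port or path.
DEFAULT_URL = "http://localhost:8000/health"


def check(url: str = DEFAULT_URL, timeout: float = 3.0) -> int:
    """Return 0 when the URL responds with a 2xx status, otherwise 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 0 if 200 <= response.status < 300 else 1
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Covers connection refused, timeouts, HTTP 4xx/5xx (HTTPError is a
        # URLError subclass), malformed responses, and malformed URLs.
        return 1
```

To use something like this as a container healthcheck, the script would end with `sys.exit(check())` so that the exit code tells Docker whether the container is healthy. It relies only on the standard library, which is the point of the fix: nothing extra has to be installed in the image.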