Blue-green deploys, zero-downtime container swaps with nginx
Two containers behind one nginx upstream, atomic switch via reload, parallel run for soak time, kill the old. Production deploys without dropping a single request.
A default `docker compose up -d --force-recreate` has 5-30 seconds of downtime: the old container stops, the new container starts, and requests in between fail. For a low-traffic SaaS that's fine. For anything customer-facing, blue-green is the upgrade: two containers run in parallel, nginx switches atomically, and you kill the old one only after you're sure the new one is healthy.
Step 1: When you actually need this
Honest assessment first. You don't need blue-green if:
- You have < 100 active users at peak.
- A 30-second blip during deploys is acceptable.
- Most deploys are during off-peak hours (3am).
You do need it if:
- Customer-facing SaaS with paying users.
- You deploy during business hours.
- One bad deploy could lose data mid-request.
- You want to soak-test the new version against real traffic before committing.
For low-traffic content-only servers (memory, GEO-style audit, tutorial servers) the default we run is single-container `--force-recreate`; the simplicity wins. For SaaS that handles billing, real-time chat, or any state where dropping a single request costs money, blue-green pays for itself on the first deploy.
Step 2: docker-compose.yml with two services
```yaml
# docker-compose.yml
services:
  my-mcp-blue:
    image: my-mcp:${BLUE_TAG:-latest}
    container_name: my-mcp-blue
    restart: unless-stopped
    network_mode: host
    env_file: .env
    environment:
      - PORT=3001        # blue on 3001
      - HOST=127.0.0.1
    healthcheck:
      test: ["CMD-SHELL", "node -e \"fetch('http://localhost:3001/health').then(r => process.exit(r.ok ? 0 : 1))\""]
      interval: 10s
      timeout: 3s
      retries: 3
      start_period: 15s

  my-mcp-green:
    image: my-mcp:${GREEN_TAG:-latest}
    container_name: my-mcp-green
    restart: unless-stopped
    network_mode: host
    env_file: .env
    environment:
      - PORT=3002        # green on 3002
      - HOST=127.0.0.1
    healthcheck:
      test: ["CMD-SHELL", "node -e \"fetch('http://localhost:3002/health').then(r => process.exit(r.ok ? 0 : 1))\""]
      interval: 10s
      timeout: 3s
      retries: 3
      start_period: 15s
```
Two containers, two ports, two image tags. Both can run simultaneously.
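The `${BLUE_TAG:-latest}` / `${GREEN_TAG:-latest}` defaults follow ordinary shell parameter expansion, which compose mirrors for its interpolation. A quick sketch of how the default resolves, in plain shell with illustrative tag values:

```shell
# ${VAR:-default} falls back when VAR is unset OR empty
BLUE_TAG=""
echo "image: my-mcp:${BLUE_TAG:-latest}"   # empty -> my-mcp:latest

BLUE_TAG=v1.2.3
echo "image: my-mcp:${BLUE_TAG:-latest}"   # set   -> my-mcp:v1.2.3
```

Note the colon matters: `${VAR-default}` (no colon) keeps an empty-but-set value, so the colon form is what you want here. An empty env var should never produce a bare `my-mcp:` image reference.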
Step 3: nginx upstream + atomic switch
```nginx
# /etc/nginx/sites-available/my-mcp.conf
# This file is the "active" one, symlinked from sites-enabled.
upstream my_mcp_backend {
    server 127.0.0.1:3001;   # blue (active)
    # server 127.0.0.1:3002; # green (commented out)
}

server {
    listen 443 ssl http2;
    server_name your-mcp.io;
    # ... ssl + headers ...

    location / {
        proxy_pass http://my_mcp_backend;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_read_timeout 3600s;
        proxy_buffering off;
    }
}
```
To switch from blue to green: edit the upstream block, swap the comment, then `nginx -t && systemctl reload nginx`. The reload is atomic: nginx finishes in-flight requests on blue and sends new requests to green. Zero dropped connections.
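The comment swap itself can be scripted with sed. A minimal sketch against a scratch copy of the upstream block (the temp file and comment markers are illustrative; this assumes GNU sed for `-i`):

```shell
# Scratch copy of the upstream block
conf=$(mktemp)
cat > "$conf" <<'EOF'
upstream my_mcp_backend {
    server 127.0.0.1:3001; # blue (active)
    # server 127.0.0.1:3002; # green (commented out)
}
EOF

# Comment out blue, uncomment green. Anchoring to start-of-line keeps the
# already-commented server from matching the first expression.
sed -i -E \
  -e 's|^([[:space:]]*)server 127\.0\.0\.1:3001;.*|\1# server 127.0.0.1:3001;|' \
  -e 's|^([[:space:]]*)# server 127\.0\.0\.1:3002;.*|\1server 127.0.0.1:3002; # green (active)|' \
  "$conf"

switched=$(cat "$conf")
echo "$switched"
rm -f "$conf"
```

On the real box you would run the same two expressions against the sites-enabled file, then `nginx -t && systemctl reload nginx`.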
Step 4: The deploy script
```bash
#!/bin/bash
# bg-deploy.sh -- blue-green deploy
set -euo pipefail

NEW_TAG="${1:-}"              # e.g. v1.2.3 or commit SHA (:-, or set -u aborts on missing arg)
REMOTE=ai-server
REMOTE_DIR=/opt/my-mcp
NGINX_CONF=/etc/nginx/sites-enabled/my-mcp.conf

if [ -z "$NEW_TAG" ]; then
  echo "Usage: $0 <new-tag>"; exit 1
fi

ssh "$REMOTE" "set -euo pipefail; cd $REMOTE_DIR

  # 1. Find which color is currently active. The pattern is anchored to
  #    start-of-line so the commented-out server never matches.
  ACTIVE=\$(grep -E '^[[:space:]]*server 127.0.0.1:300[12];' $NGINX_CONF | head -1 | grep -oE '300[12]')
  if [ \"\$ACTIVE\" = '3001' ]; then ACTIVE_COLOR=blue; INACTIVE_COLOR=green; INACTIVE_PORT=3002
  else ACTIVE_COLOR=green; INACTIVE_COLOR=blue; INACTIVE_PORT=3001; fi
  echo \"Active: \$ACTIVE_COLOR. Deploying $NEW_TAG to \$INACTIVE_COLOR.\"

  # 2. Pull / build the new image, deploy to the inactive slot
  if [ \"\$INACTIVE_COLOR\" = 'green' ]; then
    GREEN_TAG=$NEW_TAG docker compose up -d --force-recreate my-mcp-green
  else
    BLUE_TAG=$NEW_TAG docker compose up -d --force-recreate my-mcp-blue
  fi

  # 3. Wait up to 60s for the inactive container to become healthy
  STATUS=unknown
  for i in \$(seq 1 30); do
    STATUS=\$(docker inspect my-mcp-\$INACTIVE_COLOR --format '{{.State.Health.Status}}')
    [ \"\$STATUS\" = 'healthy' ] && break
    sleep 2
  done
  if [ \"\$STATUS\" != 'healthy' ]; then
    echo \"FAIL: \$INACTIVE_COLOR did not become healthy. Aborting.\"
    exit 1
  fi

  # 4. Smoke-test the inactive port directly
  curl -fsS http://localhost:\$INACTIVE_PORT/health > /dev/null

  # 5. Atomically switch the nginx upstream
  sed -i \"s|server 127.0.0.1:\$ACTIVE.*|# server 127.0.0.1:\$ACTIVE; (was active)|\" $NGINX_CONF
  sed -i \"s|# server 127.0.0.1:\$INACTIVE_PORT.*|server 127.0.0.1:\$INACTIVE_PORT;|\" $NGINX_CONF
  nginx -t
  systemctl reload nginx
  echo \"Switched to \$INACTIVE_COLOR. In-flight requests drain on \$ACTIVE_COLOR during soak.\"

  # Record the now-old color; the local script needs it after this session ends
  echo \$ACTIVE_COLOR > $REMOTE_DIR/.bg-old-color
"

# 6. Soak: the old container keeps handling in-flight requests for 5 minutes
echo "Soaking for 5 minutes; monitor https://your-mcp.io/health and real user traffic."
sleep 300

# 7. (Optional) Stop the old container, using the color recorded remotely
read -r -p "Stop the old container? [y/N] " ans
if [ "$ans" = "y" ]; then
  ssh "$REMOTE" "docker stop my-mcp-\$(cat $REMOTE_DIR/.bg-old-color) || true"
fi
```
What it does:
- SSH in (one connection for the deploy; see 8.2 about SSH batching).
- Detect which color is active via the nginx config.
- Deploy the new tag to the inactive color.
- Wait up to 60s for the new container to become healthy.
- Smoke-test the new port directly.
- Atomically swap the nginx upstream and reload.
- Soak: the old container keeps handling in-flight requests while the new one takes new traffic.
- After 5 minutes (configurable), prompt to stop the old one.
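The detection step is the fragile part: a bare `grep 'server 127.0.0.1:300[12];'` also matches the commented-out server line, so the pattern has to be anchored to the start of the line. A self-contained sketch with a sample config inline:

```shell
conf=$(mktemp)
cat > "$conf" <<'EOF'
upstream my_mcp_backend {
    # server 127.0.0.1:3001; (was active)
    server 127.0.0.1:3002; # green (active)
}
EOF

# Only uncommented server lines match the anchored pattern
ACTIVE=$(grep -E '^[[:space:]]*server 127\.0\.0\.1:300[12];' "$conf" | head -1 | grep -oE '300[12]')
if [ "$ACTIVE" = 3001 ]; then ACTIVE_COLOR=blue; INACTIVE_COLOR=green
else ACTIVE_COLOR=green; INACTIVE_COLOR=blue; fi

echo "active=$ACTIVE_COLOR inactive=$INACTIVE_COLOR"   # → active=green inactive=blue
rm -f "$conf"
```

Without the `^[[:space:]]*` anchor, `head -1` would pick up the commented 3001 line first and the script would deploy over the live color.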
Step 5: Rollback is also one command
If the new color misbehaves during soak:
```bash
# bg-rollback.sh -- swap the nginx upstream back (same REMOTE / REMOTE_DIR /
# NGINX_CONF variables as bg-deploy.sh)
ssh "$REMOTE" "set -e; cd $REMOTE_DIR
  # Three sed expressions, applied in order to each line: mark the active
  # server, uncomment the inactive one, then turn the mark into a comment.
  # The #SWAP marker stops expression 2 from re-matching expression 1's output.
  sed -i -E '
    s|^([[:space:]]*)server (127.0.0.1:300[12]);.*|\1#SWAP server \2;|
    s|^([[:space:]]*)# server (127.0.0.1:300[12]);.*|\1server \2;|
    s|^([[:space:]]*)#SWAP |\1# |
  ' $NGINX_CONF
  nginx -t && systemctl reload nginx
"
```
Rollback is just an nginx reload, ~50ms. It's fast precisely because the old container is still running.
Step 6: When to actually kill the old one
After at least 5 minutes of clean operation on the new color, with:
- No 5xx spike in nginx access logs.
- No errors logged by the app.
- No customer reports.
- External monitor (8.4 Uptime Kuma) green for the full 5 minutes.
Then `docker stop` the old one. The slot is free for the next deploy.
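Checking for a 5xx spike needs nothing fancier than awk over the access log. A sketch on sample log lines (in nginx's default combined-style format the status code is field 9; the inline log lines are illustrative):

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
127.0.0.1 - - [10/May/2025:12:00:01 +0000] "GET /health HTTP/1.1" 200 15
127.0.0.1 - - [10/May/2025:12:00:02 +0000] "POST /mcp HTTP/1.1" 502 0
127.0.0.1 - - [10/May/2025:12:00:03 +0000] "POST /mcp HTTP/1.1" 200 88
EOF

# Count responses whose status code starts with 5
ERRORS=$(awk '$9 ~ /^5/ { n++ } END { print n+0 }' "$log")
echo "5xx count: $ERRORS"   # → 5xx count: 1
rm -f "$log"
```

On the server, point the same awk at `/var/log/nginx/access.log` (or a `tail -n` slice covering the soak window) and treat any nonzero count as a reason to roll back.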
Step 7: Verify
Run `academy_validate_step`. The validator confirms the `package.json` plumbing.
For the actual blue-green setup:
```bash
# 1. Both containers can run simultaneously
docker ps --filter name=my-mcp
# → my-mcp-blue   Up X (healthy)   3001
# → my-mcp-green  Up Y (healthy)   3002

# 2. Both /health endpoints reachable directly
curl -s http://localhost:3001/health | jq .version
curl -s http://localhost:3002/health | jq .version
# → different versions during a deploy

# 3. Public URL only sees one
curl -s https://your-mcp.io/health | jq .version
# → matches whichever color is active
```
Common traps
- Same `container_name` for both colors: Docker can't run two containers with the same name. Use the `-blue`/`-green` suffix.
- Same `PORT` in env: both bind to 3000 and the second container fails. Hard-code different ports per service.
- Hard switch without soak: kills in-flight requests. Always reload nginx (graceful) and let the old one drain.
- Forgetting to update both `BLUE_TAG` and `GREEN_TAG` defaults: the first deploy uses `latest` for both and you can't tell them apart.
- No healthcheck on either color: the script can't tell when the new one is ready.
- Deploying via `restart` instead of `up -d --force-recreate`: `env_file` changes don't take effect.
- Running the deploy script from cron: the interactive `read -p` will hang. Add a `--non-interactive` flag with auto-stop after a fixed soak time.
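For that last trap, a sketch of what flag parsing could look like (the `--non-interactive` and `--soak=` flag names are assumptions, not part of the script above):

```shell
# Parse optional deploy flags; in the real script, call: parse_deploy_flags "$@"
parse_deploy_flags() {
  NON_INTERACTIVE=0
  SOAK_SECONDS=300
  for arg in "$@"; do
    case "$arg" in
      --non-interactive) NON_INTERACTIVE=1 ;;
      --soak=*)          SOAK_SECONDS="${arg#--soak=}" ;;
    esac
  done
}

parse_deploy_flags --non-interactive --soak=60
echo "non_interactive=$NON_INTERACTIVE soak=$SOAK_SECONDS"
# → non_interactive=1 soak=60
```

In the deploy script, guard the prompt with it: `if [ "$NON_INTERACTIVE" = 1 ]; then ans=y; else read -r -p "Stop the old container? [y/N] " ans; fi`, so cron runs auto-stop the old container after the soak.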
What good looks like
Two containers run side-by-side. Deploy is one command (`./bg-deploy.sh v1.2.3`). Up to 60s for the new container to go healthy plus a 5-minute soak before the old one is killed: ~7 minutes total, but zero downtime. Rollback is also one command. The nginx reload is atomic.
For low-stakes SaaS, single-container `--force-recreate` is fine; keep it simple. For revenue-impacting deploys, blue-green pays for itself the first time you avoid an outage.
For reference, the `package.json` check that `academy_validate_step` runs:

```bash
cat package.json 2>/dev/null | python3 -c "import json,sys; p=json.load(sys.stdin); deps=list((p.get('dependencies') or {}).keys()); print('sdk:', '@modelcontextprotocol/sdk' in deps); print('bin:', bool(p.get('bin'))); print('main:', bool(p.get('main')))" 2>/dev/null || echo "no package.json in cwd"
```