Phase 8 · Deploy MCP · 7 steps

Blue-green deploys, zero-downtime container swaps with nginx

Two containers behind one nginx upstream, atomic switch via reload, parallel run for soak time, kill the old. Production deploys without dropping a single request.


The default docker compose up -d --force-recreate has 5-30 seconds of downtime: the old container stops, the new container starts, and requests in between fail. For a low-traffic SaaS that's fine. For anything customer-facing, blue-green is the upgrade: two containers run in parallel, nginx switches atomically, and you kill the old one only after you're sure the new one is healthy.

Step 1: When you actually need this

Honest assessment first. You don't need blue-green if:

  • You have < 100 active users at peak.
  • A 30-second blip during deploys is acceptable.
  • Most deploys are during off-peak hours (3am).

You do need it if:

  • Customer-facing SaaS with paying users.
  • You deploy during business hours.
  • One bad deploy could lose data mid-request.
  • You want to soak-test the new version against real traffic before committing.

For low-traffic content-only servers (memory, GEO-style audit, tutorial servers) our default is single-container --force-recreate; the simplicity wins. For SaaS that handles billing, real-time chat, or any state where dropping a single request costs money, blue-green pays for itself on the first deploy.

Step 2: docker-compose.yml with two services

# docker-compose.yml
services:
  my-mcp-blue:
    image: my-mcp:${BLUE_TAG:-latest}
    container_name: my-mcp-blue
    restart: unless-stopped
    network_mode: host
    env_file: .env
    environment:
      - PORT=3001                  # blue on 3001
      - HOST=127.0.0.1
    healthcheck:
      test: ["CMD-SHELL", "node -e \"fetch('http://localhost:3001/health').then(r => process.exit(r.ok ? 0 : 1))\""]
      interval: 10s
      timeout: 3s
      retries: 3
      start_period: 15s

  my-mcp-green:
    image: my-mcp:${GREEN_TAG:-latest}
    container_name: my-mcp-green
    restart: unless-stopped
    network_mode: host
    env_file: .env
    environment:
      - PORT=3002                  # green on 3002
      - HOST=127.0.0.1
    healthcheck:
      test: ["CMD-SHELL", "node -e \"fetch('http://localhost:3002/health').then(r => process.exit(r.ok ? 0 : 1))\""]
      interval: 10s
      timeout: 3s
      retries: 3
      start_period: 15s

Two containers, two ports, two image tags. Both can run simultaneously.

Step 3: nginx upstream + atomic switch

# /etc/nginx/sites-available/my-mcp.conf

# This file is the "active" one, symlinked from sites-enabled.
upstream my_mcp_backend {
    server 127.0.0.1:3001;        # blue (active)
    # server 127.0.0.1:3002;      # green (commented out)
}

server {
    listen 443 ssl http2;
    server_name your-mcp.io;
    # ... ssl + headers ...

    location / {
        proxy_pass         http://my_mcp_backend;
        proxy_http_version 1.1;
        proxy_set_header   Host $host;
        proxy_read_timeout 3600s;
        proxy_buffering    off;
    }
}

To switch from blue to green: edit the upstream block, swap which line is commented, then nginx -t && systemctl reload nginx. The reload is atomic: nginx finishes in-flight requests on blue and sends new requests to green. Zero dropped connections.

Step 4: The deploy script

#!/bin/bash
# bg-deploy.sh, Blue-green deploy
set -euo pipefail

NEW_TAG="${1:-}"  # e.g. v1.2.3 or commit SHA; :- keeps set -u from aborting before the usage check
REMOTE=ai-server
REMOTE_DIR=/opt/my-mcp
NGINX_CONF=/etc/nginx/sites-enabled/my-mcp.conf

if [ -z "$NEW_TAG" ]; then
  echo "Usage: $0 <new-tag>"; exit 1
fi

ssh "$REMOTE" "set -euo pipefail; cd $REMOTE_DIR
  # 1. Find which color is currently active
  ACTIVE=\$(grep -E '^[[:space:]]*server 127.0.0.1:300[12];' $NGINX_CONF | head -1 | grep -oE '300[12]')  # anchored so commented-out lines don't match
  if [ \"\$ACTIVE\" = '3001' ]; then ACTIVE_COLOR=blue; INACTIVE_COLOR=green; INACTIVE_PORT=3002
  else ACTIVE_COLOR=green; INACTIVE_COLOR=blue; INACTIVE_PORT=3001; fi
  echo \"Active: \$ACTIVE_COLOR. Deploying $NEW_TAG to \$INACTIVE_COLOR.\"

  # 2. Pull / build the new image, deploy to the inactive slot
  if [ \"\$INACTIVE_COLOR\" = 'green' ]; then
    GREEN_TAG=$NEW_TAG docker compose up -d --force-recreate my-mcp-green
  else
    BLUE_TAG=$NEW_TAG docker compose up -d --force-recreate my-mcp-blue
  fi

  # 3. Wait for the inactive container to be healthy
  for i in \$(seq 1 30); do
    STATUS=\$(docker inspect my-mcp-\$INACTIVE_COLOR --format '{{.State.Health.Status}}')
    [ \"\$STATUS\" = 'healthy' ] && break
    sleep 2
  done
  if [ \"\$STATUS\" != 'healthy' ]; then
    echo \"FAIL: \$INACTIVE_COLOR did not become healthy. Aborting.\"
    exit 1
  fi

  # 4. Smoke-test the inactive port directly
  curl -fsS http://localhost:\$INACTIVE_PORT/health > /dev/null

  # 5. Atomically switch nginx upstream
  sed -i \"s|server 127.0.0.1:\$ACTIVE.*;|# server 127.0.0.1:\$ACTIVE; (was active)|\" $NGINX_CONF
  sed -i \"s|# server 127.0.0.1:\$INACTIVE_PORT.*|server 127.0.0.1:\$INACTIVE_PORT;|\" $NGINX_CONF
  nginx -t
  systemctl reload nginx
  echo 'Switched to '\$INACTIVE_COLOR'. In-flight requests draining on '\$ACTIVE_COLOR' during soak.'
"

# 6. Soak, let the old one keep handling in-flight requests for 5 minutes
echo "Soaking for 5 minutes, monitor https://your-mcp.io/health and external user behavior."
sleep 300

# 7. (Optional) Stop the old container. Re-detect the old color from the nginx
#    conf: $ACTIVE_COLOR from step 1 lived inside the first ssh session and is
#    not available here.
read -p "Stop the old container? [y/N] " ans
if [ "$ans" = "y" ]; then
  ssh "$REMOTE" "ACTIVE=\$(grep -E '^[[:space:]]*server 127.0.0.1:300[12];' $NGINX_CONF | grep -oE '300[12]')
    if [ \"\$ACTIVE\" = '3001' ]; then docker stop my-mcp-green || true
    else docker stop my-mcp-blue || true; fi"
fi

What it does:

  1. SSH in (one connection, see 8.2 about SSH batching).
  2. Detect which color is active via the nginx config.
  3. Deploy new tag to the inactive color.
  4. Wait up to 60s for the new container to be healthy.
  5. Smoke-test the new port directly.
  6. Atomically swap nginx upstream + reload.
  7. Soak, old container keeps handling in-flight requests, new container takes new traffic.
  8. After 5 minutes (configurable), prompt to stop the old one.

Step 5: Rollback is also one command

If the new color misbehaves during soak:

#!/bin/bash
# bg-rollback.sh, swap the nginx upstream back to the previous color
set -euo pipefail

REMOTE=ai-server
NGINX_CONF=/etc/nginx/sites-enabled/my-mcp.conf

ssh "$REMOTE" "set -euo pipefail
  # Comment out the active server line, un-comment the other one.
  # GNU sed: 't' skips the second substitution on lines the first already changed.
  sed -i -E -e 's|^([[:space:]]*)server (127.0.0.1:300[12];.*)|\1# server \2|' \
            -e 't' \
            -e 's|^([[:space:]]*)# server (127.0.0.1:300[12];.*)|\1server \2|' $NGINX_CONF
  nginx -t && systemctl reload nginx
"

Rollback is just an nginx reload, ~50ms. It's fast because the old container is still running.

Step 6: When to actually kill the old one

After at least 5 minutes of clean operation on the new color, with:

  • No 5xx spike in nginx access logs.
  • No errors logged by the app.
  • No customer reports.
  • External monitor (8.4 Uptime Kuma) green for the full 5 minutes.

Then docker stop the old. The slot is free for the next deploy.

Step 7: Verify

Run academy_validate_step. The validator confirms package.json plumbing.

For the actual blue-green setup:

# 1. Both containers can run simultaneously
docker ps --filter name=my-mcp
# → my-mcp-blue   Up X (healthy)   3001
# → my-mcp-green  Up Y (healthy)   3002

# 2. Both /health endpoints reachable directly
curl -s http://localhost:3001/health | jq .version
curl -s http://localhost:3002/health | jq .version
# → different versions during a deploy

# 3. Public URL only sees one
curl -s https://your-mcp.io/health | jq .version
# → matches whichever color is active

Common traps

  • Same container_name for both colors. Docker can't run two containers with the same name. Use -blue / -green suffix.
  • Same PORT in env, both bind to 3000, second container fails. Hard-code different ports per service.
  • Hard switch without soak, kills in-flight requests. Always reload nginx (graceful) and let the old one drain.
  • Forgetting to update both BLUE_TAG and GREEN_TAG defaults, first deploy uses latest for both, can't tell them apart.
  • No healthcheck on either color, script can't tell when the new one is ready.
  • Deploying via restart instead of up -d --force-recreate: restart reuses the existing container, so neither new images nor env_file changes take effect.
  • Running the deploy script from cron, interactive read -p will hang. Add a --non-interactive flag with auto-stop after a fixed soak time.

What good looks like

Two containers run side-by-side. Deploy is one command (./bg-deploy.sh v1.2.3). 60s for new container to be healthy + 5min soak before old is killed = ~7 minutes total, but zero downtime. Rollback is also one command. nginx reload is atomic.

For low-stakes SaaS, single-container with --force-recreate is fine, keep it simple. For revenue-impacting deploys, blue-green pays for itself the first time you avoid an outage.

Client check · run on your machine
cat package.json 2>/dev/null | python3 -c "import json,sys; p=json.load(sys.stdin); deps=list((p.get(\"dependencies\") or {}).keys()); print(\"sdk:\", \"@modelcontextprotocol/sdk\" in deps); print(\"bin:\", bool(p.get(\"bin\"))); print(\"main:\", bool(p.get(\"main\")))" 2>/dev/null || echo "no package.json in cwd"
Expected: sdk: True, plus either bin or main is True.
If stuck: Run `npm init -y && npm install @modelcontextprotocol/sdk zod`, then add `"bin": { "your-server": "./dist/server.js" }` to package.json.