Payment Service Deployment Runbook

Version: 1.0
Last Updated: 2025-11-13
Service: payment-service
Maintainer: Backend Team (@vova-appdev)


Table of Contents

  1. Quick Reference
  2. Pre-Deployment Checklist
  3. Staging Deployment
  4. Production Deployment
  5. Post-Deployment Verification
  6. Rollback Procedures
  7. Troubleshooting
  8. Emergency Contacts

Quick Reference

Service URLs

Key Repositories

Deployment Methods

  • Staging: Automatic on push to main
  • Production: Manual via GitHub Actions workflow_dispatch

Pre-Deployment Checklist

Code Readiness

  • All tests passing (bun test)
  • Linting checks pass (bun run biome check)
  • Type checking passes (bun run tsc --noEmit)
  • Security scan clean (Trivy)
  • PR approved and merged to main
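
The local checks in this list can be run in one pass before a PR is opened. The following is a minimal sketch, not part of the CI workflow: `run_checks` is a hypothetical helper name, and it assumes the commands are run from the service directory with `bun` on the PATH.

```shell
# Run each readiness check in order and stop at the first failure.
# Each argument is one command line, e.g. "bun test".
run_checks() {
  for check in "$@"; do
    printf '==> %s\n' "$check"
    if ! sh -c "$check"; then
      printf 'FAILED: %s\n' "$check" >&2
      return 1
    fi
  done
  printf 'All checks passed\n'
}

# Usage (from the service directory):
# run_checks "bun test" "bun run biome check" "bun run tsc --noEmit"
```

Stopping at the first failure keeps the output short; the Trivy scan and PR approval are left to CI and review.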

Infrastructure Readiness

  • GCP Artifact Registry crop-services exists
  • Cloud Run service exists (or create with first deploy)
  • MongoDB connection string valid
  • All secrets configured in Secret Manager:
    • MONGODB_URI_STAGING / MONGODB_URI_PROD
    • STRIPE_SECRET_KEY_STAGING / STRIPE_SECRET_KEY_PROD
    • STRIPE_WEBHOOK_SECRET_STAGING / STRIPE_WEBHOOK_SECRET_PROD
    • CLERK_WEBHOOK_SECRET_STAGING / CLERK_WEBHOOK_SECRET_PROD
  • WIF (Workload Identity Federation) configured for GitHub Actions
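
The secret names above follow a fixed pattern per environment, so their presence can be checked in a loop. This is a sketch under assumptions: `expected_secrets` and `check_secrets` are hypothetical helper names, and `check_secrets` needs an authenticated `gcloud` with the correct project selected.

```shell
# Print the Secret Manager names the checklist expects for one environment.
# $1 is "staging" or "production"; the suffixes mirror the list above.
expected_secrets() {
  case "$1" in
    staging)    suffix=STAGING ;;
    production) suffix=PROD ;;
    *) printf 'unknown environment: %s\n' "$1" >&2; return 1 ;;
  esac
  for base in MONGODB_URI STRIPE_SECRET_KEY STRIPE_WEBHOOK_SECRET CLERK_WEBHOOK_SECRET; do
    printf '%s_%s\n' "$base" "$suffix"
  done
}

# Report every expected secret that Secret Manager does not know about.
# Returns non-zero if at least one is missing.
check_secrets() {
  names=$(expected_secrets "$1") || return 1
  missing=0
  for name in $names; do
    if ! gcloud secrets describe "$name" >/dev/null 2>&1; then
      printf 'missing secret: %s\n' "$name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# check_secrets staging && echo "all staging secrets present"
```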

Clerk Configuration

  • Webhook endpoint URL updated in Clerk Dashboard
  • JWT template api configured
  • Satellite domains verified
  • Test webhook successful

Staging Deployment

Automatic Deployment

Staging deploys automatically when changes are pushed to the main branch.

Trigger:

git push origin main

Monitor:

  1. Go to GitHub Actions: https://github.com/your-org/microservices/actions
  2. Find workflow: "Payment Service Deploy"
  3. Watch build-and-push and deploy-staging jobs

Expected Duration: 5-8 minutes

Manual Staging Deployment

If automatic deployment fails or you need to redeploy:

  1. Go to GitHub Actions → "Payment Service Deploy"
  2. Click "Run workflow"
  3. Select:
    • Branch: main
    • Environment: staging
  4. Click "Run workflow"

Verify Staging Deployment

# Health check
curl https://payment-service-staging.crop.com/health

# Expected response:
{
  "status": "healthy",
  "service": "payment",
  "mongodb": "connected",
  "env": "ok"
}

# Test Clerk webhook (from Clerk Dashboard)
# 1. Go to Webhooks → Test Event
# 2. Select: user.created
# 3. Send
# 4. Verify 200 OK response

# Check logs
gcloud run services logs read payment-service-staging \
  --region=us-central1 \
  --limit=50
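
Comparing the health response by eye is easy to get wrong under pressure. The sketch below turns the expected response into a pass/fail check. `health_ok` is a hypothetical helper name, and it relies on `grep` rather than a JSON parser, so it assumes the flat response shape shown above.

```shell
# Read a /health JSON body on stdin. Succeed only when the service reports
# "status": "healthy" and "mongodb": "connected".
health_ok() {
  body=$(cat)
  printf '%s' "$body" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"' &&
    printf '%s' "$body" | grep -Eq '"mongodb"[[:space:]]*:[[:space:]]*"connected"'
}

# Usage:
# curl -fsS https://payment-service-staging.crop.com/health | health_ok \
#   && echo "staging healthy" || echo "staging NOT healthy"
```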

Production Deployment

Pre-Production Checklist

  • Staging deployment successful for at least 24 hours
  • No critical bugs reported in staging
  • All stakeholders notified
  • Maintenance window scheduled (if required)
  • Rollback plan ready

Deploy to Production

  1. Navigate to GitHub Actions

  2. Run Workflow

    • Click "Run workflow" (top right)
    • Select:
      • Branch: main
      • Environment: production
    • Click "Run workflow"
  3. Approve Deployment (if required)

    • GitHub will pause before production deploy
    • Review deployment details
    • Click "Approve" to proceed
  4. Monitor Deployment

    • Watch workflow progress
    • Expected duration: 8-12 minutes
    • Check for any errors in logs

Production Deployment Steps (Manual Alternative)

If GitHub Actions is unavailable, use gcloud CLI:

# Authenticate
gcloud auth login

# Set project
gcloud config set project crop-platform

# Deploy
gcloud run deploy payment-service \
  --image=us-central1-docker.pkg.dev/crop-platform/crop-services/payment-service:COMMIT_SHA \
  --region=us-central1 \
  --platform=managed \
  --allow-unauthenticated \
  --set-env-vars="NODE_ENV=production" \
  --set-secrets="MONGODB_URI=MONGODB_URI_PROD:latest,STRIPE_SECRET_KEY=STRIPE_SECRET_KEY_PROD:latest,STRIPE_WEBHOOK_SECRET=STRIPE_WEBHOOK_SECRET_PROD:latest,CLERK_WEBHOOK_SECRET=CLERK_WEBHOOK_SECRET_PROD:latest" \
  --min-instances=1 \
  --max-instances=100 \
  --memory=1Gi \
  --cpu=2 \
  --timeout=60s \
  --concurrency=80

# Verify deployment
curl https://api.crop.com/health
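
Pasting the literal placeholder `COMMIT_SHA` into the deploy command is an easy mistake during a manual deploy. The following sketch builds the image reference from a validated SHA. `image_ref` is a hypothetical helper name, and the check for 7 to 40 lowercase hex characters is an assumption about how images are tagged.

```shell
# Print the full image reference for a commit SHA, using the registry path
# from the deploy command above. Rejects anything that is not 7-40 hex chars.
image_ref() {
  sha=$1
  if ! printf '%s' "$sha" | grep -Eq '^[0-9a-f]{7,40}$'; then
    printf 'not a commit SHA: %s\n' "$sha" >&2
    return 1
  fi
  printf 'us-central1-docker.pkg.dev/crop-platform/crop-services/payment-service:%s\n' "$sha"
}

# Usage:
# IMAGE=$(image_ref "$(git rev-parse HEAD)") || exit 1
# gcloud run deploy payment-service --image="$IMAGE" --region=us-central1 ...
```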

Post-Deployment Verification

Automated Checks

The deployment workflow automatically runs:

  • Health check (/health endpoint)
  • Readiness check (/ready endpoint)
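
When these checks are repeated by hand, a new revision may need a few seconds before it answers. The workflow's own implementation is not shown in this runbook; the following is only a sketch of a polling loop, with `retry` as a hypothetical helper name.

```shell
# Retry a command until it succeeds or the attempts run out.
# $1 = max attempts, $2 = seconds to wait between attempts, rest = command.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    printf 'attempt %s/%s failed\n' "$n" "$attempts" >&2
    if [ "$n" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    n=$((n + 1))
  done
  return 1
}

# Usage: poll /health and /ready for up to about one minute each.
# retry 10 6 curl -fsS -o /dev/null https://api.crop.com/health
# retry 10 6 curl -fsS -o /dev/null https://api.crop.com/ready
```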

Manual Verification

1. Health & Readiness

# Health check
curl https://api.crop.com/health
# Expected: {"status":"healthy","mongodb":"connected","env":"ok"}

# Readiness check
curl https://api.crop.com/ready
# Expected: {"status":"ready","service":"payment"}

2. MongoDB Connection

# Check logs for MongoDB connection
gcloud run services logs read payment-service \
  --region=us-central1 \
  --limit=10 | grep "MongoDB"

# Should show: "MongoDB connected successfully"

3. Stripe Webhook

  1. Create test payment intent in Stripe Dashboard
  2. Verify webhook received (check logs)
  3. Verify payment intent saved to MongoDB

# Check webhook logs
gcloud run services logs read payment-service \
  --region=us-central1 \
  --limit=50 | grep "Stripe webhook"

4. Clerk Webhook

  1. Go to Clerk Dashboard → Webhooks
  2. Send test event: user.created
  3. Verify 200 OK response
  4. Check MongoDB for user record

# Check Clerk webhook logs
gcloud run services logs read payment-service \
  --region=us-central1 \
  --limit=50 | grep "Clerk webhook"

5. API Endpoints

# Test checkout endpoint (requires auth token)
curl -X POST https://api.crop.com/api/checkout/sessions \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "amount": 5000,
    "currency": "usd",
    "successUrl": "https://app.crop.com/success",
    "cancelUrl": "https://app.crop.com/cancel"
  }'

6. Monitoring Dashboards

7. Error Rate Check

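# Count 5xx responses in the last hour. Cloud Run request logs carry the
# status in the top-level httpRequest field, not under jsonPayload.
gcloud logging read \
  'resource.type="cloud_run_revision" AND resource.labels.service_name="payment-service" AND resource.labels.location="us-central1" AND httpRequest.status>=500' \
  --freshness=1h \
  --format="value(httpRequest.status)" | \
  wc -l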

# Should be 0 or very low
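
A bare count is hard to judge without knowing the traffic volume, so "very low" is easier to assess next to the total. The sketch below summarizes a stream of status codes. `summarize_statuses` is a hypothetical helper name, and it expects one numeric status per line on stdin.

```shell
# Read one HTTP status code per line and print "<5xx count> <total requests>".
# Lines that are not three-digit codes are ignored.
summarize_statuses() {
  awk '
    /^[0-9][0-9][0-9]$/ { total++; if ($0 ~ /^5/) errors++ }
    END { printf "%d %d\n", errors, total }
  '
}

# Usage: feed it the status column for ALL requests in the window (drop any
# 5xx-only filter and the final count stage from the query above).
```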

Rollback Procedures

Scenario 1: Recent Deployment (< 1 hour)

Use Cloud Run Revisions (fastest method)

# List recent revisions
gcloud run revisions list \
  --service=payment-service \
  --region=us-central1 \
  --limit=5

# Rollback to previous revision
gcloud run services update-traffic payment-service \
  --region=us-central1 \
  --to-revisions=PREVIOUS_REVISION=100

# Example:
gcloud run services update-traffic payment-service \
  --region=us-central1 \
  --to-revisions=payment-service-00042=100
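
Copying the revision name by hand can be avoided by picking it out of the list output. This is a sketch only: `previous_revision` is a hypothetical helper, it assumes the list is sorted newest first, and it assumes the revision just before the newest one was the last good one. Confirm the name before shifting traffic.

```shell
# Given revision names on stdin, newest first, print the second one.
# Fails (non-zero exit) if fewer than two revisions are listed.
previous_revision() {
  awk '
    NF { count++; if (count == 2) { print $1; found = 1; exit } }
    END { if (!found) exit 1 }
  '
}

# Usage (the --sort-by and --format values are assumptions about the
# revision resource fields; check them against your gcloud output):
# PREV=$(gcloud run revisions list --service=payment-service --region=us-central1 \
#   --sort-by='~metadata.creationTimestamp' --format='value(metadata.name)' \
#   | previous_revision) || exit 1
# gcloud run services update-traffic payment-service \
#   --region=us-central1 --to-revisions="$PREV=100"
```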

Scenario 2: Rollback to Specific Version

Redeploy Previous Image

# Find previous working image
gcloud artifacts docker images list \
  us-central1-docker.pkg.dev/crop-platform/crop-services/payment-service \
  --limit=10

# Redeploy specific image
gcloud run deploy payment-service \
  --image=us-central1-docker.pkg.dev/crop-platform/crop-services/payment-service:PREVIOUS_SHA \
  --region=us-central1

Scenario 3: Critical Failure (Emergency)

Scale to Zero (Immediate)

# Stop all traffic immediately
gcloud run services update payment-service \
  --region=us-central1 \
  --min-instances=0 \
  --max-instances=0

# Investigate issue
# Deploy fix
# Restore scaling:
gcloud run services update payment-service \
  --region=us-central1 \
  --min-instances=1 \
  --max-instances=100

Scenario 4: Rollback via GitHub Actions

  1. Identify the commit that introduced the problem in git history
  2. Create a new commit that reverts it:
    git revert COMMIT_SHA
    git push origin main
  3. Automatic staging deployment will trigger
  4. Manually deploy to production via workflow_dispatch

Post-Rollback Verification

After rollback, verify:

  • Health check returns 200 OK
  • Error rate returned to baseline
  • MongoDB connection stable
  • Webhooks functioning
  • No customer impact reported

Troubleshooting

Deployment Fails with "Image not found"

Cause: Docker image push failed or wrong image reference

Fix:

  1. Check build-and-push job logs in GitHub Actions
  2. Verify image exists:
    gcloud artifacts docker images list \
      us-central1-docker.pkg.dev/crop-platform/crop-services/payment-service
  3. Manually rebuild and push if needed

Health Check Returns 503

Cause: MongoDB connection failed or environment variables missing

Fix:

  1. Check logs:
    gcloud run services logs read payment-service --region=us-central1 --limit=50
  2. Verify environment variables:
    gcloud run services describe payment-service --region=us-central1 --format="yaml(spec.template.spec.containers[0].env)"
  3. Verify the secret exists and has an enabled version (this does not print the value):
    gcloud secrets versions list MONGODB_URI_PROD

Webhook Returns 401 Unauthorized

Cause: Invalid webhook secret

Fix:

  1. Verify CLERK_WEBHOOK_SECRET or STRIPE_WEBHOOK_SECRET in Secret Manager
  2. Check Clerk/Stripe Dashboard for correct signing secret
  3. Update secret and redeploy:
    printf '%s' "whsec_new_secret" | gcloud secrets versions add CLERK_WEBHOOK_SECRET_PROD --data-file=-
    gcloud run services update payment-service --region=us-central1 --update-secrets="CLERK_WEBHOOK_SECRET=CLERK_WEBHOOK_SECRET_PROD:latest"
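
A trailing newline stored with the secret changes the signing key and produces the same 401 symptom, so it is worth checking the byte count before adding a version. A small sketch follows, with `byte_len` as a hypothetical helper name.

```shell
# Print the number of bytes read from stdin.
byte_len() {
  n=$(wc -c | tr -d '[:space:]')
  printf '%s\n' "$n"
}

# printf '%s' writes the value exactly; echo appends a newline byte.
# printf '%s' "whsec_new_secret" | byte_len   # prints 16
# echo "whsec_new_secret" | byte_len          # prints 17
```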

High Error Rate After Deployment

Immediate Actions:

  1. Check Cloud Monitoring for error types
  2. Review logs for common error messages
  3. If widespread: Rollback immediately (see Rollback Procedures)
  4. If isolated: Investigate specific errors

Investigation:

# Check error types (last hour, most recent 20 entries)
gcloud logging read \
  'resource.type="cloud_run_revision" AND resource.labels.service_name="payment-service" AND severity>=ERROR' \
  --freshness=1h \
  --limit=20 \
  --format="value(jsonPayload.message)"
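
To tell a widespread failure from an isolated one, it helps to group identical messages and rank them by frequency. The following sketch does that for any stream with one message per line; `top_messages` is a hypothetical helper name.

```shell
# Read one log message per line and print the N most frequent ones as
# "<count><TAB><message>", most frequent first. N defaults to 10.
top_messages() {
  awk 'NF { count[$0]++ } END { for (m in count) printf "%d\t%s\n", count[m], m }' |
    sort -t "$(printf '\t')" -k1,1nr |
    head -n "${1:-10}"
}

# Usage: pipe the message column from the log query above into it, e.g.
# ... --format="value(jsonPayload.message)" | top_messages 5
```

One message dominating the list points at a single root cause; a long tail of unrelated messages suggests a broader problem such as a bad configuration.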

MongoDB Connection Timeout

Cause: Network issue or MongoDB Atlas firewall

Fix:

  1. Check MongoDB Atlas network access settings
  2. Verify Cloud Run egress IP allowlisted
  3. Test connection from Cloud Shell:
    mongosh "mongodb+srv://cluster.mongodb.net/payment-service" --username <user>

Deployment Stuck in "Deploying" State

Cause: Cloud Run waiting for readiness probe

Fix:

  1. Check service logs for startup errors
  2. Verify /ready endpoint responds quickly
  3. Inspect the revision that failed to become ready (traffic stays on the last ready revision in the meantime):
    gcloud run revisions list --service=payment-service --region=us-central1
    gcloud run revisions describe REVISION_NAME --region=us-central1

Emergency Contacts

Escalation Path

  1. On-Call Engineer: Check PagerDuty rotation
  2. Backend Team Lead: @vova-appdev
  3. DevOps Team: devops@crop.com
  4. CTO: cto@crop.com

External Services

Incident Response

For production incidents:

  1. Page on-call engineer via PagerDuty
  2. Create incident in incident management tool
  3. Update status page: status.crop.com
  4. Join incident bridge: Check incident channel
  5. Follow incident runbook: See INCIDENT_RESPONSE.md

Appendix

Useful Commands

# View current Cloud Run configuration
gcloud run services describe payment-service --region=us-central1

# Stream logs in real-time
gcloud run services logs tail payment-service --region=us-central1

# Check service metrics
gcloud monitoring time-series list \
  --filter='metric.type="run.googleapis.com/request_count"'

# List all revisions with traffic split
gcloud run services describe payment-service --region=us-central1 --format="yaml(status.traffic)"

# Add or update one environment variable
# (--set-env-vars would replace the entire set and drop the others)
gcloud run services update payment-service \
  --region=us-central1 \
  --update-env-vars="NEW_VAR=value"

# Update secret reference
gcloud run services update payment-service \
  --region=us-central1 \
  --update-secrets="MONGODB_URI=MONGODB_URI_PROD:latest"

Git Tags for Rollback

All phase completions are tagged:

  • phase-1-complete - Initial code implementation
  • phases-2-4-complete - Tests, CI/CD, documentation

To roll back to a tag:

git checkout -b rollback-branch phase-1-complete
git push origin rollback-branch
# Deploy rollback-branch via GitHub Actions

Document Version: 1.0
Last Review: 2025-11-13
Next Review: 2025-12-13
