Payment Service Deployment Runbook
Version: 1.0 Last Updated: 2025-11-13 Service: payment-service Maintainer: Backend Team (@vova-appdev)
Table of Contents
- Quick Reference
- Pre-Deployment Checklist
- Staging Deployment
- Production Deployment
- Post-Deployment Verification
- Rollback Procedures
- Troubleshooting
- Emergency Contacts
Quick Reference
Service URLs
- Staging: https://payment-service-staging.crop.com
- Production: https://api.crop.com
Key Repositories
- Code: https://github.com/your-org/microservices
- Images: us-central1-docker.pkg.dev/crop-platform/crop-services/payment-service
Deployment Methods
- Staging: Automatic on push to main
- Production: Manual via GitHub Actions workflow_dispatch
Pre-Deployment Checklist
Code Readiness
- All tests passing (bun test)
- Linting checks pass (bun run biome check)
- Type checking passes (bun run tsc --noEmit)
- Security scan clean (Trivy)
- PR approved and merged to main
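The code-readiness checks can be run in order by a small script that stops at the first failure. Below is a minimal sketch for bun or Node. It covers only the three commands listed above. The Trivy invocation is not specified in this runbook, so it is left out.

```typescript
import { spawnSync } from "node:child_process";

// One checklist item: a label plus the command that verifies it.
type Check = { name: string; command: string; args: string[] };

// The three commands named in the Code Readiness checklist.
const DEFAULT_CHECKS: Check[] = [
  { name: "tests", command: "bun", args: ["test"] },
  { name: "lint", command: "bun", args: ["run", "biome", "check"] },
  { name: "types", command: "bun", args: ["run", "tsc", "--noEmit"] },
];

// Runs each check in order, streaming its output to the terminal, and returns
// the name of the first one that fails (non-zero exit, killed by a signal, or
// command not found). Returns null when every check exits with status 0.
function runChecks(checks: Check[] = DEFAULT_CHECKS): string | null {
  for (const check of checks) {
    const result = spawnSync(check.command, check.args, { stdio: "inherit" });
    if (result.status !== 0) {
      return check.name;
    }
  }
  return null;
}
```

A pre-deploy script could call runChecks() and exit with a non-zero code when the result is not null. Keeping the commands as data makes it easy to append the team's actual Trivy command later.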
Infrastructure Readiness
- GCP Artifact Registry repository crop-services exists
- Cloud Run service exists (or is created by the first deploy)
- MongoDB connection string valid
- All secrets configured in Secret Manager:
  - MONGODB_URI_STAGING / MONGODB_URI_PROD
  - STRIPE_SECRET_KEY_STAGING / STRIPE_SECRET_KEY_PROD
  - STRIPE_WEBHOOK_SECRET_STAGING / STRIPE_WEBHOOK_SECRET_PROD
  - CLERK_WEBHOOK_SECRET_STAGING / CLERK_WEBHOOK_SECRET_PROD
- WIF (Workload Identity Federation) configured for GitHub Actions
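The secret names above follow a BASE_STAGING / BASE_PROD pattern, so the "all secrets configured" item can be checked mechanically. This is a sketch that assumes the input is the list of names printed by gcloud secrets list --format="value(name)". It accepts either short names or full projects/.../secrets/NAME paths, because the exact output form is not stated here.

```typescript
// Secret base names from the checklist; the environment suffix is appended.
const SECRET_BASES = [
  "MONGODB_URI",
  "STRIPE_SECRET_KEY",
  "STRIPE_WEBHOOK_SECRET",
  "CLERK_WEBHOOK_SECRET",
] as const;

type Env = "STAGING" | "PROD";

// Full set of secret names the service expects for one environment.
function expectedSecrets(env: Env): string[] {
  return SECRET_BASES.map((base) => `${base}_${env}`);
}

// Returns expected names absent from `listed`. Entries may be short names or
// resource paths like "projects/123/secrets/MONGODB_URI_PROD"; only the last
// path segment is compared.
function missingSecrets(env: Env, listed: string[]): string[] {
  const present = new Set(listed.map((name) => name.trim().split("/").pop() ?? ""));
  return expectedSecrets(env).filter((name) => !present.has(name));
}
```

An empty result for both "STAGING" and "PROD" means every secret referenced by the deploy commands exists. The check does not say anything about whether the values are correct.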
Clerk Configuration
- Webhook endpoint URL updated in Clerk Dashboard
- JWT template api configured
- Satellite domains verified
- Test webhook successful
Staging Deployment
Automatic Deployment
Staging deploys automatically when changes are pushed to the main branch.
Trigger:
git push origin main
Monitor:
- Go to GitHub Actions: https://github.com/your-org/microservices/actions
- Find workflow: "Payment Service Deploy"
- Watch build-and-push and deploy-staging jobs
Expected Duration: 5-8 minutes
Manual Staging Deployment
If automatic deployment fails or you need to redeploy:
- Go to GitHub Actions → "Payment Service Deploy"
- Click "Run workflow"
- Select:
  - Branch: main
  - Environment: staging
- Click "Run workflow"
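The "Run workflow" button calls GitHub's REST endpoint for workflow_dispatch, so the same run can be queued from a script. This is a sketch under several assumptions. The workflow file name payment-service-deploy.yml and the input name environment are hypothetical, so check .github/workflows/ for the real ones. The token must be allowed to dispatch workflows in your-org/microservices.

```typescript
// Everything needed for one POST to the workflow_dispatch endpoint.
type DispatchRequest = {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
};

function buildDispatch(
  owner: string,
  repo: string,
  workflowFile: string, // hypothetical, e.g. "payment-service-deploy.yml"
  ref: string,
  environment: "staging" | "production",
  token: string,
): DispatchRequest {
  return {
    url: `https://api.github.com/repos/${owner}/${repo}/actions/workflows/${workflowFile}/dispatches`,
    method: "POST",
    headers: {
      Accept: "application/vnd.github+json",
      Authorization: `Bearer ${token}`,
      "X-GitHub-Api-Version": "2022-11-28",
    },
    // "environment" is an assumed input name for the workflow_dispatch trigger
    body: JSON.stringify({ ref, inputs: { environment } }),
  };
}

// GitHub answers 204 No Content once the run has been queued.
async function dispatch(req: DispatchRequest): Promise<void> {
  const res = await fetch(req.url, { method: req.method, headers: req.headers, body: req.body });
  if (res.status !== 204) {
    throw new Error(`workflow dispatch failed: ${res.status} ${await res.text()}`);
  }
}
```

Usage would be dispatch(buildDispatch("your-org", "microservices", "payment-service-deploy.yml", "main", "staging", token)). A production dispatch queued this way still waits for any required approval.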
Verify Staging Deployment
# Health check
curl https://payment-service-staging.crop.com/health
# Expected response:
{
"status": "healthy",
"service": "payment",
"mongodb": "connected",
"env": "ok"
}
# Test Clerk webhook (from Clerk Dashboard)
# 1. Go to Webhooks → Test Event
# 2. Select: user.created
# 3. Send
# 4. Verify 200 OK response
# Check logs
gcloud run services logs read payment-service-staging \
--region=us-central1 \
--limit=50
Production Deployment
Pre-Production Checklist
- Staging deployment successful for at least 24 hours
- No critical bugs reported in staging
- All stakeholders notified
- Maintenance window scheduled (if required)
- Rollback plan ready
Deploy to Production
1. Navigate to GitHub Actions
   - Go to: https://github.com/your-org/microservices/actions
   - Select: "Payment Service Deploy"
2. Run Workflow
   - Click "Run workflow" (top right)
   - Select:
     - Branch: main
     - Environment: production
   - Click "Run workflow"
3. Approve Deployment (if required)
   - GitHub pauses before the production deploy
   - Review deployment details
   - Click "Approve" to proceed
4. Monitor Deployment
   - Watch workflow progress
   - Expected duration: 8-12 minutes
   - Check for any errors in logs
Production Deployment Steps (Manual Alternative)
If GitHub Actions is unavailable, use gcloud CLI:
# Authenticate
gcloud auth login
# Set project
gcloud config set project crop-platform
# Deploy
gcloud run deploy payment-service \
--image=us-central1-docker.pkg.dev/crop-platform/crop-services/payment-service:COMMIT_SHA \
--region=us-central1 \
--platform=managed \
--allow-unauthenticated \
--set-env-vars="NODE_ENV=production" \
--set-secrets="MONGODB_URI=MONGODB_URI_PROD:latest,STRIPE_SECRET_KEY=STRIPE_SECRET_KEY_PROD:latest,STRIPE_WEBHOOK_SECRET=STRIPE_WEBHOOK_SECRET_PROD:latest,CLERK_WEBHOOK_SECRET=CLERK_WEBHOOK_SECRET_PROD:latest" \
--min-instances=1 \
--max-instances=100 \
--memory=1Gi \
--cpu=2 \
--timeout=60s \
--concurrency=80
# Verify deployment
curl https://api.crop.com/health
Post-Deployment Verification
Automated Checks
The deployment workflow automatically runs:
- Health check (/health endpoint)
- Readiness check (/ready endpoint)
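This runbook does not show how the workflow implements its check. The sketch below is one possible equivalent gate. It polls /health and /ready and compares fields against the expected responses listed under Manual Verification. The attempt count and delay are arbitrary assumptions.

```typescript
type Probe = { path: string; expect: Record<string, string> };

// Expected field values copied from the Manual Verification responses.
const CHECKS: Probe[] = [
  { path: "/health", expect: { status: "healthy", mongodb: "connected", env: "ok" } },
  { path: "/ready", expect: { status: "ready", service: "payment" } },
];

// Names of expected fields whose value in `body` differs (extra fields are ignored).
function mismatches(body: unknown, expect: Record<string, string>): string[] {
  const obj = (typeof body === "object" && body !== null ? body : {}) as Record<string, unknown>;
  return Object.entries(expect)
    .filter(([key, value]) => obj[key] !== value)
    .map(([key]) => key);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls every probe until all return 2xx with matching fields, or attempts run out.
async function waitUntilHealthy(baseUrl: string, attempts = 10, delayMs = 6000): Promise<boolean> {
  for (let i = 1; i <= attempts; i++) {
    let allOk = true;
    for (const probe of CHECKS) {
      try {
        const res = await fetch(`${baseUrl}${probe.path}`);
        const bad = mismatches(await res.json(), probe.expect);
        if (!res.ok || bad.length > 0) {
          allOk = false;
          console.log(`attempt ${i}: ${probe.path} -> ${res.status}, mismatched: ${bad.join(",")}`);
        }
      } catch (err) {
        // Network errors and non-JSON bodies (e.g. an HTML 503 page) land here.
        allOk = false;
        console.log(`attempt ${i}: ${probe.path} failed: ${String(err)}`);
      }
    }
    if (allOk) return true;
    if (i < attempts) await sleep(delayMs);
  }
  return false;
}
```

For example, waitUntilHealthy("https://api.crop.com") after a production deploy, or the staging URL after a staging deploy. A false result is a signal to consult Rollback Procedures.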
Manual Verification
1. Health & Readiness
# Health check
curl https://api.crop.com/health
# Expected: {"status":"healthy","mongodb":"connected","env":"ok"}
# Readiness check
curl https://api.crop.com/ready
# Expected: {"status":"ready","service":"payment"}
2. MongoDB Connection
# Check logs for MongoDB connection
gcloud run services logs read payment-service \
--region=us-central1 \
--limit=10 | grep "MongoDB"
# Should show: "MongoDB connected successfully"
3. Stripe Webhook
- Create test payment intent in Stripe Dashboard
- Verify webhook received (check logs)
- Verify payment intent saved to MongoDB
# Check webhook logs
gcloud run services logs read payment-service \
--region=us-central1 \
--limit=50 | grep "Stripe webhook"
4. Clerk Webhook
- Go to Clerk Dashboard → Webhooks
- Send test event: user.created
- Verify 200 OK response
- Check MongoDB for user record
# Check Clerk webhook logs
gcloud run services logs read payment-service \
--region=us-central1 \
--limit=50 | grep "Clerk webhook"
5. API Endpoints
# Test checkout endpoint (requires auth token)
curl -X POST https://api.crop.com/api/checkout/sessions \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"amount": 5000,
"currency": "usd",
"successUrl": "https://app.crop.com/success",
"cancelUrl": "https://app.crop.com/cancel"
}'
6. Monitoring Dashboards
- Cloud Monitoring: https://console.cloud.google.com/monitoring
- Cloud Run Metrics: Check request count, latency, error rate
- MongoDB Atlas: Check connection count, query performance
7. Error Rate Check
# Count 5xx responses in the last hour (request logs carry the status in httpRequest.status)
gcloud logging read \
'resource.type="cloud_run_revision" AND resource.labels.service_name="payment-service" AND httpRequest.status>=500' \
--freshness=1h \
--format="value(httpRequest.status)" | wc -l
# Should be 0 or very low
Rollback Procedures
Scenario 1: Recent Deployment (< 1 hour)
Use Cloud Run Revisions (fastest method)
# List recent revisions
gcloud run revisions list \
--service=payment-service \
--region=us-central1 \
--limit=5
# Rollback to previous revision
gcloud run services update-traffic payment-service \
--region=us-central1 \
--to-revisions=PREVIOUS_REVISION=100
# Example:
gcloud run services update-traffic payment-service \
--region=us-central1 \
--to-revisions=payment-service-00042=100
Scenario 2: Rollback to Specific Version
Redeploy Previous Image
# Find previous working image (tags show the commit SHA)
gcloud artifacts docker images list \
us-central1-docker.pkg.dev/crop-platform/crop-services/payment-service \
--include-tags \
--limit=10
# Redeploy specific image
gcloud run deploy payment-service \
--image=us-central1-docker.pkg.dev/crop-platform/crop-services/payment-service:PREVIOUS_SHA \
--region=us-central1
Scenario 3: Critical Failure (Emergency)
Block All Traffic (Immediate)
# Stop all external traffic immediately (internet and load-balancer requests are rejected)
gcloud run services update payment-service \
--region=us-central1 \
--ingress=internal
# Investigate issue
# Deploy fix
# Restore traffic:
gcloud run services update payment-service \
--region=us-central1 \
--ingress=all
Scenario 4: Rollback via GitHub Actions
- Find the SHA of the commit that introduced the problem in git history
- Create a new commit that reverts it (add -m 1 if COMMIT_SHA is a merge commit):
  git revert COMMIT_SHA
  git push origin main
- Automatic staging deployment will trigger
- Manually deploy to production via workflow_dispatch
Post-Rollback Verification
After rollback, verify:
- Health check returns 200 OK
- Error rate returned to baseline
- MongoDB connection stable
- Webhooks functioning
- No customer impact reported
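"Error rate returned to baseline" can be made concrete. Compare the 5xx rate for a window after the rollback with a comparable window before the bad deploy. This sketch assumes the status codes have been exported from Cloud Run request logs (field httpRequest.status). The 0.5-percentage-point tolerance is an arbitrary placeholder to tune.

```typescript
// Fraction of responses whose status code is in the 5xx range.
function errorRate(statuses: number[]): number {
  if (statuses.length === 0) return 0;
  const errors = statuses.filter((s) => s >= 500 && s <= 599).length;
  return errors / statuses.length;
}

// True when the current 5xx rate is at most `tolerance` (an absolute fraction,
// 0.005 = 0.5 percentage points) above the baseline rate.
function backToBaseline(current: number[], baseline: number[], tolerance = 0.005): boolean {
  return errorRate(current) <= errorRate(baseline) + tolerance;
}
```

Using an absolute tolerance keeps the comparison meaningful when the baseline rate is zero, where a relative threshold would reject any single error.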
Troubleshooting
Deployment Fails with "Image not found"
Cause: Docker image push failed or wrong image reference
Fix:
- Check build-and-push job logs in GitHub Actions
- Verify image exists:
  gcloud artifacts docker images list us-central1-docker.pkg.dev/crop-platform/crop-services/payment-service --include-tags
- Manually rebuild and push if needed
Health Check Returns 503
Cause: MongoDB connection failed or environment variables missing
Fix:
- Check logs:
  gcloud run services logs read payment-service --region=us-central1 --limit=50
- Verify environment variables:
  gcloud run services describe payment-service --region=us-central1 --format="yaml(spec.template.spec.containers[0].env)"
- Verify secrets exist (lists versions without printing the secret value):
  gcloud secrets versions list MONGODB_URI_PROD
Webhook Returns 401 Unauthorized
Cause: Invalid webhook secret
Fix:
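- Verify the Secret Manager secrets behind CLERK_WEBHOOK_SECRET or STRIPE_WEBHOOK_SECRET (CLERK_WEBHOOK_SECRET_PROD / STRIPE_WEBHOOK_SECRET_PROD)
- Check the Clerk/Stripe Dashboard for the correct signing secret
- Update the secret and redeploy (printf avoids storing a trailing newline; changing the arbitrary SECRETS_ROTATED_AT variable only forces a new revision so instances load the :latest version):
  printf '%s' "whsec_new_secret" | gcloud secrets versions add CLERK_WEBHOOK_SECRET_PROD --data-file=-
  gcloud run services update payment-service --region=us-central1 --update-env-vars="SECRETS_ROTATED_AT=$(date -u +%Y%m%dT%H%M%SZ)"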
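A 401 can persist even with the right secret. Both signing schemes are computed over the exact raw request body, so a body that was parsed and re-serialized before verification will not match. To check a captured request offline, the sketch below reimplements the publicly documented schemes: Stripe's Stripe-Signature header, and the Svix headers (svix-id, svix-timestamp, svix-signature) that Clerk uses. It is a diagnostic aid under those assumptions and not necessarily how the service itself verifies webhooks.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Both official libraries reject timestamps more than 5 minutes off by default.
const TOLERANCE_SECONDS = 300;

// Constant-time comparison; unequal lengths are rejected up front.
function safeEqual(a: string, b: string): boolean {
  const x = Buffer.from(a);
  const y = Buffer.from(b);
  return x.length === y.length && timingSafeEqual(x, y);
}

// Stripe: header "t=<unix>,v1=<hex>[,v1=<hex>...]". The HMAC-SHA256 key is the
// full signing secret (including "whsec_"); the message is "<t>.<raw body>".
function verifyStripe(rawBody: string, header: string, secret: string, nowSec = Date.now() / 1000): boolean {
  const parts = header.split(",").map((p) => p.split("="));
  const t = parts.find(([k]) => k === "t")?.[1];
  const sigs = parts.filter(([k]) => k === "v1").map(([, v]) => v ?? "");
  const ts = Number(t);
  if (!t || sigs.length === 0 || !Number.isFinite(ts) || Math.abs(nowSec - ts) > TOLERANCE_SECONDS) return false;
  const expected = createHmac("sha256", secret).update(`${t}.${rawBody}`, "utf8").digest("hex");
  return sigs.some((s) => safeEqual(s, expected));
}

// Clerk via Svix: the key is the base64 part of the secret after "whsec_",
// the message is "<svix-id>.<svix-timestamp>.<raw body>", the digest is base64,
// and svix-signature holds space-separated "v1,<signature>" entries.
function verifySvix(
  rawBody: string,
  id: string,
  timestamp: string,
  sigHeader: string,
  secret: string,
  nowSec = Date.now() / 1000,
): boolean {
  const ts = Number(timestamp);
  if (!Number.isFinite(ts) || Math.abs(nowSec - ts) > TOLERANCE_SECONDS) return false;
  const key = Buffer.from(secret.replace(/^whsec_/, ""), "base64");
  const expected = createHmac("sha256", key).update(`${id}.${timestamp}.${rawBody}`, "utf8").digest("base64");
  return sigHeader.split(" ").some((entry) => {
    const [version, sig] = entry.split(",");
    return version === "v1" && sig !== undefined && safeEqual(sig, expected);
  });
}
```

If a captured request verifies here with the dashboard secret but fails in the service, suspect body handling or the stored secret value (for example a trailing newline) rather than the secret itself.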
High Error Rate After Deployment
Immediate Actions:
- Check Cloud Monitoring for error types
- Review logs for common error messages
- If widespread: Rollback immediately (see Rollback Procedures)
- If isolated: Investigate specific errors
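To tell a widespread failure from an isolated one, it helps to see which messages dominate. This sketch tallies messages from the JSON output of gcloud logging read, for example with severity>=ERROR, --freshness=1h and --format=json. Only the two fields it reads are typed. Entries with neither field are counted under a placeholder.

```typescript
// Minimal shape of a Cloud Logging entry; real entries carry many more fields.
type LogEntry = { textPayload?: string; jsonPayload?: { message?: string } };

// Counts identical messages and returns the `top` most frequent, highest first.
// Ties keep first-seen order because Array.prototype.sort is stable.
function topErrors(entries: LogEntry[], top = 5): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const e of entries) {
    const msg = e.jsonPayload?.message ?? e.textPayload ?? "(no message)";
    counts.set(msg, (counts.get(msg) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, top);
}
```

Messages that embed request-specific values (IDs, amounts) will not group; normalizing them first, for example by stripping digits, is a reasonable next step.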
Investigation:
# Check error types (last hour)
gcloud logging read \
'resource.type="cloud_run_revision" AND resource.labels.service_name="payment-service" AND severity>=ERROR' \
--freshness=1h \
--limit=20 \
--format="value(jsonPayload.message,textPayload)"
MongoDB Connection Timeout
Cause: Network issue or MongoDB Atlas firewall
Fix:
- Check MongoDB Atlas network access settings
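- Verify the Cloud Run egress IP is allowlisted (Cloud Run has no fixed egress IP by default; a static IP requires routing egress through a VPC with Cloud NAT)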
- Test connection from Cloud Shell:
mongosh "mongodb+srv://cluster.mongodb.net/payment-service" --username <user>
Deployment Stuck in "Deploying" State
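Cause: The new revision has not passed its startup check (the container is not yet listening on $PORT, or a configured startup probe is failing)
Fix:
- Check service logs for startup errors
- Verify /ready endpoint responds quickly
- Inspect the new revision's status (traffic stays on the previous revision until the new one is ready):
  gcloud run revisions list --service=payment-service --region=us-central1
  gcloud run revisions describe REVISION_NAME --region=us-central1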
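The JSON form of gcloud run revisions list (--service=payment-service --region=us-central1 --format=json) includes each revision's status.conditions. The Ready condition's status and message usually explain why a deploy is stuck. This sketch assumes that Knative-style shape (only the fields used are typed). It reports non-ready revisions and picks the newest ready revision other than a given one, which can serve as PREVIOUS_REVISION in Rollback Scenario 1.

```typescript
type Condition = { type: string; status: string; reason?: string; message?: string };
type Revision = {
  metadata: { name: string; creationTimestamp: string };
  status?: { conditions?: Condition[] };
};

function readyCondition(rev: Revision): Condition | undefined {
  return rev.status?.conditions?.find((c) => c.type === "Ready");
}

// One line per revision whose Ready condition is not "True", with Cloud Run's reason.
function notReady(revs: Revision[]): string[] {
  return revs
    .filter((r) => readyCondition(r)?.status !== "True")
    .map((r) => {
      const c = readyCondition(r);
      return [`${r.metadata.name}:`, c?.status ?? "Unknown", c?.reason, c?.message].filter(Boolean).join(" ");
    });
}

// Newest Ready revision other than `exclude` (e.g. the revision currently failing).
function latestReady(revs: Revision[], exclude?: string): string | undefined {
  return revs
    .filter((r) => r.metadata.name !== exclude && readyCondition(r)?.status === "True")
    .sort((a, b) => Date.parse(b.metadata.creationTimestamp) - Date.parse(a.metadata.creationTimestamp))[0]
    ?.metadata.name;
}
```

Sorting by creationTimestamp rather than parsing the numeric part of the revision name avoids depending on Cloud Run's naming format.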
Emergency Contacts
Escalation Path
- On-Call Engineer: Check PagerDuty rotation
- Backend Team Lead: @vova-appdev
- DevOps Team: devops@crop.com
- CTO: cto@crop.com
External Services
- Stripe Support: https://support.stripe.com
- Clerk Support: https://clerk.com/support
- MongoDB Atlas Support: https://support.mongodb.com
- Google Cloud Support: https://console.cloud.google.com/support
Incident Response
For production incidents:
- Page on-call engineer via PagerDuty
- Create incident in incident management tool
- Update status page: status.crop.com
- Join incident bridge: Check incident channel
- Follow incident runbook: See INCIDENT_RESPONSE.md
Appendix
Useful Commands
# View current Cloud Run configuration
gcloud run services describe payment-service --region=us-central1
# Stream logs in real-time
gcloud beta run services logs tail payment-service --region=us-central1
# Check service metrics (request count, latency, error rate):
# Cloud Run console → payment-service → Metrics, or Metrics Explorer with metric run.googleapis.com/request_count
# List all revisions with traffic split
gcloud run services describe payment-service --region=us-central1 --format="yaml(status.traffic)"
# Update environment variable
gcloud run services update payment-service \
--region=us-central1 \
--set-env-vars="NEW_VAR=value"
# Update secret reference
gcloud run services update payment-service \
--region=us-central1 \
--update-secrets="MONGODB_URI=MONGODB_URI_PROD:latest"
Git Tags for Rollback
All phase completions are tagged:
- phase-1-complete - Initial code implementation
- phases-2-4-complete - Tests, CI/CD, documentation
To roll back to a tag:
git checkout -b rollback-branch phase-1-complete
git push origin rollback-branch
# Deploy rollback-branch via GitHub Actions
Document Version: 1.0 Last Review: 2025-11-13 Next Review: 2025-12-13