CI Testing Architecture

Overview

This document describes the monorepo CI testing architecture, designed to handle multiple services independently with full rollback safety.

Architecture Diagram

┌───────────────────────────────────────────────────────────────┐
│                     GitHub Actions CI                          │
├───────────────────────────────────────────────────────────────┤
│                                                                │
│  ┌──────────────┐                                             │
│  │  Lint (Root) │  ← Runs once for entire monorepo           │
│  └──────────────┘                                             │
│                                                                │
│  ┌────────────────────────────────────────────────────────┐  │
│  │   Test Standalone (Matrix - No Dependencies)           │  │
│  ├────────────────────────────────────────────────────────┤  │
│  │                                                          │  │
│  │  ┌─────────┐  ┌──────────┐  ┌───────┐  ┌────────┐    │  │
│  │  │ Catalog │  │   User   │  │ Media │  │ Search │    │  │
│  │  │  test   │  │   test   │  │  test │  │test:ci │    │  │
│  │  └─────────┘  └──────────┘  └───────┘  └────────┘    │  │
│  │                                                          │  │
│  │  ✅ No service containers - fast startup                │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                                │
│  ┌────────────────────────────────────────────────────────┐  │
│  │   Test with MongoDB (Matrix)                           │  │
│  ├────────────────────────────────────────────────────────┤  │
│  │                                                          │  │
│  │  ┌─────────────────┐                                    │  │
│  │  │    Payment      │                                    │  │
│  │  │     test        │                                    │  │
│  │  │  +MongoDB:7     │                                    │  │
│  │  └─────────────────┘                                    │  │
│  │                                                          │  │
│  │  ✅ Only 1 MongoDB container (not 5)                    │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                                │
│  ┌────────────────────────────────────────────────────────┐  │
│  │   Future: Test with Elasticsearch (When Needed)        │  │
│  │   For services that require Elasticsearch              │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                                │
│                                                                │
│                                                                │
└───────────────────────────────────────────────────────────────┘

Design Principles

1. Job-Level Isolation

Services are grouped into separate jobs based on their infrastructure dependencies:

test-standalone: Services with no external dependencies (catalog, user, media, search)
test-with-mongodb: Services requiring MongoDB (payment)
Future: test-with-elasticsearch when needed

Why separate jobs?

GitHub Actions service containers are job-level, not matrix-level
Each matrix iteration would get ALL service containers, regardless of conditional logic
With 5 services and 2 containers, that's 10 containers (wasteful)
Separate jobs = only start containers when needed

2. Working Directory Approach

The architecture uses working-directory in the test step to ensure Bun's test runner only scans that service's directory:

- name: Run tests
  working-directory: ${{ matrix.service.path }}
  run: ${{ matrix.service.test_command || 'bun test' }}

Why this works:

Bun's test runner scans recursively from the current working directory for test files (for example, files matching **/*.test.ts)
When CWD is services/catalog, it only finds services/catalog/**/*.test.ts
This sidesteps the problem that Bun ignored the --exclude flag (see Migration History)
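
The effect of the working directory on discovery can be illustrated without Bun. The sketch below uses find as a stand-in for a runner that scans recursively from its working directory; it is not Bun's actual discovery logic, and the file names are hypothetical.

```shell
# Build a throwaway monorepo layout (hypothetical file names).
repo=$(mktemp -d)
mkdir -p "$repo/services/catalog/src" "$repo/services/user/src"
touch "$repo/services/catalog/src/items.test.ts" "$repo/services/user/src/auth.test.ts"

# List test files reachable from a directory, as a stand-in for a test
# runner that scans recursively from its working directory.
discover_tests() (
  cd "$1" && find . -name '*.test.ts' | sort
)

from_root=$(discover_tests "$repo")
from_catalog=$(discover_tests "$repo/services/catalog")

echo "from repo root:"
echo "$from_root"
echo "from services/catalog:"
echo "$from_catalog"

rm -rf "$repo"
```

From the repository root both services' tests are listed; from services/catalog only that service's file is.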

3. Resource Efficiency

Before (conditional services approach - doesn't work):

5 matrix services × 2 containers each = 10 containers started
All containers idle except for payment job
Wasted ~90% of container resources

After (separate jobs):

4 standalone services × 0 containers = 0
1 MongoDB service × 1 container = 1
Result: 90% reduction in container starts
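
The arithmetic behind these figures can be checked with a short script. This is only a sketch; the function name container_starts is made up for illustration.

```shell
# Containers started = matrix entries x service containers declared on the job.
container_starts() (
  matrix_entries=$1
  containers_per_job=$2
  echo $((matrix_entries * containers_per_job))
)

# Single job declaring both containers: every matrix entry starts both.
before=$(container_starts 5 2)

# Separate jobs: standalone entries start none, the MongoDB job starts one.
after=$(( $(container_starts 4 0) + $(container_starts 1 1) ))

# Percentage reduction in container starts.
reduction=$(( (before - after) * 100 / before ))
echo "before=$before after=$after reduction=${reduction}%"
```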

4. Fail-Fast Disabled

strategy:
  fail-fast: false

This ensures every service's tests run to completion even if another service's job fails, so each CI run reports results for all services.
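
The same keep-going behaviour can be approximated locally. The sketch below is a rough analogue, not part of the workflow: run_in_each is a hypothetical helper that runs a command from inside each service directory, continues after failures, and returns non-zero if any service failed.

```shell
# Run a test command inside each service directory; keep going after
# failures and exit non-zero if any service failed.
run_in_each() (
  cmd=$1
  shift
  status=0
  for dir in "$@"; do
    if (cd "$dir" && sh -c "$cmd"); then
      echo "PASS $dir"
    else
      echo "FAIL $dir"
      status=1
    fi
  done
  exit "$status"
)

# Example (assumes Bun is installed and the directories exist):
#   run_in_each "bun test" services/catalog services/user services/media
```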

5. Custom Test Commands

Services can override the default bun test command:

- name: search
  test_command: bun run test:ci  # Excludes integration tests
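
The fallback in the workflow expression (test_command || 'bun test') behaves like a shell default-value expansion: an empty or missing override yields the default. A minimal sketch, with resolve_test_command as a hypothetical name:

```shell
# Use the override when it is non-empty, otherwise fall back to "bun test".
resolve_test_command() (
  override=$1
  echo "${override:-bun test}"
)

resolve_test_command ""                 # catalog: no override
resolve_test_command "bun run test:ci"  # search: custom script
```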

6. Common Pitfall: Conditional Services Don't Work

⚠️ Incorrect approach (looks valid, but wastes resources):

services:
  mongodb:
    # ❌ This does NOT prevent container from starting
    options: ${{ matrix.service.needs_mongodb && '--health-cmd=...' || '' }}

Why it fails:

options only passes Docker CLI flags
Container ALWAYS starts regardless of options value
Empty options = container starts without health checks (still wasteful)

Correct approach: Use separate jobs (as implemented)

Service Configuration

Job: `test-standalone`

Services with no external dependencies (fast startup, no containers):

| Service | Path | Test Command | Notes |
|---------|------|--------------|-------|
| catalog | `services/catalog` | `bun test` | Example service |
| user | `services/user` | `bun test` | Example service |
| media | `services/media` | `bun test` | Example service |
| search | `services/search` | `bun run test:ci` | Excludes integration tests |

Job: `test-with-mongodb`

Services requiring MongoDB:

| Service | Path | Test Command | Container |
|---------|------|--------------|-----------|
| payment | `services/payment` | `bun test` | MongoDB 7 |

Adding a New Service

Step 1: Determine Dependencies

Does your service need external infrastructure?

No dependencies → Add to test-standalone job
Needs MongoDB → Add to test-with-mongodb job
Needs Elasticsearch → Create test-with-elasticsearch job (see template below)
Needs multiple → Create dedicated job
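
The decision above can be written as a small lookup. This is a sketch only: job_for_dependencies and the placeholder name dedicated-job are assumptions, while the other job names come from this document.

```shell
# Map a space-separated dependency list to the CI job that should run the service.
job_for_dependencies() (
  deps=$1
  case "$deps" in
    "")            echo "test-standalone" ;;
    mongodb)       echo "test-with-mongodb" ;;
    elasticsearch) echo "test-with-elasticsearch" ;;
    *)             echo "dedicated-job" ;;  # multiple dependencies: placeholder name
  esac
)

job_for_dependencies ""
job_for_dependencies "mongodb"
job_for_dependencies "mongodb elasticsearch"
```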

Step 2: Update Workflow

Example: Service with no dependencies

Add to test-standalone matrix in .github/workflows/ci.yml:

test-standalone:
  name: Test - ${{ matrix.service.name }}
  strategy:
    matrix:
      service:
        # ... existing services ...

        - name: your-new-service
          path: services/your-new-service
          # Optional: custom test command
          test_command: bun run test:custom

Example: Service needing MongoDB

Add to test-with-mongodb matrix:

test-with-mongodb:
  name: Test - ${{ matrix.service.name }}
  strategy:
    matrix:
      service:
        # ... existing services ...

        - name: your-new-service
          path: services/your-new-service

Example: Service needing Elasticsearch (create new job)

test-with-elasticsearch:
  name: Test - ${{ matrix.service.name }}
  runs-on: ubuntu-latest
  strategy:
    fail-fast: false
    matrix:
      service:
        - name: your-new-service
          path: services/your-new-service

  services:
    elasticsearch:
      image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
      env:
        discovery.type: single-node
        xpack.security.enabled: false
      ports:
        - 9200:9200
      options: --health-cmd="curl -f http://localhost:9200/_cluster/health" --health-interval=10s

  steps:
    - uses: actions/checkout@v4
    - uses: oven-sh/setup-bun@v2
    - run: bun install --frozen-lockfile

    - name: Run tests
      working-directory: ${{ matrix.service.path }}
      run: ${{ matrix.service.test_command || 'bun test' }}
      env:
        ELASTICSEARCH_URL: http://localhost:9200

Step 3: Add Environment Variables (if needed)

If your service needs specific env vars, add them to the env: section:

env:
  YOUR_VAR: ${{ matrix.service.name == 'your-new-service' && 'value' || '' }}

Step 4: Ensure Service Has Test Script

In services/your-new-service/package.json:

{
  "scripts": {
    "test": "bun test"
  }
}

Step 5: Test Locally

cd services/your-new-service
bun test

Rollback Procedures

Scenario A: CI Changes Break Tests (Before Merge)

If tests fail on your PR:

Fix the workflow:

vim .github/workflows/ci.yml
git add .github/workflows/ci.yml
git commit -m "fix(ci): correct service configuration"
git push

PR updates automatically, CI re-runs
Main branch is unaffected - no rollback needed

Scenario B: CI Changes Merged and Breaking Main

If CI breaks after merging to main:

Option 1: Quick Revert (Recommended)

# Find the commit that broke CI
git log --oneline -10

# Revert the commit
git revert <commit-sha>
git push origin main

Option 2: Restore from Backup

# Use the backup file committed on 2025-11-11
git checkout main
cp .github/workflows/ci.yml.backup-20251111 .github/workflows/ci.yml
git add .github/workflows/ci.yml
git commit -m "revert(ci): restore working CI configuration from backup"
git push origin main

Scenario C: Emergency Rollback (Nuclear Option)

If the history on main is too tangled for a clean git revert (for example, several later commits build on the breaking one):

# Force reset main to last known good commit
git checkout main
git reset --hard <last-good-commit-sha>
git push --force origin main  # ⚠️ USE WITH CAUTION

⚠️ Warning: Only use force push if:

No one else is working on main
You have team coordination
No better option exists

Testing the Rollback

Before deploying changes, verify rollback procedure works:

# Save current state
cp .github/workflows/ci.yml /tmp/new-ci.yml

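# Test rollback to main version (this updates both the index and the working tree)
git checkout main -- .github/workflows/ci.yml
git diff HEAD -- .github/workflows/ci.yml  # Should show revert to main version

# Restore new version and unstage the rollback
cp /tmp/new-ci.yml .github/workflows/ci.yml
git reset -q -- .github/workflows/ci.yml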

Debugging CI Failures

Check Specific Service Job

Go to GitHub Actions run
Click on failed service in matrix (e.g., "Test - payment")
Expand "Run tests" step
Check error messages

Run Locally

cd services/<service-name>
bun test

Check Environment Variables

Verify service has required env vars in workflow:

env:
  MONGODB_URI: ${{ matrix.service.needs_mongodb && 'mongodb://localhost:27017/crop-test' || '' }}

Common Issues

Issue: Service not found

Check path in matrix matches actual directory
Verify service has package.json with test script
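
Both checks can be scripted. In the sketch below, check_service is a hypothetical helper, and the grep is a crude text match for a "test" key; a JSON-aware tool would be more robust.

```shell
# Succeeds when the matrix path exists and its package.json defines a "test" script.
check_service() (
  dir=$1
  if [ ! -d "$dir" ]; then
    echo "missing directory: $dir"
    exit 1
  fi
  if ! grep -Eq '"test"[[:space:]]*:' "$dir/package.json" 2>/dev/null; then
    echo "no test script in: $dir/package.json"
    exit 1
  fi
  echo "ok: $dir"
)

# Example, run from the repository root:
#   check_service services/catalog
```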

Issue: Tests pass locally but fail in CI

Check environment variables
Verify dependencies are installed (bun install --frozen-lockfile)
Check if service needs MongoDB/Elasticsearch

Issue: Wrong tests running

Verify working-directory is set correctly
Check that test_command names a script that exists in package.json and runs the intended tests (for example, test:ci excludes integration tests)

Performance Characteristics

Current Setup (Matrix Strategy)

Lint job: ~1-2 minutes
Test jobs (parallel): ~1-3 minutes each
Total CI time: ~3-4 minutes (limited by slowest service)

Benefits

Parallel execution: All services are tested simultaneously
Fast feedback: Failures visible within minutes
Isolated failures: One service failure doesn't block others
Clear attribution: Easy to see which service failed

Migration History

Previous Approaches (Failed)

Attempt 1: bun test --exclude='**/service/**'
- ❌ Failed: Bun ignored the --exclude flag
Attempt 2: bun test services/catalog services/user ...
- ❌ Failed: Bun ignored the explicit paths
- Still scanned entire monorepo

Current Approach (Successful)

Matrix strategy with working-directory

✅ Works: Bun only scans from CWD
✅ Reliable: Each service isolated
✅ Maintainable: Clear service list in matrix

Future Improvements

Potential Enhancements

Cache optimization: Cache node_modules per service
Path-based triggers: Only test changed services
Service-specific workflows: Migrate to separate workflow files
Test coverage tracking: Integrate coverage reporting
Performance monitoring: Track test execution times
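
As one possible starting point for path-based triggers, changed file paths can be reduced to the set of affected services. This sketch assumes every service lives under services/<name>/; changed_services is a hypothetical helper, not something the current workflow uses.

```shell
# Read changed file paths on stdin and print the unique service names they touch.
changed_services() {
  sed -n 's|^services/\([^/][^/]*\)/.*|\1|p' | sort -u
}

# Sample input; in CI this could come from something like
#   git diff --name-only origin/main...HEAD | changed_services
printf '%s\n' \
  services/catalog/src/a.ts \
  services/user/x.ts \
  services/catalog/b.ts \
  README.md | changed_services
```

Files outside services/ (README.md here) map to no service, so a real setup would still need a rule for root-level changes.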

Separate Workflows Migration

If the matrix becomes too complex, consider migrating to per-service workflows:

.github/workflows/
  catalog-ci.yml    # triggers: services/catalog/**
  user-ci.yml       # triggers: services/user/**
  payment-ci.yml    # triggers: services/payment/**

Benefits:

Complete isolation
Can deploy services independently
Clear ownership per service

Migration path:

Create new workflow for one service
Remove from matrix
Test for a week
Repeat for other services

Verification Checklist

Before merging CI changes:

All services have tests that pass locally
Matrix strategy runs all services in parallel
Failed service doesn't block others (fail-fast: false)
Each service has correct working directory
Environment variables are service-specific
Documentation is updated
Rollback procedure is documented and tested
Backup configuration is committed
PR has successful CI run

Monitoring

GitHub Actions Status Checks

Set up branch protection rules:

Go to repo Settings → Branches → main
Add rule: "Require status checks to pass before merging"
Select: lint and test jobs
Enable: "Require branches to be up to date before merging"

Metrics to Track

Total CI duration (target: < 5 minutes)
Service-specific test times
Failure rates per service
Flaky test incidents
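
Per-service failure rates can be computed from any log of job outcomes. The sketch below assumes a hypothetical input format of one "service result" pair per line, with result being pass or fail; how that log is collected is left open.

```shell
# Read "service result" lines on stdin and print each service's failure rate
# as a whole-number percentage (truncated).
failure_rates() {
  awk '{ total[$1]++; if ($2 == "fail") failed[$1]++ }
       END { for (s in total) printf "%s %d%%\n", s, (failed[s] + 0) * 100 / total[s] }' | sort
}

# Sample data, for illustration only.
printf '%s\n' \
  'payment pass' \
  'payment fail' \
  'catalog pass' \
  'catalog pass' | failure_rates
```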

Support

Questions?

Check this documentation first
Review backup file: .github/workflows/ci.yml.backup-20251111
Test locally: cd services/<service> && bun test
Check GitHub Actions logs for detailed errors

Need to Rollback?

Use backup file (fastest)
Revert commit (cleanest)
Reset to last good commit (last resort)

Related Documentation

PRODUCTION_URLS.md - Production service URLs
services/search/CLAUDE.md - Search service development guide
services/search/docs/TESTCONTAINERS_BUN_COMPATIBILITY.md - Integration testing guide
