Data Flows

This document describes how data flows through the CROP parts catalog system, from external APIs to the end user.

Overview

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  External APIs  │───▶│    MongoDB      │───▶│  Elasticsearch  │───▶│    Website      │
│  (DIS, K&M)     │    │   (crop_dev)    │    │  (parts_current)│    │ (Next.js SSR)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘    └─────────────────┘
        │                      │                      │                      │
   Catalog Service        Raw Storage           Search Index           User Interface

1. External API → MongoDB (Catalog Service)

DIS API Collections

DIS (Dealer Information Systems) provides parts data for multiple manufacturers.

Collection   Manufacturer        Endpoint
parts_kuh    Kubota              DIS API
parts_nhl    New Holland         DIS API
parts_bns    Briggs & Stratton   DIS API
parts_vnt    Ventrac             DIS API
parts_mch    McHale              DIS API
parts_mar    Maruyama            DIS API
parts_kin    Kinze               DIS API
parts_har    Harley              DIS API
parts_fer    Ferris              DIS API
parts_hot    Hotsy               DIS API

Sync Trigger: Manual via catalog service endpoints or scheduled jobs.

Data Flow:

DIS API ──▶ GET /parts/{vendor} ──▶ Transform ──▶ MongoDB (parts_{vendor})
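
As a rough illustration of the parts_{vendor} naming in the flow above, the helper below maps a vendor code to its target collection. DIS_VENDOR_CODES and collectionForVendor are hypothetical names. The codes are copied from the collections table, and rejecting unknown codes is an assumption rather than documented behavior.

```typescript
// Vendor codes copied from the DIS collections table.
const DIS_VENDOR_CODES = [
  "kuh", "nhl", "bns", "vnt", "mch", "mar", "kin", "har", "fer", "hot",
];

// Hypothetical helper: resolve the MongoDB collection for a DIS vendor code.
function collectionForVendor(vendor: string): string {
  const code = vendor.trim().toLowerCase();
  if (DIS_VENDOR_CODES.indexOf(code) === -1) {
    // Assumption: unknown vendors are rejected instead of silently synced.
    throw new Error(`Unknown DIS vendor code: ${vendor}`);
  }
  return `parts_${code}`;
}
```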

K&M Tire API

K&M Tire provides tire and wheel data via their proprietary API.

Collection   Source         Items
parts_kmt    K&M Tire API   ~7,000 tires

Sync Endpoints:

  • POST /catalog/km-tire/sync - Single page sync
  • POST /catalog/km-tire/batch-sync - Full catalog sync (parallel)

Data Flow:

K&M API ──▶ GET /Tires (paginated) ──▶ Transform ──▶ MongoDB (parts_kmt)
                                           │
                                           ▼
                               Elasticsearch (optional)

Rate Limiting:

  • Token bucket: 10 requests/second
  • Retry with exponential backoff
  • Parallel processing: 3 concurrent pages
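
The limits above can be illustrated with a small sketch. This is not the catalog service's code: TokenBucket, backoffDelayMs, the 250 ms base delay, the 8 s cap, and the injected clock are all assumptions made for illustration. Only the 10 requests/second rate comes from this document.

```typescript
// Hypothetical token bucket with an injectable clock (milliseconds),
// which keeps the refill logic deterministic and easy to test.
class TokenBucket {
  private tokens: number;
  private last: number;
  private readonly ratePerSec: number;
  private readonly capacity: number;
  private readonly now: () => number;

  constructor(ratePerSec: number, capacity: number, now: () => number) {
    this.ratePerSec = ratePerSec;
    this.capacity = capacity;
    this.now = now;
    this.tokens = capacity; // start full
    this.last = now();
  }

  // Refill based on elapsed time, then consume one token if available.
  tryTake(): boolean {
    const t = this.now();
    const elapsedSec = (t - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.ratePerSec);
    this.last = t;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Exponential backoff: baseMs * 2^attempt, capped at maxMs (attempt starts at 0).
function backoffDelayMs(attempt: number, baseMs: number = 250, maxMs: number = 8000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}
```

A caller would check tryTake() before each K&M request and, on a failed request, wait backoffDelayMs(attempt) before retrying.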

Transformer: indexedPartFormat in packages/shared-catalog/src/transformers.ts

2. MongoDB → Elasticsearch (Search Sync)

Sync Process

The sync is triggered by the GitHub Actions workflow (search-deploy.yml) during deployment.

Dual Sync Configuration:

Sync Job     Collections                                   Target Index    Documents
Parts Sync   parts_nhl, parts_bns, parts_vnt, parts_mch,   parts_current   ~2,500
             parts_kuh, parts_hot, parts_har, parts_kin,
             parts_mar
Tires Sync   parts_kmt                                     tires_current   ~7,000

Sync Script: services/search/scripts/sync-mongodb-to-es.ts

Usage:

# Parts sync (DIS collections → parts_current)
bash scripts/gcp-sync-data.sh "parts_nhl,parts_bns,..." "parts_current"

# Tires sync (K&M → tires_current)
bash scripts/gcp-sync-data.sh "parts_kmt" "tires_current"

Data Flow:

MongoDB ──▶ Cursor (batched) ──▶ Transform ──▶ Validate ──▶ Bulk Index ──▶ Elasticsearch
   │              │                  │            │              │
   │              │                  │            │              ▼
parts_{vendor}   2000 docs/batch   IndexedPart   Skip invalid   parts_current OR tires_current
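
The batch, transform, validate, and bulk-index stages above can be sketched as follows. This is a simplified, synchronous illustration and not the contents of sync-mongodb-to-es.ts: syncInBatches and SyncStats are hypothetical names, and an in-memory array stands in for the MongoDB cursor, which would stream documents in practice.

```typescript
interface SyncStats {
  indexed: number;
  skipped: number;
  batches: number;
}

// Hypothetical sketch: read fixed-size batches, transform each document,
// drop invalid ones, and hand the rest to a bulk indexer.
function syncInBatches<Raw, Doc>(
  source: Raw[],
  transform: (raw: Raw) => Doc,
  isValid: (doc: Doc) => boolean,
  bulkIndex: (docs: Doc[]) => void,
  batchSize: number = 2000, // batch size taken from the diagram above
): SyncStats {
  const stats: SyncStats = { indexed: 0, skipped: 0, batches: 0 };
  for (let start = 0; start < source.length; start += batchSize) {
    const rawBatch = source.slice(start, start + batchSize);
    const docs = rawBatch
      .map((raw) => transform(raw))
      .filter((doc) => isValid(doc));
    stats.skipped += rawBatch.length - docs.length; // invalid docs are skipped
    if (docs.length > 0) {
      bulkIndex(docs);
      stats.indexed += docs.length;
    }
    stats.batches += 1;
  }
  return stats;
}
```

Comparing indexed plus skipped against the source collection count is one way to spot a partial sync.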

Transformer Registry

Located in packages/shared-catalog/src/transformers.ts:

Collection Pattern                  Transformer         Description
crop_stage.parts, crop_prod.parts   indexedPartFormat   Pre-indexed format
parts_kmt                           indexedPartFormat   K&M Tire (pre-indexed)
parts_* (DIS)                       disApi              DIS API format
Archive collections                 Config-based        Legacy data
Unknown                             fallback            Auto-detect fields
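
One way to read the table above is as an ordered lookup in which exact names win over the parts_* pattern. The sketch below is an assumption about that ordering, not the registry's actual implementation. resolveTransformer is a hypothetical name, and the config-based archive collections are deliberately left out because their patterns are not listed here.

```typescript
type TransformerName = "indexedPartFormat" | "disApi" | "fallback";

// Collections the table marks as already being in the indexed format.
const PRE_INDEXED_COLLECTIONS = ["crop_stage.parts", "crop_prod.parts", "parts_kmt"];

// Hypothetical resolver mirroring the table rows.
function resolveTransformer(collection: string): TransformerName {
  // Exact matches first, so parts_kmt is not caught by the parts_* rule.
  if (PRE_INDEXED_COLLECTIONS.indexOf(collection) !== -1) return "indexedPartFormat";
  if (/^parts_/.test(collection)) return "disApi";
  // Archive collections (config-based) are not modeled in this sketch.
  return "fallback";
}
```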

Index Management

Dual-Index Architecture:

The search system uses two separate indices to optimize for different use cases:

Index Alias     Content             Collections                                   Documents
parts_current   DIS parts catalog   parts_nhl, parts_bns, parts_vnt, parts_mch,   ~2,500
                                    parts_kuh, parts_hot, parts_har, parts_kin,
                                    parts_mar
tires_current   K&M Tire catalog    parts_kmt                                     ~7,000

Alias Structure:

parts_current_read  ──▶ parts_v{YYYY_MM_DD} (DIS parts)
parts_current_write ──▶ parts_v{YYYY_MM_DD}

tires_current_read  ──▶ tires_v{YYYY_MM_DD} (K&M Tires)
tires_current_write ──▶ tires_v{YYYY_MM_DD}
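
The naming scheme above can be generated mechanically. In this sketch, versionedIndexName and aliasNames are hypothetical helpers, and using the UTC date for the version suffix is an assumption.

```typescript
type IndexFamily = "parts" | "tires";

// Zero-pad a number to two digits.
function padTwoDigits(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// Build a dated index name such as parts_v2025_01_07 (UTC date assumed).
function versionedIndexName(family: IndexFamily, date: Date): string {
  const year = date.getUTCFullYear();
  const month = padTwoDigits(date.getUTCMonth() + 1);
  const day = padTwoDigits(date.getUTCDate());
  return `${family}_v${year}_${month}_${day}`;
}

// Read and write aliases that point at the dated index.
function aliasNames(family: IndexFamily): { read: string; write: string } {
  return {
    read: `${family}_current_read`,
    write: `${family}_current_write`,
  };
}
```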

Index Routing:

  • Category filter tires, wheels, wheels & tires → tires_current
  • Manufacturer filter KMT, K&M, km-tire → tires_current
  • All other queries → parts_current
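
The routing rules above amount to a small decision function. The sketch below is illustrative only: resolveIndex and RoutingFilters are hypothetical names, and matching filter values case-insensitively after trimming is an assumption.

```typescript
type IndexAlias = "parts_current" | "tires_current";

interface RoutingFilters {
  category?: string;
  manufacturer?: string;
}

// Filter values listed in the routing rules, lowercased for comparison.
const TIRE_CATEGORIES = ["tires", "wheels", "wheels & tires"];
const TIRE_MANUFACTURERS = ["kmt", "k&m", "km-tire"];

function normalizeFilter(value: string | undefined): string {
  return (value || "").trim().toLowerCase();
}

// Tire-related filters go to tires_current; everything else to parts_current.
function resolveIndex(filters: RoutingFilters): IndexAlias {
  if (TIRE_CATEGORIES.indexOf(normalizeFilter(filters.category)) !== -1) {
    return "tires_current";
  }
  if (TIRE_MANUFACTURERS.indexOf(normalizeFilter(filters.manufacturer)) !== -1) {
    return "tires_current";
  }
  return "parts_current";
}
```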

Alias Switch Script: services/search/scripts/switch-alias-to-latest.ts

3. Elasticsearch → Website (Search Service)

Search Service API

Base URL: https://search-service-atife5uvka-ue.a.run.app

Key Endpoints:

Endpoint           Purpose                       Response
GET /api/search    Full search with pagination   Parts + facets
GET /api/filters   Facets only (no hits)         Aggregations
GET /health        Service health check          Index stats

Query Flow:

Request ──▶ Parse Params ──▶ Build Query ──▶ Elasticsearch ──▶ Transform ──▶ Response
   │              │               │               │                │
   │              │               │               │                ▼
?q=tire      QueryFilters    bool query      Search response    PartPreview[]
&category=tires             + aggregations
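
To make the "Build Query" step concrete, here is a minimal sketch of turning parsed parameters into an Elasticsearch request body with a bool query and terms aggregations. The field names come from the IndexedPart schema in this document, but which fields the service actually searches, the default page size of 24, and the names buildSearchBody and SearchBody are assumptions.

```typescript
interface QueryFilters {
  q?: string;
  category?: string;
  manufacturer?: string;
  page?: number;
  pageSize?: number;
}

type Clause = Record<string, unknown>;

interface SearchBody {
  from: number;
  size: number;
  query: { bool: { must: Clause[]; filter: Clause[] } };
  aggs: Record<string, Clause>;
}

// Hypothetical builder: free text goes into `must`, exact filters into `filter`.
function buildSearchBody(filters: QueryFilters): SearchBody {
  const page = Math.max(1, filters.page || 1);
  const size = filters.pageSize || 24; // assumed default page size
  const must: Clause[] = filters.q
    ? [{ multi_match: { query: filters.q, fields: ["title", "partNumber", "description"] } }]
    : [{ match_all: {} }];
  const filter: Clause[] = [];
  if (filters.category) filter.push({ term: { category: filters.category } });
  if (filters.manufacturer) filter.push({ term: { "manufacturer.code": filters.manufacturer } });
  return {
    from: (page - 1) * size,
    size,
    query: { bool: { must, filter } },
    aggs: {
      manufacturer: { terms: { field: "manufacturer.code" } },
      category: { terms: { field: "category" } },
    },
  };
}
```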

Response Structure

{
  parts: PartPreview[],      // Search results
  pagination: {
    page: number,
    pageSize: number,
    total: number,
    totalPages: number
  },
  facets: {
    manufacturer: [...],
    category: [...],
    price: {...}
  }
}
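
The pagination fields are related by simple arithmetic. The sketch below assumes totalPages is the total divided by pageSize, rounded up. buildPagination is a hypothetical helper, not the service's code.

```typescript
interface Pagination {
  page: number;
  pageSize: number;
  total: number;
  totalPages: number;
}

// Derive totalPages from the hit count and page size.
function buildPagination(total: number, page: number, pageSize: number): Pagination {
  if (pageSize <= 0) {
    throw new Error("pageSize must be positive");
  }
  const totalPages = Math.ceil(total / pageSize);
  return { page, pageSize, total, totalPages };
}
```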

4. Website Data Flow (Next.js)

Server-Side Rendering

The website uses Next.js with React Server Components for the initial data fetch.

Tires Page: https://www.clintontractor.net/parts/tires

User Request ──▶ Next.js Server ──▶ Search API ──▶ SSR ──▶ HTML Response
                      │                 │                      │
                      │                 ▼                      ▼
                 Server Component   Elasticsearch        Pre-rendered
                                                         tire catalog

Client-Side Updates:

  • Pagination: Client-side navigation
  • Filters: Re-fetch via API
  • Search: Debounced queries

5. Environment Configuration

Database Selection

Environment   MongoDB Database   Usage
Production    crop_dev           Live website
Stage         crop_dev           Testing (same as prod)
Development   crop_dev           Local development

Note: Despite its name, crop_dev currently backs every environment, including production. Historical crop_stage references have been migrated.

Elasticsearch Configuration

ELASTICSEARCH_URL=http://10.0.0.52:9200  # VPC internal
SEARCH_INDEX_NAME=parts_current          # Alias name
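
A service reading these two variables might validate them at startup. The sketch below is hypothetical: loadSearchConfig and SearchConfig are illustrative names, and treating both variables as required is an assumption. The environment is passed in as a plain object so the function has no runtime dependencies.

```typescript
interface SearchConfig {
  elasticsearchUrl: string;
  indexName: string;
}

// Hypothetical loader: fail fast when a variable is missing or malformed.
function loadSearchConfig(env: Record<string, string | undefined>): SearchConfig {
  const url = env["ELASTICSEARCH_URL"];
  const indexName = env["SEARCH_INDEX_NAME"];
  if (!url || !/^https?:\/\//.test(url)) {
    throw new Error("ELASTICSEARCH_URL must be an http(s) URL");
  }
  if (!indexName) {
    throw new Error("SEARCH_INDEX_NAME is required");
  }
  // Strip trailing slashes so paths can be appended consistently.
  return { elasticsearchUrl: url.replace(/\/+$/, ""), indexName };
}
```

In a Node.js process this could be called as loadSearchConfig(process.env).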

6. Deployment Triggers

GitHub Actions Workflow

File: .github/workflows/search-deploy.yml

Trigger: Push to main branch

Steps:

  1. Build Docker image
  2. Deploy to Cloud Run (preview)
  3. Sync MongoDB → Elasticsearch
  4. Switch aliases
  5. Shift traffic gradually
  6. Run smoke tests

Manual Sync

To trigger a manual sync:

# Via GitHub Actions
gh workflow run search-deploy.yml

# Via Cloud Run Job
gcloud run jobs execute sync-data-auto --region=us-east1

7. Monitoring

Health Checks

# Search service health
curl https://search-service-atife5uvka-ue.a.run.app/health

# Response includes:
# - elasticsearch: ok/error
# - mongodb: ok/error
# - docCount_es: number of indexed documents
# - aliasTarget: current index name

Key Metrics

Metric         Location   Description
docCount_es    /health    Documents in Elasticsearch
docCount_api   /health    Documents via API count
delta          /health    Difference (should be 0)
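
A monitoring script could turn these fields into a list of problems. The sketch below assumes the /health payload is a flat JSON object with the field names shown in this section and that delta is docCount_es minus docCount_api. Neither the payload shape nor the sign convention is confirmed by this document, and healthProblems is a hypothetical helper.

```typescript
interface HealthReport {
  elasticsearch: "ok" | "error";
  mongodb: "ok" | "error";
  docCount_es: number;
  docCount_api: number;
}

// Return human-readable problems; an empty array means healthy.
function healthProblems(report: HealthReport): string[] {
  const problems: string[] = [];
  if (report.elasticsearch !== "ok") problems.push("elasticsearch is not ok");
  if (report.mongodb !== "ok") problems.push("mongodb is not ok");
  const delta = report.docCount_es - report.docCount_api; // assumed sign convention
  if (delta !== 0) problems.push(`document count delta is ${delta}`);
  return problems;
}
```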

8. Troubleshooting

Common Issues

Issue: Website shows 0 tires

  • Check: Elasticsearch has data (/health)
  • Check: Search API returns results (/api/search?category=tires)
  • Check: Sync job completed successfully

Issue: Partial sync (fewer docs than expected)

  • Check: Cloud Run job logs for errors
  • Check: Transformer is registered for collection
  • Check: Job timeout (default 1800s)

Issue: Stale data on website

  • Check: Alias points to latest index
  • Check: Website cache (CDN/browser)
  • Trigger: New deployment or manual sync

Debug Commands

# Check index document count
curl "https://search-service-atife5uvka-ue.a.run.app/api/filters" | jq '.total'

# Check specific manufacturer
curl "https://search-service-atife5uvka-ue.a.run.app/api/search?manufacturer=kmt" | jq '.pagination.total'

# Check MongoDB collection
# Use MongoDB MCP tool: count(database: "crop_dev", collection: "parts_kmt")

9. Data Schema

IndexedPart (Elasticsearch Document)

interface IndexedPart {
  id: string;
  slug: string;
  sku: string;                    // CT-{MFG}-{PN}
  skuVariants: string[];
  partNumber: string;
  pnNorm: string;                 // Normalized for search
  title: string;
  description?: string;
  manufacturer: {
    name: string;
    code: string;                 // KMT, BNS, etc.
  };
  category: string;               // "Wheels & Tires", "parts"
  price?: {
    list: { value: number; currency: string; }
  };
  media?: {
    hasImages: boolean;
    imageCount: number;
    images?: Array<{ url: string; type: string; }>;
  };
  status: 'active' | 'discontinued';
  sources: string[];              // ['dis'], ['km-tire']
  createdAt: string;
  updatedAt: string;
}
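
The sku comment above gives the pattern CT-{MFG}-{PN}, and pnNorm is described only as "normalized for search". The helpers below sketch one plausible reading. The exact rules are assumptions: uppercasing the manufacturer code, keeping the part number as entered apart from trimming, and stripping non-alphanumeric characters for pnNorm. buildSku and normalizePartNumber are hypothetical names.

```typescript
// Assumed normalization: uppercase and drop everything except A-Z and 0-9.
function normalizePartNumber(partNumber: string): string {
  return partNumber.toUpperCase().replace(/[^A-Z0-9]/g, "");
}

// Build a SKU following the CT-{MFG}-{PN} pattern from the schema comment.
function buildSku(manufacturerCode: string, partNumber: string): string {
  const mfg = manufacturerCode.trim().toUpperCase();
  const pn = partNumber.trim(); // assumption: part number is otherwise unchanged
  return `CT-${mfg}-${pn}`;
}
```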

10. Future Improvements

  1. Full K&M Sync: Only ~2,000 of 7,091 tires currently sync. The cause (timeout or chunking) still needs to be investigated.
  2. Real-time Sync: Webhook-based updates instead of batch sync.
  3. Index Versioning: Implement blue-green index deployment.
  4. Monitoring: Add alerting for sync failures and data drift.
