CROP
ProjectsParts Services

Search Index Analysis & Improvement Plan

Date: 2026-01-21 Status: Critical issues identified, action required

Search Index Analysis & Improvement Plan

Date: 2026-01-21 Status: Critical issues identified, action required

Executive Summary

The current search index has critical data integrity issues:

  • Only 2,000 of 7,091 K&M tires are indexed (72% missing)
  • Index contains stale data from old syncs
  • Sync workflow is incomplete - missing collections
  • No separation between parts and tires indices

1. Current State Analysis

1.1 MongoDB Data (crop_dev)

CollectionDocumentsTypeIn Sync Config?
parts_nhl1PartsYes
parts_bns1,307PartsYes
parts_vnt211PartsYes
parts_mch63PartsYes
parts_kuh495PartsNO
parts_hot190PartsNO
parts_har147PartsNO
parts_kin101PartsNO
parts_mar66PartsNO
Parts Subtotal2,581
parts_kmt7,091TiresYes
Grand Total9,672

1.2 Elasticsearch Index (parts_current → parts_v2026_01_20)

ManufacturerES DocsMongoDBDeltaIssue
KMT2,0007,091-5,091SYNC INCOMPLETE
BNS1,3071,3070OK
KUH447495-48Stale (not in sync)
VNT2112110OK
HOT1901900Stale (not in sync)
HAR123147-24Stale (not in sync)
KIN1011010Stale (not in sync)
MAR66660Stale (not in sync)
MCH63630OK
NHL01-1MISSING
Total4,5089,672-5,164

1.3 Current Sync Configuration

File: .github/workflows/search-deploy.yml (line 389)

bash services/search/scripts/gcp-sync-data.sh "parts_nhl,parts_bns,parts_vnt,parts_mch,parts_kmt"

Missing collections: parts_kuh,parts_hot,parts_har,parts_kin,parts_mar


2. Critical Issues Identified

Issue #1: K&M Tire Sync Incomplete (CRITICAL)

  • Expected: 7,091 tires
  • Actual: 2,000 tires
  • Missing: 5,091 tires (72%)
  • Root Cause: Unknown - likely Cloud Run job timeout or memory issue
  • Impact: Website shows only 28% of available tires

Issue #2: Stale Data in Index

  • Collections KUH, HOT, HAR, KIN, MAR exist in ES from OLD syncs
  • These are NOT in current sync config, so data never updates
  • Impact: Outdated product information

Issue #3: Missing Collections in Sync

  • Workflow only syncs 5 collections
  • Should sync all 10 parts collections
  • Impact: Inconsistent data, some manufacturers missing

Issue #4: No Index Separation

  • Parts and Tires share same index
  • User wants:
    • parts_current → ~2,500 parts only
    • tires_current → 7,091 K&M tires
  • Impact: Cannot manage catalogs independently

Issue #5: No Incremental Sync

  • Full re-sync on every deployment
  • No delta/change detection
  • Impact: Slow deployments, potential data loss during sync

3. Constructive Criticism

What We Did Wrong

  1. Incomplete testing of K&M sync

    • Added K&M to sync without verifying full dataset syncs
    • Should have tested with production data volume locally
  2. No validation after deployment

    • Didn't check document counts after sync completed
    • Health endpoint showed 4,508 docs but we didn't verify this was correct
  3. Mixed old and new data

    • Index has data from multiple syncs with different collection sets
    • Should have created fresh index or cleaned up old data
  4. No monitoring/alerting

    • No alerts for sync failures or document count mismatches
    • Should have Datadog/Sentry alerts for data integrity
  5. Documentation gap

    • DATA_FLOWS.md created but didn't catch these issues
    • Need runbooks for common sync problems

What Worked Well

  1. Transformer registration for KMT worked correctly
  2. Manufacturer aliases configured properly
  3. Website correctly displays data from ES
  4. GitHub Actions workflow structure is solid

4. Recommendations

Immediate Actions (P0 - Today)

  1. Fix K&M sync - full 7,091 tires

    # Increase Cloud Run job memory and timeout
    --memory=4Gi
    --task-timeout=3600s  # 1 hour
  2. Add all collections to sync

    # Update search-deploy.yml line 389
    bash services/search/scripts/gcp-sync-data.sh \
      "parts_nhl,parts_bns,parts_vnt,parts_mch,parts_kuh,parts_hot,parts_har,parts_kin,parts_mar,parts_kmt"
  3. Create fresh index

    • Delete stale parts_v2026_01_20
    • Run full sync to new parts_v2026_01_21
    • Switch alias atomically

Short-term Improvements (P1 - This Week)

  1. Separate indices for parts vs tires

    parts_current → parts_v{date}  (2,581 docs)
    tires_current → tires_v{date}  (7,091 docs)
  2. Add sync validation

    • Compare MongoDB count vs ES count after sync
    • Fail deployment if delta > 1%
  3. Add monitoring

    • Datadog metric: search.index.doc_count
    • Alert if count drops by > 5%

Medium-term Improvements (P2 - This Month)

  1. Incremental sync

    • Track updatedAt in MongoDB
    • Only sync changed documents
    • Reduces sync time from 30min to ~5min
  2. Blue-green index deployment

    • Create new index alongside old
    • Validate new index before switching alias
    • Keep old index for instant rollback
  3. Separate sync jobs

    • sync-parts job for DIS collections
    • sync-tires job for K&M collection
    • Can run independently

5. Implementation Plan

Phase 1: Emergency Fix (2 hours)

graph LR
    A[Update sync config] --> B[Increase job resources]
    B --> C[Trigger manual sync]
    C --> D[Validate counts]
    D --> E[Switch alias]

Steps:

  1. Update search-deploy.yml with all collections
  2. Increase Cloud Run job memory to 4Gi
  3. Manually trigger workflow
  4. Verify ES doc count = 9,672
  5. Verify website shows correct data

Phase 2: Index Separation (1 day)

New architecture:

MongoDB (crop_dev)
├── parts_* (9 collections) → parts_current alias
└── parts_kmt (tires)       → tires_current alias

Elasticsearch
├── parts_current → parts_v{date}
│   └── 2,581 documents (DIS parts only)
└── tires_current → tires_v{date}
    └── 7,091 documents (K&M tires only)

Required changes:

  1. Create tires_current alias and index
  2. Modify sync script to support multiple target indices
  3. Update search service to query correct index based on category
  4. Update website routing

Phase 3: Monitoring & Validation (1 day)

  1. Add post-sync validation step in workflow
  2. Create Datadog dashboard for index health
  3. Add Slack alerts for sync failures
  4. Create runbook for common issues

6. Refactoring Needs

sync-mongodb-to-es.ts

Current issues:

  • Single index target hardcoded
  • No progress reporting to CI
  • Silent failures possible

Proposed changes:

// Add index targeting
const INDEX_TARGETS = {
  parts_kmt: 'tires_current',
  default: 'parts_current'
};

// Add validation
async function validateSync(collection: string, expected: number) {
  const actual = await es.count({ index: getIndex(collection) });
  if (Math.abs(actual - expected) / expected > 0.01) {
    throw new Error(`Sync validation failed: expected ${expected}, got ${actual}`);
  }
}

// Add CI-friendly output
console.log(`::set-output name=docs_synced::${total}`);

gcp-sync-data.sh

Current issues:

  • Fixed memory allocation
  • No retry on failure
  • Log parsing fragile

Proposed changes:

# Dynamic resource allocation based on collection size
if [[ "$COLLECTION" == *"kmt"* ]]; then
  MEMORY="4Gi"
  TIMEOUT="3600s"
else
  MEMORY="2Gi"
  TIMEOUT="1800s"
fi

# Retry logic
MAX_RETRIES=3
for i in $(seq 1 $MAX_RETRIES); do
  if gcloud run jobs execute ...; then
    break
  fi
  echo "Retry $i of $MAX_RETRIES"
  sleep 30
done

7. Success Criteria

After implementing Phase 1:

  • ES doc count = 9,672
  • KMT docs = 7,091
  • Parts docs = 2,581
  • Website /parts/tires shows 7,091 tires
  • Website /parts/catalog shows 2,581 parts

After implementing Phase 2:

  • Separate indices: parts_current, tires_current
  • Independent sync jobs
  • Validation step passes

After implementing Phase 3:

  • Monitoring dashboard live
  • Alerts configured
  • Runbook documented

8. Risk Assessment

RiskLikelihoodImpactMitigation
Sync timeout againMediumHighIncrease resources, add retry
Data loss during reindexLowCriticalBlue-green deployment
Website downtimeLowHighAlias atomic switch
Memory exhaustionMediumMediumMonitor and tune

Appendix: Commands

Check current state

# ES document count
curl -s "https://search-service-atife5uvka-ue.a.run.app/health" | jq '.docCount_es'

# ES by manufacturer
curl -s "https://search-service-atife5uvka-ue.a.run.app/api/filters" | jq '.facets.manufacturer'

# MongoDB counts (via MCP)
# Use mcp__mongodb__count tool

Trigger manual sync

gh workflow run search-deploy.yml
gh run watch

Check sync job logs

gcloud logging read \
  "resource.type=cloud_run_job AND resource.labels.job_name=sync-data-auto" \
  --limit=100 \
  --project="noted-bliss-466410-q6"

On this page