Search Index Analysis & Improvement Plan

Date: 2026-01-21 Status: Critical issues identified, action required

Executive Summary

The current search index has critical data integrity issues:

Only 2,000 of 7,091 K&M tires are indexed (72% missing)
Index contains stale data from old syncs
Sync workflow is incomplete - missing collections
No separation between parts and tires indices

1. Current State Analysis

1.1 MongoDB Data (crop_dev)

Collection	Documents	Type	In Sync Config?
parts_nhl	1	Parts	Yes
parts_bns	1,307	Parts	Yes
parts_vnt	211	Parts	Yes
parts_mch	63	Parts	Yes
parts_kuh	495	Parts	NO
parts_hot	190	Parts	NO
parts_har	147	Parts	NO
parts_kin	101	Parts	NO
parts_mar	66	Parts	NO
Parts Subtotal	2,581
parts_kmt	7,091	Tires	Yes
Grand Total	9,672

1.2 Elasticsearch Index (parts_current → parts_v2026_01_20)

Manufacturer	ES Docs	MongoDB	Delta	Issue
KMT	2,000	7,091	-5,091	SYNC INCOMPLETE
BNS	1,307	1,307	0	OK
KUH	447	495	-48	Stale (not in sync)
VNT	211	211	0	OK
HOT	190	190	0	Stale (not in sync)
HAR	123	147	-24	Stale (not in sync)
KIN	101	101	0	Stale (not in sync)
MAR	66	66	0	Stale (not in sync)
MCH	63	63	0	OK
NHL	0	1	-1	MISSING
Total	4,508	9,672	-5,164

1.3 Current Sync Configuration

File: .github/workflows/search-deploy.yml (line 389)

bash services/search/scripts/gcp-sync-data.sh "parts_nhl,parts_bns,parts_vnt,parts_mch,parts_kmt"

Missing collections: parts_kuh,parts_hot,parts_har,parts_kin,parts_mar

2. Critical Issues Identified

Issue #1: K&M Tire Sync Incomplete (CRITICAL)

Expected: 7,091 tires
Actual: 2,000 tires
Missing: 5,091 tires (72%)
Root Cause: Unknown - likely Cloud Run job timeout or memory issue
Impact: Website shows only 28% of available tires

Issue #2: Stale Data in Index

Collections KUH, HOT, HAR, KIN, MAR exist in ES from OLD syncs
These are NOT in current sync config, so data never updates
Impact: Outdated product information

Issue #3: Missing Collections in Sync

Workflow only syncs 5 collections
Should sync all 10 parts collections
Impact: Inconsistent data, some manufacturers missing

Issue #4: No Index Separation

Parts and Tires share same index
User wants:
- parts_current → ~2,500 parts only
- tires_current → 7,091 K&M tires
Impact: Cannot manage catalogs independently

Issue #5: No Incremental Sync

Full re-sync on every deployment
No delta/change detection
Impact: Slow deployments, potential data loss during sync

3. Constructive Criticism

What We Did Wrong

Incomplete testing of K&M sync
- Added K&M to sync without verifying full dataset syncs
- Should have tested with production data volume locally
No validation after deployment
- Didn't check document counts after sync completed
- Health endpoint showed 4,508 docs but we didn't verify this was correct
Mixed old and new data
- Index has data from multiple syncs with different collection sets
- Should have created fresh index or cleaned up old data
No monitoring/alerting
- No alerts for sync failures or document count mismatches
- Should have Datadog/Sentry alerts for data integrity
Documentation gap
- DATA_FLOWS.md created but didn't catch these issues
- Need runbooks for common sync problems

What Worked Well

Transformer registration for KMT worked correctly
Manufacturer aliases configured properly
Website correctly displays data from ES
GitHub Actions workflow structure is solid

4. Recommendations

Immediate Actions (P0 - Today)

Fix K&M sync - full 7,091 tires

# Increase Cloud Run job memory and timeout
--memory=4Gi
--task-timeout=3600s  # 1 hour

Add all collections to sync

# Update search-deploy.yml line 389
bash services/search/scripts/gcp-sync-data.sh \
  "parts_nhl,parts_bns,parts_vnt,parts_mch,parts_kuh,parts_hot,parts_har,parts_kin,parts_mar,parts_kmt"

Create fresh index
- Delete stale parts_v2026_01_20
- Run full sync to new parts_v2026_01_21
- Switch alias atomically

Short-term Improvements (P1 - This Week)

Separate indices for parts vs tires

parts_current → parts_v{date}  (2,581 docs)
tires_current → tires_v{date}  (7,091 docs)

Add sync validation
- Compare MongoDB count vs ES count after sync
- Fail deployment if delta > 1%
Add monitoring
- Datadog metric: search.index.doc_count
- Alert if count drops by > 5%

Medium-term Improvements (P2 - This Month)

Incremental sync
- Track updatedAt in MongoDB
- Only sync changed documents
- Reduces sync time from 30min to ~5min
Blue-green index deployment
- Create new index alongside old
- Validate new index before switching alias
- Keep old index for instant rollback
Separate sync jobs
- sync-parts job for DIS collections
- sync-tires job for K&M collection
- Can run independently

5. Implementation Plan

Phase 1: Emergency Fix (2 hours)

graph LR
    A[Update sync config] --> B[Increase job resources]
    B --> C[Trigger manual sync]
    C --> D[Validate counts]
    D --> E[Switch alias]

Steps:

Update search-deploy.yml with all collections
Increase Cloud Run job memory to 4Gi
Manually trigger workflow
Verify ES doc count = 9,672
Verify website shows correct data

Phase 2: Index Separation (1 day)

New architecture:

MongoDB (crop_dev)
├── parts_* (9 collections) → parts_current alias
└── parts_kmt (tires)       → tires_current alias

Elasticsearch
├── parts_current → parts_v{date}
│   └── 2,581 documents (DIS parts only)
└── tires_current → tires_v{date}
    └── 7,091 documents (K&M tires only)

Required changes:

Create tires_current alias and index
Modify sync script to support multiple target indices
Update search service to query correct index based on category
Update website routing

Phase 3: Monitoring & Validation (1 day)

Add post-sync validation step in workflow
Create Datadog dashboard for index health
Add Slack alerts for sync failures
Create runbook for common issues

6. Refactoring Needs

sync-mongodb-to-es.ts

Current issues:

Single index target hardcoded
No progress reporting to CI
Silent failures possible

Proposed changes:

// Add index targeting
const INDEX_TARGETS = {
  parts_kmt: 'tires_current',
  default: 'parts_current'
};

// Add validation
async function validateSync(collection: string, expected: number) {
  const actual = await es.count({ index: getIndex(collection) });
  if (Math.abs(actual - expected) / expected > 0.01) {
    throw new Error(`Sync validation failed: expected ${expected}, got ${actual}`);
  }
}

// Add CI-friendly output
console.log(`::set-output name=docs_synced::${total}`);

gcp-sync-data.sh

Current issues:

Fixed memory allocation
No retry on failure
Log parsing fragile

Proposed changes:

# Dynamic resource allocation based on collection size
if [[ "$COLLECTION" == *"kmt"* ]]; then
  MEMORY="4Gi"
  TIMEOUT="3600s"
else
  MEMORY="2Gi"
  TIMEOUT="1800s"
fi

# Retry logic
MAX_RETRIES=3
for i in $(seq 1 $MAX_RETRIES); do
  if gcloud run jobs execute ...; then
    break
  fi
  echo "Retry $i of $MAX_RETRIES"
  sleep 30
done

7. Success Criteria

After implementing Phase 1:

After implementing Phase 2:

Separate indices: parts_current, tires_current
Independent sync jobs
Validation step passes

After implementing Phase 3:

Monitoring dashboard live
Alerts configured
Runbook documented

8. Risk Assessment

Risk	Likelihood	Impact	Mitigation
Sync timeout again	Medium	High	Increase resources, add retry
Data loss during reindex	Low	Critical	Blue-green deployment
Website downtime	Low	High	Alias atomic switch
Memory exhaustion	Medium	Medium	Monitor and tune

Appendix: Commands

Check current state

# ES document count
curl -s "https://search-service-atife5uvka-ue.a.run.app/health" | jq '.docCount_es'

# ES by manufacturer
curl -s "https://search-service-atife5uvka-ue.a.run.app/api/filters" | jq '.facets.manufacturer'

# MongoDB counts (via MCP)
# Use mcp__mongodb__count tool

Trigger manual sync

gh workflow run search-deploy.yml
gh run watch

Check sync job logs

gcloud logging read \
  "resource.type=cloud_run_job AND resource.labels.job_name=sync-data-auto" \
  --limit=100 \
  --project="noted-bliss-466410-q6"

Search Index Analysis & Improvement Plan

Executive Summary

1. Current State Analysis

1.1 MongoDB Data (crop_dev)

1.2 Elasticsearch Index (parts_current → parts_v2026_01_20)

1.3 Current Sync Configuration

2. Critical Issues Identified

Issue #1: K&M Tire Sync Incomplete (CRITICAL)

Issue #2: Stale Data in Index

Issue #3: Missing Collections in Sync

Issue #4: No Index Separation

Issue #5: No Incremental Sync

3. Constructive Criticism

What We Did Wrong

What Worked Well

4. Recommendations

Immediate Actions (P0 - Today)

Short-term Improvements (P1 - This Week)

Medium-term Improvements (P2 - This Month)

5. Implementation Plan

Phase 1: Emergency Fix (2 hours)

Phase 2: Index Separation (1 day)

Phase 3: Monitoring & Validation (1 day)

6. Refactoring Needs

sync-mongodb-to-es.ts

gcp-sync-data.sh

7. Success Criteria

8. Risk Assessment

Appendix: Commands

Check current state

Trigger manual sync

Check sync job logs

Search Index Analysis & Improvement Plan

On this page