CROP

Search Index Improvement

Analysis of search index data integrity issues and improvement plan for Elasticsearch sync.

Search Index Analysis & Improvement Plan

[!NOTE] Discussion: Open an issue to comment on this plan. Attach supporting documents to the issue or link them here.

Date: 2026-01-21
Status: Critical issues identified, action required

Executive Summary

The current search index has critical data integrity issues:

  • Only 2,000 of 7,091 K&M tires are indexed (72% missing)
  • Index contains stale data from old syncs
  • Sync workflow is incomplete: only 5 of 10 collections are in the sync config
  • No separation between parts and tires indices

Current State Analysis

MongoDB Data (crop_dev)

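| Collection | Documents | Type | In Sync Config? |
|---|---|---|---|
| parts_nhl | 1 | Parts | Yes |
| parts_bns | 1,307 | Parts | Yes |
| parts_vnt | 211 | Parts | Yes |
| parts_mch | 63 | Parts | Yes |
| parts_kuh | 495 | Parts | NO |
| parts_hot | 190 | Parts | NO |
| parts_har | 147 | Parts | NO |
| parts_kin | 101 | Parts | NO |
| parts_mar | 66 | Parts | NO |
| Parts Subtotal | 2,581 | | |
| parts_kmt | 7,091 | Tires | Yes |
| Grand Total | 9,672 | | |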

Elasticsearch Index (parts_current)

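| Manufacturer | ES Docs | MongoDB | Delta | Issue |
|---|---|---|---|---|
| KMT | 2,000 | 7,091 | -5,091 | SYNC INCOMPLETE |
| BNS | 1,307 | 1,307 | 0 | OK |
| KUH | 447 | 495 | -48 | Stale (not in sync) |
| VNT | 211 | 211 | 0 | OK |
| HOT | 190 | 190 | 0 | Stale (not in sync) |
| HAR | 123 | 147 | -24 | Stale (not in sync) |
| KIN | 101 | 101 | 0 | Stale (not in sync) |
| MAR | 66 | 66 | 0 | Stale (not in sync) |
| MCH | 63 | 63 | 0 | OK |
| NHL | 0 | 1 | -1 | MISSING |
| Total | 4,508 | 9,672 | -5,164 | |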

Critical Issues

Issue #1: K&M Tire Sync Incomplete (CRITICAL)

  • Expected: 7,091 tires
  • Actual: 2,000 tires (72% missing)
  • Root Cause: Unknown — likely Cloud Run job timeout or memory issue

Issue #2: Stale Data in Index

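  • Documents for KUH, HOT, HAR, KIN, and MAR exist in ES only because of old syncs
  • These collections are not in the current sync config, so their data never updates; KUH and HAR have already drifted from MongoDB (-48 and -24 documents)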

Issue #3: Missing Collections in Sync

  • The workflow syncs only 5 of the 10 collections; parts_kuh, parts_hot, parts_har, parts_kin, and parts_mar are missing from the sync config

Issue #4: No Index Separation

  • Parts and tires share the same index (parts_current)
  • Separate parts_current and tires_current indices are needed
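
As a rough illustration of the split, each source collection could be routed to its target index by type. This is a minimal sketch, assuming the type assignment from the MongoDB table (only parts_kmt holds tires); the `TIRE_COLLECTIONS` set and the `target_index` helper are hypothetical names, not part of the existing sync code.

```python
# Assumption: mirrors the MongoDB table in this document, where
# parts_kmt is the only collection of type "Tires".
TIRE_COLLECTIONS = {"parts_kmt"}


def target_index(collection: str) -> str:
    """Return the index a collection's documents should be written to."""
    if collection in TIRE_COLLECTIONS:
        return "tires_current"
    return "parts_current"


for name in ("parts_kmt", "parts_bns", "parts_kuh"):
    print(name, "->", target_index(name))
```

Keeping the mapping in one place means a new tire collection only needs to be added to the set, not to every sync step.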

Issue #5: No Incremental Sync

  • Every deployment triggers a full re-sync; there is no delta or change detection
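
One possible shape for change detection is a timestamp checkpoint: the sync job remembers when it last ran and asks MongoDB only for documents modified since then. The sketch below builds such a query filter. It assumes every document carries an `updatedAt` field that is set on each write; the `incremental_filter` helper and the checkpoint value are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def incremental_filter(last_sync: Optional[datetime]) -> Dict[str, Any]:
    """Build a MongoDB query filter for documents changed since last_sync."""
    if last_sync is None:
        # No checkpoint recorded yet: an empty filter matches every
        # document, which amounts to a full sync.
        return {}
    return {"updatedAt": {"$gt": last_sync}}


# Hypothetical checkpoint, for illustration only.
checkpoint = datetime(2026, 1, 21, tzinfo=timezone.utc)
print(incremental_filter(None))
print(incremental_filter(checkpoint))
```

A timestamp filter like this does not see documents that were hard-deleted in MongoDB, so deletions would need separate handling, for example a periodic full sync.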

Recommendations

Immediate Actions (P0)

  1. Fix K&M sync — increase Cloud Run job memory to 4Gi, timeout to 3600s
  2. Add all collections to sync — update search-deploy.yml
  3. Create fresh index — delete stale data, run full sync

Short-term (P1)

  1. Separate indices for parts vs tires
  2. Add sync validation — compare MongoDB vs ES counts after sync
  3. Add monitoring — Datadog metric for index doc count
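
The validation step could be as simple as comparing per-manufacturer counts from both stores and failing the job on any mismatch. The sketch below uses the counts from the tables in this document as literal input. In a real job they would be fetched from MongoDB and Elasticsearch, and how ES documents are grouped by manufacturer is not specified here, so that part is left out. `compare_counts` is a hypothetical helper.

```python
from typing import Dict, List


def compare_counts(mongo_counts: Dict[str, int], es_counts: Dict[str, int]) -> List[str]:
    """Return one report line per manufacturer whose counts differ."""
    mismatches = []
    for name in sorted(mongo_counts):
        expected = mongo_counts[name]
        actual = es_counts.get(name, 0)  # absent from ES counts as zero docs
        if actual != expected:
            mismatches.append(
                f"{name}: ES={actual} MongoDB={expected} delta={actual - expected}"
            )
    return mismatches


# Counts copied from the tables in this document.
mongo = {"KMT": 7091, "BNS": 1307, "KUH": 495, "VNT": 211, "HOT": 190,
         "HAR": 147, "KIN": 101, "MAR": 66, "MCH": 63, "NHL": 1}
es = {"KMT": 2000, "BNS": 1307, "KUH": 447, "VNT": 211, "HOT": 190,
      "HAR": 123, "KIN": 101, "MAR": 66, "MCH": 63, "NHL": 0}

problems = compare_counts(mongo, es)
for line in problems:
    print(line)
```

With the current data this reports HAR, KMT, KUH, and NHL. A sync job could exit non-zero whenever the list is non-empty.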

Medium-term (P2)

  1. Incremental sync — track updatedAt, only sync changes
  2. Blue-green index deployment — validate before switching alias
  3. Separate sync jobs — independent jobs for parts and tires
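
For the blue-green step, the usual Elasticsearch mechanism is an alias that is repointed with a single request to the `_aliases` endpoint, whose `remove` and `add` actions are applied atomically. The sketch below only builds that request body and gates it on a count check. It assumes `parts_current` becomes an alias rather than a concrete index; the versioned index names `parts_v1` and `parts_v2` and both helper functions are hypothetical.

```python
from typing import Any, Dict, Optional


def alias_swap_body(alias: str, old_index: str, new_index: str) -> Dict[str, Any]:
    """Request body for POST /_aliases that moves alias to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


def plan_cutover(alias: str, old_index: str, new_index: str,
                 expected_docs: int, actual_docs: int) -> Optional[Dict[str, Any]]:
    """Return the swap body only if the new index passed validation."""
    if actual_docs != expected_docs:
        return None  # keep serving from the old index
    return alias_swap_body(alias, old_index, new_index)


# Parts subtotal from this document: 2,581 documents expected.
print(plan_cutover("parts_current", "parts_v1", "parts_v2", 2581, 2581))
print(plan_cutover("parts_current", "parts_v1", "parts_v2", 2581, 2000))
```

Because the alias only moves after validation, a failed or partial sync leaves search traffic on the previous index.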

Success Criteria

After Phase 1 (Immediate Actions, P0):

  • ES doc count = 9,672
  • KMT docs = 7,091
  • Parts docs = 2,581

After Phase 2 (P1 and P2):

  • Separate indices: parts_current, tires_current
  • Independent sync jobs
  • Validation step passes
