
CROP Search - Cleanup and Sync Plan

Date: 2026-01-23 | Status: In Progress | Priority: CRITICAL

Current Issues

Data

| Source | Parts | Tires (KMT) | Total |
|---|---|---|---|
| MongoDB crop_dev | 2,581 | 7,091 | 9,672 |
| ES parts_current | 2,594 | 2,000 ❌ | 4,594 |
| ES tires_current | 0 | 7,091 ✅ | 7,091 |

Problems:

  1. ❌ 2,000 KMT tire documents are in parts_current (they should not be there at all).
  2. ❌ Only 2,000 of the 7,091 KMT documents reached parts_current, so the sync that put them there was also incomplete.
  3. ✅ tires_current contains the correct 7,091 KMT documents.

Expected State After Fix

| Index | Contents | Count |
|---|---|---|
| parts_current | Parts only (productType='part') | ~2,581 |
| tires_current | K&M Tires only (productType='tire') | 7,091 |
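The counts in the two tables above can be cross-checked mechanically before touching any index. This is a minimal sketch that encodes the table figures and derives what Phase 1 should remove; the `IndexCounts` type and `planCleanup` helper are hypothetical and not part of the codebase.

```typescript
// Hypothetical helper: derives the expected effect of the Phase 1 cleanup
// from the counts in the "Current Issues" table.
interface IndexCounts {
  parts: number;
  tires: number;
}

interface CleanupPlan {
  kmtToDelete: number;        // tire docs that must leave parts_current
  partsAfterCleanup: number;  // parts_current size once KMT is gone
  partsDriftVsMongo: number;  // ES parts minus MongoDB parts (0 = in sync)
  tiresMissing: number;       // KMT docs present in MongoDB but not in tires_current
  partsInTiresIndex: number;  // part docs wrongly stored in tires_current
}

function planCleanup(mongo: IndexCounts, partsIndex: IndexCounts, tiresIndex: IndexCounts): CleanupPlan {
  return {
    kmtToDelete: partsIndex.tires,
    partsAfterCleanup: partsIndex.parts,
    partsDriftVsMongo: partsIndex.parts - mongo.parts,
    tiresMissing: mongo.tires - tiresIndex.tires,
    partsInTiresIndex: tiresIndex.parts,
  };
}

// Figures from the table (2026-01-23).
const plan = planCleanup(
  { parts: 2581, tires: 7091 }, // MongoDB crop_dev
  { parts: 2594, tires: 2000 }, // ES parts_current
  { parts: 0, tires: 7091 },    // ES tires_current
);
// kmtToDelete = 2000, partsAfterCleanup = 2594, partsDriftVsMongo = 13, tiresMissing = 0
console.log(plan);
```

The non-zero `partsDriftVsMongo` corresponds to the gap between the ~2,581 in the Expected State table and the 2,594 that Phase 1 expects; deleting KMT documents does not change the parts count, so the cleanup alone will not close it.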

Fix Plan

Phase 1: Remove KMT from parts_current

Goal: Remove 2,000 KMT documents from parts_current

# 1. Check current state
curl -s "https://api.crop-dev.app/api/filters" | jq '.facets.manufacturer[] | select(.key=="kmt")'

# 2. Delete KMT from parts_current (via SSH on VM)
gcloud compute ssh elasticsearch-vm --zone=us-east1-b --project=noted-bliss-466410-q6 --tunnel-through-iap \
  --command="curl -X POST 'localhost:9200/parts_current/_delete_by_query' \
    -H 'Content-Type: application/json' \
    -d '{\"query\":{\"term\":{\"manufacturer.code\":\"kmt\"}}}'"

# 3. Force refresh
gcloud compute ssh elasticsearch-vm --zone=us-east1-b --project=noted-bliss-466410-q6 --tunnel-through-iap \
  --command="curl -X POST 'localhost:9200/parts_current/_refresh'"

# 4. Verify result
curl -s "https://api.crop-dev.app/health" | jq '{docCount_es, elasticsearch}'
curl -s "https://api.crop-dev.app/api/filters" | jq '.total'

Expected result: parts_current = 2,594 (without KMT)
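`_delete_by_query` is not transactional: it processes documents in batches and reports problems in its response body rather than rolling back. A minimal sketch of a check for that response, assuming the curl output has been saved as JSON; `checkDeleteResponse` is a hypothetical helper, while the fields it reads (`timed_out`, `total`, `deleted`, `version_conflicts`, `failures`) are part of the standard Elasticsearch `_delete_by_query` response.

```typescript
// Subset of the Elasticsearch _delete_by_query response body.
interface DeleteByQueryResponse {
  timed_out: boolean;
  total: number;
  deleted: number;
  version_conflicts: number;
  failures: unknown[];
}

// Hypothetical helper: returns a list of problems; an empty list means the
// KMT cleanup removed exactly the expected number of documents.
function checkDeleteResponse(res: DeleteByQueryResponse, expectedDeleted: number): string[] {
  const problems: string[] = [];
  if (res.timed_out) problems.push('request timed out');
  if (res.failures.length > 0) problems.push(`${res.failures.length} failures reported`);
  if (res.version_conflicts > 0) problems.push(`${res.version_conflicts} version conflicts`);
  if (res.deleted !== expectedDeleted) {
    problems.push(`deleted ${res.deleted}, expected ${expectedDeleted}`);
  }
  return problems;
}

// Example: the response Phase 1 should produce (2,000 KMT docs in parts_current).
const ok = checkDeleteResponse(
  { timed_out: false, total: 2000, deleted: 2000, version_conflicts: 0, failures: [] },
  2000,
);
console.log(ok.length === 0 ? 'cleanup OK' : ok.join('; '));
```

If the check reports a shortfall, re-running the same `_delete_by_query` is safe: it only matches KMT documents that are still present.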


Phase 2: Fix Architecture

2.1 Update env-manager-service.ts

Remove parts_kmt from the parts collections list and move it to a separate tires list:

// services/search/src/services/env-manager-service.ts

// Collections for parts_current (no tires)
const PARTS_COLLECTIONS = [
  'parts_nhl', 'parts_bns', 'parts_vnt',
  'parts_kuh', 'parts_hot', 'parts_har', 'parts_mar', 'parts_kin', 'parts_mch',
];

// Collections for tires_current
const TIRES_COLLECTIONS = ['parts_kmt'];

export const ENVIRONMENTS: Record<EnvironmentId, EnvironmentConfig> = {
  crop_dev: {
    name: 'Development',
    collections: PARTS_COLLECTIONS,  // WITHOUT parts_kmt!
    tiresCollections: TIRES_COLLECTIONS,
    description: 'Development dataset with all vendors',
  },
  // ...
};
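Splitting the lists only helps if no collection ends up in both, since such a collection would again be indexed into parts_current as well as tires_current. A minimal sketch of a startup guard for that invariant; `findOverlap` is a hypothetical helper, and the lists are copied from the snippet above.

```typescript
// Hypothetical startup check: each collection must feed exactly one index,
// otherwise its documents land in both parts_current and tires_current.
const PARTS_COLLECTIONS = [
  'parts_nhl', 'parts_bns', 'parts_vnt',
  'parts_kuh', 'parts_hot', 'parts_har', 'parts_mar', 'parts_kin', 'parts_mch',
];
const TIRES_COLLECTIONS = ['parts_kmt'];

// Returns the collection names present in both lists.
function findOverlap(a: readonly string[], b: readonly string[]): string[] {
  return a.filter((name) => b.indexOf(name) !== -1);
}

const overlap = findOverlap(PARTS_COLLECTIONS, TIRES_COLLECTIONS);
if (overlap.length > 0) {
  throw new Error(`Collections routed to both indices: ${overlap.join(', ')}`);
}
```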

2.2 Add auto-routing to sync script

// services/search/scripts/sync-mongodb-to-es.ts

const TIRE_COLLECTIONS = new Set(['parts_kmt']);

function getTargetIndex(collectionName: string): string {
  if (TIRE_COLLECTIONS.has(collectionName)) {
    return env.TIRES_INDEX_NAME; // tires_current
  }
  return env.SEARCH_INDEX_NAME; // parts_current
}
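Routing decides the target index, but documents written by later syncs also need `productType` (Phase 3); unless MongoDB already stores that field, re-synced documents would otherwise lack it and drop out of the `productType` filter. A self-contained sketch that derives both values from the collection name; the index names are hard-coded as stand-ins for `env.TIRES_INDEX_NAME` / `env.SEARCH_INDEX_NAME`, and `routeDocument` / `prepareForIndex` are hypothetical.

```typescript
type ProductType = 'part' | 'tire';

interface Route {
  index: string;
  productType: ProductType;
}

// Stand-ins for env.TIRES_INDEX_NAME / env.SEARCH_INDEX_NAME.
const TIRES_INDEX = 'tires_current';
const PARTS_INDEX = 'parts_current';

const TIRE_COLLECTIONS = new Set(['parts_kmt']);

// Decide target index and productType together so they can never disagree.
function routeDocument(collectionName: string): Route {
  if (TIRE_COLLECTIONS.has(collectionName)) {
    return { index: TIRES_INDEX, productType: 'tire' };
  }
  return { index: PARTS_INDEX, productType: 'part' };
}

// Stamp productType onto a document before it is bulk-indexed.
function prepareForIndex<T extends object>(
  collectionName: string,
  doc: T,
): { index: string; doc: T & { productType: ProductType } } {
  const route = routeDocument(collectionName);
  return { index: route.index, doc: { ...doc, productType: route.productType } };
}
```

Keeping the index and the `productType` value in one return value means a collection cannot be sent to tires_current while being tagged as a part.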

Phase 3: Add productType

3.1 Apply new ES mapping

# Already added to elasticsearch-mapping.json:
# "productType": { "type": "keyword", "normalizer": "lowercase_ascii" }

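# Update mapping on VM (the lowercase_ascii normalizer must already be defined in each index's settings, otherwise the PUT is rejected)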
gcloud compute ssh elasticsearch-vm --zone=us-east1-b --project=noted-bliss-466410-q6 --tunnel-through-iap \
  --command="curl -X PUT 'localhost:9200/parts_current/_mapping' \
    -H 'Content-Type: application/json' \
    -d '{\"properties\":{\"productType\":{\"type\":\"keyword\",\"normalizer\":\"lowercase_ascii\"}}}'"

gcloud compute ssh elasticsearch-vm --zone=us-east1-b --project=noted-bliss-466410-q6 --tunnel-through-iap \
  --command="curl -X PUT 'localhost:9200/tires_current/_mapping' \
    -H 'Content-Type: application/json' \
    -d '{\"properties\":{\"productType\":{\"type\":\"keyword\",\"normalizer\":\"lowercase_ascii\"}}}'"

3.2 Update productType in existing data

# Set productType='part' for all in parts_current
gcloud compute ssh elasticsearch-vm --zone=us-east1-b --project=noted-bliss-466410-q6 --tunnel-through-iap \
  --command="curl -X POST 'localhost:9200/parts_current/_update_by_query' \
    -H 'Content-Type: application/json' \
    -d '{\"script\":{\"source\":\"ctx._source.productType = \\\"part\\\"\"},\"query\":{\"match_all\":{}}}'"

# Set productType='tire' for all in tires_current
gcloud compute ssh elasticsearch-vm --zone=us-east1-b --project=noted-bliss-466410-q6 --tunnel-through-iap \
  --command="curl -X POST 'localhost:9200/tires_current/_update_by_query' \
    -H 'Content-Type: application/json' \
    -d '{\"script\":{\"source\":\"ctx._source.productType = \\\"tire\\\"\"},\"query\":{\"match_all\":{}}}'"
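The nested shell escaping in 3.1 and 3.2 is easy to get wrong. One alternative is to generate the JSON bodies and pass the value through Painless `params` instead of splicing a quoted string into the script source. A minimal sketch; `phase3Requests` is hypothetical, it reuses the mapping from 3.1, and the `conflicts=proceed&refresh=true` query parameters are an added choice rather than part of the plan.

```typescript
type ProductType = 'part' | 'tire';

interface EsRequest {
  method: 'PUT' | 'POST';
  path: string;
  body: object;
}

const INDEX_PRODUCT_TYPES: Record<string, ProductType> = {
  parts_current: 'part',
  tires_current: 'tire',
};

// Builds the 3.1 mapping update and the 3.2 backfill for every index.
function phase3Requests(): EsRequest[] {
  const requests: EsRequest[] = [];
  for (const index of Object.keys(INDEX_PRODUCT_TYPES)) {
    const productType = INDEX_PRODUCT_TYPES[index];
    requests.push({
      method: 'PUT',
      path: `/${index}/_mapping`,
      body: { properties: { productType: { type: 'keyword', normalizer: 'lowercase_ascii' } } },
    });
    requests.push({
      method: 'POST',
      // conflicts=proceed counts version conflicts instead of aborting;
      // refresh=true makes the backfill visible to searches immediately.
      path: `/${index}/_update_by_query?conflicts=proceed&refresh=true`,
      body: {
        script: { source: 'ctx._source.productType = params.productType', lang: 'painless', params: { productType } },
        query: { match_all: {} },
      },
    });
  }
  return requests;
}

// Each line can be pasted into curl: -X <method> localhost:9200<path> -d '<body>'
for (const r of phase3Requests()) {
  console.log(r.method, r.path, JSON.stringify(r.body));
}
```

Because the printed bodies contain no single quotes, each one can be wrapped in `-d '...'` without extra escaping on the VM side.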

3.3 Add productType filter to Search API

// services/search/src/schemas/search.ts
productType: z.enum(['part', 'tire', 'accessory']).optional(),

// services/search/src/utils/query-builder.ts
if (params.productType) {
  filters.push({ term: { productType: params.productType } });
}
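For reference, the validation and filter logic in isolation. This sketch checks the raw query-string value by hand instead of with zod so that it runs without dependencies; `parseProductType` and `buildFilters` are hypothetical stand-ins for the schema and query-builder code above.

```typescript
const PRODUCT_TYPES = ['part', 'tire', 'accessory'] as const;
type ProductType = (typeof PRODUCT_TYPES)[number];

type TermFilter = { term: Record<string, string> };

// Mirrors z.enum([...]).optional(): undefined is allowed, unknown values are rejected.
function parseProductType(raw: string | undefined): ProductType | undefined {
  if (raw === undefined) return undefined;
  if ((PRODUCT_TYPES as readonly string[]).indexOf(raw) !== -1) return raw as ProductType;
  throw new Error(`Invalid productType: ${raw}`);
}

// Adds the productType term filter only when the caller asked for one,
// so requests without the parameter keep searching everything.
function buildFilters(params: { productType?: ProductType }): TermFilter[] {
  const filters: TermFilter[] = [];
  if (params.productType) {
    filters.push({ term: { productType: params.productType } });
  }
  return filters;
}

// [{"term":{"productType":"tire"}}]
console.log(JSON.stringify(buildFilters({ productType: parseProductType('tire') })));
```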

Phase 4: Testing

# 1. Health check
curl -s "https://api.crop-dev.app/health" | jq

# 2. Check parts_current (should be ~2,594, all productType='part')
curl -s "https://api.crop-dev.app/api/search?productType=part&limit=1" | jq '.pagination.total'

# 3. Check tires_current (should be 7,091, all productType='tire')
curl -s "https://api.crop-dev.app/api/search?productType=tire&limit=1" | jq '.pagination.total'

# 4. Check search without filter (should be ~9,685 = 2,594 parts + 7,091 tires)
curl -s "https://api.crop-dev.app/api/search?q=*&limit=1" | jq '.pagination.total'

# 5. Verify KMT not in parts (manufacturer filter)
curl -s "https://api.crop-dev.app/api/filters" | jq '.facets.manufacturer[] | select(.key=="kmt")'
# Should print nothing (no kmt bucket)
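The Phase 4 checks can be bundled into one script that fails loudly. A minimal sketch, assuming the response shapes implied by the `jq` paths above (`pagination.total`, `facets.manufacturer[].key`); the HTTP client is injected so the evaluation logic can be tested without the live API, and `evaluate` / `collect` are hypothetical names.

```typescript
// Response shapes assumed from the jq paths used in Phase 4.
interface SearchResponse { pagination: { total: number } }
interface FiltersResponse { facets: { manufacturer: { key: string }[] } }

type GetJson = (url: string) => Promise<unknown>;

const API = 'https://api.crop-dev.app';

// Expected totals from the plan (2,594 + 7,091 = 9,685).
const EXPECTED = { part: 2594, tire: 7091, all: 9685 };

interface SmokeResults { part: number; tire: number; all: number; manufacturers: string[] }

// Pure check: returns human-readable failures (empty array = pass).
// tolerance allows for the "~" figures; 0 demands exact counts.
function evaluate(r: SmokeResults, tolerance = 0): string[] {
  const failures: string[] = [];
  for (const key of ['part', 'tire', 'all'] as const) {
    if (Math.abs(r[key] - EXPECTED[key]) > tolerance) {
      failures.push(`${key}: got ${r[key]}, expected ${EXPECTED[key]}`);
    }
  }
  if (r.manufacturers.indexOf('kmt') !== -1) failures.push('kmt still present in /api/filters');
  return failures;
}

// Runs the same requests as checks 2-5 above through the injected client.
async function collect(getJson: GetJson): Promise<SmokeResults> {
  const total = async (query: string) =>
    ((await getJson(`${API}/api/search?${query}&limit=1`)) as SearchResponse).pagination.total;
  const filters = (await getJson(`${API}/api/filters`)) as FiltersResponse;
  return {
    part: await total('productType=part'),
    tire: await total('productType=tire'),
    all: await total('q=*'),
    manufacturers: filters.facets.manufacturer.map((m) => m.key),
  };
}
```

Usage, with any client that returns parsed JSON: `collect(async (u) => (await fetch(u)).json()).then((r) => console.log(evaluate(r)))`.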

Phase 5: Frontend Integration

/catalog page

// Add productType: 'part' to buildSearchParams
const searchParams = {
  ...params,
  productType: 'part',  // Parts only
};

/tires page

// Add productType: 'tire' to buildSearchParams
const searchParams = {
  ...params,
  productType: 'tire',  // Tires only
};

Search (global)

// Don't specify productType - search across all
const searchParams = {
  ...params,
  // productType: not specified
};
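The three cases differ only in whether `productType` is set, so they could share one helper. A minimal sketch; `SearchParams`, `PageKind`, and `withProductType` are hypothetical names, not the existing `buildSearchParams`. Omitting the key for global search, rather than setting it to `undefined`, avoids query-string serializers that turn `undefined` into the literal string "undefined".

```typescript
type ProductType = 'part' | 'tire';
type PageKind = 'catalog' | 'tires' | 'search';

interface SearchParams {
  q?: string;
  page?: number;
  productType?: ProductType;
}

const PAGE_PRODUCT_TYPE: Record<PageKind, ProductType | undefined> = {
  catalog: 'part',   // /catalog shows parts only
  tires: 'tire',     // /tires shows tires only
  search: undefined, // global search spans both indices
};

// The page, not the caller, decides productType.
function withProductType(page: PageKind, params: SearchParams): SearchParams {
  const rest: SearchParams = { ...params };
  delete rest.productType; // drop any caller-supplied value
  const productType = PAGE_PRODUCT_TYPE[page];
  return productType ? { ...rest, productType } : rest;
}
```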

Execution Checklist

Phase 1: Cleanup (15 min)

  • Delete KMT from parts_current
  • Refresh index
  • Verify count = 2,594

Phase 2: Architecture (30 min)

  • Update env-manager-service.ts
  • Add TIRE_COLLECTIONS
  • Add auto-routing to sync script
  • Test locally

Phase 3: productType (30 min)

  • Apply ES mapping
  • Update by query for parts_current
  • Update by query for tires_current
  • Add filter to Search API
  • Test API

Phase 4: Testing (15 min)

  • Health check
  • productType=part → 2,594
  • productType=tire → 7,091
  • Search all → ~9,685
  • KMT absent from /api/filters for parts

Phase 5: Frontend (separate PR)

  • /catalog → productType: 'part'
  • /tires → productType: 'tire'
  • Search without filter

Risks and Mitigation

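| Risk | Probability | Mitigation |
|---|---|---|
| Data loss during delete_by_query | Low | tires_current contains the full KMT data set |
| API downtime | Low | Indices stay online throughout; _delete_by_query and _update_by_query are not atomic (they run in batches), so check each response for failures and re-run if needed |
| Frontend breaks | Medium | productType is optional, so existing requests remain backward compatible |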

Quick Commands for Phase 1

# Execute all at once:

# 1. Delete KMT from parts_current
gcloud compute ssh elasticsearch-vm --zone=us-east1-b --project=noted-bliss-466410-q6 --tunnel-through-iap \
  --command="curl -X POST 'localhost:9200/parts_current/_delete_by_query?refresh=true' \
    -H 'Content-Type: application/json' \
    -d '{\"query\":{\"term\":{\"manufacturer.code\":\"kmt\"}}}'"

# 2. Verify
curl -s "https://api.crop-dev.app/health" | jq '{ok, docCount_es}'
curl -s "https://api.crop-dev.app/api/filters" | jq '[.facets.manufacturer[].key]'
