CROP Search - Cleanup and Sync Plan
> Date: 2026-01-23
> Status: In Progress
> Priority: CRITICAL
Current Issues
Data
| Source | Parts | Tires (KMT) | Total |
|---|---|---|---|
| MongoDB crop_dev | 2,581 | 7,091 | 9,672 |
| ES parts_current | 2,594 | 2,000 ❌ | 4,594 |
| ES tires_current | 0 | 7,091 ✅ | 7,091 |
Problems:
- ❌ 2,000 KMT tires are in parts_current (they should not be there)
- ❌ That copy is also incomplete: only 2,000 of the 7,091 KMT tires reached parts_current
- ✅ tires_current contains correct 7,091 KMT
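The numbers in the table can be reconciled with a few lines of arithmetic. The sketch below only restates the counts above; the variable names are illustrative. It also shows a 13-document gap between the parts count in ES and in MongoDB, which this plan does not explain.

```typescript
// Counts copied from the table above.
interface SourceCounts {
  parts: number;
  tires: number;
}

const mongoCounts: SourceCounts = { parts: 2581, tires: 7091 };
const esPartsCounts: SourceCounts = { parts: 2594, tires: 2000 };
const esTiresCounts: SourceCounts = { parts: 0, tires: 7091 };

// Documents sitting in an index that should not hold their product type.
const misplacedTires = esPartsCounts.tires; // tires inside parts_current
const misplacedParts = esTiresCounts.parts; // parts inside tires_current

// Share of the KMT tires that the incomplete sync copied into parts_current.
const tireSyncRatio = esPartsCounts.tires / mongoCounts.tires;

// Difference between the ES and MongoDB part counts.
const partsDelta = esPartsCounts.parts - mongoCounts.parts;
```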
Expected State After Fix
| Index | Contents | Count |
|---|---|---|
| parts_current | Parts only (productType='part') | ~2,594 (2,581 in MongoDB) |
| tires_current | K&M Tires only (productType='tire') | 7,091 |
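Once productType is populated (Phase 3), the expected state can be expressed as an invariant: each index should contain zero documents of any other type. A minimal sketch of a request body for the Elasticsearch `_count` API, assuming the hypothetical helper name `buildViolationCountBody`; a count of 0 means the index is clean.

```typescript
type ProductType = 'part' | 'tire';

// Expected contents per index, taken from the table above.
const EXPECTED_TYPE: Record<string, ProductType> = {
  parts_current: 'part',
  tires_current: 'tire',
};

// Body for `POST /<index>/_count`. It matches every document whose productType
// is not the expected one, including documents that have no productType at all.
function buildViolationCountBody(index: string) {
  const expected: ProductType | undefined = EXPECTED_TYPE[index];
  if (expected === undefined) {
    throw new Error(`No expected product type for index "${index}"`);
  }
  return {
    query: {
      bool: {
        must_not: [{ term: { productType: expected } }],
      },
    },
  };
}
```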
Fix Plan
Phase 1: Clean parts_current from KMT
Goal: Remove 2,000 KMT documents from parts_current
# 1. Check current state
curl -s "https://api.crop-dev.app/api/filters" | jq '.facets.manufacturer[] | select(.key=="kmt")'
# 2. Delete KMT from parts_current (via SSH on VM)
gcloud compute ssh elasticsearch-vm --zone=us-east1-b --project=noted-bliss-466410-q6 --tunnel-through-iap \
--command="curl -X POST 'localhost:9200/parts_current/_delete_by_query' \
-H 'Content-Type: application/json' \
-d '{\"query\":{\"term\":{\"manufacturer.code\":\"kmt\"}}}'"
# 3. Force refresh
gcloud compute ssh elasticsearch-vm --zone=us-east1-b --project=noted-bliss-466410-q6 --tunnel-through-iap \
--command="curl -X POST 'localhost:9200/parts_current/_refresh'"
# 4. Verify result
curl -s "https://api.crop-dev.app/health" | jq '{docCount_es, elasticsearch}'
curl -s "https://api.crop-dev.app/api/filters" | jq '.total'
Expected result: parts_current = 2,594 (without KMT)
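The JSON that `_delete_by_query` returns in step 2 is worth inspecting before moving on, since the operation can stop partway. The sketch below checks a subset of the documented response fields; `checkDeleteResult` is a hypothetical helper, not existing project code.

```typescript
// Subset of the fields Elasticsearch returns from _delete_by_query.
interface DeleteByQueryResponse {
  timed_out: boolean;
  total: number;
  deleted: number;
  version_conflicts: number;
  failures: unknown[];
}

// Returns a list of problems; an empty list means the cleanup looks complete.
function checkDeleteResult(res: DeleteByQueryResponse, expectedDeleted: number): string[] {
  const problems: string[] = [];
  if (res.timed_out) problems.push('request timed out');
  if (res.failures.length > 0) problems.push(`${res.failures.length} failure(s) reported`);
  if (res.version_conflicts > 0) problems.push(`${res.version_conflicts} version conflict(s)`);
  if (res.deleted !== expectedDeleted) {
    problems.push(`deleted ${res.deleted}, expected ${expectedDeleted}`);
  }
  return problems;
}
```

For this plan, `expectedDeleted` would be 2000; any reported problem suggests re-running the delete before verifying counts.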
Phase 2: Fix Architecture
2.1 Update env-manager-service.ts
Remove parts_kmt from the parts collections list:
// services/search/src/services/env-manager-service.ts
// Collections for parts_current (no tires)
const PARTS_COLLECTIONS = [
'parts_nhl', 'parts_bns', 'parts_vnt',
'parts_kuh', 'parts_hot', 'parts_har', 'parts_mar', 'parts_kin', 'parts_mch',
];
// Collections for tires_current
const TIRES_COLLECTIONS = ['parts_kmt'];
export const ENVIRONMENTS: Record<EnvironmentId, EnvironmentConfig> = {
crop_dev: {
name: 'Development',
collections: PARTS_COLLECTIONS, // WITHOUT parts_kmt!
tiresCollections: TIRES_COLLECTIONS,
description: 'Development dataset with all vendors',
},
// ...
};
2.2 Add auto-routing to sync script
// services/search/scripts/sync-mongodb-to-es.ts
const TIRE_COLLECTIONS = new Set(['parts_kmt']);
function getTargetIndex(collectionName: string): string {
if (TIRE_COLLECTIONS.has(collectionName)) {
return env.TIRES_INDEX_NAME; // tires_current
}
return env.SEARCH_INDEX_NAME; // parts_current
}
Phase 3: Add productType
3.1 Apply new ES mapping
# Already added to elasticsearch-mapping.json:
# "productType": { "type": "keyword", "normalizer": "lowercase_ascii" }
# Update mapping on VM
gcloud compute ssh elasticsearch-vm --zone=us-east1-b --project=noted-bliss-466410-q6 --tunnel-through-iap \
--command="curl -X PUT 'localhost:9200/parts_current/_mapping' \
-H 'Content-Type: application/json' \
-d '{\"properties\":{\"productType\":{\"type\":\"keyword\",\"normalizer\":\"lowercase_ascii\"}}}'"
gcloud compute ssh elasticsearch-vm --zone=us-east1-b --project=noted-bliss-466410-q6 --tunnel-through-iap \
--command="curl -X PUT 'localhost:9200/tires_current/_mapping' \
-H 'Content-Type: application/json' \
-d '{\"properties\":{\"productType\":{\"type\":\"keyword\",\"normalizer\":\"lowercase_ascii\"}}}'"
3.2 Update productType in existing data
# Set productType='part' for all in parts_current
gcloud compute ssh elasticsearch-vm --zone=us-east1-b --project=noted-bliss-466410-q6 --tunnel-through-iap \
--command="curl -X POST 'localhost:9200/parts_current/_update_by_query' \
-H 'Content-Type: application/json' \
-d '{\"script\":{\"source\":\"ctx._source.productType = \\\"part\\\"\"},\"query\":{\"match_all\":{}}}'"
# Set productType='tire' for all in tires_current
gcloud compute ssh elasticsearch-vm --zone=us-east1-b --project=noted-bliss-466410-q6 --tunnel-through-iap \
--command="curl -X POST 'localhost:9200/tires_current/_update_by_query' \
-H 'Content-Type: application/json' \
-d '{\"script\":{\"source\":\"ctx._source.productType = \\\"tire\\\"\"},\"query\":{\"match_all\":{}}}'"
3.3 Add productType filter to Search API
// services/search/src/schemas/search.ts
productType: z.enum(['part', 'tire', 'accessory']).optional(),
// services/search/src/utils/query-builder.ts
if (params.productType) {
filters.push({ term: { productType: params.productType } });
}
Phase 4: Testing
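Before the live checks in this phase, the productType filter from 3.3 can be exercised in isolation. A minimal sketch, assuming a simplified `buildFilters` that models only this one filter; it is a hypothetical stand-in, since the real query-builder.ts is not shown in this plan.

```typescript
type ProductType = 'part' | 'tire' | 'accessory';

interface SearchParams {
  q?: string;
  productType?: ProductType;
}

type Filter = { term: Record<string, string> };

// Simplified stand-in for the query builder: only the productType filter is modeled.
function buildFilters(params: SearchParams): Filter[] {
  const filters: Filter[] = [];
  if (params.productType) {
    filters.push({ term: { productType: params.productType } });
  }
  return filters;
}
```

Omitting productType yields no filter, which is what keeps the parameter backward compatible for existing callers.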
# 1. Health check
curl -s "https://api.crop-dev.app/health" | jq
# 2. Check parts_current (should be ~2,594, all productType='part')
curl -s "https://api.crop-dev.app/api/search?productType=part&limit=1" | jq '.pagination.total'
# 3. Check tires_current (should be 7,091, all productType='tire')
curl -s "https://api.crop-dev.app/api/search?productType=tire&limit=1" | jq '.pagination.total'
# 4. Check search without filter (all ~9,685)
curl -s "https://api.crop-dev.app/api/search?q=*&limit=1" | jq '.pagination.total'
# 5. Verify KMT not in parts (manufacturer filter)
curl -s "https://api.crop-dev.app/api/filters" | jq '.facets.manufacturer[] | select(.key=="kmt")'
# Should return empty result
Phase 5: Frontend Integration
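The page variants in this phase differ only in the productType they pin. One possible way to keep them consistent is a shared helper; `withProductType` and the `Page` type are hypothetical names, not part of the existing frontend.

```typescript
type ProductType = 'part' | 'tire';
type Page = 'catalog' | 'tires' | 'global';

// Which productType each page pins; global search leaves it unset.
const PAGE_PRODUCT_TYPE: Record<Page, ProductType | undefined> = {
  catalog: 'part',
  tires: 'tire',
  global: undefined,
};

function withProductType<T extends object>(
  params: T,
  page: Page,
): T & { productType?: ProductType } {
  const productType = PAGE_PRODUCT_TYPE[page];
  // For global search the key is not added, so no filter is sent.
  return productType ? { ...params, productType } : { ...params };
}
```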
/catalog page
// Add productType: 'part' to buildSearchParams
const searchParams = {
...params,
productType: 'part', // Parts only
};
/tires page
// Add productType: 'tire' to buildSearchParams
const searchParams = {
...params,
productType: 'tire', // Tires only
};
Search (global)
// Don't specify productType - search across all
const searchParams = {
...params,
// productType: not specified
};
Execution Checklist
Phase 1: Cleanup (15 min)
- Delete KMT from parts_current
- Refresh index
- Verify count = 2,594
Phase 2: Architecture (30 min)
- Update env-manager-service.ts
- Add TIRE_COLLECTIONS
- Add auto-routing to sync script
- Test locally
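For the "Test locally" step, the routing rule from 2.2 can be checked without touching Elasticsearch. A sketch under two assumptions: index names are passed in rather than read from `env`, and a plain array stands in for the Set so the snippet compiles under any target.

```typescript
interface IndexNames {
  partsIndex: string;
  tiresIndex: string;
}

// Plain array in place of the Set used in the sync script; behavior is the same.
const TIRE_COLLECTIONS: readonly string[] = ['parts_kmt'];

// Same routing rule as the sync script, with index names injected.
function getTargetIndex(collectionName: string, indexNames: IndexNames): string {
  return TIRE_COLLECTIONS.indexOf(collectionName) !== -1
    ? indexNames.tiresIndex
    : indexNames.partsIndex;
}

const indexNames: IndexNames = { partsIndex: 'parts_current', tiresIndex: 'tires_current' };

// Collection names from PARTS_COLLECTIONS in env-manager-service.ts.
const partsCollections = [
  'parts_nhl', 'parts_bns', 'parts_vnt',
  'parts_kuh', 'parts_hot', 'parts_har', 'parts_mar', 'parts_kin', 'parts_mch',
];
const routedParts = partsCollections.map((c) => getTargetIndex(c, indexNames));
```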
Phase 3: productType (30 min)
- Apply ES mapping
- Update by query for parts_current
- Update by query for tires_current
- Add filter to Search API
- Test API
Phase 4: Testing (15 min)
- Health check
- productType=part → 2,594
- productType=tire → 7,091
- Search all → ~9,685
- KMT absent from /api/filters for parts
Phase 5: Frontend (separate PR)
- /catalog → productType: 'part'
- /tires → productType: 'tire'
- Search without filter
Risks and Mitigation
| Risk | Probability | Mitigation |
|---|---|---|
| Data loss during delete_by_query | Low | tires_current contains full KMT data |
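| API downtime | Low | Operations run against the live index with no restart; `_delete_by_query` and `_update_by_query` are not atomic, so check the response for conflicts and failures |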
| Frontend breaks | Medium | productType optional, backward compatible |
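The data-loss mitigation relies on tires_current already holding every KMT document. A small pre-flight guard could make that explicit before the delete runs; `safeToDeleteFromParts` and its input shape are hypothetical.

```typescript
interface PreflightCounts {
  kmtInMongo: number;      // KMT documents in MongoDB
  kmtInTiresIndex: number; // KMT documents in tires_current
}

// True only when tires_current holds at least as many KMT documents as MongoDB.
function safeToDeleteFromParts(counts: PreflightCounts): boolean {
  return counts.kmtInMongo > 0 && counts.kmtInTiresIndex >= counts.kmtInMongo;
}
```

With the counts from the Data table (7,091 in both places) the guard passes; it would fail if tires_current held only a partial copy.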
Quick Commands for Phase 1
# Execute all at once:
# 1. Delete KMT from parts_current
gcloud compute ssh elasticsearch-vm --zone=us-east1-b --project=noted-bliss-466410-q6 --tunnel-through-iap \
--command="curl -X POST 'localhost:9200/parts_current/_delete_by_query?refresh=true' \
-H 'Content-Type: application/json' \
-d '{\"query\":{\"term\":{\"manufacturer.code\":\"kmt\"}}}'"
# 2. Verify
curl -s "https://api.crop-dev.app/health" | jq '{ok, docCount_es}'
curl -s "https://api.crop-dev.app/api/filters" | jq '[.facets.manufacturer[].key]'
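The two verify commands can also be folded into one programmatic check. In the sketch below the response shapes are inferred from the jq filters above and may omit fields; it also assumes `docCount_es` reports the parts_current document count. `verifyPhase1` is a hypothetical helper.

```typescript
// Response shapes inferred from the jq filters; real payloads may carry more fields.
interface HealthResponse {
  ok: boolean;
  docCount_es: number;
}

interface FiltersResponse {
  facets: { manufacturer: { key: string }[] };
}

// Returns human-readable problems; an empty array means Phase 1 looks done.
function verifyPhase1(
  health: HealthResponse,
  filters: FiltersResponse,
  expectedDocs: number,
  removedManufacturer: string,
): string[] {
  const problems: string[] = [];
  if (!health.ok) problems.push('health check reports ok=false');
  if (health.docCount_es !== expectedDocs) {
    problems.push(`docCount_es is ${health.docCount_es}, expected ${expectedDocs}`);
  }
  const stillListed = filters.facets.manufacturer.some((m) => m.key === removedManufacturer);
  if (stillListed) {
    problems.push(`manufacturer facet still lists "${removedManufacturer}"`);
  }
  return problems;
}
```

For this cleanup the call would be `verifyPhase1(health, filters, 2594, 'kmt')`.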