Catalog Service
Category taxonomy, manufacturer aliases, and catalog management
The catalog service manages product categorization and manufacturer normalization for the CROP parts platform. It provides a unified category taxonomy, keyword-based product classification, and manufacturer alias resolution for search and autocomplete.
Category Taxonomy
Unified category taxonomy based on New Holland, Briggs & Stratton, and Ventrac standards.
Location: packages/shared-types/src/categories/
Category Hierarchy
graph TD
subgraph "Level 1 - Root Categories"
AF[AIR FILTRATION]
AS[ALTERNATORS & STARTERS]
BA[BATTERIES & ACCESSORIES]
BB[BEARINGS & BUSHINGS]
BC[BELTS & CHAINS]
BL[BLADES & CUTTING]
BR[BRAKES]
CF[CARBURETORS & FUEL]
CT[CLUTCHES & TRANSMISSION]
EL[ELECTRICAL]
EP[ENGINE PARTS]
FI[FILTERS]
FL[FLUIDS & LUBRICANTS]
GS[GASKETS & SEALS]
GI[GAUGES & INSTRUMENTS]
HW[HARDWARE]
HY[HYDRAULICS]
IS[IGNITION & SPARK]
LI[LIGHTS]
PT[PTO]
PS[PULLEYS & SPINDLES]
SE[SENSORS]
ST[STEERING]
WT[WHEELS & TIRES]
CE[COOLING & EXHAUST]
UN[UNCATEGORIZED]
end
subgraph "Level 2 - Subcategories Examples"
AF --> AF1[Air Filters]
AF --> AF2[Pre-Filters]
BB --> BB1[Bearings]
BB --> BB2[Bushings]
FI --> FI1[Oil Filters]
FI --> FI2[Fuel Filters]
FI --> FI3[Hydraulic Filters]
FI --> FI4[Cabin Filters]
FI --> FI5[Filter Kits]
GS --> GS1[Gaskets]
GS --> GS2[Seals]
GS --> GS3[O-Rings]
GS --> GS4[Seal Kits]
HY --> HY1[Hydraulic Pumps]
HY --> HY2[Hydraulic Cylinders]
HY --> HY3[Hydraulic Valves]
HY --> HY4[Fittings & Hoses]
EL --> EL1[Switches]
EL --> EL2[Wiring & Connectors]
EL --> EL3[Relays & Fuses]
end
style AF fill:#e1f5fe
style FI fill:#e1f5fe
style HY fill:#e1f5fe
style EL fill:#e1f5fe
style BB fill:#e1f5fe
style GS fill:#e1f5fe
style UN fill:#ffebee
Stats: 56 total categories (26 Level 1, 30 Level 2), 250+ keywords, average 5-10 keywords per category.
Category Document Schema
erDiagram
CATEGORY {
string id PK
string slug
string name
int level
string path
array path_ids
string parent_id FK
array children
array keywords
string icon
int sort_order
}
PRODUCT {
string id PK
string partNumber
string title
string manufacturer
}
PRODUCT_CATEGORY {
string id
string slug
string name
int level
string path
array path_ids
boolean leaf
float confidence
string matchedKeyword
boolean needsReview
}
CATEGORY ||--o{ CATEGORY : "parent_id"
PRODUCT ||--|| PRODUCT_CATEGORY : "category[0]"
Category Fields
Category fields use camelCase naming:
| Field | Type | Description |
|---|---|---|
| categoryName | string[] | Array of category names |
| categoryPath | string[] | Array of category paths |
| categoryId | string[] | Array of category IDs |
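The category document schema and the part-level fields above map naturally onto TypeScript types. The sketch below is an illustration only: the field names come from the ER diagram and the table, while the nullability of `parent_id`, the contents of `children`, the `toPartCategoryFields` helper, and the sample `oil-filters` values are assumptions.

```typescript
// Sketch of a category document; field names follow the ER diagram.
interface CategoryDoc {
  id: string;
  slug: string;
  name: string;
  level: number;            // 1 = root category, 2 = subcategory
  path: string;             // e.g. "FILTERS > Oil Filters"
  path_ids: string[];
  parent_id: string | null; // assumption: null for root categories
  children: string[];       // assumption: ids of child categories
  keywords: string[];
  icon?: string;
  sort_order: number;
}

// Category fields as they appear on a part (camelCase arrays).
interface PartCategoryFields {
  categoryId: string[];
  categoryName: string[];
  categoryPath: string[];
}

// Hypothetical helper: derive the part-level arrays from one category document.
function toPartCategoryFields(cat: CategoryDoc): PartCategoryFields {
  return {
    categoryId: [cat.id],
    categoryName: [cat.name],
    categoryPath: [cat.path],
  };
}

// Illustrative sample values, not taken from the real taxonomy file.
const oilFilters: CategoryDoc = {
  id: 'oil-filters',
  slug: 'oil-filters',
  name: 'Oil Filters',
  level: 2,
  path: 'FILTERS > Oil Filters',
  path_ids: ['filters', 'oil-filters'],
  parent_id: 'filters',
  children: [],
  keywords: ['oil filter'],
  sort_order: 1,
};

console.log(toPartCategoryFields(oilFilters));
```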
The search API accepts both camelCase and legacy snake_case query parameters for backward compatibility:
?categoryId=filters # Recommended
?category_id=filters # Legacy (still supported)
API responses use camelCase fields only:
{
"id": "part-123",
"categoryId": ["filters"],
"categoryName": ["Air Filters"],
"categoryPath": ["FILTERS > Air Filters"]
}
Normalization Flow
flowchart TD
A[Product Title] --> B{Find Keyword Match}
B -->|Found| C[Get Category ID]
B -->|Not Found| D[UNCATEGORIZED]
C --> E[Calculate Confidence Score]
E --> F{Confidence >= threshold?}
F -->|Yes| G[Build Category Result]
F -->|No| H[Flag for Review]
D --> I[Set needsReview = true]
H --> G
I --> G
G --> J[ProductCategoryResult]
subgraph "Result Structure"
J --> K["category: CategoryResult[]"]
J --> L["categoryPath: string[]"]
J --> M["categoryId: string[]"]
J --> N["categoryName: string[]"]
J --> O["breadcrumbs: string[]"]
J --> P[uncategorized: boolean]
end
style A fill:#e3f2fd
style J fill:#c8e6c9
style D fill:#ffebee
style H fill:#fff3e0
Keyword Matching Algorithm
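As a rough sketch of the matching loop charted in this section: lowercase the title, sort keywords by priority, and return the first keyword the title contains. The `KeywordMapping` shape, the sample mappings, and the sort direction (higher priority checked first) are assumptions; the real `KEYWORD_MAPPINGS` table is not shown in this document.

```typescript
interface KeywordMapping {
  keyword: string;
  categoryId: string;
  priority: number;
}

// Illustrative sample only; priorities are made up for this sketch.
const SAMPLE_MAPPINGS: KeywordMapping[] = [
  { keyword: 'filter', categoryId: 'filters', priority: 30 },
  { keyword: 'bearing', categoryId: 'bearings', priority: 40 },
  { keyword: 'kit, filter', categoryId: 'filter-kits', priority: 60 },
];

function findKeywordMatch(
  title: string,
  mappings: KeywordMapping[],
): KeywordMapping | null {
  const normalized = title.toLowerCase();
  // Assumption: a higher priority value is checked first, so specific
  // keywords ("kit, filter") win over generic ones ("filter").
  const sorted = [...mappings].sort((a, b) => b.priority - a.priority);
  for (const mapping of sorted) {
    if (normalized.includes(mapping.keyword)) {
      return mapping; // first hit wins
    }
  }
  return null; // caller falls back to UNCATEGORIZED
}

console.log(findKeywordMatch('KIT, FILTER', SAMPLE_MAPPINGS)?.categoryId);
console.log(findKeywordMatch('Unlabelled widget', SAMPLE_MAPPINGS));
```

Because the first hit wins, the ordering of keywords is what resolves overlaps such as "filter" versus "kit, filter".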
flowchart LR
A[Input Title] --> B[Normalize to lowercase]
B --> C[Sort Keywords by Priority]
C --> D[Iterate Keywords]
D --> E{Title includes keyword?}
E -->|Yes| F[Return Match]
E -->|No| G{More keywords?}
G -->|Yes| D
G -->|No| H[Return null]
F --> I[categoryId]
F --> J[keyword]
F --> K[priority]
style A fill:#e3f2fd
style F fill:#c8e6c9
style H fill:#ffebee
Confidence Score Calculation
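The scoring rules charted in this section reduce to a base value plus three optional bonuses, capped at 1.0. The function below is a minimal sketch of that arithmetic, not the shipped implementation: `confidenceScore` and `escapeRegExp` are hypothetical names, "starts with keyword" is read as "the title starts with the keyword", and the priority of 40 in the usage line is an assumed value.

```typescript
// Escape regex metacharacters so a keyword can be embedded in a pattern.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function confidenceScore(title: string, keyword: string, priority: number): number {
  const t = title.toLowerCase();
  const k = keyword.toLowerCase();

  // Base = 0.5 + priority / 200
  let score = 0.5 + priority / 200;

  // +0.1 when the keyword matches on word boundaries
  if (new RegExp(`\\b${escapeRegExp(k)}\\b`).test(t)) score += 0.1;

  // +0.05 when the title starts with the keyword
  if (t.startsWith(k)) score += 0.05;

  // +0.1 for multi-word keywords
  if (k.trim().includes(' ')) score += 0.1;

  // Cap at 1.0
  return Math.min(score, 1.0);
}

// With an assumed priority of 40: 0.7 base + 0.1 + 0.05, roughly 0.85
// (floating-point rounding aside).
console.log(confidenceScore('Bearing, Ball 25 mm x 62 mm', 'bearing', 40));
```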
flowchart TD
A[Start: priority value] --> B[Base = 0.5 + priority/200]
B --> C{Word boundary match?}
C -->|Yes| D[+0.1]
C -->|No| E[+0.0]
D --> F{Starts with keyword?}
E --> F
F -->|Yes| G[+0.05]
F -->|No| H[+0.0]
G --> I{Multi-word keyword?}
H --> I
I -->|Yes| J[+0.1]
I -->|No| K[+0.0]
J --> L[Cap at 1.0]
K --> L
L --> M[Final Confidence]
style A fill:#e3f2fd
style M fill:#c8e6c9
Data Flow Architecture
flowchart TB
subgraph "Source Data"
BNS[BNS Parts CSV]
VNT[VNT Parts CSV]
NHL[NHL Parts DB]
end
subgraph "Transformers"
BT[BNS Transformer]
VT[VNT Transformer]
NT[NHL Transformer]
end
subgraph "Category Normalization"
CN[categorizeProduct]
TX[TAXONOMY]
KW[KEYWORD_MAPPINGS]
end
subgraph "MongoDB Collections"
PS[parts_stage]
PC[categories]
end
subgraph "Elasticsearch"
ES[parts index]
end
BNS --> BT
VNT --> VT
NHL --> NT
BT --> CN
VT --> CN
NT --> CN
TX --> CN
KW --> CN
CN --> PS
TX --> PC
PS --> ES
PC --> ES
style CN fill:#c8e6c9
style TX fill:#fff3e0
style KW fill:#fff3e0
Category API Usage
Categorize a Single Product
import { categorizeProduct } from '@crop/shared-types/categories';
const result = categorizeProduct('Bearing, Ball 25 mm x 62 mm');
console.log(result);
// {
// category: [{
// id: 'bearings',
// name: 'Bearings',
// path: 'BEARINGS & BUSHINGS > Bearings',
// confidence: 0.85,
// matchedKeyword: 'bearing'
// }],
// categoryPath: ['BEARINGS & BUSHINGS > Bearings'],
// categoryId: ['bearings'],
// categoryName: ['Bearings'],
// breadcrumbs: ['BEARINGS & BUSHINGS', 'Bearings'],
// uncategorized: false
// }
Batch Categorization
import { categorizeProducts, getCategorizeStats } from '@crop/shared-types/categories';
const products = [
{ title: 'SWITCH-PTO PUSH' },
{ title: 'KIT, FILTER' },
{ title: 'Bearing, Ball' },
];
const categorized = categorizeProducts(products);
const stats = getCategorizeStats(categorized);
console.log(stats);
// {
// total: 3,
// categorized: 3,
// uncategorized: 0,
// needsReview: 0,
// byCategory: { 'pto': 1, 'filter-kits': 1, 'bearings': 1 },
// avgConfidence: 0.87
// }
Get All Categories
import { getRootCategories, getCategoryChildren, TAXONOMY } from '@crop/shared-types/categories';
// Get root categories
const roots = getRootCategories();
console.log(roots.map(c => c.name));
// ['AIR FILTRATION', 'ALTERNATORS & STARTERS', ...]
// Get children
const filterChildren = getCategoryChildren('filters');
console.log(filterChildren.map(c => c.name));
// ['Oil Filters', 'Fuel Filters', 'Hydraulic Filters', 'Cabin Filters', 'Filter Kits']
MongoDB Index Strategy
graph LR
subgraph "Categories Collection Indexes"
I1[_id - Primary]
I2[slug - Unique]
I3[parent_id - Reference]
I4[level - Filter]
I5["path_ids - Array"]
end
subgraph "Parts Collection Indexes"
P1["categoryId - Category filter"]
P2["category.id - Nested"]
P3["uncategorized - Flag"]
P4["category.needsReview - QA"]
end
style I1 fill:#e8f5e9
style I2 fill:#e8f5e9
style P1 fill:#e3f2fd
style P2 fill:#e3f2fd
Manufacturer Aliases
The manufacturer alias system enables partial manufacturer name queries to resolve to canonical names in autocomplete. For example, "newh" resolves to "New Holland", "nh" resolves to "New Holland", and "briggs" resolves to "Briggs & Stratton".
Location: packages/shared-catalog/src/config/manufacturers.ts
Performance
- Single query lookup: < 0.05ms (O(1) hash table lookup)
- 1000 queries: < 50ms
- Alias map size: ~2KB in memory
- No external dependencies, no caching needed
Alias Mappings
| Manufacturer | Aliases |
|---|---|
| New Holland | nh, newh, new-holland, newholland, new holland |
| McHale | mch, mchale |
| Briggs & Stratton | bgs, briggs, stratton, briggs-stratton, briggs and stratton |
| Great Plains | gp, great-plains, great plains |
| Club Car | cc, clubcar, club car, club-car |
| Kawasaki | kws, kawasaki |
| Ventrac | ven, ventrac |
| Hotsy | hsy, hotsy |
| EZ-Trail | ezt, ez-trail, eztrail |
| Dion AG | dio, dion, dion ag |
| AMCO | amc, amco |
| Harvest Tec | ht, harvesttec, harvest-tec, harvest tec |
| Ag-Bag | agb, ag-bag, agbag |
| HLA | hla |
50+ aliases covering 14 manufacturers. All lookups are case-insensitive. Each alias maps to exactly one manufacturer.
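The lookup rules stated above (one manufacturer per alias, case-insensitive, constant-time) can be illustrated with a plain `Map`. This is a sketch under assumptions, not the code in `manufacturers.ts`: `SAMPLE_ALIASES` is a small subset of the table, `resolveAlias` is a hypothetical stand-in for `normalizeManufacturerQuery`, and trimming whitespace before lookup is assumed.

```typescript
type ManufacturerInfo = { name: string; code: string };

// Small illustrative subset of the alias table, keyed by lowercase alias.
const SAMPLE_ALIASES = new Map<string, ManufacturerInfo>([
  ['nh', { name: 'New Holland', code: 'NH' }],
  ['newh', { name: 'New Holland', code: 'NH' }],
  ['new holland', { name: 'New Holland', code: 'NH' }],
  ['mch', { name: 'McHale', code: 'MCH' }],
  ['mchale', { name: 'McHale', code: 'MCH' }],
]);

// Hypothetical stand-in: returns canonical names, or [] when nothing matches.
function resolveAlias(query: string): string[] {
  const key = query.trim().toLowerCase(); // case-insensitive lookup
  if (key === '') return [];
  const hit = SAMPLE_ALIASES.get(key);    // single hash lookup
  return hit ? [hit.name] : [];
}

console.log(resolveAlias('NH'));
console.log(resolveAlias('unknown'));
```

Keying the map by the lowercased alias is what makes each alias resolve to exactly one manufacturer: a second entry with the same key would overwrite the first rather than add an ambiguous mapping.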
Manufacturer API Reference
normalizeManufacturerQuery(query: string): string[]
Convert a query to matching canonical manufacturer names.
import { normalizeManufacturerQuery } from '@crop/shared-catalog';
normalizeManufacturerQuery('newh'); // ['New Holland']
normalizeManufacturerQuery('nh'); // ['New Holland']
normalizeManufacturerQuery('briggs'); // ['Briggs & Stratton']
normalizeManufacturerQuery('unknown'); // []
normalizeManufacturerQuery(''); // []
asManufacturer(key: string): { name: string; code: string }
Get manufacturer info from an alias, code, or direct name.
import { asManufacturer } from '@crop/shared-catalog';
asManufacturer('newh'); // { name: 'New Holland', code: 'NH' }
asManufacturer('nh'); // { name: 'New Holland', code: 'NH' }
asManufacturer('New Holland'); // { name: 'New Holland', code: 'NH' }
asManufacturer('McHale'); // { name: 'McHale', code: 'MCH' }
asManufacturer('unknown'); // { name: 'unknown', code: 'UNK' }
Reverse Lookup
Inspect all aliases for a given manufacturer:
import { DEFAULT_ALIASES } from '@crop/shared-catalog';
const aliases = Object.entries(DEFAULT_ALIASES)
.filter(([_, info]) => info.name === 'New Holland')
.map(([alias]) => alias);
// ['nh', 'newh', 'new-holland', 'newholland', 'new holland']
Adding New Aliases
- Open packages/shared-catalog/src/config/manufacturers.ts
- Add to the DEFAULT_ALIASES map:
export const DEFAULT_ALIASES: Record<string, { name: string; code: string }> = {
  // Existing aliases...
  // New alias
  'holland': { name: 'New Holland', code: 'NH' },
};
- Update tests in services/search/src/__tests__/manufacturer-aliases.test.ts
- Run tests:
bun test src/__tests__/manufacturer-aliases.test.ts
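When adding entries by hand, a small consistency check can catch mistakes before the test suite runs. The helper below is a hypothetical sketch and not part of `@crop/shared-catalog`; it assumes alias keys are stored lowercase and trimmed (consistent with the case-insensitive lookups described above) and that every alias for a given manufacturer carries the same code.

```typescript
type ManufacturerInfo = { name: string; code: string };

// Hypothetical lint-style check for an alias map.
function findAliasProblems(aliases: Record<string, ManufacturerInfo>): string[] {
  const problems: string[] = [];
  const codeByName = new Map<string, string>();

  for (const [alias, info] of Object.entries(aliases)) {
    // Assumption: keys are stored in normalized (lowercase, trimmed) form.
    if (alias !== alias.trim().toLowerCase()) {
      problems.push(`alias "${alias}" should be lowercase and trimmed`);
    }
    // Every alias of one manufacturer should carry the same code.
    const seenCode = codeByName.get(info.name);
    if (seenCode !== undefined && seenCode !== info.code) {
      problems.push(`"${info.name}" has conflicting codes ${seenCode} and ${info.code}`);
    }
    codeByName.set(info.name, info.code);
  }
  return problems;
}

console.log(
  findAliasProblems({
    nh: { name: 'New Holland', code: 'NH' },
    holland: { name: 'New Holland', code: 'NH' },
  }),
);
```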
Autocomplete Integration
The alias system integrates into the search autocomplete pipeline at services/search/src/utils/autocomplete-suggestions.ts.
How It Works
- User types a query (e.g., "newh")
- normalizeManufacturerQuery resolves it to ["New Holland"]
- A terms query with boost 4.5 is added to the should clauses of the Elasticsearch manufacturer query
- New Holland appears in autocomplete suggestions
Integration Code
import { normalizeManufacturerQuery } from '@crop/shared-catalog';
if (!skipExpensiveQueries && caps.manufacturer > 0) {
const lowerQuery = query.toLowerCase();
const normalizedMfgNames = normalizeManufacturerQuery(lowerQuery);
searches.push({ index: indexName, preference });
searches.push({
size: 0,
_source: false,
track_total_hits: false,
timeout: '100ms',
query: {
bool: {
should: [
// 1. Exact manufacturer code match (highest priority)
{
term: {
'manufacturer.code': { value: lowerQuery, boost: 10 },
},
},
// 2. Exact name match
{
term: {
'manufacturer.name.keyword': {
value: query, boost: 5, case_insensitive: true,
},
},
},
// 3. Alias expansion match
...(normalizedMfgNames.length > 0
? [{
terms: {
'manufacturer.name.keyword': normalizedMfgNames,
boost: 4.5,
},
}]
: []),
// 4. Suggest analyzer match
{
match: {
'manufacturer.name.suggest': {
query: lowerQuery, minimum_should_match: '75%', boost: 3,
},
},
},
// 5. Ngram match
{
match: {
'manufacturer.name.ngram': {
query: lowerQuery, minimum_should_match: '50%', boost: 1,
},
},
},
],
minimum_should_match: 1,
},
},
aggs: {
manufacturers: {
terms: {
field: 'manufacturer.name.keyword',
size: caps.manufacturer,
shard_size: caps.manufacturer * 3,
min_doc_count: 1,
order: { _count: 'desc' },
},
},
},
});
}
Boost Tuning
The alias terms query boost (4.5) controls how much alias matches contribute to relevance scoring. Adjust based on autocomplete quality:
- Higher boost = more weight for alias matches
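- 4.5 places alias matches just below an exact name match (boost 5) and well below an exact code match (boost 10), but above suggest-analyzer (boost 3) and ngram (boost 1) matches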
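To experiment with the boost without editing the query inline, the alias clause can be factored into a small builder. `buildAliasClause` is a hypothetical helper shown only as a sketch; the clause it produces has the same shape as the alias expansion clause in the integration code.

```typescript
// Hypothetical helper: builds the alias-expansion `should` clause.
// Returns an empty array when the query resolved to no manufacturer,
// so the result can be spread directly into the `should` list.
function buildAliasClause(
  canonicalNames: string[],
  boost: number = 4.5,
): Array<Record<string, unknown>> {
  if (canonicalNames.length === 0) return [];
  return [
    {
      terms: {
        'manufacturer.name.keyword': canonicalNames,
        boost,
      },
    },
  ];
}

// Default boost, then a higher value for comparison.
console.log(JSON.stringify(buildAliasClause(['New Holland'])));
console.log(JSON.stringify(buildAliasClause(['New Holland'], 6)));
```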
Monitoring and Debugging
Enable Query Logging
export ES_LOG_QUERIES=true
bun run dev
Check Alias Resolution
import { DEFAULT_ALIASES, normalizeManufacturerQuery } from '@crop/shared-catalog';
// All aliases for a manufacturer
const aliases = Object.entries(DEFAULT_ALIASES)
.filter(([_, info]) => info.name === 'New Holland')
.map(([alias]) => alias);
// How a query normalizes
const canonical = normalizeManufacturerQuery('newh');
console.log(canonical[0]); // "New Holland"
File Structure
Category Taxonomy
packages/shared-types/src/categories/
├── index.ts # Public exports
├── taxonomy.ts # Category definitions and schema
├── keywords.ts # Keyword-to-category mappings
└── normalizer.ts # Categorization functions
Manufacturer Aliases
| File | Purpose |
|---|---|
| packages/shared-catalog/src/config/manufacturers.ts | Alias definitions and normalization helpers |
| services/search/src/utils/autocomplete-suggestions.ts | Autocomplete integration point |
| services/search/src/__tests__/manufacturer-aliases.test.ts | Unit tests (37 tests) |
| services/search/src/utils/manufacturer-detector.ts | Manufacturer detection utilities |
Testing
Category Tests
cd packages/shared-types
bun test
Manufacturer Alias Tests
bun test src/__tests__/manufacturer-aliases.test.ts
# Result: 37 pass, 0 fail, 214 expect() calls
Test coverage includes: all 14 manufacturers, case insensitivity, edge cases (empty/whitespace/unknown), query integration scenarios, alias reverse lookups, and performance benchmarks.
Manual Testing
# Autocomplete with alias
curl "http://localhost:3001/api/autocomplete?q=newh"
# Expected: New Holland in suggestions
# Manufacturer code
curl "http://localhost:3001/api/autocomplete?q=nh"
# Expected: New Holland in suggestions
# Category search
curl "http://localhost:3001/api/search?categoryId=filters"
curl "http://localhost:3001/api/search?q=filter" | jq '.parts[0] | {categoryId, categoryName, categoryPath}'
FAQ
Why is "new" not an alias but "newh" is?
Short, common prefixes risk false positives. "new" could match non-manufacturer terms, while "newh" is specific enough to be a safe alias.
Can one alias map to multiple manufacturers?
No. Each alias maps to exactly one manufacturer for predictable search behavior.
Is alias matching case-sensitive?
No. All lookups are case-insensitive: "NH", "nh", and "Nh" all work.
How do aliases affect Elasticsearch aggregations?
Normalization happens in the search service before the query is sent to Elasticsearch. Elasticsearch receives the expanded canonical name and returns results normally.
What if a user types "NH Tractor"?
The alias system handles "NH" (which resolves to "New Holland"), and the rest of the query ("Tractor") is processed normally by Elasticsearch multi-word matching.
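One way to picture the "NH Tractor" case is a token-by-token pass that pulls out known aliases and leaves the remaining words for full-text matching. The sketch below is only an illustration of that idea: this document does not show how the search service splits multi-word queries, and `splitManufacturerTokens` and `SAMPLE_ALIASES` are hypothetical.

```typescript
// Illustrative alias subset, keyed by lowercase alias.
const SAMPLE_ALIASES = new Map<string, string>([
  ['nh', 'New Holland'],
  ['newh', 'New Holland'],
  ['briggs', 'Briggs & Stratton'],
]);

// Hypothetical helper: separate alias tokens from the rest of the query.
function splitManufacturerTokens(query: string): {
  manufacturers: string[];
  rest: string;
} {
  const manufacturers: string[] = [];
  const rest: string[] = [];

  for (const token of query.trim().split(/\s+/)) {
    if (token === '') continue; // empty query yields one empty token
    const name = SAMPLE_ALIASES.get(token.toLowerCase());
    if (name === undefined) {
      rest.push(token); // left for normal full-text matching
    } else if (!manufacturers.includes(name)) {
      manufacturers.push(name);
    }
  }
  return { manufacturers, rest: rest.join(' ') };
}

console.log(splitManufacturerTokens('NH Tractor'));
```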