CROP

Catalog Service

Category taxonomy, manufacturer aliases, and catalog management

The catalog service manages product categorization and manufacturer normalization for the CROP parts platform. It provides a unified category taxonomy, keyword-based product classification, and manufacturer alias resolution for search and autocomplete.


Category Taxonomy

Unified category taxonomy based on New Holland, Briggs & Stratton, and Ventrac standards.

Location: packages/shared-types/src/categories/

Category Hierarchy

graph TD
    subgraph "Level 1 - Root Categories"
        AF[AIR FILTRATION]
        AS[ALTERNATORS & STARTERS]
        BA[BATTERIES & ACCESSORIES]
        BB[BEARINGS & BUSHINGS]
        BC[BELTS & CHAINS]
        BL[BLADES & CUTTING]
        BR[BRAKES]
        CF[CARBURETORS & FUEL]
        CT[CLUTCHES & TRANSMISSION]
        EL[ELECTRICAL]
        EP[ENGINE PARTS]
        FI[FILTERS]
        FL[FLUIDS & LUBRICANTS]
        GS[GASKETS & SEALS]
        GI[GAUGES & INSTRUMENTS]
        HW[HARDWARE]
        HY[HYDRAULICS]
        IS[IGNITION & SPARK]
        LI[LIGHTS]
        PT[PTO]
        PS[PULLEYS & SPINDLES]
        SE[SENSORS]
        ST[STEERING]
        WT[WHEELS & TIRES]
        CE[COOLING & EXHAUST]
        UN[UNCATEGORIZED]
    end

    subgraph "Level 2 - Subcategories Examples"
        AF --> AF1[Air Filters]
        AF --> AF2[Pre-Filters]

        BB --> BB1[Bearings]
        BB --> BB2[Bushings]

        FI --> FI1[Oil Filters]
        FI --> FI2[Fuel Filters]
        FI --> FI3[Hydraulic Filters]
        FI --> FI4[Cabin Filters]
        FI --> FI5[Filter Kits]

        GS --> GS1[Gaskets]
        GS --> GS2[Seals]
        GS --> GS3[O-Rings]
        GS --> GS4[Seal Kits]

        HY --> HY1[Hydraulic Pumps]
        HY --> HY2[Hydraulic Cylinders]
        HY --> HY3[Hydraulic Valves]
        HY --> HY4[Fittings & Hoses]

        EL --> EL1[Switches]
        EL --> EL2[Wiring & Connectors]
        EL --> EL3[Relays & Fuses]
    end

    style AF fill:#e1f5fe
    style FI fill:#e1f5fe
    style HY fill:#e1f5fe
    style EL fill:#e1f5fe
    style BB fill:#e1f5fe
    style GS fill:#e1f5fe
    style UN fill:#ffebee

Stats: 56 total categories (26 Level 1, 30 Level 2), 250+ keywords, average 5-10 keywords per category.

Category Document Schema

erDiagram
    CATEGORY {
        string id PK
        string slug
        string name
        int level
        string path
        array path_ids
        string parent_id FK
        array children
        array keywords
        string icon
        int sort_order
    }

    PRODUCT {
        string id PK
        string partNumber
        string title
        string manufacturer
    }

    PRODUCT_CATEGORY {
        string id
        string slug
        string name
        int level
        string path
        array path_ids
        boolean leaf
        float confidence
        string matchedKeyword
        boolean needsReview
    }

    CATEGORY ||--o{ CATEGORY : "parent_id"
    PRODUCT ||--|| PRODUCT_CATEGORY : "category[0]"
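
The two entities in the diagram can be written down as TypeScript types. This is a minimal sketch, not the definitions from taxonomy.ts: the interface names, the nullable parent_id for root categories, the reading of children as child IDs, and the sample values (including the 'bearings-bushings' ID) are assumptions made for illustration.

```typescript
// Hypothetical type names; fields mirror the CATEGORY entity in the diagram.
interface CategoryDoc {
  id: string;
  slug: string;
  name: string;
  level: number;            // 1 = root category, 2 = subcategory
  path: string;             // e.g. "BEARINGS & BUSHINGS > Bearings"
  path_ids: string[];       // IDs from the root down to this category
  parent_id: string | null; // assumed to be null for root categories
  children: string[];       // assumed to hold child category IDs
  keywords: string[];
  icon: string;
  sort_order: number;
}

// Fields mirror the PRODUCT_CATEGORY entity embedded in a product.
interface ProductCategory {
  id: string;
  slug: string;
  name: string;
  level: number;
  path: string;
  path_ids: string[];
  leaf: boolean;
  confidence: number;
  matchedKeyword: string;
  needsReview: boolean;
}

// Illustrative document; the parent ID 'bearings-bushings' is made up.
const bearings: CategoryDoc = {
  id: 'bearings',
  slug: 'bearings',
  name: 'Bearings',
  level: 2,
  path: 'BEARINGS & BUSHINGS > Bearings',
  path_ids: ['bearings-bushings', 'bearings'],
  parent_id: 'bearings-bushings',
  children: [],
  keywords: ['bearing'],
  icon: '',
  sort_order: 0,
};

// Embedded copy on a product, derived from the category document.
const embedded: ProductCategory = {
  id: bearings.id,
  slug: bearings.slug,
  name: bearings.name,
  level: bearings.level,
  path: bearings.path,
  path_ids: bearings.path_ids,
  leaf: bearings.children.length === 0,
  confidence: 0.85,
  matchedKeyword: 'bearing',
  needsReview: false,
};
```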

Category Fields

The top-level category fields on a product use camelCase naming:

| Field | Type | Description |
| --- | --- | --- |
| categoryName | string[] | Array of category names |
| categoryPath | string[] | Array of category paths |
| categoryId | string[] | Array of category IDs |

The search API accepts both camelCase and legacy snake_case query parameters for backward compatibility:

?categoryId=filters           # Recommended
?category_id=filters          # Legacy (still supported)
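
One way a handler could accept both spellings is to read the camelCase parameter first and fall back to the legacy one. The sketch below is illustrative only: readCategoryId and the CategoryQuery shape are hypothetical names, and giving camelCase precedence when both parameters are present is an assumption.

```typescript
// Hypothetical shape of the already-parsed query string.
interface CategoryQuery {
  categoryId?: string;  // recommended
  category_id?: string; // legacy
}

// Prefer the camelCase parameter; fall back to the legacy snake_case one.
function readCategoryId(query: CategoryQuery): string | undefined {
  return query.categoryId ?? query.category_id;
}
```

With this approach, `readCategoryId({ category_id: 'filters' })` and `readCategoryId({ categoryId: 'filters' })` both yield `'filters'`, so downstream code only ever deals with one name.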

API responses use camelCase fields only:

{
  "id": "part-123",
  "categoryId": ["filter-kits"],
  "categoryName": ["Filter Kits"],
  "categoryPath": ["FILTERS > Filter Kits"]
}

Normalization Flow

flowchart TD
    A[Product Title] --> B{Find Keyword Match}
    B -->|Found| C[Get Category ID]
    B -->|Not Found| D[UNCATEGORIZED]

    C --> E[Calculate Confidence Score]
    E --> F{Confidence >= threshold?}
    F -->|Yes| G[Build Category Result]
    F -->|No| H[Flag for Review]

    D --> I[Set needsReview = true]
    H --> G
    I --> G

    G --> J[ProductCategoryResult]

    subgraph "Result Structure"
        J --> K["category: CategoryResult[]"]
        J --> L["categoryPath: string[]"]
        J --> M["categoryId: string[]"]
        J --> N["categoryName: string[]"]
        J --> O["breadcrumbs: string[]"]
        J --> P[uncategorized: boolean]
    end

    style A fill:#e3f2fd
    style J fill:#c8e6c9
    style D fill:#ffebee
    style H fill:#fff3e0
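
The branching in the flow above can be sketched as a single function. This is not the code in normalizer.ts: buildResult, the trimmed result shape, the 'uncategorized' ID, the zero confidence for unmatched titles, and the 0.7 threshold are all assumptions, since the flowchart does not state them.

```typescript
interface KeywordMatch {
  categoryId: string;
  keyword: string;
  confidence: number;
}

interface CategoryResult {
  id: string;
  confidence: number;
  matchedKeyword: string | null;
  needsReview: boolean;
}

// Trimmed-down result; the real structure also carries paths, names and breadcrumbs.
interface ProductCategoryResult {
  category: CategoryResult[];
  categoryId: string[];
  uncategorized: boolean;
}

const REVIEW_THRESHOLD = 0.7; // assumed value, not taken from the source

function buildResult(
  match: KeywordMatch | null,
  threshold: number = REVIEW_THRESHOLD,
): ProductCategoryResult {
  // No keyword matched: fall back to UNCATEGORIZED and always flag for review.
  if (match === null) {
    return {
      category: [{ id: 'uncategorized', confidence: 0, matchedKeyword: null, needsReview: true }],
      categoryId: ['uncategorized'],
      uncategorized: true,
    };
  }
  // A match below the threshold is still returned, but flagged for review.
  const needsReview = match.confidence < threshold;
  return {
    category: [{
      id: match.categoryId,
      confidence: match.confidence,
      matchedKeyword: match.keyword,
      needsReview,
    }],
    categoryId: [match.categoryId],
    uncategorized: false,
  };
}
```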

Keyword Matching Algorithm

flowchart LR
    A[Input Title] --> B[Normalize to lowercase]
    B --> C[Sort Keywords by Priority]
    C --> D[Iterate Keywords]

    D --> E{Title includes keyword?}
    E -->|Yes| F[Return Match]
    E -->|No| G{More keywords?}
    G -->|Yes| D
    G -->|No| H[Return null]

    F --> I[categoryId]
    F --> J[keyword]
    F --> K[priority]

    style A fill:#e3f2fd
    style F fill:#c8e6c9
    style H fill:#ffebee
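
A minimal sketch of the matching loop, assuming that "sort by priority" means highest priority first and that matching is a plain substring test on the lowercased title. KeywordMapping, findKeywordMatch, the sample keywords, their priorities, and the 'oil-filters' ID are hypothetical.

```typescript
interface KeywordMapping {
  keyword: string;
  categoryId: string;
  priority: number;
}

function findKeywordMatch(
  title: string,
  mappings: KeywordMapping[],
): KeywordMapping | null {
  const normalized = title.toLowerCase();
  // Copy before sorting so the caller's array is not mutated.
  const sorted = [...mappings].sort((a, b) => b.priority - a.priority);
  for (const mapping of sorted) {
    if (normalized.includes(mapping.keyword.toLowerCase())) {
      return mapping; // first hit wins, so higher-priority keywords shadow lower ones
    }
  }
  return null;
}

// Illustrative mappings only.
const sampleMappings: KeywordMapping[] = [
  { keyword: 'filter', categoryId: 'filters', priority: 30 },
  { keyword: 'bearing', categoryId: 'bearings', priority: 40 },
  { keyword: 'ball bearing', categoryId: 'bearings', priority: 60 },
  { keyword: 'oil filter', categoryId: 'oil-filters', priority: 60 },
];
```

Because the first hit wins, a specific keyword such as 'oil filter' needs a higher priority than the generic 'filter', otherwise the generic keyword would claim every filter title.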

Confidence Score Calculation

flowchart TD
    A[Start: priority value] --> B[Base = 0.5 + priority/200]
    B --> C{Word boundary match?}
    C -->|Yes| D[+0.1]
    C -->|No| E[+0.0]
    D --> F{Starts with keyword?}
    E --> F
    F -->|Yes| G[+0.05]
    F -->|No| H[+0.0]
    G --> I{Multi-word keyword?}
    H --> I
    I -->|Yes| J[+0.1]
    I -->|No| K[+0.0]
    J --> L[Cap at 1.0]
    K --> L
    L --> M[Final Confidence]

    style A fill:#e3f2fd
    style M fill:#c8e6c9
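
The scoring steps above translate into a few lines of arithmetic. In this sketch, calculateConfidence is a hypothetical name, "word boundary match" is interpreted as a regular-expression `\b` match, and "multi-word keyword" as a keyword containing a space; the source does not spell out either test.

```typescript
// Escape regex metacharacters so a keyword is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function calculateConfidence(title: string, keyword: string, priority: number): number {
  const t = title.toLowerCase();
  const k = keyword.toLowerCase();

  let score = 0.5 + priority / 200;                                    // base
  if (new RegExp(`\\b${escapeRegExp(k)}\\b`).test(t)) score += 0.1;    // word boundary match
  if (t.startsWith(k)) score += 0.05;                                  // title starts with keyword
  if (k.trim().includes(' ')) score += 0.1;                            // multi-word keyword
  return Math.min(score, 1.0);                                         // cap at 1.0
}

// Assumed priority of 40: 0.5 + 40/200 = 0.7, +0.1 (boundary), +0.05 (prefix),
// which comes out at roughly 0.85 (subject to floating-point rounding).
const exampleScore = calculateConfidence('Bearing, Ball 25 mm x 62 mm', 'bearing', 40);
```

The priority of 40 is chosen only so that the result lines up with the 0.85 confidence shown in the categorizeProduct example in this document; the real priority for 'bearing' is not given.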

Data Flow Architecture

flowchart TB
    subgraph "Source Data"
        BNS[BNS Parts CSV]
        VNT[VNT Parts CSV]
        NHL[NHL Parts DB]
    end

    subgraph "Transformers"
        BT[BNS Transformer]
        VT[VNT Transformer]
        NT[NHL Transformer]
    end

    subgraph "Category Normalization"
        CN[categorizeProduct]
        TX[TAXONOMY]
        KW[KEYWORD_MAPPINGS]
    end

    subgraph "MongoDB Collections"
        PS[parts_stage]
        PC[categories]
    end

    subgraph "Elasticsearch"
        ES[parts index]
    end

    BNS --> BT
    VNT --> VT
    NHL --> NT

    BT --> CN
    VT --> CN
    NT --> CN

    TX --> CN
    KW --> CN

    CN --> PS
    TX --> PC

    PS --> ES
    PC --> ES

    style CN fill:#c8e6c9
    style TX fill:#fff3e0
    style KW fill:#fff3e0

Category API Usage

Categorize a Single Product

import { categorizeProduct } from '@crop/shared-types/categories';

const result = categorizeProduct('Bearing, Ball 25 mm x 62 mm');

console.log(result);
// {
//   category: [{
//     id: 'bearings',
//     name: 'Bearings',
//     path: 'BEARINGS & BUSHINGS > Bearings',
//     confidence: 0.85,
//     matchedKeyword: 'bearing'
//   }],
//   categoryPath: ['BEARINGS & BUSHINGS > Bearings'],
//   categoryId: ['bearings'],
//   categoryName: ['Bearings'],
//   breadcrumbs: ['BEARINGS & BUSHINGS', 'Bearings'],
//   uncategorized: false
// }

Batch Categorization

import { categorizeProducts, getCategorizeStats } from '@crop/shared-types/categories';

const products = [
  { title: 'SWITCH-PTO PUSH' },
  { title: 'KIT, FILTER' },
  { title: 'Bearing, Ball' },
];

const categorized = categorizeProducts(products);
const stats = getCategorizeStats(categorized);

console.log(stats);
// {
//   total: 3,
//   categorized: 3,
//   uncategorized: 0,
//   needsReview: 0,
//   byCategory: { 'pto': 1, 'filter-kits': 1, 'bearings': 1 },
//   avgConfidence: 0.87
// }
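
To make the stats fields concrete, here is one way such a summary could be computed. It is a sketch rather than the getCategorizeStats implementation: summarize and the input shape are hypothetical, the confidence values are illustrative, and averaging confidence over all products (not only categorized ones) is an assumption.

```typescript
interface CategorizedProduct {
  categoryId: string[];
  uncategorized: boolean;
  category: { confidence: number; needsReview?: boolean }[];
}

interface CategorizeStats {
  total: number;
  categorized: number;
  uncategorized: number;
  needsReview: number;
  byCategory: Record<string, number>;
  avgConfidence: number;
}

function summarize(items: CategorizedProduct[]): CategorizeStats {
  const stats: CategorizeStats = {
    total: items.length,
    categorized: 0,
    uncategorized: 0,
    needsReview: 0,
    byCategory: {},
    avgConfidence: 0,
  };
  let confidenceSum = 0;

  for (const item of items) {
    const primary = item.category[0]; // only the primary category is counted
    if (item.uncategorized) {
      stats.uncategorized += 1;
    } else {
      stats.categorized += 1;
      const id = item.categoryId[0];
      if (id !== undefined) {
        stats.byCategory[id] = (stats.byCategory[id] ?? 0) + 1;
      }
    }
    if (primary?.needsReview) stats.needsReview += 1;
    confidenceSum += primary?.confidence ?? 0;
  }

  stats.avgConfidence = items.length > 0 ? confidenceSum / items.length : 0;
  return stats;
}

// Illustrative input with made-up confidence values.
const sampleStats = summarize([
  { categoryId: ['pto'], uncategorized: false, category: [{ confidence: 0.85 }] },
  { categoryId: ['filter-kits'], uncategorized: false, category: [{ confidence: 0.9 }] },
  { categoryId: ['bearings'], uncategorized: false, category: [{ confidence: 0.86 }] },
]);
```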

Get All Categories

import { getRootCategories, getCategoryChildren, TAXONOMY } from '@crop/shared-types/categories';

// Get root categories
const roots = getRootCategories();
console.log(roots.map(c => c.name));
// ['AIR FILTRATION', 'ALTERNATORS & STARTERS', ...]

// Get children
const filterChildren = getCategoryChildren('filters');
console.log(filterChildren.map(c => c.name));
// ['Oil Filters', 'Fuel Filters', 'Hydraulic Filters', 'Cabin Filters', 'Filter Kits']

MongoDB Index Strategy

graph LR
    subgraph "Categories Collection Indexes"
        I1[_id - Primary]
        I2[slug - Unique]
        I3[parent_id - Reference]
        I4[level - Filter]
        I5["path_ids - Array"]
    end

    subgraph "Parts Collection Indexes"
        P1["categoryId - Category filter"]
        P2["category.id - Nested"]
        P3["uncategorized - Flag"]
        P4["category.needsReview - QA"]
    end

    style I1 fill:#e8f5e9
    style I2 fill:#e8f5e9
    style P1 fill:#e3f2fd
    style P2 fill:#e3f2fd
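
The index list can also be expressed as data. The sketch below uses a small local IndexSpec type whose `key` and `unique` fields are meant to follow the index descriptions accepted by the MongoDB Node.js driver's `createIndexes`; the constant names and the ascending sort direction are assumptions. The `_id` index is omitted because MongoDB creates it automatically.

```typescript
// Local, hypothetical type; kept driver-free so the snippet is self-contained.
interface IndexSpec {
  key: Record<string, 1 | -1>;
  unique?: boolean;
}

const categoryIndexes: IndexSpec[] = [
  { key: { slug: 1 }, unique: true }, // slug - Unique
  { key: { parent_id: 1 } },          // parent_id - Reference
  { key: { level: 1 } },              // level - Filter
  { key: { path_ids: 1 } },           // path_ids - Array (multikey index)
];

const partIndexes: IndexSpec[] = [
  { key: { categoryId: 1 } },               // category filter
  { key: { 'category.id': 1 } },            // nested category ID
  { key: { uncategorized: 1 } },            // uncategorized flag
  { key: { 'category.needsReview': 1 } },   // QA review queue
];
```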

Manufacturer Aliases

The manufacturer alias system enables partial manufacturer name queries to resolve to canonical names in autocomplete. For example, "newh" resolves to "New Holland", "nh" resolves to "New Holland", and "briggs" resolves to "Briggs & Stratton".

Location: packages/shared-catalog/src/config/manufacturers.ts

Performance

  • Single query lookup: < 0.05ms (O(1) hash table lookup)
  • 1000 queries: < 50ms
  • Alias map size: ~2KB in memory
  • No external dependencies, no caching needed

Alias Mappings

| Manufacturer | Aliases |
| --- | --- |
| New Holland | nh, newh, new-holland, newholland, new holland |
| McHale | mch, mchale |
| Briggs & Stratton | bgs, briggs, stratton, briggs-stratton, briggs and stratton |
| Great Plains | gp, great-plains, great plains |
| Club Car | cc, clubcar, club car, club-car |
| Kawasaki | kws, kawasaki |
| Ventrac | ven, ventrac |
| Hotsy | hsy, hotsy |
| EZ-Trail | ezt, ez-trail, eztrail |
| Dion AG | dio, dion, dion ag |
| AMCO | amc, amco |
| Harvest Tec | ht, harvesttec, harvest-tec, harvest tec |
| Ag-Bag | agb, ag-bag, agbag |
| HLA | hla |

50+ aliases covering 14 manufacturers. All lookups are case-insensitive. Each alias maps to exactly one manufacturer.
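
A case-insensitive, one-alias-to-one-manufacturer lookup can be sketched with a plain map. This is not the code in manufacturers.ts: resolveAlias and ALIAS_MAP are hypothetical names, only a handful of aliases are included, and trimming surrounding whitespace before the lookup is an assumption.

```typescript
interface ManufacturerInfo {
  name: string;
  code: string;
}

const NEW_HOLLAND: ManufacturerInfo = { name: 'New Holland', code: 'NH' };
const MCHALE: ManufacturerInfo = { name: 'McHale', code: 'MCH' };

// Keys are stored lowercase; each alias appears once, so it maps to exactly one manufacturer.
const ALIAS_MAP = new Map<string, ManufacturerInfo>([
  ['nh', NEW_HOLLAND],
  ['newh', NEW_HOLLAND],
  ['new holland', NEW_HOLLAND],
  ['mch', MCHALE],
  ['mchale', MCHALE],
]);

// Returns the canonical name in an array, or an empty array when nothing matches.
function resolveAlias(query: string): string[] {
  const key = query.trim().toLowerCase();
  if (key === '') return [];
  const hit = ALIAS_MAP.get(key);
  return hit ? [hit.name] : [];
}
```

Normalizing the key once, at lookup time, keeps the map small and makes "NH", "nh", and "Nh" equivalent without storing every casing.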

Manufacturer API Reference

normalizeManufacturerQuery(query: string): string[]

Convert a query to matching canonical manufacturer names.

import { normalizeManufacturerQuery } from '@crop/shared-catalog';

normalizeManufacturerQuery('newh');           // ['New Holland']
normalizeManufacturerQuery('nh');             // ['New Holland']
normalizeManufacturerQuery('briggs');         // ['Briggs & Stratton']
normalizeManufacturerQuery('unknown');        // []
normalizeManufacturerQuery('');               // []

asManufacturer(key: string): { name: string; code: string }

Get manufacturer info from an alias, code, or direct name.

import { asManufacturer } from '@crop/shared-catalog';

asManufacturer('newh');        // { name: 'New Holland', code: 'NH' }
asManufacturer('nh');          // { name: 'New Holland', code: 'NH' }
asManufacturer('New Holland'); // { name: 'New Holland', code: 'NH' }
asManufacturer('McHale');      // { name: 'McHale', code: 'MCH' }
asManufacturer('unknown');     // { name: 'unknown', code: 'UNK' }

Reverse Lookup

Inspect all aliases for a given manufacturer:

import { DEFAULT_ALIASES } from '@crop/shared-catalog';

const aliases = Object.entries(DEFAULT_ALIASES)
  .filter(([_, info]) => info.name === 'New Holland')
  .map(([alias]) => alias);
// ['nh', 'newh', 'new-holland', 'newholland', 'new holland']

Adding New Aliases

  1. Open packages/shared-catalog/src/config/manufacturers.ts
  2. Add to the DEFAULT_ALIASES map:
export const DEFAULT_ALIASES: Record<string, { name: string; code: string }> = {
  // Existing aliases...

  // New alias
  'holland': { name: 'New Holland', code: 'NH' },
};
  3. Update tests in services/search/src/__tests__/manufacturer-aliases.test.ts
  4. Run tests from services/search: bun test src/__tests__/manufacturer-aliases.test.ts

Autocomplete Integration

The alias system integrates into the search autocomplete pipeline at services/search/src/utils/autocomplete-suggestions.ts.

How It Works

  1. User types a query (e.g., "newh")
  2. normalizeManufacturerQuery resolves it to ["New Holland"]
  3. A terms query on the canonical names is added to the should clauses of the Elasticsearch manufacturer suggestion query with boost 4.5
  4. New Holland appears in autocomplete suggestions

Integration Code

import { normalizeManufacturerQuery } from '@crop/shared-catalog';

if (!skipExpensiveQueries && caps.manufacturer > 0) {
  const lowerQuery = query.toLowerCase();
  const normalizedMfgNames = normalizeManufacturerQuery(lowerQuery);

  searches.push({ index: indexName, preference });
  searches.push({
    size: 0,
    _source: false,
    track_total_hits: false,
    timeout: '100ms',
    query: {
      bool: {
        should: [
          // 1. Exact manufacturer code match (highest priority)
          {
            term: {
              'manufacturer.code': { value: lowerQuery, boost: 10 },
            },
          },
          // 2. Exact name match
          {
            term: {
              'manufacturer.name.keyword': {
                value: query, boost: 5, case_insensitive: true,
              },
            },
          },
          // 3. Alias expansion match
          ...(normalizedMfgNames.length > 0
            ? [{
                terms: {
                  'manufacturer.name.keyword': normalizedMfgNames,
                  boost: 4.5,
                },
              }]
            : []),
          // 4. Suggest analyzer match
          {
            match: {
              'manufacturer.name.suggest': {
                query: lowerQuery, minimum_should_match: '75%', boost: 3,
              },
            },
          },
          // 5. Ngram match
          {
            match: {
              'manufacturer.name.ngram': {
                query: lowerQuery, minimum_should_match: '50%', boost: 1,
              },
            },
          },
        ],
        minimum_should_match: 1,
      },
    },
    aggs: {
      manufacturers: {
        terms: {
          field: 'manufacturer.name.keyword',
          size: caps.manufacturer,
          shard_size: caps.manufacturer * 3,
          min_doc_count: 1,
          order: { _count: 'desc' },
        },
      },
    },
  });
}

Boost Tuning

The alias terms query boost (4.5) controls how much alias matches contribute to relevance scoring. Adjust it based on autocomplete quality:

  • Higher boost = more weight for alias matches
  • 4.5 places alias matches just below exact name matches (boost 5) and well below exact code matches (boost 10), but above suggest (boost 3) and ngram (boost 1) matches
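
For tuning, it can help to keep the boosts in one place and build the alias clause from them. The sketch below is one possible arrangement, not the code in autocomplete-suggestions.ts: MANUFACTURER_BOOSTS and aliasClause are hypothetical names, while the boost values and the field name are taken from the integration code in this document.

```typescript
// Boost values as used in the integration code in this document.
const MANUFACTURER_BOOSTS = {
  code: 10,
  exactName: 5,
  alias: 4.5,
  suggest: 3,
  ngram: 1,
} as const;

interface AliasTermsClause {
  terms: {
    'manufacturer.name.keyword': string[];
    boost: number;
  };
}

// Returns zero or one clause so the result can be spread into a `should` array.
function aliasClause(
  names: string[],
  boost: number = MANUFACTURER_BOOSTS.alias,
): AliasTermsClause[] {
  if (names.length === 0) return [];
  return [{ terms: { 'manufacturer.name.keyword': names, boost } }];
}
```

Usage would look like `should: [...otherClauses, ...aliasClause(normalizedMfgNames)]`, with a second argument passed only when experimenting with a different boost.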

Monitoring and Debugging

Enable Query Logging

export ES_LOG_QUERIES=true
bun run dev

Check Alias Resolution

import { DEFAULT_ALIASES, normalizeManufacturerQuery } from '@crop/shared-catalog';

// All aliases for a manufacturer
const aliases = Object.entries(DEFAULT_ALIASES)
  .filter(([_, info]) => info.name === 'New Holland')
  .map(([alias]) => alias);

// How a query normalizes
const canonical = normalizeManufacturerQuery('newh');
console.log(canonical[0]); // "New Holland"

File Structure

Category Taxonomy

packages/shared-types/src/categories/
├── index.ts          # Public exports
├── taxonomy.ts       # Category definitions and schema
├── keywords.ts       # Keyword-to-category mappings
└── normalizer.ts     # Categorization functions

Manufacturer Aliases

| File | Purpose |
| --- | --- |
| packages/shared-catalog/src/config/manufacturers.ts | Alias definitions and normalization helpers |
| services/search/src/utils/autocomplete-suggestions.ts | Autocomplete integration point |
| services/search/src/__tests__/manufacturer-aliases.test.ts | Unit tests (37 tests) |
| services/search/src/utils/manufacturer-detector.ts | Manufacturer detection utilities |

Testing

Category Tests

cd packages/shared-types
bun test

Manufacturer Alias Tests

cd services/search
bun test src/__tests__/manufacturer-aliases.test.ts
# Result: 37 pass, 0 fail, 214 expect() calls

Test coverage includes: all 14 manufacturers, case insensitivity, edge cases (empty/whitespace/unknown), query integration scenarios, alias reverse lookups, and performance benchmarks.

Manual Testing

# Autocomplete with alias
curl "http://localhost:3001/api/autocomplete?q=newh"
# Expected: New Holland in suggestions

# Manufacturer code
curl "http://localhost:3001/api/autocomplete?q=nh"
# Expected: New Holland in suggestions

# Category search
curl "http://localhost:3001/api/search?categoryId=filters"
curl "http://localhost:3001/api/search?q=filter" | jq '.parts[0] | {categoryId, categoryName, categoryPath}'

FAQ

Why is "new" not an alias but "newh" is? Short, common words risk false positives: "new" could match non-manufacturer terms, whereas "newh" is distinctive enough to be a safe alias.

Can one alias map to multiple manufacturers? No. Each alias maps to exactly one manufacturer for predictable search behavior.

Is alias matching case-sensitive? No. All lookups are case-insensitive: "NH", "nh", "Nh" all work.

How do aliases affect Elasticsearch aggregations? Normalization happens in the search service, before the query is sent to Elasticsearch. Elasticsearch receives the expanded canonical names and returns results normally.

What if a user types "NH Tractor"? The alias system handles "NH" (resolves to "New Holland"), then the rest of the query ("Tractor") is processed normally by Elasticsearch multi-word matching.
