Equipment Fitment

Three data pipelines extracting parts compatibility, service manuals, and model images from vendor data into MongoDB.

Repository: CT-CROP/equipment-fitment
Last updated: 2026-02-26
Last synced to docs: 2026-03-10

Three pipelines move vendor data into MongoDB:

  1. Equipment Fitment: extracts parts compatibility data into the equipment_fitment collection
  2. Service Manuals: extracts full-text service manual content into service_manual_documents and service_manual_pages
  3. Model Image Scraper: scrapes product images from vendor websites and uploads them to GCS

Supported Vendors (6)

Code  Provider    Key Trait
VNT   Ventrac     Clean tables, OCR dehyphenation fixups
FER   Ferris      Groups with serial number ranges
KUH   Kuhn        Multilingual (FR/EN/DE/IT)
KNZ   Kinze       Planters/grain carts, filename-based model resolution
MCH   McHale      Balers, part descriptions as section names
HVT   HarvestTec  Preservative applicators
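
As an illustration of how the vendor codes above might be represented in TypeScript, the following is a minimal sketch. The codes and provider names come from the table; the names VENDORS, VendorCode, isVendorCode, and providerName are hypothetical and are not taken from the repository.

```typescript
// Vendor codes and provider names copied from the table above.
// All identifiers here are hypothetical, for illustration only.
const VENDORS = {
  VNT: "Ventrac",
  FER: "Ferris",
  KUH: "Kuhn",
  KNZ: "Kinze",
  MCH: "McHale",
  HVT: "HarvestTec",
} as const;

type VendorCode = keyof typeof VENDORS;

// Type guard: true only for one of the six supported codes.
function isVendorCode(value: string): value is VendorCode {
  return Object.prototype.hasOwnProperty.call(VENDORS, value);
}

// Resolve a (case-insensitive) vendor code to its provider name,
// throwing on anything outside the supported set.
function providerName(code: string): string {
  const normalized = code.trim().toUpperCase();
  if (!isVendorCode(normalized)) {
    throw new Error(`Unsupported vendor code: ${code}`);
  }
  return VENDORS[normalized];
}
```

Keeping the codes in a single `as const` object lets the compiler reject unknown codes at build time, while the guard covers values that arrive at runtime, such as CLI arguments.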

Quick Start

bun install

# Equipment Fitment
bun run extract:ventrac          # Single vendor
bun run extract:all              # All 6 vendors

# Service Manuals
bun run extract:service-manuals:mchale

# Model Images
bun run scrape:images:ventrac

Data Pipeline

PDF (vendor) → CROP-pdf-parser-service (Python OCR) → GCS bucket
  → equipment-fitment (this repo, Bun/TS) → MongoDB

MongoDB Collections

Collection                Unique Index
equipment_fitment         { vendorCode, modelNumber, partNumber, section, referenceNumber }
service_manual_documents  { documentId }
service_manual_pages      { documentId, pageNumber }
equipment_models          { modelKey }
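
A unique compound index means each combination of key fields identifies at most one document, so re-running an extraction can overwrite rows instead of duplicating them. The sketch below shows one way to derive that key for equipment_fitment records and to de-duplicate a batch before writing. The field names come from the index listed above; the string field types, the description field, and the helper names are assumptions.

```typescript
// Field names follow the equipment_fitment unique index above.
// Field types and the extra `description` field are assumptions.
interface FitmentRecord {
  vendorCode: string;
  modelNumber: string;
  partNumber: string;
  section: string;
  referenceNumber: string;
  description?: string; // hypothetical non-key payload
}

const FITMENT_KEY_FIELDS = [
  "vendorCode",
  "modelNumber",
  "partNumber",
  "section",
  "referenceNumber",
] as const;

type FitmentKey = Pick<FitmentRecord, (typeof FITMENT_KEY_FIELDS)[number]>;

// Extract only the fields that make up the unique index.
function fitmentKey(record: FitmentRecord): FitmentKey {
  return {
    vendorCode: record.vendorCode,
    modelNumber: record.modelNumber,
    partNumber: record.partNumber,
    section: record.section,
    referenceNumber: record.referenceNumber,
  };
}

// Collapse records that share a key; the last occurrence wins.
function dedupeByKey(records: FitmentRecord[]): FitmentRecord[] {
  const byKey = new Map<string, FitmentRecord>();
  for (const record of records) {
    const id = JSON.stringify(FITMENT_KEY_FIELDS.map((f) => record[f]));
    byKey.set(id, record);
  }
  return [...byKey.values()];
}
```

With the MongoDB Node.js driver, the object returned by fitmentKey could serve as the filter in an `updateOne(filter, { $set: record }, { upsert: true })` call, which would keep repeated runs idempotent. Whether the repository writes data this way is not stated here.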

Environment

Requires the MONGODB_URI environment variable. GCS access uses Application Default Credentials.
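
A startup check along the following lines would fail fast when the variable is missing. This is a minimal sketch: MONGODB_URI is the only name taken from this page, and PipelineConfig and loadConfig are hypothetical.

```typescript
// Hypothetical config shape; only MONGODB_URI is named in this document.
interface PipelineConfig {
  mongoUri: string;
}

// Read required settings from an environment-like object so the
// function can be exercised without touching the real process env.
function loadConfig(env: Record<string, string | undefined>): PipelineConfig {
  const mongoUri = env.MONGODB_URI?.trim();
  if (!mongoUri) {
    throw new Error("MONGODB_URI is required");
  }
  return { mongoUri };
}
```

In a Bun or Node script this could be called as `loadConfig(process.env)`. No equivalent check is needed for GCS, because Application Default Credentials are resolved by the client library rather than passed in explicitly.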

Tech Stack

Bun, TypeScript, @google-cloud/storage, MongoDB, Biome.
