SEO Fields Architecture
Data Pipeline Overview
Scrapers → MongoDB Atlas → Transformers → Elasticsearch → Search API → Frontend
Layer 1: Data Sources
| Source | Data | Fields |
|---|---|---|
| Vendor Website | Product catalog | title, partNumber, price, images, categories |
| Manual Input | Corrections, additions | title, description, categories, fitment |
| Amazon | Enrichment | description, specifications, bullets |
Layer 2: MongoDB Atlas
Connection: mongodb+srv://...@crop-gcp.dkwuhg.mongodb.net/
| Database | Collection | Status | Records |
|---|---|---|---|
| crop_stage | parts | PRODUCTION | ~10 |
| crop_prod | parts | READY | 3,740+ |
| crop_parts_archive | nh_unified | DEPRECATED | - |
Format: IndexedPart (pre-transformed)
Layer 3: Transformers
| Collection | Transformer | Action |
|---|---|---|
| crop_stage.parts | passthrough | No transform |
| crop_prod.parts | passthrough | No transform |
| nh_final_v3 | nhV3 | Full transform |
Transformer functions:
- Normalize partNumber → pnNorm
- Generate SKU: CT-{code}-{partNumber}
- Generate slug: {code}-{title}-{pn}-{id}
- Process media (GCP URLs)
- Normalize categories
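The transformer steps above can be sketched in TypeScript as follows. The SKU and slug patterns come from this page. Everything else is an assumption, not the actual nhV3 code: the function names, the pnNorm rule (upper-case, drop every non-alphanumeric character), the slug cleanup, and joining category levels with " > ". Media processing is left out because the GCS layout is only partly described.

```typescript
// Hypothetical helper: turns arbitrary text into a URL-safe slug segment.
function slugify(text: string): string {
  return text
    .normalize("NFKD")               // split accented letters into base + combining mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")     // any run of other characters becomes one hyphen
    .replace(/^-+|-+$/g, "");        // trim leading/trailing hyphens
}

// Assumption: pnNorm is the part number upper-cased with separators removed,
// so "87-682 986" and "87682986" resolve to the same keyword.
function normalizePartNumber(partNumber: string): string {
  return partNumber.toUpperCase().replace(/[^A-Z0-9]/g, "");
}

// SKU pattern from this page: CT-{code}-{partNumber}.
function generateSku(manufacturerCode: string, partNumber: string): string {
  return `CT-${manufacturerCode}-${partNumber}`;
}

// Slug pattern from this page: {code}-{title}-{pn}-{id}; each piece is slugified.
function generateSlug(code: string, title: string, partNumber: string, id: string): string {
  return [code, title, partNumber, id].map(slugify).filter((s) => s.length > 0).join("-");
}

// Assumption: category levels are trimmed, whitespace-collapsed and joined with " > ",
// matching the "Parts > Filters > Hydraulic Filters" format shown under Field Origin Trace.
function normalizeCategoryPath(levels: string[]): string {
  return levels
    .map((level) => level.replace(/\s+/g, " ").trim())
    .filter((level) => level.length > 0)
    .join(" > ");
}
```

For example, `generateSlug("NH", "Hydraulic Filter", "87682986", "65f1a2")` gives `nh-hydraulic-filter-87682986-65f1a2`.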
Layer 4: Elasticsearch
Index: parts_current (alias)
| Field | Type | Source |
|---|---|---|
| id | keyword | MongoDB._id |
| sku | keyword | Generated |
| partNumber | keyword | MongoDB |
| pnNorm | keyword | Normalized |
| slug | keyword | Generated |
| title | text | MongoDB |
| description | text | MongoDB |
| manufacturer.name | keyword | MongoDB |
| manufacturer.code | keyword | MongoDB |
| media.images[].url | keyword | MongoDB |
| media.primaryImage | keyword | MongoDB |
| price.list.value | float | MongoDB |
| inventory.inStock | boolean | MongoDB |
| categoryPath | keyword | MongoDB |
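For reference, the field table can be written as an Elasticsearch mapping body, where each dotted name becomes nested `properties`. This sketch is built only from the table. Analyzers, sub-fields and any other settings on the real parts_current index are unknown. `flattenMapping` is a hypothetical helper that turns the mapping back into the table's dotted names so the two can be compared.

```typescript
// A leaf field has a type; an object field has nested properties.
type FieldMapping = { type: string } | { properties: Record<string, FieldMapping> };

// Fields and types copied from the parts_current table above; nothing else is mapped.
const partsMapping: { properties: Record<string, FieldMapping> } = {
  properties: {
    id: { type: "keyword" },
    sku: { type: "keyword" },
    partNumber: { type: "keyword" },
    pnNorm: { type: "keyword" },
    slug: { type: "keyword" },
    title: { type: "text" },
    description: { type: "text" },
    manufacturer: { properties: { name: { type: "keyword" }, code: { type: "keyword" } } },
    media: {
      properties: {
        // media.images[] is an array of objects; Elasticsearch has no array type,
        // so the object mapping below applies to every element.
        images: { properties: { url: { type: "keyword" } } },
        primaryImage: { type: "keyword" },
      },
    },
    price: { properties: { list: { properties: { value: { type: "float" } } } } },
    inventory: { properties: { inStock: { type: "boolean" } } },
    categoryPath: { type: "keyword" }, // a keyword field can hold an array of paths
  },
};

// Flattens nested properties into dotted names, e.g. "price.list.value" -> "float".
function flattenMapping(props: Record<string, FieldMapping>, prefix = ""): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, def] of Object.entries(props)) {
    const path = prefix + name;
    if ("type" in def) {
      out[path] = def.type;
    } else {
      Object.assign(out, flattenMapping(def.properties, path + "."));
    }
  }
  return out;
}
```

Under this mapping the objects in media.images[] are indexed flattened. A `nested` type would only matter if queries had to match several fields of the same image together.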
Layer 5: Search API
URL: https://search-service-...run.app
| Endpoint | Returns |
|---|---|
| GET /api/parts/:id | Single IndexedPart |
| GET /api/search | IndexedPart[] + facets |
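A thin client for the two endpoints might look like this. The base URL is a parameter because the real one is elided above. The query parameter names for /api/search are not documented here, so `searchUrl` forwards whatever it is given. `partUrl`, `searchUrl` and `getJson` are hypothetical names, and the code relies on the global `fetch` available in browsers and Node 18+.

```typescript
// baseUrl stands in for the elided Cloud Run URL of the search service.
function partUrl(baseUrl: string, id: string): string {
  // encodeURIComponent keeps ids containing "/" or spaces inside one path segment.
  return new URL(`/api/parts/${encodeURIComponent(id)}`, baseUrl).toString();
}

// Search parameters are passed through unchanged; their names are not documented here.
function searchUrl(baseUrl: string, params: Record<string, string>): string {
  const url = new URL("/api/search", baseUrl);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, value);
  }
  return url.toString();
}

// Generic GET that rejects on non-2xx responses and parses the JSON body.
async function getJson<T>(url: string): Promise<T> {
  const res = await fetch(url, { headers: { Accept: "application/json" } });
  if (!res.ok) {
    throw new Error(`GET ${url} failed with HTTP ${res.status}`);
  }
  return (await res.json()) as T;
}
```

`getJson(partUrl(base, id))` would then resolve to a single IndexedPart. Because the paths start with "/", any path already present on `baseUrl` is replaced rather than extended.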
Layer 6: Frontend (SEO Output)
| IndexedPart Field | SEO Output | HTML Element |
|---|---|---|
| title | metaTitle | <title> |
| description | metaDescription | <meta name="description"> |
| partNumber | schemaData.mpn | JSON-LD |
| sku | schemaData.sku | JSON-LD |
| manufacturer.name | schemaData.brand | JSON-LD |
| media.primaryImage | ogImage | <meta property="og:image"> |
| media.images | schemaData.image | JSON-LD |
| price.list.value | schemaData.offers.price | JSON-LD |
| inventory.inStock | schemaData.offers.availability | JSON-LD |
| slug | canonicalUrl | <link rel="canonical"> |
| categoryPath | BreadcrumbList | JSON-LD |
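The SEO mapping above can be sketched as one function from an IndexedPart to page metadata. This rests on several assumptions. The IndexedPart interface is cut down to the fields in the table. The canonical URL is siteOrigin + "/parts/" + slug, since the real route is not given. The JSON-LD uses schema.org Product, Offer and BreadcrumbList. Only the first categoryPath entry feeds the breadcrumb. `buildSeo` is a hypothetical name.

```typescript
// Reduced IndexedPart: only the fields the SEO table uses.
interface IndexedPart {
  id: string;
  sku: string;
  partNumber: string;
  slug: string;
  title: string;
  description: string;
  manufacturer: { name: string; code: string };
  media: { primaryImage?: string; images: { url: string }[] };
  price: { list: { value: number; currency: string; formatted?: string } };
  inventory: { inStock: boolean };
  categoryPath: string[];
}

interface SeoOutput {
  metaTitle: string;
  metaDescription: string;
  ogImage?: string;
  canonicalUrl: string;
  schemaData: Record<string, unknown>;  // Product JSON-LD
  breadcrumbs: Record<string, unknown>; // BreadcrumbList JSON-LD
}

// Assumption: canonical pages live under /parts/{slug} on siteOrigin.
function buildSeo(part: IndexedPart, siteOrigin: string): SeoOutput {
  const canonicalUrl = new URL(`/parts/${part.slug}`, siteOrigin).toString();

  const schemaData = {
    "@context": "https://schema.org",
    "@type": "Product",
    name: part.title,
    description: part.description,
    mpn: part.partNumber,
    sku: part.sku,
    brand: { "@type": "Brand", name: part.manufacturer.name },
    image: part.media.images.map((img) => img.url),
    offers: {
      "@type": "Offer",
      url: canonicalUrl,
      price: part.price.list.value,
      priceCurrency: part.price.list.currency,
      availability: part.inventory.inStock
        ? "https://schema.org/InStock"
        : "https://schema.org/OutOfStock",
    },
  };

  // "Parts > Filters > Hydraulic Filters" becomes one ListItem per level.
  const levels = (part.categoryPath[0] ?? "")
    .split(">")
    .map((level) => level.trim())
    .filter((level) => level.length > 0);
  const breadcrumbs = {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    itemListElement: levels.map((name, i) => ({ "@type": "ListItem", position: i + 1, name })),
  };

  return {
    metaTitle: part.title,
    metaDescription: part.description,
    ogImage: part.media.primaryImage,
    canonicalUrl,
    schemaData,
    breadcrumbs,
  };
}
```

Both JSON-LD objects would be serialized with JSON.stringify into `<script type="application/ld+json">` tags. Category URLs are not described on this page, so the ListItems omit `item`. Google's breadcrumb structured-data guidelines expect an `item` URL on every crumb except the last, so that gap would need filling once category routes are known.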
Consistency Matrix
| Data Point | MongoDB | Elasticsearch | API | SEO |
|---|---|---|---|---|
| Title | title | title | title | metaTitle |
| Description | description | description | description | metaDescription |
| Part Number | partNumber | partNumber | partNumber | schemaData.mpn |
| SKU | sku | sku | sku | schemaData.sku |
| Manufacturer | manufacturer.name | manufacturer.name | manufacturer.name | schemaData.brand |
| Primary Image | media.primaryImage | media.primaryImage | media.primaryImage | ogImage |
| Price | price.list.value | price.list.value | price.list.value | offers.price |
| Stock | inventory.inStock | inventory.inStock | inventory.inStock | offers.availability |
| Categories | categoryPath | categoryPath | categoryPath | BreadcrumbList |
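MongoDB, Elasticsearch and the API share the same camelCase paths, so one way to spot drift is to read each matrix path from a sample record in every layer and compare the values. This is an illustrative sketch. `getPath`, `findMismatches` and `MATRIX_PATHS` are hypothetical, the first layer passed in is treated as the reference, and JSON.stringify stands in for deep equality (it is sensitive to key order). The SEO column renames fields, so it would need the Layer 6 mapping rather than a direct path comparison.

```typescript
// Reads a dotted path such as "price.list.value" from a plain object.
function getPath(obj: unknown, path: string): unknown {
  let current: unknown = obj;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Compares every path in every layer against the first (reference) layer.
function findMismatches(
  layers: Record<string, unknown>,
  paths: string[],
): { path: string; layer: string }[] {
  const names = Object.keys(layers);
  if (names.length === 0) return [];
  const reference = names[0];
  const mismatches: { path: string; layer: string }[] = [];
  for (const path of paths) {
    const expected = JSON.stringify(getPath(layers[reference], path));
    for (const layer of names.slice(1)) {
      if (JSON.stringify(getPath(layers[layer], path)) !== expected) {
        mismatches.push({ path, layer });
      }
    }
  }
  return mismatches;
}

// Paths from the MongoDB / Elasticsearch / API columns of the matrix.
const MATRIX_PATHS = [
  "title", "description", "partNumber", "sku", "manufacturer.name",
  "media.primaryImage", "price.list.value", "inventory.inStock", "categoryPath",
];
```

A typical call would be `findMismatches({ mongo: mongoDoc, es: hit._source, api: apiPart }, MATRIX_PATHS)`, where `mongoDoc` and `apiPart` stand for the same part fetched from each layer.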
Field Origin Trace
title
Scraper → MongoDB.title → ES.title → API.title → SEO.metaTitle
partNumber
Scraper → MongoDB.partNumber → Transformer(normalize) → ES.partNumber + ES.pnNorm → API → SEO.schemaData.mpn
sku
Transformer(generate) → MongoDB.sku → ES.sku → API.sku → SEO.schemaData.sku
Format: CT-{manufacturer.code}-{partNumber}
media.primaryImage
GCS Upload → MongoDB.media.images[0].gcpUrl → ES.media.primaryImage → API → SEO.ogImage
URL: https://storage.googleapis.com/crop_parts/...
price.list
Scraper/ERP → MongoDB.price.list → ES.price.list → API → SEO.schemaData.offers
Structure: { value: number, currency: string, formatted: string }
categoryPath
Scraper → Transformer(normalize) → MongoDB.categoryPath → ES.categoryPath → API → SEO.BreadcrumbList
Format: ["Parts > Filters > Hydraulic Filters"]
Naming Convention
camelCase across all layers:
- MongoDB: partNumber, categoryPath
- Elasticsearch: partNumber, categoryPath
- API: partNumber, categoryPath
- SEO: metaTitle, ogImage, schemaData
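A document can be checked against this convention by walking its keys. In this sketch the function `findNonCamelCaseKeys` is hypothetical, and camelCase means a lowercase first letter followed only by letters and digits. Array elements are reported with the [] marker used elsewhere on this page, and `_id` is exempted as a MongoDB built-in.

```typescript
// Lowercase first letter, then letters and digits only (no "_", "-" or "@").
const CAMEL_CASE = /^[a-z][a-zA-Z0-9]*$/;

// Assumption: _id is the only database-owned key that should be skipped.
const RESERVED_KEYS = new Set(["_id"]);

// Returns the dotted paths of all keys that break the convention, without duplicates.
function findNonCamelCaseKeys(doc: unknown): string[] {
  const offenders = new Set<string>();
  const walk = (value: unknown, path: string): void => {
    if (Array.isArray(value)) {
      // Every element shares the parent path plus "[]", as in "media.images[].url".
      for (const item of value) walk(item, `${path}[]`);
      return;
    }
    if (value === null || typeof value !== "object") return;
    for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
      const childPath = path ? `${path}.${key}` : key;
      if (!RESERVED_KEYS.has(key) && !CAMEL_CASE.test(key)) offenders.add(childPath);
      walk(child, childPath);
    }
  };
  walk(doc, "");
  return Array.from(offenders);
}
```

Running it on the SEO output would also flag JSON-LD keys such as `@type`, which follow schema.org rather than this convention.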
Diagram
pbcopy < /Users/vova/Code/CROP/microservices/docs/DIAGRAM_SEO_DATA_FLOW.txt