SEO Fields Architecture

Data Pipeline Overview

Scrapers → MongoDB Atlas → Transformers → Elasticsearch → Search API → Frontend

Layer 1: Data Sources

| Source | Data | Fields |
|---|---|---|
| Vendor Website | Product catalog | title, partNumber, price, images, categories |
| Manual Input | Corrections, additions | title, description, categories, fitment |
| Amazon | Enrichment | description, specifications, bullets |

Layer 2: MongoDB Atlas

Connection: mongodb+srv://...@crop-gcp.dkwuhg.mongodb.net/

| Database | Collection | Status | Records |
|---|---|---|---|
| crop_stage | parts | PRODUCTION | ~10 |
| crop_prod | parts | READY | 3,740+ |
| crop_parts_archive | nh_unified | DEPRECATED | - |

Format: IndexedPart (pre-transformed)

Layer 3: Transformers

| Collection | Transformer | Action |
|---|---|---|
| crop_stage.parts | passthrough | No transform |
| crop_prod.parts | passthrough | No transform |
| nh_final_v3 | nhV3 | Full transform |

Transformer functions:

  • Normalize partNumber → pnNorm
  • Generate SKU: CT-{manufacturer.code}-{partNumber}
  • Generate slug: {code}-{title}-{pn}-{id}
  • Process media (GCP URLs)
  • Normalize categories
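
The first three transformer steps above can be sketched as follows. This is a minimal illustration, not the nhV3 implementation: the normalization rule (uppercase, strip non-alphanumerics), the slug segment rule, and all function names are assumptions. Only the SKU and slug templates come from this page.

```typescript
// Hypothetical helpers; only the SKU and slug templates come from this page.

// Assumed rule: uppercase, then drop everything except A-Z and 0-9.
function normalizePartNumber(partNumber: string): string {
  return partNumber.toUpperCase().replace(/[^A-Z0-9]/g, "");
}

// SKU template: CT-{manufacturer.code}-{partNumber}
function generateSku(manufacturerCode: string, partNumber: string): string {
  return `CT-${manufacturerCode}-${partNumber}`;
}

// Assumed segment rule: lowercase, runs of non-alphanumerics become one hyphen,
// leading and trailing hyphens are trimmed.
function slugifySegment(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Slug template: {code}-{title}-{pn}-{id}
function generateSlug(code: string, title: string, partNumber: string, id: string): string {
  return [code, title, partNumber, id]
    .map(slugifySegment)
    .filter((segment) => segment.length > 0)
    .join("-");
}
```

Under these assumptions, a part with the hypothetical manufacturer code NH and part number 87300041 gets the SKU CT-NH-87300041.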

Layer 4: Elasticsearch

Index: parts_current (alias)

| Field | Type | Source |
|---|---|---|
| id | keyword | MongoDB._id |
| sku | keyword | Generated |
| partNumber | keyword | MongoDB |
| pnNorm | keyword | Normalized |
| slug | keyword | Generated |
| title | text | MongoDB |
| description | text | MongoDB |
| manufacturer.name | keyword | MongoDB |
| manufacturer.code | keyword | MongoDB |
| media.images[].url | keyword | MongoDB |
| media.primaryImage | keyword | MongoDB |
| price.list.value | float | MongoDB |
| inventory.inStock | boolean | MongoDB |
| categoryPath | keyword | MongoDB |
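
The field table above could translate into an index mapping along these lines. This is a sketch built only from the listed fields and types; the constant name is hypothetical, and the real index may define analyzers, extra fields, or settings that this page does not show.

```typescript
// Sketch of a mapping body derived from the field table.
// Dotted field names are expressed as object fields with nested "properties".
const partsIndexMapping = {
  mappings: {
    properties: {
      id: { type: "keyword" },
      sku: { type: "keyword" },
      partNumber: { type: "keyword" },
      pnNorm: { type: "keyword" },
      slug: { type: "keyword" },
      title: { type: "text" },
      description: { type: "text" },
      manufacturer: {
        properties: {
          name: { type: "keyword" },
          code: { type: "keyword" },
        },
      },
      media: {
        properties: {
          // Elasticsearch has no array type: any field may hold multiple values,
          // so media.images[].url is mapped as a plain keyword sub-field.
          images: { properties: { url: { type: "keyword" } } },
          primaryImage: { type: "keyword" },
        },
      },
      price: {
        properties: { list: { properties: { value: { type: "float" } } } },
      },
      inventory: { properties: { inStock: { type: "boolean" } } },
      categoryPath: { type: "keyword" },
    },
  },
};
```

Mapping title and description as text while keeping identifiers as keyword means the former are analyzed for full-text search and the latter match only on exact values.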

Layer 5: Search API

URL: https://search-service-...run.app

| Endpoint | Returns |
|---|---|
| GET /api/parts/:id | Single IndexedPart |
| GET /api/search | IndexedPart[] + facets |
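
As a rough illustration of how a client might address the two endpoints, the helpers below build request URLs. The function names are hypothetical, the base URL is passed in because this page elides the real host, and the q query parameter is an assumption: the search parameters are not documented here.

```typescript
// Hypothetical URL builders for the two endpoints listed above.

// Remove trailing slashes so paths can be appended safely.
function trimTrailingSlash(url: string): string {
  return url.replace(/\/+$/, "");
}

// GET /api/parts/:id
function partUrl(baseUrl: string, id: string): string {
  return `${trimTrailingSlash(baseUrl)}/api/parts/${encodeURIComponent(id)}`;
}

// GET /api/search (the "q" parameter name is assumed, not taken from this page)
function searchUrl(baseUrl: string, query: string): string {
  return `${trimTrailingSlash(baseUrl)}/api/search?q=${encodeURIComponent(query)}`;
}
```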

Layer 6: Frontend (SEO Output)

| IndexedPart Field | SEO Output | HTML Element |
|---|---|---|
| title | metaTitle | <title> |
| description | metaDescription | <meta name="description"> |
| partNumber | schemaData.mpn | JSON-LD |
| sku | schemaData.sku | JSON-LD |
| manufacturer.name | schemaData.brand | JSON-LD |
| media.primaryImage | ogImage | <meta property="og:image"> |
| media.images | schemaData.image | JSON-LD |
| price.list.value | schemaData.offers.price | JSON-LD |
| inventory.inStock | schemaData.offers.availability | JSON-LD |
| slug | canonicalUrl | <link rel="canonical"> |
| categoryPath | BreadcrumbList | JSON-LD |
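
The mapping in the table above might be implemented roughly as shown below. This is a sketch, not the frontend's actual code: the interface and function names are hypothetical, the /parts/{slug} canonical URL pattern is an assumption, and the schema.org Product wrapper (including the name property) is added only so the JSON-LD is complete.

```typescript
// Subset of IndexedPart limited to the fields used for SEO output.
interface SeoSourcePart {
  title: string;
  description: string;
  partNumber: string;
  sku: string;
  slug: string;
  manufacturer: { name: string };
  media: { primaryImage: string; images: { url: string }[] };
  price: { list: { value: number; currency: string } };
  inventory: { inStock: boolean };
}

function buildSeo(part: SeoSourcePart, siteUrl: string) {
  return {
    metaTitle: part.title,
    metaDescription: part.description,
    ogImage: part.media.primaryImage,
    // Assumed URL pattern; this page only says the canonical URL derives from slug.
    canonicalUrl: `${siteUrl}/parts/${part.slug}`,
    schemaData: {
      "@context": "https://schema.org",
      "@type": "Product",
      name: part.title,
      mpn: part.partNumber,
      sku: part.sku,
      brand: { "@type": "Brand", name: part.manufacturer.name },
      image: part.media.images.map((img) => img.url),
      offers: {
        "@type": "Offer",
        price: part.price.list.value,
        priceCurrency: part.price.list.currency,
        // schema.org availability values are URLs, not booleans.
        availability: part.inventory.inStock
          ? "https://schema.org/InStock"
          : "https://schema.org/OutOfStock",
      },
    },
  };
}
```

The schemaData object would then be serialized with JSON.stringify into a script tag of type application/ld+json.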

Consistency Matrix

| Data Point | MongoDB | Elasticsearch | API | SEO |
|---|---|---|---|---|
| Title | title | title | title | metaTitle |
| Description | description | description | description | metaDescription |
| Part Number | partNumber | partNumber | partNumber | schemaData.mpn |
| SKU | sku | sku | sku | schemaData.sku |
| Manufacturer | manufacturer.name | manufacturer.name | manufacturer.name | schemaData.brand |
| Primary Image | media.primaryImage | media.primaryImage | media.primaryImage | ogImage |
| Price | price.list.value | price.list.value | price.list.value | schemaData.offers.price |
| Stock | inventory.inStock | inventory.inStock | inventory.inStock | schemaData.offers.availability |
| Categories | categoryPath | categoryPath | categoryPath | BreadcrumbList |
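
Because the MongoDB, Elasticsearch, and API columns use identical field paths, one checker can be run against a document from any of those layers. The sketch below is a hypothetical helper, not part of the pipeline; it only reports which of the shared paths are missing.

```typescript
// Field paths shared by the MongoDB, Elasticsearch, and API columns.
const sharedFieldPaths = [
  "title",
  "description",
  "partNumber",
  "sku",
  "manufacturer.name",
  "media.primaryImage",
  "price.list.value",
  "inventory.inStock",
  "categoryPath",
];

// Resolve a dotted path such as "price.list.value" against a plain object.
function getByPath(doc: unknown, path: string): unknown {
  let current: unknown = doc;
  for (const key of path.split(".")) {
    if (typeof current !== "object" || current === null) return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Return the shared paths whose value is undefined in the given document.
function missingSharedFields(doc: unknown): string[] {
  return sharedFieldPaths.filter((path) => getByPath(doc, path) === undefined);
}
```

Running the same check on a MongoDB document, an Elasticsearch hit, and an API response for one part is a quick way to spot a field dropped between layers.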

Field Origin Trace

title

Scraper → MongoDB.title → ES.title → API.title → SEO.metaTitle

partNumber

Scraper → MongoDB.partNumber → Transformer(normalize) → ES.partNumber + ES.pnNorm → API → SEO.schemaData.mpn

sku

Transformer(generate) → MongoDB.sku → ES.sku → API.sku → SEO.schemaData.sku
Format: CT-{manufacturer.code}-{partNumber}

media.primaryImage

GCS Upload → MongoDB.media.images[0].gcpUrl → ES.media.primaryImage → API → SEO.ogImage
URL: https://storage.googleapis.com/crop_parts/...

price.list

Scraper/ERP → MongoDB.price.list → ES.price.list → API → SEO.schemaData.offers
Structure: { value: number, currency: string, formatted: string }

categoryPath

Scraper → Transformer(normalize) → MongoDB.categoryPath → ES.categoryPath → API → SEO.BreadcrumbList
Format: ["Parts > Filters > Hydraulic Filters"]
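
A categoryPath entry in the format above could be turned into BreadcrumbList JSON-LD as sketched here. The function name is hypothetical, and the sketch assumes levels are always separated by ">". It omits the item URL that breadcrumb markup usually carries for each level, since this page does not define category URLs.

```typescript
// Build a schema.org BreadcrumbList from one categoryPath entry,
// e.g. "Parts > Filters > Hydraulic Filters".
function buildBreadcrumbList(categoryPathEntry: string) {
  const levels = categoryPathEntry
    .split(">")
    .map((level) => level.trim())
    .filter((level) => level.length > 0);

  return {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    itemListElement: levels.map((name, index) => ({
      "@type": "ListItem",
      position: index + 1, // positions are 1-based
      name,
    })),
  };
}
```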

Naming Convention

camelCase across all layers:

  • MongoDB: partNumber, categoryPath
  • Elasticsearch: partNumber, categoryPath
  • API: partNumber, categoryPath
  • SEO: metaTitle, ogImage, schemaData
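
The convention above can be checked mechanically. The sketch below is a hypothetical helper that walks a document and lists keys that are not camelCase. The pattern (lowercase first letter, then letters and digits only) is an assumed definition of camelCase, not one stated on this page.

```typescript
// Assumed definition of camelCase: a lowercase letter followed by letters or digits.
const camelCasePattern = /^[a-z][a-zA-Z0-9]*$/;

// Recursively collect dotted paths of keys that do not match camelCasePattern.
function nonCamelCaseKeys(value: unknown, prefix: string = ""): string[] {
  if (Array.isArray(value)) {
    // Array elements share their parent's path.
    return value.reduce<string[]>(
      (acc, item) => acc.concat(nonCamelCaseKeys(item, prefix)),
      []
    );
  }
  if (typeof value !== "object" || value === null) return [];
  const record = value as Record<string, unknown>;
  return Object.keys(record).reduce<string[]>((acc, key) => {
    const path = prefix ? `${prefix}.${key}` : key;
    const own = camelCasePattern.test(key) ? [] : [path];
    return acc.concat(own, nonCamelCaseKeys(record[key], path));
  }, []);
}
```

With this pattern MongoDB's _id would be flagged, which is consistent with the Elasticsearch field table exposing it as id.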

Diagram

Copy the diagram source to the clipboard (macOS):

pbcopy < /Users/vova/Code/CROP/microservices/docs/DIAGRAM_SEO_DATA_FLOW.txt
