Amazon Product Data Enrichment

Overview

Comprehensive plan for enriching CROP product data using Amazon's product information.

Full Plan: See Agent Plan output above (12,000+ words)


Quick Start

1. Setup Environment

# Install dependencies
cd /Users/vova/Code/CROP/microservices
bun install paapi5-nodejs-sdk

# Configure API keys
cp .env.example .env.enrichment

Add to .env.enrichment:

# Amazon Product Advertising API
AMAZON_PA_ACCESS_KEY=your_access_key
AMAZON_PA_SECRET_KEY=your_secret_key
AMAZON_PA_PARTNER_TAG=your_partner_tag

# Oxylabs (optional)
OXYLABS_USERNAME=your_username
OXYLABS_PASSWORD=your_password

# Rainforest API (optional)
RAINFOREST_API_KEY=your_api_key
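
As a minimal sketch of how the enrichment scripts might validate these variables at startup: the `EnrichmentConfig` shape and the `loadEnrichmentConfig` helper below are assumptions for illustration, not part of the existing codebase. Only the three Amazon PA variables are treated as required, matching the "optional" labels above.

```typescript
// Hypothetical config shape; field names mirror the variables in .env.enrichment.
interface EnrichmentConfig {
  amazon: { accessKey: string; secretKey: string; partnerTag: string };
  oxylabs?: { username: string; password: string };
  rainforestApiKey?: string;
}

type Env = Record<string, string | undefined>;

// Returns the trimmed value of a required variable, or throws a readable error.
function requireVar(env: Env, name: string): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

function loadEnrichmentConfig(env: Env): EnrichmentConfig {
  const config: EnrichmentConfig = {
    amazon: {
      accessKey: requireVar(env, "AMAZON_PA_ACCESS_KEY"),
      secretKey: requireVar(env, "AMAZON_PA_SECRET_KEY"),
      partnerTag: requireVar(env, "AMAZON_PA_PARTNER_TAG"),
    },
  };

  // Oxylabs is optional; it is only configured when both values are present.
  const username = env["OXYLABS_USERNAME"]?.trim();
  const password = env["OXYLABS_PASSWORD"]?.trim();
  if (username && password) {
    config.oxylabs = { username, password };
  }

  // Rainforest is optional as well.
  const rainforest = env["RAINFOREST_API_KEY"]?.trim();
  if (rainforest) {
    config.rainforestApiKey = rainforest;
  }
  return config;
}
```

A script could call `loadEnrichmentConfig(process.env)` after the file has been loaded, for example by starting Bun with `--env-file=.env.enrichment`. Failing fast on a missing key avoids discovering it midway through a batch.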

2. Run Pilot Enrichment

# Test with 1 part
bun scripts/enrich-part.ts --partNumber=1722887SM

# Test with pilot batch (100 parts)
bun scripts/enrich-pilot-batch.ts --size=100

3. Check Results

# View enrichment stats
bun scripts/enrichment-stats.ts

# Export enriched data
bun scripts/export-enriched.ts --format=json

Architecture

┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│  MongoDB    │────▶│  Enrichment  │────▶│  ES Index   │
│  (Source)   │     │  Service     │     │  (Search)   │
└─────────────┘     └──────────────┘     └─────────────┘
                           │
                           ▼
                    ┌──────────────┐
                    │  Amazon API  │
                    │  - PA API    │
                    │  - Oxylabs   │
                    │  - Rainforest│
                    └──────────────┘
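
The flow in the diagram can be sketched as a small batch loop. Every name here (`PartSource`, `Enricher`, `SearchSink`, `runPipeline`) is hypothetical and stands in for the MongoDB reader, the Amazon lookup, and the ES indexer respectively; the batch size of 100 is likewise an assumption.

```typescript
// Illustrative types only; these are not the actual CROP service interfaces.
interface Part { partNumber: string; title: string; upc?: string }
interface EnrichedPart extends Part { description?: string; confidence: number }

interface PartSource { readBatch(offset: number, limit: number): Promise<Part[]> } // e.g. MongoDB
interface Enricher { enrich(part: Part): Promise<EnrichedPart | null> }           // Amazon lookup
interface SearchSink { index(parts: EnrichedPart[]): Promise<void> }               // e.g. ES index

// Reads parts in batches, enriches each one, and indexes the successful results.
async function runPipeline(
  source: PartSource,
  enricher: Enricher,
  sink: SearchSink,
  batchSize = 100,
): Promise<{ enriched: number; skipped: number }> {
  let enriched = 0;
  let skipped = 0;
  for (let offset = 0; ; offset += batchSize) {
    const batch = await source.readBatch(offset, batchSize);
    if (batch.length === 0) break; // source exhausted

    const results: EnrichedPart[] = [];
    for (const part of batch) {
      const result = await enricher.enrich(part);
      if (result) {
        results.push(result);
      } else {
        skipped++; // no match found for this part
      }
    }
    if (results.length > 0) {
      await sink.index(results);
    }
    enriched += results.length;
  }
  return { enriched, skipped };
}
```

Keeping the three stages behind interfaces lets the pilot run against in-memory fakes before any real database or API is attached.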

Data Fields Enriched

Field            Source             Priority
Description      Amazon Features    P1
Dimensions       Product Info       P1
Weight           Product Info       P1
UPC/GTIN         Product Info       P1
Images           High-res photos    P1
Categories       Browse nodes       P2
Specifications   Technical Info     P2
Reviews          Customer ratings   P3
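
One way the priorities in the table could drive the enrichment is to compute, per part, which fields are still empty up to a given priority. The sketch below assumes lower-case field names and a `fieldsToEnrich` helper; neither comes from the plan itself.

```typescript
type Priority = 1 | 2 | 3;

// Field names are assumed; priorities follow the table above.
type EnrichableField =
  | "description" | "dimensions" | "weight" | "upc" | "images"
  | "categories" | "specifications" | "reviews";

const FIELD_PRIORITY: Record<EnrichableField, Priority> = {
  description: 1,
  dimensions: 1,
  weight: 1,
  upc: 1,
  images: 1,
  categories: 2,
  specifications: 2,
  reviews: 3,
};

type PartFields = Partial<Record<EnrichableField, unknown>>;

// Treats undefined, null, blank strings, and empty arrays as "missing".
function isMissing(value: unknown): boolean {
  if (value === undefined || value === null) return true;
  if (typeof value === "string") return value.trim() === "";
  if (Array.isArray(value)) return value.length === 0;
  return false;
}

// Lists the missing fields at or above the given priority, most important first.
function fieldsToEnrich(part: PartFields, maxPriority: Priority = 3): EnrichableField[] {
  return (Object.keys(FIELD_PRIORITY) as EnrichableField[])
    .filter((field) => FIELD_PRIORITY[field] <= maxPriority && isMissing(part[field]))
    .sort((a, b) => FIELD_PRIORITY[a] - FIELD_PRIORITY[b]);
}
```

For example, `fieldsToEnrich(part, 1)` would restrict a pilot run to P1 fields only, which keeps the number of API calls down while the matching logic is still being tuned.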

Matching Strategy

Stage 1: UPC Match (99% confidence)

if (part.upc) {
  match = await amazonAPI.searchByUPC(part.upc);
}

Stage 2: Part Number Exact (85% confidence)

const query = part.partNumber; // exact part number, no extra terms
matches = await amazonAPI.search(query);

Stage 3: Manufacturer + Part (75% confidence)

const query = `${manufacturer} ${part.partNumber}`;
matches = await amazonAPI.search(query);

Stage 4: Title Fuzzy (70% confidence)

const query = buildTitleQuery(part.title);
matches = await amazonAPI.search(query);
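
Taken together, the four stages form a cascade that stops at the first hit. The sketch below is one possible arrangement under several assumptions: the `AmazonClient` interface is modeled on the `amazonAPI` calls in the snippets above, `buildTitleQuery` is a guessed implementation, `part.manufacturer` is assumed to hold the manufacturer name, and the first search result is accepted as the match, where a real implementation would likely score candidates before accepting one.

```typescript
interface Part { partNumber: string; title: string; upc?: string; manufacturer?: string }
interface AmazonItem { asin: string; title: string }

// Hypothetical client shape, modeled on the amazonAPI calls shown above.
interface AmazonClient {
  searchByUPC(upc: string): Promise<AmazonItem | null>;
  search(query: string): Promise<AmazonItem[]>;
}

interface MatchResult { item: AmazonItem; stage: string; confidence: number }

// Assumed implementation: strip punctuation and keep the first few words of the title.
function buildTitleQuery(title: string, maxWords = 6): string {
  return title
    .replace(/[^A-Za-z0-9\s-]/g, " ")
    .split(/\s+/)
    .filter(Boolean)
    .slice(0, maxWords)
    .join(" ");
}

// Tries each stage in order of confidence and returns the first match found.
async function findMatch(part: Part, api: AmazonClient): Promise<MatchResult | null> {
  // Stage 1: UPC lookup, only when the part has a UPC.
  if (part.upc) {
    const item = await api.searchByUPC(part.upc);
    if (item) return { item, stage: "upc", confidence: 0.99 };
  }

  // Stages 2-4: keyword searches with decreasing confidence.
  const stages: Array<{ stage: string; confidence: number; query: string | null }> = [
    { stage: "part-number", confidence: 0.85, query: part.partNumber },
    {
      stage: "manufacturer+part",
      confidence: 0.75,
      query: part.manufacturer ? `${part.manufacturer} ${part.partNumber}` : null,
    },
    { stage: "title-fuzzy", confidence: 0.7, query: buildTitleQuery(part.title) },
  ];

  for (const { stage, confidence, query } of stages) {
    if (!query) continue; // stage not applicable to this part
    const [first] = await api.search(query);
    if (first) return { item: first, stage, confidence };
  }
  return null;
}
```

Returning the stage name alongside the confidence makes it possible to report later how many matches came from each stage.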

Cost Estimate

Category                           Cost
One-time enrichment (56k parts)    $96
Monthly maintenance                $25
Re-enrichment (quarterly)          $38

Total Year 1: ~$237


Implementation Phases

  • Phase 1 (Weeks 1-2): Foundation & API setup
  • Phase 2 (Week 3): Pilot enrichment (100 parts)
  • Phase 3 (Weeks 4-6): Bulk enrichment (56k parts)
  • Phase 4 (Week 7): Data merge & ES sync
  • Phase 5 (Week 8): Production deployment

Success Metrics

Metric              Target
Match rate          70%
Confidence score    0.85 avg
Data completeness   80%
Processing time     <2 weeks
API cost            <$100
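
The first three targets can be checked mechanically after a run. The sketch below assumes a per-part `EnrichmentOutcome` record, averages confidence over matched parts only, and defines completeness as filled fields divided by total fields across all parts; the plan does not specify these definitions, so treat them as placeholders.

```typescript
// Hypothetical per-part record produced by an enrichment run.
interface EnrichmentOutcome {
  matched: boolean;
  confidence?: number; // set only when matched
  filledFields: number;
  totalFields: number;
}

interface MetricsSummary {
  matchRate: number;
  avgConfidence: number;
  completeness: number;
  meetsTargets: boolean;
}

// Targets taken from the table above.
const TARGETS = { matchRate: 0.7, avgConfidence: 0.85, completeness: 0.8 };

function summarize(outcomes: EnrichmentOutcome[]): MetricsSummary {
  const matched = outcomes.filter((o) => o.matched);
  const matchRate = outcomes.length > 0 ? matched.length / outcomes.length : 0;

  // Average confidence over matched parts only (assumed definition).
  const confidenceSum = matched.reduce((sum, o) => sum + (o.confidence ?? 0), 0);
  const avgConfidence = matched.length > 0 ? confidenceSum / matched.length : 0;

  // Completeness as filled fields over total fields, across all parts (assumed definition).
  const filled = outcomes.reduce((sum, o) => sum + o.filledFields, 0);
  const total = outcomes.reduce((sum, o) => sum + o.totalFields, 0);
  const completeness = total > 0 ? filled / total : 0;

  const meetsTargets =
    matchRate >= TARGETS.matchRate &&
    avgConfidence >= TARGETS.avgConfidence &&
    completeness >= TARGETS.completeness;

  return { matchRate, avgConfidence, completeness, meetsTargets };
}
```

Running this over the 100-part pilot would give an early signal on whether the bulk run is likely to reach the targets.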

Files Created

  • /services/enrichment/ - Enrichment service
  • /scripts/enrich-*.ts - Enrichment scripts
  • /packages/shared-types/src/enrichment/ - Types

See full plan document for complete architecture, algorithms, and implementation details.
