Amazon Product Data Enrichment
Overview
Comprehensive plan for enriching CROP product data using Amazon's product information.
Full Plan: See Agent Plan output above (12,000+ words)
Quick Start
1. Setup Environment
# Install dependencies
cd /Users/vova/Code/CROP/microservices
bun install paapi5-nodejs-sdk
# Configure API keys
cp .env.example .env.enrichment
Add to .env.enrichment:
# Amazon Product Advertising API
AMAZON_PA_ACCESS_KEY=your_access_key
AMAZON_PA_SECRET_KEY=your_secret_key
AMAZON_PA_PARTNER_TAG=your_partner_tag
# Oxylabs (optional)
OXYLABS_USERNAME=your_username
OXYLABS_PASSWORD=your_password
# Rainforest API (optional)
RAINFOREST_API_KEY=your_api_key
2. Run Pilot Enrichment
# Test with 1 part
bun scripts/enrich-part.ts --partNumber=1722887SM
# Test with pilot batch (100 parts)
bun scripts/enrich-pilot-batch.ts --size=100
3. Check Results
# View enrichment stats
bun scripts/enrichment-stats.ts
# Export enriched data
bun scripts/export-enriched.ts --format=json
Architecture
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   MongoDB   │────▶│  Enrichment  │────▶│  ES Index   │
│  (Source)   │     │   Service    │     │  (Search)   │
└─────────────┘     └──────────────┘     └─────────────┘
                           │
                           ▼
                    ┌──────────────┐
                    │  Amazon API  │
                    │ - PA API     │
                    │ - Oxylabs    │
                    │ - Rainforest │
                    └──────────────┘
Data Fields Enriched
| Field | Source | Priority |
|---|---|---|
| Description | Amazon Features | P1 |
| Dimensions | Product Info | P1 |
| Weight | Product Info | P1 |
| UPC/GTIN | Product Info | P1 |
| Images | High-res photos | P1 |
| Categories | Browse nodes | P2 |
| Specifications | Technical Info | P2 |
| Reviews | Customer ratings | P3 |
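
As an illustration of how the table above might translate into code, here is a minimal sketch of a field map keyed by priority, plus a merge that only fills fields the part does not already have. The type `EnrichmentFields`, the field names, and the helpers `fillIfMissing` and `mergeEnrichment` are assumptions made for this example; they are not taken from the CROP codebase.

```typescript
// Hypothetical shapes: field names are illustrative, not the real CROP types.
type Priority = "P1" | "P2" | "P3";

type EnrichmentFields = {
  description?: string;
  dimensions?: { length: number; width: number; height: number; unit: string };
  weight?: { value: number; unit: string };
  upc?: string;
  images?: string[];
  categories?: string[];
  specifications?: Record<string, string>;
  reviews?: { rating: number; count: number };
};

// Priority of each field, copied from the table above.
const FIELD_PRIORITY: Record<keyof EnrichmentFields, Priority> = {
  description: "P1",
  dimensions: "P1",
  weight: "P1",
  upc: "P1",
  images: "P1",
  categories: "P2",
  specifications: "P2",
  reviews: "P3",
};

// Copy one field from `source` to `target` only when the target lacks it.
function fillIfMissing<K extends keyof EnrichmentFields>(
  target: EnrichmentFields,
  source: EnrichmentFields,
  key: K,
): void {
  if (target[key] === undefined && source[key] !== undefined) {
    target[key] = source[key];
  }
}

// Merge Amazon-derived fields into an existing record, restricted to fields
// whose priority is at or above `maxPriority`. Existing values are never overwritten.
function mergeEnrichment(
  existing: EnrichmentFields,
  incoming: EnrichmentFields,
  maxPriority: Priority = "P3",
): EnrichmentFields {
  const merged: EnrichmentFields = { ...existing };
  for (const key of Object.keys(FIELD_PRIORITY) as (keyof EnrichmentFields)[]) {
    // "P1" < "P2" < "P3" holds under plain string comparison.
    if (FIELD_PRIORITY[key] <= maxPriority) {
      fillIfMissing(merged, incoming, key);
    }
  }
  return merged;
}
```

Calling `mergeEnrichment(part, amazonData, "P1")` would apply only the P1 fields, which may suit a first pass where lower-priority data such as reviews is deferred. Whether existing values should ever be overwritten is a policy choice this sketch does not settle.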
Matching Strategy
Stage 1: UPC Match (99% confidence)
if (part.upc) {
  match = await amazonAPI.searchByUPC(part.upc);
}
Stage 2: Part Number Exact (85% confidence)
const query = `${part.partNumber}`;
matches = await amazonAPI.search(query);
Stage 3: Manufacturer + Part (75% confidence)
const query = `${manufacturer} ${part.partNumber}`;
matches = await amazonAPI.search(query);
Stage 4: Title Fuzzy (70% confidence)
const query = buildTitleQuery(part.title);
matches = await amazonAPI.search(query);
Cost Estimate
| Category | Cost |
|---|---|
| One-time enrichment (56k parts) | $96 |
| Monthly maintenance | $25 |
| Re-enrichment (quarterly) | $38 |
Total Year 1: ~$237
Implementation Phases
- Phase 1 (Weeks 1-2): Foundation & API setup
- Phase 2 (Week 3): Pilot enrichment (100 parts)
- Phase 3 (Weeks 4-6): Bulk enrichment (56k parts)
- Phase 4 (Week 7): Data merge & ES sync
- Phase 5 (Week 8): Production deployment
Success Metrics
| Metric | Target |
|---|---|
| Match rate | 70% |
| Confidence score | 0.85 avg |
| Data completeness | 80% |
| Processing time | <2 weeks |
| API cost | <$100 |
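
To make the match-rate and confidence targets concrete, the four stages from Matching Strategy can be sketched as a cascade that stops at the first hit and records that stage's confidence; the two metrics are then simple aggregates over the results. The interfaces `Part`, `AmazonItem`, `AmazonClient`, and `MatchResult`, and the functions `matchPart` and `summarize`, are hypothetical. Taking the first search hit and using the raw title in place of `buildTitleQuery` are simplifications for illustration.

```typescript
// Hypothetical types: `AmazonClient` mirrors the calls named in Matching Strategy
// (searchByUPC, search), but the exact signatures are assumptions.
interface Part {
  partNumber: string;
  title: string;
  upc?: string;
  manufacturer?: string;
}
interface AmazonItem {
  asin: string;
  title: string;
}
interface AmazonClient {
  searchByUPC(upc: string): Promise<AmazonItem | null>;
  search(query: string): Promise<AmazonItem[]>;
}
interface MatchResult {
  item: AmazonItem;
  stage: string;
  confidence: number;
}

// Try each stage in order and return the first match with its stage confidence.
async function matchPart(part: Part, api: AmazonClient): Promise<MatchResult | null> {
  // Stage 1: exact UPC lookup, skipped when the part has no UPC.
  if (part.upc) {
    const item = await api.searchByUPC(part.upc);
    if (item) return { item, stage: "upc", confidence: 0.99 };
  }
  // Stages 2-4: keyword searches with decreasing confidence.
  const stages: { stage: string; confidence: number; query: string | null }[] = [
    { stage: "part-number", confidence: 0.85, query: part.partNumber },
    {
      stage: "manufacturer-part",
      confidence: 0.75,
      query: part.manufacturer ? `${part.manufacturer} ${part.partNumber}` : null,
    },
    // Simplification: the raw title stands in for buildTitleQuery(part.title).
    { stage: "title-fuzzy", confidence: 0.7, query: part.title },
  ];
  for (const { stage, confidence, query } of stages) {
    if (!query) continue;
    const [first] = await api.search(query); // simplification: accept the first hit
    if (first) return { item: first, stage, confidence };
  }
  return null;
}

// Aggregate per-part results into the two metrics from the table above.
function summarize(results: (MatchResult | null)[]): { matchRate: number; avgConfidence: number } {
  const matched = results.filter((r): r is MatchResult => r !== null);
  const matchRate = results.length === 0 ? 0 : matched.length / results.length;
  const avgConfidence =
    matched.length === 0 ? 0 : matched.reduce((sum, r) => sum + r.confidence, 0) / matched.length;
  return { matchRate, avgConfidence };
}
```

Under this scheme the average confidence depends entirely on which stage produces each match, so a 0.85 average would require a substantial share of UPC or exact part-number hits.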
Files Created
- /services/enrichment/ - Enrichment service
- /scripts/enrich-*.ts - Enrichment scripts
- /packages/shared-types/src/enrichment/ - Types
See full plan document for complete architecture, algorithms, and implementation details.