Image Metadata Architecture — Final Detailed
1. Executive Summary
This document defines the complete architecture for embedding, storing, and managing product image metadata for Clinton Tractor & Implement Co. The system ensures:
- Metadata permanence: Data embedded in images via XMP standard
- Brand identification: Company info travels with every image
- Cost optimization: Hybrid cloud/local processing saves an estimated ~$540 per 5M images compared with cloud-only processing
- Failover reliability: Automatic fallback from local to cloud
- Consistency: Field names match existing IndexedPart schema
2. Company & SKU System
2.1 Company Information
| Field | Value |
|---|---|
| Company Name | Clinton Tractor & Implement Co. |
| Company Type | Authorized Reseller |
| Website | https://clintontractor.com |
2.2 SKU System
Format: CT-{VENDOR}-{PARTNUMBER}
| Component | Description | Source Field | Example |
|---|---|---|---|
| CT | Clinton Tractor prefix | Constant | CT |
| VENDOR | Manufacturer code | manufacturer.code | NHL, BNS, GRP |
| PARTNUMBER | OEM part number as stored (not pnNorm, so leading zeros are kept) | partNumber | 00907566 |
Examples:
- CT-NHL-00907566: New Holland part
- CT-BNS-12345678: Branson part
- CT-GRP-ABC12345: Generic Replacement part
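A minimal sketch of building and checking a SKU, assuming the vendor code is one of the manufacturer codes (NHL, BNS, GRP) and that the part number is used exactly as stored in partNumber, which is what the examples show (CT-NHL-00907566 keeps its leading zeros). The helper name buildSku and the decision to uppercase the vendor code are illustrative assumptions; the pattern is the one used by the Zod ProductInfoSchema in section 4.4.

```typescript
// Hypothetical helper: builds a Clinton SKU in the CT-{VENDOR}-{PARTNUMBER} format.
// SKU_PATTERN copies the regex from the Zod ProductInfoSchema.
const SKU_PATTERN = /^CT-[A-Z]{2,3}-\w+$/;

function buildSku(vendorCode: string, partNumber: string): string {
  // Vendor codes appear uppercase in SKUs; uppercasing here is an assumption
  const vendor = vendorCode.trim().toUpperCase();
  // The part number is used as stored, so leading zeros survive (00907566)
  const pn = partNumber.trim();
  if (!vendor || !pn) {
    throw new Error('A vendor code and a part number are both required to build a SKU');
  }
  const sku = `CT-${vendor}-${pn}`;
  if (!SKU_PATTERN.test(sku)) {
    // e.g. part numbers containing hyphens, dots, or spaces
    throw new Error(`"${sku}" does not match the CT-{VENDOR}-{PARTNUMBER} pattern`);
  }
  return sku;
}
```

Part numbers containing hyphens, dots, or spaces fail the \w+ part of the pattern, so they would need an agreed normalization rule before SKUs are generated in bulk.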
3. Field Naming Convention
CRITICAL: All field names must match the existing IndexedPart schema for consistency.
3.1 Naming Rules
| Pattern | Style | Fields |
|---|---|---|
| All fields | camelCase | partNumber, pnNorm, categoryName, categoryPath, equipmentFitment, etc. |
3.2 Field Mapping Table
| Metadata Field | IndexedPart Field | Type | Style |
|---|---|---|---|
| product.sku | sku | string | camelCase |
| product.partNumber | partNumber | string | camelCase |
| product.pnNorm | pnNorm | string | camelCase |
| product.title | title | string | camelCase |
| product.description | description | string | camelCase |
| product.manufacturer | manufacturer | object | camelCase |
| product.manufacturer.name | manufacturer.name | string | camelCase |
| product.manufacturer.code | manufacturer.code | string | camelCase |
| product.categoryName | categoryName | string[] | camelCase |
| product.categoryPath | categoryPath | string[] | camelCase |
| product.equipmentFitment | equipmentFitment | string[] | camelCase |
| product.status | status | string | camelCase |
| image.type | media.images[].type | PartImageType | camelCase |
| image.alt | media.images[].alt | string | camelCase |
| catalog.slug | slug | string | camelCase |
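Because this table is a naming contract, part of it can be checked by the compiler instead of by review. A sketch covering the product block only, using a simplified, hypothetical stand-in for IndexedPart (the real type has more fields) and the ProductInfo shape from section 4.3: if the metadata's product block gains a key that IndexedPart lacks, the last line stops compiling.

```typescript
// Simplified, hypothetical stand-in for the real IndexedPart type (product-level fields only).
interface IndexedPart {
  sku?: string;
  partNumber?: string;
  pnNorm?: string;
  title?: string;
  description?: string;
  manufacturer: { name: string; code: string };
  categoryName?: string[];
  categoryPath?: string[];
  equipmentFitment?: string[];
  status?: string;
  slug?: string;
}

// Product block of the sidecar, restated from the ProductInfo type.
interface ProductInfo {
  sku: string;
  partNumber: string;
  pnNorm?: string;
  title: string;
  description?: string;
  manufacturer: { name: string; code: string };
  categoryName?: string[];
  categoryPath?: string[];
  equipmentFitment?: string[];
  status?: 'active' | 'discontinued' | 'superseded';
}

// Keys of ProductInfo with no same-named field on IndexedPart; `never` when all names match.
type UnmappedProductKeys = Exclude<keyof ProductInfo, keyof IndexedPart>;

// Stops compiling as soon as UnmappedProductKeys is anything other than never.
const productKeysMatch: [UnmappedProductKeys] extends [never] ? true : false = true;
```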
4. Complete Metadata Schema
4.1 JSON Sidecar Structure
```json
{
  "schemaVersion": "1.0",
  "createdAt": "2025-11-25T10:00:00Z",
  "updatedAt": "2025-11-25T10:00:00Z",
  "company": {
    "name": "Clinton Tractor & Implement Co.",
    "type": "Authorized Reseller",
    "website": "https://clintontractor.com",
    "contact": "parts@clintontractor.com"
  },
  "legal": {
    "copyright": "© 2025 Clinton Tractor & Implement Co. All rights reserved.",
    "license": "Licensed for Clinton Tractor e-commerce use only.",
    "termsUrl": "https://clintontractor.com/terms"
  },
  "product": {
    "sku": "CT-NHL-00907566",
    "partNumber": "00907566",
    "pnNorm": "907566",
    "title": "Hydraulic Filter Element",
    "description": "High-quality hydraulic filter for T7 series tractors",
    "manufacturer": {
      "name": "New Holland",
      "code": "NHL"
    },
    "categoryName": ["Filters", "Hydraulic Filters"],
    "categoryPath": ["Parts > Filters > Hydraulic Filters"],
    "equipmentFitment": ["T7.270", "T7.290", "T7.315"],
    "status": "active"
  },
  "image": {
    "type": "marketing",
    "sortOrder": 1,
    "source": "vendor_scraped",
    "originalUrl": "https://vendor.com/images/907566.jpg",
    "alt": "Hydraulic Filter Element - New Holland",
    "contentHash": "sha256:abc123..."
  },
  "catalog": {
    "url": "https://clintontractor.com/parts/ct-nhl-00907566",
    "slug": "hydraulic-filter-element-00907566"
  },
  "embedding": {
    "embedded": true,
    "embeddedAt": "2025-11-25T10:30:00Z",
    "version": "2025.11.25",
    "processor": "local-worker-01"
  }
}
```
4.2 XMP Mapping
| JSON Field | XMP Tag | Example |
|---|---|---|
| company.name | dc:creator | Clinton Tractor & Implement Co. |
| legal.copyright | dc:rights | © 2025 Clinton Tractor... |
| product.title | dc:title | Hydraulic Filter Element |
| product.sku | crop:SKU | CT-NHL-00907566 |
| product.partNumber | crop:PartNumber | 00907566 |
| product.manufacturer.name | crop:Manufacturer | New Holland |
| product.manufacturer.code | crop:ManufacturerCode | NHL |
| product.categoryPath[0] | crop:Category | Parts > Filters > Hydraulic |
| legal.license | xmpRights:UsageTerms | Licensed for Clinton... |
| catalog.url | crop:CatalogURL | https://clintontractor.com/... |
| embedding.version | crop:Version | 2025.11.25 |
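One way to keep this table and the embedding code from drifting apart is to hold the mapping as data and derive the ExifTool tag object from it. A sketch under that assumption, using the group-prefixed tag names that embedXmpMetadata() in section 7.2 writes; XMP_MAPPING, getByPath, and buildXmpTags are hypothetical names, and fields missing from a sidecar are simply skipped.

```typescript
// The mapping table as data: JSON path in the sidecar -> ExifTool tag name.
const XMP_MAPPING: ReadonlyArray<[string, string]> = [
  ['company.name', 'XMP-dc:Creator'],
  ['legal.copyright', 'XMP-dc:Rights'],
  ['product.title', 'XMP-dc:Title'],
  ['product.sku', 'XMP-crop:SKU'],
  ['product.partNumber', 'XMP-crop:PartNumber'],
  ['product.manufacturer.name', 'XMP-crop:Manufacturer'],
  ['product.manufacturer.code', 'XMP-crop:ManufacturerCode'],
  ['product.categoryPath[0]', 'XMP-crop:Category'],
  ['legal.license', 'XMP-xmpRights:UsageTerms'],
  ['catalog.url', 'XMP-crop:CatalogURL'],
  ['embedding.version', 'XMP-crop:Version'],
];

// Resolves paths such as "product.categoryPath[0]"; returns undefined if any segment is missing.
function getByPath(obj: unknown, path: string): unknown {
  const segments = path.replace(/\[(\d+)\]/g, '.$1').split('.');
  let current: unknown = obj;
  for (const seg of segments) {
    if (current === null || typeof current !== 'object') return undefined;
    current = (current as Record<string, unknown>)[seg];
  }
  return current;
}

// Builds the tag object for exiftool, leaving out fields absent from this sidecar.
function buildXmpTags(metadata: unknown): Record<string, string> {
  const tags: Record<string, string> = {};
  for (const [jsonPath, tag] of XMP_MAPPING) {
    const value = getByPath(metadata, jsonPath);
    if (value !== undefined && value !== null) tags[tag] = String(value);
  }
  return tags;
}
```

Adding a row to the table then means adding one tuple, and the same list can drive a read-back check after embedding.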
4.3 TypeScript Types
```typescript
// types/image-metadata.ts
import type { PartImageType } from './part';

export interface CompanyInfo {
  name: string; // "Clinton Tractor & Implement Co."
  type: string; // "Authorized Reseller"
  website: string; // "https://clintontractor.com"
  contact?: string; // "parts@clintontractor.com"
}

export interface LegalInfo {
  copyright: string; // "© 2025 Clinton Tractor..."
  license: string; // "Licensed for Clinton Tractor..."
  termsUrl?: string; // "https://clintontractor.com/terms"
}

export interface ProductInfo {
  sku: string; // "CT-NHL-00907566" (Clinton SKU)
  partNumber: string; // "00907566" (OEM)
  pnNorm?: string; // "907566" (normalized)
  title: string; // "Hydraulic Filter Element"
  description?: string;
  manufacturer: {
    name: string; // "New Holland"
    code: string; // "NHL"
  };
  categoryName?: string[]; // ["Filters", "Hydraulic Filters"]
  categoryPath?: string[]; // ["Parts > Filters > Hydraulic"]
  equipmentFitment?: string[]; // ["T7.270", "T7.290"]
  status?: 'active' | 'discontinued' | 'superseded';
}

export interface ImageInfo {
  type: PartImageType; // "marketing", "front", "360", etc.
  sortOrder: number; // 1, 2, 3...
  source: string; // "vendor_scraped", "ct", "manual"
  originalUrl?: string;
  alt?: string;
  contentHash?: string;
}

export interface CatalogInfo {
  url: string; // Full product page URL
  slug: string; // URL slug
}

export interface EmbeddingInfo {
  embedded: boolean;
  embeddedAt?: string; // ISO timestamp
  version?: string; // "2025.11.25"
  processor?: string; // "local-worker-01"
}

export interface ImageMetadata {
  schemaVersion: string;
  createdAt: string;
  updatedAt: string;
  company: CompanyInfo;
  legal: LegalInfo;
  product: ProductInfo;
  image: ImageInfo;
  catalog: CatalogInfo;
  embedding: EmbeddingInfo;
}
```
4.4 Zod Schema
```typescript
// schemas/image-metadata.ts
import { z } from 'zod';

const CompanyInfoSchema = z.object({
  name: z.string(),
  type: z.string(),
  website: z.string().url(),
  contact: z.string().email().optional(),
});

const LegalInfoSchema = z.object({
  copyright: z.string(),
  license: z.string(),
  termsUrl: z.string().url().optional(),
});

const ManufacturerSchema = z.object({
  name: z.string(),
  code: z.string(),
});

const ProductInfoSchema = z.object({
  sku: z.string().regex(/^CT-[A-Z]{2,3}-\w+$/), // CT-NHL-00907566
  partNumber: z.string(),
  pnNorm: z.string().optional(),
  title: z.string(),
  description: z.string().optional(),
  manufacturer: ManufacturerSchema,
  categoryName: z.array(z.string()).optional(),
  categoryPath: z.array(z.string()).optional(),
  equipmentFitment: z.array(z.string()).optional(),
  status: z.enum(['active', 'discontinued', 'superseded']).optional(),
});

const ImageInfoSchema = z.object({
  type: z.string(), // PartImageType
  sortOrder: z.number().int().positive(),
  source: z.string(),
  originalUrl: z.string().url().optional(),
  alt: z.string().optional(),
  contentHash: z.string().optional(),
});

const CatalogInfoSchema = z.object({
  url: z.string().url(),
  slug: z.string(),
});

const EmbeddingInfoSchema = z.object({
  embedded: z.boolean(),
  embeddedAt: z.string().datetime().optional(),
  version: z.string().optional(),
  processor: z.string().optional(),
});

export const ImageMetadataSchema = z.object({
  schemaVersion: z.string(),
  createdAt: z.string().datetime(),
  updatedAt: z.string().datetime(),
  company: CompanyInfoSchema,
  legal: LegalInfoSchema,
  product: ProductInfoSchema,
  image: ImageInfoSchema,
  catalog: CatalogInfoSchema,
  embedding: EmbeddingInfoSchema,
});

export type ImageMetadata = z.infer<typeof ImageMetadataSchema>;
```
5. Storage Architecture
5.1 Data Flow
```
┌─────────────────────────────────────────────────────────────────┐
│ SOURCE OF TRUTH │
│ │
│ GCS: gs://crop_parts/{source}/{mediaType}/{vendor}/{pn}/ │
│ ├── {pn}-1.meta.json ← JSON sidecar (source of truth) │
│ ├── {pn}-1.jpg ← Image with XMP embedded │
│ ├── {pn}-2.meta.json │
│ └── {pn}-2.jpg │
│ │
└─────────────────────────────────────────────────────────────────┘
│
│ Sync
▼
┌─────────────────────────────────────────────────────────────────┐
│ INDEXES │
│ │
│ MongoDB (crop_stage.parts) │
│ └── media.images[].embeddedMetadata: {...} │
│ │ │
│ │ Sync │
│ ▼ │
│ Elasticsearch (parts_current) │
│ └── Searchable product data │
│ │
└─────────────────────────────────────────────────────────────────┘
```
5.2 GCS Folder Structure
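The data-flow diagram fixes the object layout as gs://crop_parts/{source}/{mediaType}/{vendor}/{pn}/, with each numbered image stored next to its .meta.json sidecar. A small sketch that derives both URIs for one gallery image, assuming the vendor segment is the lowercase manufacturer code (as in the nhl folders of the tree in this section) and that images are JPEGs named {pn}-{n}.jpg. The helper name buildImagePaths is hypothetical.

```typescript
// Hypothetical helper: derives the GCS URIs for an image and its JSON sidecar,
// following gs://crop_parts/{source}/{mediaType}/{vendor}/{pn}/{pn}-{n}.jpg
type ImageSource = 'ct' | 'vendor_scraped' | 'vendor_direct' | 'manual';

interface ImagePaths {
  image: string;
  sidecar: string;
}

const BUCKET = 'crop_parts';

function buildImagePaths(
  source: ImageSource,
  mediaType: string, // e.g. "gallery"
  vendorCode: string, // e.g. "NHL"; stored lowercase in the bucket
  partNumber: string, // e.g. "00907566"
  index: number // 1-based image number
): ImagePaths {
  if (!Number.isInteger(index) || index < 1) {
    throw new Error(`Image index must be a positive integer, got ${index}`);
  }
  const prefix = `gs://${BUCKET}/${source}/${mediaType}/${vendorCode.toLowerCase()}/${partNumber}`;
  const base = `${partNumber}-${index}`;
  return {
    image: `${prefix}/${base}.jpg`,
    sidecar: `${prefix}/${base}.meta.json`,
  };
}
```

The 360/ folders use frame-NNN.jpg names instead, so they would need a separate naming rule.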
```
gs://crop_parts/
├── ct/                          # Clinton Tractor sourced
│   ├── gallery/
│   │   └── nhl/                 # Vendor code lowercase
│   │       └── 00907566/        # Part number
│   │           ├── 00907566-1.jpg
│   │           ├── 00907566-1.meta.json
│   │           └── ...
│   └── 360/
│       └── nhl/
│           └── 00907566/
│               ├── frame-001.jpg
│               └── ...
├── vendor_scraped/              # Scraped from vendor sites
├── vendor_direct/               # Direct vendor uploads
└── manual/                      # Manual uploads
```
5.3 MongoDB Document Update
```
// After embedding, update MongoDB document
{
  "id": "...",
  "partNumber": "00907566",
  "sku": "CT-NHL-00907566", // Clinton SKU added
  "media": {
    "hasImage": true,
    "hasGcpImages": true,
    "hasEmbeddedMetadata": true, // NEW FLAG
    "imagesCount": 2,
    "images": [
      {
        "url": "https://storage.googleapis.com/crop_parts/...",
        "gcpUrl": "gs://crop_parts/...",
        "type": "marketing",
        "alt": "Hydraulic Filter Element",
        "embeddedMetadata": {
          "schemaVersion": "1.0",
          "company": { "name": "Clinton Tractor & Implement Co.", ... },
          "legal": { "copyright": "© 2025 Clinton Tractor...", ... },
          "product": { "sku": "CT-NHL-00907566", ... },
          "embedding": { "embedded": true, ... }
        }
      }
    ]
  }
}
```
6. Hybrid Processing System
6.1 Architecture Overview
```
┌─────────────────────────────────────────────────────────────────┐
│ ORCHESTRATOR │
│ (Cloud Run - Always On) │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ API │ │ Queue │ │ Router │ │ Monitor │ │
│ │ Endpoint │──►│ (Tasks) │──►│ │──►│ & Alerts │ │
│ └──────────┘ └──────────┘ └────┬─────┘ └──────────┘ │
│ │ │
└──────────────────────────────────────┼──────────────────────────┘
│
┌─────────────┴─────────────┐
│ │
▼ ▼
┌─────────────────────────────────┐ ┌─────────────────────────────────┐
│ LOCAL WORKER │ │ CLOUD WORKER │
│ (Physical Server) │ │ (Cloud Run) │
│ │ │ │
│ • Pull tasks from queue │ │ • Auto-scale 0-1000 │
│ • Download images (bulk) │ │ • Process on demand │
│ • Process with Sharp │ │ • Higher cost │
│ • Embed XMP │ │ • Used for fallback │
│ • Upload results │ │ │
│ • Report status │ │ │
│ │ │ │
│ Cost: ~FREE │ │ Cost: $0.12/GB + CPU │
└─────────────────────────────────┘ └─────────────────────────────────┘
```
6.2 Routing Logic
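```typescript
// router/routing-logic.ts

// Assumed shapes: the routing sketch uses these types without defining them.
interface ProcessingTask {
  priority: 'normal' | 'urgent';
}

interface SystemConfig {
  mode: 'hybrid' | 'cloud_only' | 'local_only';
}

interface RoutingDecision {
  target: 'local' | 'cloud';
  reason: string;
}

// Implemented by the health-check and alert components; declared here so the file type-checks.
declare function isLocalHealthy(): boolean;
declare function sendAlert(event: string, details: Record<string, unknown>): void;

function routeTask(task: ProcessingTask, config: SystemConfig): RoutingDecision {
  // 1. Cloud-only mode
  if (config.mode === 'cloud_only') {
    return { target: 'cloud', reason: 'Cloud-only mode' };
  }
  // 2. Local-only mode: no fallback, so fail loudly if the local worker is down
  if (config.mode === 'local_only') {
    if (!isLocalHealthy()) {
      throw new Error('Local worker unavailable');
    }
    return { target: 'local', reason: 'Local-only mode' };
  }
  // 3. Hybrid mode: fall back to cloud when the local worker is unhealthy
  if (!isLocalHealthy()) {
    sendAlert('fallback_activated', { reason: 'Local unhealthy' });
    return { target: 'cloud', reason: 'Fallback: local unhealthy' };
  }
  // 4. Urgent tasks go to cloud for speed
  if (task.priority === 'urgent') {
    return { target: 'cloud', reason: 'Urgent task' };
  }
  // 5. Default: use local (cost optimization)
  return { target: 'local', reason: 'Normal routing' };
}
```
6.3 Health Check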
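The routing logic calls isLocalHealthy() without defining it. One possible implementation, sketched under the assumption that the orchestrator keeps the most recent WorkerHeartbeat and compares it against the FALLBACK_TRIGGERS thresholds of this section (both restated so the example compiles on its own): the worker counts as healthy only if its heartbeat is fresh, it does not report itself unhealthy, and no limit is exceeded. The name evaluateHeartbeat, the strict "exceeds" comparisons, and treating 'degraded' as still usable are assumptions.

```typescript
// Restated from the health-check definitions so this sketch is self-contained.
interface WorkerHeartbeat {
  workerId: string;
  status: 'healthy' | 'degraded' | 'unhealthy';
  timestamp: string; // ISO 8601
  capacity: { queueSize: number; maxParallel: number; processingNow: number };
  resources: { cpuPercent: number; memoryPercent: number; diskFreeGb: number };
}

const FALLBACK_TRIGGERS = {
  heartbeatTimeoutMs: 120_000, // 2 minutes
  cpuThreshold: 95,
  memoryThreshold: 90,
  diskMinGb: 10,
  queueMax: 1000,
};

interface HealthResult {
  healthy: boolean;
  reasons: string[]; // every trigger that fired, useful as the fallback alert payload
}

// Hypothetical evaluator: the worker is unhealthy if any fallback trigger fires.
function evaluateHeartbeat(
  hb: WorkerHeartbeat | undefined,
  nowMs: number = Date.now(),
  t = FALLBACK_TRIGGERS
): HealthResult {
  if (!hb) return { healthy: false, reasons: ['no heartbeat received'] };
  const reasons: string[] = [];
  const ageMs = nowMs - Date.parse(hb.timestamp);
  if (Number.isNaN(ageMs) || ageMs > t.heartbeatTimeoutMs) reasons.push(`heartbeat stale (${ageMs} ms)`);
  if (hb.status === 'unhealthy') reasons.push('worker reports unhealthy');
  if (hb.resources.cpuPercent > t.cpuThreshold) reasons.push(`cpu ${hb.resources.cpuPercent}%`);
  if (hb.resources.memoryPercent > t.memoryThreshold) reasons.push(`memory ${hb.resources.memoryPercent}%`);
  if (hb.resources.diskFreeGb < t.diskMinGb) reasons.push(`disk ${hb.resources.diskFreeGb} GB free`);
  if (hb.capacity.queueSize > t.queueMax) reasons.push(`queue ${hb.capacity.queueSize}`);
  return { healthy: reasons.length === 0, reasons };
}
```

With this in place, isLocalHealthy() could return evaluateHeartbeat(latestHeartbeat).healthy (latestHeartbeat being whatever the orchestrator last received), and the reasons list could be passed to sendAlert('fallback_activated', ...).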
```typescript
interface WorkerHeartbeat {
  workerId: string;
  status: 'healthy' | 'degraded' | 'unhealthy';
  timestamp: string;
  capacity: {
    queueSize: number;
    maxParallel: number;
    processingNow: number;
  };
  resources: {
    cpuPercent: number;
    memoryPercent: number;
    diskFreeGb: number;
  };
}

const FALLBACK_TRIGGERS = {
  heartbeatTimeoutMs: 120_000, // 2 minutes
  cpuThreshold: 95,
  memoryThreshold: 90,
  diskMinGb: 10,
  queueMax: 1000,
};
```
7. Processing Pipeline
7.1 Generate Metadata Function
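```typescript
// Assumed import paths; IndexedPart and PartImageType are the existing part types.
import type { IndexedPart, PartImageType } from './types/part';
import type { ImageMetadata } from './types/image-metadata';

// Identifier of the worker producing the metadata, e.g. "local-worker-01".
// Reading it from an environment variable is an assumption.
const WORKER_ID = process.env.WORKER_ID ?? 'unknown-worker';

function generateMetadata(
  product: IndexedPart,
  imageType: PartImageType,
  source: string,
  sortOrder = 1 // 1-based image position, matching {pn}-1.jpg, {pn}-2.jpg, ...
): ImageMetadata {
  const now = new Date().toISOString();
  const partNumber = product.partNumber;
  if (!partNumber) {
    // Otherwise the SKU would become "CT-NHL-undefined", which still passes the SKU regex
    throw new Error('Cannot generate image metadata: product has no partNumber');
  }
  const sku = `CT-${product.manufacturer.code}-${partNumber}`;
  return {
    schemaVersion: '1.0',
    createdAt: now,
    updatedAt: now,
    company: {
      name: 'Clinton Tractor & Implement Co.',
      type: 'Authorized Reseller',
      website: 'https://clintontractor.com',
      contact: 'parts@clintontractor.com',
    },
    legal: {
      // Year taken from `now` so the copyright always agrees with createdAt
      copyright: `© ${now.slice(0, 4)} Clinton Tractor & Implement Co. All rights reserved.`,
      license: 'Licensed for Clinton Tractor e-commerce use only.',
      termsUrl: 'https://clintontractor.com/terms',
    },
    product: {
      sku,
      partNumber,
      pnNorm: product.pnNorm,
      title: product.title || product.description || '',
      description: product.description,
      manufacturer: {
        name: product.manufacturer.name,
        code: product.manufacturer.code,
      },
      categoryName: product.categoryName,
      categoryPath: product.categoryPath,
      equipmentFitment: product.equipmentFitment?.slice(0, 10), // capped at 10 models
      status: product.status || 'active',
    },
    image: {
      type: imageType,
      sortOrder,
      source,
      alt: product.title || `${partNumber} - ${product.manufacturer.name}`,
    },
    catalog: {
      url: `https://clintontractor.com/parts/${product.slug}`,
      slug: product.slug,
    },
    embedding: {
      embedded: false, // becomes true (with embeddedAt) once the XMP write succeeds
      version: now.slice(0, 10).replace(/-/g, '.'), // "2025-11-25" -> "2025.11.25"
      processor: WORKER_ID,
    },
  };
}
```
7.2 Embed XMP Function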
```typescript
import { exiftool } from 'exiftool-vendored';
import type { ImageMetadata } from './types/image-metadata'; // assumed path

// "crop" is a custom XMP namespace: ExifTool writes the XMP-crop:* tags below only if
// that namespace is defined in an ExifTool config file (user-defined tags).
async function embedXmpMetadata(
  imagePath: string,
  metadata: ImageMetadata
): Promise<void> {
  await exiftool.write(
    imagePath,
    {
      // Dublin Core (standard)
      'XMP-dc:Title': metadata.product.title,
      'XMP-dc:Creator': metadata.company.name,
      'XMP-dc:Rights': metadata.legal.copyright,
      // XMP Rights
      'XMP-xmpRights:UsageTerms': metadata.legal.license,
      // Custom CROP namespace
      'XMP-crop:SKU': metadata.product.sku,
      'XMP-crop:PartNumber': metadata.product.partNumber,
      'XMP-crop:PartNumberNorm': metadata.product.pnNorm,
      'XMP-crop:Manufacturer': metadata.product.manufacturer.name,
      'XMP-crop:ManufacturerCode': metadata.product.manufacturer.code,
      'XMP-crop:Category': metadata.product.categoryPath?.[0],
      'XMP-crop:EquipmentFitment': metadata.product.equipmentFitment?.join(', '),
      'XMP-crop:Status': metadata.product.status,
      'XMP-crop:ImageType': metadata.image.type,
      'XMP-crop:CatalogURL': metadata.catalog.url,
      'XMP-crop:CompanyName': metadata.company.name,
      'XMP-crop:CompanyType': metadata.company.type,
      'XMP-crop:Website': metadata.company.website,
      'XMP-crop:Version': metadata.embedding.version,
      'XMP-crop:EmbeddedAt': new Date().toISOString(),
    },
    // Edit the file in place instead of leaving an "<name>_original" backup next to it
    ['-overwrite_original']
  );
}

// exiftool-vendored keeps a shared ExifTool child process running; call
// `await exiftool.end()` once the batch is done so the Node process can exit.
```
8. Cost Analysis
8.1 Per-Image Cost
| Operation | Cloud | Local |
|---|---|---|
| Download (egress) | $0.00006 | $0 (bulk) |
| CPU processing | $0.000048 | $0 |
| Upload (write) | $0.000005 | $0.000005 |
| Total per image | $0.000113 | $0.000005 |
8.2 Scale Projections
| Images | Cloud Cost | Local Cost | Savings |
|---|---|---|---|
| 1,000 | $0.11 | $0.005 | $0.11 |
| 100,000 | $11.30 | $0.50 | $10.80 |
| 1,000,000 | $113 | $5 | $108 |
| 5,000,000 | $565 | $25 | $540 |
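The scale projections are the per-image totals multiplied by the image count. A small sketch that reproduces them from the per-operation costs in section 8.1, so both tables stay consistent when a rate changes. The rates are this document's estimates, and the local figures treat server hardware, power, and bandwidth as zero, as the per-image table does.

```typescript
// Per-image costs in USD, taken from the per-image cost table (estimates, not billing data).
interface PerImageCost {
  download: number;
  cpu: number;
  upload: number;
}

const CLOUD: PerImageCost = { download: 0.00006, cpu: 0.000048, upload: 0.000005 };
const LOCAL: PerImageCost = { download: 0, cpu: 0, upload: 0.000005 };

const perImage = (c: PerImageCost): number => c.download + c.cpu + c.upload;

// Round to cents so floating-point noise does not leak into reports.
const toCents = (usd: number): number => Math.round(usd * 100) / 100;

function projectCost(images: number) {
  const cloud = perImage(CLOUD) * images;
  const local = perImage(LOCAL) * images;
  return {
    images,
    cloud: toCents(cloud),
    local: toCents(local),
    savings: toCents(cloud - local),
  };
}
```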
9. Implementation Checklist
Phase 1: Core
- Define TypeScript types (ImageMetadata)
- Create Zod schema validation
- Implement generateMetadata() function
- Implement embedXmpMetadata() function
Phase 2: Infrastructure
- Set up Cloud Tasks queue
- Implement Router API
- Create Local Worker script
- Create Cloud Worker (Cloud Run)
Phase 3: Orchestration
- Health check system
- Heartbeat mechanism
- Routing logic
- Fallback triggers
- Alert system (Linear/Other)
Phase 4: Migration
- Bulk download script
- Batch processing pipeline
- MongoDB update script
- Verification tools
10. Security
10.1 Data in Metadata
| Include | Exclude |
|---|---|
| Company name | Prices |
| SKU | Cost data |
| Part number | Inventory levels |
| Manufacturer | Internal IDs |
| Copyright | Customer data |
| Catalog URL | API keys |
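The exclude column can also be enforced mechanically before a sidecar is written or embedded. A minimal sketch that walks a metadata object and reports any key whose name looks like one of the excluded categories. The key patterns below are assumptions about how such fields would be named; validating against the ImageMetadata schema as an allowlist is the stricter alternative.

```typescript
// Hypothetical guard: lists key paths that look like excluded data (prices, costs,
// inventory, internal IDs, customer data, credentials). Patterns are illustrative guesses.
const EXCLUDED_KEY_PATTERNS: RegExp[] = [
  /price/i,
  /cost/i,
  /inventory|stock|quantity/i,
  /internal.?id|^_id$/i,
  /customer/i,
  /api.?key|secret|token/i,
];

function findSensitiveKeys(value: unknown, path = ''): string[] {
  if (Array.isArray(value)) {
    return value.flatMap((item, i) => findSensitiveKeys(item, `${path}[${i}]`));
  }
  if (value === null || typeof value !== 'object') return [];
  const hits: string[] = [];
  for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
    const childPath = path ? `${path}.${key}` : key;
    if (EXCLUDED_KEY_PATTERNS.some((re) => re.test(key))) hits.push(childPath);
    hits.push(...findSensitiveKeys(child, childPath));
  }
  return hits;
}
```

Running it on each generated metadata object and failing the job when the result is non-empty keeps the rule from depending on manual review.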