Image Metadata Architecture — Final Detailed

1. Executive Summary

This document defines the complete architecture for embedding, storing, and managing product image metadata for Clinton Tractor & Implement Co. The system ensures:

  • Metadata permanence: Data embedded in images via XMP standard
  • Brand identification: Company info travels with every image
  • Cost optimization: Hybrid cloud/local processing saves roughly $540 per 5M images (see 8.2)
  • Failover reliability: Automatic fallback from local to cloud
  • Consistency: Field names match existing IndexedPart schema

2. Company & SKU System

2.1 Company Information

Field          Value
Company Name   Clinton Tractor & Implement Co.
Company Type   Authorized Reseller
Website        https://clintontractor.com

2.2 SKU System

Format: CT-{VENDOR}-{PARTNUMBER}

Component    Description              Source Field        Example
CT           Clinton Tractor prefix   Constant            CT
VENDOR       Manufacturer code        manufacturer.code   NHL, BNS, GRP
PARTNUMBER   Normalized part number   partNumber          00907566

Examples:

  • CT-NHL-00907566 — New Holland part
  • CT-BNS-12345678 — Branson part
  • CT-GRP-ABC12345 — Generic Replacement part
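
The format above can be sketched as a small builder and parser. This is an illustrative sketch rather than the project's implementation: buildSku and parseSku are hypothetical names, and the assumption that vendor codes are two or three uppercase letters followed by a part number of word characters is borrowed from the SKU pattern in the Zod schema (4.4).

```typescript
// Hypothetical helpers illustrating the CT-{VENDOR}-{PARTNUMBER} format.
// Assumption: vendor codes are 2-3 uppercase letters and part numbers contain
// only word characters, mirroring /^CT-[A-Z]{2,3}-\w+$/ from section 4.4.
const SKU_PATTERN = /^CT-([A-Z]{2,3})-(\w+)$/;

function buildSku(vendorCode: string, partNumber: string): string {
  // Vendor codes are upper-cased; the part number is used as given (trimmed).
  const sku = `CT-${vendorCode.trim().toUpperCase()}-${partNumber.trim()}`;
  if (!SKU_PATTERN.test(sku)) {
    throw new Error(`Cannot build a valid SKU from "${vendorCode}" and "${partNumber}"`);
  }
  return sku;
}

function parseSku(sku: string): { vendorCode: string; partNumber: string } | null {
  const match = SKU_PATTERN.exec(sku);
  if (!match) return null;
  return { vendorCode: match[1] as string, partNumber: match[2] as string };
}
```

Under these assumptions, buildSku('nhl', '00907566') produces CT-NHL-00907566, and parseSku returns null for anything that does not fit the pattern.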

3. Field Naming Convention

CRITICAL: All field names must match the existing IndexedPart schema for consistency.

3.1 Naming Rules

Pattern      Style       Fields
All fields   camelCase   partNumber, pnNorm, categoryName, categoryPath, equipmentFitment, etc.

3.2 Field Mapping Table

Metadata Field              IndexedPart Field     Type            Style
product.sku                 sku                   string          camelCase
product.partNumber          partNumber            string          camelCase
product.pnNorm              pnNorm                string          camelCase
product.title               title                 string          camelCase
product.description         description           string          camelCase
product.manufacturer        manufacturer          object          camelCase
product.manufacturer.name   manufacturer.name     string          camelCase
product.manufacturer.code   manufacturer.code     string          camelCase
product.categoryName        categoryName          string[]        camelCase
product.categoryPath        categoryPath          string[]        camelCase
product.equipmentFitment    equipmentFitment      string[]        camelCase
product.status              status                string          camelCase
image.type                  media.images[].type   PartImageType   camelCase
image.alt                   media.images[].alt    string          camelCase
catalog.slug                slug                  string          camelCase

4. Complete Metadata Schema

4.1 JSON Sidecar Structure

{
  "schemaVersion": "1.0",
  "createdAt": "2025-11-25T10:00:00Z",
  "updatedAt": "2025-11-25T10:00:00Z",

  "company": {
    "name": "Clinton Tractor & Implement Co.",
    "type": "Authorized Reseller",
    "website": "https://clintontractor.com",
    "contact": "parts@clintontractor.com"
  },

  "legal": {
    "copyright": "© 2025 Clinton Tractor & Implement Co. All rights reserved.",
    "license": "Licensed for Clinton Tractor e-commerce use only.",
    "termsUrl": "https://clintontractor.com/terms"
  },

  "product": {
    "sku": "CT-NHL-00907566",
    "partNumber": "00907566",
    "pnNorm": "907566",
    "title": "Hydraulic Filter Element",
    "description": "High-quality hydraulic filter for T7 series tractors",
    "manufacturer": {
      "name": "New Holland",
      "code": "NHL"
    },
    "categoryName": ["Filters", "Hydraulic Filters"],
    "categoryPath": ["Parts > Filters > Hydraulic Filters"],
    "equipmentFitment": ["T7.270", "T7.290", "T7.315"],
    "status": "active"
  },

  "image": {
    "type": "marketing",
    "sortOrder": 1,
    "source": "vendor_scraped",
    "originalUrl": "https://vendor.com/images/907566.jpg",
    "alt": "Hydraulic Filter Element - New Holland",
    "contentHash": "sha256:abc123..."
  },

  "catalog": {
    "url": "https://clintontractor.com/parts/ct-nhl-00907566",
    "slug": "hydraulic-filter-element-00907566"
  },

  "embedding": {
    "embedded": true,
    "embeddedAt": "2025-11-25T10:30:00Z",
    "version": "2025.11.25",
    "processor": "local-worker-01"
  }
}

4.2 XMP Mapping

JSON Field                  XMP Tag                 Example
company.name                dc:creator              Clinton Tractor & Implement Co.
legal.copyright             dc:rights               © 2025 Clinton Tractor...
product.title               dc:title                Hydraulic Filter Element
product.sku                 crop:SKU                CT-NHL-00907566
product.partNumber          crop:PartNumber         00907566
product.manufacturer.name   crop:Manufacturer       New Holland
product.manufacturer.code   crop:ManufacturerCode   NHL
product.categoryPath[0]     crop:Category           Parts > Filters > Hydraulic
legal.license               xmpRights:UsageTerms    Licensed for Clinton...
catalog.url                 crop:CatalogURL         https://clintontractor.com/...
embedding.version           crop:Version            2025.11.25

4.3 TypeScript Types

// types/image-metadata.ts

import type { PartImageType } from './part';

export interface CompanyInfo {
  name: string;                    // "Clinton Tractor & Implement Co."
  type: string;                    // "Authorized Reseller"
  website: string;                 // "https://clintontractor.com"
  contact?: string;                // "parts@clintontractor.com"
}

export interface LegalInfo {
  copyright: string;               // "© 2025 Clinton Tractor..."
  license: string;                 // "Licensed for Clinton Tractor..."
  termsUrl?: string;               // "https://clintontractor.com/terms"
}

export interface ProductInfo {
  sku: string;                     // "CT-NHL-00907566" (Clinton SKU)
  partNumber: string;              // "00907566" (OEM)
  pnNorm?: string;                 // "907566" (normalized)
  title: string;                   // "Hydraulic Filter Element"
  description?: string;
  manufacturer: {
    name: string;                  // "New Holland"
    code: string;                  // "NHL"
  };
  categoryName?: string[];          // ["Filters", "Hydraulic Filters"]
  categoryPath?: string[];          // ["Parts > Filters > Hydraulic"]
  equipmentFitment?: string[];     // ["T7.270", "T7.290"]
  status?: 'active' | 'discontinued' | 'superseded';
}

export interface ImageInfo {
  type: PartImageType;             // "marketing", "front", "360", etc.
  sortOrder: number;               // 1, 2, 3...
  source: string;                  // "vendor_scraped", "ct", "manual"
  originalUrl?: string;
  alt?: string;
  contentHash?: string;
}

export interface CatalogInfo {
  url: string;                     // Full product page URL
  slug: string;                    // URL slug
}

export interface EmbeddingInfo {
  embedded: boolean;
  embeddedAt?: string;             // ISO timestamp
  version?: string;                // "2025.11.25"
  processor?: string;              // "local-worker-01"
}

export interface ImageMetadata {
  schemaVersion: string;
  createdAt: string;
  updatedAt: string;
  company: CompanyInfo;
  legal: LegalInfo;
  product: ProductInfo;
  image: ImageInfo;
  catalog: CatalogInfo;
  embedding: EmbeddingInfo;
}

4.4 Zod Schema

// schemas/image-metadata.ts

import { z } from 'zod';

const CompanyInfoSchema = z.object({
  name: z.string(),
  type: z.string(),
  website: z.string().url(),
  contact: z.string().email().optional(),
});

const LegalInfoSchema = z.object({
  copyright: z.string(),
  license: z.string(),
  termsUrl: z.string().url().optional(),
});

const ManufacturerSchema = z.object({
  name: z.string(),
  code: z.string(),
});

const ProductInfoSchema = z.object({
  sku: z.string().regex(/^CT-[A-Z]{2,3}-\w+$/), // CT-NHL-00907566
  partNumber: z.string(),
  pnNorm: z.string().optional(),
  title: z.string(),
  description: z.string().optional(),
  manufacturer: ManufacturerSchema,
  categoryName: z.array(z.string()).optional(),
  categoryPath: z.array(z.string()).optional(),
  equipmentFitment: z.array(z.string()).optional(),
  status: z.enum(['active', 'discontinued', 'superseded']).optional(),
});

const ImageInfoSchema = z.object({
  type: z.string(), // PartImageType
  sortOrder: z.number().int().positive(),
  source: z.string(),
  originalUrl: z.string().url().optional(),
  alt: z.string().optional(),
  contentHash: z.string().optional(),
});

const CatalogInfoSchema = z.object({
  url: z.string().url(),
  slug: z.string(),
});

const EmbeddingInfoSchema = z.object({
  embedded: z.boolean(),
  embeddedAt: z.string().datetime().optional(),
  version: z.string().optional(),
  processor: z.string().optional(),
});

export const ImageMetadataSchema = z.object({
  schemaVersion: z.string(),
  createdAt: z.string().datetime(),
  updatedAt: z.string().datetime(),
  company: CompanyInfoSchema,
  legal: LegalInfoSchema,
  product: ProductInfoSchema,
  image: ImageInfoSchema,
  catalog: CatalogInfoSchema,
  embedding: EmbeddingInfoSchema,
});

export type ImageMetadata = z.infer<typeof ImageMetadataSchema>;

5. Storage Architecture

5.1 Data Flow

┌─────────────────────────────────────────────────────────────────┐
│                      SOURCE OF TRUTH                             │
│                                                                  │
│   GCS: gs://crop_parts/{source}/{mediaType}/{vendor}/{pn}/      │
│   ├── {pn}-1.meta.json         ← JSON sidecar (source of truth) │
│   ├── {pn}-1.jpg               ← Image with XMP embedded        │
│   ├── {pn}-2.meta.json                                          │
│   └── {pn}-2.jpg                                                │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

                              │ Sync
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                         INDEXES                                  │
│                                                                  │
│   MongoDB (crop_stage.parts)                                    │
│   └── media.images[].embeddedMetadata: {...}                    │
│                              │                                   │
│                              │ Sync                              │
│                              ▼                                   │
│   Elasticsearch (parts_current)                                 │
│   └── Searchable product data                                   │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

5.2 GCS Folder Structure

gs://crop_parts/
├── ct/                          # Clinton Tractor sourced
│   ├── gallery/
│   │   └── nhl/                 # Vendor code lowercase
│   │       └── 00907566/        # Part number
│   │           ├── 00907566-1.jpg
│   │           ├── 00907566-1.meta.json
│   │           └── ...
│   └── 360/
│       └── nhl/
│           └── 00907566/
│               ├── frame-001.jpg
│               └── ...
├── vendor_scraped/              # Scraped from vendor sites
├── vendor_direct/               # Direct vendor uploads
└── manual/                      # Manual uploads
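
The gallery layout above can be expressed as a path builder. The sketch below is an assumption-laden illustration: buildGalleryPaths is a hypothetical name, it covers only the gallery media type (360 frames use the frame-001.jpg naming and are not handled), and it assumes every gallery image is a .jpg.

```typescript
// Hypothetical helper: derives the image and sidecar locations for one gallery image.
// Source folders are the four top-level directories shown in the tree above.
type ImageSource = 'ct' | 'vendor_scraped' | 'vendor_direct' | 'manual';

interface ObjectPaths {
  image: string;   // gs:// URI of the JPEG carrying the embedded XMP
  sidecar: string; // gs:// URI of the JSON sidecar (source of truth)
}

function buildGalleryPaths(
  source: ImageSource,
  vendorCode: string,
  partNumber: string,
  sortOrder: number,
  bucket: string = 'crop_parts',
): ObjectPaths {
  if (!Number.isInteger(sortOrder) || sortOrder < 1) {
    throw new Error(`sortOrder must be a positive integer, got ${sortOrder}`);
  }
  // Vendor codes are lowercased in the folder structure (nhl, not NHL).
  const dir = `gs://${bucket}/${source}/gallery/${vendorCode.toLowerCase()}/${partNumber}`;
  const base = `${dir}/${partNumber}-${sortOrder}`;
  return { image: `${base}.jpg`, sidecar: `${base}.meta.json` };
}
```

Keeping the image and its sidecar under one shared base name means either file can be located from the other by swapping the extension.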

5.3 MongoDB Document Update

// After embedding, update MongoDB document
{
  "id": "...",
  "partNumber": "00907566",
  "sku": "CT-NHL-00907566",  // Clinton SKU added
  "media": {
    "hasImage": true,
    "hasGcpImages": true,
    "hasEmbeddedMetadata": true,  // NEW FLAG
    "imagesCount": 2,
    "images": [
      {
        "url": "https://storage.googleapis.com/crop_parts/...",
        "gcpUrl": "gs://crop_parts/...",
        "type": "marketing",
        "alt": "Hydraulic Filter Element",
        "embeddedMetadata": {
          "schemaVersion": "1.0",
          "company": { "name": "Clinton Tractor & Implement Co.", ... },
          "legal": { "copyright": "© 2025 Clinton Tractor...", ... },
          "product": { "sku": "CT-NHL-00907566", ... },
          "embedding": { "embedded": true, ... }
        }
      }
    ]
  }
}

6. Hybrid Processing System

6.1 Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                        ORCHESTRATOR                              │
│                      (Cloud Run - Always On)                     │
│                                                                  │
│   ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐   │
│   │  API     │   │  Queue   │   │  Router  │   │ Monitor  │   │
│   │ Endpoint │──►│ (Tasks)  │──►│          │──►│ & Alerts │   │
│   └──────────┘   └──────────┘   └────┬─────┘   └──────────┘   │
│                                      │                          │
└──────────────────────────────────────┼──────────────────────────┘

                         ┌─────────────┴─────────────┐
                         │                           │
                         ▼                           ▼
┌─────────────────────────────────┐   ┌─────────────────────────────────┐
│         LOCAL WORKER            │   │        CLOUD WORKER             │
│      (Physical Server)          │   │       (Cloud Run)               │
│                                 │   │                                 │
│  • Pull tasks from queue        │   │  • Auto-scale 0-1000            │
│  • Download images (bulk)       │   │  • Process on demand            │
│  • Process with Sharp           │   │  • Higher cost                  │
│  • Embed XMP                    │   │  • Used for fallback            │
│  • Upload results               │   │                                 │
│  • Report status                │   │                                 │
│                                 │   │                                 │
│  Cost: ~FREE                    │   │  Cost: $0.12/GB + CPU           │
└─────────────────────────────────┘   └─────────────────────────────────┘

6.2 Routing Logic

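// router/routing-logic.ts
// ProcessingTask, SystemConfig, isLocalHealthy() and sendAlert() are assumed
// to be defined elsewhere in the router module.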

interface RoutingDecision {
  target: 'local' | 'cloud';
  reason: string;
}

function routeTask(task: ProcessingTask, config: SystemConfig): RoutingDecision {
  // 1. Cloud-only mode
  if (config.mode === 'cloud_only') {
    return { target: 'cloud', reason: 'Cloud-only mode' };
  }

  // 2. Local-only mode
  if (config.mode === 'local_only') {
    if (!isLocalHealthy()) {
      throw new Error('Local worker unavailable');
    }
    return { target: 'local', reason: 'Local-only mode' };
  }

  // 3. Hybrid mode - check local health
  if (!isLocalHealthy()) {
    sendAlert('fallback_activated', { reason: 'Local unhealthy' });
    return { target: 'cloud', reason: 'Fallback: local unhealthy' };
  }

  // 4. Urgent tasks go to cloud for speed
  if (task.priority === 'urgent') {
    return { target: 'cloud', reason: 'Urgent task' };
  }

  // 5. Default: use local (cost optimization)
  return { target: 'local', reason: 'Normal routing' };
}

6.3 Health Check

interface WorkerHeartbeat {
  workerId: string;
  status: 'healthy' | 'degraded' | 'unhealthy';
  timestamp: string;
  capacity: {
    queueSize: number;
    maxParallel: number;
    processingNow: number;
  };
  resources: {
    cpuPercent: number;
    memoryPercent: number;
    diskFreeGb: number;
  };
}

const FALLBACK_TRIGGERS = {
  heartbeatTimeoutMs: 120_000,  // 2 minutes
  cpuThreshold: 95,
  memoryThreshold: 90,
  diskMinGb: 10,
  queueMax: 1000,
};
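
One way the heartbeat and thresholds could be combined into a health decision is sketched below. It is not the router's actual implementation: fallbackReasons is a hypothetical helper, this isLocalHealthy variant takes the latest heartbeat as an argument (the router in 6.2 calls it without arguments), and two points are assumptions: the limits are treated as exclusive, and a 'degraded' status on its own does not trigger fallback.

```typescript
// Types and thresholds repeated from 6.3 so the sketch is self-contained.
interface WorkerHeartbeat {
  workerId: string;
  status: 'healthy' | 'degraded' | 'unhealthy';
  timestamp: string; // ISO 8601
  capacity: { queueSize: number; maxParallel: number; processingNow: number };
  resources: { cpuPercent: number; memoryPercent: number; diskFreeGb: number };
}

const FALLBACK_TRIGGERS = {
  heartbeatTimeoutMs: 120_000, // 2 minutes
  cpuThreshold: 95,
  memoryThreshold: 90,
  diskMinGb: 10,
  queueMax: 1000,
};

// Hypothetical helper: lists every fallback condition the heartbeat trips.
function fallbackReasons(hb: WorkerHeartbeat | undefined, nowMs: number): string[] {
  if (!hb) return ['no heartbeat received'];
  const t = FALLBACK_TRIGGERS;
  const reasons: string[] = [];
  const ageMs = nowMs - Date.parse(hb.timestamp);
  // An unparseable timestamp yields NaN and is treated as stale.
  if (Number.isNaN(ageMs) || ageMs > t.heartbeatTimeoutMs) reasons.push('heartbeat stale');
  if (hb.status === 'unhealthy') reasons.push('worker reported unhealthy');
  if (hb.resources.cpuPercent > t.cpuThreshold) reasons.push('cpu above threshold');
  if (hb.resources.memoryPercent > t.memoryThreshold) reasons.push('memory above threshold');
  if (hb.resources.diskFreeGb < t.diskMinGb) reasons.push('disk space low');
  if (hb.capacity.queueSize > t.queueMax) reasons.push('queue too long');
  return reasons;
}

// The worker counts as healthy only when no trigger fires.
function isLocalHealthy(hb: WorkerHeartbeat | undefined, nowMs: number = Date.now()): boolean {
  return fallbackReasons(hb, nowMs).length === 0;
}
```

Returning the list of reasons rather than a bare boolean lets the fallback alert say which trigger fired.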

7. Processing Pipeline

7.1 Generate Metadata Function

function generateMetadata(
  product: IndexedPart,
  imageType: PartImageType,
  source: string
): ImageMetadata {
  const now = new Date().toISOString();
  const sku = `CT-${product.manufacturer.code}-${product.partNumber}`;

  return {
    schemaVersion: '1.0',
    createdAt: now,
    updatedAt: now,

    company: {
      name: 'Clinton Tractor & Implement Co.',
      type: 'Authorized Reseller',
      website: 'https://clintontractor.com',
      contact: 'parts@clintontractor.com',
    },

    legal: {
      copyright: `© ${new Date().getFullYear()} Clinton Tractor & Implement Co. All rights reserved.`,
      license: 'Licensed for Clinton Tractor e-commerce use only.',
      termsUrl: 'https://clintontractor.com/terms',
    },

    product: {
      sku,
      partNumber: product.partNumber || '',
      pnNorm: product.pnNorm,
      title: product.title || product.description || '',
      description: product.description,
      manufacturer: {
        name: product.manufacturer.name,
        code: product.manufacturer.code,
      },
      categoryName: product.categoryName,
      categoryPath: product.categoryPath,
      equipmentFitment: product.equipmentFitment?.slice(0, 10),
      status: product.status || 'active',
    },

    image: {
      type: imageType,
      sortOrder: 1,
      source,
      alt: product.title || `${product.partNumber} - ${product.manufacturer.name}`,
    },

    catalog: {
      url: `https://clintontractor.com/parts/${product.slug}`,
      slug: product.slug,
    },

    embedding: {
      embedded: false,
      version: now.slice(0, 10).replace(/-/g, '.'),
      processor: WORKER_ID, // ID of the running worker, e.g. "local-worker-01"; assumed to be set by the worker process
    },
  };
}

7.2 Embed XMP Function

import { exiftool } from 'exiftool-vendored';

async function embedXmpMetadata(
  imagePath: string,
  metadata: ImageMetadata
): Promise<void> {
  await exiftool.write(imagePath, {
    // Dublin Core (standard)
    'XMP-dc:Title': metadata.product.title,
    'XMP-dc:Creator': metadata.company.name,
    'XMP-dc:Rights': metadata.legal.copyright,

    // XMP Rights
    'XMP-xmpRights:UsageTerms': metadata.legal.license,

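    // Custom CROP namespace. ExifTool only writes tags it knows about, so the
    // XMP-crop namespace and its tags must first be declared as user-defined
    // tags in an ExifTool config file; otherwise ExifTool rejects them as undefined.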
    'XMP-crop:SKU': metadata.product.sku,
    'XMP-crop:PartNumber': metadata.product.partNumber,
    'XMP-crop:PartNumberNorm': metadata.product.pnNorm,
    'XMP-crop:Manufacturer': metadata.product.manufacturer.name,
    'XMP-crop:ManufacturerCode': metadata.product.manufacturer.code,
    'XMP-crop:Category': metadata.product.categoryPath?.[0],
    'XMP-crop:EquipmentFitment': metadata.product.equipmentFitment?.join(', '),
    'XMP-crop:Status': metadata.product.status,
    'XMP-crop:ImageType': metadata.image.type,
    'XMP-crop:CatalogURL': metadata.catalog.url,
    'XMP-crop:CompanyName': metadata.company.name,
    'XMP-crop:CompanyType': metadata.company.type,
    'XMP-crop:Website': metadata.company.website,
    'XMP-crop:Version': metadata.embedding.version,
    'XMP-crop:EmbeddedAt': new Date().toISOString(),
  }, ['-overwrite_original']);
}
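
generateMetadata() creates the record with embedding.embedded set to false, so after a successful XMP write the sidecar presumably needs to be updated before it is uploaded. A minimal sketch of that step follows. markEmbedded is a hypothetical helper, and the choice to refresh updatedAt along with embeddedAt is an assumption.

```typescript
// Minimal structural type: only the fields this sketch touches.
interface EmbeddingState {
  updatedAt: string;
  embedding: {
    embedded: boolean;
    embeddedAt?: string;
    version?: string;
    processor?: string;
  };
}

// Hypothetical helper: returns a copy of the metadata marked as embedded.
// The input object is left untouched so a failed upload can be retried safely.
function markEmbedded<T extends EmbeddingState>(metadata: T, at: Date = new Date()): T {
  const timestamp = at.toISOString();
  return {
    ...metadata,
    updatedAt: timestamp,
    embedding: { ...metadata.embedding, embedded: true, embeddedAt: timestamp },
  };
}
```

A worker would call this only after the XMP write resolves, then serialize the result as the .meta.json sidecar.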

8. Cost Analysis

8.1 Per-Image Cost

Operation          Cloud       Local
Download (egress)  $0.00006    $0 (bulk)
CPU processing     $0.000048   $0
Upload (write)     $0.000005   $0.000005
Total per image    $0.000113   $0.000005

8.2 Scale Projections

Scale (images)   Cloud Cost   Local Cost   Savings
1,000            $0.12        $0.01        $0.11
100,000          $12          $0.50        $11.50
1,000,000        $120         $5           $115
5,000,000        $590         $50          $540
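
For other volumes, a strictly linear projection from the per-image totals in 8.1 can be sketched as follows. projectCost is a hypothetical helper, and linear scaling is an assumption: the figures in the table above differ slightly from a strict linear projection, so treat the output as an estimate.

```typescript
// Per-image totals from the table in 8.1 (USD).
const PER_IMAGE_COST_USD = { cloud: 0.000113, local: 0.000005 };

interface CostProjection {
  images: number;
  cloud: number;   // USD, rounded to cents
  local: number;   // USD, rounded to cents
  savings: number; // cloud minus local, rounded to cents
}

// Rounds to whole cents to avoid floating-point noise in the output.
const toCents = (usd: number): number => Math.round(usd * 100) / 100;

// Hypothetical helper: assumes cost scales linearly with image count.
function projectCost(images: number): CostProjection {
  const cloud = toCents(images * PER_IMAGE_COST_USD.cloud);
  const local = toCents(images * PER_IMAGE_COST_USD.local);
  return { images, cloud, local, savings: toCents(cloud - local) };
}
```

Under this assumption, projectCost(5_000_000) gives a cloud cost of 565, a local cost of 25, and savings of 540.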

9. Implementation Checklist

Phase 1: Core

  • Define TypeScript types (ImageMetadata)
  • Create Zod schema validation
  • Implement generateMetadata() function
  • Implement embedXmpMetadata() function

Phase 2: Infrastructure

  • Set up Cloud Tasks queue
  • Implement Router API
  • Create Local Worker script
  • Create Cloud Worker (Cloud Run)

Phase 3: Orchestration

  • Health check system
  • Heartbeat mechanism
  • Routing logic
  • Fallback triggers
  • Alert system (Linear/Other)

Phase 4: Migration

  • Bulk download script
  • Batch processing pipeline
  • MongoDB update script
  • Verification tools

10. Security

10.1 Data in Metadata

Include        Exclude
Company name   Prices
SKU            Cost data
Part number    Inventory levels
Manufacturer   Internal IDs
Copyright      Customer data
Catalog URL    API keys
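
The Exclude column could be enforced with a guard that runs before embedding. The sketch below is illustrative only: findExcludedKeys is a hypothetical helper, and the key names in EXCLUDED_KEYS are assumed stand-ins for the excluded categories, since the real field names depend on the source schema.

```typescript
// Assumed key names standing in for the "Exclude" column; adjust to the real schema.
const EXCLUDED_KEYS = new Set(['price', 'cost', 'inventory', 'internalId', 'customer', 'apiKey']);

// Hypothetical guard: returns the dotted path of every excluded key found,
// walking nested objects and arrays.
function findExcludedKeys(value: unknown, path: string = ''): string[] {
  const hits: string[] = [];
  if (Array.isArray(value)) {
    value.forEach((item, i) => {
      hits.push(...findExcludedKeys(item, `${path}[${i}]`));
    });
  } else if (value !== null && typeof value === 'object') {
    const record = value as Record<string, unknown>;
    for (const key of Object.keys(record)) {
      const childPath = path ? `${path}.${key}` : key;
      if (EXCLUDED_KEYS.has(key)) hits.push(childPath);
      hits.push(...findExcludedKeys(record[key], childPath));
    }
  }
  return hits;
}
```

A worker could refuse to embed whenever the returned list is non-empty, which keeps an accidental schema change from leaking excluded data into published images.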
