
Photo Workflow - Implementation Plan

Context

  • Sync script already exists and works on Mac Studio
  • 467 test files already uploaded to GCP
  • Need to migrate to vendor-first structure
  • XMP embedding included (Phase 4) — native exiftool script
  • CLIP embedding OUT OF SCOPE (Phase 6, separate task)

Constraints

⚠️ Mac Studio Local Structure — READ ONLY

We CANNOT change local folder structure — controlled by Ortery software (photo capture).

Local (Ortery creates this — DO NOT MODIFY):
/Users/john/crop_parts/{Manufacturer}/{PartNumber}/
├── {pn}.jpg                    ← naming varies
├── {pn}-Front/{pn}-Front1.jpg  ← multiview
└── {pn}-360/images/lv2/        ← 360° frames

What we CAN do:

  • Modify sync scripts
  • Normalize paths during transformation
  • Transform to standardized GCP structure

Path Normalization (in sync script)

Ortery output varies. Sync script must normalize:

| Ortery Output | Normalized GCP Path |
|---|---|
| 5023082.jpg | gallery/5023082-1.jpg |
| 5023082.JPG | gallery/5023082-1.jpg (lowercase) |
| 35.0154-Front1.jpg | gallery/35.0154-front-1.jpg |
| 35.0154-FRONT1.jpg | gallery/35.0154-front-1.jpg |
| images/lv2/img0.jpg | 360/R01_C01.jpg |
| images/lv2/img23.jpg | 360/R01_C24.jpg |

Normalization rules:

  1. Lowercase extensions: .JPG → .jpg
  2. Lowercase view names: FRONT → front
  3. Add -1 suffix if missing: {pn}.jpg → {pn}-1.jpg
  4. Convert 360° index: img{N} → R01_C{N+1}
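The four rules above can be sketched as one bash helper. This is an illustrative sketch only; the function name and regexes are assumptions, not the shipped crop_sync.sh code:

```shell
# normalize_filename PN FILENAME -> normalized name on stdout
normalize_filename() {
  local pn="$1" file="$2"
  local base="${file%.*}" ext="${file##*.}" pn_lc
  ext=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')    # rule 1: lowercase extension

  # rule 4: 360 frames img{N} -> R01_C{N+1} (10# guards against octal parsing)
  if [[ "$file" =~ img([0-9]+)\.[A-Za-z]+$ ]]; then
    printf 'R01_C%02d.%s\n' "$(( 10#${BASH_REMATCH[1]} + 1 ))" "$ext"
    return
  fi

  base=$(printf '%s' "$base" | tr '[:upper:]' '[:lower:]')  # rule 2: lowercase view names
  pn_lc=$(printf '%s' "$pn" | tr '[:upper:]' '[:lower:]')

  if [[ "$base" == "$pn_lc" ]]; then
    base="${base}-1"                                        # rule 3: add -1 if missing
  elif [[ "$base" =~ ^(.+[a-z])([0-9]+)$ ]]; then
    base="${BASH_REMATCH[1]}-${BASH_REMATCH[2]}"            # front1 -> front-1
  fi
  printf '%s.%s\n' "$base" "$ext"
}
```

Usage: `normalize_filename "35.0154" "35.0154-FRONT1.jpg"` prints `35.0154-front-1.jpg`.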

Phase 0: Migrate GCP Structure

Goal: Move existing files from type-first to vendor-first

Current (type-first):

gs://crop_parts/ct/gallery/{vendor}/{pn}/
gs://crop_parts/ct/360/{vendor}/{pn}/

Target (vendor-first):

gs://crop_parts/ct/{vendor}/{pn}/gallery/
gs://crop_parts/ct/{vendor}/{pn}/360/

Tasks:

  1. Create migration script:

    # Example transformation:
    # Old: gs://crop_parts/ct/gallery/bgs/53.0215/53.0215-1.jpg
    # New: gs://crop_parts/ct/bgs/53.0215/gallery/53.0215-1.jpg
  2. Steps:

    • List all files: gsutil ls -r gs://crop_parts/ct/**
    • Generate mapping (old → new)
    • Copy to new locations
    • Verify accessibility
    • Delete old paths
  3. Clear uploaded.txt after migration (reset tracking)
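The old-to-new mapping in the steps above can be expressed as a pure path rewrite, which makes the migration easy to dry-run before copying anything. A sketch (the function name is illustrative, and it assumes every object sits under gs://crop_parts/ct/{gallery|360}/{vendor}/{pn}/):

```shell
# migrate_path OLD_GCS_URL -> vendor-first URL on stdout
migrate_path() {
  # gs://crop_parts/ct/{gallery|360}/{vendor}/{pn}/f -> gs://crop_parts/ct/{vendor}/{pn}/{gallery|360}/f
  printf '%s\n' "$1" \
    | sed -E 's#^(gs://crop_parts/ct)/(gallery|360)/([^/]+)/([^/]+)/#\1/\3/\4/\2/#'
}

# Dry run over the whole bucket before copying anything:
# gsutil ls -r 'gs://crop_parts/ct/**' | while read -r old; do
#   echo "gsutil cp '$old' '$(migrate_path "$old")'"
# done
```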

Output:

  • Migration script
  • Verification report

Phase 1: Update Sync Script

Goal: Modify existing crop_sync.sh for vendor-first structure

Location: /Users/john/crop_parts/sync/

Tasks:

  1. Modify transform_path() function in crop_sync.sh:

    # Current output:
    # gs://crop_parts/ct/gallery/{vendor}/{pn}/{pn}-1.jpg
    
    # New output:
    # gs://crop_parts/ct/{vendor}/{pn}/gallery/{pn}-1.jpg
  2. Update patterns:

    • simple → {bucket}/ct/{vendor}/{pn}/gallery/{pn}-1.{ext}
    • multiview → {bucket}/ct/{vendor}/{pn}/gallery/{pn}-{view}-{num}.{ext}
    • 360 → {bucket}/ct/{vendor}/{pn}/360/{pn}-R01_C{col}.{ext}
  3. Update bulk_upload.sh with same changes

  4. Update test_transform.sh for verification
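A possible shape for the updated transform_path() is sketched below. This is a sketch under assumptions (local root and bucket hard-coded as shown, filenames already normalized); the real function in crop_sync.sh may differ:

```shell
BUCKET="gs://crop_parts"
LOCAL_ROOT="/Users/john/crop_parts"

# transform_path LOCAL_FILE -> vendor-first GCS URL on stdout
transform_path() {
  local path="$1"
  local rel="${path#"$LOCAL_ROOT"/}"       # {vendor}/{pn}/file
  local vendor="${rel%%/*}"
  rel="${rel#*/}"
  local pn="${rel%%/*}"
  local file="${path##*/}"
  case "$file" in
    *R01_C*) printf '%s/ct/%s/%s/360/%s\n' "$BUCKET" "$vendor" "$pn" "$file" ;;
    *)       printf '%s/ct/%s/%s/gallery/%s\n' "$BUCKET" "$vendor" "$pn" "$file" ;;
  esac
}
```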

Output:

  • Updated sync scripts
  • Test results

Phase 2: Add .meta.json Generation

Goal: Generate JSON metadata files alongside images

Tasks:

  1. Create generate_metadata.sh script:

    • Input: partNumber, vendor, image type
    • Query MongoDB for product data (via API)
    • Generate JSON with schema fields
    • Save as {pn}-{N}.meta.json
  2. Integrate into crop_sync.sh:

    • After successful image upload
    • Generate .meta.json
    • Upload to same GCP location
  3. JSON schema:

    {
      "schemaVersion": "1.0",
      "createdAt": "ISO timestamp",
      "product": {
        "sku", "partNumber", "pnNorm", "title",
        "manufacturer": {"name", "code"},
        "categoryPath", "equipmentFitment",
        "canonicalUrl": "https://clintontractor.com/parts/{sku}"
      },
      "image": {"type", "sortOrder", "source"},
      "company": {"name", "copyright", "website"}
    }

    Canonical URL — link to product page for SEO

  4. For 360°: NO .meta.json (fixed structure, no SEO value)
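A minimal sketch of what generate_metadata.sh might emit. The function name and the reduced field set are illustrative; real values would come from the product API rather than arguments:

```shell
# generate_metadata PN VENDOR SORT_ORDER TITLE -> JSON on stdout
generate_metadata() {
  local pn="$1" vendor="$2" sort_order="$3" title="$4"
  cat <<EOF
{
  "schemaVersion": "1.0",
  "createdAt": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
  "product": {
    "partNumber": "$pn",
    "title": "$title",
    "manufacturer": { "code": "$vendor" }
  },
  "image": { "type": "gallery", "sortOrder": $sort_order, "source": "ct" },
  "company": { "name": "Clinton Tractor & Implement Co." }
}
EOF
}

# generate_metadata "53.0215" "bgs" 1 "Example Part" > 53.0215-1.meta.json
```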

Output:

  • Metadata generator script
  • Updated sync script

Phase 3: MongoDB Integration

Goal: Update MongoDB when images uploaded

Tasks:

  1. Create API endpoint in health-analytics or search service:

    POST /api/media/upload-notification
    {
      "partNumber": "53.0215",
      "vendor": "bgs",
      "type": "gallery",
      "filename": "53.0215-1.jpg",
      "gcpUrl": "gs://crop_parts/ct/bgs/53.0215/gallery/53.0215-1.jpg"
    }
  2. API updates MongoDB:

    db.parts.updateOne(
      { partNumber: "53.0215" },
      {
        $push: { "media.images": {...} },
        $set: { "media.hasImage": true, "media.hasGcpImages": true },
        $inc: { "media.imagesCount": 1 }
      }
    )
  3. Integrate API call into crop_sync.sh:

    # After successful upload
    curl -X POST "$API_URL/api/media/upload-notification" \
      -H "Content-Type: application/json" \
      -d '{"partNumber":"...", "vendor":"...", ...}'

Output:

  • API endpoint
  • Updated sync script

Phase 4: XMP Embedding (Native Script)

Goal: Embed metadata into gallery images before upload

Approach:

  • Native bash script with exiftool
  • Runs on Mac Studio (files already there)
  • Embed XMP → then upload to GCP
  • No Docker overhead for MVP

Tasks:

  1. Install exiftool on Mac Studio:

    brew install exiftool
  2. Create embed_xmp.sh script:

    #!/bin/bash
    # Input: image path, .meta.json path
    set -euo pipefail
    
    IMAGE="$1"
    META="$2"
    
    # Read from .meta.json
    TITLE=$(jq -r '.product.title' "$META")
    SKU=$(jq -r '.product.sku' "$META")
    PN=$(jq -r '.product.partNumber' "$META")
    MFR=$(jq -r '.product.manufacturer.name' "$META")
    COPYRIGHT=$(jq -r '.company.copyright' "$META")
    CANONICAL=$(jq -r '.product.canonicalUrl' "$META")
    WEBSITE=$(jq -r '.company.website' "$META")
    
    # Embed XMP tags
    exiftool -overwrite_original \
      -XMP-dc:Title="$TITLE" \
      -XMP-dc:Creator="Clinton Tractor & Implement Co." \
      -XMP-dc:Rights="$COPYRIGHT" \
      -XMP-dc:Description="$TITLE - $MFR" \
      -XMP-dc:Identifier="$CANONICAL" \
      -XMP-photoshop:Credit="$MFR" \
      -XMP-photoshop:Source="$CANONICAL" \
      -XMP-xmp:CreatorTool="$WEBSITE" \
      -sep "," -XMP-dc:Subject="$MFR,$PN,parts,agriculture" \
      "$IMAGE"
  3. Integrate into crop_sync.sh:

    # After generating .meta.json, before upload:
    if [[ "$TYPE" == "gallery" ]]; then
      ./embed_xmp.sh "$IMAGE" "$META_JSON"
    fi
    # Then upload image (now with XMP embedded)
  4. Skip 360° — no XMP embedding (24 files, no SEO value)

XMP Tag Mapping:

| JSON field | XMP tag |
|---|---|
| product.title | dc:Title |
| company.name | dc:Creator |
| company.copyright | dc:Rights |
| product.manufacturer.name | photoshop:Credit |
| product.canonicalUrl | dc:Identifier, photoshop:Source |
| company.website | xmp:CreatorTool |
| product.partNumber + manufacturer | dc:Subject (keywords) |

Output:

  • embed_xmp.sh script
  • Updated crop_sync.sh with XMP step

Future Migration Path:

MVP: Mac Studio → exiftool → upload

Scale: Upload raw → Cloud Run (exiftool in Docker) → update GCS

Phase 5: Analytics

Goal: Track upload statistics in health-analytics service

Service: services/health-analytics (already has media routes)

Metrics to Track:

| Metric | Description | Priority |
|---|---|---|
| Parts photographed per day | Unique partNumbers | High |
| Photos per day | File count | Medium |
| By type (gallery/360°) | Breakdown | High |
| By manufacturer/vendor | bgs, nhl, ven, mch, rav | Medium |
| Failed uploads | For monitoring | High |
| Processing time | For optimization | Low |

Tasks:

  1. Add upload tracking to MongoDB:

    • Store in media.images[].uploadedAt
    • Store media.view360.uploadedAt
  2. Create aggregation queries:

    // Parts photographed per day
    // ($unwind is needed because media.images is an array; $dateToString
    //  cannot take an array of dates)
    db.parts.aggregate([
      { $match: { "media.images.source": "ct" } },   // index-friendly pre-filter
      { $unwind: "$media.images" },
      { $match: { "media.images.source": "ct" } },   // keep only CT images
      { $group: {
          _id: { $dateToString: { format: "%Y-%m-%d", date: "$media.images.uploadedAt" } },
          uniqueParts: { $addToSet: "$partNumber" }
        }
      },
      { $project: { date: "$_id", count: { $size: "$uniqueParts" } } }
    ])
  3. Add API endpoints to health-analytics:

    GET /api/health/media/uploads/daily
    GET /api/health/media/uploads/by-vendor
    GET /api/health/media/uploads/by-type
  4. Dashboard UI at /dashboard/health/analytics/photo-stats


Phase 6: CLIP Embedding (FUTURE — Separate Task)

⚠️ NOT IN CURRENT SCOPE — This will be a separate Linear task in a future session

Goal: AI-powered visual search for similar products

What is CLIP?

  • AI creates numeric "fingerprint" of image (512 numbers)
  • Enables: "find similar parts" and text-to-image search
  • Different from XMP (XMP = text metadata for SEO, CLIP = AI vectors for search)

Planned Stack (JS/TS only, no Python):

// Transformers.js — CLIP in pure JavaScript
import { pipeline } from '@xenova/transformers';

const extractor = await pipeline(
  'image-feature-extraction',
  'Xenova/clip-vit-base-patch32'
);

const embedding = await extractor(imageUrl);
// → [0.23, -0.15, 0.89, ...] 512 floats

Planned Architecture:

Image uploaded to GCP
  → Trigger CLIP microservice (Hono + Transformers.js)
  → Generate embedding [512 floats]
  → Store in MongoDB: parts.media.images[].embedding
  → Enable vector search: "find similar parts"

Future Tasks (for next session):

  • Create services/clip-embedding/ Hono microservice
  • Implement image embedding endpoint
  • Create MongoDB vector index
  • Add search API: GET /search/similar?imageId=xxx
  • Integrate with product pages

XMP vs CLIP Summary:

| | XMP (Phase 4) | CLIP (Phase 6) |
|---|---|---|
| What | Text metadata in file | AI vector in database |
| For | SEO, Google Images | Visual search on site |
| When | Current task | Future task |

Phase Summary

| Phase | Focus | Status |
|---|---|---|
| 0 | GCP migration | TODO |
| 1 | Update sync script | TODO |
| 2 | .meta.json generation | TODO |
| 3 | MongoDB integration | TODO |
| 4 | XMP embedding (native) | TODO |
| 5 | Analytics | LATER |
| 6 | CLIP embedding | FUTURE (separate task) |

Out of Scope

  • Image optimization/resize
  • CDN configuration
  • Cloud-based XMP processing (future migration)

Workflow Order (per image)

  1. Bella takes photo
  2. fswatch detects new file
  3. [Phase 2] Generate .meta.json (query MongoDB for product data)
  4. [Phase 4] Embed XMP into image (read from .meta.json)
  5. Upload image + .meta.json to GCP
  6. [Phase 3] Call API to update MongoDB media.images[]
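The per-image flow above can be sketched as a single function. All helpers here are echo stubs standing in for the real Phase 2-4 scripts; only the ordering is the point:

```shell
# Stub helpers, illustration only; the real scripts replace these
generate_metadata() { echo "meta:$1"; }
embed_xmp()         { echo "xmp:$1"; }
upload_to_gcp()     { echo "upload:$1"; }
notify_api()        { echo "notify:$1"; }

process_new_image() {
  local image="$1" type="$2"
  local meta="${TMPDIR:-/tmp}/$(basename "${image%.*}").meta.json"

  generate_metadata "$image" > "$meta"   # [Phase 2] .meta.json first
  if [ "$type" = "gallery" ]; then
    embed_xmp "$image"                   # [Phase 4] gallery only, before upload
  fi
  upload_to_gcp "$image"                 # image + .meta.json to GCP
  notify_api "$image"                    # [Phase 3] MongoDB update last
}
```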

Implementation Order

  1. Phase 0 (GCP migration) ← do first, clean slate
  2. Phase 1 (update paths in sync scripts)
  3. Phase 2 (.meta.json generator) ← needs API to fetch product data
  4. Phase 4 (XMP embedding) ← runs before upload, reads .meta.json
  5. Phase 3 (MongoDB update API) ← called after upload
  6. Phase 5 (analytics) ← after data flowing

Open Question

Phase 2 needs product data from MongoDB:

  • Option A: Sync script calls existing search API
  • Option B: Create dedicated /api/parts/{pn}/metadata endpoint
  • Option C: Direct MongoDB query from bash (via mongosh)

Recommended: Option B — clean API, reusable

Files to Modify

Mac Studio (/Users/john/crop_parts/sync/)

| File | Phase | Changes |
|---|---|---|
| crop_sync.sh | 1, 2, 3, 4 | transform_path, metadata gen, XMP embed, API call |
| bulk_upload.sh | 1 | transform_path |
| config.sh | 2, 3 | Add API_URL, API endpoints |
| test_transform.sh | 1 | Update test cases |
| generate_metadata.sh | 2 | NEW: fetch product data, create .meta.json |
| embed_xmp.sh | 4 | NEW: read .meta.json, embed XMP tags |

Backend API (new endpoints needed)

| Endpoint | Phase | Purpose |
|---|---|---|
| GET /api/parts/{pn}/metadata | 2 | Return product data for .meta.json |
| POST /api/media/upload-notification | 3 | Update MongoDB after upload |

SSH Access

ssh crop-john
cd ~/crop_parts/sync

Open Questions & Decisions

1. API Access from Mac Studio

Current: Only SSH access from vova's machine (ssh crop-john)

Options for API calls:

| Option | Status |
|---|---|
| Script runs on vova's machine via SSH | ✅ Current |
| Direct API from Mac Studio | ❌ Not available now |
| mongosh direct query | Possible fallback |

Decision: For now, scripts that need API access run remotely via SSH, or use direct MongoDB query from Mac Studio.

2. uploadedBy — Machine ID (not person)

Question: How to identify upload source?

# In config.sh
export UPLOADER_ID="mac-studio-john"

Decision: Use machine identifier, not person name.

  • Only one machine (Mac Studio) has scanner
  • Simpler than tracking individual photographers
  • Value: mac-studio-john or just john

3. Product Not Found in MongoDB

Question: What if partNumber doesn't exist?

| Scenario | Action |
|---|---|
| Product exists | Generate full .meta.json |
| Product NOT found | Upload image anyway, minimal metadata, log warning |

Decision: Don't block upload; partial success is better than failure.

4. 360° Completeness Check

Question: When to update MongoDB for 360°?

Current behavior: Each frame uploaded separately

Proposed:

# After uploading frame, check if all 24 exist
frame_count=$(gsutil ls "gs://.../${pn}/360/" | wc -l)
if [[ $frame_count -eq 24 ]]; then
    # Update MongoDB media.view360
fi

5. View Sort Order

Question: What order for different views?

| View | sortOrder |
|---|---|
| simple (no suffix) | 1 |
| front | 1 |
| back | 2 |
| left | 3 |
| right | 4 |
| top | 5 |
| bottom | 6 |
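The ordering above as a small lookup helper (the name is illustrative, and the fallback value for unknown views is an assumption):

```shell
# view_sort_order VIEW -> sortOrder on stdout
view_sort_order() {
  case "$1" in
    simple|front) echo 1 ;;
    back)         echo 2 ;;
    left)         echo 3 ;;
    right)        echo 4 ;;
    top)          echo 5 ;;
    bottom)       echo 6 ;;
    *)            echo 99 ;;   # unknown views sort last (assumed)
  esac
}
```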

6. Existing Media in MongoDB

Question: Part already has media.images[] from other source?

Decision:

  • Append new images (don't replace)
  • Filter by source: "ct" for CT-specific queries
  • Each source manages its own images

7. Domain for Canonical URL

Status: Domain NOT finalized yet

# In config.sh — PLACEHOLDER, will change for production
export DOMAIN="https://clintontractor.com"

Decision:

  • Use variable $DOMAIN everywhere (easy to change later)
  • Current placeholder: clintontractor.com
  • Will be updated when going to production

8. Error Handling Strategy

| Error | Action |
|---|---|
| API timeout | Retry 3x with backoff |
| XMP embedding fails | Log error, upload without XMP |
| GCP upload fails | Retry 3x, then queue for later |
| Product not found | Partial metadata, log warning |

All errors logged to ~/crop_parts/logs/sync.error.log
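The "retry 3x with backoff" policy could live in one generic wrapper shared by the API and GCP upload calls. A sketch (the helper name and the doubling backoff are assumptions, not existing script code):

```shell
# retry ATTEMPTS CMD [ARGS...] -- rerun CMD until success or ATTEMPTS exhausted
retry() {
  local attempts="$1"; shift
  local delay=1 n=1
  while true; do
    "$@" && return 0
    if [ "$n" -ge "$attempts" ]; then
      echo "retry: giving up after $attempts attempts: $*" >&2
      return 1
    fi
    sleep "$delay"
    delay=$(( delay * 2 ))    # backoff: 1s, 2s, 4s, ...
    n=$(( n + 1 ))
  done
}

# retry 3 curl -fsS -X POST "$API_URL/api/media/upload-notification" \
#   -H "Content-Type: application/json" -d "$PAYLOAD"
```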
