
Photo Workflow - Implementation Plan

Context

  • Sync script already exists and works on Mac Studio
  • 467 test files already uploaded to GCP
  • Need to migrate to vendor-first structure
  • XMP embedding included (Phase 4) — native exiftool script
  • CLIP embedding OUT OF SCOPE (Phase 6, separate task)

Constraints

⚠️ Mac Studio Local Structure — READ ONLY

We CANNOT change local folder structure — controlled by Ortery software (photo capture).

Local (Ortery creates this — DO NOT MODIFY):
/Users/john/crop_parts/{Manufacturer}/{PartNumber}/
├── {pn}.jpg                    ← naming varies
├── {pn}-Front/{pn}-Front1.jpg  ← multiview
└── {pn}-360/images/lv2/        ← 360° frames

What we CAN do:

  • Modify sync scripts
  • Normalize paths during transformation
  • Transform to standardized GCP structure

Path Normalization (in sync script)

Ortery output varies. Sync script must normalize:

| Ortery Output | Normalized GCP Path |
|---|---|
| 5023082.jpg | gallery/5023082-1.jpg |
| 5023082.JPG | gallery/5023082-1.jpg (lowercase) |
| 35.0154-Front1.jpg | gallery/35.0154-front-1.jpg |
| 35.0154-FRONT1.jpg | gallery/35.0154-front-1.jpg |
| images/lv2/img0.jpg | 360/R01_C01.jpg |
| images/lv2/img23.jpg | 360/R01_C24.jpg |

Normalization rules:

  1. Lowercase extensions: .JPG → .jpg
  2. Lowercase view names: FRONT → front
  3. Add -1 suffix if missing: {pn}.jpg → {pn}-1.jpg
  4. Convert 360° index: img{N} → R01_C{N+1}
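The four rules above can be sketched as one bash helper. This is an illustrative sketch only; the function name and regexes are assumptions, not the shipped crop_sync.sh code:

```shell
# normalize_filename PN FILENAME -> normalized name on stdout
normalize_filename() {
  local pn="$1" file="$2"
  local base="${file%.*}" ext="${file##*.}" pn_lc
  ext=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')    # rule 1: lowercase extension

  # rule 4: 360 frames img{N} -> R01_C{N+1} (10# guards against octal parsing)
  if [[ "$file" =~ img([0-9]+)\.[A-Za-z]+$ ]]; then
    printf 'R01_C%02d.%s\n' "$(( 10#${BASH_REMATCH[1]} + 1 ))" "$ext"
    return
  fi

  base=$(printf '%s' "$base" | tr '[:upper:]' '[:lower:]')  # rule 2: lowercase view names
  pn_lc=$(printf '%s' "$pn" | tr '[:upper:]' '[:lower:]')

  if [[ "$base" == "$pn_lc" ]]; then
    base="${base}-1"                                        # rule 3: add -1 if missing
  elif [[ "$base" =~ ^(.+[a-z])([0-9]+)$ ]]; then
    base="${BASH_REMATCH[1]}-${BASH_REMATCH[2]}"            # front1 -> front-1
  fi
  printf '%s.%s\n' "$base" "$ext"
}
```

Usage: `normalize_filename "35.0154" "35.0154-FRONT1.jpg"` prints `35.0154-front-1.jpg`.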

Phase 0: Migrate GCP Structure

Goal: Move existing files from type-first to vendor-first

Current (type-first):

gs://crop_parts/ct/gallery/{vendor}/{pn}/
gs://crop_parts/ct/360/{vendor}/{pn}/

Target (vendor-first):

gs://crop_parts/ct/{vendor}/{pn}/gallery/
gs://crop_parts/ct/{vendor}/{pn}/360/

Tasks:

  1. Create migration script:

    # Example transformation:
    # Old: gs://crop_parts/ct/gallery/bgs/53.0215/53.0215-1.jpg
    # New: gs://crop_parts/ct/bgs/53.0215/gallery/53.0215-1.jpg
  2. Steps:

    • List all files: gsutil ls -r gs://crop_parts/ct/**
    • Generate mapping (old → new)
    • Copy to new locations
    • Verify accessibility
    • Delete old paths
  3. Clear uploaded.txt after migration (reset tracking)
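The old-to-new mapping in the steps above can be expressed as a pure path rewrite, which makes the migration easy to dry-run before copying anything. A sketch (the function name is illustrative, and it assumes every object sits under gs://crop_parts/ct/{gallery|360}/{vendor}/{pn}/):

```shell
# migrate_path OLD_GCS_URL -> vendor-first URL on stdout
migrate_path() {
  # gs://crop_parts/ct/{gallery|360}/{vendor}/{pn}/f -> gs://crop_parts/ct/{vendor}/{pn}/{gallery|360}/f
  printf '%s\n' "$1" \
    | sed -E 's#^(gs://crop_parts/ct)/(gallery|360)/([^/]+)/([^/]+)/#\1/\3/\4/\2/#'
}

# Dry run over the whole bucket before copying anything:
# gsutil ls -r 'gs://crop_parts/ct/**' | while read -r old; do
#   echo "gsutil cp '$old' '$(migrate_path "$old")'"
# done
```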

Output:

  • Migration script
  • Verification report

Phase 1: Update Sync Script

Goal: Modify existing crop_sync.sh for vendor-first structure

Location: /Users/john/crop_parts/sync/

Tasks:

  1. Modify transform_path() function in crop_sync.sh:

    # Current output:
    # gs://crop_parts/ct/gallery/{vendor}/{pn}/{pn}-1.jpg
    
    # New output:
    # gs://crop_parts/ct/{vendor}/{pn}/gallery/{pn}-1.jpg
  2. Update patterns:

    • simple → {bucket}/ct/{vendor}/{pn}/gallery/{pn}-1.{ext}
    • multiview → {bucket}/ct/{vendor}/{pn}/gallery/{pn}-{view}-{num}.{ext}
    • 360 → {bucket}/ct/{vendor}/{pn}/360/{pn}-R01_C{col}.{ext}
  3. Update bulk_upload.sh with same changes

  4. Update test_transform.sh for verification
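A possible shape for the updated transform_path() is sketched below. This is a sketch under assumptions (local root and bucket hard-coded as shown, filenames already normalized); the real function in crop_sync.sh may differ:

```shell
BUCKET="gs://crop_parts"
LOCAL_ROOT="/Users/john/crop_parts"

# transform_path LOCAL_FILE -> vendor-first GCS URL on stdout
transform_path() {
  local path="$1"
  local rel="${path#"$LOCAL_ROOT"/}"       # {vendor}/{pn}/file
  local vendor="${rel%%/*}"
  rel="${rel#*/}"
  local pn="${rel%%/*}"
  local file="${path##*/}"
  case "$file" in
    *R01_C*) printf '%s/ct/%s/%s/360/%s\n' "$BUCKET" "$vendor" "$pn" "$file" ;;
    *)       printf '%s/ct/%s/%s/gallery/%s\n' "$BUCKET" "$vendor" "$pn" "$file" ;;
  esac
}
```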

Output:

  • Updated sync scripts
  • Test results

Phase 2: Add .meta.json Generation

Goal: Generate JSON metadata files alongside images

Tasks:

  1. Create generate_metadata.sh script:

    • Input: partNumber, vendor, image type
    • Query MongoDB for product data (via API)
    • Generate JSON with schema fields
    • Save as {pn}-{N}.meta.json
  2. Integrate into crop_sync.sh:

    • After successful image upload
    • Generate .meta.json
    • Upload to same GCP location
  3. JSON schema:

    {
      "schemaVersion": "1.0",
      "createdAt": "ISO timestamp",
      "product": {
        "sku", "partNumber", "pnNorm", "title",
        "manufacturer": {"name", "code"},
        "categoryPath", "equipmentFitment",
        "canonicalUrl": "https://clintontractor.com/parts/{sku}"
      },
      "image": {"type", "sortOrder", "source"},
      "company": {"name", "copyright", "website"}
    }

    Canonical URL — link to product page for SEO

  4. For 360°: NO .meta.json (fixed structure, no SEO value)
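A minimal sketch of what generate_metadata.sh might emit. The function name and the reduced field set are illustrative; real values would come from the product API rather than arguments:

```shell
# generate_metadata PN VENDOR SORT_ORDER TITLE -> JSON on stdout
generate_metadata() {
  local pn="$1" vendor="$2" sort_order="$3" title="$4"
  cat <<EOF
{
  "schemaVersion": "1.0",
  "createdAt": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
  "product": {
    "partNumber": "$pn",
    "title": "$title",
    "manufacturer": { "code": "$vendor" }
  },
  "image": { "type": "gallery", "sortOrder": $sort_order, "source": "ct" },
  "company": { "name": "Clinton Tractor & Implement Co." }
}
EOF
}

# generate_metadata "53.0215" "bgs" 1 "Example Part" > 53.0215-1.meta.json
```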

Output:

  • Metadata generator script
  • Updated sync script

Phase 3: MongoDB Integration

Goal: Update MongoDB when images uploaded

Tasks:

  1. Create API endpoint in health-analytics or search service:

    POST /api/media/upload-notification
    {
      "partNumber": "53.0215",
      "vendor": "bgs",
      "type": "gallery",
      "filename": "53.0215-1.jpg",
      "gcpUrl": "gs://crop_parts/ct/bgs/53.0215/gallery/53.0215-1.jpg"
    }
  2. API updates MongoDB:

    db.parts.updateOne(
      { partNumber: "53.0215" },
      {
        $push: { "media.images": {...} },
        $set: { "media.hasImage": true, "media.hasGcpImages": true },
        $inc: { "media.imagesCount": 1 }
      }
    )
  3. Integrate API call into crop_sync.sh:

    # After successful upload
    curl -X POST "$API_URL/api/media/upload-notification" \
      -H "Content-Type: application/json" \
      -d '{"partNumber":"...", "vendor":"...", ...}'

Output:

  • API endpoint
  • Updated sync script

Phase 4: XMP Embedding (Native Script)

Goal: Embed metadata into gallery images before upload

Approach:

  • Native bash script with exiftool
  • Runs on Mac Studio (files already there)
  • Embed XMP → then upload to GCP
  • No Docker overhead for MVP

Tasks:

  1. Install exiftool on Mac Studio:

    brew install exiftool
  2. Create embed_xmp.sh script:

    #!/bin/bash
    # Input: image path, .meta.json path
    set -euo pipefail
    
    IMAGE="$1"
    META="$2"
    
    # Read from .meta.json
    TITLE=$(jq -r '.product.title' "$META")
    SKU=$(jq -r '.product.sku' "$META")
    PN=$(jq -r '.product.partNumber' "$META")
    MFR=$(jq -r '.product.manufacturer.name' "$META")
    COPYRIGHT=$(jq -r '.company.copyright' "$META")
    CANONICAL=$(jq -r '.product.canonicalUrl' "$META")
    WEBSITE=$(jq -r '.company.website' "$META")
    
    # Embed XMP tags
    exiftool -overwrite_original \
      -XMP-dc:Title="$TITLE" \
      -XMP-dc:Creator="Clinton Tractor & Implement Co." \
      -XMP-dc:Rights="$COPYRIGHT" \
      -XMP-dc:Description="$TITLE - $MFR" \
      -XMP-dc:Identifier="$CANONICAL" \
      -XMP-photoshop:Credit="$MFR" \
      -XMP-photoshop:Source="$CANONICAL" \
      -XMP-xmp:CreatorTool="$WEBSITE" \
      -sep "," -XMP-dc:Subject="$MFR,$PN,parts,agriculture" \
      "$IMAGE"
  3. Integrate into crop_sync.sh:

    # After generating .meta.json, before upload:
    if [[ "$TYPE" == "gallery" ]]; then
      ./embed_xmp.sh "$IMAGE" "$META_JSON"
    fi
    # Then upload image (now with XMP embedded)
  4. Skip 360° — no XMP embedding (24 files, no SEO value)

XMP Tag Mapping:

| JSON field | XMP tag |
|---|---|
| product.title | dc:Title |
| company.name | dc:Creator |
| company.copyright | dc:Rights |
| product.manufacturer.name | photoshop:Credit |
| product.canonicalUrl | dc:Identifier, photoshop:Source |
| company.website | xmp:CreatorTool |
| product.partNumber + manufacturer | dc:Subject (keywords) |

Output:

  • embed_xmp.sh script
  • Updated crop_sync.sh with XMP step

Future Migration Path:

MVP: Mac Studio → exiftool → upload

Scale: Upload raw → Cloud Run (exiftool in Docker) → update GCS

Phase 5: Analytics

Goal: Track upload statistics in health-analytics service

Service: services/health-analytics (already has media routes)

Metrics to Track:

| Metric | Description | Priority |
|---|---|---|
| Parts photographed per day | Unique partNumbers | High |
| Photos per day | File count | Medium |
| By type (gallery/360°) | Breakdown | High |
| By manufacturer/vendor | bgs, nhl, ven, mch, rav | Medium |
| Failed uploads | For monitoring | High |
| Processing time | For optimization | Low |

Tasks:

  1. Add upload tracking to MongoDB:

    • Store in media.images[].uploadedAt
    • Store media.view360.uploadedAt
  2. Create aggregation queries:

    // Parts photographed per day
    // ($unwind is needed because media.images is an array; $dateToString
    //  cannot take an array of dates)
    db.parts.aggregate([
      { $match: { "media.images.source": "ct" } },   // index-friendly pre-filter
      { $unwind: "$media.images" },
      { $match: { "media.images.source": "ct" } },   // keep only CT images
      { $group: {
          _id: { $dateToString: { format: "%Y-%m-%d", date: "$media.images.uploadedAt" } },
          uniqueParts: { $addToSet: "$partNumber" }
        }
      },
      { $project: { date: "$_id", count: { $size: "$uniqueParts" } } }
    ])
  3. Add API endpoints to health-analytics:

    GET /api/health/media/uploads/daily
    GET /api/health/media/uploads/by-vendor
    GET /api/health/media/uploads/by-type
  4. Dashboard UI at /dashboard/health/analytics/photo-stats


Phase 6: CLIP Embedding (FUTURE — Separate Task)

⚠️ NOT IN CURRENT SCOPE — This will be a separate Linear task in a future session

Goal: AI-powered visual search for similar products

What is CLIP?

  • AI creates numeric "fingerprint" of image (512 numbers)
  • Enables: "find similar parts" and text-to-image search
  • Different from XMP (XMP = text metadata for SEO, CLIP = AI vectors for search)

Planned Stack (JS/TS only, no Python):

// Transformers.js — CLIP in pure JavaScript
import { pipeline } from '@xenova/transformers';

const extractor = await pipeline(
  'image-feature-extraction',
  'Xenova/clip-vit-base-patch32'
);

const embedding = await extractor(imageUrl);
// → [0.23, -0.15, 0.89, ...] 512 floats

Planned Architecture:

Image uploaded to GCP
  → Trigger CLIP microservice (Hono + Transformers.js)
  → Generate embedding [512 floats]
  → Store in MongoDB: parts.media.images[].embedding
  → Enable vector search: "find similar parts"

Future Tasks (for next session):

  • Create services/clip-embedding/ Hono microservice
  • Implement image embedding endpoint
  • Create MongoDB vector index
  • Add search API: GET /search/similar?imageId=xxx
  • Integrate with product pages

XMP vs CLIP Summary:

| | XMP (Phase 4) | CLIP (Phase 6) |
|---|---|---|
| What | Text metadata in file | AI vector in database |
| For | SEO, Google Images | Visual search on site |
| When | Current task | Future task |

Phase Summary

| Phase | Focus | Status |
|---|---|---|
| 0 | GCP migration | TODO |
| 1 | Update sync script | TODO |
| 2 | .meta.json generation | TODO |
| 3 | MongoDB integration | TODO |
| 4 | XMP embedding (native) | TODO |
| 5 | Analytics | LATER |
| 6 | CLIP embedding | FUTURE (separate task) |

Out of Scope

  • Image optimization/resize
  • CDN configuration
  • Cloud-based XMP processing (future migration)

Workflow Order (per image)

  1. Bella takes photo
  2. fswatch detects new file
  3. [Phase 2] Generate .meta.json (query MongoDB for product data)
  4. [Phase 4] Embed XMP into image (read from .meta.json)
  5. Upload image + .meta.json to GCP
  6. [Phase 3] Call API to update MongoDB media.images[]
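The per-image flow above can be sketched as a single function. All helpers here are echo stubs standing in for the real Phase 2-4 scripts; only the ordering is the point:

```shell
# Stub helpers, illustration only; the real scripts replace these
generate_metadata() { echo "meta:$1"; }
embed_xmp()         { echo "xmp:$1"; }
upload_to_gcp()     { echo "upload:$1"; }
notify_api()        { echo "notify:$1"; }

process_new_image() {
  local image="$1" type="$2"
  local meta="${TMPDIR:-/tmp}/$(basename "${image%.*}").meta.json"

  generate_metadata "$image" > "$meta"   # [Phase 2] .meta.json first
  if [ "$type" = "gallery" ]; then
    embed_xmp "$image"                   # [Phase 4] gallery only, before upload
  fi
  upload_to_gcp "$image"                 # image + .meta.json to GCP
  notify_api "$image"                    # [Phase 3] MongoDB update last
}
```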

Implementation Order

  1. Phase 0 (GCP migration) ← do first, clean slate
  2. Phase 1 (update paths in sync scripts)
  3. Phase 2 (.meta.json generator) ← needs API to fetch product data
  4. Phase 4 (XMP embedding) ← runs before upload, reads .meta.json
  5. Phase 3 (MongoDB update API) ← called after upload
  6. Phase 5 (analytics) ← after data flowing

Open Question

Phase 2 needs product data from MongoDB:

  • Option A: Sync script calls existing search API
  • Option B: Create dedicated /api/parts/{pn}/metadata endpoint
  • Option C: Direct MongoDB query from bash (via mongosh)

Recommended: Option B — clean API, reusable

Files to Modify

Mac Studio (/Users/john/crop_parts/sync/)

| File | Phase | Changes |
|---|---|---|
| crop_sync.sh | 1, 2, 3, 4 | transform_path, metadata gen, XMP embed, API call |
| bulk_upload.sh | 1 | transform_path |
| config.sh | 2, 3 | Add API_URL, API endpoints |
| test_transform.sh | 1 | Update test cases |
| generate_metadata.sh | 2 | NEW: fetch product data, create .meta.json |
| embed_xmp.sh | 4 | NEW: read .meta.json, embed XMP tags |

Backend API (new endpoints needed)

| Endpoint | Phase | Purpose |
|---|---|---|
| GET /api/parts/{pn}/metadata | 2 | Return product data for .meta.json |
| POST /api/media/upload-notification | 3 | Update MongoDB after upload |

SSH Access

ssh crop-john
cd ~/crop_parts/sync

Open Questions & Decisions

1. API Access from Mac Studio

Current: Only SSH access from vova's machine (ssh crop-john)

Options for API calls:

| Option | Status |
|---|---|
| Script runs on vova's machine via SSH | ✅ Current |
| Direct API from Mac Studio | ❌ Not available now |
| mongosh direct query | Possible fallback |

Decision: For now, scripts that need API access run remotely via SSH, or use direct MongoDB query from Mac Studio.

2. uploadedBy — Machine ID (not person)

Question: How to identify upload source?

# In config.sh
export UPLOADER_ID="mac-studio-john"

Decision: Use machine identifier, not person name.

  • Only one machine (Mac Studio) has scanner
  • Simpler than tracking individual photographers
  • Value: mac-studio-john or just john

3. Product Not Found in MongoDB

Question: What if partNumber doesn't exist?

| Scenario | Action |
|---|---|
| Product exists | Generate full .meta.json |
| Product NOT found | Upload image anyway, minimal metadata, log warning |

Decision: Don't block upload; partial success is better than failure.

4. 360° Completeness Check

Question: When to update MongoDB for 360°?

Current behavior: Each frame uploaded separately

Proposed:

# After uploading frame, check if all 24 exist
frame_count=$(gsutil ls "gs://.../${pn}/360/" | wc -l)
if [[ $frame_count -eq 24 ]]; then
    # Update MongoDB media.view360
fi

5. View Sort Order

Question: What order for different views?

| View | sortOrder |
|---|---|
| simple (no suffix) | 1 |
| front | 1 |
| back | 2 |
| left | 3 |
| right | 4 |
| top | 5 |
| bottom | 6 |
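The ordering above as a small lookup helper (the name is illustrative, and the fallback value for unknown views is an assumption):

```shell
# view_sort_order VIEW -> sortOrder on stdout
view_sort_order() {
  case "$1" in
    simple|front) echo 1 ;;
    back)         echo 2 ;;
    left)         echo 3 ;;
    right)        echo 4 ;;
    top)          echo 5 ;;
    bottom)       echo 6 ;;
    *)            echo 99 ;;   # unknown views sort last (assumed)
  esac
}
```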

6. Existing Media in MongoDB

Question: Part already has media.images[] from other source?

Decision:

  • Append new images (don't replace)
  • Filter by source: "ct" for CT-specific queries
  • Each source manages its own images

7. Domain for Canonical URL

Status: Domain NOT finalized yet

# In config.sh — PLACEHOLDER, will change for production
export DOMAIN="https://clintontractor.com"

Decision:

  • Use variable $DOMAIN everywhere (easy to change later)
  • Current placeholder: clintontractor.com
  • Will be updated when going to production

8. Error Handling Strategy

| Error | Action |
|---|---|
| API timeout | Retry 3x with backoff |
| XMP embedding fails | Log error, upload without XMP |
| GCP upload fails | Retry 3x, then queue for later |
| Product not found | Partial metadata, log warning |

All errors logged to ~/crop_parts/logs/sync.error.log
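The "retry 3x with backoff" policy could live in one generic wrapper shared by the API and GCP upload calls. A sketch (the helper name and the doubling backoff are assumptions, not existing script code):

```shell
# retry ATTEMPTS CMD [ARGS...] -- rerun CMD until success or ATTEMPTS exhausted
retry() {
  local attempts="$1"; shift
  local delay=1 n=1
  while true; do
    "$@" && return 0
    if [ "$n" -ge "$attempts" ]; then
      echo "retry: giving up after $attempts attempts: $*" >&2
      return 1
    fi
    sleep "$delay"
    delay=$(( delay * 2 ))    # backoff: 1s, 2s, 4s, ...
    n=$(( n + 1 ))
  done
}

# retry 3 curl -fsS -X POST "$API_URL/api/media/upload-notification" \
#   -H "Content-Type: application/json" -d "$PAYLOAD"
```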
