Photo Workflow - Implementation Plan
Context
- Sync script already exists and works on Mac Studio
- 467 test files already uploaded to GCP
- Need to migrate to vendor-first structure
- XMP embedding included (Phase 4) — native exiftool script
- CLIP embedding OUT OF SCOPE (Phase 6, separate task)
Constraints
⚠️ Mac Studio Local Structure — READ ONLY
We CANNOT change local folder structure — controlled by Ortery software (photo capture).
Local (Ortery creates this — DO NOT MODIFY):
/Users/john/crop_parts/{Manufacturer}/{PartNumber}/
├── {pn}.jpg ← naming varies
├── {pn}-Front/{pn}-Front1.jpg ← multiview
└── {pn}-360/images/lv2/ ← 360° frames
What we CAN do:
- Modify sync scripts
- Normalize paths during transformation
- Transform to standardized GCP structure
Path Normalization (in sync script)
Ortery output varies. Sync script must normalize:
| Ortery Output | Normalized GCP Path |
|---|---|
| 5023082.jpg | gallery/5023082-1.jpg |
| 5023082.JPG | gallery/5023082-1.jpg (lowercase) |
| 35.0154-Front1.jpg | gallery/35.0154-front-1.jpg |
| 35.0154-FRONT1.jpg | gallery/35.0154-front-1.jpg |
| images/lv2/img0.jpg | 360/R01_C01.jpg |
| images/lv2/img23.jpg | 360/R01_C24.jpg |
Normalization rules:
- Lowercase extensions: .JPG → .jpg
- Lowercase view names and hyphenate the view number: FRONT1 → front-1
- Add -1 suffix if missing: {pn}.jpg → {pn}-1.jpg
- Convert 360° index: img{N} → R01_C{N+1}, zero-padded to two digits (img0 → R01_C01)
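The rules above can be sketched as a single bash function. This is only an illustration: `normalize_name` and `to_lower` are hypothetical names, not the existing code in crop_sync.sh. It assumes every simple file name lacks the -1 suffix, and it would misread a part number that itself ends in "-letters-digits" as a multiview name.

```shell
#!/usr/bin/env bash
# Sketch only: normalize_name is a hypothetical helper, not the existing crop_sync.sh code.

# Lowercase a string portably (macOS ships bash 3.2, which lacks ${var,,}).
to_lower() { printf '%s' "$1" | tr '[:upper:]' '[:lower:]'; }

normalize_name() {
  local base="${1##*/}"                 # drop any directory part
  local stem="${base%.*}"               # file name without extension
  local ext; ext=$(to_lower "${base##*.}")
  local re_360='^img([0-9]+)$'
  local re_view='^(.+)-([A-Za-z]+)([0-9]+)$'

  if [[ "$stem" =~ $re_360 ]]; then
    # 10# forces base 10 so an index such as 08 is not parsed as octal
    printf '360/R01_C%02d.%s\n' "$(( 10#${BASH_REMATCH[1]} + 1 ))" "$ext"
  elif [[ "$stem" =~ $re_view ]]; then
    # {pn}-{View}{num} becomes {pn}-{view}-{num}
    printf 'gallery/%s-%s-%s.%s\n' \
      "${BASH_REMATCH[1]}" "$(to_lower "${BASH_REMATCH[2]}")" "${BASH_REMATCH[3]}" "$ext"
  else
    # Simple image: assume the -1 suffix is missing and add it
    printf 'gallery/%s-1.%s\n' "$stem" "$ext"
  fi
}
```

For example, `normalize_name "35.0154-FRONT1.jpg"` prints `gallery/35.0154-front-1.jpg`, matching the table row above.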
Phase 0: Migrate GCP Structure
Goal: Move existing files from type-first to vendor-first
Current (type-first):
gs://crop_parts/ct/gallery/{vendor}/{pn}/
gs://crop_parts/ct/360/{vendor}/{pn}/
Target (vendor-first):
gs://crop_parts/ct/{vendor}/{pn}/gallery/
gs://crop_parts/ct/{vendor}/{pn}/360/
Tasks:
- Create migration script:
  # Example transformation:
  # Old: gs://crop_parts/ct/gallery/bgs/53.0215/53.0215-1.jpg
  # New: gs://crop_parts/ct/bgs/53.0215/gallery/53.0215-1.jpg
- Steps:
  - List all files: gsutil ls -r gs://crop_parts/ct/**
  - Generate mapping (old → new)
  - Copy to new locations
  - Verify accessibility
  - Delete old paths
- Clear uploaded.txt after migration (reset tracking)
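The "generate mapping" step could look roughly like the sketch below. `map_old_to_new` and `print_mapping` are hypothetical names, and the bucket prefix is taken from the example paths in this plan. The functions only print the mapping; copying and deleting are deliberately left out so the output can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch only: map_old_to_new and print_mapping are hypothetical helper names.

# Map one type-first object URL to its vendor-first equivalent.
# Returns 1 (and prints nothing) if the URL is not in the old layout.
map_old_to_new() {
  local old="$1" prefix="gs://crop_parts/ct"
  local re="^${prefix}/(gallery|360)/([^/]+)/([^/]+)/(.+)\$"
  [[ "$old" =~ $re ]] || return 1
  # Captures: 1 = type, 2 = vendor, 3 = part number, 4 = file name
  printf '%s/%s/%s/%s/%s\n' "$prefix" \
    "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}" "${BASH_REMATCH[1]}" "${BASH_REMATCH[4]}"
}

# Read object URLs on stdin, write "old<TAB>new" for each URL in the old layout.
print_mapping() {
  local old new
  while IFS= read -r old; do
    if new=$(map_old_to_new "$old"); then
      printf '%s\t%s\n' "$old" "$new"
    fi
  done
}
```

A dry run might be `gsutil ls 'gs://crop_parts/ct/**' | print_mapping > mapping.tsv`. Paths that are already vendor-first do not match the pattern, so the mapping can be regenerated after a partial migration, provided no vendor code is literally `gallery` or `360`.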
Output:
- Migration script
- Verification report
Phase 1: Update Sync Script
Goal: Modify existing crop_sync.sh for vendor-first structure
Location: /Users/john/crop_parts/sync/
Tasks:
- Modify transform_path() function in crop_sync.sh:
  # Current output:
  # gs://crop_parts/ct/gallery/{vendor}/{pn}/{pn}-1.jpg
  # New output:
  # gs://crop_parts/ct/{vendor}/{pn}/gallery/{pn}-1.jpg
- Update patterns:
  - simple → {bucket}/ct/{vendor}/{pn}/gallery/{pn}-1.{ext}
  - multiview → {bucket}/ct/{vendor}/{pn}/gallery/{pn}-{view}-{num}.{ext}
  - 360 → {bucket}/ct/{vendor}/{pn}/360/{pn}-R01_C{col}.{ext}
- Update bulk_upload.sh with same changes
- Update test_transform.sh for verification
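As an illustration of the three patterns, here is a minimal sketch of the path-building part. `vendor_first_url` is a hypothetical name, not the real transform_path(). It assumes {bucket} is `gs://crop_parts` and that {col} is zero-padded to two digits, as in the normalization table.

```shell
#!/usr/bin/env bash
# Sketch only: vendor_first_url is a hypothetical stand-in for part of transform_path().
# Usage: vendor_first_url BUCKET VENDOR PN KIND EXT [VIEW] [NUM]
#   KIND is simple, multiview, or 360; for 360, NUM is the column number.
vendor_first_url() {
  local bucket="$1" vendor="$2" pn="$3" kind="$4" ext="$5" view="${6:-}" num="${7:-1}"
  local root="${bucket}/ct/${vendor}/${pn}"
  case "$kind" in
    simple)    printf '%s/gallery/%s-1.%s\n' "$root" "$pn" "$ext" ;;
    multiview) printf '%s/gallery/%s-%s-%s.%s\n' "$root" "$pn" "$view" "$num" "$ext" ;;
    360)       printf '%s/360/%s-R01_C%02d.%s\n' "$root" "$pn" "$(( 10#$num ))" "$ext" ;;
    *)         echo "unknown kind: $kind" >&2; return 1 ;;
  esac
}
```

Calls such as `vendor_first_url gs://crop_parts bgs 35.0154 multiview jpg front 1` could double as test cases in test_transform.sh.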
Output:
- Updated sync scripts
- Test results
Phase 2: Add .meta.json Generation
Goal: Generate JSON metadata files alongside images
Tasks:
- Create generate_metadata.sh script:
  - Input: partNumber, vendor, image type
  - Query MongoDB for product data (via API)
  - Generate JSON with schema fields
  - Save as {pn}-{N}.meta.json
- Integrate into crop_sync.sh:
  - After successful image upload (moves before the upload once Phase 4 embeds XMP; see Workflow Order)
  - Generate .meta.json
  - Upload to same GCP location
- JSON schema:
  {
    "schemaVersion": "1.0",
    "createdAt": "ISO timestamp",
    "product": {
      "sku", "partNumber", "pnNorm", "title",
      "manufacturer": {"name", "code"},
      "categoryPath", "equipmentFitment",
      "canonicalUrl": "https://clintontractor.com/parts/{sku}"
    },
    "image": {"type", "sortOrder", "source"},
    "company": {"name", "copyright", "website"}
  }
  Canonical URL: link to product page for SEO
- For 360°: NO .meta.json (fixed structure, no SEO value)
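One way to assemble the JSON without hand-escaping strings is `jq -n`. The sketch below assumes the product data has already been fetched as a JSON object (the fetch itself is an open question in this plan). `build_meta_json` is a hypothetical name, and the `company` block is left out because its values are not specified here.

```shell
#!/usr/bin/env bash
# Sketch only: build_meta_json is a hypothetical helper; requires jq.
# Usage: build_meta_json PRODUCT_JSON IMAGE_TYPE SORT_ORDER
build_meta_json() {
  local product_json="$1" image_type="$2" sort_order="$3"
  local domain="${DOMAIN:-https://clintontractor.com}"   # placeholder domain from this plan
  jq -n \
    --argjson product "$product_json" \
    --arg type "$image_type" \
    --argjson sortOrder "$sort_order" \
    --arg domain "$domain" \
    --arg createdAt "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    '{
      schemaVersion: "1.0",
      createdAt: $createdAt,
      # Keep the fetched product fields and add the canonical URL built from the sku
      product: ($product + {canonicalUrl: ($domain + "/parts/" + $product.sku)}),
      image: {type: $type, sortOrder: $sortOrder, source: "ct"}
    }'
}
```

A call such as `build_meta_json "$product" gallery 1 > "${pn}-1.meta.json"` would then write the file next to the image.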
Output:
- Metadata generator script
- Updated sync script
Phase 3: MongoDB Integration
Goal: Update MongoDB when images uploaded
Tasks:
- Create API endpoint in health-analytics or search service:
  POST /api/media/upload-notification
  {
    "partNumber": "53.0215",
    "vendor": "bgs",
    "type": "gallery",
    "filename": "53.0215-1.jpg",
    "gcpUrl": "gs://crop_parts/ct/bgs/53.0215/gallery/53.0215-1.jpg"
  }
- API updates MongoDB:
  db.parts.updateOne(
    { partNumber: "53.0215" },
    {
      $push: { "media.images": {...} },
      $set: { "media.hasImage": true, "media.hasGcpImages": true },
      $inc: { "media.imagesCount": 1 }
    }
  )
- Integrate API call into crop_sync.sh:
  # After successful upload
  curl -X POST "$API_URL/api/media/upload-notification" \
    -H "Content-Type: application/json" \
    -d '{"partNumber":"...", "vendor":"...", ...}'
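Building the request body with jq avoids quoting problems when a part number or file name contains special characters. The following is a sketch under assumptions: `build_notification` and `notify_upload` are hypothetical names, and `API_URL` is expected to come from config.sh.

```shell
#!/usr/bin/env bash
# Sketch only: build_notification and notify_upload are hypothetical helpers; requires jq and curl.

# Usage: build_notification PART_NUMBER VENDOR TYPE FILENAME
build_notification() {
  local pn="$1" vendor="$2" type="$3" filename="$4"
  jq -cn --arg pn "$pn" --arg vendor "$vendor" --arg type "$type" --arg filename "$filename" \
    '{partNumber: $pn, vendor: $vendor, type: $type, filename: $filename,
      gcpUrl: "gs://crop_parts/ct/\($vendor)/\($pn)/\($type)/\($filename)"}'
}

# Same arguments as build_notification; posts the payload to the notification endpoint.
notify_upload() {
  # --fail makes curl exit non-zero on HTTP 4xx/5xx so the caller can retry or log
  build_notification "$@" |
    curl --fail --silent --show-error -X POST "$API_URL/api/media/upload-notification" \
      -H "Content-Type: application/json" --data-binary @-
}
```

In crop_sync.sh this might be called as `notify_upload "$pn" "$vendor" gallery "$filename"` right after a successful upload.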
Output:
- API endpoint
- Updated sync script
Phase 4: XMP Embedding (Native Script)
Goal: Embed metadata into gallery images before upload
Approach:
- Native bash script with exiftool
- Runs on Mac Studio (files already there)
- Embed XMP → then upload to GCP
- No Docker overhead for MVP
Tasks:
- Install exiftool on Mac Studio (jq is also needed, to read .meta.json):
  brew install exiftool jq
- Create embed_xmp.sh script:
  #!/bin/bash
  # Input: image path, .meta.json path
  IMAGE="$1"
  META="$2"
  # Read from .meta.json
  TITLE=$(jq -r '.product.title' "$META")
  SKU=$(jq -r '.product.sku' "$META")
  PN=$(jq -r '.product.partNumber' "$META")
  MFR=$(jq -r '.product.manufacturer.name' "$META")
  COPYRIGHT=$(jq -r '.company.copyright' "$META")
  CANONICAL=$(jq -r '.product.canonicalUrl' "$META")
  WEBSITE=$(jq -r '.company.website' "$META")
  # Embed XMP tags.
  # Keywords go in XMP-dc:Subject, one list item per assignment
  # (ExifTool has no XMP-iptc group, so -XMP-iptc:Keywords would not be written).
  exiftool -overwrite_original \
    -XMP-dc:Title="$TITLE" \
    -XMP-dc:Creator="Clinton Tractor & Implement Co." \
    -XMP-dc:Rights="$COPYRIGHT" \
    -XMP-dc:Description="$TITLE - $MFR" \
    -XMP-dc:Identifier="$CANONICAL" \
    -XMP-photoshop:Credit="$MFR" \
    -XMP-photoshop:Source="$CANONICAL" \
    -XMP-xmp:CreatorTool="$WEBSITE" \
    -XMP-dc:Subject="$MFR" -XMP-dc:Subject="$PN" \
    -XMP-dc:Subject="parts" -XMP-dc:Subject="agriculture" \
    "$IMAGE"
- Integrate into crop_sync.sh:
  # After generating .meta.json, before upload:
  if [[ "$TYPE" == "gallery" ]]; then
    ./embed_xmp.sh "$IMAGE" "$META_JSON"
  fi
  # Then upload image (now with XMP embedded)
- Skip 360°: no XMP embedding (24 files, no SEO value)
XMP Tag Mapping:
| JSON field | XMP tag |
|---|---|
| product.title | dc:Title |
| company.name | dc:Creator |
| company.copyright | dc:Rights |
| product.manufacturer.name | photoshop:Credit |
| product.canonicalUrl | dc:Identifier, photoshop:Source |
| company.website | xmp:CreatorTool |
| product.partNumber + manufacturer | dc:Subject (keywords) |
Output:
- embed_xmp.sh script
- Updated crop_sync.sh with XMP step
Future Migration Path:
MVP: Mac Studio → exiftool → upload
↓
Scale: Upload raw → Cloud Run (exiftool in Docker) → update GCS
Phase 5: Analytics
Goal: Track upload statistics in health-analytics service
Service: services/health-analytics (already has media routes)
Metrics to Track:
| Metric | Description | Priority |
|---|---|---|
| Parts photographed per day | Unique partNumbers | High |
| Photos per day | File count | Medium |
| By type (gallery/360°) | Breakdown | High |
| By manufacturer/vendor | bgs, nhl, ven, mch, rav | Medium |
| Failed uploads | For monitoring | High |
| Processing time | For optimization | Low |
Tasks:
- Add upload tracking to MongoDB:
  - Store in media.images[].uploadedAt
  - Store media.view360.uploadedAt
- Create aggregation queries:
  // Parts photographed per day
  // media.images is an array, so it is unwound before grouping by date;
  // this assumes uploadedAt is stored as a Date, not a string.
  db.parts.aggregate([
    { $match: { "media.images.source": "ct" } },
    { $unwind: "$media.images" },
    { $match: { "media.images.source": "ct" } },
    { $group: {
        _id: { $dateToString: { format: "%Y-%m-%d", date: "$media.images.uploadedAt" } },
        uniqueParts: { $addToSet: "$partNumber" }
    } },
    { $project: { date: "$_id", count: { $size: "$uniqueParts" } } }
  ])
- Add API endpoints to health-analytics:
  GET /api/health/media/uploads/daily
  GET /api/health/media/uploads/by-vendor
  GET /api/health/media/uploads/by-type
- Dashboard UI at /dashboard/health/analytics/photo-stats
Phase 6: CLIP Embedding (FUTURE — Separate Task)
⚠️ NOT IN CURRENT SCOPE — This will be a separate Linear task in a future session
Goal: AI-powered visual search for similar products
What is CLIP?
- AI creates numeric "fingerprint" of image (512 numbers)
- Enables: "find similar parts" and text-to-image search
- Different from XMP (XMP = text metadata for SEO, CLIP = AI vectors for search)
Planned Stack (JS/TS only, no Python):
// Transformers.js — CLIP in pure JavaScript
import { pipeline } from '@xenova/transformers';
const extractor = await pipeline(
'image-feature-extraction',
'Xenova/clip-vit-base-patch32'
);
const embedding = await extractor(imageUrl);
// → [0.23, -0.15, 0.89, ...] 512 floats
Planned Architecture:
Image uploaded to GCP
↓
Trigger CLIP microservice (Hono + Transformers.js)
↓
Generate embedding [512 floats]
↓
Store in MongoDB: parts.media.images[].embedding
↓
Enable vector search: "find similar parts"
Future Tasks (for next session):
- Create services/clip-embedding/ Hono microservice
- Implement image embedding endpoint
- Create MongoDB vector index
- Add search API: GET /search/similar?imageId=xxx
- Integrate with product pages
XMP vs CLIP Summary:
| | XMP (Phase 4) | CLIP (Phase 6) |
|---|---|---|
| What | Text metadata in file | AI vector in database |
| For | SEO, Google Images | Visual search on site |
| When | Current task | Future task |
Phase Summary
| Phase | Focus | Status |
|---|---|---|
| 0 | GCP migration | TODO |
| 1 | Update sync script | TODO |
| 2 | .meta.json generation | TODO |
| 3 | MongoDB integration | TODO |
| 4 | XMP embedding (native) | TODO |
| 5 | Analytics | LATER |
| 6 | CLIP embedding | FUTURE (separate task) |
Out of Scope
- Image optimization/resize
- CDN configuration
- Cloud-based XMP processing (future migration)
Workflow Order (per image)
Bella takes photo
↓
fswatch detects new file
↓
[Phase 2] Generate .meta.json (query MongoDB for product data)
↓
[Phase 4] Embed XMP into image (read from .meta.json)
↓
Upload image + .meta.json to GCP
↓
[Phase 3] Call API to update MongoDB media.images[]
Implementation Order
Phase 0 (GCP migration) ← do first, clean slate
↓
Phase 1 (update paths in sync scripts)
↓
Phase 2 (.meta.json generator) ← needs API to fetch product data
↓
Phase 4 (XMP embedding) ← runs before upload, reads .meta.json
↓
Phase 3 (MongoDB update API) ← called after upload
↓
Phase 5 (analytics) ← after data flowing
Open Question
Phase 2 needs product data from MongoDB:
- Option A: Sync script calls existing search API
- Option B: Create dedicated /api/parts/{pn}/metadata endpoint
- Option C: Direct MongoDB query from bash (via mongosh)
Recommended: Option B — clean API, reusable
Files to Modify
Mac Studio (/Users/john/crop_parts/sync/)
| File | Phase | Changes |
|---|---|---|
| crop_sync.sh | 1, 2, 3, 4 | transform_path, metadata gen, XMP embed, API call |
| bulk_upload.sh | 1 | transform_path |
| config.sh | 2, 3 | Add API_URL, API endpoints |
| test_transform.sh | 1 | Update test cases |
| generate_metadata.sh | 2 | NEW: fetch product data, create .meta.json |
| embed_xmp.sh | 4 | NEW: read .meta.json, embed XMP tags |
Backend API (new endpoints needed)
| Endpoint | Phase | Purpose |
|---|---|---|
| GET /api/parts/{pn}/metadata | 2 | Return product data for .meta.json |
| POST /api/media/upload-notification | 3 | Update MongoDB after upload |
SSH Access
ssh crop-john
cd ~/crop_parts/sync
Open Questions & Decisions
1. API Access from Mac Studio
Current: Only SSH access from vova's machine (ssh crop-john)
Options for API calls:
| Option | Status |
|---|---|
| Script runs on vova's machine via SSH | ✅ Current |
| Direct API from Mac Studio | ❌ Not available now |
| mongosh direct query | Possible fallback |
Decision: For now, scripts that need API access run remotely via SSH, or use direct MongoDB query from Mac Studio.
2. uploadedBy — Machine ID (not person)
Question: How to identify upload source?
# In config.sh
export UPLOADER_ID="mac-studio-john"
Decision: Use machine identifier, not person name.
- Only one machine (Mac Studio) has scanner
- Simpler than tracking individual photographers
- Value: mac-studio-john or just john
3. Product Not Found in MongoDB
Question: What if partNumber doesn't exist?
| Scenario | Action |
|---|---|
| Product exists | Generate full .meta.json |
| Product NOT found | Upload image anyway, minimal metadata, log warning |
Decision: Don't block the upload; partial success is better than failure.
4. 360° Completeness Check
Question: When to update MongoDB for 360°?
Current behavior: Each frame uploaded separately
Proposed:
# After uploading frame, check if all 24 exist
frame_count=$(gsutil ls "gs://.../${pn}/360/" | wc -l)
if [[ $frame_count -eq 24 ]]; then
  # Update MongoDB media.view360
  :  # placeholder: bash rejects an if-body that contains only a comment
fi
5. sortOrder for Gallery Images
Question: What order for different views?
| View | sortOrder |
|---|---|
| simple (no suffix) | 1 |
| front | 1 |
| back | 2 |
| left | 3 |
| right | 4 |
| top | 5 |
| bottom | 6 |
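In the sync script the table above could be expressed as a small lookup. This is a sketch only: `sort_order_for_view` is a hypothetical name, and an empty argument stands for the simple (no suffix) case.

```shell
#!/usr/bin/env bash
# Sketch only: sort_order_for_view is a hypothetical helper name.
# Prints the sortOrder for a view name; an empty view means a simple image.
sort_order_for_view() {
  local view
  view=$(printf '%s' "${1:-}" | tr '[:upper:]' '[:lower:]')
  case "$view" in
    ""|front) echo 1 ;;
    back)     echo 2 ;;
    left)     echo 3 ;;
    right)    echo 4 ;;
    top)      echo 5 ;;
    bottom)   echo 6 ;;
    *)        return 1 ;;   # unknown view: let the caller decide what to do
  esac
}
```

Returning a non-zero status for unknown views, rather than a default number, keeps any view name Ortery might add later from silently colliding with an existing slot.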
6. Existing Media in MongoDB
Question: Part already has media.images[] from other source?
Decision:
- Append new images (don't replace)
- Filter by source: "ct" for CT-specific queries
- Each source manages its own images
7. Domain for Canonical URL
Status: Domain NOT finalized yet
# In config.sh — PLACEHOLDER, will change for production
export DOMAIN="https://clintontractor.com"
Decision:
- Use variable $DOMAIN everywhere (easy to change later)
- Current placeholder: clintontractor.com
- Will be updated when going to production
8. Error Handling Strategy
| Error | Action |
|---|---|
| API timeout | Retry 3x with backoff |
| XMP embedding fails | Log error, upload without XMP |
| GCP upload fails | Retry 3x, then queue for later |
| Product not found | Partial metadata, log warning |
All errors logged to ~/crop_parts/logs/sync.error.log
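The "retry 3x with backoff" rows could share one helper. The sketch below is an assumption about how that might look, not existing code: `retry` and `log_error` are hypothetical names, and the delay simply doubles after each failed attempt.

```shell
#!/usr/bin/env bash
# Sketch only: retry and log_error are hypothetical helpers.
LOG_FILE="${LOG_FILE:-$HOME/crop_parts/logs/sync.error.log}"

# Append a timestamped error line to the sync error log.
log_error() {
  mkdir -p "$(dirname "$LOG_FILE")"
  printf '%s ERROR %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >> "$LOG_FILE"
}

# Usage: retry MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND up to MAX_ATTEMPTS times, doubling the delay after each failure.
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if (( attempt >= max )); then
      log_error "gave up after $attempt attempts: $*"
      return 1
    fi
    sleep "$delay"
    delay=$(( delay * 2 ))
    attempt=$(( attempt + 1 ))
  done
}
```

For example, `retry 3 2 gsutil cp "$IMAGE" "$DEST"` would cover the upload row, while the XMP row needs no retry: `./embed_xmp.sh "$IMAGE" "$META_JSON" || log_error "XMP embedding failed for $IMAGE, uploading without XMP"`. Queuing failed uploads for later is not covered by this sketch.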