GCS Media Storage Migration

Executive Summary

This document outlines the migration plan for reorganizing the CROP auto parts media storage in Google Cloud Storage (GCS). The migration restructures approximately 32 GB of media assets, currently stored under gs://crop_parts/newholland/images/, into source-based directories that support priority-based media selection and CDN optimization.

Current State

  • Total Size: ~32GB
  • Location: gs://crop_parts/newholland/images/
  • Structure: Flat or minimally organized
  • Issues: No source tracking, no priority system, difficult to manage updates

Target State

  • Location: gs://crop_parts/
  • Structure: Source-based directories with standardized naming
  • Benefits: Priority-based selection, clear provenance, CDN-optimized paths

Final Storage Structure

gs://crop_parts/
├── ct/                       # Priority: 100 - Clinton Tractor photos
│   ├── gallery/{vendor}/{part_number}/{part_number}-{n}.jpg
│   ├── 360/{vendor}/{part_number}/{part_number}-R{row}_C{col}.jpg
│   ├── docs/{vendor}/{part_number}/{part_number}-{type}.pdf
│   └── videos/{vendor}/{part_number}/{part_number}-{type}.mp4
│
├── vendor_direct/            # Priority: 90 - Direct from manufacturer FTP/API
│   ├── gallery/{vendor}/{part_number}/{part_number}-{n}.jpg
│   ├── 360/{vendor}/{part_number}/{part_number}-R{row}_C{col}.jpg
│   ├── docs/{vendor}/{part_number}/{part_number}-{type}.pdf
│   └── videos/{vendor}/{part_number}/{part_number}-{type}.mp4
│
├── mycnh/                    # Priority: 70 - MyCNH portal
│   ├── gallery/{vendor}/{part_number}/{part_number}-{n}.jpg
│   ├── 360/{vendor}/{part_number}/{part_number}-R{row}_C{col}.jpg
│   ├── docs/{vendor}/{part_number}/{part_number}-{type}.pdf
│   └── videos/{vendor}/{part_number}/{part_number}-{type}.mp4
│
├── vendor_scraped/           # Priority: 50 - Scraped from manufacturer websites
│   ├── gallery/{vendor}/{part_number}/{part_number}-{n}.jpg
│   ├── 360/{vendor}/{part_number}/{part_number}-R{row}_C{col}.jpg
│   ├── docs/{vendor}/{part_number}/{part_number}-{type}.pdf
│   └── videos/{vendor}/{part_number}/{part_number}-{type}.mp4
│
└── manual/                   # Priority: 30 - Manual uploads
    ├── gallery/{vendor}/{part_number}/{part_number}-{n}.jpg
    ├── 360/{vendor}/{part_number}/{part_number}-R{row}_C{col}.jpg
    ├── docs/{vendor}/{part_number}/{part_number}-{type}.pdf
    └── videos/{vendor}/{part_number}/{part_number}-{type}.mp4

Media Sources

| Source | Priority | Description | Quality | Use Case |
| --- | --- | --- | --- | --- |
| ct | 100 | Clinton Tractor professional photos | Highest | Primary display images, marketing |
| vendor_direct | 90 | Direct from manufacturer via FTP/API | High | Official product images |
| mycnh | 70 | MyCNH portal downloads | Good | CNH brand parts |
| vendor_scraped | 50 | Scraped from manufacturer websites | Good | Fallback images |
| manual | 30 | Manual upload by staff | Variable | Custom/rare parts |

Priority Resolution Logic

When a part has media from multiple sources, the system selects based on priority:

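// Sources in descending priority order (see the Media Sources table).
// `as const` keeps the literal types so each entry is a valid key of Part.mediaSources.
const SOURCES_BY_PRIORITY = ['ct', 'vendor_direct', 'mycnh', 'vendor_scraped', 'manual'] as const;

function selectBestMedia(part: Part, mediaType: 'gallery' | 'view360' | 'docs' | 'videos'): string[] {
  for (const source of SOURCES_BY_PRIORITY) {
    // Use the first source that has at least one file of this type
    const media = part.mediaSources?.[source]?.[mediaType];
    if (media && media.length > 0) {
      return media;
    }
  }

  // No source has media of this type
  return [];
}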
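
The source order in selectBestMedia is written out by hand, so it can drift from the priorities in the Media Sources table. A minimal sketch of one way to keep them in sync is to derive the order from a priority map. The names SOURCE_PRIORITY, SourceId, PartMedia, sourcesByPriority, and resolveMedia are illustrative assumptions, not part of the existing codebase, and PartMedia is a trimmed stand-in for the Part document.

```typescript
// Hypothetical priority map mirroring the Media Sources table.
type SourceId = 'ct' | 'vendor_direct' | 'mycnh' | 'vendor_scraped' | 'manual';
type MediaKind = 'gallery' | 'view360' | 'docs' | 'videos';

const SOURCE_PRIORITY: Record<SourceId, number> = {
  ct: 100,
  vendor_direct: 90,
  mycnh: 70,
  vendor_scraped: 50,
  manual: 30,
};

// Trimmed stand-in for the Part document; only the field used here.
interface PartMedia {
  mediaSources?: Partial<Record<SourceId, Partial<Record<MediaKind, string[]>>>>;
}

// Sources sorted from highest to lowest priority.
function sourcesByPriority(): SourceId[] {
  return (Object.keys(SOURCE_PRIORITY) as SourceId[]).sort(
    (a, b) => SOURCE_PRIORITY[b] - SOURCE_PRIORITY[a]
  );
}

// Return the media list from the highest-priority source that has any.
function resolveMedia(part: PartMedia, mediaType: MediaKind): string[] {
  for (const source of sourcesByPriority()) {
    const media = part.mediaSources?.[source]?.[mediaType];
    if (media && media.length > 0) {
      return media;
    }
  }
  return [];
}

// Usage: ct wins for gallery; vendor_direct is the only source with docs.
const samplePart: PartMedia = {
  mediaSources: {
    vendor_scraped: { gallery: ['scraped-1.jpg'] },
    ct: { gallery: ['ct-1.jpg', 'ct-2.jpg'] },
    vendor_direct: { docs: ['spec.pdf'] },
  },
};
const galleryResult = resolveMedia(samplePart, 'gallery');
const docsResult = resolveMedia(samplePart, 'docs');
```

With this shape, adding a source or changing a priority is a one-line change to the map.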

Naming Conventions

Gallery Images

  • Format: {part_number}-{order}.jpg
  • Examples:
    • 00100715-1.jpg (first image)
    • 00100715-2.jpg (second image)
    • 00100715-3.jpg (third image)

360-Degree Views

  • Format: {part_number}-R{row}_C{col}.jpg
  • Row: Zero-padded 2 digits (01-99)
  • Column: Zero-padded 2 digits (01-99)
  • Examples:
    • 00100816-R01_C01.jpg (row 1, column 1)
    • 00100816-R01_C24.jpg (row 1, column 24)
    • 00100816-R02_C12.jpg (row 2, column 12)
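
As an illustration of the frame naming rule above, the following sketch generates the file names for a spin. The function names (pad2, frameFileName, spinFileNames) are hypothetical, and the row and column counts are supplied by the caller rather than taken from any real configuration.

```typescript
// Zero-pad a row or column index to two digits (documented range 01-99).
function pad2(value: number): string {
  if (!Number.isInteger(value) || value < 1 || value > 99) {
    throw new RangeError(`index out of range 1-99: ${value}`);
  }
  return String(value).padStart(2, '0');
}

// Build one frame file name, e.g. 00100816-R01_C24.jpg.
function frameFileName(partNumber: string, row: number, col: number): string {
  return `${partNumber}-R${pad2(row)}_C${pad2(col)}.jpg`;
}

// List every frame name for a spin, row by row.
function spinFileNames(partNumber: string, rows: number, cols: number): string[] {
  const names: string[] = [];
  for (let row = 1; row <= rows; row++) {
    for (let col = 1; col <= cols; col++) {
      names.push(frameFileName(partNumber, row, col));
    }
  }
  return names;
}

// Usage: a single-row spin with 24 frames.
const spinFrames = spinFileNames('00100816', 1, 24);
```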

Documents

  • Format: {part_number}-{type}.pdf
  • Types: manual, spec, install, warranty, safety, diagram
  • Examples:
    • ABC12345-manual.pdf
    • ABC12345-spec.pdf
    • ABC12345-install.pdf

Videos

  • Format: {part_number}-{type}.mp4
  • Types: demo, install, overview, repair, comparison
  • Examples:
    • DEF67890-demo.mp4
    • DEF67890-install.mp4
    • DEF67890-overview.mp4
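
The four naming formats above can be checked mechanically, which may help when validating a manifest. The sketch below is one possible parser, not an existing utility: parseMediaFileName and ParsedMediaName are hypothetical names, and it assumes part numbers are alphanumeric with no hyphens, which holds for the examples in this document but is not stated as a rule.

```typescript
type ParsedMediaName =
  | { kind: 'gallery'; partNumber: string; order: number }
  | { kind: 'view360'; partNumber: string; row: number; col: number }
  | { kind: 'docs'; partNumber: string; docType: string }
  | { kind: 'videos'; partNumber: string; videoType: string };

const DOC_TYPES = ['manual', 'spec', 'install', 'warranty', 'safety', 'diagram'];
const VIDEO_TYPES = ['demo', 'install', 'overview', 'repair', 'comparison'];

// Assumption: part numbers are alphanumeric and contain no hyphens.
function parseMediaFileName(fileName: string): ParsedMediaName | null {
  let m = /^([A-Za-z0-9]+)-R(\d{2})_C(\d{2})\.jpg$/.exec(fileName);
  if (m) {
    const row = Number(m[2]);
    const col = Number(m[3]);
    // 00 is outside the documented 01-99 range.
    return row >= 1 && col >= 1 ? { kind: 'view360', partNumber: m[1], row, col } : null;
  }
  m = /^([A-Za-z0-9]+)-(\d+)\.jpg$/.exec(fileName);
  if (m) {
    return { kind: 'gallery', partNumber: m[1], order: Number(m[2]) };
  }
  m = /^([A-Za-z0-9]+)-([a-z]+)\.pdf$/.exec(fileName);
  if (m) {
    // Only the document types listed above are accepted.
    return DOC_TYPES.indexOf(m[2]) !== -1 ? { kind: 'docs', partNumber: m[1], docType: m[2] } : null;
  }
  m = /^([A-Za-z0-9]+)-([a-z]+)\.mp4$/.exec(fileName);
  if (m) {
    // Only the video types listed above are accepted.
    return VIDEO_TYPES.indexOf(m[2]) !== -1 ? { kind: 'videos', partNumber: m[1], videoType: m[2] } : null;
  }
  return null;
}
```

Names that match none of the formats, or that use an unlisted type such as ABC12345-brochure.pdf, return null and can be flagged for review.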

URL Examples

Base CDN URL

https://media.cropparts.com

Gallery Images

# Clinton Tractor photo (Priority 100)
https://media.cropparts.com/ct/gallery/newholland/00100715/00100715-1.jpg

# Vendor scraped (Priority 50)
https://media.cropparts.com/vendor_scraped/gallery/newholland/00100715/00100715-1.jpg

# Manual upload (Priority 30)
https://media.cropparts.com/manual/gallery/caseih/XYZ98765/XYZ98765-1.jpg

360-Degree Views

# Full 360 spin (24 frames)
https://media.cropparts.com/ct/360/newholland/00100816/00100816-R01_C01.jpg
https://media.cropparts.com/ct/360/newholland/00100816/00100816-R01_C02.jpg
...
https://media.cropparts.com/ct/360/newholland/00100816/00100816-R01_C24.jpg

Documents

# Installation manual
https://media.cropparts.com/vendor_direct/docs/newholland/ABC12345/ABC12345-install.pdf

# Specification sheet
https://media.cropparts.com/mycnh/docs/caseih/DEF67890/DEF67890-spec.pdf

Videos

# Product demo
https://media.cropparts.com/ct/videos/newholland/GHI11111/GHI11111-demo.mp4

# Installation guide
https://media.cropparts.com/vendor_direct/videos/kubota/JKL22222/JKL22222-install.mp4
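
The URL examples above all follow the pattern {base}/{source}/{media dir}/{vendor}/{part_number}/{file}. A small helper along these lines could build them consistently; mediaUrl, SourceDir, and MediaDir are hypothetical names used only for this sketch.

```typescript
const CDN_BASE = 'https://media.cropparts.com';

type SourceDir = 'ct' | 'vendor_direct' | 'mycnh' | 'vendor_scraped' | 'manual';
type MediaDir = 'gallery' | '360' | 'docs' | 'videos';

// Join the path segments under the CDN base, percent-encoding each one.
function mediaUrl(
  source: SourceDir,
  mediaDir: MediaDir,
  vendor: string,
  partNumber: string,
  fileName: string
): string {
  const segments = [source, mediaDir, vendor, partNumber, fileName].map((segment) =>
    encodeURIComponent(segment)
  );
  return `${CDN_BASE}/${segments.join('/')}`;
}

// Usage: reproduces the first gallery example.
const ctGalleryUrl = mediaUrl('ct', 'gallery', 'newholland', '00100715', '00100715-1.jpg');
```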

MongoDB Schema Changes

Updated Part Interface

import { ObjectId } from 'mongodb';

interface Part {
  _id: ObjectId;
  partNumber: string;
  name: string;
  description?: string;
  brand: ObjectId;
  category: ObjectId;

  // Legacy fields (deprecated, kept for backward compatibility)
  imageUrl?: string;

  // New media arrays - resolved URLs based on priority
  gallery: string[];           // Primary gallery images
  view360: string[];           // 360-degree view frames
  docs: string[];              // PDF documents
  videos: string[];            // Video files

  // Source tracking - stores URLs by source for priority resolution
  mediaSources: {
    ct?: MediaSourceData;
    vendor_direct?: MediaSourceData;
    mycnh?: MediaSourceData;
    vendor_scraped?: MediaSourceData;
    manual?: MediaSourceData;
  };

  // Metadata
  price?: number;
  stock?: number;
  status?: 'active' | 'inactive' | 'discontinued';
  createdAt: Date;
  updatedAt: Date;
}

interface MediaSourceData {
  gallery?: string[];
  view360?: string[];
  docs?: string[];
  videos?: string[];
  lastUpdated?: Date;
  fileCount?: number;
}

Example Document

{
  "_id": ObjectId("..."),
  "partNumber": "00100715",
  "name": "Hydraulic Filter",
  "brand": ObjectId("..."),
  "category": ObjectId("..."),

  // Resolved arrays (highest priority source)
  "gallery": [
    "https://media.cropparts.com/ct/gallery/newholland/00100715/00100715-1.jpg",
    "https://media.cropparts.com/ct/gallery/newholland/00100715/00100715-2.jpg"
  ],
  "view360": [],
  "docs": [
    "https://media.cropparts.com/vendor_direct/docs/newholland/00100715/00100715-spec.pdf"
  ],
  "videos": [],

  // Source tracking
  "mediaSources": {
    "ct": {
      "gallery": [
        "https://media.cropparts.com/ct/gallery/newholland/00100715/00100715-1.jpg",
        "https://media.cropparts.com/ct/gallery/newholland/00100715/00100715-2.jpg"
      ],
      "lastUpdated": ISODate("2024-01-15T10:30:00Z"),
      "fileCount": 2
    },
    "vendor_scraped": {
      "gallery": [
        "https://media.cropparts.com/vendor_scraped/gallery/newholland/00100715/00100715-1.jpg"
      ],
      "lastUpdated": ISODate("2023-11-20T08:00:00Z"),
      "fileCount": 1
    },
    "vendor_direct": {
      "docs": [
        "https://media.cropparts.com/vendor_direct/docs/newholland/00100715/00100715-spec.pdf"
      ],
      "lastUpdated": ISODate("2024-01-10T14:00:00Z"),
      "fileCount": 1
    }
  },

  "createdAt": ISODate("2023-01-01T00:00:00Z"),
  "updatedAt": ISODate("2024-01-15T10:30:00Z")
}

Migration Query Example

// Update part with new media structure
db.parts.updateOne(
  { partNumber: "00100715" },
  {
    $set: {
      "gallery": ["https://media.cropparts.com/ct/gallery/newholland/00100715/00100715-1.jpg"],
      "mediaSources.ct.gallery": ["https://media.cropparts.com/ct/gallery/newholland/00100715/00100715-1.jpg"],
      "mediaSources.ct.lastUpdated": new Date(),
      "mediaSources.ct.fileCount": 1,
      "updatedAt": new Date()
    }
  }
);
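
For a script that issues many such updates, the $set document can be built by a small pure function. The sketch below is an assumption about how that might look, not the contents of the actual update script: buildMediaUpdate is a hypothetical name, the resolved array is passed in by the caller, and fileCount is set to the number of URLs for this one media type, mirroring the query above.

```typescript
// Build the update document for one source and one media type.
// `resolved` is the priority-resolved array to store at the top level.
function buildMediaUpdate(
  source: string,
  mediaType: string,
  urls: string[],
  resolved: string[],
  now: Date
): { $set: Record<string, unknown> } {
  const prefix = `mediaSources.${source}`;
  return {
    $set: {
      [mediaType]: resolved,
      [`${prefix}.${mediaType}`]: urls,
      [`${prefix}.lastUpdated`]: now,
      // Assumption: counts only this media type, as in the query above.
      [`${prefix}.fileCount`]: urls.length,
      updatedAt: now,
    },
  };
}

// Usage: same fields as the hand-written query.
const ctUrls = ['https://media.cropparts.com/ct/gallery/newholland/00100715/00100715-1.jpg'];
const ctUpdate = buildMediaUpdate('ct', 'gallery', ctUrls, ctUrls, new Date());
```

The result can be passed as the second argument to db.parts.updateOne with a partNumber filter.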

Migration Phases

Phase 1: Discovery & Manifest Generation

Duration: 30-45 minutes

  1. Scan existing storage

    gsutil ls -r gs://crop_parts/newholland/images/ > existing_files.txt
  2. Generate migration manifest

    • Parse existing file paths
    • Determine source based on metadata/naming patterns
    • Map to new paths
    • Output JSON manifest
  3. Validate manifest

    • Check for naming conflicts
    • Verify part numbers exist in MongoDB
    • Flag orphaned files

Output: migration_manifest.json
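
The manifest validation step can be sketched as a pure function over manifest entries. The manifest format is not specified in this document, so the ManifestEntry shape (oldPath, newPath, partNumber) and the names validateManifest and ManifestReport are assumptions for illustration only. The list of known part numbers is expected to come from MongoDB.

```typescript
// Hypothetical manifest entry shape; the real format is not defined here.
interface ManifestEntry {
  oldPath: string;
  newPath: string;
  partNumber: string;
}

interface ManifestReport {
  conflicts: string[];      // new paths claimed by more than one old file
  orphans: ManifestEntry[]; // entries whose part number is unknown
}

function validateManifest(entries: ManifestEntry[], knownPartNumbers: string[]): ManifestReport {
  const known = new Set<string>(knownPartNumbers);
  const targetCounts = new Map<string, number>();
  const orphans: ManifestEntry[] = [];

  for (const entry of entries) {
    // Count how many source files map to each destination path.
    targetCounts.set(entry.newPath, (targetCounts.get(entry.newPath) ?? 0) + 1);
    if (!known.has(entry.partNumber)) {
      orphans.push(entry);
    }
  }

  const conflicts: string[] = [];
  targetCounts.forEach((count, newPath) => {
    if (count > 1) {
      conflicts.push(newPath);
    }
  });

  return { conflicts, orphans };
}
```

An empty conflicts list and an empty orphans list would indicate the manifest is ready for the dry run.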

Phase 2: Dry Run

Duration: 30-60 minutes

  1. Simulate migration

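    # Copies into a staging bucket; -n is no-clobber (skips objects already in staging), not a dry-run flag
    gsutil -m cp -n -r gs://crop_parts/newholland/images/* gs://crop_parts_staging/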
  2. Verify file count

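    # The ** wildcard lists objects only; ls -r also prints directory header lines, which inflates the count
    gsutil ls 'gs://crop_parts_staging/**' | wc -l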
  3. Check storage usage

    gsutil du -s gs://crop_parts_staging/
  4. Review sample files for correct placement

Output: Dry run report with any issues

Phase 3: Migration Execution

Duration: 2-4 hours

  1. Create backup

    gsutil -m cp -r gs://crop_parts/newholland/ gs://crop_parts_backup/newholland_$(date +%Y%m%d)/
  2. Execute migration

    # Run migration script with manifest
    node scripts/gcs-migrate.js --manifest migration_manifest.json --execute
  3. Monitor progress

    • Track files processed
    • Log errors
    • Report completion percentage
  4. Verify integrity

    • Compare file counts
    • Spot-check random files
    • Verify file sizes match

Output: Migration execution log
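
The integrity check in step 4 amounts to comparing each migrated file against its original. A minimal sketch follows, assuming object sizes have already been collected into a map keyed by path (for example from a bucket listing). CopiedFile, IntegrityReport, and compareSizes are hypothetical names, and a size match is only a coarse check compared with the MD5 verification in Phase 4.

```typescript
interface CopiedFile {
  oldPath: string;
  newPath: string;
}

interface IntegrityReport {
  missing: string[];      // destination objects that were not found
  sizeMismatch: string[]; // destination objects whose size differs from the source
}

// `sizes` maps an object path to its size in bytes.
function compareSizes(files: CopiedFile[], sizes: Map<string, number>): IntegrityReport {
  const missing: string[] = [];
  const sizeMismatch: string[] = [];

  for (const file of files) {
    const before = sizes.get(file.oldPath);
    const after = sizes.get(file.newPath);
    if (after === undefined) {
      missing.push(file.newPath);
    } else if (before !== after) {
      sizeMismatch.push(file.newPath);
    }
  }

  return { missing, sizeMismatch };
}
```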

Phase 4: Verification

Duration: 30-60 minutes

  1. Structural verification

    # Verify all directories exist
    gsutil ls gs://crop_parts/ct/
    gsutil ls gs://crop_parts/vendor_direct/
    gsutil ls gs://crop_parts/mycnh/
    gsutil ls gs://crop_parts/vendor_scraped/
    gsutil ls gs://crop_parts/manual/
  2. Content verification

    • Sample 100 random files
    • Verify accessibility via CDN
    • Check file integrity (MD5)
  3. URL accessibility test

    curl -I https://media.cropparts.com/ct/gallery/newholland/00100715/00100715-1.jpg

Output: Verification report
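
Drawing the random sample for content verification can be done with a partial Fisher-Yates shuffle, which picks items without replacement and leaves the input untouched. This is a generic sketch rather than part of any existing script; sampleFiles is a hypothetical name, and the random source is injectable so a run can be made repeatable.

```typescript
// Pick up to `count` distinct items uniformly at random.
function sampleFiles<T>(items: T[], count: number, random: () => number = Math.random): T[] {
  const pool = items.slice(); // copy so the caller's array is not reordered
  const n = Math.min(count, pool.length);

  for (let i = 0; i < n; i++) {
    // Choose an index from the not-yet-selected tail and swap it into place.
    const j = i + Math.floor(random() * (pool.length - i));
    const tmp = pool[i];
    pool[i] = pool[j];
    pool[j] = tmp;
  }

  return pool.slice(0, n);
}
```

Calling sampleFiles(allPaths, 100) yields the 100 paths to check over the CDN and against their MD5 hashes.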

Phase 5: MongoDB Update

Duration: 30-60 minutes

  1. Backup collections

    mongodump --db crop --collection parts --out ./backup_$(date +%Y%m%d)
  2. Run update script

    node scripts/update-part-media-urls.js --manifest migration_manifest.json
  3. Verify updates

    • Check sample parts in MongoDB
    • Verify URLs resolve correctly
    • Test admin dashboard display
  4. Update indexes (if needed)

    db.parts.createIndex({ "mediaSources.ct.lastUpdated": 1 });

Output: MongoDB update report

Phase 6: Cleanup

Duration: After 7-day monitoring period

  1. Monitor for issues

    • Track 404 errors in CDN logs
    • Monitor application errors
    • Check user reports
  2. Remove old files (after 7 days)

    gsutil -m rm -r gs://crop_parts/newholland/images/
  3. Remove staging (if used)

    gsutil -m rm -r gs://crop_parts_staging/
  4. Archive backup or remove after 30 days

Output: Cleanup confirmation


Cloudflare CDN Configuration

DNS Setup

CNAME media.cropparts.com -> storage.googleapis.com

Page Rules

Rule 1: Gallery Images

URL Pattern: media.cropparts.com/*/gallery/*
Settings:
  Cache Level: Cache Everything
  Edge Cache TTL: 1 month
  Browser Cache TTL: 1 week
  Polish: Lossy (for WebP conversion)
  Mirage: On

Rule 2: 360-Degree Views

URL Pattern: media.cropparts.com/*/360/*
Settings:
  Cache Level: Cache Everything
  Edge Cache TTL: 1 month
  Browser Cache TTL: 1 week
  Polish: Lossless
  Mirage: On

Rule 3: Documents

URL Pattern: media.cropparts.com/*/docs/*
Settings:
  Cache Level: Cache Everything
  Edge Cache TTL: 1 week
  Browser Cache TTL: 1 day
  Polish: Off

Rule 4: Videos

URL Pattern: media.cropparts.com/*/videos/*
Settings:
  Cache Level: Cache Everything
  Edge Cache TTL: 1 month
  Browser Cache TTL: 1 week
  Polish: Off
  Rocket Loader: Off

Cache TTL Summary

| Media Type | Edge Cache | Browser Cache | Notes |
| --- | --- | --- | --- |
| Gallery | 1 month | 1 week | With image optimization |
| 360 | 1 month | 1 week | Lossless compression |
| Docs | 1 week | 1 day | May update more frequently |
| Videos | 1 month | 1 week | Large files, stable content |
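
If the application or a test harness needs the same TTLs (for example to assert on response headers), the summary can be expressed as data. This is a sketch under stated assumptions: "1 week" is taken as 7 days and "1 month" as 30 days, and CACHE_TTL, CacheTtl, and ttlForPath are hypothetical names. Cloudflare's own settings remain the source of truth.

```typescript
const DAY_SECONDS = 24 * 60 * 60;

interface CacheTtl {
  edgeSeconds: number;
  browserSeconds: number;
}

// Assumption: 1 week = 7 days, 1 month = 30 days.
const CACHE_TTL: Record<string, CacheTtl> = {
  gallery: { edgeSeconds: 30 * DAY_SECONDS, browserSeconds: 7 * DAY_SECONDS },
  '360': { edgeSeconds: 30 * DAY_SECONDS, browserSeconds: 7 * DAY_SECONDS },
  docs: { edgeSeconds: 7 * DAY_SECONDS, browserSeconds: 1 * DAY_SECONDS },
  videos: { edgeSeconds: 30 * DAY_SECONDS, browserSeconds: 7 * DAY_SECONDS },
};

// Look up TTLs for a path of the form /{source}/{media dir}/...
function ttlForPath(path: string): CacheTtl | null {
  const mediaDir = path.split('/')[2];
  return Object.prototype.hasOwnProperty.call(CACHE_TTL, mediaDir) ? CACHE_TTL[mediaDir] : null;
}
```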

Security Settings

Hotlink Protection: On
Allowed Domains:
  - cropparts.com
  - *.cropparts.com
  - localhost:3000
  - localhost:3001

WAF Rules:
  - Block requests without referer from non-allowed domains
  - Rate limit: 100 requests/minute per IP for /docs/
  - Block direct bucket access (gs://)

SSL/TLS:
  - Full (Strict)
  - Always Use HTTPS: On
  - Minimum TLS Version: 1.2

Transform Rules (URL Rewriting)

// Rewrite to GCS backend
(http.host eq "media.cropparts.com") {
  set destination = "https://storage.googleapis.com/crop_parts" + http.request.uri.path
}
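
The rewrite above maps a CDN path onto the bucket path at storage.googleapis.com. The same mapping, written as a plain function, can be handy for debugging a 404 by requesting the origin object directly. toOriginUrl is a hypothetical helper for this sketch; like the rule, it uses only the URL path and drops any query string.

```typescript
const CDN_HOST = 'media.cropparts.com';
const GCS_ORIGIN = 'https://storage.googleapis.com/crop_parts';

// Map a CDN URL to the GCS URL the rewrite rule would target.
// Returns null for URLs on any other host.
function toOriginUrl(cdnUrl: string): string | null {
  const prefix = `https://${CDN_HOST}/`;
  if (cdnUrl.indexOf(prefix) !== 0) {
    return null;
  }
  // Keep the leading slash of the path and strip the query string.
  const path = cdnUrl.slice(prefix.length - 1).split('?')[0];
  return GCS_ORIGIN + path;
}

// Usage
const originUrl = toOriginUrl(
  'https://media.cropparts.com/ct/gallery/newholland/00100715/00100715-1.jpg'
);
```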

Rollback Procedures

Scenario 1: Partial Migration Failure

Symptoms: Some files not migrated, mixed state

Steps:

  1. Stop migration script immediately
  2. Document which files were migrated
  3. Restore from backup for affected files
    gsutil -m cp -r gs://crop_parts_backup/newholland_YYYYMMDD/* gs://crop_parts/newholland/
  4. Revert MongoDB changes
    mongorestore --db crop --collection parts ./backup_YYYYMMDD/crop/parts.bson --drop
  5. Investigate root cause before retry

Scenario 2: CDN Issues

Symptoms: 404 errors, slow loading, broken images

Steps:

  1. Check Cloudflare dashboard for errors
  2. Purge CDN cache
    curl -X POST "https://api.cloudflare.com/client/v4/zones/{zone_id}/purge_cache" \
      -H "Authorization: Bearer {token}" \
      -H "Content-Type: application/json" \
      --data '{"purge_everything":true}'
  3. Verify GCS bucket permissions
  4. Update page rules if needed

Scenario 3: Complete Rollback

Symptoms: Major issues requiring full revert

Steps:

  1. Restore files from backup
    gsutil -m cp -r gs://crop_parts_backup/newholland_YYYYMMDD/* gs://crop_parts/newholland/
  2. Restore MongoDB
    mongorestore --db crop --collection parts ./backup_YYYYMMDD/crop/parts.bson --drop
  3. Update CDN rules to point to old structure
  4. Purge CDN cache
  5. Monitor for 24 hours
  6. Document lessons learned

Rollback Checklist

  • Stop all migration processes
  • Notify team of rollback
  • Restore GCS files from backup
  • Restore MongoDB from backup
  • Purge CDN cache
  • Test sample URLs
  • Verify admin dashboard
  • Verify customer-facing site
  • Monitor error logs
  • Document incident

Success Metrics

Pre-Migration Baseline

Capture these metrics before migration:

  • Total file count in existing storage
  • Total storage size (GB)
  • Average page load time (with images)
  • CDN cache hit ratio
  • 404 error rate for media files
  • Part records with imageUrl field populated

Post-Migration Targets

| Metric | Target | Measurement Method |
| --- | --- | --- |
| File Migration | 100% of files migrated | Compare file counts |
| Zero Data Loss | 0 missing files | MD5 checksum verification |
| URL Resolution | 100% of new URLs work | Automated URL checker |
| MongoDB Update | 100% of parts updated | Query count verification |
| CDN Performance | < 200ms TTFB | Cloudflare analytics |
| Cache Hit Ratio | > 90% after 48 hours | Cloudflare dashboard |
| Error Rate | < 0.1% 404 errors | Application monitoring |
| Page Load Time | No regression | Lighthouse audit |
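
The numeric targets above lend themselves to an automated check. The sketch below evaluates a set of collected metrics against five of the thresholds and returns the names of any that fail. MigrationMetrics and failedTargets are hypothetical names, ratios are expressed as fractions between 0 and 1, and how the metrics are gathered is out of scope here.

```typescript
interface MigrationMetrics {
  filesExpected: number;
  filesMigrated: number;
  missingFiles: number;
  ttfbMs: number;
  cacheHitRatio: number; // fraction, 0 to 1
  notFoundRate: number;  // fraction, 0 to 1
}

// Return the names of the targets that are not met.
function failedTargets(m: MigrationMetrics): string[] {
  const checks: Array<[string, boolean]> = [
    ['File Migration', m.filesMigrated === m.filesExpected],
    ['Zero Data Loss', m.missingFiles === 0],
    ['CDN Performance', m.ttfbMs < 200],
    ['Cache Hit Ratio', m.cacheHitRatio > 0.9],
    ['Error Rate', m.notFoundRate < 0.001],
  ];
  return checks.filter((check) => !check[1]).map((check) => check[0]);
}
```

An empty result means the checked targets pass; URL resolution, MongoDB coverage, and page load time still need their own checks.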

Verification Queries

// Count parts with new media structure
db.parts.countDocuments({ "gallery.0": { $exists: true } });

// Count parts with source tracking
db.parts.countDocuments({ "mediaSources": { $exists: true } });

// Find parts missing media after migration
db.parts.find({
  "imageUrl": { $exists: true, $ne: "" },
  "gallery.0": { $exists: false }
});

// Count by source
db.parts.aggregate([
  { $match: { "mediaSources.ct": { $exists: true } } },
  { $count: "ct_source_count" }
]);

Monitoring Dashboard

After migration, monitor:

  1. Cloudflare Analytics

    • Request volume by path
    • Cache hit ratio
    • Error rates
    • Bandwidth usage
  2. Application Logs

    • 404 errors for media URLs
    • Slow media loading
    • Failed image renders
  3. User Feedback

    • Broken image reports
    • Slow loading complaints
    • Missing documentation reports

Sign-off Criteria

Migration is complete when:

  • All files migrated and verified
  • MongoDB updated for all affected parts
  • CDN serving new URLs correctly
  • Cache hit ratio > 90%
  • Error rate < 0.1%
  • Page load time matches or beats baseline
  • 7-day monitoring period complete with no issues
  • Old files cleaned up
  • Documentation updated
  • Team notified of completion

Appendix

Useful Commands

# List all files in a source directory
gsutil ls -r gs://crop_parts/ct/gallery/

# Count files (the ** wildcard lists objects only, without the directory header lines that ls -r prints)
gsutil ls 'gs://crop_parts/ct/**' | wc -l

# Check storage size
gsutil du -sh gs://crop_parts/ct/

# Copy with parallel processing
gsutil -m cp -r source/ destination/

# Sync directories (only copy new/changed files)
gsutil -m rsync -r source/ destination/

# Set public read access
gsutil -m acl ch -r -u AllUsers:R gs://crop_parts/

# Generate signed URL (for private files)
gsutil signurl -d 1h key.json gs://crop_parts/ct/docs/newholland/ABC12345/ABC12345-manual.pdf

Contact

For questions about this migration:

  • Technical Lead: [Add contact]
  • DevOps: [Add contact]
  • Database Admin: [Add contact]

Document Version: 1.0
Created: 2024-11-20
Last Updated: 2024-11-20
