GCS Media Storage Migration
Executive Summary
This document outlines the migration plan for reorganizing the CROP auto parts media storage in Google Cloud Storage (GCS). The migration restructures approximately 32GB of media assets currently stored in gs://crop_parts/newholland/images/ into a source-based organization system that supports priority-based media selection and CDN optimization.
Current State
- Total Size: ~32GB
- Location: gs://crop_parts/newholland/images/
- Structure: Flat or minimally organized
- Issues: No source tracking, no priority system, difficult to manage updates
Target State
- Location: gs://crop_parts/
- Structure: Source-based directories with standardized naming
- Benefits: Priority-based selection, clear provenance, CDN-optimized paths
Final Storage Structure
gs://crop_parts/
├── vendor_scraped/ # Priority: 50 - Scraped from manufacturer websites
│ ├── gallery/{vendor}/{part_number}/{part_number}-{n}.jpg
│ ├── 360/{vendor}/{part_number}/{part_number}-R{row}_C{col}.jpg
│ ├── docs/{vendor}/{part_number}/{part_number}-{type}.pdf
│ └── videos/{vendor}/{part_number}/{part_number}-{type}.mp4
│
├── ct/ # Priority: 100 - Clinton Tractor photos
│ ├── gallery/{vendor}/{part_number}/{part_number}-{n}.jpg
│ ├── 360/{vendor}/{part_number}/{part_number}-R{row}_C{col}.jpg
│ ├── docs/{vendor}/{part_number}/{part_number}-{type}.pdf
│ └── videos/{vendor}/{part_number}/{part_number}-{type}.mp4
│
├── vendor_direct/ # Priority: 90 - Direct from manufacturer FTP/API
│ ├── gallery/{vendor}/{part_number}/{part_number}-{n}.jpg
│ ├── 360/{vendor}/{part_number}/{part_number}-R{row}_C{col}.jpg
│ ├── docs/{vendor}/{part_number}/{part_number}-{type}.pdf
│ └── videos/{vendor}/{part_number}/{part_number}-{type}.mp4
│
├── mycnh/ # Priority: 70 - MyCNH portal
│ ├── gallery/{vendor}/{part_number}/{part_number}-{n}.jpg
│ ├── 360/{vendor}/{part_number}/{part_number}-R{row}_C{col}.jpg
│ ├── docs/{vendor}/{part_number}/{part_number}-{type}.pdf
│ └── videos/{vendor}/{part_number}/{part_number}-{type}.mp4
│
└── manual/ # Priority: 30 - Manual uploads
    ├── gallery/{vendor}/{part_number}/{part_number}-{n}.jpg
    ├── 360/{vendor}/{part_number}/{part_number}-R{row}_C{col}.jpg
    ├── docs/{vendor}/{part_number}/{part_number}-{type}.pdf
    └── videos/{vendor}/{part_number}/{part_number}-{type}.mp4
Media Sources
| Source | Priority | Description | Quality | Use Case |
|---|---|---|---|---|
| ct | 100 | Clinton Tractor professional photos | Highest | Primary display images, marketing |
| vendor_direct | 90 | Direct from manufacturer via FTP/API | High | Official product images |
| mycnh | 70 | MyCNH portal downloads | Good | CNH brand parts |
| vendor_scraped | 50 | Scraped from manufacturer websites | Good | Fallback images |
| manual | 30 | Manual upload by staff | Variable | Custom/rare parts |
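
The priorities in the table can be kept in one place and the lookup order derived from them, rather than maintained by hand. The following is a minimal sketch; the names `SOURCE_PRIORITY` and `sourcesByPriority` are illustrative assumptions and do not come from the existing codebase.

```typescript
// Priorities copied from the Media Sources table.
// SOURCE_PRIORITY and sourcesByPriority are hypothetical names.
type MediaSource = 'ct' | 'vendor_direct' | 'mycnh' | 'vendor_scraped' | 'manual';

const SOURCE_PRIORITY: Record<MediaSource, number> = {
  ct: 100,
  vendor_direct: 90,
  mycnh: 70,
  vendor_scraped: 50,
  manual: 30,
};

// Return source names ordered from highest to lowest priority.
function sourcesByPriority(): MediaSource[] {
  return (Object.keys(SOURCE_PRIORITY) as MediaSource[]).sort(
    (a, b) => SOURCE_PRIORITY[b] - SOURCE_PRIORITY[a],
  );
}

console.log(sourcesByPriority());
```

Deriving the order this way means that a change to a priority value only has to be made in the map.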
Priority Resolution Logic
When a part has media from multiple sources, the system selects based on priority:
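function selectBestMedia(part: Part, mediaType: 'gallery' | 'view360' | 'docs' | 'videos'): string[] {
  // Sources listed from highest to lowest priority (see the Media Sources table).
  // `as const` keeps the literal types so each entry is a valid key of part.mediaSources.
  const sources = ['ct', 'vendor_direct', 'mycnh', 'vendor_scraped', 'manual'] as const;
  for (const source of sources) {
    const media = part.mediaSources?.[source]?.[mediaType];
    if (media && media.length > 0) {
      return media; // first non-empty source wins
    }
  }
  return []; // no source has media of this type
}
Naming Conventions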
Gallery Images
- Format: {part_number}-{order}.jpg
- Examples:
  - 00100715-1.jpg (first image)
  - 00100715-2.jpg (second image)
  - 00100715-3.jpg (third image)
360-Degree Views
- Format: {part_number}-R{row}_C{col}.jpg
- Row: Zero-padded 2 digits (01-99)
- Column: Zero-padded 2 digits (01-99)
- Examples:
  - 00100816-R01_C01.jpg (row 1, column 1)
  - 00100816-R01_C24.jpg (row 1, column 24)
  - 00100816-R02_C12.jpg (row 2, column 12)
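The frame naming rule above can be expressed as a small generator. This is a sketch only; `pad2` and `frameNames` are hypothetical helper names, and the row-major ordering of the output is an assumption.

```typescript
// Zero-pad a row or column index to two digits, e.g. 1 -> "01".
// The 1-99 range follows the convention stated above.
function pad2(n: number): string {
  if (!Number.isInteger(n) || n < 1 || n > 99) {
    throw new RangeError(`index out of range 1-99: ${n}`);
  }
  return String(n).padStart(2, '0');
}

// List every frame file name for a spin with the given grid size,
// ordered row by row (assumed ordering).
function frameNames(partNumber: string, rows: number, cols: number): string[] {
  const names: string[] = [];
  for (let r = 1; r <= rows; r++) {
    for (let c = 1; c <= cols; c++) {
      names.push(`${partNumber}-R${pad2(r)}_C${pad2(c)}.jpg`);
    }
  }
  return names;
}

// A single-row spin with 24 frames.
const frames = frameNames('00100816', 1, 24);
console.log(frames[0], frames[frames.length - 1]);
```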
Documents
- Format: {part_number}-{type}.pdf
- Types: manual, spec, install, warranty, safety, diagram
- Examples:
  - ABC12345-manual.pdf
  - ABC12345-spec.pdf
  - ABC12345-install.pdf
Videos
- Format: {part_number}-{type}.mp4
- Types: demo, install, overview, repair, comparison
- Examples:
  - DEF67890-demo.mp4
  - DEF67890-install.mp4
  - DEF67890-overview.mp4
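Combined with the storage layout, these file names determine the full object path and public URL of every asset. The sketch below shows one way to assemble them; `objectPath` and `cdnUrl` are hypothetical helpers, and the base URL is the CDN host used throughout this document.

```typescript
type MediaSource = 'ct' | 'vendor_direct' | 'mycnh' | 'vendor_scraped' | 'manual';
type MediaDir = 'gallery' | '360' | 'docs' | 'videos';

const CDN_BASE = 'https://media.cropparts.com';

// Build the path of an object relative to the bucket root:
// {source}/{mediaDir}/{vendor}/{part_number}/{fileName}
function objectPath(
  source: MediaSource,
  dir: MediaDir,
  vendor: string,
  partNumber: string,
  fileName: string,
): string {
  // Every naming convention prefixes the file name with the part number.
  if (!fileName.startsWith(`${partNumber}-`)) {
    throw new Error(`file name ${fileName} does not start with ${partNumber}-`);
  }
  return [source, dir, vendor, partNumber, fileName].join('/');
}

// Build the public CDN URL for the same object.
function cdnUrl(
  source: MediaSource,
  dir: MediaDir,
  vendor: string,
  partNumber: string,
  fileName: string,
): string {
  return `${CDN_BASE}/${objectPath(source, dir, vendor, partNumber, fileName)}`;
}

console.log(cdnUrl('ct', 'gallery', 'newholland', '00100715', '00100715-1.jpg'));
```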
URL Examples
Base CDN URL
https://media.cropparts.com
Gallery Images
# Clinton Tractor photo (Priority 100)
https://media.cropparts.com/ct/gallery/newholland/00100715/00100715-1.jpg
# Vendor scraped (Priority 50)
https://media.cropparts.com/vendor_scraped/gallery/newholland/00100715/00100715-1.jpg
# Manual upload (Priority 30)
https://media.cropparts.com/manual/gallery/caseih/XYZ98765/XYZ98765-1.jpg
360-Degree Views
# Full 360 spin (24 frames)
https://media.cropparts.com/ct/360/newholland/00100816/00100816-R01_C01.jpg
https://media.cropparts.com/ct/360/newholland/00100816/00100816-R01_C02.jpg
...
https://media.cropparts.com/ct/360/newholland/00100816/00100816-R01_C24.jpg
Documents
# Installation manual
https://media.cropparts.com/vendor_direct/docs/newholland/ABC12345/ABC12345-install.pdf
# Specification sheet
https://media.cropparts.com/mycnh/docs/caseih/DEF67890/DEF67890-spec.pdf
Videos
# Product demo
https://media.cropparts.com/ct/videos/newholland/GHI11111/GHI11111-demo.mp4
# Installation guide
https://media.cropparts.com/vendor_direct/videos/kubota/JKL22222/JKL22222-install.mp4
MongoDB Schema Changes
Updated Part Interface
// ObjectId import assumes the official mongodb Node.js driver
import { ObjectId } from 'mongodb';

interface Part {
  _id: ObjectId;
  partNumber: string;
  name: string;
  description?: string;
  brand: ObjectId;
  category: ObjectId;
  // Legacy fields (deprecated, kept for backward compatibility)
  imageUrl?: string;
  // New media arrays - resolved URLs based on priority
  gallery: string[]; // Primary gallery images
  view360: string[]; // 360-degree view frames
  docs: string[]; // PDF documents
  videos: string[]; // Video files
  // Source tracking - stores URLs by source for priority resolution
  mediaSources: {
    ct?: MediaSourceData;
    vendor_direct?: MediaSourceData;
    mycnh?: MediaSourceData;
    vendor_scraped?: MediaSourceData;
    manual?: MediaSourceData;
  };
  // Metadata
  price?: number;
  stock?: number;
  status?: 'active' | 'inactive' | 'discontinued';
  createdAt: Date;
  updatedAt: Date;
}

interface MediaSourceData {
  gallery?: string[];
  view360?: string[];
  docs?: string[];
  videos?: string[];
  lastUpdated?: Date;
  fileCount?: number;
}
Example Document
{
  "_id": ObjectId("..."),
  "partNumber": "00100715",
  "name": "Hydraulic Filter",
  "brand": ObjectId("..."),
  "category": ObjectId("..."),
  // Resolved arrays (highest priority source)
  "gallery": [
    "https://media.cropparts.com/ct/gallery/newholland/00100715/00100715-1.jpg",
    "https://media.cropparts.com/ct/gallery/newholland/00100715/00100715-2.jpg"
  ],
  "view360": [],
  "docs": [
    "https://media.cropparts.com/vendor_direct/docs/newholland/00100715/00100715-spec.pdf"
  ],
  "videos": [],
  // Source tracking
  "mediaSources": {
    "ct": {
      "gallery": [
        "https://media.cropparts.com/ct/gallery/newholland/00100715/00100715-1.jpg",
        "https://media.cropparts.com/ct/gallery/newholland/00100715/00100715-2.jpg"
      ],
      "lastUpdated": ISODate("2024-01-15T10:30:00Z"),
      "fileCount": 2
    },
    "vendor_scraped": {
      "gallery": [
        "https://media.cropparts.com/vendor_scraped/gallery/newholland/00100715/00100715-1.jpg"
      ],
      "lastUpdated": ISODate("2023-11-20T08:00:00Z"),
      "fileCount": 1
    },
    "vendor_direct": {
      "docs": [
        "https://media.cropparts.com/vendor_direct/docs/newholland/00100715/00100715-spec.pdf"
      ],
      "lastUpdated": ISODate("2024-01-10T14:00:00Z"),
      "fileCount": 1
    }
  },
  "createdAt": ISODate("2023-01-01T00:00:00Z"),
  "updatedAt": ISODate("2024-01-15T10:30:00Z")
}
Migration Query Example
// Update part with new media structure
db.parts.updateOne(
  { partNumber: "00100715" },
  {
    $set: {
      "gallery": ["https://media.cropparts.com/ct/gallery/newholland/00100715/00100715-1.jpg"],
      "mediaSources.ct.gallery": ["https://media.cropparts.com/ct/gallery/newholland/00100715/00100715-1.jpg"],
      "mediaSources.ct.lastUpdated": new Date(),
      "mediaSources.ct.fileCount": 1,
      "updatedAt": new Date()
    }
  }
);
Migration Phases
Phase 1: Discovery & Manifest Generation
Duration: 30-45 minutes
- Scan existing storage
  gsutil ls -r gs://crop_parts/newholland/images/ > existing_files.txt
- Generate migration manifest
  - Parse existing file paths
  - Determine source based on metadata/naming patterns
  - Map to new paths
  - Output JSON manifest
- Validate manifest
  - Check for naming conflicts
  - Verify part numbers exist in MongoDB
  - Flag orphaned files
Output: migration_manifest.json
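The mapping and conflict check in this phase might look like the sketch below. It rests on two assumptions that are not confirmed by this document: that legacy objects are named {part_number}-{n}.jpg directly under the old prefix, and that the source for each file has already been determined by the caller. `ManifestEntry`, `toManifestEntry`, and `findConflicts` are hypothetical names.

```typescript
// Hypothetical shape of one manifest entry.
interface ManifestEntry {
  oldPath: string;
  newPath: string;
  partNumber: string;
  source: string;
}

const OLD_PREFIX = 'gs://crop_parts/newholland/images/';
// ASSUMPTION: legacy gallery files are named "<partNumber>-<n>.jpg".
const LEGACY_NAME = /^([A-Za-z0-9]+)-(\d+)\.jpg$/;

// Map one legacy path to its new location, or return null when the
// name is not recognized so the file can be flagged for manual review.
function toManifestEntry(oldPath: string, source: string): ManifestEntry | null {
  if (!oldPath.startsWith(OLD_PREFIX)) return null;
  const match = LEGACY_NAME.exec(oldPath.slice(OLD_PREFIX.length));
  if (!match) return null;
  const partNumber = match[1];
  const order = Number(match[2]);
  const newPath =
    `gs://crop_parts/${source}/gallery/newholland/${partNumber}/${partNumber}-${order}.jpg`;
  return { oldPath, newPath, partNumber, source };
}

// Return every destination path that more than one entry maps to.
function findConflicts(entries: ManifestEntry[]): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const entry of entries) {
    if (seen.has(entry.newPath)) duplicates.add(entry.newPath);
    seen.add(entry.newPath);
  }
  return Array.from(duplicates);
}

const entry = toManifestEntry('gs://crop_parts/newholland/images/00100715-1.jpg', 'vendor_scraped');
console.log(entry?.newPath);
```

Files that return null here correspond to the orphaned or unrecognized files that the validation step is meant to flag.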
Phase 2: Dry Run
Duration: 30-60 minutes
- Simulate migration by copying into a staging bucket (the -n flag skips objects that already exist at the destination; the copy itself is real)
  gsutil -m cp -n -r gs://crop_parts/newholland/images/* gs://crop_parts_staging/
- Verify file count
  gsutil ls -r gs://crop_parts_staging/ | wc -l
- Check storage usage
  gsutil du -s gs://crop_parts_staging/
- Review sample files for correct placement
Output: Dry run report with any issues
Phase 3: Migration Execution
Duration: 2-4 hours
- Create backup
  gsutil -m cp -r gs://crop_parts/newholland/ gs://crop_parts_backup/newholland_$(date +%Y%m%d)/
- Execute migration
  # Run migration script with manifest
  node scripts/gcs-migrate.js --manifest migration_manifest.json --execute
- Monitor progress
  - Track files processed
  - Log errors
  - Report completion percentage
- Verify integrity
  - Compare file counts
  - Spot-check random files
  - Verify file sizes match
Output: Migration execution log
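The integrity checks in this phase can be automated by comparing object listings taken before and after the copy, for example parsed from `gsutil ls -l` output. The sketch below assumes the listings and the planned moves are already loaded into memory; `ObjectInfo`, `Move`, and `checkIntegrity` are hypothetical names.

```typescript
interface ObjectInfo { path: string; size: number; }
interface Move { oldPath: string; newPath: string; }
interface IntegrityReport { missing: string[]; sizeMismatch: string[]; }

// Index a listing by path for constant-time size lookups.
function indexBySize(objects: ObjectInfo[]): Map<string, number> {
  const index = new Map<string, number>();
  for (const o of objects) index.set(o.path, o.size);
  return index;
}

// Report destination objects that are absent or whose size differs
// from the corresponding source object.
function checkIntegrity(moves: Move[], source: ObjectInfo[], dest: ObjectInfo[]): IntegrityReport {
  const srcSize = indexBySize(source);
  const dstSize = indexBySize(dest);
  const report: IntegrityReport = { missing: [], sizeMismatch: [] };
  for (const move of moves) {
    const expected = srcSize.get(move.oldPath);
    const actual = dstSize.get(move.newPath);
    if (actual === undefined) {
      report.missing.push(move.newPath);
    } else if (expected !== undefined && actual !== expected) {
      report.sizeMismatch.push(move.newPath);
    }
  }
  return report;
}

const report = checkIntegrity(
  [{ oldPath: 'old/a.jpg', newPath: 'new/a.jpg' }],
  [{ path: 'old/a.jpg', size: 1024 }],
  [{ path: 'new/a.jpg', size: 1024 }],
);
console.log(report);
```

Matching sizes do not prove identical content, so this complements rather than replaces the MD5 check in Phase 4.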
Phase 4: Verification
Duration: 30-60 minutes
- Structural verification
  # Verify all directories exist
  gsutil ls gs://crop_parts/ct/
  gsutil ls gs://crop_parts/vendor_direct/
  gsutil ls gs://crop_parts/mycnh/
  gsutil ls gs://crop_parts/vendor_scraped/
  gsutil ls gs://crop_parts/manual/
- Content verification
  - Sample 100 random files
  - Verify accessibility via CDN
  - Check file integrity (MD5)
- URL accessibility test
  curl -I https://media.cropparts.com/ct/gallery/newholland/00100715/00100715-1.jpg
Output: Verification report
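Drawing the random sample for content verification could be sketched as follows. The helper names `sample` and `toCdnUrl` are hypothetical, and the URL mapping assumes that CDN paths mirror bucket paths one to one, as in the URL examples earlier in this document.

```typescript
// Pick up to k distinct items using a partial Fisher-Yates shuffle.
// The random source is injectable so a run can be made reproducible.
function sample<T>(items: readonly T[], k: number, rng: () => number = Math.random): T[] {
  const pool = items.slice();
  const n = Math.min(k, pool.length);
  for (let i = 0; i < n; i++) {
    const j = i + Math.floor(rng() * (pool.length - i));
    const tmp = pool[i];
    pool[i] = pool[j];
    pool[j] = tmp;
  }
  return pool.slice(0, n);
}

const BUCKET_PREFIX = 'gs://crop_parts/';
const CDN_BASE = 'https://media.cropparts.com/';

// Translate a bucket path into the URL the CDN is expected to serve.
function toCdnUrl(gsPath: string): string {
  if (!gsPath.startsWith(BUCKET_PREFIX)) {
    throw new Error(`not in the media bucket: ${gsPath}`);
  }
  return CDN_BASE + gsPath.slice(BUCKET_PREFIX.length);
}

const migrated = [
  'gs://crop_parts/ct/gallery/newholland/00100715/00100715-1.jpg',
  'gs://crop_parts/ct/gallery/newholland/00100715/00100715-2.jpg',
];
// Sample size of 100 follows the verification step above.
console.log(sample(migrated, 100).map(toCdnUrl));
```

Each sampled URL can then be requested with the same HEAD check as the curl command above.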
Phase 5: MongoDB Update
Duration: 30-60 minutes
- Backup collections
  mongodump --db crop --collection parts --out ./backup_$(date +%Y%m%d)
- Run update script
  node scripts/update-part-media-urls.js --manifest migration_manifest.json
- Verify updates
  - Check sample parts in MongoDB
  - Verify URLs resolve correctly
  - Test admin dashboard display
- Update indexes (if needed)
  db.parts.createIndex({ "mediaSources.ct.lastUpdated": 1 });
Output: MongoDB update report
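The contents of scripts/update-part-media-urls.js are not shown in this document. As an illustration only, the sketch below shows how the resolved arrays and the $set payload for one part could be derived from its per-source URLs; `resolveMedia` and `buildSetDocument` are hypothetical names.

```typescript
type MediaSource = 'ct' | 'vendor_direct' | 'mycnh' | 'vendor_scraped' | 'manual';
type MediaType = 'gallery' | 'view360' | 'docs' | 'videos';

interface MediaSourceData {
  gallery?: string[];
  view360?: string[];
  docs?: string[];
  videos?: string[];
  lastUpdated?: Date;
  fileCount?: number;
}
type MediaSources = Partial<Record<MediaSource, MediaSourceData>>;

// Highest priority first, matching the Media Sources table.
const ORDER: MediaSource[] = ['ct', 'vendor_direct', 'mycnh', 'vendor_scraped', 'manual'];
const TYPES: MediaType[] = ['gallery', 'view360', 'docs', 'videos'];

// For each media type, take the URLs of the first source that has any.
function resolveMedia(mediaSources: MediaSources): Record<MediaType, string[]> {
  const resolved: Record<MediaType, string[]> = { gallery: [], view360: [], docs: [], videos: [] };
  for (const type of TYPES) {
    for (const source of ORDER) {
      const urls = mediaSources[source]?.[type];
      if (urls && urls.length > 0) {
        resolved[type] = urls;
        break;
      }
    }
  }
  return resolved;
}

// Build the $set payload for one part document.
function buildSetDocument(mediaSources: MediaSources, now: Date = new Date()) {
  return { ...resolveMedia(mediaSources), mediaSources, updatedAt: now };
}

const update = buildSetDocument({
  ct: { gallery: ['https://media.cropparts.com/ct/gallery/newholland/00100715/00100715-1.jpg'], fileCount: 1 },
});
console.log(update.gallery);
```

The returned object could be passed as the $set value of an updateOne call like the one in the migration query example.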
Phase 6: Cleanup
Duration: After 7-day monitoring period
- Monitor for issues
  - Track 404 errors in CDN logs
  - Monitor application errors
  - Check user reports
- Remove old files (after 7 days)
  gsutil -m rm -r gs://crop_parts/newholland/images/
- Remove staging (if used)
  gsutil -m rm -r gs://crop_parts_staging/
- Archive backup or remove after 30 days
Output: Cleanup confirmation
Cloudflare CDN Configuration
DNS Setup
CNAME media.cropparts.com -> storage.googleapis.com
Page Rules
Rule 1: Gallery Images
URL Pattern: media.cropparts.com/*/gallery/*
Settings:
Cache Level: Cache Everything
Edge Cache TTL: 1 month
Browser Cache TTL: 1 week
Polish: Lossy (for WebP conversion)
Mirage: On
Rule 2: 360-Degree Views
URL Pattern: media.cropparts.com/*/360/*
Settings:
Cache Level: Cache Everything
Edge Cache TTL: 1 month
Browser Cache TTL: 1 week
Polish: Lossless
Mirage: On
Rule 3: Documents
URL Pattern: media.cropparts.com/*/docs/*
Settings:
Cache Level: Cache Everything
Edge Cache TTL: 1 week
Browser Cache TTL: 1 day
Polish: Off
Rule 4: Videos
URL Pattern: media.cropparts.com/*/videos/*
Settings:
Cache Level: Cache Everything
Edge Cache TTL: 1 month
Browser Cache TTL: 1 week
Polish: Off
Rocket Loader: Off
Cache TTL Summary
| Media Type | Edge Cache | Browser Cache | Notes |
|---|---|---|---|
| Gallery | 1 month | 1 week | With image optimization |
| 360 | 1 month | 1 week | Lossless compression |
| Docs | 1 week | 1 day | May update more frequently |
| Videos | 1 month | 1 week | Large files, stable content |
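
The same TTLs can be expressed in code, for instance to set Cache-Control metadata at upload time or to assert the expected values in tests. This is a sketch under two assumptions: "1 month" is taken as 30 days, and the media type is always the second path segment. `cachePolicyFor` is a hypothetical helper.

```typescript
interface CachePolicy {
  edgeTtlSeconds: number;
  browserTtlSeconds: number;
}

const DAY = 24 * 60 * 60;

// TTLs from the Cache TTL Summary table; "1 month" is assumed to mean 30 days.
const POLICIES = new Map<string, CachePolicy>([
  ['gallery', { edgeTtlSeconds: 30 * DAY, browserTtlSeconds: 7 * DAY }],
  ['360', { edgeTtlSeconds: 30 * DAY, browserTtlSeconds: 7 * DAY }],
  ['docs', { edgeTtlSeconds: 7 * DAY, browserTtlSeconds: 1 * DAY }],
  ['videos', { edgeTtlSeconds: 30 * DAY, browserTtlSeconds: 7 * DAY }],
]);

// Look up the policy for a URL path of the form /{source}/{mediaType}/...
function cachePolicyFor(path: string): CachePolicy | undefined {
  const segments = path.split('/').filter((s) => s.length > 0);
  const mediaType = segments[1];
  return mediaType === undefined ? undefined : POLICIES.get(mediaType);
}

console.log(cachePolicyFor('/vendor_direct/docs/newholland/ABC12345/ABC12345-install.pdf'));
```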
Security Settings
Hotlink Protection: On
Allowed Domains:
- cropparts.com
- *.cropparts.com
- localhost:3000
- localhost:3001
WAF Rules:
- Block requests without referer from non-allowed domains
- Rate limit: 100 requests/minute per IP for /docs/
- Block direct bucket access (gs://)
SSL/TLS:
- Full (Strict)
- Always Use HTTPS: On
- Minimum TLS Version: 1.2
Transform Rules (URL Rewriting)
// Rewrite to GCS backend
(http.host eq "media.cropparts.com") {
  set destination = "https://storage.googleapis.com/crop_parts" + http.request.uri.path
}
Rollback Procedures
Scenario 1: Partial Migration Failure
Symptoms: Some files not migrated, mixed state
Steps:
- Stop migration script immediately
- Document which files were migrated
- Restore from backup for affected files
  gsutil -m cp -r gs://crop_parts_backup/newholland_YYYYMMDD/* gs://crop_parts/newholland/
- Revert MongoDB changes
  mongorestore --db crop --collection parts ./backup_YYYYMMDD/crop/parts.bson --drop
- Investigate root cause before retry
Scenario 2: CDN Issues
Symptoms: 404 errors, slow loading, broken images
Steps:
- Check Cloudflare dashboard for errors
- Purge CDN cache
  curl -X POST "https://api.cloudflare.com/client/v4/zones/{zone_id}/purge_cache" \
    -H "Authorization: Bearer {token}" \
    -H "Content-Type: application/json" \
    --data '{"purge_everything":true}'
- Verify GCS bucket permissions
- Update page rules if needed
Scenario 3: Complete Rollback
Symptoms: Major issues requiring full revert
Steps:
- Restore files from backup
  gsutil -m cp -r gs://crop_parts_backup/newholland_YYYYMMDD/* gs://crop_parts/newholland/
- Restore MongoDB
  mongorestore --db crop --collection parts ./backup_YYYYMMDD/crop/parts.bson --drop
- Update CDN rules to point to old structure
- Purge CDN cache
- Monitor for 24 hours
- Document lessons learned
Rollback Checklist
- Stop all migration processes
- Notify team of rollback
- Restore GCS files from backup
- Restore MongoDB from backup
- Purge CDN cache
- Test sample URLs
- Verify admin dashboard
- Verify customer-facing site
- Monitor error logs
- Document incident
Success Metrics
Pre-Migration Baseline
Capture these metrics before migration:
- Total file count in existing storage
- Total storage size (GB)
- Average page load time (with images)
- CDN cache hit ratio
- 404 error rate for media files
- Part records with the imageUrl field populated
Post-Migration Targets
| Metric | Target | Measurement Method |
|---|---|---|
| File Migration | 100% of files migrated | Compare file counts |
| Zero Data Loss | 0 missing files | MD5 checksum verification |
| URL Resolution | 100% of new URLs work | Automated URL checker |
| MongoDB Update | 100% of parts updated | Query count verification |
| CDN Performance | < 200ms TTFB | Cloudflare analytics |
| Cache Hit Ratio | > 90% after 48 hours | Cloudflare dashboard |
| Error Rate | < 0.1% 404 errors | Application monitoring |
| Page Load Time | No regression | Lighthouse audit |
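
The targets in this table lend themselves to an automated pass/fail check once the measurements are collected. The sketch below assumes the metrics have been gathered into a single object; the `MigrationMetrics` shape and `failedTargets` are hypothetical, and ratios are expressed as fractions between 0 and 1.

```typescript
// Hypothetical container for the measurements listed in the table.
interface MigrationMetrics {
  sourceFileCount: number;
  migratedFileCount: number;
  missingFiles: number;
  urlsChecked: number;
  urlsOk: number;
  partsExpected: number;
  partsUpdated: number;
  ttfbMs: number;
  cacheHitRatio: number; // fraction, 0..1
  notFoundRate: number; // fraction, 0..1
  baselinePageLoadMs: number;
  pageLoadMs: number;
}

// Return the names of all targets that are not met.
function failedTargets(m: MigrationMetrics): string[] {
  const checks: Array<[string, boolean]> = [
    ['File Migration', m.migratedFileCount === m.sourceFileCount],
    ['Zero Data Loss', m.missingFiles === 0],
    ['URL Resolution', m.urlsOk === m.urlsChecked],
    ['MongoDB Update', m.partsUpdated === m.partsExpected],
    ['CDN Performance', m.ttfbMs < 200],
    ['Cache Hit Ratio', m.cacheHitRatio > 0.9],
    ['Error Rate', m.notFoundRate < 0.001],
    ['Page Load Time', m.pageLoadMs <= m.baselinePageLoadMs],
  ];
  return checks.filter(([, ok]) => !ok).map(([name]) => name);
}

// Illustrative values only, not measurements from this migration.
console.log(failedTargets({
  sourceFileCount: 1000, migratedFileCount: 1000, missingFiles: 0,
  urlsChecked: 100, urlsOk: 100, partsExpected: 400, partsUpdated: 400,
  ttfbMs: 150, cacheHitRatio: 0.95, notFoundRate: 0.0005,
  baselinePageLoadMs: 1800, pageLoadMs: 1700,
}));
```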
Verification Queries
// Count parts with new media structure
db.parts.countDocuments({ "gallery.0": { $exists: true } });

// Count parts with source tracking
db.parts.countDocuments({ "mediaSources": { $exists: true } });

// Find parts missing media after migration
db.parts.find({
  "imageUrl": { $exists: true, $ne: "" },
  "gallery.0": { $exists: false }
});

// Count by source
db.parts.aggregate([
  { $match: { "mediaSources.ct": { $exists: true } } },
  { $count: "ct_source_count" }
]);
Monitoring Dashboard
After migration, monitor:
- Cloudflare Analytics
  - Request volume by path
  - Cache hit ratio
  - Error rates
  - Bandwidth usage
- Application Logs
  - 404 errors for media URLs
  - Slow media loading
  - Failed image renders
- User Feedback
  - Broken image reports
  - Slow loading complaints
  - Missing documentation reports
Sign-off Criteria
Migration is complete when:
- All files migrated and verified
- MongoDB updated for all affected parts
- CDN serving new URLs correctly
- Cache hit ratio > 90%
- Error rate < 0.1%
- Page load time matches or beats baseline
- 7-day monitoring period complete with no issues
- Old files cleaned up
- Documentation updated
- Team notified of completion
Appendix
Useful Commands
# List all files in a source directory
gsutil ls -r gs://crop_parts/ct/gallery/
# Count files
gsutil ls -r gs://crop_parts/ct/ | wc -l
# Check storage size
gsutil du -sh gs://crop_parts/ct/
# Copy with parallel processing
gsutil -m cp -r source/ destination/
# Sync directories (only copy new/changed files)
gsutil -m rsync -r source/ destination/
# Set public read access
gsutil -m acl ch -r -u AllUsers:R gs://crop_parts/
# Generate signed URL (for private files)
gsutil signurl -d 1h key.json gs://crop_parts/docs/part.pdf
Related Documentation
Contact
For questions about this migration:
- Technical Lead: [Add contact]
- DevOps: [Add contact]
- Database Admin: [Add contact]
Document Version: 1.0
Created: 2024-11-20
Last Updated: 2024-11-20