RAG Pipelines
Queue-based PDF processing service with RabbitMQ workers for parts extraction.
Repository: CT-CROP/CROP-RAG-Pipelines
Last updated: 2026-02-18
Last synced to docs: 2026-03-10
Queue-based PDF processing service for agricultural equipment manuals.
Features
- PDF Parsing: Extract text, tables, part numbers (13+ patterns), images
- Queue Processing: RabbitMQ for reliable task distribution
- Job Tracking: Redis with 7-day TTL
- GCS Storage: Save results to Google Cloud Storage
- Scalable: 10+ workers with automatic load balancing
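The job-tracking feature can be pictured with a short sketch. This is an illustration under stated assumptions, not the service's actual code: the key layout `job:{batch_id}`, the `record_job_status` helper, and the JSON field names are all hypothetical. The helper only needs a client exposing `set(name, value, ex=seconds)`, which matches the redis-py `Redis.set` signature; an in-memory stand-in is used so the example runs without a Redis server.

```python
import json
from datetime import timedelta

# 7-day TTL from the feature list, expressed in seconds.
JOB_TTL_SECONDS = int(timedelta(days=7).total_seconds())


class InMemoryStore:
    """Stand-in for a Redis client; records the value and TTL it was given."""

    def __init__(self):
        self.data = {}

    def set(self, name, value, ex=None):
        self.data[name] = (value, ex)
        return True


def record_job_status(client, batch_id, status, processed=0, total=0):
    """Write a job record that expires after JOB_TTL_SECONDS.

    The key format and payload fields are assumptions for illustration.
    """
    key = f"job:{batch_id}"
    payload = json.dumps(
        {
            "batch_id": batch_id,
            "status": status,
            "processed": processed,
            "total": total,
        }
    )
    client.set(key, payload, ex=JOB_TTL_SECONDS)
    return key


store = InMemoryStore()
key = record_job_status(store, "batch-001", "processing", processed=3, total=10)
value, ttl = store.data[key]
print(key, ttl)
```

With a real deployment, a redis-py client would presumably be passed in place of `InMemoryStore()`; letting Redis expire the key means old job records disappear without a separate cleanup task.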
Architecture
Client → API (8080) → RabbitMQ → Batch Worker → RabbitMQ → PDF Workers (×10) → GCS
↓ ↓
Redis (job status)    Redis (progress)
Quick Start
docker-compose up --build
# API: http://localhost:8080
# RabbitMQ UI: http://localhost:15672 (admin/admin)
API Endpoints
POST /api/upload-pdf # Upload single PDF
POST /api/process-bucket-pdfs # Batch process from GCS
GET /api/jobs/{batch_id} # Job status
GET /api/jobs # List all jobs
GET /health # Health check
Tech Stack
FastAPI, RabbitMQ, Redis, PyMuPDF, Pillow, Google Cloud Storage, Docker Compose.
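The read-only endpoints listed under API Endpoints can be exercised from Python's standard library. The sketch below only builds the requests and does not send them; the helper names `job_status_request` and `health_request` are hypothetical, and the base URL is the local address from the Quick Start. Request bodies for the two POST endpoints and the response schema are not documented here, so they are left out.

```python
from urllib.parse import quote, urljoin
from urllib.request import Request

BASE_URL = "http://localhost:8080"  # API address from the Quick Start


def job_status_request(batch_id, base_url=BASE_URL):
    """Build (but do not send) a GET request for /api/jobs/{batch_id}."""
    # Percent-encode the id so unusual characters cannot alter the path.
    path = f"/api/jobs/{quote(batch_id, safe='')}"
    return Request(urljoin(base_url, path), method="GET")


def health_request(base_url=BASE_URL):
    """Build (but do not send) a GET request for /health."""
    return Request(urljoin(base_url, "/health"), method="GET")


req = job_status_request("batch-001")
print(req.get_method(), req.full_url)
```

Once the stack is running, passing either request to `urllib.request.urlopen` would send it to the API.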
Related Documentation
- RAG Pipelines docs — overview and architecture