RAG Pipelines

Repository: CT-CROP/CROP-RAG-Pipelines
Last updated: 2026-02-18
Last synced to docs: 2026-03-10

Queue-based PDF processing service for agricultural equipment manuals.

Features

PDF Parsing: Extract text, tables, part numbers (13+ patterns), images
Queue Processing: RabbitMQ for reliable task distribution
Job Tracking: Redis with 7-day TTL
GCS Storage: Save results to Google Cloud Storage
Scalable: 10+ workers with automatic load balancing
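
The 7-day TTL on job records can be sketched as follows. This is a minimal illustration, not the repository's code: the key format `job:{batch_id}`, the record fields, and the `save_job_status` helper are assumptions. The `client` argument is expected to be a redis-py style client exposing `setex(name, time, value)`; a small in-memory stand-in is used here so the example runs without a Redis server.

```python
import json
from datetime import timedelta

# 7-day retention for job records, expressed in seconds.
JOB_TTL_SECONDS = int(timedelta(days=7).total_seconds())


def job_key(batch_id: str) -> str:
    # Hypothetical key layout; the real service may use a different scheme.
    return f"job:{batch_id}"


def save_job_status(client, batch_id: str, status: str, processed: int, total: int) -> None:
    """Store a job record that expires after JOB_TTL_SECONDS."""
    record = {
        "batch_id": batch_id,
        "status": status,
        "processed": processed,
        "total": total,
    }
    client.setex(job_key(batch_id), JOB_TTL_SECONDS, json.dumps(record))


class FakeRedis:
    """In-memory stand-in for a Redis client; it only records values and TTLs."""

    def __init__(self):
        self.store = {}

    def setex(self, name, time, value):
        self.store[name] = (time, value)


client = FakeRedis()
save_job_status(client, "batch-001", "processing", processed=3, total=10)
ttl, payload = client.store["job:batch-001"]
print(ttl, json.loads(payload)["status"])
```

Writing the TTL together with the value in one `setex` call means a record can never be left in Redis without an expiry.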

Architecture

Client → API (8080) → RabbitMQ → Batch Worker → RabbitMQ → PDF Workers (×10) → GCS
                          ↓                                        ↓
                       Redis (job status)                   Redis (progress)
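
One way to read the diagram: the batch worker takes a single batch message and republishes one message per PDF for the PDF workers. The sketch below shows only that fan-out step. The message schema (`batch_id`, `bucket`, `blob`, `index`, `total`) and the `fan_out` helper are hypothetical, and no RabbitMQ connection is made; in a real deployment each payload would be published to a queue.

```python
import json
from typing import Iterator, List


def fan_out(batch_id: str, bucket: str, blob_names: List[str]) -> Iterator[bytes]:
    """Yield one JSON-encoded task per PDF in the batch (hypothetical schema)."""
    # Ignore anything in the listing that is not a PDF.
    pdfs = [name for name in blob_names if name.lower().endswith(".pdf")]
    for index, blob in enumerate(pdfs, start=1):
        task = {
            "batch_id": batch_id,
            "bucket": bucket,
            "blob": blob,
            "index": index,
            "total": len(pdfs),
        }
        yield json.dumps(task).encode("utf-8")


messages = list(fan_out("batch-001", "manuals", ["a.pdf", "notes.txt", "B.PDF"]))
for body in messages:
    print(body.decode("utf-8"))
```

Carrying `index` and `total` in every task would let a PDF worker update batch progress in Redis without looking up the original batch message.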

Quick Start

docker-compose up --build
# API: http://localhost:8080
# RabbitMQ UI: http://localhost:15672 (admin/admin)
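
After the containers start, the API may take a few seconds to accept requests. A small readiness check against `GET /health` could look like the sketch below. The `is_healthy` and `wait_for_health` helpers, their timeout values, and the assumption that a healthy service answers with HTTP 200 are illustrative and not taken from the repository.

```python
import time
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200, False on any connection error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and non-2xx HTTP errors.
        return False


def wait_for_health(url: str, attempts: int = 30, delay: float = 1.0, probe=is_healthy) -> bool:
    """Poll the health endpoint until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_for_health("http://localhost:8080/health")` returns `True` once the API responds and `False` if it never does within the allotted attempts. The `probe` parameter exists so the polling loop can be exercised without a running server.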

API Endpoints

POST /api/upload-pdf                    # Upload single PDF
POST /api/process-bucket-pdfs           # Batch process from GCS
GET /api/jobs/{batch_id}                # Job status
GET /api/jobs                           # List all jobs
GET /health                             # Health check
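
The read-only endpoints can be called with nothing but the standard library. The following is a minimal client sketch, assuming the API returns JSON (FastAPI's default) and is reachable at the Quick Start address; the helper names are hypothetical and the shape of the response is not documented here, so the code returns whatever JSON the server sends.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8080"  # address from the Quick Start section


def job_url(batch_id: str, base_url: str = BASE_URL) -> str:
    """Build the status URL for one batch, escaping the ID for use in a path."""
    return f"{base_url}/api/jobs/{urllib.parse.quote(batch_id, safe='')}"


def get_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the body as JSON (assumes the API returns JSON)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def get_job(batch_id: str, fetch=get_json):
    """Fetch the status document for one batch via GET /api/jobs/{batch_id}."""
    return fetch(job_url(batch_id))


def list_jobs(fetch=get_json):
    """Fetch all jobs via GET /api/jobs."""
    return fetch(f"{BASE_URL}/api/jobs")
```

The two POST endpoints are left out on purpose: their request bodies and field names are not listed above, so any example would be guesswork.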

Tech Stack

FastAPI, RabbitMQ, Redis, PyMuPDF, Pillow, Google Cloud Storage, Docker Compose.
