PDF Parser Service

AI-powered PDF parsing and parts extraction system for agricultural equipment manuals.

Repository: CT-CROP/CROP-pdf-parser-service
Last updated: 2026-02-18
Last synced to docs: 2026-03-10

A microservices-based system that parses PDF manuals and provides AI-powered parts search using retrieval-augmented generation (RAG).

Key Features

  • Semantic search with natural language queries
  • PDF parsing with automatic extraction of tables, schemas, and images
  • AI Agents with LangChain/LangGraph and LLaMA 3.1 8B
  • CLIP-based visual search for image similarity
  • Weaviate vector database for RAG retrieval
  • React chat interface with interactive PDF viewer

Services

Service            Port   Description
PDF Parser         8080   FastAPI service for parsing PDF manuals
Data Preparation   8004   MLflow-based pipeline for data processing
Weaviate Service   8005   Vector database service with BGE-Large embeddings
AI Service         8001   LangChain/LangGraph agents for parts lookup
CLIP Service       8002   Image vectorization (GPU required)
Barcode Service    8003   Barcode/QR code detection
Frontend           3000   React/TypeScript chat interface
MLflow             5000   Pipeline monitoring and experiment tracking
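
The port assignments above can be kept in one place when scripting against a local deployment. The following is a minimal sketch, not part of the repository: the dictionary keys, the `base_url` and `is_listening` helpers, and the `localhost` default are all assumptions made for illustration. Only the port numbers come from the table.

```python
import socket

# Port numbers are taken from the Services table.
# The key names are hypothetical labels, not the project's service names.
SERVICE_PORTS = {
    "pdf-parser": 8080,
    "data-preparation": 8004,
    "weaviate-service": 8005,
    "ai-service": 8001,
    "clip-service": 8002,
    "barcode-service": 8003,
    "frontend": 3000,
    "mlflow": 5000,
}


def base_url(service: str, host: str = "localhost") -> str:
    """Build the HTTP base URL for a service (host is an assumption)."""
    return f"http://{host}:{SERVICE_PORTS[service]}"


def is_listening(service: str, host: str = "localhost", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the service's port succeeds."""
    try:
        with socket.create_connection((host, SERVICE_PORTS[service]), timeout=timeout):
            return True
    except OSError:
        return False
```

After starting the stack, a loop such as `for name in SERVICE_PORTS: print(name, is_listening(name))` gives a quick view of which ports accept connections. An open port only shows that something is listening there, not that the service is healthy.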

Quick Start

cp env.deploy.template .env.deploy
# Edit .env.deploy with your configuration
docker-compose up -d

API Examples

# Parse PDF
curl -X POST "http://localhost:8080/parse" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@manual.pdf"

# Search Parts
curl -X POST "http://localhost:8001/query" \
  -H "Content-Type: application/json" \
  -d '{"query": "Find oil filter for combine"}'
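
The parts search request above can also be issued from Python using only the standard library. This is a rough sketch under stated assumptions: the URL and JSON body mirror the curl example, while the function names `build_query_request` and `search_parts` are hypothetical. The response schema is not documented here, so the body is returned as raw text.

```python
import json
import urllib.request

# Endpoint taken from the curl example; adjust the host for your deployment.
AI_SERVICE_URL = "http://localhost:8001/query"


def build_query_request(query: str, url: str = AI_SERVICE_URL) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search_parts(query: str, timeout: float = 30.0) -> str:
    """Send the query and return the raw response body as text.

    The response format is an unknown here, so no parsing is attempted.
    """
    with urllib.request.urlopen(build_query_request(query), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the AI Service running, `print(search_parts("Find oil filter for combine"))` sends the same query as the curl command. Building the request separately from sending it makes the payload easy to inspect without a live service.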

Tech Stack

Python 3.11+, FastAPI, LangChain/LangGraph, PyMuPDF, Weaviate, CLIP, Docker Compose, GCP Cloud Run.
