AI-powered PDF parsing and parts extraction system for agricultural equipment manuals.

PDF Parser Service

Repository: CT-CROP/CROP-pdf-parser-service
Last updated: 2026-02-18
Last synced to docs: 2026-03-10

A microservices-based system that parses PDF manuals and provides AI-powered parts search using RAG (Retrieval-Augmented Generation).

Key Features

Semantic search with natural language queries
PDF parsing with automatic extraction of tables, schemas, and images
AI Agents with LangChain/LangGraph and LLaMA 3.1 8B
CLIP-based visual search for image similarity
Weaviate vector database for RAG retrieval
React chat interface with interactive PDF viewer

Services

Service	Port	Description
PDF Parser	8080	FastAPI service for parsing PDF manuals
Data Preparation	8004	MLflow-based pipeline for data processing
Weaviate Service	8005	Vector database service with BGE-Large embeddings
AI Service	8001	LangChain/LangGraph agents for parts lookup
CLIP Service	8002	Image vectorization (GPU required)
Barcode Service	8003	Barcode/QR code detection
Frontend	3000	React/TypeScript chat interface
MLflow	5000	Pipeline monitoring and experiment tracking
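
For scripts that talk to several of these services, the port table can be captured as a small lookup. The sketch below is illustrative only: the port numbers come from the table above, while the `localhost` default, the helper names (`base_url`, `is_listening`, `report`), and the idea of probing with a plain TCP connect are assumptions, not part of the repository.

```python
import socket

# Ports copied from the Services table. "localhost" is an assumption that
# matches the curl examples in this page, not a documented deployment host.
SERVICE_PORTS = {
    "PDF Parser": 8080,
    "Data Preparation": 8004,
    "Weaviate Service": 8005,
    "AI Service": 8001,
    "CLIP Service": 8002,
    "Barcode Service": 8003,
    "Frontend": 3000,
    "MLflow": 5000,
}


def base_url(service: str, host: str = "localhost") -> str:
    """Return the HTTP base URL for a service named in the table."""
    return f"http://{host}:{SERVICE_PORTS[service]}"


def is_listening(port: int, host: str = "localhost", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


def report(host: str = "localhost") -> dict:
    """Map each service name to whether its port currently accepts connections."""
    return {name: is_listening(port, host) for name, port in SERVICE_PORTS.items()}
```

Calling `report()` after startup gives a quick view of which containers are accepting connections. An open port only shows that something is listening there; it does not confirm that the service behind it is healthy.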

Quick Start

cp env.deploy.template .env.deploy
# Edit .env.deploy with your configuration
docker-compose up -d

API Examples

# Parse PDF
curl -X POST "http://localhost:8080/parse" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@manual.pdf"

# Search Parts
curl -X POST "http://localhost:8001/query" \
  -H "Content-Type: application/json" \
  -d '{"query": "Find oil filter for combine"}'
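
The same two calls can be made from Python. The following is a minimal sketch using only the standard library; the URLs, the `file` form field, and the `query` JSON key are taken from the curl examples above, while the function names and the `application/pdf` part type are assumptions. The response schema is not described on this page, so `send` returns the raw bytes rather than guessing at a structure.

```python
import json
import uuid
import urllib.request

# Endpoints taken from the curl examples; adjust the host for your deployment.
PARSER_URL = "http://localhost:8080/parse"
QUERY_URL = "http://localhost:8001/query"


def build_query_request(query: str, url: str = QUERY_URL) -> urllib.request.Request:
    """Build the JSON POST equivalent to the 'Search Parts' curl call."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_parse_request(
    filename: str, pdf_bytes: bytes, url: str = PARSER_URL
) -> urllib.request.Request:
    """Build the multipart/form-data POST equivalent to the 'Parse PDF' curl call."""
    boundary = uuid.uuid4().hex  # random delimiter between multipart parts
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url,
        data=head + pdf_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> bytes:
    """Send a prepared request and return the undecoded response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

A typical call would be `send(build_query_request("Find oil filter for combine"))` once the AI Service is running. Keeping request construction separate from sending makes it possible to inspect the payloads without a live deployment.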

Tech Stack

Python 3.11+, FastAPI, LangChain/LangGraph, PyMuPDF, Weaviate, CLIP, Docker Compose, GCP Cloud Run.

Related Documentation

PDF Parser docs: detailed architecture and API documentation
