PDF Parser Service
AI-powered PDF parsing and parts extraction system for agricultural equipment manuals.
- Repository: CT-CROP/CROP-pdf-parser-service
- Last updated: 2026-02-18
- Last synced to docs: 2026-03-10
A complete microservices-based system that parses PDF manuals and provides AI-powered parts search using RAG (Retrieval-Augmented Generation).
Key Features
- Semantic search with natural language queries
- PDF parsing with automatic extraction of tables, schemas, and images
- AI Agents with LangChain/LangGraph and LLaMA 3.1 8B
- CLIP-based visual search for image similarity
- Weaviate vector database for RAG retrieval
- React chat interface with interactive PDF viewer
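
To make the semantic search feature above more concrete, here is a minimal sketch of vector retrieval by cosine similarity. It is an illustration of the general idea only, not the service's implementation: the real system delegates storage and search to Weaviate, and the three-dimensional vectors, the `INDEX` contents, and the function names below are all made up for this example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the labels of the k entries most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), label) for label, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [label for _, label in scored[:k]]


# Hypothetical toy embeddings; real embeddings have hundreds of dimensions.
INDEX = {
    "oil filter": [0.9, 0.1, 0.0],
    "air filter": [0.7, 0.3, 0.1],
    "drive belt": [0.0, 0.2, 0.9],
}

if __name__ == "__main__":
    print(top_k([1.0, 0.0, 0.0], INDEX))
```

In a RAG setup, the retrieved entries would then be passed to the language model as context for answering the user's question.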
Services
| Service | Port | Description |
|---|---|---|
| PDF Parser | 8080 | FastAPI service for parsing PDF manuals |
| Data Preparation | 8004 | MLflow-based pipeline for data processing |
| Weaviate Service | 8005 | Vector database service with BGE-Large embeddings |
| AI Service | 8001 | LangChain/LangGraph agents for parts lookup |
| CLIP Service | 8002 | Image vectorization (GPU required) |
| Barcode Service | 8003 | Barcode/QR code detection |
| Frontend | 3000 | React/TypeScript chat interface |
| MLflow | 5000 | Pipeline monitoring and experiment tracking |
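
When scripting against several of these services, it can help to keep the port mapping in one place. The sketch below copies the ports from the table. The service keys, the `localhost` default, and the helper names are assumptions made for this example and do not come from the repository.

```python
import socket

# Ports copied from the Services table; the dictionary keys are hypothetical names.
SERVICE_PORTS = {
    "pdf-parser": 8080,
    "data-preparation": 8004,
    "weaviate-service": 8005,
    "ai-service": 8001,
    "clip-service": 8002,
    "barcode-service": 8003,
    "frontend": 3000,
    "mlflow": 5000,
}


def base_url(service: str, host: str = "localhost") -> str:
    """Return the HTTP base URL for a service listed in the table."""
    try:
        port = SERVICE_PORTS[service]
    except KeyError:
        raise ValueError(f"unknown service: {service!r}") from None
    return f"http://{host}:{port}"


def is_listening(service: str, host: str = "localhost", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the service's port succeeds."""
    try:
        with socket.create_connection((host, SERVICE_PORTS[service]), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name in SERVICE_PORTS:
        status = "up" if is_listening(name) else "down"
        print(f"{name:18} {base_url(name):26} {status}")
```

An open port only shows that something is listening there. It does not confirm that the service behind it is healthy.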
Quick Start
cp env.deploy.template .env.deploy
# Edit .env.deploy with your configuration
docker-compose up -d

API Examples
# Parse PDF
curl -X POST "http://localhost:8080/parse" \
-H "Content-Type: multipart/form-data" \
-F "file=@manual.pdf"
# Search Parts
curl -X POST "http://localhost:8001/query" \
-H "Content-Type: application/json" \
-d '{"query": "Find oil filter for combine"}'

Tech Stack
Python 3.11+, FastAPI, LangChain/LangGraph, PyMuPDF, Weaviate, CLIP, Docker Compose, GCP Cloud Run.
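
Since the stack is Python-based, the two curl calls under API Examples can also be issued from Python. The following is a minimal standard-library sketch that mirrors those calls. The endpoints and the `file` and `query` field names are taken from the curl examples. The function names are hypothetical, and because the response format is not documented here, `send` returns the raw response bytes.

```python
import json
import urllib.request
import uuid

PARSER_URL = "http://localhost:8080/parse"  # from the "Parse PDF" curl example
AI_URL = "http://localhost:8001/query"      # from the "Search Parts" curl example


def build_query_request(query: str, url: str = AI_URL) -> urllib.request.Request:
    """Build the JSON POST request for a natural-language parts query."""
    body = json.dumps({"query": query}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def build_parse_request(
    pdf_bytes: bytes, filename: str = "manual.pdf", url: str = PARSER_URL
) -> urllib.request.Request:
    """Build a multipart/form-data POST that uploads the PDF as the "file" field."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(
        url, data=head + pdf_bytes + tail, headers=headers, method="POST"
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> bytes:
    """Send a prepared request and return the raw response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Requires the services from Quick Start to be running locally.
    print(send(build_query_request("Find oil filter for combine")))
```

Building the request separately from sending it makes the payload easy to inspect or test without a running service.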
Related Documentation
- PDF Parser docs — detailed architecture and API documentation