AI Service - McHale Parts Co-Pilot
AI service with LangChain/LangGraph agents for parts lookup and assistance.
Agent Identity
McHale Vendor Agent Specialist - A professional parts search co-pilot for McHale/Clinton Tractor.
Personality: Professional, knowledgeable, helpful, and proactive parts specialist.
Core Function: Act as a co-pilot for parts search, always offering store links when part numbers are found.
See AGENT_PERSONALITY.md for complete personality and role definition.
Architecture
- Main Agent (McHale Co-Pilot): Professional parts search assistant with defined personality and business rules
- Expert Agent: Expert on schemas and tables for specialized technical queries
- RAG Pipeline: Retrieval-Augmented Generation with Weaviate vector store
- LLM: LLaMA 3.1 8B (deployed with vLLM on GCP)
- Embeddings: BGE-Large (BAAI/bge-large-en-v1.5) - high-quality embeddings for RAG
- Alternative: `thenlper/gte-large` or `text-embedding-3-large`
- Important: Separate embedding model; LLaMA is NOT used for embeddings (see the sketch below)
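The embedding model is loaded separately from the LLaMA endpoint. A minimal sketch of how the BGE-Large embedder might be initialized (the names and defaults follow the environment variables described in Setup; the service's actual loading code may differ):

```python
import os
from sentence_transformers import SentenceTransformer

# Embeddings come from a dedicated model, not from LLaMA.
model_name = os.getenv("EMBEDDING_MODEL", "BAAI/bge-large-en-v1.5")
device = os.getenv("EMBEDDING_DEVICE", "cpu")  # "cuda" if a GPU is available

embedder = SentenceTransformer(model_name, device=device)

# normalize_embeddings=True makes cosine similarity a plain dot product
vectors = embedder.encode(
    ["Part Number: 10\nDescription: WASHER SPRING 10MM H.D Z/P"],
    normalize_embeddings=True,
)
print(vectors.shape)  # (1, 1024) for bge-large-en-v1.5
```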
Business Rules
- Store Link Rule: ALWAYS ask users if they want a store link when a part number is found
- Consistency Rule: Use consistent response format defined by business
- Co-Pilot Function: Act as a search co-pilot that assists in parts discovery
- Professionalism: Maintain professional, helpful tone in all responses
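As an illustration of the store link rule, a link and confirmation question can be derived from a found part number like this (a sketch only; the URL pattern comes from the /query examples below, and the helper names are hypothetical):

```python
# Illustrative sketch of the store-link rule; helper names are hypothetical.
STORE_BASE_URL = "https://clintontractor.com/parts/productpage"

def build_store_link(part_number: str) -> str:
    """Build a store link for a found part number."""
    return f"{STORE_BASE_URL}/{part_number}"

def build_store_link_question(part_number: str) -> str:
    """Always offer the store link; the user confirms before it is included."""
    return (
        f"Would you like a link to view this part on "
        f"clintontractor.com/parts/productpage/{part_number}?"
    )
```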
Features
- RAG-based parts search in PDF manuals
- Structured JSON responses with part numbers, descriptions, coordinates
- Store link integration (with user confirmation)
- Expert agent for schema/table queries
- LangGraph workflow for complex agent reasoning
- Fine-tuning support (optional, see Fine-Tuning + RAG section below)
Setup
1. Install dependencies:
```bash
cd ai_service
pip install -r requirements.txt
```

2. Set environment variables:

```bash
# Weaviate configuration
export WEAVIATE_URL="http://localhost:8080" # or your Weaviate cloud URL
export WEAVIATE_API_KEY="" # Only needed for Weaviate Cloud
# LLaMA API configuration (choose one)
# Option 1: Self-hosted vLLM
export LLAMA_API_URL="http://localhost:8000" # or your vLLM server URL
# Option 2: Vertex AI
export VERTEX_AI_ENDPOINT="https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT/locations/us-central1/endpoints/ENDPOINT"
# PDF Parser Service (for data preparation)
export PDF_PARSER_API_URL="http://localhost:8000"
# Embedding model configuration (optional)
export EMBEDDING_MODEL="BAAI/bge-large-en-v1.5" # or "thenlper/gte-large"
export EMBEDDING_DEVICE="cpu" # or "cuda" if GPU available
```

3. Prepare data from PDF:

```bash
# Make sure pdf-parser-service is running
# Make sure F5-540-F5-540C.pdf is in the project root
export PDF_PARSER_API_URL="http://localhost:8000"
python data_preparation.py
```

This will create:
- `data/tables.json` - Extracted tables
- `data/schemas.json` - Extracted schemas with coordinates
- `data/rag_documents.json` - Documents prepared for RAG indexing
4. Validate prepared data:
```bash
# Validate data quality and suitability for RAG
python validate_data.py
```

This will:
- Check data structure and completeness
- Test embedding generation
- Test retrieval quality with sample queries
- Generate a validation report in `data/validation_report.txt`
See data validation section below for detailed information.
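A retrieval-quality check of the kind described above might look roughly like the following (a sketch assuming the `data/rag_documents.json` format shown in the Data Structure section; `validate_data.py` itself may work differently):

```python
import json
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("BAAI/bge-large-en-v1.5")

with open("data/rag_documents.json") as f:
    docs = json.load(f)

# Embed all prepared documents once
doc_vectors = embedder.encode(
    [d["content"] for d in docs], normalize_embeddings=True
)

# Sample query: the top hit should be the matching part document
query_vec = embedder.encode(["Find part number 10"], normalize_embeddings=True)[0]
scores = doc_vectors @ query_vec
best = docs[int(np.argmax(scores))]
print(best["part_number"], best["description"], float(scores.max()))
```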
5. Set up Weaviate:
Local (Docker):
```bash
docker run -d -p 8080:8080 -p 50051:50051 \
-e AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true \
-e PERSISTENCE_DATA_PATH=/var/lib/weaviate \
-e DEFAULT_VECTORIZER_MODULE=none \
-e ENABLE_MODULES= \
-e CLUSTER_HOSTNAME=node1 \
  semitechnologies/weaviate:latest
```

Weaviate Cloud:
- Sign up at https://weaviate.io/cloud
- Get your cluster URL and API key
- Set the `WEAVIATE_URL` environment variable
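Once Weaviate is running, the prepared documents can be indexed with externally computed BGE-Large vectors. A sketch using the Weaviate Python client v4; the collection name `PartsDocument` is an assumption, and `main.py` may use a different schema (it indexes the RAG documents automatically, see below):

```python
# Sketch of indexing prepared RAG documents into Weaviate (Python client v4).
# "PartsDocument" is an assumed collection name; API details vary by client version.
import json
import weaviate
from weaviate.classes.config import Configure
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("BAAI/bge-large-en-v1.5")
client = weaviate.connect_to_local()  # matches the Docker setup above

if not client.collections.exists("PartsDocument"):
    client.collections.create(
        name="PartsDocument",
        vectorizer_config=Configure.Vectorizer.none(),  # vectors supplied by BGE-Large
    )

parts = client.collections.get("PartsDocument")
with open("data/rag_documents.json") as f:
    docs = json.load(f)

for doc in docs:
    parts.data.insert(
        properties={
            "part_number": doc["part_number"],
            "description": doc["description"],
            "page": doc["page"],
            "content": doc["content"],
        },
        vector=embedder.encode(doc["content"], normalize_embeddings=True).tolist(),
    )

client.close()
```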
6. Run the service:
```bash
python main.py
```

The service will run on http://localhost:8001.
API Endpoints
POST /query
Query the main AI assistant.
Request:

```json
{
"query": "Find part number 10",
"include_store_link": false
}
```

Response:

```json
{
"answer": "Found part 10: WASHER SPRING 10MM H.D Z/P",
"parts": [
{
"part_number": "10",
"description": "WASHER SPRING 10MM H.D Z/P",
"pdf_link": "https://bucket.mchale.com/manuals/F5-540-F5-540C.pdf#page=6",
"page": 6,
"coordinates": [50.0, 728.0, 61.0, 783.0],
"store_link": null
}
],
"confidence": 0.8,
"needs_confirmation": true,
"store_link_question": "Would you like a link to view this part on clintontractor.com/parts/productpage/10?"
}
```

To confirm the store link, make another request with `include_store_link: true`:

```json
{
"query": "Find part number 10",
"include_store_link": true
}
```

Response with store link:

```json
{
"answer": "...",
"parts": [
{
"part_number": "10",
"description": "WASHER SPRING 10MM H.D Z/P",
"store_link": "https://clintontractor.com/parts/productpage/10",
...
}
],
"needs_confirmation": false
}
```

POST /query-expert
Query the expert agent for schemas and tables.
GET /health
Health check endpoint.
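For example, the two-step store-link confirmation flow can be exercised from Python roughly as follows (adjust the base URL to your deployment):

```python
import requests

BASE_URL = "http://localhost:8001"

# First request: search without the store link
resp = requests.post(f"{BASE_URL}/query", json={
    "query": "Find part number 10",
    "include_store_link": False,
}).json()

print(resp["answer"])
if resp.get("needs_confirmation"):
    print(resp["store_link_question"])
    # User said yes: repeat the request with include_store_link=True
    confirmed = requests.post(f"{BASE_URL}/query", json={
        "query": "Find part number 10",
        "include_store_link": True,
    }).json()
    print(confirmed["parts"][0]["store_link"])
```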
Deployment to GCP
See deployment section below for detailed instructions.
Quick steps:
- Deploy Weaviate (Cloud or self-hosted)
- Deploy LLaMA 3.1 8B on GCP Compute Engine with GPU
- Prepare data using `data_preparation.py`
- Deploy the AI service to Cloud Run:

```bash
./deploy.sh
```

Or manually:

```bash
gcloud builds submit --config cloudbuild.yaml . \
  --substitutions=_REGION=us-central1,_WEAVIATE_URL=...,_LLAMA_API_URL=...
```

Set environment variables in Cloud Run:

```bash
gcloud run services update ai-service \
--region=us-central1 \
  --set-env-vars="WEAVIATE_URL=...,WEAVIATE_API_KEY=...,LLAMA_API_URL=..."
```

Data Structure
RAG Documents Format:

```json
{
"id": "part_10_page_6",
"part_number": "10",
"description": "WASHER SPRING 10MM H.D Z/P",
"page": 6,
"type": "table",
"content": "Part Number: 10\nDescription: WASHER SPRING 10MM H.D Z/P\nPage: 6",
"metadata": {
"page": 6,
"type": "table",
"bbox": [x0, y0, x1, y1]
}
}
```

Agent Workflows
Main Agent (LangGraph):
- Search RAG → Extract parts → Check store link → Generate response
Expert Agent (LangGraph):
- Search RAG → Extract parts → Generate structured response
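A minimal LangGraph sketch of the main agent's flow is shown below; the state fields and node implementations are illustrative stubs, not the repository's actual agent code:

```python
from typing import TypedDict, List, Optional
from langgraph.graph import StateGraph, END

class CoPilotState(TypedDict):
    query: str
    documents: List[dict]              # retrieved RAG documents
    parts: List[dict]                  # extracted part matches
    store_link_question: Optional[str]
    answer: str

def search_rag(state: CoPilotState) -> dict:
    # Retrieve candidate documents from Weaviate (stubbed here)
    return {"documents": []}

def extract_parts(state: CoPilotState) -> dict:
    # Pull part numbers / descriptions out of the retrieved documents
    return {"parts": []}

def check_store_link(state: CoPilotState) -> dict:
    # Business rule: always offer a store link when a part number is found
    if state["parts"]:
        pn = state["parts"][0]["part_number"]
        return {"store_link_question": f"Would you like a link to view part {pn} on the store?"}
    return {"store_link_question": None}

def generate_response(state: CoPilotState) -> dict:
    # Call the fine-tuned LLaMA with the retrieved context (stubbed here)
    return {"answer": "..."}

graph = StateGraph(CoPilotState)
graph.add_node("search_rag", search_rag)
graph.add_node("extract_parts", extract_parts)
graph.add_node("check_store_link", check_store_link)
graph.add_node("generate_response", generate_response)
graph.set_entry_point("search_rag")
graph.add_edge("search_rag", "extract_parts")
graph.add_edge("extract_parts", "check_store_link")
graph.add_edge("check_store_link", "generate_response")
graph.add_edge("generate_response", END)
main_agent = graph.compile()
```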
Fine-Tuning + RAG (Recommended Approach)
Recommended Workflow: Fine-Tuning FIRST, then RAG. This gives the best results.
Why Fine-Tuning + RAG?
- Fine-Tuning teaches the model:
  - Domain-specific behavior (McHale Parts Co-Pilot personality)
  - Consistent response format
  - Business rules (store link questions, etc.)
  - How to use RAG context effectively
- RAG provides:
  - Real-time data from PDFs
  - Actual part numbers, descriptions, and pages
  - Up-to-date information without retraining (see the inference sketch after this list)
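At inference time the two pieces combine: RAG supplies fresh context, the fine-tuned model supplies behavior. A rough sketch, assuming the vLLM server exposes its OpenAI-compatible completions route; the prompt wording and model name are illustrative:

```python
import requests

def build_prompt(query: str, retrieved_docs: list[dict]) -> str:
    # RAG step: stitch retrieved document content into the prompt
    context = "\n\n".join(doc["content"] for doc in retrieved_docs)
    return (
        "You are the McHale Parts Co-Pilot.\n"
        "Use ONLY the context below to answer, and offer a store link "
        "when a part number is found.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

def query_finetuned_llm(prompt: str) -> str:
    # Fine-tuned model step: the model name and route are placeholders and
    # depend on how the vLLM server (LLAMA_API_URL) was launched.
    resp = requests.post(
        "http://localhost:8000/v1/completions",
        json={"model": "llama-3.1-8b-mchale-lora", "prompt": prompt, "max_tokens": 512},
        timeout=60,
    )
    return resp.json()["choices"][0]["text"]
```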
Complete Workflow
1. Prepare data from PDFs: `python data_preparation.py`
2. Prepare training data: `python prepare_training_data.py` - creates `training_data.json` with 1000+ examples (an illustrative example follows this list)
3. Fine-tune the model: `python fine_tune_lora.py` - creates LoRA adapters in `./lora_adapters/`
4. Deploy the fine-tuned model to your vLLM server or Vertex AI endpoint.
5. Index RAG data: `python main.py` - automatically indexes RAG documents
6. Use together: fine-tuned model + RAG during inference
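For illustration, a single generated training example could look like the pair below; the actual field names and phrasing produced by `prepare_training_data.py` may differ (the part data mirrors the /query example above):

```json
{
  "instruction": "Find part number 10",
  "context": "Part Number: 10\nDescription: WASHER SPRING 10MM H.D Z/P\nPage: 6",
  "response": "Found part 10: WASHER SPRING 10MM H.D Z/P (page 6). Would you like a link to view this part on clintontractor.com/parts/productpage/10?"
}
```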
Full guide: see the Complete Workflow above for details.
Data Requirements
For Fine-Tuning
- Minimum: 1000 training examples (Q&A pairs)
- Recommended: 2000-5000 examples
- Source: Generated from RAG documents + business examples
For RAG
- ✅ 130 pages is enough to start
- ✅ Indexed in Weaviate
- ✅ Can add more PDFs anytime (no retraining needed)
Combined Approach
- Fine-tuning: 1000+ examples from RAG documents
- RAG: 130+ pages for actual data
- Result: Best quality responses with real data