AI Service - McHale Parts Co-Pilot
AI service with LangChain/LangGraph agents for parts lookup and assistance.
Agent Identity
McHale Vendor Agent Specialist - A professional parts search co-pilot for McHale/Clinton Tractor.
Personality: Professional, knowledgeable, helpful, and proactive parts specialist.
Core Function: Act as a co-pilot for parts search, always offering store links when part numbers are found.
See AGENT_PERSONALITY.md for complete personality and role definition.
Architecture
- Main Agent (McHale Co-Pilot): Professional parts search assistant with defined personality and business rules
- Expert Agent: Expert on schemas and tables for specialized technical queries
- RAG Pipeline: Retrieval-Augmented Generation with Weaviate vector store
- LLM: LLaMA 3.1 8B (deployed with vLLM on GCP)
- Embeddings: BGE-Large (BAAI/bge-large-en-v1.5) - high-quality embeddings for RAG
- Alternative: `thenlper/gte-large` or `text-embedding-3-large`
- Important: Separate embedding model; LLaMA is NOT used for embeddings (see the sketch below)
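The embedding model is loaded separately from the LLaMA endpoint. A minimal sketch of how the BGE-Large embedder might be initialized (the names and defaults follow the environment variables described in Setup; the service's actual loading code may differ):

```python
import os
from sentence_transformers import SentenceTransformer

# Embeddings come from a dedicated model, not from LLaMA.
model_name = os.getenv("EMBEDDING_MODEL", "BAAI/bge-large-en-v1.5")
device = os.getenv("EMBEDDING_DEVICE", "cpu")  # "cuda" if a GPU is available

embedder = SentenceTransformer(model_name, device=device)

# normalize_embeddings=True makes cosine similarity a plain dot product
vectors = embedder.encode(
    ["Part Number: 10\nDescription: WASHER SPRING 10MM H.D Z/P"],
    normalize_embeddings=True,
)
print(vectors.shape)  # (1, 1024) for bge-large-en-v1.5
```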
Business Rules
- Store Link Rule: ALWAYS ask users if they want a store link when a part number is found
- Consistency Rule: Use consistent response format defined by business
- Co-Pilot Function: Act as a search co-pilot that assists in parts discovery
- Professionalism: Maintain professional, helpful tone in all responses
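As an illustration of the store link rule, a link and confirmation question can be derived from a found part number like this (a sketch only; the URL pattern comes from the /query examples below, and the helper names are hypothetical):

```python
# Illustrative sketch of the store-link rule; helper names are hypothetical.
STORE_BASE_URL = "https://clintontractor.com/parts/productpage"

def build_store_link(part_number: str) -> str:
    """Build a store link for a found part number."""
    return f"{STORE_BASE_URL}/{part_number}"

def build_store_link_question(part_number: str) -> str:
    """Always offer the store link; the user confirms before it is included."""
    return (
        f"Would you like a link to view this part on "
        f"clintontractor.com/parts/productpage/{part_number}?"
    )
```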
Features
- RAG-based parts search in PDF manuals
- Structured JSON responses with part numbers, descriptions, coordinates
- Store link integration (with user confirmation)
- Expert agent for schema/table queries
- LangGraph workflow for complex agent reasoning
- Fine-tuning support (optional, see Fine-Tuning + RAG section below)
Setup
1. Install dependencies:
```bash
cd ai_service
pip install -r requirements.txt
```

2. Set environment variables:

```bash
# Weaviate configuration
export WEAVIATE_URL="http://localhost:8080" # or your Weaviate cloud URL
export WEAVIATE_API_KEY="" # Only needed for Weaviate Cloud
# LLaMA API configuration (choose one)
# Option 1: Self-hosted vLLM
export LLAMA_API_URL="http://localhost:8000" # or your vLLM server URL
# Option 2: Vertex AI
export VERTEX_AI_ENDPOINT="https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT/locations/us-central1/endpoints/ENDPOINT"
# PDF Parser Service (for data preparation)
export PDF_PARSER_API_URL="http://localhost:8000"
# Embedding model configuration (optional)
export EMBEDDING_MODEL="BAAI/bge-large-en-v1.5" # or "thenlper/gte-large"
export EMBEDDING_DEVICE="cpu" # or "cuda" if GPU available
```

3. Prepare data from PDF:

```bash
# Make sure pdf-parser-service is running
# Make sure F5-540-F5-540C.pdf is in the project root
export PDF_PARSER_API_URL="http://localhost:8000"
python data_preparation.py
```

This will create:
- `data/tables.json` - Extracted tables
- `data/schemas.json` - Extracted schemas with coordinates
- `data/rag_documents.json` - Documents prepared for RAG indexing
4. Validate prepared data:
```bash
# Validate data quality and suitability for RAG
python validate_data.py
```

This will:
- Check data structure and completeness
- Test embedding generation
- Test retrieval quality with sample queries
- Generate a validation report in `data/validation_report.txt`
See data validation section below for detailed information.
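A retrieval-quality check of the kind described above might look roughly like the following (a sketch assuming the `data/rag_documents.json` format shown in the Data Structure section; `validate_data.py` itself may work differently):

```python
import json
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("BAAI/bge-large-en-v1.5")

with open("data/rag_documents.json") as f:
    docs = json.load(f)

# Embed all prepared documents once
doc_vectors = embedder.encode(
    [d["content"] for d in docs], normalize_embeddings=True
)

# Sample query: the top hit should be the matching part document
query_vec = embedder.encode(["Find part number 10"], normalize_embeddings=True)[0]
scores = doc_vectors @ query_vec
best = docs[int(np.argmax(scores))]
print(best["part_number"], best["description"], float(scores.max()))
```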
5. Set up Weaviate:
Local (Docker):
```bash
docker run -d -p 8080:8080 -p 50051:50051 \
-e AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true \
-e PERSISTENCE_DATA_PATH=/var/lib/weaviate \
-e DEFAULT_VECTORIZER_MODULE=none \
-e ENABLE_MODULES= \
-e CLUSTER_HOSTNAME=node1 \
  semitechnologies/weaviate:latest
```

Weaviate Cloud:
- Sign up at https://weaviate.io/cloud
- Get your cluster URL and API key
- Set the `WEAVIATE_URL` environment variable
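Once Weaviate is running, the prepared documents can be indexed with externally computed BGE-Large vectors. A sketch using the Weaviate Python client v4; the collection name `PartsDocument` is an assumption, and `main.py` may use a different schema (it indexes the RAG documents automatically, see below):

```python
# Sketch of indexing prepared RAG documents into Weaviate (Python client v4).
# "PartsDocument" is an assumed collection name; API details vary by client version.
import json
import weaviate
from weaviate.classes.config import Configure
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("BAAI/bge-large-en-v1.5")
client = weaviate.connect_to_local()  # matches the Docker setup above

if not client.collections.exists("PartsDocument"):
    client.collections.create(
        name="PartsDocument",
        vectorizer_config=Configure.Vectorizer.none(),  # vectors supplied by BGE-Large
    )

parts = client.collections.get("PartsDocument")
with open("data/rag_documents.json") as f:
    docs = json.load(f)

for doc in docs:
    parts.data.insert(
        properties={
            "part_number": doc["part_number"],
            "description": doc["description"],
            "page": doc["page"],
            "content": doc["content"],
        },
        vector=embedder.encode(doc["content"], normalize_embeddings=True).tolist(),
    )

client.close()
```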
6. Run the service:
```bash
python main.py
```

The service will run on http://localhost:8001.
API Endpoints
POST /query
Query the main AI assistant.
Request:

```json
{
"query": "Find part number 10",
"include_store_link": false
}
```

Response:

```json
{
"answer": "Found part 10: WASHER SPRING 10MM H.D Z/P",
"parts": [
{
"part_number": "10",
"description": "WASHER SPRING 10MM H.D Z/P",
"pdf_link": "https://bucket.mchale.com/manuals/F5-540-F5-540C.pdf#page=6",
"page": 6,
"coordinates": [50.0, 728.0, 61.0, 783.0],
"store_link": null
}
],
"confidence": 0.8,
"needs_confirmation": true,
"store_link_question": "Would you like a link to view this part on clintontractor.com/parts/productpage/10?"
}
```

To confirm the store link, make another request with `include_store_link: true`:

```json
{
"query": "Find part number 10",
"include_store_link": true
}
```

Response with store link:

```json
{
"answer": "...",
"parts": [
{
"part_number": "10",
"description": "WASHER SPRING 10MM H.D Z/P",
"store_link": "https://clintontractor.com/parts/productpage/10",
...
}
],
"needs_confirmation": false
}
```

POST /query-expert
Query the expert agent for schemas and tables.
GET /health
Health check endpoint.
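For example, the two-step store-link confirmation flow can be exercised from Python roughly as follows (adjust the base URL to your deployment):

```python
import requests

BASE_URL = "http://localhost:8001"

# First request: search without the store link
resp = requests.post(f"{BASE_URL}/query", json={
    "query": "Find part number 10",
    "include_store_link": False,
}).json()

print(resp["answer"])
if resp.get("needs_confirmation"):
    print(resp["store_link_question"])
    # User said yes: repeat the request with include_store_link=True
    confirmed = requests.post(f"{BASE_URL}/query", json={
        "query": "Find part number 10",
        "include_store_link": True,
    }).json()
    print(confirmed["parts"][0]["store_link"])
```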
Deployment to GCP
See deployment section below for detailed instructions.
Quick steps:
- Deploy Weaviate (Cloud or self-hosted)
- Deploy LLaMA 3.1 8B on GCP Compute Engine with GPU
- Prepare data using `data_preparation.py`
- Deploy the AI service to Cloud Run:

```bash
./deploy.sh
```

Or manually:

```bash
gcloud builds submit --config cloudbuild.yaml . \
  --substitutions=_REGION=us-central1,_WEAVIATE_URL=...,_LLAMA_API_URL=...
```

Set environment variables in Cloud Run:

```bash
gcloud run services update ai-service \
--region=us-central1 \
  --set-env-vars="WEAVIATE_URL=...,WEAVIATE_API_KEY=...,LLAMA_API_URL=..."
```

Data Structure
RAG Documents Format:

```json
{
"id": "part_10_page_6",
"part_number": "10",
"description": "WASHER SPRING 10MM H.D Z/P",
"page": 6,
"type": "table",
"content": "Part Number: 10\nDescription: WASHER SPRING 10MM H.D Z/P\nPage: 6",
"metadata": {
"page": 6,
"type": "table",
"bbox": [x0, y0, x1, y1]
}
}
```

Agent Workflows
Main Agent (LangGraph):
- Search RAG → Extract parts → Check store link → Generate response
Expert Agent (LangGraph):
- Search RAG → Extract parts → Generate structured response
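A minimal LangGraph sketch of the main agent's flow is shown below; the state fields and node implementations are illustrative stubs, not the repository's actual agent code:

```python
from typing import TypedDict, List, Optional
from langgraph.graph import StateGraph, END

class CoPilotState(TypedDict):
    query: str
    documents: List[dict]              # retrieved RAG documents
    parts: List[dict]                  # extracted part matches
    store_link_question: Optional[str]
    answer: str

def search_rag(state: CoPilotState) -> dict:
    # Retrieve candidate documents from Weaviate (stubbed here)
    return {"documents": []}

def extract_parts(state: CoPilotState) -> dict:
    # Pull part numbers / descriptions out of the retrieved documents
    return {"parts": []}

def check_store_link(state: CoPilotState) -> dict:
    # Business rule: always offer a store link when a part number is found
    if state["parts"]:
        pn = state["parts"][0]["part_number"]
        return {"store_link_question": f"Would you like a link to view part {pn} on the store?"}
    return {"store_link_question": None}

def generate_response(state: CoPilotState) -> dict:
    # Call the fine-tuned LLaMA with the retrieved context (stubbed here)
    return {"answer": "..."}

graph = StateGraph(CoPilotState)
graph.add_node("search_rag", search_rag)
graph.add_node("extract_parts", extract_parts)
graph.add_node("check_store_link", check_store_link)
graph.add_node("generate_response", generate_response)
graph.set_entry_point("search_rag")
graph.add_edge("search_rag", "extract_parts")
graph.add_edge("extract_parts", "check_store_link")
graph.add_edge("check_store_link", "generate_response")
graph.add_edge("generate_response", END)
main_agent = graph.compile()
```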
Fine-Tuning + RAG (Recommended Approach)
Recommended Workflow: Fine-Tuning FIRST, then RAG. This gives the best results.
Why Fine-Tuning + RAG?
- Fine-Tuning teaches the model:
  - Domain-specific behavior (McHale Parts Co-Pilot personality)
  - Consistent response format
  - Business rules (store link questions, etc.)
  - How to use RAG context effectively
- RAG provides:
  - Real-time data from PDFs
  - Actual part numbers, descriptions, and pages
  - Up-to-date information without retraining (see the inference sketch after this list)
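At inference time the two pieces combine: RAG supplies fresh context, the fine-tuned model supplies behavior. A rough sketch, assuming the vLLM server exposes its OpenAI-compatible completions route; the prompt wording and model name are illustrative:

```python
import requests

def build_prompt(query: str, retrieved_docs: list[dict]) -> str:
    # RAG step: stitch retrieved document content into the prompt
    context = "\n\n".join(doc["content"] for doc in retrieved_docs)
    return (
        "You are the McHale Parts Co-Pilot.\n"
        "Use ONLY the context below to answer, and offer a store link "
        "when a part number is found.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

def query_finetuned_llm(prompt: str) -> str:
    # Fine-tuned model step: the model name and route are placeholders and
    # depend on how the vLLM server (LLAMA_API_URL) was launched.
    resp = requests.post(
        "http://localhost:8000/v1/completions",
        json={"model": "llama-3.1-8b-mchale-lora", "prompt": prompt, "max_tokens": 512},
        timeout=60,
    )
    return resp.json()["choices"][0]["text"]
```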
Complete Workflow
1. Prepare data from PDFs: `python data_preparation.py`
2. Prepare training data: `python prepare_training_data.py` - creates `training_data.json` with 1000+ examples (an illustrative example follows this list)
3. Fine-tune the model: `python fine_tune_lora.py` - creates LoRA adapters in `./lora_adapters/`
4. Deploy the fine-tuned model to your vLLM server or Vertex AI endpoint.
5. Index RAG data: `python main.py` - automatically indexes RAG documents
6. Use together: fine-tuned model + RAG during inference
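For illustration, a single generated training example could look like the pair below; the actual field names and phrasing produced by `prepare_training_data.py` may differ (the part data mirrors the /query example above):

```json
{
  "instruction": "Find part number 10",
  "context": "Part Number: 10\nDescription: WASHER SPRING 10MM H.D Z/P\nPage: 6",
  "response": "Found part 10: WASHER SPRING 10MM H.D Z/P (page 6). Would you like a link to view this part on clintontractor.com/parts/productpage/10?"
}
```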
Full guide: see the Complete Workflow above for details.
Data Requirements
For Fine-Tuning
- Minimum: 1000 training examples (Q&A pairs)
- Recommended: 2000-5000 examples
- Source: Generated from RAG documents + business examples
For RAG
- ✅ 130 pages is enough to start
- ✅ Indexed in Weaviate
- ✅ Can add more PDFs anytime (no retraining needed)
Combined Approach
- Fine-tuning: 1000+ examples from RAG documents
- RAG: 130+ pages for actual data
- Result: Best quality responses with real data