
AI Service - McHale Parts Co-Pilot

AI service with LangChain/LangGraph agents for parts lookup and assistance.

Agent Identity

McHale Vendor Agent Specialist - A professional parts search co-pilot for McHale/Clinton Tractor.

Personality: Professional, knowledgeable, helpful, and proactive parts specialist.

Core Function: Act as a co-pilot for parts search, always offering store links when part numbers are found.

See AGENT_PERSONALITY.md for complete personality and role definition.

Architecture

  • Main Agent (McHale Co-Pilot): Professional parts search assistant with defined personality and business rules
  • Expert Agent: Expert on schemas and tables for specialized technical queries
  • RAG Pipeline: Retrieval-Augmented Generation with Weaviate vector store
  • LLM: LLaMA 3.1 8B (deployed with vLLM on GCP)
  • Embeddings: BGE-Large (BAAI/bge-large-en-v1.5) - high-quality embeddings for RAG (see the loading sketch after this list)
    • Alternative: thenlper/gte-large or text-embedding-3-large
    • Important: Separate embedding model, NOT using LLaMA for embeddings
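
As a rough illustration of the last point, the separate embedding model might be loaded with sentence-transformers as sketched below. The model name and BGE query prefix are as documented upstream; everything else is an assumption, not the service's actual code.

from sentence_transformers import SentenceTransformer

# Load the dedicated embedding model (BGE-Large); device mirrors EMBEDDING_DEVICE ("cpu" or "cuda").
model = SentenceTransformer("BAAI/bge-large-en-v1.5", device="cpu")

# BGE models expect a short instruction prefix on queries; documents are embedded as-is.
query_vec = model.encode("Represent this sentence for searching relevant passages: find part number 10")
doc_vec = model.encode("Part Number: 10\nDescription: WASHER SPRING 10MM H.D Z/P\nPage: 6")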

Business Rules

  1. Store Link Rule: ALWAYS ask users if they want a store link when a part number is found (a link-building sketch follows this list)
  2. Consistency Rule: Use consistent response format defined by business
  3. Co-Pilot Function: Act as a search co-pilot that assists in parts discovery
  4. Professionalism: Maintain professional, helpful tone in all responses
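
The store-link sketch referenced in rule 1 uses the URL pattern from the /query examples further down; build_store_link and store_link_question are hypothetical helpers, not part of the service code.

# Hypothetical helpers; the URL pattern mirrors the store_link values in the /query examples below.
STORE_BASE = "https://clintontractor.com/parts/productpage"

def build_store_link(part_number: str) -> str:
    return f"{STORE_BASE}/{part_number}"

def store_link_question(part_number: str) -> str:
    return f"Would you like a link to view this part on {build_store_link(part_number)}?"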

Features

  • RAG-based parts search in PDF manuals
  • Structured JSON responses with part numbers, descriptions, coordinates
  • Store link integration (with user confirmation)
  • Expert agent for schema/table queries
  • LangGraph workflow for complex agent reasoning
  • Fine-tuning support (optional, see Fine-Tuning + RAG section below)

Setup

1. Install dependencies:

cd ai_service
pip install -r requirements.txt

2. Set environment variables:

# Weaviate configuration
export WEAVIATE_URL="http://localhost:8080"  # or your Weaviate cloud URL
export WEAVIATE_API_KEY=""  # Only needed for Weaviate Cloud

# LLaMA API configuration (choose one)
# Option 1: Self-hosted vLLM
export LLAMA_API_URL="http://localhost:8000"  # or your vLLM server URL

# Option 2: Vertex AI
export VERTEX_AI_ENDPOINT="https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT/locations/us-central1/endpoints/ENDPOINT"

# PDF Parser Service (for data preparation)
export PDF_PARSER_API_URL="http://localhost:8000"

# Embedding model configuration (optional)
export EMBEDDING_MODEL="BAAI/bge-large-en-v1.5"  # or "thenlper/gte-large"
export EMBEDDING_DEVICE="cpu"  # or "cuda" if GPU available

3. Prepare data from PDF:

# Make sure pdf-parser-service is running
# Make sure F5-540-F5-540C.pdf is in the project root
export PDF_PARSER_API_URL="http://localhost:8000"
python data_preparation.py

This will create:

  • data/tables.json - Extracted tables
  • data/schemas.json - Extracted schemas with coordinates
  • data/rag_documents.json - Documents prepared for RAG indexing

4. Validate prepared data:

# Validate data quality and suitability for RAG
python validate_data.py

This will:

  • Check data structure and completeness
  • Test embedding generation
  • Test retrieval quality with sample queries (sketched below)
  • Generate a validation report in data/validation_report.txt
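
In its simplest form, the retrieval-quality check could look roughly like the following. This is a sketch assuming sentence-transformers and the rag_documents.json structure described in Data Structure below, not the actual validate_data.py logic.

import json
from sentence_transformers import SentenceTransformer, util

# Embed a sample query and every document, then rank by cosine similarity.
model = SentenceTransformer("BAAI/bge-large-en-v1.5")
with open("data/rag_documents.json") as f:
    docs = json.load(f)

query_vec = model.encode("Find part number 10", convert_to_tensor=True)
doc_vecs = model.encode([d["content"] for d in docs], convert_to_tensor=True)

scores = util.cos_sim(query_vec, doc_vecs)[0]
best = int(scores.argmax())
print(docs[best]["part_number"], docs[best]["description"], float(scores[best]))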

See the data validation section below for detailed information.

5. Set up Weaviate:

Local (Docker):

docker run -d -p 8080:8080 -p 50051:50051 \
  -e AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true \
  -e PERSISTENCE_DATA_PATH=/var/lib/weaviate \
  -e DEFAULT_VECTORIZER_MODULE=none \
  -e ENABLE_MODULES= \
  -e CLUSTER_HOSTNAME=node1 \
  semitechnologies/weaviate:latest

Weaviate Cloud: create a cluster in Weaviate Cloud, then set WEAVIATE_URL to the cluster endpoint and WEAVIATE_API_KEY to the cluster's API key (see the environment variables in step 2).
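
Either way, connectivity can be sanity-checked from Python before starting the service. This is a minimal sketch assuming the weaviate-client v3 API and the environment variables from step 2.

import os
import weaviate

# Build auth only when an API key is set (Weaviate Cloud); local anonymous access needs none.
api_key = os.getenv("WEAVIATE_API_KEY", "")
auth = weaviate.AuthApiKey(api_key) if api_key else None

client = weaviate.Client(url=os.getenv("WEAVIATE_URL", "http://localhost:8080"), auth_client_secret=auth)
print(client.is_ready())  # True once Weaviate is reachable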

6. Run the service:

python main.py

The service will run on http://localhost:8001

API Endpoints

POST /query

Query the main AI assistant.

Request:

{
  "query": "Find part number 10",
  "include_store_link": false
}

Response:

{
  "answer": "Found part 10: WASHER SPRING 10MM H.D Z/P",
  "parts": [
    {
      "part_number": "10",
      "description": "WASHER SPRING 10MM H.D Z/P",
      "pdf_link": "https://bucket.mchale.com/manuals/F5-540-F5-540C.pdf#page=6",
      "page": 6,
      "coordinates": [50.0, 728.0, 61.0, 783.0],
      "store_link": null
    }
  ],
  "confidence": 0.8,
  "needs_confirmation": true,
  "store_link_question": "Would you like a link to view this part on clintontractor.com/parts/productpage/10?"
}

To confirm the store link, make another request with include_store_link: true:

{
  "query": "Find part number 10",
  "include_store_link": true
}

Response with store link:

{
  "answer": "...",
  "parts": [
    {
      "part_number": "10",
      "description": "WASHER SPRING 10MM H.D Z/P",
      "store_link": "https://clintontractor.com/parts/productpage/10",
      ...
    }
  ],
  "needs_confirmation": false
}
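
Putting the two calls together from Python (assuming the service is running locally on port 8001, as in the Setup section):

import requests

BASE_URL = "http://localhost:8001"

# First request: the agent answers and asks whether a store link should be included.
resp = requests.post(f"{BASE_URL}/query", json={"query": "Find part number 10", "include_store_link": False})
data = resp.json()

# If confirmation is requested, repeat the query with include_store_link set to true.
if data.get("needs_confirmation"):
    resp = requests.post(f"{BASE_URL}/query", json={"query": "Find part number 10", "include_store_link": True})
    data = resp.json()

print(data["parts"][0].get("store_link"))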

POST /query-expert

Query the expert agent for schemas and tables.

GET /health

Health check endpoint.

Deployment to GCP

See the deployment section below for detailed instructions.

Quick steps:

  1. Deploy Weaviate (Cloud or self-hosted)
  2. Deploy LLaMA 3.1 8B on GCP Compute Engine with GPU
  3. Prepare data using data_preparation.py
  4. Deploy AI service to Cloud Run:
./deploy.sh

Or manually:

gcloud builds submit --config cloudbuild.yaml . \
    --substitutions=_REGION=us-central1,_WEAVIATE_URL=...,_LLAMA_API_URL=...

Set environment variables in Cloud Run:

gcloud run services update ai-service \
  --region=us-central1 \
  --set-env-vars="WEAVIATE_URL=...,WEAVIATE_API_KEY=...,LLAMA_API_URL=..."

Data Structure

RAG Documents Format:

{
  "id": "part_10_page_6",
  "part_number": "10",
  "description": "WASHER SPRING 10MM H.D Z/P",
  "page": 6,
  "type": "table",
  "content": "Part Number: 10\nDescription: WASHER SPRING 10MM H.D Z/P\nPage: 6",
  "metadata": {
    "page": 6,
    "type": "table",
    "bbox": [x0, y0, x1, y1]
  }
}
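
For illustration, a row extracted from a parts table could be mapped into this shape roughly as follows. to_rag_document is a hypothetical helper, not the function data_preparation.py actually uses.

# Hypothetical helper mirroring the RAG document format above.
def to_rag_document(part_number: str, description: str, page: int, bbox: list) -> dict:
    return {
        "id": f"part_{part_number}_page_{page}",
        "part_number": part_number,
        "description": description,
        "page": page,
        "type": "table",
        "content": f"Part Number: {part_number}\nDescription: {description}\nPage: {page}",
        "metadata": {"page": page, "type": "table", "bbox": bbox},
    }

doc = to_rag_document("10", "WASHER SPRING 10MM H.D Z/P", 6, [50.0, 728.0, 61.0, 783.0])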

Agent Workflows

Main Agent (LangGraph):

  1. Search RAG → Extract parts → Check store link → Generate response

Expert Agent (LangGraph):

  1. Search RAG → Extract parts → Generate structured response
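
In LangGraph terms, the main agent's flow can be sketched as a linear graph like the one below; the state fields and node bodies are placeholders, not the actual implementation.

from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    query: str
    documents: list
    parts: list
    answer: str

# Placeholder nodes; the real implementations would call Weaviate, parse parts, and prompt the LLM.
def search_rag(state: AgentState) -> AgentState:
    return state

def extract_parts(state: AgentState) -> AgentState:
    return state

def check_store_link(state: AgentState) -> AgentState:
    return state

def generate_response(state: AgentState) -> AgentState:
    return state

graph = StateGraph(AgentState)
graph.add_node("search_rag", search_rag)
graph.add_node("extract_parts", extract_parts)
graph.add_node("check_store_link", check_store_link)
graph.add_node("generate_response", generate_response)

graph.set_entry_point("search_rag")
graph.add_edge("search_rag", "extract_parts")
graph.add_edge("extract_parts", "check_store_link")
graph.add_edge("check_store_link", "generate_response")
graph.add_edge("generate_response", END)

app = graph.compile()  # app.invoke({"query": "Find part number 10", ...})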

Fine-Tuning + RAG

Recommended workflow: fine-tune FIRST, then add RAG. This combination gives the best results.

Why Fine-Tuning + RAG?

  1. Fine-Tuning teaches the model:

    • Domain-specific behavior (McHale Parts Co-Pilot personality)
    • Consistent response format
    • Business rules (store link questions, etc.)
    • How to use RAG context effectively
  2. RAG provides:

    • Real-time data from PDFs
    • Actual part numbers, descriptions, pages
    • Up-to-date information without retraining

Complete Workflow

  1. Prepare data from PDFs:

    python data_preparation.py
  2. Prepare training data:

    python prepare_training_data.py

    Creates training_data.json with 1000+ examples

  3. Fine-tune model:

    python fine_tune_lora.py

     Creates LoRA adapters in ./lora_adapters/ (a configuration sketch follows this list)

  4. Deploy fine-tuned model: Deploy the fine-tuned model to your vLLM server or Vertex AI endpoint.

  5. Index RAG data:

    python main.py  # Automatically indexes RAG documents
  6. Use together: Fine-tuned model + RAG during inference
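
The LoRA configuration sketch referenced in step 3 is below: a minimal PEFT setup in which the base model ID and hyperparameters are illustrative assumptions, not values taken from fine_tune_lora.py.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Base model ID is an assumption for "LLaMA 3.1 8B"; access to the weights is gated on Hugging Face.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()

# After training, adapters would be saved where the workflow expects them:
# model.save_pretrained("./lora_adapters/")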

Full guide: See Fine-Tuning + RAG section above for complete workflow details.

Data Requirements

For Fine-Tuning

  • Minimum: 1000 training examples (Q&A pairs)
  • Recommended: 2000-5000 examples
  • Source: Generated from RAG documents + business examples

For RAG

  • 130 pages is enough to start
  • ✅ Indexed in Weaviate
  • ✅ Can add more PDFs anytime (no retraining needed)

Combined Approach

  • Fine-tuning: 1000+ examples from RAG documents
  • RAG: 130+ pages for actual data
  • Result: Best quality responses with real data
