
Changelog - LLaMA 3.1 8B & BGE-Large Embeddings

Changes Made

1. Replaced OpenAI with LLaMA 3.1 8B

  • ✅ Created llm_config.py for LLaMA API configuration
  • ✅ Updated agents.py to use LLaMA instead of OpenAI
  • ✅ Support for vLLM (self-hosted) and Vertex AI endpoints
  • ✅ Removed dependency on OpenAI API key
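
vLLM exposes an OpenAI-compatible REST API, so the client side only needs the base URL from `LLAMA_API_URL`. A minimal sketch of that wiring (function and constant names here are illustrative, not the actual contents of `llm_config.py`):

```python
import os

# Illustrative sketch only -- the real llm_config.py may differ.
# vLLM serves an OpenAI-compatible API, so chat completions live at
# /v1/chat/completions on the server named by LLAMA_API_URL.
DEFAULT_MODEL = "meta-llama/Meta-Llama-3.1-8B-Instruct"

def build_chat_request(prompt: str) -> tuple[str, dict]:
    """Return the (url, payload) pair for a vLLM chat completion call."""
    base = os.environ.get("LLAMA_API_URL", "http://localhost:8000").rstrip("/")
    url = f"{base}/v1/chat/completions"
    payload = {
        "model": DEFAULT_MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,
    }
    return url, payload

url, payload = build_chat_request("Extract the invoice fields as JSON.")
print(url)
```

Because the endpoint is OpenAI-compatible, any OpenAI-style client can be pointed at it by overriding the base URL, which is what makes dropping the OpenAI API key possible.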

2. Replaced Embeddings with BGE-Large

  • ✅ Updated main.py to use BAAI/bge-large-en-v1.5 embeddings
  • ✅ Configurable via EMBEDDING_MODEL environment variable
  • ✅ Support for alternative models: thenlper/gte-large
  • ✅ Added normalize_embeddings=True for better similarity search
  • Important: a dedicated embedding model is used for embeddings, NOT LLaMA
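
A toy illustration of why `normalize_embeddings=True` matters: once vectors are unit-length, cosine similarity reduces to a plain dot product, which is the cheap operation vector databases optimize for. (Pure-Python sketch; the real pipeline gets normalized vectors directly from the BGE model.)

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    """Dot product; equals cosine similarity for unit vectors."""
    return sum(x * y for x, y in zip(a, b))

a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])
print(round(dot(a, b), 4))  # prints 0.96
```

With normalization done at embedding time, every stored and query vector is already unit-length, so similarity search never has to divide by vector norms.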

3. vLLM Deployment Scripts

  • deploy_llama.sh - Creates GCP VM with GPU
  • setup_llama_vm.sh - Part 1: Install NVIDIA drivers
  • setup_llama_vm_part2.sh - Part 2: Install CUDA and vLLM
  • ✅ Systemd service configuration for vLLM
  • ✅ Support for HuggingFace token for model access
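
The systemd unit keeps vLLM running across reboots and crashes. A unit along these lines (paths, user, and flags are illustrative, not the actual file installed by setup_llama_vm_part2.sh) would be:

```ini
[Unit]
Description=vLLM OpenAI-compatible server for LLaMA 3.1 8B
After=network-online.target

[Service]
# The HF token lets vLLM download the gated LLaMA weights.
Environment=HUGGING_FACE_HUB_TOKEN=hf_xxx
ExecStart=/opt/vllm/bin/python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3.1-8B-Instruct \
    --host 0.0.0.0 --port 8000
Restart=always
User=vllm

[Install]
WantedBy=multi-user.target
```

`Restart=always` plus `WantedBy=multi-user.target` is what gives the "deploy once, survives reboots" behavior the scripts aim for.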

4. Updated Configuration

  • cloudbuild.yaml - Updated environment variables
  • ✅ Removed OPENAI_API_KEY dependency
  • ✅ Added LLAMA_API_URL, EMBEDDING_MODEL, EMBEDDING_DEVICE
  • ✅ Updated requirements.txt - Removed OpenAI, added sentence-transformers
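
For context, the deploy step in cloudbuild.yaml might look roughly like this (step layout and substitution names are illustrative; only the environment variables mirror the changes above):

```yaml
# Illustrative excerpt -- not the actual cloudbuild.yaml.
steps:
  - name: gcr.io/google.com/cloudsdktool/cloud-sdk
    args:
      - gcloud
      - run
      - deploy
      - pdf-parser-ai
      - --image=gcr.io/$PROJECT_ID/pdf-parser-ai
      - --set-env-vars=LLAMA_API_URL=$_LLAMA_API_URL,EMBEDDING_MODEL=BAAI/bge-large-en-v1.5,EMBEDDING_DEVICE=cpu
```

Note the absence of OPENAI_API_KEY: the service now needs only the vLLM endpoint and the embedding settings.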

5. Documentation

  • DEPLOYMENT.md - Complete deployment guide
  • EMBEDDINGS.md - Embedding models guide
  • README.md - Updated with new configuration
  • weaviate_gcp_setup.md - Weaviate setup instructions
  • llama_gcp_setup.md - LLaMA setup instructions

Environment Variables

Required

  • WEAVIATE_URL - Weaviate cluster URL
  • LLAMA_API_URL - vLLM server URL (e.g., http://EXTERNAL_IP:8000)

Optional

  • WEAVIATE_API_KEY - For Weaviate Cloud
  • VERTEX_AI_ENDPOINT - Alternative to vLLM (Vertex AI)
  • EMBEDDING_MODEL - Embedding model (default: BAAI/bge-large-en-v1.5)
  • EMBEDDING_DEVICE - Device for embeddings (cpu or cuda)
  • HUGGING_FACE_HUB_TOKEN - For accessing LLaMA models (if required)
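
For local development, the variables above can be set like this (the URLs are placeholders; only `WEAVIATE_URL` and `LLAMA_API_URL` are required):

```shell
# Example configuration -- URL values are placeholders, adjust to your deployment.
export WEAVIATE_URL="https://my-cluster.weaviate.network"   # placeholder
export LLAMA_API_URL="http://10.0.0.5:8000"                 # placeholder vLLM server
export EMBEDDING_MODEL="BAAI/bge-large-en-v1.5"             # default value
export EMBEDDING_DEVICE="cpu"                               # or "cuda" on a GPU host
```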

Deployment Steps

  1. Deploy Weaviate (Cloud or self-hosted)
  2. Deploy LLaMA 3.1 8B using vLLM on GCP Compute Engine
  3. Prepare data using data_preparation.py
  4. Deploy AI service to Cloud Run

See deployment section in README.md for detailed instructions.

Cost Estimation

  • LLaMA 3.1 8B on T4 GPU: ~$1-4/hour
  • Weaviate Cloud: ~$50-200/month
  • Cloud Run: Pay per use
  • Total: ~$100-500/month
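
A back-of-envelope check of the total, under illustrative assumptions (a T4 rate of $1.20/hour and the GPU stopped outside an 8-hour working day; actual rates vary by region):

```python
# Assumed inputs -- adjust for your region and usage pattern.
gpu_rate_per_hour = 1.20   # assumed T4 on-demand rate within the ~$1-4/hour band
hours_per_day = 8          # GPU VM stopped outside working hours
weaviate_monthly = 100.0   # assumed mid-range Weaviate Cloud plan

gpu_monthly = gpu_rate_per_hour * hours_per_day * 30
total = gpu_monthly + weaviate_monthly
print(gpu_monthly, total)  # prints 288.0 388.0
```

That lands inside the ~$100-500/month band; running the GPU 24/7 or picking a larger GPU pushes toward the top of the range.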

Key Improvements

  1. No OpenAI dependency - Fully self-hosted solution
  2. Better embeddings - BGE-Large for improved RAG quality
  3. Cost-effective - a fixed ~$1-4/hour GPU cost instead of usage-based, per-token OpenAI API billing
  4. Better instruction following - LLaMA 3.1 8B is strong at structured JSON output and instruction following
  5. Scalable - Can run on single GPU (T4/L4/A10)
