
Embeddings Configuration

Recommended Embedding Models


For RAG (Retrieval-Augmented Generation), use a dedicated embedding model, NOT the generative LLM.
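The retrieval step those embeddings power can be sketched with toy hand-made vectors; a real pipeline would obtain the vectors from a model such as BAAI/bge-large-en-v1.5, but the ranking logic is the same:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" standing in for real model output
docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.0],
    "doc_c": [0.5, 0.5, 0.7],
}
query = [1.0, 0.0, 0.1]

# Retrieve the document whose embedding is closest to the query
best = max(docs, key=lambda d: cosine(query, docs[d]))
print(best)  # doc_a
```

In production the similarity search runs inside a vector store rather than a Python loop, but the criterion is the same: cosine similarity between query and document embeddings.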

  1. BGE-Large ⭐ (Currently configured)

    • Model: BAAI/bge-large-en-v1.5
    • Best for: General RAG, high quality embeddings
    • Size: ~1.3GB
    • Performance: Excellent
  2. GTE-Large

    • Model: thenlper/gte-large
    • Best for: Multilingual and general purpose
    • Size: ~1.3GB
    • Performance: Excellent
  3. OpenAI text-embedding-3-large

    • Requires OpenAI API key
    • Best for: Production with OpenAI integration
    • Cost: Pay per token

Why NOT use LLaMA for embeddings?

  • LLaMA is a generative model, not optimized for embeddings
  • Dedicated embedding models are:
    • Faster for embedding generation
    • Better at semantic similarity
    • Smaller and more efficient
    • Trained specifically for retrieval tasks

Configuration

Environment Variables

# Embedding model (default: BAAI/bge-large-en-v1.5)
export EMBEDDING_MODEL="BAAI/bge-large-en-v1.5"

# Device for embeddings (cpu or cuda)
export EMBEDDING_DEVICE="cpu"  # Use "cuda" if GPU available

In Code

The embedding model is configured in main.py:

import os

# HuggingFaceEmbeddings lives in langchain_community
# (langchain_huggingface in newer releases)
from langchain_community.embeddings import HuggingFaceEmbeddings

embedding_model = os.getenv("EMBEDDING_MODEL", "BAAI/bge-large-en-v1.5")
device = os.getenv("EMBEDDING_DEVICE", "cpu")

embeddings = HuggingFaceEmbeddings(
    model_name=embedding_model,
    model_kwargs={"device": device},
    encode_kwargs={"normalize_embeddings": True}
)

Performance Tips

  1. Use GPU for embeddings if available:

    export EMBEDDING_DEVICE="cuda"
  2. Normalize embeddings (already enabled):

    • Improves similarity search quality
    • Better cosine similarity calculations
  3. Model size vs. quality trade-off:

    • bge-large: Best quality, slower
    • bge-base: Good balance
    • all-MiniLM-L6-v2: Fast, lower quality
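To see why normalization (tip 2) helps: once vectors are unit-length, a plain dot product equals cosine similarity, so similarity search reduces to the cheapest possible operation. A minimal pure-Python sketch with toy vectors, no embedding model required:

```python
import math

def cosine(a, b):
    """Cosine similarity between two raw (unnormalized) vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

a = [3.0, 4.0]
b = [4.0, 3.0]

# For unit vectors, the dot product alone IS the cosine similarity
dot_of_units = sum(x * y for x, y in zip(normalize(a), normalize(b)))
print(round(dot_of_units, 2))  # 0.96
print(round(cosine(a, b), 2))  # 0.96
```

This is exactly what `encode_kwargs={"normalize_embeddings": True}` enables: the vector store can rank by dot product and still get correct cosine rankings.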

Switching Models

To switch to GTE-Large:

export EMBEDDING_MODEL="thenlper/gte-large"

Then restart the service. The model will be downloaded automatically on first use.
