# Embeddings Configuration

## Recommended Embedding Models
For RAG (Retrieval-Augmented Generation), use a dedicated embedding model, **not** the generative LLM itself.
### Text Embeddings (Recommended)
1. **BGE-Large** ⭐ (currently configured)
   - Model: `BAAI/bge-large-en-v1.5`
   - Best for: General RAG, high-quality embeddings
   - Size: ~1.3 GB
   - Performance: Excellent
2. **GTE-Large**
   - Model: `thenlper/gte-large`
   - Best for: Multilingual and general purpose
   - Size: ~1.3 GB
   - Performance: Excellent
3. **OpenAI text-embedding-3-large**
   - Requires an OpenAI API key
   - Best for: Production with OpenAI integration
   - Cost: Pay per token
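For completeness, a hedged sketch of wiring up option 3. The class and parameter names come from the `langchain-openai` package (an assumption; this service's actual code is not shown here), and a real call requires an `OPENAI_API_KEY`:

```python
import os

# Hypothetical helper (not from this service's code): build OpenAI embeddings
# only when an API key is configured, otherwise signal the caller to fall back.
def make_openai_embeddings():
    if not os.getenv("OPENAI_API_KEY"):
        return None  # no key configured; caller should use a local model
    from langchain_openai import OpenAIEmbeddings  # requires langchain-openai
    return OpenAIEmbeddings(model="text-embedding-3-large")

print(make_openai_embeddings())
```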
### Why NOT use LLaMA for embeddings?

- LLaMA is a generative model, not optimized for embeddings
- Dedicated embedding models are:
  - Faster for embedding generation
  - Better at semantic similarity
  - Smaller and more efficient
  - Trained specifically for retrieval tasks
## Configuration

### Environment Variables
```bash
# Embedding model (default: BAAI/bge-large-en-v1.5)
export EMBEDDING_MODEL="BAAI/bge-large-en-v1.5"

# Device for embeddings (cpu or cuda)
export EMBEDDING_DEVICE="cpu"  # Use "cuda" if GPU available
```

### In Code

The embedding model is configured in `main.py`:
```python
import os

from langchain_community.embeddings import HuggingFaceEmbeddings

embedding_model = os.getenv("EMBEDDING_MODEL", "BAAI/bge-large-en-v1.5")
device = os.getenv("EMBEDDING_DEVICE", "cpu")

embeddings = HuggingFaceEmbeddings(
    model_name=embedding_model,
    model_kwargs={"device": device},
    encode_kwargs={"normalize_embeddings": True},
)
```

## Performance Tips
1. **Use GPU for embeddings** if available:

   ```bash
   export EMBEDDING_DEVICE="cuda"
   ```

2. **Normalize embeddings** (already enabled):
   - Improves similarity search quality
   - Better cosine similarity calculations

3. **Model size vs. quality trade-off:**
   - `bge-large`: Best quality, slower
   - `bge-base`: Good balance
   - `all-MiniLM-L6-v2`: Fast, lower quality
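The normalization tip can be made concrete with a small numpy sketch. The vectors below are toys standing in for real embedding outputs: once every vector has unit length (which `normalize_embeddings=True` guarantees), a plain dot product equals cosine similarity, so vector stores can use the cheaper dot-product search:

```python
import numpy as np

# Toy 2-D vectors standing in for embedding outputs (illustration only).
a = np.array([3.0, 4.0])
b = np.array([4.0, 3.0])

# Cosine similarity of the raw vectors.
cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Normalize to unit length, as normalize_embeddings=True does for every
# embedding, then take a plain dot product.
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
dot_of_units = float(np.dot(a_unit, b_unit))

print(round(cosine, 4), round(dot_of_units, 4))  # 0.96 0.96
```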
## Switching Models

To switch to GTE-Large:

```bash
export EMBEDDING_MODEL="thenlper/gte-large"
```

Then restart the service. The model will be downloaded automatically on first use.
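Since the model downloads automatically on first use, a typo in `EMBEDDING_MODEL` only surfaces after a failed download. A hypothetical guard (the `resolve_model` helper and `KNOWN_MODELS` set below are illustrations, not part of the service) could fail fast at startup instead:

```python
import os

# Hypothetical allow-list built from the models this document recommends.
KNOWN_MODELS = {
    "BAAI/bge-large-en-v1.5",
    "thenlper/gte-large",
    "sentence-transformers/all-MiniLM-L6-v2",
}

def resolve_model() -> str:
    """Read EMBEDDING_MODEL and reject unknown names before any download."""
    name = os.getenv("EMBEDDING_MODEL", "BAAI/bge-large-en-v1.5")
    if name not in KNOWN_MODELS:
        raise ValueError(f"Unknown embedding model: {name}")
    return name

os.environ["EMBEDDING_MODEL"] = "thenlper/gte-large"  # simulate the export
print(resolve_model())  # thenlper/gte-large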