
Embeddings Configuration

Recommended Embedding Models


For RAG (Retrieval-Augmented Generation), use a dedicated embedding model, NOT the generative LLM.
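The retrieval step those embeddings power can be sketched with toy hand-made vectors; a real pipeline would obtain the vectors from a model such as BAAI/bge-large-en-v1.5, but the ranking logic is the same:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" standing in for real model output
docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.0],
    "doc_c": [0.5, 0.5, 0.7],
}
query = [1.0, 0.0, 0.1]

# Retrieve the document whose embedding is closest to the query
best = max(docs, key=lambda d: cosine(query, docs[d]))
print(best)  # doc_a
```

In production the similarity search runs inside a vector store rather than a Python loop, but the criterion is the same: cosine similarity between query and document embeddings.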

  1. BGE-Large ⭐ (Currently configured)

    • Model: BAAI/bge-large-en-v1.5
    • Best for: General RAG, high quality embeddings
    • Size: ~1.3GB
    • Performance: Excellent
  2. GTE-Large

    • Model: thenlper/gte-large
    • Best for: Multilingual and general purpose
    • Size: ~1.3GB
    • Performance: Excellent
  3. OpenAI text-embedding-3-large

    • Requires OpenAI API key
    • Best for: Production with OpenAI integration
    • Cost: Pay per token

Why NOT use LLaMA for embeddings?

  • LLaMA is a generative model, not optimized for embeddings
  • Dedicated embedding models are:
    • Faster for embedding generation
    • Better at semantic similarity
    • Smaller and more efficient
    • Trained specifically for retrieval tasks

Configuration

Environment Variables

# Embedding model (default: BAAI/bge-large-en-v1.5)
export EMBEDDING_MODEL="BAAI/bge-large-en-v1.5"

# Device for embeddings (cpu or cuda)
export EMBEDDING_DEVICE="cpu"  # Use "cuda" if GPU available

In Code

The embedding model is configured in main.py:

import os

# HuggingFaceEmbeddings lives in langchain_community
# (langchain_huggingface in newer releases)
from langchain_community.embeddings import HuggingFaceEmbeddings

embedding_model = os.getenv("EMBEDDING_MODEL", "BAAI/bge-large-en-v1.5")
device = os.getenv("EMBEDDING_DEVICE", "cpu")

embeddings = HuggingFaceEmbeddings(
    model_name=embedding_model,
    model_kwargs={"device": device},
    encode_kwargs={"normalize_embeddings": True}
)

Performance Tips

  1. Use GPU for embeddings if available:

    export EMBEDDING_DEVICE="cuda"
  2. Normalize embeddings (already enabled):

    • Improves similarity search quality
    • Better cosine similarity calculations
  3. Model size vs. quality trade-off:

    • bge-large: Best quality, slower
    • bge-base: Good balance
    • all-MiniLM-L6-v2: Fast, lower quality
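To see why normalization (tip 2) helps: once vectors are unit-length, a plain dot product equals cosine similarity, so similarity search reduces to the cheapest possible operation. A minimal pure-Python sketch with toy vectors, no embedding model required:

```python
import math

def cosine(a, b):
    """Cosine similarity between two raw (unnormalized) vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

a = [3.0, 4.0]
b = [4.0, 3.0]

# For unit vectors, the dot product alone IS the cosine similarity
dot_of_units = sum(x * y for x, y in zip(normalize(a), normalize(b)))
print(round(dot_of_units, 2))  # 0.96
print(round(cosine(a, b), 2))  # 0.96
```

This is exactly what `encode_kwargs={"normalize_embeddings": True}` enables: the vector store can rank by dot product and still get correct cosine rankings.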

Switching Models

To switch to GTE-Large:

export EMBEDDING_MODEL="thenlper/gte-large"

Then restart the service. The model will be downloaded automatically on first use.
