Weaviate Service

FastAPI service for storing and retrieving RAG documents in a Weaviate vector database.

Overview

This service receives RAG documents from the /prepare-rag endpoint of pdf-parser-service, generates embeddings with the BGE-Large model, and stores them in Weaviate for semantic search.

Features

  • Store RAG Documents: Receive documents from PDF parser and store in Weaviate
  • Semantic Search: Search for similar parts using natural language queries
  • Embedding Generation: Automatic embedding generation using the BGE-Large model
  • Metadata Filtering: Filter results by part number, source PDF, page, etc.

Architecture

pdf-parser-service (/prepare-rag)
    ↓ JSON documents
weaviate-service (/store)
    ↓ Generate embeddings
    ↓ Store in Weaviate
Weaviate Vector Database

Local Development

Prerequisites

  • Python 3.11+
  • Weaviate instance running (local or remote)
  • BGE-Large model (downloaded automatically on first run)

Installation

cd weaviate_service
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt

Environment Variables

Create a .env file:

For Weaviate Cloud (recommended):

WEAVIATE_URL=https://your-cluster.c0.us-east1.gcp.weaviate.cloud
WEAVIATE_API_KEY=your-weaviate-api-key-here
EMBEDDING_MODEL=BAAI/bge-large-en-v1.5
EMBEDDING_DEVICE=cpu  # or cuda if GPU available
PORT=8002

For Self-hosted Weaviate:

WEAVIATE_URL=http://localhost:8080
WEAVIATE_API_KEY=  # Optional, if Weaviate requires authentication
EMBEDDING_MODEL=BAAI/bge-large-en-v1.5
EMBEDDING_DEVICE=cpu  # or cuda if GPU available
PORT=8002

Running the Service

uvicorn main:app --reload --port 8002

The service will be available at http://localhost:8002

API documentation: http://localhost:8002/docs
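To verify the service is up, hit the health endpoint (assuming the default port 8002):

curl http://localhost:8002/health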

API Endpoints

POST /store

Store RAG documents in Weaviate using batch API (optimized for performance).

⚠️ Important: Always send all documents in one request!

The service uses Weaviate's batch API, which is 10-100x faster than storing documents one by one.

Request:

{
  "documents": [
    {
      "content": "Item Number: 1\nPart Number: ACH02226\n...",
      "metadata": {
        "item_number": "1",
        "part_number": "ACH02226",
        "description": "REEL BEARING FLOATING ASSY",
        "quantity": "5",
        "page": 0,
        "source_pdf": "F5-540-F5-540C.pdf",
        "bbox": [10.44, 31.23, 11.68, 33.09]
      }
    },
    {
      "content": "Item Number: 2\nPart Number: ACN00046\n...",
      "metadata": {...}
    }
  ],
  "source_pdf": "F5-540-F5-540C.pdf"
}

Response:

{
  "success": true,
  "stored_count": 2,
  "failed_count": 0,
  "errors": []
}

Performance:

  • Batch processing: All documents processed together
  • Batch size: 100 documents per batch
  • Parallel workers: 2 concurrent workers
  • Speed: 10-100x faster than individual storage
  • Automatic fallback: Falls back to individual storage if batch fails

Best Practice:

  • Send all documents from one PDF in a single request
  • Typical PDF with 40-100 parts: send all at once
  • Don't split them into multiple requests; it's much slower (see the example below)
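A minimal sketch of a batch /store call with Python's requests library; the payload shape follows the example above, and manual.pdf is a placeholder file name:

import requests

# Collect every document for the PDF first, then send them in a single request
documents = [
    {
        "content": "Item Number: 1\nPart Number: ACH02226\n...",
        "metadata": {"part_number": "ACH02226", "page": 0, "source_pdf": "manual.pdf"},
    },
    # ... all remaining documents from the same PDF
]

response = requests.post(
    "http://localhost:8002/store",
    json={"documents": documents, "source_pdf": "manual.pdf"},
)
result = response.json()
print(f"stored={result['stored_count']} failed={result['failed_count']}")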

POST /search

Semantic search in Weaviate (full search with all options).

Request:

{
  "query": "bearing for reel",
  "limit": 10,
  "where": {
    "path": ["source_pdf"],
    "operator": "Equal",
    "valueString": "F5-540-F5-540C.pdf"
  },
  "score_threshold": 0.7
}

Response:

{
  "query": "bearing for reel",
  "results": [
    {
      "content": "Item Number: 1\nPart Number: ACH02226\n...",
      "metadata": {...},
      "score": 0.85
    }
  ],
  "count": 1
}
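For illustration, the same search issued from Python (a sketch; the where filter mirrors the request above):

import requests

# Semantic search restricted to a single source PDF
response = requests.post(
    "http://localhost:8002/search",
    json={
        "query": "bearing for reel",
        "limit": 10,
        "where": {
            "path": ["source_pdf"],
            "operator": "Equal",
            "valueString": "F5-540-F5-540C.pdf",
        },
        "score_threshold": 0.7,
    },
)
for hit in response.json()["results"]:
    print(hit["score"], hit["metadata"]["part_number"])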

POST /retrieve (Recommended for LLM Agents)

RAG retrieval endpoint optimized for LLM agents. Returns documents in a format suitable for context injection (see the example below).

Request:

{
  "query": "bearing for reel",
  "limit": 5,
  "source_pdf": "F5-540-F5-540C.pdf",
  "part_number": null,
  "page": null,
  "score_threshold": 0.7,
  "include_metadata": true
}

Response:

{
  "query": "bearing for reel",
  "documents": [
    {
      "content": "Item Number: 1\nPart Number: ACH02226\nDescription: REEL BEARING FLOATING ASSY\n...",
      "metadata": {
        "item_number": "1",
        "part_number": "ACH02226",
        "description": "REEL BEARING FLOATING ASSY",
        "quantity": "5",
        "page": 0,
        "source_pdf": "F5-540-F5-540C.pdf",
        "bbox": [10.44, 31.23, 11.68, 33.09]
      },
      "score": 0.85
    }
  ],
  "count": 1,
  "context": "[1] Part: ACH02226 | REEL BEARING FLOATING ASSY | Item #1 | Item Number: 1\nPart Number: ACH02226\n...",
  "filters": {
    "source_pdf": "F5-540-F5-540C.pdf",
    "part_number": null,
    "page": null
  }
}

Key Features:

  • ✅ Optimized default limit (5 documents) for RAG
  • ✅ Easy filtering by source_pdf, part_number, page
  • ✅ Pre-formatted context string ready for LLM injection
  • ✅ Structured response format
  • ✅ Optional metadata inclusion
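A sketch of how an agent might call /retrieve and inject the pre-formatted context into a prompt (the prompt template itself is illustrative):

import requests

response = requests.post(
    "http://localhost:8002/retrieve",
    json={"query": "bearing for reel", "limit": 5, "source_pdf": "F5-540-F5-540C.pdf"},
)
rag = response.json()

# The "context" field is already formatted for LLM injection
prompt = f"Answer using only these parts:\n{rag['context']}\n\nQuestion: bearing for reel"
print(prompt)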

GET /health

Health check endpoint.

GET /schema

Get Weaviate schema.

DELETE /schema

Delete Weaviate schema (use with caution!).
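Quick examples for the schema endpoints (again assuming localhost:8002):

curl http://localhost:8002/schema            # inspect the current schema
curl -X DELETE http://localhost:8002/schema  # destructive: drops the schema and all stored documents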

Integration with PDF Parser Service and LLM Agents

Workflow

  1. Parse PDF: pdf-parser-service /parse and /parse-table
  2. Prepare RAG: pdf-parser-service /prepare-rag
  3. Store in Weaviate: weaviate-service /store
  4. Retrieve for LLM: weaviate-service /retrieve (used by LLM agents)

Example Integration

import requests

# Step 1: Parse PDF
pdf_response = requests.post(
    "http://localhost:8000/parse",
    files={"file": open("manual.pdf", "rb")}
)
schema_data = pdf_response.json()

table_response = requests.post(
    "http://localhost:8000/parse-table",
    files={"file": open("manual.pdf", "rb")}
)
table_data = table_response.json()

# Step 2: Prepare RAG
rag_response = requests.post(
    "http://localhost:8000/prepare-rag",
    json={
        "tables": table_data,
        "schemas": schema_data,
        "source_pdf": "manual.pdf"
    }
)
rag_documents = rag_response.json()

# Step 3: Store in Weaviate
store_response = requests.post(
    "http://localhost:8002/store",
    json={
        "documents": rag_documents["documents"],
        "source_pdf": "manual.pdf"
    }
)
print(store_response.json())

# Step 4: Retrieve for LLM Agent
retrieve_response = requests.post(
    "http://localhost:8002/retrieve",
    json={
        "query": "bearing for reel",
        "limit": 5,
        "source_pdf": "manual.pdf"
    }
)
rag_context = retrieve_response.json()["context"]
# Use rag_context in LLM prompt

Deployment to GCP

Prerequisites

  1. GCP project with billing enabled
  2. gcloud CLI installed and authenticated
  3. Weaviate instance deployed (see deployment section below)
  4. Environment variables in .env.deploy (project root):
    • PROJECT_ID
    • REGION
    • WEAVIATE_URL
    • WEAVIATE_API_KEY (if required)

Deploy to Cloud Run

cd weaviate_service
chmod +x deploy.sh
./deploy.sh

Environment Variables for Cloud Run

Add to .env.deploy (project root):

# Weaviate Service
_WEAVIATE_URL=https://your-weaviate-instance.run.app
_WEAVIATE_API_KEY=your-api-key  # Optional
_EMBEDDING_MODEL=BAAI/bge-large-en-v1.5
_EMBEDDING_DEVICE=cpu

Update cloudbuild.yaml with these substitutions.
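If deploy.sh wraps Cloud Build, the substitutions can also be passed on the command line; a sketch, assuming the build config lives in cloudbuild.yaml:

gcloud builds submit \
  --config cloudbuild.yaml \
  --substitutions=_WEAVIATE_URL=https://your-weaviate-instance.run.app,_EMBEDDING_MODEL=BAAI/bge-large-en-v1.5,_EMBEDDING_DEVICE=cpu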

Weaviate Schema

The service automatically creates a schema with the following properties (an equivalent class definition is sketched after the list):

  • content (text): Document content for semantic search
  • item_number (string): Item number in diagram
  • part_number (string): Part number for exact search
  • description (text): Part description
  • quantity (string): Quantity
  • notes (text): Additional notes
  • page (int): Page number
  • source_pdf (string): Source PDF file name
  • type (string): Document type
  • bbox (number[]): Bounding box coordinates
  • table_bbox (number[]): Table bounding box
  • schema_location_count (int): Number of schema locations
  • schema_locations (object): Schema locations (stored as JSON string)
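For reference, an equivalent (abridged) class definition with the v3 weaviate-client might look like the sketch below; the class name RagDocument is an assumption, since the service creates the schema itself:

import weaviate

client = weaviate.Client("http://localhost:8080")

# Vectors are computed by the service with BGE-Large, so no server-side vectorizer
client.schema.create_class({
    "class": "RagDocument",  # assumed class name
    "vectorizer": "none",
    "properties": [
        {"name": "content", "dataType": ["text"]},
        {"name": "part_number", "dataType": ["string"]},
        {"name": "page", "dataType": ["int"]},
        {"name": "source_pdf", "dataType": ["string"]},
        {"name": "bbox", "dataType": ["number[]"]},
    ],
})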

Performance Considerations

Batch vs Individual Storage

✅ Always use batch storage (send all documents in one request):

  • 10-100x faster than individual storage
  • Processes 100 documents per batch
  • Uses 2 parallel workers (see the sketch below)
  • Automatic fallback if batch fails

❌ Don't send documents one by one:

  • Much slower (each document = separate HTTP request)
  • Higher latency
  • More network overhead
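Internally, the batching described above maps onto the v3 weaviate-client roughly as follows (a sketch; the class name RagDocument and the placeholder data are assumptions):

import weaviate

client = weaviate.Client("http://localhost:8080")

# Placeholder data; in the service these come from /store payloads and BGE-Large
documents = [{"content": "Part Number: ACH02226", "metadata": {"part_number": "ACH02226"}}]
embeddings = [[0.0] * 1024]  # BGE-Large vectors are 1024-dimensional

# 100 objects per batch, 2 parallel workers, matching the numbers above
client.batch.configure(batch_size=100, num_workers=2)

with client.batch as batch:
    for doc, vector in zip(documents, embeddings):
        batch.add_data_object(
            data_object={"content": doc["content"], **doc["metadata"]},
            class_name="RagDocument",  # assumed class name
            vector=vector,             # precomputed embedding
        )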

Other Performance Tips

  • Embedding Generation: The first request may be slow while the model loads (~30s)
  • Batch Embeddings: Documents are embedded in batches of 32 (sketched after this list)
  • Memory: BGE-Large model requires ~1.5GB RAM
  • GPU: Set EMBEDDING_DEVICE=cuda for faster embeddings (if GPU available)
  • Typical Performance:
    • 100 documents: ~5-10 seconds (with embeddings)
    • 1000 documents: ~30-60 seconds (with embeddings)
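The embedding step likely corresponds to a sentence-transformers call along these lines (a sketch; batch size 32 and L2 normalization follow common BGE usage):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5", device="cpu")

texts = ["Item Number: 1\nPart Number: ACH02226"]
# 32 texts per forward pass; normalized vectors suit cosine similarity search
embeddings = model.encode(texts, batch_size=32, normalize_embeddings=True)
print(embeddings.shape)  # (1, 1024)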

Troubleshooting

Connection Errors

  • Verify WEAVIATE_URL is correct
  • Check that the Weaviate instance is running (a quick test follows below)
  • Verify the API key if authentication is required
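A quick connectivity test with the v3 weaviate-client (a sketch; substitute your URL and key):

import weaviate

client = weaviate.Client(
    url="http://localhost:8080",
    auth_client_secret=weaviate.AuthApiKey(api_key="your-api-key"),  # omit if no auth
)
print(client.is_ready())  # True if the instance is reachable and live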

Embedding Model Errors

  • Ensure sufficient disk space for model download (~1.5GB)
  • Check internet connection for model download
  • Verify EMBEDDING_DEVICE matches available hardware

Storage Errors

  • Verify Weaviate schema is created
  • Check Weaviate instance has sufficient storage
  • Review Weaviate logs for detailed errors
