# Weaviate Service

FastAPI service for storing and retrieving RAG documents in the Weaviate vector database.
## Overview

This service receives RAG documents from the `/prepare-rag` endpoint of `pdf-parser-service`, generates embeddings with the BGE-Large model, and stores them in Weaviate for semantic search.
## Features

- **Store RAG Documents**: Receive documents from the PDF parser and store them in Weaviate
- **Semantic Search**: Search for similar parts using natural-language queries
- **Embedding Generation**: Automatic embedding generation with the BGE-Large model
- **Metadata Filtering**: Filter results by part number, source PDF, page, etc.
## Architecture

```
pdf-parser-service (/prepare-rag)
        ↓ JSON documents
weaviate-service (/store)
        ↓ Generate embeddings
        ↓ Store in Weaviate
Weaviate Vector Database
```

## Local Development
### Prerequisites

- Python 3.11+
- A running Weaviate instance (local or remote)
- BGE-Large model (downloaded automatically on first run)
### Installation

```bash
cd weaviate_service
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

### Environment Variables
Create a `.env` file.

For Weaviate Cloud (recommended):

```bash
WEAVIATE_URL=https://your-cluster.c0.us-east1.gcp.weaviate.cloud
WEAVIATE_API_KEY=your-weaviate-api-key-here
EMBEDDING_MODEL=BAAI/bge-large-en-v1.5
EMBEDDING_DEVICE=cpu  # or cuda if a GPU is available
PORT=8002
```

For self-hosted Weaviate:

```bash
WEAVIATE_URL=http://localhost:8080
WEAVIATE_API_KEY=  # Optional, if Weaviate requires authentication
EMBEDDING_MODEL=BAAI/bge-large-en-v1.5
EMBEDDING_DEVICE=cpu  # or cuda if a GPU is available
PORT=8002
```

### Running the Service

```bash
uvicorn main:app --reload --port 8002
```

The service will be available at http://localhost:8002.

API documentation: http://localhost:8002/docs
## API Endpoints

### POST /store

Store RAG documents in Weaviate using the batch API (optimized for performance).

⚠️ **Important: Always send all documents in one request!**

The service uses Weaviate's batch API, which is 10-100x faster than storing documents one by one.
Request:

```json
{
  "documents": [
    {
      "content": "Item Number: 1\nPart Number: ACH02226\n...",
      "metadata": {
        "item_number": "1",
        "part_number": "ACH02226",
        "description": "REEL BEARING FLOATING ASSY",
        "quantity": "5",
        "page": 0,
        "source_pdf": "F5-540-F5-540C.pdf",
        "bbox": [10.44, 31.23, 11.68, 33.09]
      }
    },
    {
      "content": "Item Number: 2\nPart Number: ACN00046\n...",
      "metadata": {...}
    }
  ],
  "source_pdf": "F5-540-F5-540C.pdf"
}
```

Response:

```json
{
  "success": true,
  "stored_count": 2,
  "failed_count": 0,
  "errors": []
}
```

**Performance:**
- ✅ Batch processing: All documents processed together
- ✅ Batch size: 100 documents per batch
- ✅ Parallel workers: 2 concurrent workers
- ✅ Speed: 10-100x faster than individual storage
- ✅ Automatic fallback: Falls back to individual storage if batch fails
**Best Practice:**
- Send all documents from one PDF in a single request
- Typical PDF with 40-100 parts: send all at once
- Don't split into multiple requests - it's much slower!
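The single-request rule can be sketched client-side. A minimal sketch: the helper name `build_store_payload` and the sample documents are illustrative, not part of the service API.

```python
def build_store_payload(documents, source_pdf):
    """Collect ALL of a PDF's documents into one /store request body."""
    return {"documents": documents, "source_pdf": source_pdf}

# Two example documents (shapes mirror the request schema above)
docs = [
    {"content": "Item Number: 1\nPart Number: ACH02226", "metadata": {"page": 0}},
    {"content": "Item Number: 2\nPart Number: ACN00046", "metadata": {"page": 0}},
]
payload = build_store_payload(docs, "F5-540-F5-540C.pdf")

# One request for the whole PDF -- never one request per document:
# import requests
# requests.post("http://localhost:8002/store", json=payload)
```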
### POST /search

Semantic search in Weaviate (full search with all options).
Request:

```json
{
  "query": "bearing for reel",
  "limit": 10,
  "where": {
    "path": ["source_pdf"],
    "operator": "Equal",
    "valueString": "F5-540-F5-540C.pdf"
  },
  "score_threshold": 0.7
}
```

Response:

```json
{
  "query": "bearing for reel",
  "results": [
    {
      "content": "Item Number: 1\nPart Number: ACH02226\n...",
      "metadata": {...},
      "score": 0.85
    }
  ],
  "count": 1
}
```

### POST /retrieve ⭐ Recommended for LLM Agents
RAG retrieval endpoint optimized for LLM agents. Returns documents in a format suitable for context injection.
Request:

```json
{
  "query": "bearing for reel",
  "limit": 5,
  "source_pdf": "F5-540-F5-540C.pdf",
  "part_number": null,
  "page": null,
  "score_threshold": 0.7,
  "include_metadata": true
}
```

Response:

```json
{
  "query": "bearing for reel",
  "documents": [
    {
      "content": "Item Number: 1\nPart Number: ACH02226\nDescription: REEL BEARING FLOATING ASSY\n...",
      "metadata": {
        "item_number": "1",
        "part_number": "ACH02226",
        "description": "REEL BEARING FLOATING ASSY",
        "quantity": "5",
        "page": 0,
        "source_pdf": "F5-540-F5-540C.pdf",
        "bbox": [10.44, 31.23, 11.68, 33.09]
      },
      "score": 0.85
    }
  ],
  "count": 1,
  "context": "[1] Part: ACH02226 | REEL BEARING FLOATING ASSY | Item #1 | Item Number: 1\nPart Number: ACH02226\n...",
  "filters": {
    "source_pdf": "F5-540-F5-540C.pdf",
    "part_number": null,
    "page": null
  }
}
```

**Key Features:**

- ✅ Optimized default limit (5 documents) for RAG
- ✅ Easy filtering by `source_pdf`, `part_number`, `page`
- ✅ Pre-formatted `context` string ready for LLM injection
- ✅ Structured response format
- ✅ Optional metadata inclusion
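A request body for `/retrieve` can be assembled as below. The helper is hypothetical; the field names simply mirror the request schema shown above.

```python
def build_retrieve_request(query, source_pdf=None, part_number=None, page=None,
                           limit=5, score_threshold=0.7, include_metadata=True):
    """Mirror the /retrieve request schema documented above."""
    return {
        "query": query,
        "limit": limit,
        "source_pdf": source_pdf,
        "part_number": part_number,
        "page": page,
        "score_threshold": score_threshold,
        "include_metadata": include_metadata,
    }

req = build_retrieve_request("bearing for reel", source_pdf="F5-540-F5-540C.pdf")

# response = requests.post("http://localhost:8002/retrieve", json=req).json()
# The pre-formatted context can then go straight into a prompt, e.g.:
# prompt = f"Use these parts:\n{response['context']}\n\nQuestion: {question}"
```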
### GET /health

Health check endpoint.

### GET /schema

Get the Weaviate schema.

### DELETE /schema

Delete the Weaviate schema (use with caution!).
## Integration with PDF Parser Service and LLM Agents

### Workflow

1. **Parse PDF** → `pdf-parser-service` `/parse` and `/parse-table`
2. **Prepare RAG** → `pdf-parser-service` `/prepare-rag`
3. **Store in Weaviate** → `weaviate-service` `/store`
4. **Retrieve for LLM** → `weaviate-service` `/retrieve` (used by LLM agents)
### Example Integration

```python
import requests

# Step 1: Parse PDF
pdf_response = requests.post(
    "http://localhost:8000/parse",
    files={"file": open("manual.pdf", "rb")},
)
schema_data = pdf_response.json()

table_response = requests.post(
    "http://localhost:8000/parse-table",
    files={"file": open("manual.pdf", "rb")},
)
table_data = table_response.json()

# Step 2: Prepare RAG
rag_response = requests.post(
    "http://localhost:8000/prepare-rag",
    json={
        "tables": table_data,
        "schemas": schema_data,
        "source_pdf": "manual.pdf",
    },
)
rag_documents = rag_response.json()

# Step 3: Store in Weaviate
store_response = requests.post(
    "http://localhost:8002/store",
    json={
        "documents": rag_documents["documents"],
        "source_pdf": "manual.pdf",
    },
)
print(store_response.json())

# Step 4: Retrieve for LLM Agent
retrieve_response = requests.post(
    "http://localhost:8002/retrieve",
    json={
        "query": "bearing for reel",
        "limit": 5,
        "source_pdf": "manual.pdf",
    },
)
rag_context = retrieve_response.json()["context"]
# Use rag_context in the LLM prompt
```

## Deployment to GCP
### Prerequisites

- GCP project with billing enabled
- `gcloud` CLI installed and authenticated
- Weaviate instance deployed (see deployment section below)
- Environment variables in `.env.deploy` (project root): `PROJECT_ID`, `REGION`, `WEAVIATE_URL`, `WEAVIATE_API_KEY` (if required)
### Deploy to Cloud Run

```bash
cd weaviate_service
chmod +x deploy.sh
./deploy.sh
```

### Environment Variables for Cloud Run

Add to `.env.deploy` (project root):

```bash
# Weaviate Service
_WEAVIATE_URL=https://your-weaviate-instance.run.app
_WEAVIATE_API_KEY=your-api-key  # Optional
_EMBEDDING_MODEL=BAAI/bge-large-en-v1.5
_EMBEDDING_DEVICE=cpu
```

Update `cloudbuild.yaml` with these substitutions.
## Weaviate Schema

The service automatically creates a schema with the following properties:

- `content` (text): Document content for semantic search
- `item_number` (string): Item number in diagram
- `part_number` (string): Part number for exact search
- `description` (text): Part description
- `quantity` (string): Quantity
- `notes` (text): Additional notes
- `page` (int): Page number
- `source_pdf` (string): Source PDF file name
- `type` (string): Document type
- `bbox` (number[]): Bounding box coordinates
- `table_bbox` (number[]): Table bounding box
- `schema_location_count` (int): Number of schema locations
- `schema_locations` (object): Schema locations (stored as JSON string)
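For reference, the auto-created class likely corresponds to a Weaviate REST schema along these lines. This is a sketch: the class name `RAGDocument` and the exact `dataType` mappings are assumptions, not confirmed by the service code. `schema_locations` is typed `text` here because the list above notes it is stored as a JSON string.

```json
{
  "class": "RAGDocument",
  "properties": [
    {"name": "content", "dataType": ["text"]},
    {"name": "item_number", "dataType": ["string"]},
    {"name": "part_number", "dataType": ["string"]},
    {"name": "description", "dataType": ["text"]},
    {"name": "quantity", "dataType": ["string"]},
    {"name": "notes", "dataType": ["text"]},
    {"name": "page", "dataType": ["int"]},
    {"name": "source_pdf", "dataType": ["string"]},
    {"name": "type", "dataType": ["string"]},
    {"name": "bbox", "dataType": ["number[]"]},
    {"name": "table_bbox", "dataType": ["number[]"]},
    {"name": "schema_location_count", "dataType": ["int"]},
    {"name": "schema_locations", "dataType": ["text"]}
  ]
}
```

You can compare against the live schema via `GET /schema`.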
## Performance Considerations

### Batch vs Individual Storage

✅ **Always use batch storage** (send all documents in one request):

- 10-100x faster than individual storage
- Processes 100 documents per batch
- Uses 2 parallel workers
- Automatic fallback if a batch fails

❌ **Don't send documents one by one:**

- Much slower (each document = a separate HTTP request)
- Higher latency
- More network overhead
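The network-overhead gap can be illustrated with back-of-the-envelope numbers; the ~50 ms round-trip time below is an assumed figure for illustration, not a measurement of the service.

```python
# Assumed: 100 documents, ~50 ms HTTP round-trip per request.
docs = 100
rtt_s = 0.05

individual_overhead = docs * rtt_s  # one request per document: 100 round trips
batch_overhead = 1 * rtt_s          # one request for everything: a single round trip

print(individual_overhead)  # ~5 seconds of pure network overhead
print(batch_overhead)       # ~0.05 seconds
```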
### Other Performance Tips

- **Embedding Generation**: The first request may be slow while the model loads (~30 s)
- **Batch Embeddings**: Documents are embedded in batches of 32
- **Memory**: The BGE-Large model requires ~1.5 GB RAM
- **GPU**: Set `EMBEDDING_DEVICE=cuda` for faster embeddings (if a GPU is available)
- **Typical Performance**:
  - 100 documents: ~5-10 seconds (with embeddings)
  - 1000 documents: ~30-60 seconds (with embeddings)
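The batch-of-32 embedding behavior boils down to simple chunking, sketched below. The helper is illustrative, and the commented `model.encode` call assumes the sentence-transformers API rather than the service's actual internals.

```python
def batches(items, size=32):
    """Yield successive fixed-size chunks, mirroring the embedder's batch size."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

texts = [f"Part document {i}" for i in range(100)]
chunks = list(batches(texts))
print([len(c) for c in chunks])  # [32, 32, 32, 4] -- 100 docs become 4 embedding batches

# Each chunk would then be passed to the embedding model, roughly:
# model.encode(chunk, batch_size=32, normalize_embeddings=True)
```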
## Troubleshooting

### Connection Errors

- Verify `WEAVIATE_URL` is correct
- Check that the Weaviate instance is running
- Verify the API key if authentication is required

### Embedding Model Errors

- Ensure sufficient disk space for the model download (~1.5 GB)
- Check the internet connection for the model download
- Verify `EMBEDDING_DEVICE` matches the available hardware

### Storage Errors

- Verify the Weaviate schema has been created
- Check that the Weaviate instance has sufficient storage
- Review Weaviate logs for detailed errors
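A quick way to rule out connection issues is to probe Weaviate's readiness endpoint. `/v1/.well-known/ready` is Weaviate's standard readiness check; the helper itself is a sketch, not part of the service.

```python
from urllib.request import urlopen
from urllib.error import URLError

def weaviate_reachable(url, timeout=3):
    """Return True if the Weaviate instance at `url` reports ready (HTTP 2xx)."""
    try:
        with urlopen(f"{url}/v1/.well-known/ready", timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (URLError, OSError, ValueError):
        return False

# Example: weaviate_reachable("http://localhost:8080")
```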
## Related Documentation

- **PDF Parser Service** - PDF parsing service
- **Data Preparation Service** - Data preparation scripts