MLflow Tracking Server for CROP AI Pipeline monitoring and experiment management.

MLflow Service

Overview

MLflow provides centralized tracking for machine learning experiments and data pipelines in the CROP ecosystem. It tracks:

Pipeline runs (PDF processing, embedding generation, RAG preparation)
Model experiments (embeddings, CLIP, LLaMA fine-tuning)
Artifacts (processed data, model checkpoints)
Metrics and parameters for reproducibility

Production Deployment

Cloud Run Service

Property	Value
URL	https://mlflow-service-atife5uvka-ue.a.run.app
Region	us-east1
Artifact Storage	`gs://mlflow-artifacts-noted-bliss-466410-q6/mlflow-artifacts/`
Backend Store	Cloud SQL / SQLite

Accessing the UI

Open in browser: https://mlflow-service-atife5uvka-ue.a.run.app

The MLflow UI provides:

Experiments: View all experiments with searchable runs
Runs: Compare runs, view metrics, parameters, and artifacts
Models: Model registry for versioning and deployment
Artifacts: Browse stored artifacts (models, data, etc.)

Local Development

Using Docker Compose

MLflow is included in the main docker-compose.yml:

```bash
cd CROP-pdf-parser-service
docker-compose up mlflow
```

Local MLflow will be available at: http://localhost:5000

Configuration

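```yaml
# docker-compose.yml (excerpt)
services:
  mlflow:
    image: ghcr.io/mlflow/mlflow:v2.8.1
    ports:
      - "5000:5000"
    # Start the tracking server explicitly; bind to 0.0.0.0 so the
    # published port is reachable from the host (the default is 127.0.0.1)
    command: >
      mlflow server
      --host 0.0.0.0
      --port 5000
      --backend-store-uri sqlite:///mlflow.db
      --default-artifact-root /mlflow/artifacts
```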

Environment Variables

Variable	Description	Default
`MLFLOW_TRACKING_URI`	MLflow server URL	http://localhost:5000
`MLFLOW_BACKEND_STORE_URI`	Database URI for metadata	sqlite:///mlflow.db
`MLFLOW_ARTIFACT_ROOT`	Root path for artifacts	/mlflow/artifacts
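A minimal sketch of how a client or helper script might resolve these variables with the defaults listed above. `resolve_settings` is a hypothetical helper, not part of MLflow; note that the MLflow Python client already reads `MLFLOW_TRACKING_URI` on its own when `mlflow.set_tracking_uri()` is not called.

```python
import os

# Defaults mirror the Environment Variables table
DEFAULTS = {
    "MLFLOW_TRACKING_URI": "http://localhost:5000",
    "MLFLOW_BACKEND_STORE_URI": "sqlite:///mlflow.db",
    "MLFLOW_ARTIFACT_ROOT": "/mlflow/artifacts",
}


def resolve_settings(env=None):
    """Return each documented variable, falling back to its default when unset or empty."""
    env = os.environ if env is None else env
    return {name: env.get(name) or default for name, default in DEFAULTS.items()}


if __name__ == "__main__":
    for name, value in resolve_settings().items():
        print(f"{name}={value}")
```

Passing an explicit mapping (for example `resolve_settings({})`) makes the function easy to test without touching the real environment.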

API Endpoints

MLflow exposes a REST API for programmatic access:

List Experiments

```bash
curl "https://mlflow-service-atife5uvka-ue.a.run.app/api/2.0/mlflow/experiments/search?max_results=100"
```
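The same call from Python, as a stdlib-only sketch. `search_experiments_url` and `list_experiment_names` are hypothetical helper names; the sketch assumes the response carries an `experiments` list, which may be absent when no experiments exist.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://mlflow-service-atife5uvka-ue.a.run.app"


def search_experiments_url(base_url, max_results=100):
    """Build the GET URL for /api/2.0/mlflow/experiments/search."""
    query = urllib.parse.urlencode({"max_results": max_results})
    return f"{base_url.rstrip('/')}/api/2.0/mlflow/experiments/search?{query}"


def list_experiment_names(base_url, max_results=100, timeout=10):
    """Fetch experiments and return their names (performs a network call)."""
    url = search_experiments_url(base_url, max_results)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    # Treat a missing "experiments" key as an empty result
    return [exp["name"] for exp in payload.get("experiments", [])]
```

For example, `list_experiment_names(BASE_URL)` returns the experiment names shown in the UI's Experiments list.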

Search Runs

```bash
curl -X POST "https://mlflow-service-atife5uvka-ue.a.run.app/api/2.0/mlflow/runs/search" \
  -H "Content-Type: application/json" \
  -d '{"experiment_ids": ["0"], "max_results": 10}'
```
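A stdlib-only sketch of the same POST request. Building the `Request` separately from sending it keeps the payload inspectable; `build_search_runs_request` and `search_runs` are hypothetical helper names, and the sketch assumes matching runs come back under a `runs` key.

```python
import json
import urllib.request


def build_search_runs_request(base_url, experiment_ids, max_results=10, order_by=None):
    """Build a POST request for /api/2.0/mlflow/runs/search without sending it."""
    body = {"experiment_ids": [str(i) for i in experiment_ids], "max_results": max_results}
    if order_by:
        body["order_by"] = list(order_by)
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/2.0/mlflow/runs/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search_runs(base_url, experiment_ids, max_results=10, order_by=None, timeout=10):
    """Send the request and return the list of runs (empty if none match)."""
    req = build_search_runs_request(base_url, experiment_ids, max_results, order_by)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp).get("runs", [])
```

Experiment IDs are sent as strings, matching the `["0"]` form in the curl example.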

Get Run Details

```bash
curl "https://mlflow-service-atife5uvka-ue.a.run.app/api/2.0/mlflow/runs/get?run_id=<RUN_ID>"
```
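The `runs/get` response nests parameters and metrics as lists of key/value objects under `run.data`. This sketch flattens them into plain dictionaries. `summarize_run` is a hypothetical helper, and `SAMPLE` is an illustrative, trimmed payload, not real output from this server.

```python
import json

# Hypothetical runs/get response, trimmed to the fields this sketch reads
SAMPLE = {
    "run": {
        "info": {"run_id": "abc123", "status": "FINISHED"},
        "data": {
            "params": [{"key": "document_number", "value": "SPD00805"}],
            "metrics": [{"key": "total_pages", "value": 100, "timestamp": 0, "step": 0}],
        },
    }
}


def summarize_run(payload):
    """Flatten a runs/get response into run id, status, params and metrics dicts."""
    run = payload["run"]
    data = run.get("data", {})
    # Params and metrics arrive as lists of {"key": ..., "value": ...} objects
    params = {p["key"]: p["value"] for p in data.get("params", [])}
    metrics = {m["key"]: float(m["value"]) for m in data.get("metrics", [])}
    return {
        "run_id": run["info"]["run_id"],
        "status": run["info"]["status"],
        "params": params,
        "metrics": metrics,
    }


if __name__ == "__main__":
    print(json.dumps(summarize_run(SAMPLE), indent=2))
```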

Integration with Data Preparation Service

The Data Preparation Service uses MLflow for pipeline tracking:

```python
import mlflow

# Set tracking URI
mlflow.set_tracking_uri("https://mlflow-service-atife5uvka-ue.a.run.app")

# Start a run
with mlflow.start_run(run_name="pdf-processing"):
    mlflow.log_param("pdf_path", "pdfs/manual.pdf")
    mlflow.log_param("document_number", "SPD00805")

    # Process PDF...

    mlflow.log_metric("total_pages", 100)
    mlflow.log_metric("documents_processed", 3)
    mlflow.log_metric("rag_documents_created", 150)
```

Tracked Metrics

Metric	Description
`pdf_size_bytes`	Size of downloaded PDF
`total_pages`	Total pages in PDF
`document_count`	Number of documents found
`documents_processed`	Documents successfully processed
`pages_processed`	Pages successfully processed
`rag_documents_created`	RAG documents created and stored
`images_stored`	Images with CLIP embeddings stored
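To keep metric names consistent with this table, a pipeline could build its metrics dictionary through a small validating helper and pass the result to `mlflow.log_metrics()` inside the active run. `build_pipeline_metrics` is a hypothetical helper, sketched here under that assumption.

```python
# Metric names copied from the Tracked Metrics table
TRACKED_METRICS = frozenset({
    "pdf_size_bytes", "total_pages", "document_count", "documents_processed",
    "pages_processed", "rag_documents_created", "images_stored",
})


def build_pipeline_metrics(**values):
    """Return a dict for mlflow.log_metrics(), rejecting names not in the table."""
    unknown = sorted(set(values) - TRACKED_METRICS)
    if unknown:
        raise ValueError(f"untracked metric names: {unknown}")
    # MLflow metrics are numeric; coerce to float so ints and floats log the same way
    return {name: float(value) for name, value in values.items()}
```

Usage inside a run: `mlflow.log_metrics(build_pipeline_metrics(total_pages=100, documents_processed=3))`. A typo such as `total_page` then fails loudly instead of creating a stray metric.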

Deployment to GCP Cloud Run

Prerequisites

GCP project with billing enabled
gcloud CLI installed and authenticated
Cloud SQL instance (or other persistent storage) for the backend store

Deploy MLflow Service

```bash
# Resolve the active gcloud project (or export PROJECT_ID yourself)
PROJECT_ID=$(gcloud config get-value project)

# Build and push Docker image
gcloud builds submit --tag gcr.io/$PROJECT_ID/mlflow-service

# Deploy to Cloud Run
gcloud run deploy mlflow-service \
  --image gcr.io/$PROJECT_ID/mlflow-service \
  --region us-east1 \
  --platform managed \
  --allow-unauthenticated \
  --memory 1Gi \
  --cpu 1 \
  --set-env-vars "BACKEND_STORE_URI=sqlite:///mlflow.db,DEFAULT_ARTIFACT_ROOT=gs://mlflow-artifacts-$PROJECT_ID/mlflow-artifacts/"
```
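With `sqlite:///mlflow.db`, the database file lives on the Cloud Run instance's in-memory filesystem, which is discarded when the instance is replaced, so run metadata does not persist. The sketch below builds a SQLAlchemy URI for a Cloud SQL PostgreSQL backend reached through the Unix socket that Cloud Run mounts under `/cloudsql/` when the service is deployed with `--add-cloudsql-instances`. It assumes PostgreSQL with `psycopg2` installed in the image; `cloud_sql_backend_uri` and all user, database, and instance names are hypothetical.

```python
from urllib.parse import quote


def cloud_sql_backend_uri(user, password, database, connection_name):
    """Build a SQLAlchemy URI for PostgreSQL over the Cloud SQL Unix socket.

    connection_name has the form PROJECT:REGION:INSTANCE.
    """
    # Percent-encode credentials so characters like '@' or '/' don't break the URI
    user_enc = quote(user, safe="")
    pw_enc = quote(password, safe="")
    return (
        f"postgresql+psycopg2://{user_enc}:{pw_enc}@/{database}"
        f"?host=/cloudsql/{connection_name}"
    )
```

The resulting string would replace `sqlite:///mlflow.db` as the `BACKEND_STORE_URI` value.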

Using GCS for Artifacts

For production, store artifacts in Google Cloud Storage:

```bash
# Create GCS bucket (same region as the Cloud Run service)
gsutil mb -l us-east1 gs://mlflow-artifacts-$PROJECT_ID

# Set artifact root
DEFAULT_ARTIFACT_ROOT=gs://mlflow-artifacts-$PROJECT_ID/mlflow-artifacts/
```
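A quick pre-flight check can catch a mistyped artifact root before deployment by splitting it into bucket and prefix and comparing the bucket against the one created above. `split_gcs_uri` is a hypothetical helper, shown as a sketch.

```python
def split_gcs_uri(uri):
    """Split gs://bucket/prefix/ into (bucket, prefix); raise on malformed input."""
    scheme = "gs://"
    if not uri.startswith(scheme):
        raise ValueError(f"not a GCS URI: {uri!r}")
    bucket, _, prefix = uri[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in {uri!r}")
    # Drop the trailing slash so the prefix compares cleanly
    return bucket, prefix.rstrip("/")
```

For the production root, this yields the bucket `mlflow-artifacts-noted-bliss-466410-q6` and the prefix `mlflow-artifacts`.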

Use Cases

1. PDF Processing Pipeline Tracking

Track each PDF processing run with:

Input parameters (PDF path, document number)
Processing metrics (pages, documents, time)
Output artifacts (processed data)

2. Embedding Model Experiments

Track embedding model experiments:

Model parameters (model name, dimensions)
Performance metrics (latency, throughput)
Model artifacts (checkpoints, configs)

3. LLaMA Fine-tuning

Track fine-tuning experiments:

Training parameters (learning rate, epochs)
Evaluation metrics (loss, accuracy)
Model checkpoints
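One way to keep runs comparable across these use cases is to agree on required parameter keys per use case and check them before logging. In this sketch, only `pdf_path` and `document_number` come from the integration example; the other keys, the use-case names, and `missing_params` are hypothetical.

```python
# Hypothetical parameter keys per use case; only pdf_path and document_number
# appear in the integration example, the rest are illustrative
REQUIRED_PARAMS = {
    "pdf-processing": {"pdf_path", "document_number"},
    "embedding-experiment": {"model_name", "dimensions"},
    "llama-finetuning": {"learning_rate", "epochs"},
}


def missing_params(use_case, params):
    """Return the sorted list of required keys absent from params for a use case."""
    try:
        required = REQUIRED_PARAMS[use_case]
    except KeyError:
        raise ValueError(f"unknown use case: {use_case!r}") from None
    return sorted(required - set(params))
```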

Monitoring

Viewing Pipeline Runs

Open MLflow UI: https://mlflow-service-atife5uvka-ue.a.run.app
Select experiment (e.g., "Default")
Browse runs with filters and search
Compare runs side-by-side

Pipeline Health

Check recent runs via API:

```bash
curl -X POST "https://mlflow-service-atife5uvka-ue.a.run.app/api/2.0/mlflow/runs/search" \
  -H "Content-Type: application/json" \
  -d '{
    "experiment_ids": ["0"],
    "max_results": 10,
    "order_by": ["start_time DESC"]
  }'
```
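The `runs` list returned by this query can be reduced to a simple health signal by counting run statuses (MLflow reports values such as `RUNNING`, `FINISHED`, `FAILED`, and `KILLED` in each run's `info.status`). `run_health` is a hypothetical helper. Treating `KILLED` as unhealthy is a design choice of this sketch.

```python
from collections import Counter


def run_health(runs):
    """Count runs by status; the batch is healthy only if none FAILED or were KILLED."""
    statuses = Counter(run["info"]["status"] for run in runs)
    healthy = statuses["FAILED"] == 0 and statuses["KILLED"] == 0
    return healthy, dict(statuses)
```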

Troubleshooting

Common Issues

Connection refused: Ensure MLflow server is running and accessible
Artifact upload failed: Check GCS permissions and bucket access
Run not found: Verify experiment ID and run ID
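For "Connection refused", a scripted reachability check can help. This sketch assumes the tracking server exposes MLflow's `/health` endpoint (served by the MLflow 2.x tracking server). `server_reachable` is a hypothetical helper. The `opener` parameter exists only so the check can be tested without a network.

```python
import urllib.error
import urllib.request


def server_reachable(base_url, timeout=5, opener=urllib.request.urlopen):
    """Return True if GET <base_url>/health answers HTTP 200, False on any connection error."""
    try:
        with opener(f"{base_url.rstrip('/')}/health", timeout=timeout) as resp:
            return resp.status == 200
    # URLError, HTTPError, refused connections and timeouts all derive from OSError
    except OSError:
        return False
```

For example, `server_reachable("http://localhost:5000")` returns `False` when the local Docker Compose service is not running.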

Logs

Cloud Run logs:

```bash
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=mlflow-service" --limit=50
```

Related Documentation

Data Preparation Service - Uses MLflow for pipeline tracking
AI Service - Model experiment tracking
MLflow Official Docs - MLflow documentation
MLflow REST API - API reference

Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                    MLflow Tracking Server                   │
│              (Cloud Run: us-east1)                          │
├─────────────────────────────────────────────────────────────┤
│  UI: https://mlflow-service-atife5uvka-ue.a.run.app         │
│  API: /api/2.0/mlflow/*                                     │
└────────────────────────┬────────────────────────────────────┘
                         │
         ┌───────────────┼───────────────┐
         │               │               │
         ▼               ▼               ▼
┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│   Backend   │  │  Artifacts  │  │   Clients   │
│   Store     │  │   (GCS)     │  │             │
├─────────────┤  ├─────────────┤  ├─────────────┤
│ SQLite/     │  │ gs://mlflow │  │ Data Prep   │
│ Cloud SQL   │  │ -artifacts  │  │ AI Service  │
│             │  │             │  │ Notebooks   │
└─────────────┘  └─────────────┘  └─────────────┘
```

Status

Component	Status
Cloud Run Service	Active
Artifact Storage (GCS)	Active
Backend Store	Active
UI Access	Public
