LLM Deploy
vLLM deployment infrastructure for LLaMA 3.1 8B (Meta-Llama-3.1-8B-Instruct) on GPU VMs.
Repository: CT-CROP/CROP-LLM-Deploy Last updated: 2026-02-02 Last synced to docs: 2026-03-10
Quick Start
```bash
cp .env.example .env

# Create GPU VM (L4 recommended for LLaMA 8B)
bash scripts/deploy/create_gpu_vm.sh --gpu l4

# Deploy
bash scripts/deploy/setup_vllm.sh

# Verify
python src/main.py
```

GPU Recommendations
| GPU | VRAM | Recommendation |
|---|---|---|
| L4 | 24GB | Best cost/performance ratio |
| A100 | 40GB | Best performance, production use |
| V100 | 32GB | Good alternative |
| T4 | 16GB | Not recommended (OOM risk) |
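The table above follows from some back-of-the-envelope arithmetic. A rough sketch, using the published Meta-Llama-3.1-8B-Instruct config (32 layers, 8 KV heads, head dim 128) and bf16 weights; real vLLM usage adds activation and runtime overhead, so treat these numbers as a lower bound:

```python
# Rough VRAM estimate for serving Meta-Llama-3.1-8B-Instruct in bf16.
PARAMS = 8e9          # ~8B parameters
BYTES_PER_PARAM = 2   # bf16

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9   # model weights alone

# KV cache per token: 2 (K and V) * layers * kv_heads * head_dim * bytes
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128
kv_bytes_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_PARAM

# KV cache for one full 8192-token sequence:
kv_gb_8k = 8192 * kv_bytes_per_token / 1e9

print(f"weights: {weights_gb:.1f} GB, KV cache per 8k-token seq: {kv_gb_8k:.2f} GB")
# weights: 16.0 GB, KV cache per 8k-token seq: 1.07 GB
```

With ~16 GB of weights before any KV cache is allocated, a 16GB T4 cannot fit the model at all, while a 24GB L4 leaves roughly 8 GB for KV cache and overhead.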
Scripts
```bash
# Deploy
bash scripts/deploy/create_gpu_vm.sh --gpu l4    # Create VM
bash scripts/deploy/setup_vllm.sh                # Setup vLLM

# Monitor
bash scripts/monitor/connect_vm.sh               # SSH to VM
bash scripts/monitor/view_logs.sh -f             # Follow logs
bash scripts/monitor/check_disk.sh               # Disk usage

# Maintain
bash scripts/maintain/cleanup.sh                 # Cleanup Docker/logs
bash scripts/maintain/delete_vm.sh               # Delete VM
bash scripts/maintain/schedule_vm_shutdown.sh    # Auto-shutdown schedule
```

API Testing
```bash
curl http://localhost:8000/v1/models

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Meta-Llama-3.1-8B-Instruct", "messages": [{"role": "user", "content": "Hello"}]}'
```

Related Documentation
- LLM Deploy docs — deployment overview
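The curl calls under API Testing can also be scripted in Python. A minimal sketch using only the standard library, with the same endpoint URL and model name as the examples above (the helper names here are illustrative, not part of this repo):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # vLLM's OpenAI-compatible server
MODEL = "meta-llama/Meta-Llama-3.1-8B-Instruct"

def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a chat completions request matching the curl example above."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("Hello"))  # requires the vLLM server to be running locally
```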