LLM Deploy
vLLM deployment infrastructure for LLaMA 3.1 8B (Meta-Llama-3.1-8B-Instruct) on GPU VMs.
Repository: CT-CROP/CROP-LLM-Deploy Last updated: 2026-02-02 Last synced to docs: 2026-03-10
Quick Start
```bash
cp .env.example .env

# Create GPU VM (L4 recommended for LLaMA 8B)
bash scripts/deploy/create_gpu_vm.sh --gpu l4

# Deploy
bash scripts/deploy/setup_vllm.sh

# Verify
python src/main.py
```

GPU Recommendations
| GPU | VRAM | Recommendation |
|---|---|---|
| L4 | 24GB | Best cost/performance ratio |
| A100 | 40GB | Best performance, production use |
| V100 | 32GB | Good alternative |
| T4 | 16GB | Not recommended (OOM risk) |
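The table above follows from some back-of-the-envelope arithmetic. A rough sketch, using the published Meta-Llama-3.1-8B-Instruct config (32 layers, 8 KV heads, head dim 128) and bf16 weights; real vLLM usage adds activation and runtime overhead, so treat these numbers as a lower bound:

```python
# Rough VRAM estimate for serving Meta-Llama-3.1-8B-Instruct in bf16.
PARAMS = 8e9          # ~8B parameters
BYTES_PER_PARAM = 2   # bf16

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9   # model weights alone

# KV cache per token: 2 (K and V) * layers * kv_heads * head_dim * bytes
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128
kv_bytes_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_PARAM

# KV cache for one full 8192-token sequence:
kv_gb_8k = 8192 * kv_bytes_per_token / 1e9

print(f"weights: {weights_gb:.1f} GB, KV cache per 8k-token seq: {kv_gb_8k:.2f} GB")
# weights: 16.0 GB, KV cache per 8k-token seq: 1.07 GB
```

With ~16 GB of weights before any KV cache is allocated, a 16GB T4 cannot fit the model at all, while a 24GB L4 leaves roughly 8 GB for KV cache and overhead.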
Scripts
```bash
# Deploy
bash scripts/deploy/create_gpu_vm.sh --gpu l4    # Create VM
bash scripts/deploy/setup_vllm.sh                # Setup vLLM

# Monitor
bash scripts/monitor/connect_vm.sh               # SSH to VM
bash scripts/monitor/view_logs.sh -f             # Follow logs
bash scripts/monitor/check_disk.sh               # Disk usage

# Maintain
bash scripts/maintain/cleanup.sh                 # Cleanup Docker/logs
bash scripts/maintain/delete_vm.sh               # Delete VM
bash scripts/maintain/schedule_vm_shutdown.sh    # Auto-shutdown schedule
```

API Testing
```bash
curl http://localhost:8000/v1/models

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Meta-Llama-3.1-8B-Instruct", "messages": [{"role": "user", "content": "Hello"}]}'
```

Related Documentation
- LLM Deploy docs — deployment overview
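The curl calls under API Testing can also be scripted in Python. A minimal sketch using only the standard library, with the same endpoint URL and model name as the examples above (the helper names here are illustrative, not part of this repo):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # vLLM's OpenAI-compatible server
MODEL = "meta-llama/Meta-Llama-3.1-8B-Instruct"

def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a chat completions request matching the curl example above."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("Hello"))  # requires the vLLM server to be running locally
```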