Docs GitHub CROP Documentation

Tools

Shared

Projects

Frontend

Backend

microservices including Search Service, Catalog API, Payment Service, and User Management

Getting Started

Media Coverage API - Frontend Documentation Index Media Coverage API - Frontend Integration Package Manufacturer Aliases System CROP Parts Services - Documentation

Deployment

Production Service URLs

API

K&M Tires API - Frontend Integration Guide

Services

Inter-Service Communication

Data

Media Coverage API - Frontend Integration Guide Photo Workflow - Implementation Plan

Architecture

CROP Catalog Architecture Refactoring Media Coverage Dashboard - Design Reference

Analysis

Critical Fixes Plan

Monitoring

Prometheus Metrics Documentation

Reference

Repository Guidelines CROP Search - Cleanup and Sync Plan Data Flows Documentation Dead Code Detection Setup Environment Configuration Admin Environment Switcher - Feature Plan Media Coverage API - Quick Reference Card Naming Conventions

Monitoring

Metrics Implementation Deployment Checklist Metrics Implementation Summary Metrics Integration Guide Metrics Quick Reference

Services

Architecture

CROP Microservices Architecture Data Flows SEO Data Flow Diagrams Mermaid Diagrams for Linear Dual-Index Architecture Fix Plan Search Index Analysis & Improvement Plan

Authentication

Backend Authentication Implementation Guide Clerk Frontend Integration Guide API Endpoints Authentication Audit

Deployment

CI Testing Architecture CI/CD Guide - CROP Microservices CI Rollback Guide Deployment Guide - CROP Microservices Deployment Modes for Search Service Deployment Runbook Production Pre-Deployment Checklist

Development

Git Worktree Workflow for CROP Microservices JSDoc Documentation Guide for Media Coverage API New Service Template - CROP Microservices Pre-Commit Hooks - Cyrillic Check Service Standards CROP Microservices - Comprehensive Testing Guide

ProjectsParts Services

Metrics Implementation Deployment Checklist

Pre-Deployment (Development)

Metrics Implementation Deployment Checklist

Pre-Deployment (Development)

Code changes compiled without errors
- Run: bun run type-check
- Check for no TypeScript errors in metrics files
Prometheus endpoint responds
- URL: http://localhost:3001/metrics
- Expected: Text format with metric definitions
New metrics appear in output
- Check for: autocomplete_result_count, parts_fitment_coverage_ratio
- All 20 new metrics should be present
No metric cardinality warnings
- Expected series count: ~38
- No unbounded label values
Integration tests pass (if applicable)
- Run: bun test

Development Deployment

Deploy to dev/staging environment
- Build: bun run build
- Deploy Docker image with metrics registry
Service starts without errors
- Check logs for metric initialization
- No startup warnings about metrics
Prometheus scraping configured
- Add scrape config for new service instance
- Target: http://dev-search-service:3001/metrics
- Interval: 30 seconds
Verify Prometheus has targets
- URL: http://prometheus:9090/targets
- Status: UP for search service
Test PromQL queries
- Query: rate(autocomplete_queries_total[5m])
- Should return data within 2 minutes

Monitoring Setup

Grafana datasource configured
- Type: Prometheus
- URL: Prometheus instance
- Test connection: successful
Import dashboard
- File: scripts/grafana-dashboard-autocomplete-metrics.json
- Location: Dashboards → Import
- Panels load without errors
Dashboard displays metrics
- All 10 panels show data
- Time range: Last 6 hours
Create custom panels if needed
- Example: Zero result rate trend
- Verify queries execute successfully

Alerting Setup

AlertManager configured
- Connection to Prometheus: working
- Notification channels configured
Alert rules added
- Copy from METRICS_INTEGRATION.md
- Rules file: /etc/prometheus/rules/search-alerts.yml
Alert rules validated
- Command: amtool check-config alertmanager.yml
- Reload: curl -X POST localhost:9093/-/reload
Test alerts
- Trigger: Force a metric to alert threshold
- Verify: Notification received
- Channels: Slack, PagerDuty, email

Fitment Coverage Job Setup

Script tested locally
- Command: bun scripts/calculate-fitment-coverage.ts --dry-run
- Output: Displays fitment statistics
Elasticsearch connection verified
- Script connects successfully
- Queries execute without errors
Dry-run shows accurate numbers
- Compare manual ES queries with script output
- Verify calculations are correct
Schedule cron job (Linux/macOS)
- Add to crontab: 0 * * * * cd /path && bun scripts/calculate-fitment-coverage.ts
- Verify: Job runs hourly
Or create Cloud Run Job
- Image: Search service with script
- Schedule: Cloud Scheduler (hourly)
- Test execution: successful

Performance Validation

Metric recording overhead
- Baseline latency: measure before metrics
- After metrics: should not increase by > 5%
- Expected overhead: < 0.1ms per request
Monitor resource usage
- Prometheus memory: should remain stable
- CPU: no significant increase
- Disk: ensure retention policy set
Cardinality monitoring
- Command: Check Prometheus series count
- Expected: ~38 series
- Alert if exceeds 1000

Quality Assurance

Metric labels are consistent
- Review all metrics
- No typos in label names
- Fixed enum values only
PromQL queries in docs work
- Test each example query
- Verify expected results
- Update docs if needed
Dashboard is user-friendly
- Panel names are clear
- Tooltips are helpful
- Colors are intuitive
Documentation is complete
- All files reviewed
- No broken links
- Examples are accurate

Production Deployment

Code reviewed and approved
- All changes reviewed
- No breaking changes
- Documentation reviewed
Production Prometheus configured
- Scrape targets updated
- Retention policy set (e.g., 30 days)
- Storage configured
Production alerts configured
- All rules deployed
- Notification channels active
- Test alert confirmed
Production dashboard imported
- Grafana updated
- Dashboard accessible
- Panels displaying data
Fitment coverage job scheduled
- Cron job active
- Or Cloud Run Job deployed
- Logs checked for successful execution
Monitoring alerts active
- Team notified of deployment
- Escalation paths configured
- On-call rotation updated

Post-Deployment Monitoring (24 hours)

Verify all metrics collecting data
- Spot-check 5 random metrics
- All have recent timestamps
Review metric distributions
- Zero result rate: reasonable?
- Latency: within expected range?
- Fitment coverage: expected percentage?
Check alert health
- No false positives?
- Alert messages clear?
- Routing working correctly?
Monitor resource consumption
- Prometheus memory: stable?
- CPU utilization: normal?
- Disk growth: reasonable?
Review logs
- Any errors in metric recording?
- Script execution logs clean?
- No cardinality warnings?

One Week Review

Analyze metric trends
- Create summary report
- Identify patterns
- Note any anomalies
Evaluate alert effectiveness
- Which alerts have fired?
- Were they actionable?
- Any false positives?
Gather team feedback
- Dashboard usability
- Metric relevance
- Documentation clarity
Plan optimizations
- Which metrics are most valuable?
- Any metrics to remove?
- New metrics needed?

Monthly Maintenance

Review alert thresholds
- Based on actual data
- Adjust if needed
- Document changes
Update documentation
- Add new discoveries
- Clarify confusing sections
- Update examples
Analyze cost
- Prometheus storage usage
- Consider retention policy
- Optimize if needed
Plan improvements
- Query optimization
- Dashboard enhancements
- New metric proposals

Rollback Plan

If issues occur:

Metric collection errors
- Disable metric recording in code
- Restart service
- Metrics endpoint still available
High cardinality issues
- Reduce sampling from 20% to 10%
- Remove problematic metric
- Restart Prometheus
Alerting false positives
- Disable specific alert rule
- Adjust threshold
- Re-enable after fix
Dashboard issues
- Delete dashboard
- Recreate or restore backup
- Update JSON if needed
Complete rollback
- Revert code changes to services/search
- Remove metrics from Prometheus scrape config
- Delete Grafana dashboard
- Disable alert rules

Success Criteria

All metrics defined and exported
Metrics integrated into autocomplete handler
Documentation complete and clear
Grafana dashboard ready
Fitment calculator script working
Code compiles without errors
Low cardinality design validated
Prometheus scraping confirmed (dev)
Alerts functioning (dev)
Dashboard displaying data (dev)
Fitment job running (dev)
Performance validated (dev)
Team trained on metrics
Deployed to production
Monitoring validated (24 hours)
Weekly review completed

Sign-Off

Developer: Code changes complete
QA: Testing verified
Operations: Monitoring configured
Team Lead: Approved for production

Contact Information

Questions about metrics:

See METRICS.md for complete reference
See METRICS_INTEGRATION.md for setup
See METRICS_QUICK_REFERENCE.md for common queries

On-call support:

Check Grafana dashboard for current status
Review AlertManager for active alerts
Query Prometheus for detailed data

Last Updated: 2025-11-12 Status: Ready for Deployment

Naming Conventions

This document describes the naming conventions used across the CROP-parts-services codebase.

Metrics Implementation Summary

Comprehensive Prometheus metrics have been added to the Search service for tracking: - Autocomplete quality and performance - Equipment fitment coverage and...

On this page

Metrics Implementation Deployment Checklist Pre-Deployment (Development)Development Deployment Monitoring Setup Alerting Setup Fitment Coverage Job Setup Performance Validation Quality Assurance Production Deployment Post-Deployment Monitoring (24 hours)One Week Review Monthly Maintenance Rollback Plan Success Criteria Sign-Off Contact Information