# AI Integration Audit & Refactoring Plan

## Executive Summary

Audit of the Vercel AI Gateway integration in CROP-front (the customer-facing site).
## Current State Analysis

### Architecture Overview

```text
┌─────────────────────────────────────────────────────────────────┐
│ CROP-front │
├─────────────────────────────────────────────────────────────────┤
│ app/page.tsx │
│ └── useAiChat (hooks/use-ai-chat.ts) │
│ └── sendMessageStream (services/ai-chat/client.ts) │
│ └── /api/ai/query (Next.js proxy) │
│ └── Cloud Run vLLM (LLaMA 3.1 8B) │
├─────────────────────────────────────────────────────────────────┤
│ NEW: app/api/chat/route.ts (Vercel AI SDK) │
│ └── lib/ai/provider.ts │
│ └── @ai-sdk/gateway → Claude/GPT-4 │
└─────────────────────────────────────────────────────────────────┘
```

### Files Inventory
| File | Purpose | Status |
|---|---|---|
| `lib/ai/provider.ts` | AI provider configuration | ✅ Created |
| `app/api/chat/route.ts` | Vercel AI streaming endpoint | ✅ Created |
| `hooks/use-ai-chat.ts` | Custom chat hook (legacy) | ✅ Compatible |
| `hooks/use-vercel-chat.ts` | Vercel AI SDK hook | ✅ Created |
| `hooks/use-chat-provider.ts` | Unified hook with feature flag | ✅ Created |
| `services/ai-chat/client.ts` | vLLM client | ✅ Legacy (fallback) |
| `app/page.tsx` | Main chat UI | ✅ Uses `useChatProvider` |
## Critical Issues

### 1. Dual Implementation Without Integration

**Problem**: The new Vercel AI SDK files were created but are not connected to the frontend.

```tsx
// app/page.tsx - still uses legacy hook
const { messages, isLoading, error, sendMessage } = useAiChat({
streaming: true,
skipAuth: true,
});
```

**Impact**: Users are still hitting the vLLM backend, not the Vercel AI Gateway.
### 2. Fake Streaming in Legacy Client

**Problem**: `sendMessageStream` simulates streaming by chunking an already-complete response.

```ts
// services/ai-chat/client.ts:190-206
export async function* sendMessageStream(...) {
const response = await sendMessage(...); // Wait for FULL response
const content = response.content;
for (let i = 0; i < content.length; i += chunkSize) {
yield { type: "content", content: content.slice(i, i + chunkSize) };
await new Promise((resolve) => setTimeout(resolve, 10)); // Fake delay!
}
}
```

**Impact**: Poor UX; the user waits for the full response before seeing anything.
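For contrast, a minimal sketch of genuine client-side streaming, assuming a hypothetical endpoint that writes tokens to the response body as they are generated (the function name and endpoint are illustrative, not the current client code):

```ts
// Hypothetical replacement: yield tokens as the server produces them,
// instead of buffering the whole reply and re-chunking it with fake delays.
export async function* sendMessageStreamReal(
  prompt: string,
): AsyncGenerator<{ type: "content"; content: string }> {
  const response = await fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages: [{ role: "user", content: prompt }] }),
  });
  if (!response.body) throw new Error("Response has no body to stream");

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Each chunk is yielded the moment it arrives - no artificial delay.
    yield { type: "content", content: decoder.decode(value, { stream: true }) };
  }
}
```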
### 3. Gateway Fallback Not Working

**Problem**: `getGatewayProviderOptions()` returns an incompatible type.

```ts
// lib/ai/provider.ts:84
export function getGatewayProviderOptions(): Record<string, unknown> | undefined
// But streamText expects: SharedV3ProviderOptions | undefined
```

**Impact**: No automatic fallback when Anthropic fails.
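If explicit fallback ordering is still desired, one option is to pass provider options inline, where `streamText` can type-check them, rather than through a loosely typed helper. A sketch only; the exact `gateway` options shape (e.g. `order`) should be verified against the installed SDK version:

```ts
import { streamText } from "ai";
import { gateway } from "@ai-sdk/gateway";

// Sketch: inline provider options are checked against streamText's expected
// shape instead of being widened to Record<string, unknown>.
const result = streamText({
  model: gateway("anthropic/claude-3-5-sonnet"),
  prompt: "Hello",
  providerOptions: {
    gateway: {
      // Assumed option: try Anthropic first, then fall back to OpenAI.
      order: ["anthropic", "openai"],
    },
  },
});
```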
### 4. Inconsistent Message Types

**Problem**: The legacy `ChatMessage` type differs from the Vercel AI SDK's `UIMessage`.

```ts
// services/ai-chat/types.ts
interface ChatMessage {
id: string;
role: "user" | "assistant" | "system";
content: string;
createdAt: string;
metadata?: ChatMessageMetadata;
}
// @ai-sdk/react UIMessage
interface UIMessage {
id: string;
role: "user" | "assistant" | "system";
parts: MessagePart[];
// Different structure!
}
```
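During migration, a small adapter can bridge the two shapes. A sketch, assuming the AI SDK v5 `UIMessage` export from `ai` and handling text parts only:

```ts
import type { UIMessage } from "ai";
import type { ChatMessage } from "@/services/ai-chat/types";

// Sketch: flatten a UIMessage's text parts into the legacy single-string
// shape so existing rendering code keeps working during the migration.
export function toLegacyMessage(message: UIMessage): ChatMessage {
  const content = message.parts
    .map((part) => (part.type === "text" ? part.text : ""))
    .join("");
  return {
    id: message.id,
    role: message.role,
    content,
    createdAt: new Date().toISOString(), // UIMessage carries no timestamp here
  };
}
```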
## Recommendations

### Option A: Full Migration to Vercel AI SDK (Recommended)
**Pros:**
- Real streaming from Claude/GPT-4
- Unified codebase
- Better UX
- Automatic fallback support

**Cons:**
- Cost increase (cloud AI vs. self-hosted vLLM)
- Migration effort

**Implementation:**
- Create a `hooks/use-vercel-chat.ts` wrapper around `useChat`
- Update `app/page.tsx` to use the new hook when `AI_PROVIDER !== "vllm"`
- Keep the legacy path for cost-optimization scenarios
### Option B: Feature Flag Approach (Conservative)

**Pros:**
- Gradual rollout
- Easy rollback
- A/B testing possible

**Cons:**
- Dual maintenance
- Code complexity

**Implementation:**
- Add a `NEXT_PUBLIC_USE_VERCEL_AI` env variable
- Create a unified hook that switches implementations (see the sketch below)
- Both paths coexist
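A minimal sketch of such a unified hook; normalizing the two hooks' different return shapes is elided:

```ts
// hooks/use-chat-provider.ts - sketch of a flag-switched unified hook.
"use client";

import { useAiChat } from "./use-ai-chat";
import { useVercelChat } from "./use-vercel-chat";

// NEXT_PUBLIC_* vars are inlined at build time, so this condition is constant
// for the lifetime of the bundle: every render calls the same hook, and the
// Rules of Hooks are not violated in practice.
const USE_VERCEL_AI = process.env.NEXT_PUBLIC_USE_VERCEL_AI === "true";

export function useChatProvider() {
  if (USE_VERCEL_AI) {
    return useVercelChat();
  }
  return useAiChat({ streaming: true, skipAuth: true });
}
```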
### Option C: Hybrid Architecture (Cost-Optimized)

**Pros:**
- Best of both worlds
- Cost control

**Cons:**
- Most complex

**Implementation:**
- Simple queries → vLLM (LLaMA, cheap)
- Complex queries → Vercel AI Gateway (Claude, quality)
- Automatic routing based on query complexity (see the sketch below)
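A sketch of what the router might look like; the heuristic and threshold are illustrative assumptions, not measured values:

```ts
// Sketch: route cheap/simple queries to vLLM, everything else to the Gateway.
type ChatBackend = "vllm" | "gateway";

export function routeQuery(query: string): ChatBackend {
  const words = query.trim().split(/\s+/).length;
  // Illustrative heuristic: short lookups stay on the self-hosted model;
  // long or multi-step questions go to the higher-quality cloud model.
  const looksComplex =
    words > 25 || /\b(compare|explain|why|recommend)\b/i.test(query);
  return looksComplex ? "gateway" : "vllm";
}
```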
## Refactoring Plan

### Phase 1: Fix Type Issues (Day 1)

```ts
// lib/ai/provider.ts - Remove unused function
// getGatewayProviderOptions() is incompatible; the Gateway handles fallback internally.
```

### Phase 2: Create Vercel Chat Hook (Day 1)

```ts
// hooks/use-vercel-chat.ts
"use client";
import { useChat } from "@ai-sdk/react";
export function useVercelChat() {
const { messages, input, handleInputChange, handleSubmit, isLoading, error, stop, reload } =
useChat({
api: "/api/chat",
onError: (err) => console.error("Chat error:", err),
});
return {
messages,
input,
setInput: (value: string) => handleInputChange({ target: { value } } as any),
sendMessage: handleSubmit,
isLoading,
error,
stop,
regenerate: reload,
};
}
```

### Phase 3: Update Frontend (Day 2)

```tsx
// app/page.tsx
import { useVercelChat } from "@/hooks/use-vercel-chat";
import { useAiChat } from "@/hooks/use-ai-chat";
const useChat = process.env.NEXT_PUBLIC_USE_VERCEL_AI === "true"
  ? useVercelChat
  : useAiChat;
```

### Phase 4: Add Stop/Regenerate UI (Day 2)
Update the message display to support the following (see the sketch after this list):
- Stop streaming button
- Regenerate last response button
- Better loading states
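A sketch of how the controls could be wired to the hook's return values; the component and markup are illustrative, not the actual CROP-front UI:

```tsx
// Sketch: stop/regenerate controls driven by the Vercel chat hook.
"use client";

import { useVercelChat } from "@/hooks/use-vercel-chat";

export function ChatControls() {
  const { isLoading, stop, regenerate } = useVercelChat();

  return (
    <div className="chat-controls">
      {isLoading ? (
        // While streaming, offer to abort the in-flight response.
        <button onClick={stop}>Stop</button>
      ) : (
        // Otherwise, offer to regenerate the last assistant message.
        <button onClick={() => regenerate()}>Regenerate</button>
      )}
    </div>
  );
}
```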
### Phase 5: Testing (Day 3)

- Unit tests for hooks
- E2E tests with Playwright (see the sketch below)
- A/B testing in staging
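A sketch of one E2E case; the selectors and test id are assumptions about the page markup:

```ts
// e2e/chat.spec.ts - hypothetical Playwright sketch.
import { test, expect } from "@playwright/test";

test("chat streams an assistant reply", async ({ page }) => {
  await page.goto("/");
  await page.getByRole("textbox").fill("Find oil filters for New Holland");
  await page.getByRole("textbox").press("Enter");
  // The assistant message should appear well before the full reply completes,
  // which distinguishes real streaming from the legacy buffered behavior.
  await expect(page.getByTestId("assistant-message")).toBeVisible({ timeout: 5_000 });
});
```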
## Environment Variables

### Required for Vercel AI

```bash
# Provider selection
AI_PROVIDER="gateway" # or "openai", "anthropic", "vllm"
# Feature flag for frontend
NEXT_PUBLIC_USE_VERCEL_AI="true"
# API keys (for non-gateway providers)
OPENAI_API_KEY="sk-..."
ANTHROPIC_API_KEY="sk-ant-..."
```
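For reference, a minimal sketch of how `lib/ai/provider.ts` might map `AI_PROVIDER` to a model. The model IDs and the `VLLM_BASE_URL` variable are illustrative assumptions, not the actual configuration:

```ts
// lib/ai/provider.ts - illustrative sketch; model IDs are assumptions.
import { gateway } from "@ai-sdk/gateway";
import { anthropic } from "@ai-sdk/anthropic";
import { createOpenAI, openai } from "@ai-sdk/openai";

export function getModel() {
  switch (process.env.AI_PROVIDER) {
    case "openai":
      return openai("gpt-4o");
    case "anthropic":
      return anthropic("claude-3-5-sonnet-latest");
    case "vllm":
      // vLLM exposes an OpenAI-compatible API, so it can be reached through
      // a custom-base-URL OpenAI provider (VLLM_BASE_URL is hypothetical).
      return createOpenAI({ baseURL: process.env.VLLM_BASE_URL })("llama-3.1-8b");
    default:
      // Gateway slugs are "creator/model"; fallback is handled by the Gateway.
      return gateway("anthropic/claude-3-5-sonnet");
  }
}
```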
### Vercel Dashboard Setup

- Enable AI Gateway in Vercel project settings
- Connect OpenAI and Anthropic providers
- Set up spend limits and alerts
## File Structure After Refactoring

```text
lib/
├── ai/
│ ├── provider.ts # Server-side provider config
│ └── index.ts # Re-exports
hooks/
├── use-ai-chat.ts # Legacy vLLM hook
├── use-vercel-chat.ts # NEW: Vercel AI SDK hook
└── use-chat-provider.ts   # NEW: Unified hook with feature flag
app/
├── api/
│ ├── ai/query/route.ts # Legacy vLLM proxy
│ └── chat/route.ts # Vercel AI streaming
└── page.tsx              # Uses unified hook
```

## Success Metrics
- **Latency**: Time to first token < 500 ms
- **Streaming**: Real progressive display (not fake chunking)
- **Fallback**: Automatic provider switch on errors
- **Cost**: Track per-provider spend in the Vercel dashboard
## Rollback Plan

If issues arise:
- Set `NEXT_PUBLIC_USE_VERCEL_AI="false"`
- Users automatically switch back to the vLLM backend
- No code changes are needed
## Next Steps

- Review and approve this plan
- Implement Phases 1-2 (hooks)
- Implement Phase 3 (update `app/page.tsx` to use `useChatProvider`)
- Implement AI tools for the Elasticsearch integration
- Test in development with `NEXT_PUBLIC_USE_VERCEL_AI=true`
- Deploy to staging with the feature flag OFF
- Gradual rollout (10% → 50% → 100%)
## Phase 4: AI Tools Integration (Completed)

### New Files Created
| File | Purpose |
|---|---|
| `lib/ai/tools.ts` | AI tools for Elasticsearch catalog queries |
### AI Tools Implemented

```ts
// lib/ai/tools.ts exports:
export const cropAiTools = {
searchParts, // Search parts catalog with filters
getPartDetails, // Get detailed part information
autocomplete, // Get search suggestions
checkAvailability // Check stock status for part numbers
};
```

### Tool Definitions
| Tool | Description | Use Case |
|---|---|---|
| `searchParts` | Search parts catalog by query, manufacturer, category, stock | "Find oil filters for New Holland" |
| `getPartDetails` | Get complete details for a specific part | "Tell me more about part ABC123" |
| `autocomplete` | Get search suggestions for partial queries | Help refine search terms |
| `checkAvailability` | Check stock status for multiple parts | "Do you have these parts in stock?" |
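As an illustration, a sketch of how one of these tools could be defined with the AI SDK's `tool()` helper and `zod`. The `searchCatalog` service import and the exact input fields are assumptions, not the actual contents of `lib/ai/tools.ts` (note: AI SDK v5 calls the schema field `inputSchema`; v4 called it `parameters`):

```ts
// lib/ai/tools.ts - illustrative sketch of one tool definition.
import { tool } from "ai";
import { z } from "zod";
import { searchCatalog } from "@/services/catalog"; // hypothetical service

export const searchParts = tool({
  description: "Search the parts catalog by free-text query with optional filters.",
  inputSchema: z.object({
    query: z.string().describe("Free-text search, e.g. 'oil filter'"),
    manufacturer: z.string().optional(),
    inStockOnly: z.boolean().optional(),
  }),
  execute: async ({ query, manufacturer, inStockOnly }) => {
    try {
      return await searchCatalog({ query, manufacturer, inStockOnly });
    } catch (err) {
      // User-friendly error surfaced to the model instead of a thrown exception.
      console.error("searchParts failed:", err);
      return { error: "Part search is temporarily unavailable." };
    }
  },
});
```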
### Route Integration

```ts
// app/api/chat/route.ts
import { convertToModelMessages, stepCountIs, streamText } from "ai";
import { cropAiTools } from "@/lib/ai/tools";
const result = streamText({
model,
system: getSystemPrompt(),
  messages: convertToModelMessages(messages), // convertToModelMessages is synchronous
tools: cropAiTools,
stopWhen: stepCountIs(5), // Allow up to 5 sequential tool calls
});
```

### System Prompt Updates
The system prompt now includes the following (see the sketch after this list):
- Available tools description
- Usage guidelines for each tool
- Examples of when to use each tool
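A sketch of the shape such a prompt builder might take; the wording is illustrative, not the production prompt:

```ts
// Sketch of a getSystemPrompt() shape; the actual prompt text is an assumption.
export function getSystemPrompt(): string {
  return [
    "You are the CROP parts-catalog assistant.",
    "Tools available: searchParts, getPartDetails, autocomplete, checkAvailability.",
    "Use searchParts for free-text catalog queries, e.g. 'oil filters for New Holland'.",
    "Use getPartDetails when the user asks about one specific part number.",
    "Use autocomplete to suggest refinements for short or ambiguous queries.",
    "Use checkAvailability when the user asks about stock for known part numbers.",
    "Never invent part numbers, prices, or stock levels; rely on tool results.",
  ].join("\n");
}
```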
### Technical Decisions

1. **Multi-step Tool Calls**: Using `stopWhen: stepCountIs(5)` allows the AI to:
   - Search for parts
   - Get details for specific results
   - Chain multiple tool calls in one conversation turn

2. **Type Safety**:
   - Used a `buildSearchParams()` helper to handle `exactOptionalPropertyTypes`
   - Used an `isAutocompleteSectionsResponse()` type guard for union types

3. **Error Handling**:
   - Each tool has try/catch with user-friendly error messages
   - Logging for debugging server-side issues