# AI Streaming Responses - Implementation Guide

## Overview

This document describes the implementation of streaming AI responses using Server-Sent Events (SSE) for real-time, token-by-token display.

## Architecture

### Backend Components

**1. StreamingService** (`src/modules/ai/streaming/streaming.service.ts`)

- Handles the Azure OpenAI streaming API
- Processes the SSE stream from Azure
- Emits tokens via a callback function
- Chunk types: `token`, `metadata`, `done`, `error`

**2. AI Controller** (`src/modules/ai/ai.controller.ts`)

- Endpoint: `POST /api/v1/ai/chat/stream`
- Headers: `Content-Type: text/event-stream`
- Streams response chunks to the client
- Reports errors as SSE events

**3. AI Service Integration** (TODO)

- Add a `chatStream()` method to AIService
- Reuse existing safety checks and context building
- Call StreamingService for the actual streaming
- Save the conversation after completion

### Frontend Components (TODO)

**1. Streaming Hook** (`hooks/useStreamingChat.ts`)

```typescript
const { streamMessage, isStreaming } = useStreamingChat();

streamMessage(
  { message: "Hello", conversationId: "123" },
  (chunk) => {
    // Handle incoming chunks
    if (chunk.type === 'token') {
      appendToMessage(chunk.content);
    }
  }
);
```

**2. AIChatInterface Updates**

- Add streaming state management
- Display tokens as they arrive
- Show a typing indicator during streaming
- Handle stream errors gracefully

## Implementation Steps

### Step 1: Complete Backend Integration (30 min)

1. Add StreamingService to the AI module:

```typescript
// ai.module.ts
import { StreamingService } from './streaming/streaming.service';

@Module({
  providers: [AIService, StreamingService, ...],
})
```

2. Implement `chatStream()` in AIService:

```typescript
async chatStream(
  userId: string,
  chatDto: ChatMessageDto,
  callback: StreamCallback,
): Promise<void> {
  // 1. Run all safety checks (rate limit, input sanitization, etc.)
  // 2. Build context messages
  // 3. Call streamingService.streamCompletion()
  // 4. Collect the full response and save the conversation
}
```
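For orientation before moving to the frontend, here is a minimal sketch of the SSE controller endpoint described under Backend Components. It assumes a NestJS/Express stack (suggested by the `@Module` decorator and file layout above); the auth-guard-populated `req.user` and the DTO import path are assumptions, while `chatStream()`, the route, and the chunk format follow the design above.

```typescript
// ai.controller.ts - illustrative sketch, not the project's actual controller
import { Body, Controller, Post, Req, Res } from '@nestjs/common';
import { Request, Response } from 'express';
import { AIService } from './ai.service';
import { ChatMessageDto } from './dto/chat-message.dto';

@Controller('ai/chat') // the api/v1 prefix is assumed to be set globally
export class AIController {
  constructor(private readonly aiService: AIService) {}

  @Post('stream')
  async chatStream(
    @Body() chatDto: ChatMessageDto,
    @Req() req: Request,
    @Res() res: Response,
  ): Promise<void> {
    // How the user id is obtained depends on the project's auth setup;
    // a guard populating req.user is an assumption here.
    const userId = (req as Request & { user?: { id: string } }).user?.id ?? '';

    // SSE headers: keep the connection open and disable caching/buffering
    res.setHeader('Content-Type', 'text/event-stream');
    res.setHeader('Cache-Control', 'no-cache');
    res.setHeader('Connection', 'keep-alive');
    res.flushHeaders();

    try {
      // Forward each chunk from the service as an SSE `data:` line
      await this.aiService.chatStream(userId, chatDto, (chunk) => {
        res.write(`data: ${JSON.stringify(chunk)}\n\n`);
      });
    } catch {
      // Headers are already sent, so report failures as an SSE error event
      res.write(`data: ${JSON.stringify({ type: 'error', message: 'Stream failed' })}\n\n`);
    } finally {
      res.end();
    }
  }
}
```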
### Step 2: Frontend Streaming Client (1 hour)

1. Create a fetch-based streaming hook (`EventSource` only supports GET, so the SSE stream from this POST endpoint is read manually):

```typescript
// hooks/useStreamingChat.ts
import { useState } from 'react';

export function useStreamingChat() {
  const [isStreaming, setIsStreaming] = useState(false);

  const streamMessage = async (
    chatDto: ChatMessageDto,
    onChunk: (chunk: StreamChunk) => void,
  ) => {
    setIsStreaming(true);
    try {
      const response = await fetch('/api/v1/ai/chat/stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(chatDto),
      });
      if (!response.body) throw new Error('Response has no body');

      const reader = response.body.getReader();
      const decoder = new TextDecoder();

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        // stream: true keeps multi-byte characters that span reads intact.
        // Note: this simple parser assumes each read ends on a line boundary;
        // a production version should carry partial lines over to the next read.
        const chunk = decoder.decode(value, { stream: true });

        for (const line of chunk.split('\n')) {
          if (line.startsWith('data: ')) {
            onChunk(JSON.parse(line.substring(6)));
          }
        }
      }
    } finally {
      setIsStreaming(false);
    }
  };

  return { streamMessage, isStreaming };
}
```

2. Update AIChatInterface. The completed text is accumulated in a local variable because reading `streamingMessage` inside the callback would see a stale closure value:

```typescript
const [streamingMessage, setStreamingMessage] = useState('');
const { streamMessage, isStreaming } = useStreamingChat();

const handleSubmit = async () => {
  setStreamingMessage('');
  setIsLoading(true);

  let fullMessage = '';

  await streamMessage(
    { message: input, conversationId },
    (chunk) => {
      if (chunk.type === 'token') {
        fullMessage += chunk.content;
        setStreamingMessage(fullMessage);
      } else if (chunk.type === 'done') {
        // Move the completed message into the messages array
        setMessages(prev => [
          ...prev,
          { role: 'assistant', content: fullMessage },
        ]);
        setStreamingMessage('');
        setIsLoading(false);
      }
    }
  );
};
```

### Step 3: UI Enhancements (30 min)

1. Add a streaming indicator:

```tsx
{isStreaming && (
  <div className="typing-indicator">AI is thinking...</div>
)}
```

2. Show the partial message with a blinking cursor:

```tsx
{streamingMessage && (
  <div className="message assistant">
    {streamingMessage}
    <span className="cursor">|</span>
  </div>
)}
```

3. Add the CSS animation for the cursor:

```css
.cursor { animation: blink 1s step-end infinite; }

@keyframes blink {
  0%, 100% { opacity: 1; }
  50% { opacity: 0; }
}
```

## Testing

### Backend Test

```bash
curl -X POST http://localhost:3020/api/v1/ai/chat/stream \
  -H "Content-Type: application/json" \
  -d '{"message": "Tell me about baby sleep patterns"}' \
  --no-buffer
```

Expected output:

```
data: {"type":"token","content":"Baby"}
data: {"type":"token","content":" sleep"}
data: {"type":"token","content":" patterns"}
...
data: {"type":"done"}
```

### Frontend Test

1. Open the browser DevTools Network tab
2. Send a message in the AI chat
3. Verify the SSE connection is established
4. Confirm tokens appear in real time

## Performance Considerations

### Token Buffering

To reduce re-renders, accumulate tokens in a buffer inside the chunk callback and flush it on a short interval rather than updating state per token:

```typescript
let buffer = '';
let flushTimeout: ReturnType<typeof setTimeout> | null = null;

const onChunk = (chunk: StreamChunk) => {
  if (chunk.type === 'token') {
    buffer += chunk.content;
    // Schedule a flush only if one is not already pending, so updates
    // land roughly every 50ms even under a steady token stream
    if (!flushTimeout) {
      flushTimeout = setTimeout(() => {
        setStreamingMessage(prev => prev + buffer);
        buffer = '';
        flushTimeout = null;
      }, 50);
    }
  }
};
```

### Memory Management

- Clear streaming state on unmount
- Cancel ongoing streams when switching conversations (see the `AbortController` sketch in the appendix below)
- Limit retained message history to bound memory usage

### Error Recovery

- Retry the connection on failure (max 3 attempts)
- Fall back to the non-streaming endpoint on error
- Show user-friendly error messages

## Security

### Rate Limiting

- Streaming requests count against rate limits
- Close the stream if the rate limit is exceeded mid-response

### Input Validation

- Same validation as the non-streaming endpoint
- Safety checks run before the stream starts

### Connection Management

- Set a timeout for inactive connections (60s)
- Clean up resources on client disconnect

## Future Enhancements

1. **Token Usage Tracking**: Track streaming token usage separately
2. **Pause/Resume**: Allow users to pause streaming
3. **Multi-Model Streaming**: Switch models mid-conversation
4. **Streaming Analytics**: Track average tokens per second
5. **WebSocket Alternative**: Consider WebSockets for bidirectional streaming

## Troubleshooting

### Stream Cuts Off Early

- Check timeout settings (increase to 120s for long responses)
- Verify nginx/proxy timeout and buffering configuration
- Check network connectivity

### Tokens Arrive Out of Order

- Ensure single-threaded processing
- Use a buffer to accumulate tokens before rendering
- Verify SSE event ordering

### High Latency

- Check Azure OpenAI endpoint latency
- Reduce context size to improve time-to-first-token
- Consider caching common responses

## References

- [Azure OpenAI Streaming Docs](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/streaming)
- [Server-Sent Events Spec](https://html.spec.whatwg.org/multipage/server-sent-events.html)
- [LangChain Streaming](https://js.langchain.com/docs/expression_language/streaming)
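## Appendix: Cancelling an In-Flight Stream

A minimal sketch of the cancellation behavior called for under Memory Management, using the standard `AbortController` API. The hook name `useStreamCancellation` and its wiring into the Step 2 hook are illustrative assumptions, not part of the design above.

```typescript
import { useEffect, useRef } from 'react';

// Tracks the controller for the current stream so that starting a new
// stream (e.g. after switching conversations) aborts the previous one.
export function useStreamCancellation() {
  const controllerRef = useRef<AbortController | null>(null);

  const startStream = (): AbortSignal => {
    controllerRef.current?.abort(); // cancel any stream already in flight
    controllerRef.current = new AbortController();
    return controllerRef.current.signal;
  };

  // Abort on unmount so the pending reader.read() rejects and the
  // connection is released, clearing streaming state per the notes above
  useEffect(() => () => controllerRef.current?.abort(), []);

  return { startStream };
}
```

To use it, pass the signal to the streaming `fetch` in Step 2 (`fetch(url, { method: 'POST', signal, ... })`); calling `abort()` then rejects the pending `reader.read()` with an `AbortError`, which the hook's `finally` block already handles by resetting `isStreaming`.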