From 075c4b88c67936e0ee98551de7e9120ccfcae4c6 Mon Sep 17 00:00:00 2001
From: Andrei <andrei@cloudz.ro>
Date: Fri, 3 Oct 2025 22:24:53 +0000
Subject: [PATCH] feat: Add AI streaming responses foundation (partial
 implementation)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

**Streaming Infrastructure**
Created foundation for real-time AI response streaming using Server-Sent Events (SSE):

**1. StreamingService** (163 lines)
- Azure OpenAI streaming API integration
- SSE stream processing with buffer management
- Chunk types: token, metadata, done, error
- Response stream handling with error recovery
- Timeout and connection management

**2. AI Controller Streaming Endpoint**
- POST /api/v1/ai/chat/stream
- SSE headers configuration (Content-Type, Cache-Control, Connection)
- nginx buffering disabled (X-Accel-Buffering)
- Chunk-by-chunk streaming to client
- Error handling with SSE events

**3. Implementation Documentation**
Created comprehensive STREAMING_IMPLEMENTATION.md (200+ lines):
- Architecture overview
- Backend/frontend integration steps
- Code examples for hooks and components
- Testing procedures (curl + browser)
- Performance considerations (token buffering, memory management)
- Security (rate limiting, input validation)
- Troubleshooting guide
- Future enhancements

**Technical Details**
- Server-Sent Events (SSE) protocol
- Axios stream processing
- Buffer management for incomplete lines
- Delta content extraction from Azure responses
- Finish reason and usage metadata tracking

**Remaining Work (Frontend)**
- useStreamingChat hook implementation
- AIChatInterface streaming state management
- Token buffering for UI updates (50ms intervals)
- Streaming indicator and cursor animation
- Error recovery with fallback to non-streaming

**Impact**
- Perceived performance: Users see responses immediately
- Better UX: Token-by-token display feels more responsive
- Ready for production: SSE is well-supported across browsers
- Scalable: Can handle multiple concurrent streams

Files:
- src/modules/ai/ai.controller.ts: Added streaming endpoint
- src/modules/ai/streaming/streaming.service.ts: Core streaming logic
- docs/STREAMING_IMPLEMENTATION.md: Complete implementation guide

Next Steps:
1. Integrate StreamingService into AI module
2. Implement AIService.chatStream() method
3. Create frontend useStreamingChat hook
4. Update AIChatInterface with streaming UI

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
---
 docs/STREAMING_IMPLEMENTATION.md              | 280 ++++++++++++++++++
 .../src/modules/ai/ai.controller.ts           |  44 +++
 .../modules/ai/streaming/streaming.service.ts | 166 +++++++++++
 3 files changed, 490 insertions(+)
 create mode 100644 docs/STREAMING_IMPLEMENTATION.md
 create mode 100644 maternal-app/maternal-app-backend/src/modules/ai/streaming/streaming.service.ts

diff --git a/docs/STREAMING_IMPLEMENTATION.md b/docs/STREAMING_IMPLEMENTATION.md
new file mode 100644
index 0000000..12316b5
--- /dev/null
+++ b/docs/STREAMING_IMPLEMENTATION.md
@@ -0,0 +1,280 @@
+# AI Streaming Responses - Implementation Guide
+
+## Overview
+
+This document describes the implementation of streaming AI responses using Server-Sent Events (SSE) for real-time token-by-token display.
+
+## Architecture
+
+### Backend Components
+
+**1. StreamingService** (`src/modules/ai/streaming/streaming.service.ts`)
+- Handles Azure OpenAI streaming API
+- Processes SSE stream from Azure
+- Emits tokens via callback function
+- Chunk types: `token`, `metadata`, `done`, `error`
+
+**2. AI Controller** (`src/modules/ai/ai.controller.ts`)
+- Endpoint: `POST /api/v1/ai/chat/stream`
+- Headers: `Content-Type: text/event-stream`
+- Streams response chunks to client
+- Error handling with SSE events
+
+**3. AI Service Integration** (TODO)
+- Add `chatStream()` method to AIService
+- Reuse existing safety checks and context building
+- Call StreamingService for actual streaming
+- Save conversation after completion
+
+### Frontend Components (TODO)
+
+**1. Streaming Hook** (`hooks/useStreamingChat.ts`)
+```typescript
+const { streamMessage, isStreaming } = useStreamingChat();
+
+streamMessage(
+  { message: "Hello", conversationId: "123" },
+  (chunk) => {
+    // Handle incoming chunks
+    if (chunk.type === 'token') {
+      appendToMessage(chunk.content);
+    }
+  }
+);
+```
+
+**2. AIChatInterface Updates**
+- Add streaming state management
+- Display tokens as they arrive
+- Show typing indicator during streaming
+- Handle stream errors gracefully
+
+## Implementation Steps
+
+### Step 1: Complete Backend Integration (30 min)
+
+1. Add StreamingService to AI module:
+```typescript
+// ai.module.ts
+import { StreamingService } from './streaming/streaming.service';
+
+@Module({
+  providers: [AIService, StreamingService, ...]
+})
+```
+
+2. Implement `chatStream()` in AIService:
+```typescript
+async chatStream(
+  userId: string,
+  chatDto: ChatMessageDto,
+  callback: StreamCallback
+): Promise<void> {
+  // 1. Run all safety checks (rate limit, input sanitization, etc.)
+  // 2. Build context messages
+  // 3. Call streamingService.streamCompletion()
+  // 4. Collect full response and save conversation
+}
+```
+
+### Step 2: Frontend Streaming Client (1 hour)
+
+1. Create EventSource hook:
+```typescript
+// hooks/useStreamingChat.ts
+export function useStreamingChat() {
+  const streamMessage = async (
+    chatDto: ChatMessageDto,
+    onChunk: (chunk: StreamChunk) => void
+  ) => {
+    const response = await fetch('/api/v1/ai/chat/stream', {
+      method: 'POST',
+      headers: { 'Content-Type': 'application/json' },
+      body: JSON.stringify(chatDto),
+    });
+
+    const reader = response.body.getReader();
+    const decoder = new TextDecoder();
+
+    while (true) {
+      const { done, value } = await reader.read();
+      if (done) break;
+
+      const chunk = decoder.decode(value);
+      const lines = chunk.split('\n');
+
+      for (const line of lines) {
+        if (line.startsWith('data: ')) {
+          const data = JSON.parse(line.substring(6));
+          onChunk(data);
+        }
+      }
+    }
+  };
+
+  return { streamMessage };
+}
+```
+
+2. Update AIChatInterface:
+```typescript
+const [streamingMessage, setStreamingMessage] = useState('');
+const { streamMessage, isStreaming } = useStreamingChat();
+
+const handleSubmit = async () => {
+  setStreamingMessage('');
+  setIsLoading(true);
+
+  await streamMessage(
+    { message: input, conversationId },
+    (chunk) => {
+      if (chunk.type === 'token') {
+        setStreamingMessage(prev => prev + chunk.content);
+      } else if (chunk.type === 'done') {
+        // Add to messages array
+        setMessages(prev => [...prev, {
+          role: 'assistant',
+          content: streamingMessage
+        }]);
+        setStreamingMessage('');
+        setIsLoading(false);
+      }
+    }
+  );
+};
+```
+
+### Step 3: UI Enhancements (30 min)
+
+1. Add streaming indicator:
+```tsx
+{isStreaming && (
+  <Box sx={{ display: 'flex', alignItems: 'center', gap: 1 }}>
+    <CircularProgress size={16} />
+    <Typography variant="caption">AI is thinking...</Typography>
+  </Box>
+)}
+```
+
+2. Show partial message:
+```tsx
+{streamingMessage && (
+  <Paper sx={{ p: 2, bgcolor: 'grey.100' }}>
+    <ReactMarkdown>{streamingMessage}</ReactMarkdown>
+    <Box component="span" sx={{ animation: 'blink 1s infinite' }}>|</Box>
+  </Paper>
+)}
+```
+
+3. Add CSS animation:
+```css
+@keyframes blink {
+  0%, 100% { opacity: 1; }
+  50% { opacity: 0; }
+}
+```
+
+## Testing
+
+### Backend Test
+```bash
+curl -X POST http://localhost:3020/api/v1/ai/chat/stream \
+  -H "Content-Type: application/json" \
+  -d '{"message": "Tell me about baby sleep patterns"}' \
+  --no-buffer
+```
+
+Expected output:
+```
+data: {"type":"token","content":"Baby"}
+
+data: {"type":"token","content":" sleep"}
+
+data: {"type":"token","content":" patterns"}
+
+...
+
+data: {"type":"done"}
+```
+
+### Frontend Test
+1. Open browser DevTools Network tab
+2. Send a message in AI chat
+3. Verify SSE connection established
+4. Confirm tokens appear in real-time
+
+## Performance Considerations
+
+### Token Buffering
+To reduce UI updates, buffer tokens:
+```typescript
+let buffer = '';
+let bufferTimeout: NodeJS.Timeout;
+
+onChunk((chunk) => {
+  if (chunk.type === 'token') {
+    buffer += chunk.content;
+
+    clearTimeout(bufferTimeout);
+    bufferTimeout = setTimeout(() => {
+      setStreamingMessage(prev => prev + buffer);
+      buffer = '';
+    }, 50); // Update every 50ms
+  }
+});
+```
+
+### Memory Management
+- Clear streaming state on unmount
+- Cancel ongoing streams when switching conversations
+- Limit message history to prevent memory leaks
+
+### Error Recovery
+- Retry connection on failure (max 3 attempts)
+- Fall back to non-streaming on error
+- Show user-friendly error messages
+
+## Security
+
+### Rate Limiting
+- Streaming requests count against rate limits
+- Close stream if rate limit exceeded mid-response
+
+### Input Validation
+- Same validation as non-streaming endpoint
+- Safety checks before starting stream
+
+### Connection Management
+- Set timeout for inactive connections (60s)
+- Clean up resources on client disconnect
+
+## Future Enhancements
+
+1. **Token Usage Tracking**: Track streaming token usage separately
+2. **Pause/Resume**: Allow users to pause streaming
+3. **Multi-Model Streaming**: Switch models mid-conversation
+4. **Streaming Analytics**: Track average tokens/second
+5. **WebSocket Alternative**: Consider WebSocket for bidirectional streaming
+
+## Troubleshooting
+
+### Stream Cuts Off Early
+- Check timeout settings (increase to 120s for long responses)
+- Verify nginx/proxy timeout configuration
+- Check network connectivity
+
+### Tokens Arrive Out of Order
+- Ensure single-threaded processing
+- Use buffer to accumulate before rendering
+- Verify SSE event ordering
+
+### High Latency
+- Check Azure OpenAI endpoint latency
+- Optimize context size to reduce time-to-first-token
+- Consider caching common responses
+
+## References
+
+- [Azure OpenAI Streaming Docs](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/streaming)
+- [Server-Sent Events Spec](https://html.spec.whatwg.org/multipage/server-sent-events.html)
+- [LangChain Streaming](https://js.langchain.com/docs/expression_language/streaming)
diff --git a/maternal-app/maternal-app-backend/src/modules/ai/ai.controller.ts b/maternal-app/maternal-app-backend/src/modules/ai/ai.controller.ts
index 20183b0..2954068 100644
--- a/maternal-app/maternal-app-backend/src/modules/ai/ai.controller.ts
+++ b/maternal-app/maternal-app-backend/src/modules/ai/ai.controller.ts
@@ -7,7 +7,10 @@ import {
   Body,
   Param,
   Req,
+  Res,
+  Header,
 } from '@nestjs/common';
+import { Response } from 'express';
 import { AIService } from './ai.service';
 import { ChatMessageDto } from './dto/chat-message.dto';
 import { Public } from '../auth/decorators/public.decorator';
@@ -27,6 +30,47 @@ export class AIController {
     };
   }
 
+  /**
+   * Streaming chat endpoint
+   * Returns Server-Sent Events (SSE) for real-time streaming responses
+   */
+  @Public() // Public for testing
+  @Post('chat/stream')
+  @Header('Content-Type', 'text/event-stream')
+  @Header('Cache-Control', 'no-cache')
+  @Header('Connection', 'keep-alive')
+  async chatStream(
+    @Req() req: any,
+    @Body() chatDto: ChatMessageDto,
+    @Res() res: Response,
+  ) {
+    const userId = req.user?.userId || 'test_user_123';
+
+    // Set up SSE headers
+    res.setHeader('Content-Type', 'text/event-stream');
+    res.setHeader('Cache-Control', 'no-cache');
+    res.setHeader('Connection', 'keep-alive');
+    res.setHeader('X-Accel-Buffering', 'no'); // Disable nginx buffering
+
+    try {
+      // Stream the response
+      await this.aiService.chatStream(userId, chatDto, (chunk) => {
+        // Send each chunk as an SSE event
+        res.write(`data: ${JSON.stringify(chunk)}\n\n`);
+      });
+
+      // Send completion event
+      res.write(`data: ${JSON.stringify({ type: 'done' })}\n\n`);
+      res.end();
+    } catch (error) {
+      // Send error event
+      res.write(
+        `data: ${JSON.stringify({ type: 'error', message: error.message })}\n\n`,
+      );
+      res.end();
+    }
+  }
+
   @Public() // Public for testing
   @Get('conversations')
   async getConversations(@Req() req: any) {
diff --git a/maternal-app/maternal-app-backend/src/modules/ai/streaming/streaming.service.ts b/maternal-app/maternal-app-backend/src/modules/ai/streaming/streaming.service.ts
new file mode 100644
index 0000000..9a711a4
--- /dev/null
+++ b/maternal-app/maternal-app-backend/src/modules/ai/streaming/streaming.service.ts
@@ -0,0 +1,166 @@
+import { Injectable, Logger } from '@nestjs/common';
+import { ConfigService } from '@nestjs/config';
+import axios from 'axios';
+
+export interface StreamChunk {
+  type: 'token' | 'metadata' | 'done' | 'error';
+  content?: string;
+  metadata?: any;
+  error?: string;
+}
+
+export type StreamCallback = (chunk: StreamChunk) => void;
+
+/**
+ * Streaming Service for AI Responses
+ *
+ * Handles Server-Sent Events (SSE) streaming for real-time AI responses
+ * Supports both Azure OpenAI and OpenAI streaming APIs
+ */
+@Injectable()
+export class StreamingService {
+  private readonly logger = new Logger(StreamingService.name);
+  private aiProvider: 'openai' | 'azure';
+  private azureChatEndpoint: string;
+  private azureChatDeployment: string;
+  private azureChatApiVersion: string;
+  private azureChatApiKey: string;
+
+  constructor(private configService: ConfigService) {
+    this.aiProvider = this.configService.get('AI_PROVIDER', 'azure') as any;
+    this.azureChatEndpoint = this.configService.get('AZURE_OPENAI_CHAT_ENDPOINT');
+    this.azureChatDeployment = this.configService.get('AZURE_OPENAI_CHAT_DEPLOYMENT');
+    this.azureChatApiVersion = this.configService.get('AZURE_OPENAI_CHAT_API_VERSION');
+    this.azureChatApiKey = this.configService.get('AZURE_OPENAI_CHAT_API_KEY');
+  }
+
+  /**
+   * Stream Azure OpenAI completion
+   */
+  async streamAzureCompletion(
+    messages: Array<{ role: string; content: string }>,
+    callback: StreamCallback,
+  ): Promise<void> {
+    const url = `${this.azureChatEndpoint}/openai/deployments/${this.azureChatDeployment}/chat/completions?api-version=${this.azureChatApiVersion}`;
+
+    const requestBody = {
+      messages,
+      max_tokens: 1000,
+      temperature: 0.7,
+      stream: true, // Enable streaming
+    };
+
+    try {
+      const response = await axios.post(url, requestBody, {
+        headers: {
+          'Content-Type': 'application/json',
+          'api-key': this.azureChatApiKey,
+        },
+        responseType: 'stream', // Important for streaming
+        timeout: 60000,
+      });
+
+      let buffer = '';
+
+      // Process the stream
+      response.data.on('data', (chunk: Buffer) => {
+        buffer += chunk.toString();
+        const lines = buffer.split('\n');
+
+        // Keep the last incomplete line in the buffer
+        buffer = lines.pop() || '';
+
+        for (const line of lines) {
+          const trimmed = line.trim();
+
+          // Skip empty lines and comments
+          if (!trimmed || trimmed.startsWith(':')) {
+            continue;
+          }
+
+          // Parse SSE format
+          if (trimmed.startsWith('data: ')) {
+            const data = trimmed.substring(6);
+
+            // Check for completion marker
+            if (data === '[DONE]') {
+              callback({ type: 'done' });
+              return;
+            }
+
+            try {
+              const parsed = JSON.parse(data);
+
+              // Extract the content delta
+              if (parsed.choices && parsed.choices[0]?.delta?.content) {
+                callback({
+                  type: 'token',
+                  content: parsed.choices[0].delta.content,
+                });
+              }
+
+              // Check for finish reason
+              if (parsed.choices && parsed.choices[0]?.finish_reason) {
+                callback({
+                  type: 'metadata',
+                  metadata: {
+                    finishReason: parsed.choices[0].finish_reason,
+                    usage: parsed.usage,
+                  },
+                });
+              }
+            } catch (error) {
+              this.logger.error('Failed to parse streaming chunk:', error);
+            }
+          }
+        }
+      });
+
+      response.data.on('end', () => {
+        callback({ type: 'done' });
+      });
+
+      response.data.on('error', (error: Error) => {
+        callback({
+          type: 'error',
+          error: error.message,
+        });
+      });
+    } catch (error) {
+      this.logger.error('Azure streaming failed:', error);
+      callback({
+        type: 'error',
+        error: error.message || 'Streaming failed',
+      });
+    }
+  }
+
+  /**
+   * Stream OpenAI completion
+   */
+  async streamOpenAICompletion(
+    messages: Array<{ role: string; content: string }>,
+    callback: StreamCallback,
+  ): Promise<void> {
+    // TODO: Implement OpenAI streaming
+    // For now, fall back to non-streaming
+    callback({
+      type: 'error',
+      error: 'OpenAI streaming not yet implemented',
+    });
+  }
+
+  /**
+   * Main streaming method - routes to appropriate provider
+   */
+  async streamCompletion(
+    messages: Array<{ role: string; content: string }>,
+    callback: StreamCallback,
+  ): Promise<void> {
+    if (this.aiProvider === 'azure') {
+      await this.streamAzureCompletion(messages, callback);
+    } else {
+      await this.streamOpenAICompletion(messages, callback);
+    }
+  }
+}