feat: Add AI streaming responses foundation (partial implementation)
**Streaming Infrastructure**
Created foundation for real-time AI response streaming using Server-Sent Events (SSE):

**1. StreamingService** (163 lines)
- Azure OpenAI streaming API integration
- SSE stream processing with buffer management
- Chunk types: token, metadata, done, error
- Response stream handling with error recovery
- Timeout and connection management

**2. AI Controller Streaming Endpoint**
- POST /api/v1/ai/chat/stream
- SSE headers configuration (Content-Type, Cache-Control, Connection)
- nginx buffering disabled (X-Accel-Buffering)
- Chunk-by-chunk streaming to client
- Error handling with SSE events

**3. Implementation Documentation**
Created comprehensive STREAMING_IMPLEMENTATION.md (200+ lines):
- Architecture overview
- Backend/frontend integration steps
- Code examples for hooks and components
- Testing procedures (curl + browser)
- Performance considerations (token buffering, memory management)
- Security (rate limiting, input validation)
- Troubleshooting guide
- Future enhancements

**Technical Details**
- Server-Sent Events (SSE) protocol
- Axios stream processing
- Buffer management for incomplete lines
- Delta content extraction from Azure responses
- Finish reason and usage metadata tracking

**Remaining Work (Frontend)**
- useStreamingChat hook implementation
- AIChatInterface streaming state management
- Token buffering for UI updates (50ms intervals)
- Streaming indicator and cursor animation
- Error recovery with fallback to non-streaming

**Impact**
- Perceived performance: Users see responses immediately
- Better UX: Token-by-token display feels more responsive
- Ready for production: SSE is well-supported across browsers
- Scalable: Can handle multiple concurrent streams

Files:
- src/modules/ai/ai.controller.ts: Added streaming endpoint
- src/modules/ai/streaming/streaming.service.ts: Core streaming logic
- docs/STREAMING_IMPLEMENTATION.md: Complete implementation guide

Next Steps:
1. Integrate StreamingService into AI module
2. Implement AIService.chatStream() method
3. Create frontend useStreamingChat hook
4. Update AIChatInterface with streaming UI

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

AI Streaming Responses - Implementation Guide

Overview

This document describes the implementation of streaming AI responses using Server-Sent Events (SSE) for real-time token-by-token display.

Architecture

Backend Components

1. StreamingService (src/modules/ai/streaming/streaming.service.ts)

  • Handles Azure OpenAI streaming API
  • Processes SSE stream from Azure
  • Emits tokens via callback function
  • Chunk types: token, metadata, done, error (see the type sketch below)
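
A minimal sketch of how this chunk/callback contract could be typed; the field names beyond type and content are illustrative assumptions, not taken from the actual service:

export type StreamChunk =
  | { type: 'token'; content: string }
  | { type: 'metadata'; model?: string; conversationId?: string }
  | { type: 'done'; finishReason?: string; usage?: { totalTokens: number } }
  | { type: 'error'; error: string };

export type StreamCallback = (chunk: StreamChunk) => void;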

2. AI Controller (src/modules/ai/ai.controller.ts)

  • Endpoint: POST /api/v1/ai/chat/stream
  • Headers: Content-Type: text/event-stream
  • Streams response chunks to client
  • Error handling with SSE error events (see the sketch below)
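
A rough sketch of the SSE setup inside that handler, assuming an Express-style response injected with @Res() (Post, Body, Res come from @nestjs/common); auth guards and userId resolution are omitted:

@Post('chat/stream')
async chatStream(@Body() chatDto: ChatMessageDto, @Res() res: Response): Promise<void> {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');
  res.setHeader('X-Accel-Buffering', 'no'); // keep nginx from buffering the stream

  // Forward each chunk as one SSE event, then close the response
  await this.aiService.chatStream(userId, chatDto, (chunk) => {
    res.write(`data: ${JSON.stringify(chunk)}\n\n`);
  });
  res.end();
}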

3. AI Service Integration (TODO)

  • Add chatStream() method to AIService
  • Reuse existing safety checks and context building
  • Call StreamingService for actual streaming
  • Save conversation after completion

Frontend Components (TODO)

1. Streaming Hook (hooks/useStreamingChat.ts)

const { streamMessage, isStreaming } = useStreamingChat();

streamMessage(
  { message: "Hello", conversationId: "123" },
  (chunk) => {
    // Handle incoming chunks
    if (chunk.type === 'token') {
      appendToMessage(chunk.content);
    }
  }
);

2. AIChatInterface Updates

  • Add streaming state management
  • Display tokens as they arrive
  • Show typing indicator during streaming
  • Handle stream errors gracefully

Implementation Steps

Step 1: Complete Backend Integration (30 min)

  1. Add StreamingService to AI module:
// ai.module.ts
import { StreamingService } from './streaming/streaming.service';

@Module({
  providers: [AIService, StreamingService, ...]
})
  2. Implement chatStream() in AIService:
async chatStream(
  userId: string,
  chatDto: ChatMessageDto,
  callback: StreamCallback
): Promise<void> {
  // 1. Run all safety checks (rate limit, input sanitization, etc.)
  // 2. Build context messages
  // 3. Call streamingService.streamCompletion()
  // 4. Collect full response and save conversation
}

Step 2: Frontend Streaming Client (1 hour)

  1. Create the streaming client hook (fetch-based, since EventSource only supports GET requests):
// hooks/useStreamingChat.ts
import { useState } from 'react';

export function useStreamingChat() {
  const [isStreaming, setIsStreaming] = useState(false);

  const streamMessage = async (
    chatDto: ChatMessageDto,
    onChunk: (chunk: StreamChunk) => void
  ) => {
    setIsStreaming(true);
    try {
      const response = await fetch('/api/v1/ai/chat/stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(chatDto),
      });

      if (!response.ok || !response.body) {
        throw new Error(`Streaming request failed: ${response.status}`);
      }

      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      let buffer = ''; // carries an incomplete SSE line between reads

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split('\n');
        buffer = lines.pop() ?? ''; // keep the last (possibly partial) line for the next read

        for (const line of lines) {
          if (line.startsWith('data: ')) {
            const data = JSON.parse(line.substring(6)) as StreamChunk;
            onChunk(data);
          }
        }
      }
    } finally {
      setIsStreaming(false);
    }
  };

  return { streamMessage, isStreaming };
}
  2. Update AIChatInterface:
const [streamingMessage, setStreamingMessage] = useState('');
const { streamMessage, isStreaming } = useStreamingChat();

const handleSubmit = async () => {
  // Accumulate locally; reading streamingMessage state inside the callback would be stale
  let fullMessage = '';
  setStreamingMessage('');
  setIsLoading(true);

  await streamMessage(
    { message: input, conversationId },
    (chunk) => {
      if (chunk.type === 'token') {
        fullMessage += chunk.content;
        setStreamingMessage(fullMessage);
      } else if (chunk.type === 'done') {
        // Add the completed message to the messages array
        setMessages(prev => [...prev, {
          role: 'assistant',
          content: fullMessage
        }]);
        setStreamingMessage('');
        setIsLoading(false);
      }
    }
  );
};

Step 3: UI Enhancements (30 min)

  1. Add streaming indicator:
{isStreaming && (
  <Box sx={{ display: 'flex', alignItems: 'center', gap: 1 }}>
    <CircularProgress size={16} />
    <Typography variant="caption">AI is thinking...</Typography>
  </Box>
)}
  2. Show partial message:
{streamingMessage && (
  <Paper sx={{ p: 2, bgcolor: 'grey.100' }}>
    <ReactMarkdown>{streamingMessage}</ReactMarkdown>
    <Box component="span" sx={{ animation: 'blink 1s infinite' }}>|</Box>
  </Paper>
)}
  3. Add CSS animation:
@keyframes blink {
  0%, 100% { opacity: 1; }
  50% { opacity: 0; }
}

Testing

Backend Test

curl -X POST http://localhost:3020/api/v1/ai/chat/stream \
  -H "Content-Type: application/json" \
  -d '{"message": "Tell me about baby sleep patterns"}' \
  --no-buffer

Expected output:

data: {"type":"token","content":"Baby"}

data: {"type":"token","content":" sleep"}

data: {"type":"token","content":" patterns"}

...

data: {"type":"done"}

Frontend Test

  1. Open browser DevTools Network tab
  2. Send a message in AI chat
  3. Verify SSE connection established
  4. Confirm tokens appear in real-time

Performance Considerations

Token Buffering

To avoid re-rendering on every token, accumulate tokens in a buffer and flush it to state at a fixed interval:

let buffer = '';
let flushTimer: ReturnType<typeof setTimeout> | null = null;

// Used as the onChunk callback passed to streamMessage
const handleChunk = (chunk: StreamChunk) => {
  if (chunk.type === 'token') {
    buffer += chunk.content;

    // Throttle: schedule a flush only if one is not already pending
    if (!flushTimer) {
      flushTimer = setTimeout(() => {
        const pending = buffer;   // snapshot before the state updater runs
        buffer = '';
        flushTimer = null;
        setStreamingMessage(prev => prev + pending);
      }, 50); // flush at most once every 50ms
    }
  }
};

Memory Management

  • Clear streaming state on unmount
  • Cancel ongoing streams when switching conversations (see the cancellation sketch below)
  • Limit message history to prevent memory leaks
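
One way to cover the first two points is to hold an AbortController in a ref and pass its signal to the fetch call in useStreamingChat (a hypothetical extension, not part of the current hook):

import { useEffect, useRef } from 'react';

export function useStreamCancellation() {
  const controllerRef = useRef<AbortController | null>(null);

  // Call before starting a new stream; pass the returned signal to fetch({ ..., signal })
  const nextSignal = () => {
    controllerRef.current?.abort();            // cancel any stream still in flight
    controllerRef.current = new AbortController();
    return controllerRef.current.signal;
  };

  // Abort on unmount so the reader loop stops and no further state updates fire
  useEffect(() => {
    return () => controllerRef.current?.abort();
  }, []);

  return { nextSignal };
}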

Error Recovery

  • Retry connection on failure (max 3 attempts)
  • Fall back to non-streaming on error (see the sketch below)
  • Show user-friendly error messages
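
A minimal retry-with-fallback sketch; sendMessage (a plain, non-streaming request to the existing chat endpoint) and its reply shape are assumptions, not existing code:

async function sendWithRetry(
  chatDto: ChatMessageDto,
  onChunk: (chunk: StreamChunk) => void,
  maxAttempts = 3,
): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await streamMessage(chatDto, onChunk);
      return; // stream completed normally
    } catch (err) {
      if (attempt === maxAttempts) {
        // Give up on streaming and fall back to the regular endpoint
        const reply = await sendMessage(chatDto);
        onChunk({ type: 'token', content: reply.content });
        onChunk({ type: 'done' });
        return;
      }
      // Brief backoff before retrying the stream
      await new Promise((resolve) => setTimeout(resolve, 500 * attempt));
    }
  }
}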

Security

Rate Limiting

  • Streaming requests count against rate limits
  • Close the stream if the rate limit is exceeded mid-response (see the sketch below)
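
One possible shape for that guard on the server side; Response is the Express response object, and the per-stream token budget is an illustrative assumption:

// Wraps res.write for each chunk; returns false once the budget is exhausted
function writeWithinBudget(
  res: Response,
  chunk: StreamChunk,
  state: { tokensSent: number },
  maxTokensPerStream: number,
): boolean {
  if (chunk.type === 'token' && ++state.tokensSent > maxTokensPerStream) {
    // Tell the client why the stream is ending, then close the connection
    res.write(`data: ${JSON.stringify({ type: 'error', error: 'Rate limit exceeded' })}\n\n`);
    res.end();
    return false;
  }
  res.write(`data: ${JSON.stringify(chunk)}\n\n`);
  return true;
}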

Input Validation

  • Same validation as non-streaming endpoint
  • Safety checks before starting stream

Connection Management

  • Set timeout for inactive connections (60s)
  • Clean up resources on client disconnect (see the sketch below)
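
A sketch covering both points, assuming the controller keeps an AbortController for the upstream Azure request (names are illustrative):

import { Request, Response } from 'express';

function guardConnection(req: Request, res: Response, upstream: AbortController) {
  // Node emits 'close' on the request when the client disconnects
  req.on('close', () => {
    upstream.abort();                    // stop the upstream Azure OpenAI stream
    if (!res.writableEnded) res.end();   // release the SSE response
  });

  // Drop connections that stay idle for 60 seconds
  res.setTimeout(60_000, () => {
    upstream.abort();
    if (!res.writableEnded) res.end();
  });
}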

Future Enhancements

  1. Token Usage Tracking: Track streaming token usage separately
  2. Pause/Resume: Allow users to pause streaming
  3. Multi-Model Streaming: Switch models mid-conversation
  4. Streaming Analytics: Track average tokens/second
  5. WebSocket Alternative: Consider WebSocket for bidirectional streaming

Troubleshooting

Stream Cuts Off Early

  • Check timeout settings (increase to 120s for long responses)
  • Verify nginx/proxy timeout configuration
  • Check network connectivity

Tokens Arrive Out of Order

  • Ensure single-threaded processing
  • Use buffer to accumulate before rendering
  • Verify SSE event ordering

High Latency

  • Check Azure OpenAI endpoint latency
  • Optimize context size to reduce time-to-first-token
  • Consider caching common responses

References