feat: Add AI streaming responses foundation (partial implementation)

**Streaming Infrastructure**
Created foundation for real-time AI response streaming using Server-Sent Events (SSE):

**1. StreamingService** (163 lines)
- Azure OpenAI streaming API integration
- SSE stream processing with buffer management
- Chunk types: token, metadata, done, error
- Response stream handling with error recovery
- Timeout and connection management

**2. AI Controller Streaming Endpoint**
- POST /api/v1/ai/chat/stream
- SSE headers configuration (Content-Type, Cache-Control, Connection)
- nginx buffering disabled (X-Accel-Buffering)
- Chunk-by-chunk streaming to client
- Error handling with SSE events

**3. Implementation Documentation**
Created comprehensive STREAMING_IMPLEMENTATION.md (200+ lines):
- Architecture overview
- Backend/frontend integration steps
- Code examples for hooks and components
- Testing procedures (curl + browser)
- Performance considerations (token buffering, memory management)
- Security (rate limiting, input validation)
- Troubleshooting guide
- Future enhancements

**Technical Details**
- Server-Sent Events (SSE) protocol
- Axios stream processing
- Buffer management for incomplete lines
- Delta content extraction from Azure responses
- Finish reason and usage metadata tracking

**Remaining Work (Frontend)**
- useStreamingChat hook implementation
- AIChatInterface streaming state management
- Token buffering for UI updates (50ms intervals)
- Streaming indicator and cursor animation
- Error recovery with fallback to non-streaming

**Impact**
- Perceived performance: Users see responses immediately
- Better UX: Token-by-token display feels more responsive
- Broad compatibility: SSE is well-supported across modern browsers
- Scalable: Can handle multiple concurrent streams

Files:
- src/modules/ai/ai.controller.ts: Added streaming endpoint
- src/modules/ai/streaming/streaming.service.ts: Core streaming logic
- docs/STREAMING_IMPLEMENTATION.md: Complete implementation guide

Next Steps:
1. Integrate StreamingService into AI module
2. Implement AIService.chatStream() method
3. Create frontend useStreamingChat hook
4. Update AIChatInterface with streaming UI

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
# AI Streaming Responses - Implementation Guide
## Overview
This document describes the implementation of streaming AI responses using Server-Sent Events (SSE) for real-time token-by-token display.
## Architecture
### Backend Components
**1. StreamingService** (`src/modules/ai/streaming/streaming.service.ts`)
- Handles Azure OpenAI streaming API
- Processes SSE stream from Azure
- Emits tokens via callback function
- Chunk types: `token`, `metadata`, `done`, `error`
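
For reference, the rest of this guide uses the following chunk and callback shapes. This is a minimal sketch: only the `type` values and token `content` are taken from the implementation; the other field names are assumptions to be adjusted against `streaming.service.ts`.
```typescript
// Hypothetical chunk/callback types used throughout this guide; field names
// beyond `type` and `content` are illustrative.
export type StreamChunk =
  | { type: 'token'; content: string }
  | { type: 'metadata'; finishReason?: string; usage?: Record<string, number> }
  | { type: 'done' }
  | { type: 'error'; message: string };

export type StreamCallback = (chunk: StreamChunk) => void;
```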
**2. AI Controller** (`src/modules/ai/ai.controller.ts`)
- Endpoint: `POST /api/v1/ai/chat/stream`
- Headers: `Content-Type: text/event-stream`
- Streams response chunks to client
- Error handling with SSE events
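
The SSE plumbing inside `ai.controller.ts` looks roughly like the sketch below. It is illustrative: only the route and headers are taken from the implementation; it assumes NestJS decorators, an auth guard that attaches `user` to the request, the hypothetical `StreamChunk` type above, and the `chatStream()` method added to AIService in Step 1.
```typescript
@Post('chat/stream')
async chatStream(
  @Body() chatDto: ChatMessageDto,
  @Req() req: Request,   // express Request; auth guard assumed to set req.user
  @Res() res: Response,  // express Response; @Res() disables Nest's automatic reply
): Promise<void> {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');
  res.setHeader('X-Accel-Buffering', 'no'); // keep nginx from buffering the stream

  const send = (chunk: StreamChunk) =>
    res.write(`data: ${JSON.stringify(chunk)}\n\n`);

  try {
    await this.aiService.chatStream((req as any).user.id, chatDto, send);
  } catch {
    send({ type: 'error', message: 'Streaming failed' });
  } finally {
    res.end();
  }
}
```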
**3. AI Service Integration** (TODO)
- Add `chatStream()` method to AIService
- Reuse existing safety checks and context building
- Call StreamingService for actual streaming
- Save conversation after completion
### Frontend Components (TODO)
**1. Streaming Hook** (`hooks/useStreamingChat.ts`)
```typescript
const { streamMessage, isStreaming } = useStreamingChat();
streamMessage(
{ message: "Hello", conversationId: "123" },
(chunk) => {
// Handle incoming chunks
if (chunk.type === 'token') {
appendToMessage(chunk.content);
}
}
);
```
**2. AIChatInterface Updates**
- Add streaming state management
- Display tokens as they arrive
- Show typing indicator during streaming
- Handle stream errors gracefully
## Implementation Steps
### Step 1: Complete Backend Integration (30 min)
1. Add StreamingService to AI module:
```typescript
// ai.module.ts
import { StreamingService } from './streaming/streaming.service';
@Module({
  providers: [AIService, StreamingService, ...],
})
export class AIModule {}
```
2. Implement `chatStream()` in AIService:
```typescript
async chatStream(
userId: string,
chatDto: ChatMessageDto,
callback: StreamCallback
): Promise<void> {
// 1. Run all safety checks (rate limit, input sanitization, etc.)
// 2. Build context messages
// 3. Call streamingService.streamCompletion()
// 4. Collect full response and save conversation
}
```
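A fuller sketch of how those four steps could fit together. Helper names like `runSafetyChecks()`, `buildContextMessages()`, and `saveConversation()` are placeholders, not existing AIService methods, and the `streamCompletion()` signature is assumed:
```typescript
async chatStream(
  userId: string,
  chatDto: ChatMessageDto,
  callback: StreamCallback,
): Promise<void> {
  await this.runSafetyChecks(userId, chatDto);        // rate limit, input sanitization, etc.
  const messages = await this.buildContextMessages(userId, chatDto);

  let fullResponse = '';
  await this.streamingService.streamCompletion(messages, (chunk) => {
    if (chunk.type === 'token') {
      fullResponse += chunk.content;                  // collect the full response as it streams
    }
    callback(chunk);                                  // forward every chunk to the controller
  });

  await this.saveConversation(userId, chatDto, fullResponse);
}
```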
### Step 2: Frontend Streaming Client (1 hour)
1. Create EventSource hook:
```typescript
// hooks/useStreamingChat.ts
import { useState } from 'react';

export function useStreamingChat() {
  const [isStreaming, setIsStreaming] = useState(false);

  const streamMessage = async (
    chatDto: ChatMessageDto,
    onChunk: (chunk: StreamChunk) => void
  ) => {
    setIsStreaming(true);
    try {
      const response = await fetch('/api/v1/ai/chat/stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(chatDto),
      });
      if (!response.ok || !response.body) {
        throw new Error(`Stream request failed: ${response.status}`);
      }

      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      let buffered = ''; // holds any incomplete SSE line between reads

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        buffered += decoder.decode(value, { stream: true });
        const lines = buffered.split('\n');
        buffered = lines.pop() ?? ''; // keep the trailing partial line for the next read

        for (const line of lines) {
          if (line.startsWith('data: ')) {
            onChunk(JSON.parse(line.substring(6)));
          }
        }
      }
    } finally {
      setIsStreaming(false);
    }
  };

  return { streamMessage, isStreaming };
}
```
2. Update AIChatInterface:
```typescript
const [streamingMessage, setStreamingMessage] = useState('');
const { streamMessage, isStreaming } = useStreamingChat();

const handleSubmit = async () => {
  setStreamingMessage('');
  setIsLoading(true);
  let fullResponse = ''; // accumulate locally to avoid reading a stale streamingMessage in the callback
  await streamMessage(
    { message: input, conversationId },
    (chunk) => {
      if (chunk.type === 'token') {
        fullResponse += chunk.content;
        setStreamingMessage(fullResponse);
      } else if (chunk.type === 'done') {
        // Add the completed message to the messages array
        setMessages(prev => [...prev, {
          role: 'assistant',
          content: fullResponse
        }]);
        setStreamingMessage('');
        setIsLoading(false);
      }
    }
  );
};
```
### Step 3: UI Enhancements (30 min)
1. Add streaming indicator:
```tsx
{isStreaming && (
<Box sx={{ display: 'flex', alignItems: 'center', gap: 1 }}>
<CircularProgress size={16} />
<Typography variant="caption">AI is thinking...</Typography>
</Box>
)}
```
2. Show partial message:
```tsx
{streamingMessage && (
<Paper sx={{ p: 2, bgcolor: 'grey.100' }}>
<ReactMarkdown>{streamingMessage}</ReactMarkdown>
<Box component="span" sx={{ animation: 'blink 1s infinite' }}>|</Box>
</Paper>
)}
```
3. Add CSS animation:
```css
@keyframes blink {
0%, 100% { opacity: 1; }
50% { opacity: 0; }
}
```
## Testing
### Backend Test
```bash
curl -X POST http://localhost:3020/api/v1/ai/chat/stream \
-H "Content-Type: application/json" \
-d '{"message": "Tell me about baby sleep patterns"}' \
--no-buffer
```
Expected output:
```
data: {"type":"token","content":"Baby"}
data: {"type":"token","content":" sleep"}
data: {"type":"token","content":" patterns"}
...
data: {"type":"done"}
```
### Frontend Test
1. Open browser DevTools Network tab
2. Send a message in AI chat
3. Verify SSE connection established
4. Confirm tokens appear in real-time
## Performance Considerations
### Token Buffering
To reduce UI updates, buffer tokens:
```typescript
let buffer = '';
let flushTimer: ReturnType<typeof setTimeout> | null = null;

// Pass this as the chunk callback to streamMessage()
const handleChunk = (chunk: StreamChunk) => {
  if (chunk.type !== 'token') return;
  buffer += chunk.content;
  if (!flushTimer) {
    flushTimer = setTimeout(() => {
      setStreamingMessage(prev => prev + buffer);
      buffer = '';
      flushTimer = null;
    }, 50); // flush accumulated tokens at most every 50ms
  }
};
```
### Memory Management
- Clear streaming state on unmount
- Cancel ongoing streams when switching conversations
- Limit message history to prevent memory leaks
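A minimal sketch of that cleanup inside the chat component, assuming the fetch in `useStreamingChat` is extended to accept an `AbortSignal` (it does not currently):
```typescript
const abortRef = useRef<AbortController | null>(null);

// Abort any in-flight stream when the component unmounts
useEffect(() => {
  return () => abortRef.current?.abort();
}, []);

const startStream = async (chatDto: ChatMessageDto) => {
  abortRef.current?.abort();                // cancel the previous stream, e.g. when switching conversations
  abortRef.current = new AbortController();
  await fetch('/api/v1/ai/chat/stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(chatDto),
    signal: abortRef.current.signal,
  });
};
```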
### Error Recovery
- Retry connection on failure (max 3 attempts)
- Fall back to non-streaming on error
- Show user-friendly error messages
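A rough sketch of that retry-then-fallback policy; `sendMessage()` here stands in for the existing non-streaming chat call and is a placeholder name:
```typescript
async function streamWithFallback(
  chatDto: ChatMessageDto,
  onChunk: (chunk: StreamChunk) => void,
) {
  for (let attempt = 1; attempt <= 3; attempt++) {
    try {
      await streamMessage(chatDto, onChunk);
      return;
    } catch (err) {
      if (attempt === 3) {
        // Give up on streaming and fall back to the non-streaming endpoint
        const reply = await sendMessage(chatDto);
        onChunk({ type: 'token', content: reply });
        onChunk({ type: 'done' });
      }
    }
  }
}
```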
## Security
### Rate Limiting
- Streaming requests count against rate limits
- Close stream if rate limit exceeded mid-response
### Input Validation
- Same validation as non-streaming endpoint
- Safety checks before starting stream
### Connection Management
- Set timeout for inactive connections (60s)
- Clean up resources on client disconnect
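On the server this can be handled in the streaming handler roughly as follows (the upstream-cancellation helper is hypothetical):
```typescript
// Inside the streaming handler, using the express req/res from the controller
const idleTimeout = setTimeout(() => res.end(), 60_000); // close inactive connections after 60s

req.on('close', () => {
  clearTimeout(idleTimeout);
  cancelUpstreamStream(); // hypothetical helper: abort the Azure OpenAI request
});
```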
## Future Enhancements
1. **Token Usage Tracking**: Track streaming token usage separately
2. **Pause/Resume**: Allow users to pause streaming
3. **Multi-Model Streaming**: Switch models mid-conversation
4. **Streaming Analytics**: Track average tokens/second
5. **WebSocket Alternative**: Consider WebSocket for bidirectional streaming
## Troubleshooting
### Stream Cuts Off Early
- Check timeout settings (increase to 120s for long responses)
- Verify nginx/proxy timeout configuration
- Check network connectivity
### Tokens Arrive Out of Order
- Ensure single-threaded processing
- Use buffer to accumulate before rendering
- Verify SSE event ordering
### High Latency
- Check Azure OpenAI endpoint latency
- Optimize context size to reduce time-to-first-token
- Consider caching common responses
## References
- [Azure OpenAI Streaming Docs](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/streaming)
- [Server-Sent Events Spec](https://html.spec.whatwg.org/multipage/server-sent-events.html)
- [LangChain Streaming](https://js.langchain.com/docs/expression_language/streaming)