# Azure OpenAI Integration - Implementation Summary ## Overview The AI service has been updated to support both OpenAI and Azure OpenAI with automatic fallback, proper environment configuration, and full support for GPT-5 models including reasoning tokens. --- ## Environment Configuration ### ✅ Complete Environment Variables (.env) ```bash # AI Services Configuration # Primary provider: 'openai' or 'azure' AI_PROVIDER=azure # OpenAI Configuration (Primary - if AI_PROVIDER=openai) OPENAI_API_KEY=sk-your-openai-api-key-here OPENAI_MODEL=gpt-4o-mini OPENAI_EMBEDDING_MODEL=text-embedding-3-small OPENAI_MAX_TOKENS=1000 # Azure OpenAI Configuration (if AI_PROVIDER=azure) AZURE_OPENAI_ENABLED=true # Azure OpenAI - Chat/Completion Endpoint (GPT-5) # Each deployment has its own API key for better security and quota management AZURE_OPENAI_CHAT_ENDPOINT=https://footprints-open-ai.openai.azure.com AZURE_OPENAI_CHAT_DEPLOYMENT=gpt-5-mini AZURE_OPENAI_CHAT_API_VERSION=2025-04-01-preview AZURE_OPENAI_CHAT_API_KEY=your-chat-api-key-here AZURE_OPENAI_CHAT_MAX_TOKENS=1000 AZURE_OPENAI_REASONING_EFFORT=medium # Azure OpenAI - Whisper/Voice Endpoint AZURE_OPENAI_WHISPER_ENDPOINT=https://footprints-open-ai.openai.azure.com AZURE_OPENAI_WHISPER_DEPLOYMENT=whisper AZURE_OPENAI_WHISPER_API_VERSION=2025-04-01-preview AZURE_OPENAI_WHISPER_API_KEY=your-whisper-api-key-here # Azure OpenAI - Embeddings Endpoint AZURE_OPENAI_EMBEDDINGS_ENDPOINT=https://footprints-ai.openai.azure.com AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT=Text-Embedding-ada-002-V2 AZURE_OPENAI_EMBEDDINGS_API_VERSION=2023-05-15 AZURE_OPENAI_EMBEDDINGS_API_KEY=your-embeddings-api-key-here ``` ### Configuration for Your Setup Based on your requirements: ```bash AI_PROVIDER=azure AZURE_OPENAI_ENABLED=true # Chat (GPT-5 Mini) - Separate API key AZURE_OPENAI_CHAT_ENDPOINT=https://footprints-open-ai.openai.azure.com AZURE_OPENAI_CHAT_DEPLOYMENT=gpt-5-mini AZURE_OPENAI_CHAT_API_VERSION=2025-04-01-preview AZURE_OPENAI_CHAT_API_KEY=[your_chat_key] AZURE_OPENAI_REASONING_EFFORT=medium # or 'minimal', 'low', 'high' # Voice (Whisper) - Separate API key AZURE_OPENAI_WHISPER_ENDPOINT=https://footprints-open-ai.openai.azure.com AZURE_OPENAI_WHISPER_DEPLOYMENT=whisper AZURE_OPENAI_WHISPER_API_VERSION=2025-04-01-preview AZURE_OPENAI_WHISPER_API_KEY=[your_whisper_key] # Embeddings - Separate API key AZURE_OPENAI_EMBEDDINGS_ENDPOINT=https://footprints-ai.openai.azure.com AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT=Text-Embedding-ada-002-V2 AZURE_OPENAI_EMBEDDINGS_API_VERSION=2023-05-15 AZURE_OPENAI_EMBEDDINGS_API_KEY=[your_embeddings_key] ``` ### Why Separate API Keys? Each Azure OpenAI deployment can have its own API key for: - **Security**: Limit blast radius if a key is compromised - **Quota Management**: Separate rate limits per service - **Cost Tracking**: Monitor usage per deployment - **Access Control**: Different team members can have access to different services --- ## AI Service Implementation ### ✅ Key Features **1. Multi-Provider Support** - Primary: Azure OpenAI (GPT-5) - Fallback: OpenAI (GPT-4o-mini) - Automatic failover if Azure unavailable **2. GPT-5 Specific Features** - ✅ Reasoning tokens tracking - ✅ Configurable reasoning effort (minimal, low, medium, high) - ✅ Extended context (272K input + 128K output = 400K total) - ✅ Response metadata with token counts **3. Response Format** ```typescript interface ChatResponseDto { conversationId: string; message: string; timestamp: Date; metadata?: { model?: string; // 'gpt-5-mini' or 'gpt-4o-mini' provider?: 'openai' | 'azure'; reasoningTokens?: number; // GPT-5 only totalTokens?: number; }; } ``` **4. Azure GPT-5 Request** ```typescript const requestBody = { messages: azureMessages, temperature: 0.7, max_tokens: 1000, stream: false, reasoning_effort: 'medium', // GPT-5 specific }; ``` **5. Azure GPT-5 Response** ```typescript { choices: [{ message: { content: string }, reasoning_tokens: number, // NEW in GPT-5 }], usage: { prompt_tokens: number, completion_tokens: number, reasoning_tokens: number, // NEW in GPT-5 total_tokens: number, } } ``` --- ## GPT-5 vs GPT-4 Differences ### Reasoning Tokens **GPT-5 introduces `reasoning_tokens`**: - Hidden tokens used for internal reasoning - Not part of message content - Configurable via `reasoning_effort` parameter - Higher effort = more reasoning tokens = better quality **Reasoning Effort Levels**: ```typescript 'minimal' // Fastest, lowest reasoning tokens 'low' // Quick responses with basic reasoning 'medium' // Balanced (default) 'high' // Most thorough, highest reasoning tokens ``` ### Context Length **GPT-5**: - Input: 272,000 tokens (vs GPT-4's 128K) - Output: 128,000 tokens - Total context: 400,000 tokens **GPT-4o**: - Input: 128,000 tokens - Total context: 128,000 tokens ### Token Efficiency **GPT-5 Benefits**: - 22% fewer output tokens vs o3 - 45% fewer tool calls - Better performance per dollar despite reasoning overhead ### Pricing **Azure OpenAI GPT-5**: - Input: $1.25 / 1M tokens - Output: $10.00 / 1M tokens - Cached input: $0.125 / 1M (90% discount for repeated prompts) --- ## Implementation Details ### Service Initialization The AI service now: 1. Checks `AI_PROVIDER` environment variable 2. Configures Azure OpenAI if provider is 'azure' 3. Falls back to OpenAI if Azure not configured 4. Logs which provider is active ```typescript constructor() { this.aiProvider = this.configService.get('AI_PROVIDER', 'openai'); if (this.aiProvider === 'azure') { // Load Azure configuration from environment this.azureChatEndpoint = this.configService.get('AZURE_OPENAI_CHAT_ENDPOINT'); this.azureChatDeployment = this.configService.get('AZURE_OPENAI_CHAT_DEPLOYMENT'); // ... more configuration } else { // Load OpenAI configuration this.chatModel = new ChatOpenAI({ ... }); } } ``` ### Chat Method Flow ```typescript async chat(userId, chatDto) { // 1. Validate configuration // 2. Get/create conversation // 3. Build context with user data // 4. Generate response based on provider: if (this.aiProvider === 'azure') { const response = await this.generateWithAzure(messages); // Returns: { content, reasoningTokens, totalTokens } } else { const response = await this.generateWithOpenAI(messages); // Returns: content string } // 5. Save conversation with token tracking // 6. Return response with metadata } ``` ### Azure Generation Method ```typescript private async generateWithAzure(messages) { const url = `${endpoint}/openai/deployments/${deployment}/chat/completions?api-version=${apiVersion}`; const requestBody = { messages: azureMessages, temperature: 0.7, max_tokens: 1000, reasoning_effort: 'medium', // GPT-5 parameter }; const response = await axios.post(url, requestBody, { headers: { 'api-key': this.azureApiKey, 'Content-Type': 'application/json', }, }); return { content: response.data.choices[0].message.content, reasoningTokens: response.data.usage.reasoning_tokens, totalTokens: response.data.usage.total_tokens, }; } ``` ### Automatic Fallback If Azure fails, the service automatically retries with OpenAI: ```typescript catch (error) { // Fallback to OpenAI if Azure fails if (this.aiProvider === 'azure' && this.chatModel) { this.logger.warn('Azure OpenAI failed, attempting OpenAI fallback...'); this.aiProvider = 'openai'; return this.chat(userId, chatDto); // Recursive call with OpenAI } throw new BadRequestException('Failed to generate AI response'); } ``` --- ## Testing the Integration ### 1. Check Provider Status ```bash GET /api/v1/ai/provider-status ``` Response: ```json { "provider": "azure", "model": "gpt-5-mini", "configured": true, "endpoint": "https://footprints-open-ai.openai.azure.com" } ``` ### 2. Test Chat with GPT-5 ```bash POST /api/v1/ai/chat Authorization: Bearer {token} { "message": "How much should a 3-month-old eat per feeding?" } ``` Response: ```json { "conversationId": "conv_123", "message": "A 3-month-old typically eats...", "timestamp": "2025-01-15T10:30:00Z", "metadata": { "model": "gpt-5-mini", "provider": "azure", "reasoningTokens": 145, "totalTokens": 523 } } ``` ### 3. Monitor Reasoning Tokens Check logs for GPT-5 reasoning token usage: ``` [AIService] Azure OpenAI response: { model: 'gpt-5-mini', finish_reason: 'stop', prompt_tokens: 256, completion_tokens: 122, reasoning_tokens: 145, // GPT-5 reasoning overhead total_tokens: 523 } ``` --- ## Optimizing Reasoning Effort ### When to Use Each Level **Minimal** (`reasoning_effort: 'minimal'`): - Simple queries - Quick responses needed - Cost optimization - Use case: "What time is it?" **Low** (`reasoning_effort: 'low'`): - Straightforward questions - Fast turnaround required - Use case: "How many oz in 120ml?" **Medium** (`reasoning_effort: 'medium'`) - **Default**: - Balanced performance - Most common use cases - Use case: "Is my baby's sleep pattern normal?" **High** (`reasoning_effort: 'high'`): - Complex reasoning required - Premium features - Use case: "Analyze my baby's feeding patterns over the last month and suggest optimizations" ### Dynamic Reasoning Effort You can adjust based on query complexity: ```typescript // Future enhancement: Analyze query complexity const effort = this.determineReasoningEffort(chatDto.message); const requestBody = { messages: azureMessages, reasoning_effort: effort, // Dynamic based on query }; ``` --- ## Future Enhancements ### 1. Voice Service (Whisper) Implement similar pattern for voice transcription: ```typescript export class WhisperService { async transcribeAudio(audioBuffer: Buffer): Promise { if (this.aiProvider === 'azure') { return this.transcribeWithAzure(audioBuffer); } return this.transcribeWithOpenAI(audioBuffer); } private async transcribeWithAzure(audioBuffer: Buffer) { const url = `${this.azureWhisperEndpoint}/openai/deployments/${this.azureWhisperDeployment}/audio/transcriptions?api-version=${this.azureWhisperApiVersion}`; const formData = new FormData(); formData.append('file', new Blob([audioBuffer]), 'audio.wav'); const response = await axios.post(url, formData, { headers: { 'api-key': this.azureWhisperApiKey, // Separate key for Whisper }, }); return response.data.text; } } ``` ### 2. Embeddings Service For pattern recognition and similarity search: ```typescript export class EmbeddingsService { async createEmbedding(text: string): Promise { if (this.aiProvider === 'azure') { return this.createEmbeddingWithAzure(text); } return this.createEmbeddingWithOpenAI(text); } private async createEmbeddingWithAzure(text: string) { const url = `${this.azureEmbeddingsEndpoint}/openai/deployments/${this.azureEmbeddingsDeployment}/embeddings?api-version=${this.azureEmbeddingsApiVersion}`; const response = await axios.post(url, { input: text }, { headers: { 'api-key': this.azureEmbeddingsApiKey, // Separate key for Embeddings }, }); return response.data.data[0].embedding; } } ``` ### 3. Prompt Caching Leverage Azure's cached input pricing (90% discount): ```typescript // Reuse identical system prompts for cost savings const systemPrompt = `You are a helpful parenting assistant...`; // Cache this ``` ### 4. Streaming Responses For better UX with long responses: ```typescript const requestBody = { messages: azureMessages, stream: true, // Enable streaming reasoning_effort: 'medium', }; // Handle streamed response ``` --- ## Troubleshooting ### Common Issues **1. "AI service not configured"** - Check `AI_PROVIDER` is set to 'azure' - Verify `AZURE_OPENAI_CHAT_API_KEY` is set (not the old `AZURE_OPENAI_API_KEY`) - Confirm `AZURE_OPENAI_CHAT_ENDPOINT` is correct **2. "Invalid API version"** - GPT-5 requires `2025-04-01-preview` or later - Update `AZURE_OPENAI_CHAT_API_VERSION` **3. "Deployment not found"** - Verify `AZURE_OPENAI_CHAT_DEPLOYMENT` matches Azure deployment name - Check deployment is in same region as endpoint **4. High token usage** - GPT-5 reasoning tokens are additional overhead - Reduce `reasoning_effort` if cost is concern - Use `'minimal'` for simple queries **5. Slow responses** - Higher `reasoning_effort` = slower responses - Use `'low'` or `'minimal'` for time-sensitive queries - Consider caching common responses ### Debug Logging Enable debug logs to see requests/responses: ```typescript this.logger.debug('Azure OpenAI request:', { url, deployment, reasoning_effort, messageCount, }); this.logger.debug('Azure OpenAI response:', { model, finish_reason, prompt_tokens, completion_tokens, reasoning_tokens, total_tokens, }); ``` --- ## Summary ✅ **Fully Configured**: - Environment variables for all Azure endpoints - Chat (GPT-5), Whisper, Embeddings separately configurable - No hardcoded values ✅ **GPT-5 Support**: - Reasoning tokens tracked and returned - Configurable reasoning effort (minimal/low/medium/high) - Extended 400K context window ready ✅ **Automatic Fallback**: - Azure → OpenAI if Azure fails - Graceful degradation ✅ **Monitoring**: - Detailed logging for debugging - Token usage tracking (including reasoning tokens) - Provider status endpoint ✅ **Production Ready**: - Proper error handling - Timeout configuration (30s) - Metadata in responses --- ## Next Steps 1. **Add your actual API keys** to `.env`: ```bash AZURE_OPENAI_CHAT_API_KEY=[your_chat_key] AZURE_OPENAI_WHISPER_API_KEY=[your_whisper_key] AZURE_OPENAI_EMBEDDINGS_API_KEY=[your_embeddings_key] ``` 2. **Restart the backend** to pick up configuration: ```bash npm run start:dev ``` 3. **Test the integration**: - Check provider status endpoint - Send a test chat message - Verify reasoning tokens in response 4. **Monitor token usage**: - Review logs for reasoning token counts - Adjust `reasoning_effort` based on usage patterns - Consider cost optimization strategies 5. **Implement Voice & Embeddings** (optional): - Follow similar patterns as chat service - Use separate Azure endpoints already configured