# Azure OpenAI Integration - Implementation Summary

## Overview

The AI service has been updated to support both OpenAI and Azure OpenAI with automatic fallback, proper environment configuration, and full support for GPT-5 models including reasoning tokens.

---

## Environment Configuration

### ✅ Complete Environment Variables (.env)

```bash
# AI Services Configuration
# Primary provider: 'openai' or 'azure'
AI_PROVIDER=azure

# OpenAI Configuration (Primary - if AI_PROVIDER=openai)
OPENAI_API_KEY=sk-your-openai-api-key-here
OPENAI_MODEL=gpt-4o-mini
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
OPENAI_MAX_TOKENS=1000

# Azure OpenAI Configuration (if AI_PROVIDER=azure)
AZURE_OPENAI_ENABLED=true

# Azure OpenAI - Chat/Completion Endpoint (GPT-5)
# Each deployment has its own API key for better security and quota management
AZURE_OPENAI_CHAT_ENDPOINT=https://footprints-open-ai.openai.azure.com
AZURE_OPENAI_CHAT_DEPLOYMENT=gpt-5-mini
AZURE_OPENAI_CHAT_API_VERSION=2025-04-01-preview
AZURE_OPENAI_CHAT_API_KEY=your-chat-api-key-here
AZURE_OPENAI_CHAT_MAX_TOKENS=1000
AZURE_OPENAI_REASONING_EFFORT=medium

# Azure OpenAI - Whisper/Voice Endpoint
AZURE_OPENAI_WHISPER_ENDPOINT=https://footprints-open-ai.openai.azure.com
AZURE_OPENAI_WHISPER_DEPLOYMENT=whisper
AZURE_OPENAI_WHISPER_API_VERSION=2025-04-01-preview
AZURE_OPENAI_WHISPER_API_KEY=your-whisper-api-key-here

# Azure OpenAI - Embeddings Endpoint
AZURE_OPENAI_EMBEDDINGS_ENDPOINT=https://footprints-ai.openai.azure.com
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT=Text-Embedding-ada-002-V2
AZURE_OPENAI_EMBEDDINGS_API_VERSION=2023-05-15
AZURE_OPENAI_EMBEDDINGS_API_KEY=your-embeddings-api-key-here
```
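
As a sanity check at startup, the required chat variables above can be validated before any request is made. A minimal sketch in plain TypeScript (no framework; `missingAzureChatVars` and its `env` parameter are illustrative names, not part of the service):

```typescript
// Illustrative startup check for the chat deployment's variables.
// `env` stands in for process.env; names mirror the .env file above.
type Env = Record<string, string | undefined>;

const REQUIRED_CHAT_VARS = [
  'AZURE_OPENAI_CHAT_ENDPOINT',
  'AZURE_OPENAI_CHAT_DEPLOYMENT',
  'AZURE_OPENAI_CHAT_API_VERSION',
  'AZURE_OPENAI_CHAT_API_KEY',
];

// Returns the names of missing variables; an empty array means the
// chat deployment is fully configured (or Azure is not the provider).
function missingAzureChatVars(env: Env): string[] {
  if (env.AI_PROVIDER !== 'azure') return [];
  return REQUIRED_CHAT_VARS.filter((name) => !env[name]);
}
```

Failing fast here gives a clearer error than a 401 from Azure at request time.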

### Configuration for Your Setup

Based on your requirements:

```bash
AI_PROVIDER=azure
AZURE_OPENAI_ENABLED=true

# Chat (GPT-5 Mini) - Separate API key
AZURE_OPENAI_CHAT_ENDPOINT=https://footprints-open-ai.openai.azure.com
AZURE_OPENAI_CHAT_DEPLOYMENT=gpt-5-mini
AZURE_OPENAI_CHAT_API_VERSION=2025-04-01-preview
AZURE_OPENAI_CHAT_API_KEY=[your_chat_key]
AZURE_OPENAI_REASONING_EFFORT=medium  # or 'minimal', 'low', 'high'

# Voice (Whisper) - Separate API key
AZURE_OPENAI_WHISPER_ENDPOINT=https://footprints-open-ai.openai.azure.com
AZURE_OPENAI_WHISPER_DEPLOYMENT=whisper
AZURE_OPENAI_WHISPER_API_VERSION=2025-04-01-preview
AZURE_OPENAI_WHISPER_API_KEY=[your_whisper_key]

# Embeddings - Separate API key
AZURE_OPENAI_EMBEDDINGS_ENDPOINT=https://footprints-ai.openai.azure.com
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT=Text-Embedding-ada-002-V2
AZURE_OPENAI_EMBEDDINGS_API_VERSION=2023-05-15
AZURE_OPENAI_EMBEDDINGS_API_KEY=[your_embeddings_key]
```

### Why Separate API Keys?

Each Azure OpenAI deployment can have its own API key for:

- **Security**: Limit the blast radius if a key is compromised
- **Quota Management**: Separate rate limits per service
- **Cost Tracking**: Monitor usage per deployment
- **Access Control**: Different team members can have access to different services
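
Since the variables follow an `AZURE_OPENAI_<SERVICE>_*` naming convention, the per-deployment credentials can be resolved in one place so no service ever reads another service's key. A sketch (the helper name and `env` parameter are illustrative):

```typescript
// Illustrative lookup of the endpoint/key pair for one service.
// Variable names follow the AZURE_OPENAI_<SERVICE>_* convention above.
type AzureService = 'chat' | 'whisper' | 'embeddings';

function azureCredentialsFor(
  service: AzureService,
  env: Record<string, string | undefined>,
): { endpoint: string; apiKey: string } {
  const prefix = `AZURE_OPENAI_${service.toUpperCase()}`;
  const endpoint = env[`${prefix}_ENDPOINT`];
  const apiKey = env[`${prefix}_API_KEY`];
  if (!endpoint || !apiKey) {
    throw new Error(`Missing Azure OpenAI configuration for '${service}'`);
  }
  return { endpoint, apiKey };
}
```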

---

## AI Service Implementation

### ✅ Key Features

**1. Multi-Provider Support**
- Primary: Azure OpenAI (GPT-5)
- Fallback: OpenAI (GPT-4o-mini)
- Automatic failover if Azure is unavailable

**2. GPT-5 Specific Features**
- ✅ Reasoning token tracking
- ✅ Configurable reasoning effort (minimal, low, medium, high)
- ✅ Extended context (272K input + 128K output = 400K total)
- ✅ Response metadata with token counts

**3. Response Format**
```typescript
interface ChatResponseDto {
  conversationId: string;
  message: string;
  timestamp: Date;
  metadata?: {
    model?: string; // 'gpt-5-mini' or 'gpt-4o-mini'
    provider?: 'openai' | 'azure';
    reasoningTokens?: number; // GPT-5 only
    totalTokens?: number;
  };
}
```

**4. Azure GPT-5 Request**
```typescript
const requestBody = {
  messages: azureMessages,
  temperature: 0.7,
  max_tokens: 1000,
  stream: false,
  reasoning_effort: 'medium', // GPT-5 specific
};
```

**5. Azure GPT-5 Response**
```typescript
{
  choices: [{
    message: { content: string },
    reasoning_tokens: number, // NEW in GPT-5
  }],
  usage: {
    prompt_tokens: number,
    completion_tokens: number,
    reasoning_tokens: number, // NEW in GPT-5
    total_tokens: number,
  }
}
```

---

## GPT-5 vs GPT-4 Differences

### Reasoning Tokens

**GPT-5 introduces `reasoning_tokens`**:
- Hidden tokens used for internal reasoning
- Not part of the message content
- Configurable via the `reasoning_effort` parameter
- Higher effort = more reasoning tokens = better quality

**Reasoning Effort Levels**:
```typescript
'minimal' // Fastest, fewest reasoning tokens
'low'     // Quick responses with basic reasoning
'medium'  // Balanced (default)
'high'    // Most thorough, most reasoning tokens
```

### Context Length

**GPT-5**:
- Input: 272,000 tokens (vs GPT-4's 128K)
- Output: 128,000 tokens
- Total context: 400,000 tokens

**GPT-4o**:
- Input: 128,000 tokens
- Total context: 128,000 tokens
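
The limits above translate into a simple budget guard before dispatching a request. A sketch of the arithmetic only; actual token counting would come from a tokenizer library and is out of scope here:

```typescript
// Published GPT-5 window sizes from the list above.
const GPT5_MAX_INPUT_TOKENS = 272_000;
const GPT5_MAX_OUTPUT_TOKENS = 128_000;

// True if a request with this many prompt tokens and this output
// budget fits inside the GPT-5 context window.
function fitsGpt5Window(inputTokens: number, maxOutputTokens: number): boolean {
  return (
    inputTokens <= GPT5_MAX_INPUT_TOKENS &&
    maxOutputTokens <= GPT5_MAX_OUTPUT_TOKENS
  );
}
```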

### Token Efficiency

**GPT-5 Benefits**:
- 22% fewer output tokens vs o3
- 45% fewer tool calls
- Better performance per dollar despite reasoning overhead

### Pricing

**Azure OpenAI GPT-5**:
- Input: $1.25 / 1M tokens
- Output: $10.00 / 1M tokens
- Cached input: $0.125 / 1M (90% discount for repeated prompts)
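
These rates make per-request cost easy to estimate from the usage block. A sketch, assuming reasoning tokens are billed at the output rate (as with other reasoning models; verify against your Azure invoice):

```typescript
// Rates from the list above, expressed per token.
const INPUT_USD = 1.25 / 1_000_000;
const CACHED_INPUT_USD = 0.125 / 1_000_000;
const OUTPUT_USD = 10.0 / 1_000_000;

// Estimated request cost in USD. `cachedInputTokens` is the cached
// portion of `inputTokens`; reasoning tokens should be included in
// `outputTokens` (assumption: they bill at the output rate).
function estimateGpt5CostUsd(
  inputTokens: number,
  cachedInputTokens: number,
  outputTokens: number,
): number {
  return (
    (inputTokens - cachedInputTokens) * INPUT_USD +
    cachedInputTokens * CACHED_INPUT_USD +
    outputTokens * OUTPUT_USD
  );
}
```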

---

## Implementation Details

### Service Initialization

The AI service now:
1. Checks the `AI_PROVIDER` environment variable
2. Configures Azure OpenAI if the provider is 'azure'
3. Falls back to OpenAI if Azure is not configured
4. Logs which provider is active

```typescript
constructor() {
  this.aiProvider = this.configService.get('AI_PROVIDER', 'openai');

  if (this.aiProvider === 'azure') {
    // Load Azure configuration from environment
    this.azureChatEndpoint = this.configService.get('AZURE_OPENAI_CHAT_ENDPOINT');
    this.azureChatDeployment = this.configService.get('AZURE_OPENAI_CHAT_DEPLOYMENT');
    // ... more configuration
  } else {
    // Load OpenAI configuration
    this.chatModel = new ChatOpenAI({ ... });
  }
}
```

### Chat Method Flow

```typescript
async chat(userId, chatDto) {
  // 1. Validate configuration
  // 2. Get/create conversation
  // 3. Build context with user data
  // 4. Generate response based on provider:

  if (this.aiProvider === 'azure') {
    const response = await this.generateWithAzure(messages);
    // Returns: { content, reasoningTokens, totalTokens }
  } else {
    const response = await this.generateWithOpenAI(messages);
    // Returns: content string
  }

  // 5. Save conversation with token tracking
  // 6. Return response with metadata
}
```

### Azure Generation Method

```typescript
private async generateWithAzure(messages) {
  const url = `${endpoint}/openai/deployments/${deployment}/chat/completions?api-version=${apiVersion}`;

  const requestBody = {
    messages: azureMessages,
    temperature: 0.7,
    max_tokens: 1000,
    reasoning_effort: 'medium', // GPT-5 parameter
  };

  const response = await axios.post(url, requestBody, {
    headers: {
      'api-key': this.azureApiKey,
      'Content-Type': 'application/json',
    },
  });

  return {
    content: response.data.choices[0].message.content,
    reasoningTokens: response.data.usage.reasoning_tokens,
    totalTokens: response.data.usage.total_tokens,
  };
}
```

### Automatic Fallback

If Azure fails, the service automatically retries with OpenAI:

```typescript
catch (error) {
  // Fall back to OpenAI if Azure fails
  if (this.aiProvider === 'azure' && this.chatModel) {
    this.logger.warn('Azure OpenAI failed, attempting OpenAI fallback...');
    // Note: the provider stays switched to OpenAI for subsequent requests
    this.aiProvider = 'openai';
    return this.chat(userId, chatDto); // Recursive call, now using OpenAI
  }
  throw new BadRequestException('Failed to generate AI response');
}
```

---

## Testing the Integration

### 1. Check Provider Status

```bash
GET /api/v1/ai/provider-status
```

Response:
```json
{
  "provider": "azure",
  "model": "gpt-5-mini",
  "configured": true,
  "endpoint": "https://footprints-open-ai.openai.azure.com"
}
```

### 2. Test Chat with GPT-5

```bash
POST /api/v1/ai/chat
Authorization: Bearer {token}

{
  "message": "How much should a 3-month-old eat per feeding?"
}
```

Response:
```json
{
  "conversationId": "conv_123",
  "message": "A 3-month-old typically eats...",
  "timestamp": "2025-01-15T10:30:00Z",
  "metadata": {
    "model": "gpt-5-mini",
    "provider": "azure",
    "reasoningTokens": 145,
    "totalTokens": 523
  }
}
```

### 3. Monitor Reasoning Tokens

Check the logs for GPT-5 reasoning token usage:

```
[AIService] Azure OpenAI response: {
  model: 'gpt-5-mini',
  finish_reason: 'stop',
  prompt_tokens: 256,
  completion_tokens: 122,
  reasoning_tokens: 145, // GPT-5 reasoning overhead
  total_tokens: 523
}
```

---

## Optimizing Reasoning Effort

### When to Use Each Level

**Minimal** (`reasoning_effort: 'minimal'`):
- Simple queries
- Quick responses needed
- Cost optimization
- Use case: "What time is it?"

**Low** (`reasoning_effort: 'low'`):
- Straightforward questions
- Fast turnaround required
- Use case: "How many oz in 120ml?"

**Medium** (`reasoning_effort: 'medium'`) - **Default**:
- Balanced performance
- Most common use cases
- Use case: "Is my baby's sleep pattern normal?"

**High** (`reasoning_effort: 'high'`):
- Complex reasoning required
- Premium features
- Use case: "Analyze my baby's feeding patterns over the last month and suggest optimizations"

### Dynamic Reasoning Effort

You can adjust the effort based on query complexity:

```typescript
// Future enhancement: analyze query complexity
const effort = this.determineReasoningEffort(chatDto.message);

const requestBody = {
  messages: azureMessages,
  reasoning_effort: effort, // Dynamic, based on the query
};
```
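
`determineReasoningEffort` is not implemented yet; one possible heuristic is sketched below. The keyword list and length thresholds are purely illustrative:

```typescript
type ReasoningEffort = 'minimal' | 'low' | 'medium' | 'high';

// Illustrative heuristic: analytical queries get maximum effort,
// otherwise effort scales with message length.
function determineReasoningEffort(message: string): ReasoningEffort {
  const analytical = /\b(analyze|compare|suggest|why|trend)\b/i.test(message);
  if (analytical) return 'high';
  if (message.length > 200) return 'medium';
  if (message.length > 50) return 'low';
  return 'minimal';
}
```

A real implementation might also consider conversation history length or let premium tiers force `'high'`.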

---

## Future Enhancements

### 1. Voice Service (Whisper)

Implement a similar pattern for voice transcription:

```typescript
export class WhisperService {
  async transcribeAudio(audioBuffer: Buffer): Promise<string> {
    if (this.aiProvider === 'azure') {
      return this.transcribeWithAzure(audioBuffer);
    }
    return this.transcribeWithOpenAI(audioBuffer);
  }

  private async transcribeWithAzure(audioBuffer: Buffer) {
    const url = `${this.azureWhisperEndpoint}/openai/deployments/${this.azureWhisperDeployment}/audio/transcriptions?api-version=${this.azureWhisperApiVersion}`;

    const formData = new FormData();
    formData.append('file', new Blob([audioBuffer]), 'audio.wav');

    const response = await axios.post(url, formData, {
      headers: {
        'api-key': this.azureWhisperApiKey, // Separate key for Whisper
      },
    });

    return response.data.text;
  }
}
```

### 2. Embeddings Service

For pattern recognition and similarity search:

```typescript
export class EmbeddingsService {
  async createEmbedding(text: string): Promise<number[]> {
    if (this.aiProvider === 'azure') {
      return this.createEmbeddingWithAzure(text);
    }
    return this.createEmbeddingWithOpenAI(text);
  }

  private async createEmbeddingWithAzure(text: string) {
    const url = `${this.azureEmbeddingsEndpoint}/openai/deployments/${this.azureEmbeddingsDeployment}/embeddings?api-version=${this.azureEmbeddingsApiVersion}`;

    const response = await axios.post(url, { input: text }, {
      headers: {
        'api-key': this.azureEmbeddingsApiKey, // Separate key for Embeddings
      },
    });

    return response.data.data[0].embedding;
  }
}
```
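
Once embeddings are available, similarity search reduces to comparing vectors; cosine similarity is the usual choice. A self-contained sketch:

```typescript
// Cosine similarity between two embedding vectors: 1 = same direction,
// 0 = orthogonal (unrelated), -1 = opposite.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vector length mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Ranking stored activity embeddings by similarity to a query embedding is the core of the pattern-recognition use case above.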

### 3. Prompt Caching

Leverage Azure's cached input pricing (90% discount):

```typescript
// Reuse identical system prompts for cost savings
const systemPrompt = `You are a helpful parenting assistant...`; // Cache this
```
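
Caching discounts apply to a repeated prompt prefix, so the message array should keep the long, stable system prompt first and put per-user context after it. A sketch of that ordering (message shape simplified; names illustrative):

```typescript
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Stable prompt first => identical prefix across users and requests,
// which is what the cached-input discount keys on.
function buildMessages(
  stableSystemPrompt: string,
  userContext: string,
  userMessage: string,
): ChatMessage[] {
  return [
    { role: 'system', content: stableSystemPrompt }, // cacheable prefix
    { role: 'system', content: userContext },        // varies per user
    { role: 'user', content: userMessage },
  ];
}
```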

### 4. Streaming Responses

For better UX with long responses:

```typescript
const requestBody = {
  messages: azureMessages,
  stream: true, // Enable streaming
  reasoning_effort: 'medium',
};

// Handle the streamed response
```
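
With `stream: true`, the API returns server-sent events: `data: {json}` lines terminated by `data: [DONE]`. A sketch of pulling content deltas out of one received chunk (buffering a JSON object split across chunk boundaries is omitted):

```typescript
// Extract content deltas from one SSE chunk of a streamed chat
// completion ("data: {...}" lines, ending with "data: [DONE]").
function extractDeltas(chunk: string): string[] {
  const deltas: string[] = [];
  for (const line of chunk.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed.startsWith('data: ') || trimmed === 'data: [DONE]') continue;
    const payload = JSON.parse(trimmed.slice('data: '.length));
    const delta = payload.choices?.[0]?.delta?.content;
    if (typeof delta === 'string') deltas.push(delta);
  }
  return deltas;
}
```

With axios, the raw chunks would come from a response requested with `responseType: 'stream'`.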

---

## Troubleshooting

### Common Issues

**1. "AI service not configured"**
- Check that `AI_PROVIDER` is set to 'azure'
- Verify `AZURE_OPENAI_CHAT_API_KEY` is set (not the old `AZURE_OPENAI_API_KEY`)
- Confirm `AZURE_OPENAI_CHAT_ENDPOINT` is correct

**2. "Invalid API version"**
- GPT-5 requires `2025-04-01-preview` or later
- Update `AZURE_OPENAI_CHAT_API_VERSION`

**3. "Deployment not found"**
- Verify `AZURE_OPENAI_CHAT_DEPLOYMENT` matches the Azure deployment name
- Check that the deployment is in the same region as the endpoint

**4. High token usage**
- GPT-5 reasoning tokens add overhead on top of visible output
- Reduce `reasoning_effort` if cost is a concern
- Use `'minimal'` for simple queries

**5. Slow responses**
- Higher `reasoning_effort` means slower responses
- Use `'low'` or `'minimal'` for time-sensitive queries
- Consider caching common responses

### Debug Logging

Enable debug logs to see requests and responses:

```typescript
this.logger.debug('Azure OpenAI request:', {
  url,
  deployment,
  reasoning_effort,
  messageCount,
});

this.logger.debug('Azure OpenAI response:', {
  model,
  finish_reason,
  prompt_tokens,
  completion_tokens,
  reasoning_tokens,
  total_tokens,
});
```

---

## Summary

✅ **Fully Configured**:
- Environment variables for all Azure endpoints
- Chat (GPT-5), Whisper, and Embeddings separately configurable
- No hardcoded values

✅ **GPT-5 Support**:
- Reasoning tokens tracked and returned
- Configurable reasoning effort (minimal/low/medium/high)
- Extended 400K context window ready

✅ **Automatic Fallback**:
- Azure → OpenAI if Azure fails
- Graceful degradation

✅ **Monitoring**:
- Detailed logging for debugging
- Token usage tracking (including reasoning tokens)
- Provider status endpoint

✅ **Production Ready**:
- Proper error handling
- Timeout configuration (30s)
- Metadata in responses

---

## Next Steps

1. **Add your actual API keys** to `.env`:
   ```bash
   AZURE_OPENAI_CHAT_API_KEY=[your_chat_key]
   AZURE_OPENAI_WHISPER_API_KEY=[your_whisper_key]
   AZURE_OPENAI_EMBEDDINGS_API_KEY=[your_embeddings_key]
   ```

2. **Restart the backend** to pick up the configuration:
   ```bash
   npm run start:dev
   ```

3. **Test the integration**:
   - Check the provider status endpoint
   - Send a test chat message
   - Verify reasoning tokens in the response

4. **Monitor token usage**:
   - Review logs for reasoning token counts
   - Adjust `reasoning_effort` based on usage patterns
   - Consider cost optimization strategies

5. **Implement Voice & Embeddings** (optional):
   - Follow the same patterns as the chat service
   - Use the separate Azure endpoints already configured