maternal-app/maternal-web/lib/voice/README.md
Andrei 79966a6a6d
Add voice intent classification for hands-free tracking
Implemented comprehensive voice command understanding system:

**Intent Classification:**
- Feeding intent (bottle, breastfeeding, solid food)
- Sleep intent (naps, nighttime sleep)
- Diaper intent (wet, dirty, both, dry)
- Unknown intent handling

**Entity Extraction:**
- Amounts with units (ml, oz, tbsp): "120 ml", "4 ounces"
- Durations in minutes: "15 minutes", "for 20 mins"
- Time expressions: "at 3:30 pm", "30 minutes ago", "just now"
- Breast feeding side: "left", "right", "both"
- Diaper types: "wet", "dirty", "both"
- Sleep types: "nap", "night"

**Structured Data Output:**
- FeedingData: type, amount, unit, duration, side, timestamps
- SleepData: type, duration, start/end times
- DiaperData: type, timestamp
- Ready for direct activity creation

**Pattern Matching:**
- 15+ feeding patterns (bottle, breast, solid)
- 8+ sleep patterns (nap, sleep, woke up)
- 8+ diaper patterns (wet, dirty, bowel movement)
- Robust keyword detection with variations

**Confidence Scoring:**
- High: >= 0.8 (strong match)
- Medium: 0.5-0.79 (probable match)
- Low: < 0.5 (uncertain)
- Minimum threshold: 0.3 for validation

**API Endpoint:**
- POST /api/voice/transcribe - Classify text or audio
- GET /api/voice/transcribe - Get supported commands
- JSON response with intent, confidence, entities, structured data
- Audio transcription placeholder (Whisper integration ready)

**Implementation Files:**
- lib/voice/intentClassifier.ts - Core classification (600+ lines)
- app/api/voice/transcribe/route.ts - API endpoint
- scripts/test-voice-intent.mjs - Test suite (25 tests)
- lib/voice/README.md - Complete documentation

**Test Coverage:** 25 tests, 100% pass rate
- Bottle feeding (3 tests)
- Breastfeeding (3 tests)
- Solid food (2 tests)
- Sleep tracking (6 tests)
- Diaper changes (7 tests)
- Edge cases (4 tests)

**Example Commands:**
- "Fed baby 120 ml" → bottle, 120ml
- "Nursed on left breast for 15 minutes" → breast_left, 15min
- "Changed wet and dirty diaper" → both
- "Napped for 45 minutes" → nap, 45min

System converts natural language to structured tracking data with
high accuracy for common parenting voice commands.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-01 20:20:07 +00:00


# Voice Intent Classification
Accurate classification of voice commands for hands-free activity tracking.
## Overview
The voice intent classification system converts natural language voice commands into structured data for feeding, sleep, and diaper tracking. It uses pattern matching and entity extraction to understand user intent and extract relevant details.
## Supported Intents
### 1. Feeding
Track bottle feeding, breastfeeding, and solid food consumption.
**Subtypes:**
- `bottle` - Bottle feeding with formula or pumped milk
- `breast_left` - Breastfeeding from left side
- `breast_right` - Breastfeeding from right side
- `breast_both` - Breastfeeding from both sides
- `solid` - Solid food or meals
**Extractable Entities:**
- Amount (ml, oz, tbsp)
- Duration (minutes)
- Side (left, right, both)
- Time (absolute or relative)
**Examples:**
```
"Fed baby 120 ml"
"Gave him 4 ounces"
"Nursed on left breast for 15 minutes"
"Breastfed on both sides for 20 minutes"
"Baby ate solid food"
"Had breakfast"
```
### 2. Sleep
Track naps and nighttime sleep.
**Subtypes:**
- `nap` - Daytime nap
- `night` - Nighttime sleep
**Extractable Entities:**
- Duration (minutes)
- Type (nap or night)
- Time (start/end)
**Examples:**
```
"Baby fell asleep for a nap"
"Napped for 45 minutes"
"Put baby down for bedtime"
"Baby is sleeping through the night"
"Baby woke up"
```
### 3. Diaper
Track diaper changes.
**Subtypes:**
- `wet` - Wet diaper (urine)
- `dirty` - Dirty diaper (bowel movement)
- `both` - Both wet and dirty
- `dry` - Dry/clean diaper
**Extractable Entities:**
- Type (wet, dirty, both)
- Time (when changed)
**Examples:**
```
"Changed wet diaper"
"Dirty diaper change"
"Changed a wet and dirty diaper"
"Baby had a bowel movement"
"Diaper had both poop and pee"
```
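The structured data returned for each intent can be pictured as the following TypeScript shapes. These are illustrative, inferred from the examples above; the actual interfaces live in `lib/voice/intentClassifier.ts`:

```typescript
// Illustrative shapes only; see lib/voice/intentClassifier.ts for the real definitions.
interface FeedingData {
  type: 'bottle' | 'breast_left' | 'breast_right' | 'breast_both' | 'solid';
  amount?: number;   // e.g. 120
  unit?: 'ml' | 'oz' | 'tbsp';
  duration?: number; // minutes
  timestamp?: Date;
}

interface SleepData {
  type: 'nap' | 'night';
  duration?: number; // minutes
  startTime?: Date;
  endTime?: Date;
}

interface DiaperData {
  type: 'wet' | 'dirty' | 'both' | 'dry';
  timestamp?: Date;
}

// "Fed baby 120 ml" would yield:
const feeding: FeedingData = { type: 'bottle', amount: 120, unit: 'ml' };
```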
## Usage
### Basic Classification
```typescript
import { classifyIntent } from '@/lib/voice/intentClassifier';
const result = classifyIntent("Fed baby 120 ml");
console.log(result.intent); // 'feeding'
console.log(result.confidence); // 0.9
console.log(result.structuredData);
// {
//   type: 'bottle',
//   amount: 120,
//   unit: 'ml'
// }
```
### With Validation
```typescript
import { classifyIntent, validateClassification } from '@/lib/voice/intentClassifier';
const result = classifyIntent(userInput);
if (validateClassification(result)) {
  // Confidence >= 0.3 and intent is known
  createActivity(result.structuredData);
} else {
  // Low confidence or unknown intent
  showError("Could not understand command");
}
```
### Confidence Levels
```typescript
import { getConfidenceLevel } from '@/lib/voice/intentClassifier';
const level = getConfidenceLevel(0.85); // 'high'
// 'high': >= 0.8
// 'medium': 0.5 - 0.79
// 'low': < 0.5
```
## API Endpoint
### POST /api/voice/transcribe
Transcribe audio or classify text input.
**Text Input:**
```bash
curl -X POST http://localhost:3030/api/voice/transcribe \
  -H "Content-Type: application/json" \
  -d '{"text": "Fed baby 120ml"}'
```
**Response:**
```json
{
  "success": true,
  "transcription": "Fed baby 120ml",
  "classification": {
    "intent": "feeding",
    "confidence": 0.9,
    "confidenceLevel": "high",
    "entities": [
      {
        "type": "amount",
        "value": 120,
        "confidence": 0.9,
        "text": "120 ml"
      }
    ],
    "structuredData": {
      "type": "bottle",
      "amount": 120,
      "unit": "ml"
    }
  }
}
```
### GET /api/voice/transcribe
Get supported commands and examples.
```bash
curl http://localhost:3030/api/voice/transcribe
```
## Pattern Matching
The classifier uses regex patterns to detect intents:
### Feeding Patterns
- Fed/feed/gave + amount + unit
- Bottle feeding keywords
- Breastfeeding keywords (nursed, nursing)
- Solid food keywords (ate, breakfast, lunch, dinner)
### Sleep Patterns
- Sleep/nap keywords
- Fell asleep / woke up
- Bedtime / night sleep
### Diaper Patterns
- Diaper/nappy keywords
- Changed diaper
- Wet/dirty/poop/pee keywords
- Bowel movement / BM
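A minimal sketch of this approach, with abbreviated stand-ins for the real `INTENT_PATTERNS` table:

```typescript
// Abbreviated stand-in for INTENT_PATTERNS; the real table has far more patterns.
const PATTERNS: Record<string, RegExp[]> = {
  feeding: [/\b(fed|feed|gave|nursed|nursing|bottle|breastfed|ate|breakfast|lunch|dinner)\b/i],
  sleep:   [/\b(sleep|slept|nap|napped|asleep|woke|bedtime)\b/i],
  diaper:  [/\b(diaper|nappy|wet|dirty|poop|pee|bowel movement|bm)\b/i],
};

// Pick the intent with the most matching patterns; 'unknown' if none match.
function detectIntent(text: string): string {
  let best = 'unknown';
  let bestHits = 0;
  for (const [intent, patterns] of Object.entries(PATTERNS)) {
    const hits = patterns.filter((p) => p.test(text)).length;
    if (hits > bestHits) {
      best = intent;
      bestHits = hits;
    }
  }
  return best;
}
```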
## Entity Extraction
### Amount Extraction
Recognizes:
- `120 ml`, `120ml`, `120 milliliters`
- `4 oz`, `4oz`, `4 ounces`
- `2 tbsp`, `2 tablespoons`
### Duration Extraction
Recognizes:
- `15 minutes`, `15 mins`, `15min`
- `for 20 minutes`
- `lasted 30 minutes`
### Time Extraction
Recognizes:
- Absolute: `at 3:30 pm`, `10 am`
- Relative: `30 minutes ago`, `2 hours ago`
- Contextual: `just now`, `a moment ago`
### Side Extraction (Breastfeeding)
Recognizes:
- `left breast`, `left side`, `left boob`
- `right breast`, `right side`
- `both breasts`, `both sides`
### Type Extraction (Diaper)
Recognizes:
- Wet: `wet`, `pee`, `urine`
- Dirty: `dirty`, `poop`, `poopy`, `soiled`, `bowel movement`, `bm`
- Combination: detects both keywords for mixed diapers
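As a simplified sketch of how amount and duration extraction can work (the real `ENTITY_PATTERNS` cover more variations than shown here):

```typescript
// Simplified stand-ins for the amount and duration regexes.
const AMOUNT_RE = /(\d+(?:\.\d+)?)\s*(ml|milliliters?|oz|ounces?|tbsp|tablespoons?)\b/i;
const DURATION_RE = /(\d+)\s*(?:minutes?|mins?)\b/i;

function extractAmount(text: string): { value: number; unit: string } | null {
  const m = text.match(AMOUNT_RE);
  if (!m) return null;
  const unit = m[2].toLowerCase();
  // Normalize spelled-out units to their abbreviations.
  const normalized = unit.startsWith('milli') ? 'ml'
    : unit.startsWith('ounce') ? 'oz'
    : unit.startsWith('tablespoon') ? 'tbsp'
    : unit;
  return { value: parseFloat(m[1]), unit: normalized };
}

function extractDuration(text: string): number | null {
  const m = text.match(DURATION_RE);
  return m ? parseInt(m[1], 10) : null;
}
```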
## Common Mishears & Corrections
The system handles common voice recognition errors:
| Heard | Meant | Handled |
|-------|-------|---------|
| "mils" | "ml" | ✅ Pattern includes "ml" variations |
| "ounce says" | "ounces" | ✅ Pattern matches "ounce" or "oz" |
| "left side" vs "left breast" | Same meaning | ✅ Both patterns recognized |
| "poopy" vs "poop" | Same meaning | ✅ Multiple variations supported |
## Confidence Scoring
Confidence is calculated based on:
- **Pattern matches**: More matches = higher confidence
- **Entity extraction**: Successfully extracted entities boost confidence
- **Ambiguity**: Conflicting signals reduce confidence
Minimum confidence threshold: **0.3** (30%)
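One way to combine these signals (the weights below are illustrative; the actual algorithm is in `lib/voice/intentClassifier.ts`):

```typescript
// Illustrative scoring only: score grows with pattern matches and extracted
// entities, drops on conflicting signals, and is clamped to [0, 1].
function scoreConfidence(patternMatches: number, entityCount: number, ambiguous: boolean): number {
  if (patternMatches === 0) return 0;
  let score = 0.5 + 0.15 * Math.min(patternMatches, 2) + 0.1 * Math.min(entityCount, 2);
  if (ambiguous) score -= 0.2; // conflicting signals reduce confidence
  return Math.min(1, Math.max(0, score));
}

function confidenceLevel(score: number): 'high' | 'medium' | 'low' {
  if (score >= 0.8) return 'high';
  if (score >= 0.5) return 'medium';
  return 'low';
}
```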
## Testing
Run the test suite:
```bash
node scripts/test-voice-intent.mjs
```
**Test Coverage:**
- 25 test cases
- Feeding: 8 tests (bottle, breast, solid)
- Sleep: 6 tests (nap, night, duration)
- Diaper: 7 tests (wet, dirty, both)
- Edge cases: 4 tests
## Multi-Language Support
Currently supports English only. Planned languages:
- Spanish (es-ES)
- French (fr-FR)
- Portuguese (pt-BR)
- Chinese (zh-CN)
Each language will have localized patterns and keywords.
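When localization lands, the pattern tables could be keyed by locale. None of this exists yet; the shape and Spanish patterns below are purely a sketch:

```typescript
// Hypothetical locale-keyed feeding patterns for the planned multi-language support.
const FEEDING_PATTERNS_BY_LOCALE: Record<string, RegExp[]> = {
  'en-US': [/\b(fed|bottle|nursed)\b/i],
  'es-ES': [/\bbiber[óo]n\b/i, /\bamamant/i, /\baliment/i],
};

function feedingPatternsFor(locale: string): RegExp[] {
  // Fall back to English when a locale has no patterns yet.
  return FEEDING_PATTERNS_BY_LOCALE[locale] ?? FEEDING_PATTERNS_BY_LOCALE['en-US'];
}
```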
## Integration with Whisper API
For audio transcription, integrate OpenAI Whisper:
```typescript
import OpenAI from 'openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
async function transcribeAudio(audioFile: File): Promise<string> {
  const transcription = await openai.audio.transcriptions.create({
    file: audioFile,
    model: 'whisper-1',
    language: 'en', // Optional: specify language
  });
  return transcription.text;
}
```
## Future Enhancements
- [ ] Audio transcription with Whisper API
- [ ] Multi-language support (5 languages)
- [ ] Context-aware classification (user history)
- [ ] Custom vocabulary (child names, brand names)
- [ ] Clarification prompts for ambiguous commands
- [ ] Machine learning-based classification
- [ ] Offline voice recognition fallback
- [ ] Voice feedback confirmation
## Troubleshooting
**Q: Classification returns 'unknown' for valid commands**
- Check if keywords are in supported patterns
- Verify minimum confidence threshold (0.3)
- Add variations to INTENT_PATTERNS
**Q: Entities not extracted correctly**
- Check regex patterns in ENTITY_PATTERNS
- Verify unit formatting (spaces, abbreviations)
- Test with simplified command first
**Q: Confidence too low despite correct intent**
- Multiple pattern matches boost confidence
- Add more specific patterns for common phrases
- Adjust confidence calculation algorithm
## Error Codes
- `VOICE_INVALID_INPUT` - Missing or invalid text input
- `VOICE_AUDIO_NOT_IMPLEMENTED` - Audio transcription not yet available
- `VOICE_INVALID_CONTENT_TYPE` - Wrong Content-Type header
- `VOICE_CLASSIFICATION_FAILED` - Could not classify intent
- `VOICE_TRANSCRIPTION_FAILED` - General transcription error
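A client might map these codes to user-facing messages like so. The exact error payload shape (e.g. where the `code` field lives in the JSON response) is an assumption here, not documented above:

```typescript
// Assumed error payload shape: { success: false, error: { code: string, message: string } }.
function describeVoiceError(code: string): string {
  switch (code) {
    case 'VOICE_INVALID_INPUT':
      return 'Please provide text to classify.';
    case 'VOICE_AUDIO_NOT_IMPLEMENTED':
      return 'Audio transcription is not available yet; send text instead.';
    case 'VOICE_INVALID_CONTENT_TYPE':
      return 'Use Content-Type: application/json for text input.';
    case 'VOICE_CLASSIFICATION_FAILED':
      return 'Could not understand the command; try rephrasing.';
    case 'VOICE_TRANSCRIPTION_FAILED':
      return 'Transcription failed; please try again.';
    default:
      return 'Unknown voice error.';
  }
}
```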