Voice Intent Classification
Accurate classification of voice commands for hands-free activity tracking.
Overview
The voice intent classification system converts natural language voice commands into structured data for feeding, sleep, and diaper tracking. It uses pattern matching and entity extraction to understand user intent and extract relevant details.
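The result shape below is a sketch inferred from the usage and API examples later in this document; the actual type names in lib/voice/intentClassifier.ts may differ.
// Sketch of the classification result shape (names are illustrative;
// see lib/voice/intentClassifier.ts for the real types).
interface ExtractedEntity {
  type: string;        // e.g. 'amount', 'duration', 'time', 'side'
  value: string | number;
  confidence: number;
  text: string;        // raw matched text, e.g. "120 ml"
}

interface ClassificationResult {
  intent: 'feeding' | 'sleep' | 'diaper' | 'unknown';
  confidence: number;  // 0.0 - 1.0
  entities: ExtractedEntity[];
  structuredData?: Record<string, unknown>;
}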
Supported Intents
1. Feeding
Track bottle feeding, breastfeeding, and solid food consumption.
Subtypes:
- bottle - Bottle feeding with formula or pumped milk
- breast_left - Breastfeeding from left side
- breast_right - Breastfeeding from right side
- breast_both - Breastfeeding from both sides
- solid - Solid food or meals
Extractable Entities:
- Amount (ml, oz, tbsp)
- Duration (minutes)
- Side (left, right, both)
- Time (absolute or relative)
Examples:
"Fed baby 120 ml"
"Gave him 4 ounces"
"Nursed on left breast for 15 minutes"
"Breastfed on both sides for 20 minutes"
"Baby ate solid food"
"Had breakfast"
2. Sleep
Track naps and nighttime sleep.
Subtypes:
- nap - Daytime nap
- night - Nighttime sleep
Extractable Entities:
- Duration (minutes)
- Type (nap or night)
- Time (start/end)
Examples:
"Baby fell asleep for a nap"
"Napped for 45 minutes"
"Put baby down for bedtime"
"Baby is sleeping through the night"
"Baby woke up"
3. Diaper
Track diaper changes.
Subtypes:
- wet - Wet diaper (urine)
- dirty - Dirty diaper (bowel movement)
- both - Both wet and dirty
- dry - Dry/clean diaper
Extractable Entities:
- Type (wet, dirty, both)
- Time (when changed)
Examples:
"Changed wet diaper"
"Dirty diaper change"
"Changed a wet and dirty diaper"
"Baby had a bowel movement"
"Diaper had both poop and pee"
Usage
Basic Classification
import { classifyIntent } from '@/lib/voice/intentClassifier';
const result = classifyIntent("Fed baby 120 ml");
console.log(result.intent); // 'feeding'
console.log(result.confidence); // 0.9
console.log(result.structuredData);
// {
// type: 'bottle',
// amount: 120,
// unit: 'ml'
// }
With Validation
import { classifyIntent, validateClassification } from '@/lib/voice/intentClassifier';
const result = classifyIntent(userInput);
if (validateClassification(result)) {
  // Confidence >= 0.3 and intent is known
  createActivity(result.structuredData);
} else {
  // Low confidence or unknown intent
  showError("Could not understand command");
}
Confidence Levels
import { getConfidenceLevel } from '@/lib/voice/intentClassifier';
const level = getConfidenceLevel(0.85); // 'high'
// 'high': >= 0.8
// 'medium': 0.5 - 0.79
// 'low': < 0.5
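A minimal sketch of this thresholding (not the library's actual implementation):
// Minimal sketch of the thresholds above; getConfidenceLevel in
// intentClassifier.ts may be implemented differently.
function confidenceLevelSketch(confidence: number): 'high' | 'medium' | 'low' {
  if (confidence >= 0.8) return 'high';
  if (confidence >= 0.5) return 'medium';
  return 'low';
}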
API Endpoint
POST /api/voice/transcribe
Transcribe audio or classify text input.
Text Input:
curl -X POST http://localhost:3030/api/voice/transcribe \
-H "Content-Type: application/json" \
-d '{"text": "Fed baby 120ml"}'
Response:
{
  "success": true,
  "transcription": "Fed baby 120ml",
  "classification": {
    "intent": "feeding",
    "confidence": 0.9,
    "confidenceLevel": "high",
    "entities": [
      {
        "type": "amount",
        "value": 120,
        "confidence": 0.9,
        "text": "120 ml"
      }
    ],
    "structuredData": {
      "type": "bottle",
      "amount": 120,
      "unit": "ml"
    }
  }
}
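From application code, the endpoint can be called with a plain fetch; the sketch below assumes a same-origin deployment, and the error handling is illustrative:
// Illustrative client helper; error handling and return shape are assumptions.
async function classifyVoiceText(text: string) {
  const res = await fetch('/api/voice/transcribe', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text }),
  });
  if (!res.ok) throw new Error(`Voice API error: ${res.status}`);
  return res.json(); // { success, transcription, classification }
}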
GET /api/voice/transcribe
Get supported commands and examples.
curl http://localhost:3030/api/voice/transcribe
Pattern Matching
The classifier uses regex patterns to detect intents:
Feeding Patterns
- Fed/feed/gave + amount + unit
- Bottle feeding keywords
- Breastfeeding keywords (nursed, nursing)
- Solid food keywords (ate, breakfast, lunch, dinner)
Sleep Patterns
- Sleep/nap keywords
- Fell asleep / woke up
- Bedtime / night sleep
Diaper Patterns
- Diaper/nappy keywords
- Changed diaper
- Wet/dirty/poop/pee keywords
- Bowel movement / BM
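The shipped patterns live in INTENT_PATTERNS inside lib/voice/intentClassifier.ts; the table below only sketches the idea with a few illustrative regexes, not the actual ones:
// Illustrative pattern table; the real INTENT_PATTERNS contains many more variations.
const INTENT_PATTERNS_SKETCH: Record<string, RegExp[]> = {
  feeding: [/\b(fed|feed|gave|bottle|nursed|nursing|breastfed|ate|breakfast|lunch|dinner)\b/i],
  sleep:   [/\b(nap|napped|sleep|slept|sleeping|fell asleep|woke up|bedtime)\b/i],
  diaper:  [/\b(diaper|nappy|wet|dirty|poop|pee|bowel movement|bm)\b/i],
};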
Entity Extraction
Amount Extraction
Recognizes:
- 120 ml, 120ml, 120 milliliters
- 4 oz, 4oz, 4 ounces
- 2 tbsp, 2 tablespoons
Duration Extraction
Recognizes:
- 15 minutes, 15 mins, 15min
- for 20 minutes
- lasted 30 minutes
Time Extraction
Recognizes:
- Absolute: at 3:30 pm, 10 am
- Relative: 30 minutes ago, 2 hours ago
- Contextual: just now, a moment ago
Side Extraction (Breastfeeding)
Recognizes:
- left breast, left side, left boob
- right breast, right side
- both breasts, both sides
Type Extraction (Diaper)
Recognizes:
- Wet: wet, pee, urine
- Dirty: dirty, poop, poopy, soiled, bowel movement, bm
- Combination: detects both keywords for mixed diapers
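The extraction regexes live in ENTITY_PATTERNS; the two below are simplified illustrations of the amount and duration cases, not the shipped patterns:
// Illustrative extraction regexes; the shipped ENTITY_PATTERNS may differ.
const AMOUNT_PATTERN   = /(\d+(?:\.\d+)?)\s*(ml|milliliters?|oz|ounces?|tbsp|tablespoons?)\b/i;
const DURATION_PATTERN = /(\d+)\s*(?:minutes?|mins?)\b/i;

const amount = "Fed baby 120 ml".match(AMOUNT_PATTERN);
// amount?.[1] -> "120", amount?.[2] -> "ml"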
Common Mishears & Corrections
The system handles common voice recognition errors:
| Heard | Meant | Handled |
|---|---|---|
| "mils" | "ml" | ✅ Pattern includes "ml" variations |
| "ounce says" | "ounces" | ✅ Pattern matches "ounce" or "oz" |
| "left side" vs "left breast" | Same meaning | ✅ Both patterns recognized |
| "poopy" vs "poop" | Same meaning | ✅ Multiple variations supported |
Confidence Scoring
Confidence is calculated based on:
- Pattern matches: More matches = higher confidence
- Entity extraction: Successfully extracted entities boost confidence
- Ambiguity: Conflicting signals reduce confidence
Minimum confidence threshold: 0.3 (30%)
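A rough sketch of how such a score could be combined (the actual weighting in intentClassifier.ts may differ):
// Illustrative scoring only: a base per pattern match, a small boost per
// extracted entity, capped at 1.0. Not the library's actual weights.
function scoreConfidenceSketch(patternMatches: number, entityCount: number): number {
  if (patternMatches === 0) return 0; // no signal -> unknown intent
  const base = 0.5 + 0.1 * (patternMatches - 1);
  return Math.min(1, base + 0.1 * entityCount);
}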
Testing
Run the test suite:
node scripts/test-voice-intent.mjs
Test Coverage:
- 25 test cases
- Feeding: 8 tests (bottle, breast, solid)
- Sleep: 6 tests (nap, night, duration)
- Diaper: 7 tests (wet, dirty, both)
- Edge cases: 4 tests
Multi-Language Support
Currently supports English only. Planned languages:
- Spanish (es-ES)
- French (fr-FR)
- Portuguese (pt-BR)
- Chinese (zh-CN)
Each language will have localized patterns and keywords.
Integration with Whisper API
For audio transcription, integrate OpenAI Whisper:
import OpenAI from 'openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
async function transcribeAudio(audioFile: File): Promise<string> {
  const transcription = await openai.audio.transcriptions.create({
    file: audioFile,
    model: 'whisper-1',
    language: 'en', // Optional: specify language
  });
  return transcription.text;
}
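Once transcription is wired in, the POST handler can chain the two steps; the sketch below is illustrative and reuses names from this document, not the shipped route code:
import { classifyIntent, validateClassification } from '@/lib/voice/intentClassifier';

// Sketch of chaining transcription and classification; the response shape
// mirrors the text path above, and the error code comes from the list below.
async function handleAudio(audioFile: File) {
  const transcription = await transcribeAudio(audioFile); // Whisper helper above
  const classification = classifyIntent(transcription);
  if (!validateClassification(classification)) {
    return { success: false, error: 'VOICE_CLASSIFICATION_FAILED' };
  }
  return { success: true, transcription, classification };
}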
Future Enhancements
- Audio transcription with Whisper API
- Multi-language support (5 languages)
- Context-aware classification (user history)
- Custom vocabulary (child names, brand names)
- Clarification prompts for ambiguous commands
- Machine learning-based classification
- Offline voice recognition fallback
- Voice feedback confirmation
Troubleshooting
Q: Classification returns 'unknown' for valid commands
- Check if keywords are in supported patterns
- Verify minimum confidence threshold (0.3)
- Add variations to INTENT_PATTERNS
Q: Entities not extracted correctly
- Check regex patterns in ENTITY_PATTERNS
- Verify unit formatting (spaces, abbreviations)
- Test with simplified command first
Q: Confidence too low despite correct intent
- Rephrase with more keywords; multiple pattern matches boost confidence
- Add more specific patterns for common phrases
- Adjust confidence calculation algorithm
Error Codes
- VOICE_INVALID_INPUT - Missing or invalid text input
- VOICE_AUDIO_NOT_IMPLEMENTED - Audio transcription not yet available
- VOICE_INVALID_CONTENT_TYPE - Wrong Content-Type header
- VOICE_CLASSIFICATION_FAILED - Could not classify intent
- VOICE_TRANSCRIPTION_FAILED - General transcription error