Commit Graph

7 Commits

Author SHA1 Message Date
c60467b6f9 Fix login data structure and improve voice input UX
Some checks failed
CI/CD Pipeline / Lint and Test (push) Has been cancelled
CI/CD Pipeline / E2E Tests (push) Has been cancelled
CI/CD Pipeline / Build Application (push) Has been cancelled
- Fix login endpoint to return families as array of objects instead of strings
- Update auth interface to match /auth/me endpoint structure
- Add silence detection to voice input (auto-stop after 1.5s)
- Add comprehensive status messages to voice modal (Listening, Understanding, Saving)
- Unify voice input flow to use MediaRecorder + backend for all platforms
- Add null checks to prevent tracking page crashes from invalid data
- Wait for auth completion before loading family data in HomePage
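
The silence-detection item above (auto-stop after 1.5s) could be sketched roughly as follows. This is a minimal illustration, not the repository's code: `createSilenceDetector`, the 0.02 volume threshold, and the idea of feeding it one volume sample at a time are all assumptions. Only the 1.5-second window comes from the commit message.

```typescript
// Hypothetical helper: names and the threshold value are illustrative only.
function createSilenceDetector(silenceMs = 1500, threshold = 0.02) {
  // Timestamp of the most recent sample that counted as sound.
  let lastSoundAt: number | null = null;

  // Feed one volume sample (e.g. an RMS level in 0..1) with its timestamp.
  // Returns true once the input has stayed below the threshold for silenceMs.
  return function shouldStop(level: number, nowMs: number): boolean {
    if (lastSoundAt === null || level >= threshold) {
      // First sample, or the speaker is still talking: restart the timer.
      lastSoundAt = nowMs;
      return false;
    }
    return nowMs - lastSoundAt >= silenceMs;
  };
}
```

In a browser, the level would presumably be computed from the microphone stream (for instance via a Web Audio `AnalyserNode`), and a `true` result would trigger the recorder's `stop()`.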

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-02 10:25:13 +00:00
8a342fa85b Fix Web Speech API desktop voice recognition
Some checks failed
CI/CD Pipeline / Lint and Test (push) Has been cancelled
CI/CD Pipeline / E2E Tests (push) Has been cancelled
CI/CD Pipeline / Build Application (push) Has been cancelled
- Set continuous=true to keep listening through pauses
- Only process final results, ignore interim transcripts
- Add usesFallback check to route Web Speech API transcripts through classification
- Desktop now captures complete phrases before classification
- Add detailed logging for debugging recognition flow
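
As an illustration of "only process final results", the sketch below collects just the finalized segments from a recognition event. The interfaces are minimal structural stand-ins for the parts of the Web Speech API `result` event used here (`resultIndex`, `results`, `isFinal`, `transcript`). The function name `collectFinalTranscript` is hypothetical and not taken from the repository.

```typescript
// Minimal structural types mirroring the fields of SpeechRecognitionEvent used below.
interface RecognitionAlternative { transcript: string }
interface RecognitionResult { isFinal: boolean; 0: RecognitionAlternative }
interface RecognitionEventLike {
  resultIndex: number;                   // index of the first result that changed
  results: ArrayLike<RecognitionResult>; // all results so far in this session
}

// Hypothetical helper: joins the final segments and skips interim ones.
function collectFinalTranscript(event: RecognitionEventLike): string {
  const parts: string[] = [];
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    if (result.isFinal) {
      parts.push(result[0].transcript.trim());
    }
  }
  return parts.join(" ");
}
```

With `continuous` set to true, a handler built on this would return an empty string for interim-only events, so classification would run only once a phrase has been finalized.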

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-02 07:25:16 +00:00
a44faf6ef4 Fix voice input for iOS Safari and prevent infinite loop
Some checks failed
CI/CD Pipeline / Lint and Test (push) Has been cancelled
CI/CD Pipeline / E2E Tests (push) Has been cancelled
CI/CD Pipeline / Build Application (push) Has been cancelled
- Remove temperature parameter from GPT-5-mini activity extraction (not supported)
- Add classification state to useVoiceInput hook to avoid duplicate API calls
- Prevent infinite loop in VoiceFloatingButton by tracking lastClassifiedTranscript
- Use classification from backend directly instead of making second request
- iOS Safari now successfully transcribes with Azure Whisper and classifies with GPT-5-mini
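
One way the `lastClassifiedTranscript` guard could work is sketched below. Only the variable name comes from the commit message; `createClassificationGuard` and its behavior are assumptions about how repeated classification of the same transcript might be avoided.

```typescript
// Hypothetical guard: remembers the last transcript that was classified so a
// re-render or effect re-run with the same transcript does not trigger it again.
function createClassificationGuard() {
  let lastClassifiedTranscript: string | null = null;

  return function shouldClassify(transcript: string): boolean {
    const normalized = transcript.trim();
    if (normalized === "" || normalized === lastClassifiedTranscript) {
      return false; // nothing to classify, or already handled
    }
    lastClassifiedTranscript = normalized;
    return true;
  };
}
```

In a React component the remembered value would more likely live in a ref or state, so that it survives re-renders without causing new ones.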

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-02 07:15:44 +00:00
26d3f8962f Improve iOS Safari voice input with better error handling and debugging
Some checks failed
CI/CD Pipeline / Lint and Test (push) Has been cancelled
CI/CD Pipeline / E2E Tests (push) Has been cancelled
CI/CD Pipeline / Build Application (push) Has been cancelled
- Force MediaRecorder fallback for all iOS Safari devices
- Add iOS device detection to avoid Web Speech API on iOS
- Support multiple audio formats (webm, mp4, default) for compatibility
- Add comprehensive error logging throughout the flow
- Improve error messages with specific guidance for each error type
- Add console logging to track microphone permissions and recording state
- Better handling of getUserMedia permissions

This should help diagnose and fix the "Failed to recognize speech" error
by ensuring iOS Safari uses the MediaRecorder path with proper permissions.
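
The "multiple audio formats (webm, mp4, default)" item could look roughly like the sketch below. `pickRecorderMimeType` and the candidate list are hypothetical. The support check is passed in as a function so the logic can run outside a browser; in a browser it would wrap the real static method `MediaRecorder.isTypeSupported`.

```typescript
// Assumed preference order; the commit only names webm, mp4, and the browser default.
const CANDIDATE_MIME_TYPES: readonly string[] = ["audio/webm", "audio/mp4"];

// Hypothetical helper: returns the first supported MIME type, or undefined to
// signal that MediaRecorder should be constructed without a mimeType option.
function pickRecorderMimeType(
  isTypeSupported: (mimeType: string) => boolean,
  candidates: readonly string[] = CANDIDATE_MIME_TYPES,
): string | undefined {
  return candidates.find((type) => isTypeSupported(type));
}

// Browser usage (not executed here):
//   const mimeType = pickRecorderMimeType((t) => MediaRecorder.isTypeSupported(t));
//   const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
```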

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-02 06:03:24 +00:00
330c776124 Add iOS Safari support for voice commands with MediaRecorder fallback
Some checks failed
CI/CD Pipeline / Lint and Test (push) Has been cancelled
CI/CD Pipeline / E2E Tests (push) Has been cancelled
CI/CD Pipeline / Build Application (push) Has been cancelled
Frontend changes:
- Add MediaRecorder fallback for iOS Safari (no Web Speech API support)
- Automatically detect browser capabilities and use appropriate method
- Add usesFallback flag to track which method is being used
- Update UI to show "Recording..." vs "Listening..." based on method
- Add iOS-specific indicator text
- Handle microphone permissions and errors properly

Backend changes:
- Update /api/v1/voice/transcribe to accept both audio files and text
- Support text-based classification (from Web Speech API)
- Support audio file transcription + classification (from MediaRecorder)
- Return unified response format with transcript and classification

How it works:
- Chrome/Edge: Uses Web Speech API for real-time transcription
- iOS Safari: Records audio with MediaRecorder, sends to server for transcription
- Fallback is transparent to the user with appropriate UI feedback
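
The method selection described under "How it works" might be sketched as below. The type and function names are hypothetical, and the capability flags are assumed to be computed elsewhere (for example by checking for `window.SpeechRecognition` or `window.webkitSpeechRecognition`). The `isIOS` flag reflects the later commit 26d3f8962f, which forces the MediaRecorder path on iOS.

```typescript
type VoiceInputMethod = "web-speech" | "media-recorder" | "unsupported";

// Hypothetical capability snapshot, gathered once at startup.
interface BrowserCapabilities {
  hasSpeechRecognition: boolean; // Web Speech API constructor is present
  hasMediaRecorder: boolean;     // MediaRecorder constructor is present
  isIOS: boolean;                // device detected as iOS
}

function chooseVoiceInputMethod(caps: BrowserCapabilities): VoiceInputMethod {
  // Prefer in-browser recognition where available, except on iOS.
  if (caps.hasSpeechRecognition && !caps.isIOS) return "web-speech";
  // Otherwise record audio and let the server transcribe it.
  if (caps.hasMediaRecorder) return "media-recorder";
  return "unsupported";
}
```

A `usesFallback` flag like the one mentioned above would then simply be `method === "media-recorder"`.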

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-02 05:59:26 +00:00
63a333bba3 Add voice input UI components for hands-free tracking
Some checks failed
CI/CD Pipeline / Lint and Test (push) Has been cancelled
CI/CD Pipeline / E2E Tests (push) Has been cancelled
CI/CD Pipeline / Build Application (push) Has been cancelled
Implemented complete voice input user interface:

**Voice Recording Hook (useVoiceInput):**
- Browser Web Speech API integration
- Real-time speech recognition
- Continuous and interim results
- 10-second auto-timeout
- Error handling for permissions, network, audio issues
- Graceful fallback for unsupported browsers

**Voice Input Button Component:**
- Modal dialog with microphone button
- Animated pulsing microphone when recording
- Real-time transcript display
- Automatic intent classification on completion
- Structured data visualization
- Example commands for user guidance
- Success/error feedback with MUI Alerts
- Confidence level indicators

**Floating Action Button:**
- Always-visible FAB in bottom-right corner
- Quick access from any page
- Auto-navigation to appropriate tracking page
- Snackbar feedback messages
- Mobile-optimized positioning (thumb zone)

**Integration with Tracking Pages:**
- Voice button in feeding page header
- Auto-fills form fields from voice commands
- Seamless voice-to-form workflow
- Example: "Fed baby 120ml" → fills bottle type & amount
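
The voice-to-form step could be sketched as follows. Every name and field here (`VoiceClassification`, `FeedingFormValues`, `amountMl`, `feedingType`, the `"feeding"` intent string) is an assumption made for illustration. The commit does not show the actual classification payload or form state.

```typescript
// Hypothetical shape of a classification result.
interface VoiceClassification {
  intent: string; // e.g. "feeding"
  confidence: number;
  data: { amountMl?: number; feedingType?: string };
}

// Hypothetical form state for the feeding page.
interface FeedingFormValues {
  feedingType: string;
  amountMl: number | null;
}

// Returns a new form state with any fields the classification provides;
// unrelated intents leave the form untouched.
function applyToFeedingForm(
  current: FeedingFormValues,
  classification: VoiceClassification,
): FeedingFormValues {
  if (classification.intent !== "feeding") return current;
  return {
    feedingType: classification.data.feedingType ?? current.feedingType,
    amountMl: classification.data.amountMl ?? current.amountMl,
  };
}
```

Under these assumptions, a command such as "Fed baby 120ml" classified as `{ intent: "feeding", data: { feedingType: "bottle", amountMl: 120 } }` would fill both fields.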

**Features:**
- Browser speech recognition (Chrome, Edge, Safari)
- Real-time transcription display
- Automatic intent classification
- Auto-fill tracking forms
- Visual feedback (animations, colors)
- Error handling & user guidance
- Mobile-optimized design
- Accessibility support

**User Flow:**
1. Click microphone button (floating or in-page)
2. Speak command: "Fed baby 120 ml"
3. See real-time transcript
4. Auto-classification shows intent & data
5. Click "Use Command"
6. Form auto-fills or activity is created

**Browser Support:**
- Chrome: supported
- Edge: supported
- Safari: supported
- Firefox: not supported (no Web Speech API)

**Files Created:**
- hooks/useVoiceInput.ts - Speech recognition hook
- components/voice/VoiceInputButton.tsx - Modal input component
- components/voice/VoiceFloatingButton.tsx - FAB for quick access
- app/layout.tsx - Added floating button globally
- app/track/feeding/page.tsx - Added voice button to header

Voice input is now accessible from anywhere in the app, providing
true hands-free tracking for parents.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-01 20:24:43 +00:00
f3ff07c0ef Add comprehensive .gitignore
2025-10-01 19:01:52 +00:00