feat(ai-safety): Implement comprehensive AI Safety features

- Create AISafetyService with keyword detection for emergency, medical, crisis, developmental, and stress triggers
- Add emergency response templates (911, poison control, medical disclaimer)
- Add crisis hotline integration (988, Postpartum Support, Crisis Text Line, Childhelp)
- Add medical disclaimer and developmental disclaimer templates
- Add stress support resources for overwhelmed parents
- Implement output safety checking for unsafe patterns (dosages, diagnoses)
- Add safety response injection based on trigger type
- Integrate safety checks into AI chat flow with immediate overrides for emergencies/crises
- Add base safety prompt with critical safety rules and guardrails
- Add medical and crisis safety override prompts
- Enhance system prompt with safety guardrails dynamically based on query triggers
- Export AISafetyService from AIModule for use in other modules
- All safety metrics logged for monitoring dashboard (TODO: database storage)

Safety coverage:
✅ Emergency keyword detection (not breathing, choking, seizure, etc.)
✅ Medical concern keywords (fever, vomiting, rash, medication, etc.)
✅ Crisis keywords (suicide, self-harm, PPD, abuse, etc.)
✅ Parental stress keywords (overwhelmed, burned out, isolated, etc.)
✅ Developmental concern keywords (delay, autism, ADHD, regression, etc.)
✅ Output moderation patterns (dosages, diagnoses, definitive medical statements)
✅ Crisis hotline templates with 4 major US resources
✅ Medical disclaimers with red flags and when to seek care
✅ Stress support with self-care reminders

Tested: Backend compiles and runs successfully with 0 errors
517
AI_SAFETY_STRATEGY.md
Normal file
@@ -0,0 +1,517 @@
# AI Safety Strategy - Maternal App

**Purpose:** Ensure safe, responsible, and helpful AI interactions for parents seeking childcare guidance.

**Last Updated:** October 2, 2025

---

## 1. Safety Principles

### 1.1 Core Safety Values
1. **Medical Disclaimer First** - Never provide medical diagnoses or emergency advice
2. **Crisis Detection** - Recognize mental health crises and provide resources
3. **Age-Appropriate** - Responses suitable for children aged 0-6 years
4. **Evidence-Based** - Reference pediatric guidelines when possible
5. **Non-Judgmental** - Support all parenting styles without criticism
6. **Privacy-Focused** - Never request or store unnecessary medical information

### 1.2 What AI SHOULD Do
- ✅ Provide general parenting information and tips
- ✅ Suggest routines and organizational strategies
- ✅ Explain developmental milestones
- ✅ Offer emotional support and encouragement
- ✅ Direct to professional resources when appropriate
- ✅ Help track and interpret patterns in child's data

### 1.3 What AI MUST NOT Do
- ❌ Diagnose medical conditions
- ❌ Prescribe medications or dosages
- ❌ Handle medical emergencies
- ❌ Replace professional medical advice
- ❌ Make definitive statements about child development delays
- ❌ Encourage unsafe practices

---

## 2. Medical Disclaimer Triggers

### 2.1 Emergency Keywords (Immediate Disclaimer)
Trigger an immediate medical disclaimer and emergency guidance:

**Critical Symptoms:**
- `emergency`, `911`, `ambulance`
- `not breathing`, `can't breathe`, `choking`
- `unconscious`, `unresponsive`, `passed out`
- `seizure`, `convulsion`, `shaking uncontrollably`
- `severe bleeding`, `blood loss`, `won't stop bleeding`
- `severe burn`, `burned`, `scalded`
- `poisoning`, `swallowed`, `ingested`
- `head injury`, `fell`, `hit head`
- `allergic reaction`, `anaphylaxis`, `swelling`

**Response Template:**
```
⚠️ EMERGENCY DISCLAIMER ⚠️

This appears to be a medical emergency. Please:

1. Call emergency services immediately (911 in the US)
2. If the child is not breathing, start CPR if you are trained
3. Stay calm and follow dispatcher instructions

I'm an AI assistant and cannot provide emergency medical guidance.
Please seek immediate professional medical help.

Emergency Resources:
- US: 911
- Poison Control: 1-800-222-1222
- Nurse Hotline: [Local number]
```
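
The keyword check described above could be sketched as follows. This is a minimal illustration, not the actual `AISafetyService` code: the names `EMERGENCY_KEYWORDS`, `normalize`, `escapeRegExp`, and `findEmergencyKeywords` are hypothetical, the list is a subset of the keywords above, and whole-word matching is an assumption made so that `fell` does not fire on `fellow`.

```typescript
// Hypothetical sketch; a subset of the emergency keywords listed above.
const EMERGENCY_KEYWORDS: string[] = [
  "emergency", "911", "ambulance",
  "not breathing", "can't breathe", "choking",
  "unconscious", "unresponsive", "passed out",
  "seizure", "convulsion", "poisoning", "swallowed",
  "head injury", "fell", "hit head", "anaphylaxis",
];

// Lowercase, straighten curly apostrophes, and collapse whitespace so that
// typographic variants of "can't breathe" compare equal.
function normalize(text: string): string {
  return text
    .toLowerCase()
    .replace(/[\u2018\u2019]/g, "'")
    .replace(/\s+/g, " ")
    .trim();
}

// Escape regex metacharacters so keywords are matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return every emergency keyword that appears as a whole word or phrase.
function findEmergencyKeywords(message: string): string[] {
  const text = normalize(message);
  return EMERGENCY_KEYWORDS.filter((kw) =>
    new RegExp(`(^|[^a-z0-9])${escapeRegExp(kw)}($|[^a-z0-9])`).test(text),
  );
}
```

Whole-word matching reduces false positives but does not remove them: a message such as "he fell asleep" would still match `fell`, so the false positive review in section 7.1 remains relevant.
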

### 2.2 Medical Concern Keywords (Soft Disclaimer)
Show a medical disclaimer, then allow a general-information response:

**Symptoms:**
- `fever`, `temperature`, `hot`, `feverish`
- `vomiting`, `throwing up`, `vomit`
- `diarrhea`, `loose stools`, `watery stool`
- `rash`, `spots`, `bumps`, `hives`
- `cough`, `coughing`, `wheezing`
- `ear infection`, `ear pain`, `ear ache`
- `cold`, `flu`, `sick`, `illness`
- `constipation`, `not pooping`, `hard stool`
- `dehydration`, `not drinking`, `dry`
- `injury`, `hurt`, `pain`, `ache`
- `medication`, `medicine`, `dosage`, `dose`

**Response Pattern:**
```
⚠️ Medical Disclaimer: I'm an AI assistant, not a medical professional.
For medical concerns, always consult your pediatrician or healthcare provider.

[General information response, if appropriate]

When to seek immediate care:
- [Specific red flags for the symptom]

Consider calling your pediatrician if:
- [Warning signs]
```

### 2.3 Developmental Concern Keywords
- `delay`, `behind`, `not talking`, `not walking`
- `autism`, `ADHD`, `disability`, `special needs`
- `regression`, `lost skills`, `stopped doing`

**Response Pattern:**
```
⚠️ Developmental Disclaimer: Every child develops at their own pace.
If you have concerns, consult your pediatrician or early intervention specialist.

[General milestone information]

Resources:
- CDC Milestone Tracker: [link]
- Early Intervention Services: [link]
```

---

## 3. Crisis Hotline Integration

### 3.1 Mental Health Crisis Detection

**Crisis Keywords:**
- `suicide`, `kill myself`, `want to die`, `end it all`
- `hurt myself`, `self-harm`, `cutting`
- `can't cope`, `can't do this`, `giving up`
- `hopeless`, `worthless`, `burden`
- `postpartum depression`, `PPD`, `severe depression`
- `psychosis`, `hallucinations`, `voices`
- `abuse`, `hitting`, `hurting`, `violent`

**Crisis Response Template:**
```
🆘 CRISIS SUPPORT 🆘

It sounds like you're going through an incredibly difficult time.
Your feelings are valid, and help is available.

IMMEDIATE CRISIS RESOURCES:
━━━━━━━━━━━━━━━━━━━━━━━━━
🇺🇸 988 Suicide & Crisis Lifeline
📞 988 (call or text)
💬 Chat: 988lifeline.org

🤱 Postpartum Support International
📞 1-800-944-4773
💬 Text "HELP" to 800-944-4773

🆘 Crisis Text Line
💬 Text "HOME" to 741741

👶 Childhelp National Child Abuse Hotline
📞 1-800-422-4453

These services are:
✓ Free and confidential
✓ Available 24/7
✓ Staffed by trained counselors
✓ No judgment, only support

You don't have to go through this alone. Please reach out.
```

### 3.2 Parental Stress Detection

**Stress Keywords:**
- `overwhelmed`, `stressed`, `exhausted`, `burned out`
- `crying`, `breaking down`, `can't handle`
- `no support`, `alone`, `isolated`
- `angry at baby`, `resentful`, `regret`

**Support Response:**
```
💙 Parenting is Hard 💙

You're not alone in feeling this way. Many parents experience:
- Exhaustion and burnout
- Overwhelming emotions
- Moments of frustration

This doesn't make you a bad parent. It makes you human.

Support Resources:
🤱 Postpartum Support International: 1-800-944-4773
📞 Parents Anonymous: 1-855-427-2736
💬 Crisis Text Line: Text "PARENT" to 741741

Self-Care Reminders:
✓ Taking breaks is necessary, not selfish
✓ Asking for help is a sign of strength
✓ Your mental health matters too

[Followed by relevant coping strategies if appropriate]
```
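
A single message can match several of the keyword categories in sections 2 and 3, so some rule has to decide which template wins. The sketch below shows one way to do that, under stated assumptions: the priority order (emergency, crisis, medical, developmental, stress) is a guess rather than something this document specifies, the keyword lists are small subsets, and `TriggerType`, `SafetyDecision`, and `classify` are hypothetical names. It uses plain substring matching for brevity. The `overrideResponse` flag reflects the commit message's "immediate overrides for emergencies/crises": for those two types the template replaces the AI answer, while the others only add a disclaimer.

```typescript
// Hypothetical trigger types, mirroring the keyword categories in sections 2 and 3.
type TriggerType = "emergency" | "crisis" | "medical" | "developmental" | "stress";

// Assumed priority: earlier entries win when several categories match.
const PRIORITY: TriggerType[] = ["emergency", "crisis", "medical", "developmental", "stress"];

// Small illustrative subsets of the keyword lists above.
const KEYWORDS: Record<TriggerType, string[]> = {
  emergency: ["not breathing", "choking", "seizure", "unconscious"],
  crisis: ["suicide", "kill myself", "want to die", "self-harm"],
  medical: ["fever", "vomiting", "rash", "dosage"],
  developmental: ["not talking", "not walking", "regression"],
  stress: ["overwhelmed", "exhausted", "burned out", "isolated"],
};

interface SafetyDecision {
  trigger: TriggerType | null;
  // true: replace the AI answer with the template; false: add a disclaimer to it.
  overrideResponse: boolean;
}

// Walk the categories in priority order and stop at the first one that matches.
function classify(message: string): SafetyDecision {
  const text = message.toLowerCase();
  for (const trigger of PRIORITY) {
    if (KEYWORDS[trigger].some((kw) => text.includes(kw))) {
      return { trigger, overrideResponse: trigger === "emergency" || trigger === "crisis" };
    }
  }
  return { trigger: null, overrideResponse: false };
}
```
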

---

## 4. Content Moderation & Filtering

### 4.1 Input Moderation

**Filter Out:**
- Inappropriate sexual content
- Violent or abusive language
- Requests for illegal activities
- Spam or commercial solicitation
- Personal information requests (addresses, SSN, etc.)

**Moderation Response:**
```
I'm here to provide parenting support and childcare guidance.
This type of content falls outside my intended use.

If you have questions about childcare for children ages 0-6,
I'm happy to help!
```

### 4.2 Output Moderation

**Before Sending AI Response, Check For:**
- Medical advice (trigger disclaimer if present)
- Potentially harmful suggestions
- Contradictory information
- Inappropriate content
- Dosage recommendations (remove/disclaim)

**Content Safety Checks:**
```typescript
const unsafePatterns: RegExp[] = [
  /\d+\s*(mg|ml|oz|tbsp|tsp)\s*(of|every|per)/i, // Dosage patterns
  /give\s+(him|her|them|baby|child)\s+\d+/i, // Specific instructions with amounts
  /diagnose|diagnosis|you have|they have/i, // Diagnostic language
  // Grouped so both adverbs require "is"/"has"; ungrouped, any "definitely" matched
  /(definitely|certainly)\s+(is|has)/i, // Definitive medical statements
];
```
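
One way these patterns might be applied before a response is sent is sketched below. It is an illustration under assumptions, not the service's actual code: `unsafeOutputPatterns` repeats a subset of the patterns above so the block is self-contained, `checkOutputSafety` and `moderateOutput` are hypothetical names, and prepending the disclaimer (rather than removing the offending text) is only one of the two options the checklist allows. None of the patterns uses the `g` flag, so `test()` carries no state between calls.

```typescript
// A subset of the patterns above, repeated so the sketch is self-contained.
const unsafeOutputPatterns: RegExp[] = [
  /\d+\s*(mg|ml|oz|tbsp|tsp)\s*(of|every|per)/i, // Dosage patterns
  /give\s+(him|her|them|baby|child)\s+\d+/i, // Specific instructions with amounts
  /diagnose|diagnosis|you have|they have/i, // Diagnostic language
];

interface OutputCheck {
  safe: boolean;
  matched: string[]; // source of each pattern that fired, useful for logging
}

// Report which unsafe patterns, if any, occur in the AI response.
function checkOutputSafety(response: string): OutputCheck {
  const matched = unsafeOutputPatterns
    .filter((pattern) => pattern.test(response))
    .map((pattern) => pattern.source);
  return { safe: matched.length === 0, matched };
}

// Assumed policy: prepend a disclaimer rather than drop the response.
const DISCLAIMER = "⚠️ Medical Disclaimer: I'm an AI assistant, not a medical professional.";

function moderateOutput(response: string): string {
  return checkOutputSafety(response).safe ? response : `${DISCLAIMER}\n\n${response}`;
}
```
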

---

## 5. Rate Limiting & Abuse Prevention

### 5.1 Current Rate Limits
- **Free Tier:** 10 AI queries per day
- **Premium Tier:** Unlimited queries (subject to the fair use policy in 5.3)

### 5.2 Enhanced Rate Limiting

**Suspicious Pattern Detection:**
- Same question repeated >3 times in 1 hour
- Medical emergency keywords repeated >5 times in 1 day
- Queries from multiple IPs for same user (account sharing)
- Unusual query volume (>100/day even for premium)

**Action on Suspicious Activity:**
1. Temporary rate limit (1 query per hour for 24 hours)
2. Flag account for manual review
3. Send email notification to user
4. Log to security monitoring
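
The first rule above (same question repeated more than 3 times in 1 hour) could be checked with a sliding window, as in this sketch. It is a hypothetical in-memory illustration: `recentQuestions` and `isSuspiciousRepeat` are invented names, and "same question" is assumed to mean equal text after lowercasing and whitespace collapsing.

```typescript
// Hypothetical in-memory sketch of the "same question repeated >3 times in 1 hour" rule.
const ONE_HOUR_MS = 60 * 60 * 1000;
const MAX_REPEATS = 3;

// userId -> normalized question -> timestamps (ms) of recent occurrences
const recentQuestions = new Map<string, Map<string, number[]>>();

function isSuspiciousRepeat(userId: string, question: string, nowMs: number): boolean {
  // Assumption: two questions are "the same" if they match after normalization.
  const key = question.trim().toLowerCase().replace(/\s+/g, " ");
  const perUser = recentQuestions.get(userId) ?? new Map<string, number[]>();
  // Keep only occurrences inside the one-hour sliding window, then record this one.
  const times = (perUser.get(key) ?? []).filter((t) => nowMs - t < ONE_HOUR_MS);
  times.push(nowMs);
  perUser.set(key, times);
  recentQuestions.set(userId, perUser);
  return times.length > MAX_REPEATS;
}
```

A process-local `Map` only works for a single server instance and loses its state on restart; a deployment with several instances would likely need a shared store, which is outside the scope of this sketch.
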

### 5.3 Fair Use Policy (Premium)
- Maximum 200 queries per day (reasonable for active use)
- Queries should be related to childcare (0-6 years)
- No commercial use or data scraping
- No sharing of account credentials

---

## 6. System Prompt Guardrails

### 6.1 Base System Prompt (Always Included)

```
You are a helpful AI assistant for the Maternal App, designed to support
parents of children aged 0-6 years with childcare organization and guidance.

CRITICAL SAFETY RULES:
1. You are NOT a medical professional. Never diagnose or prescribe.
2. For medical emergencies, ALWAYS direct to call 911 immediately.
3. For medical concerns, ALWAYS recommend consulting a pediatrician.
4. Recognize mental health crises and provide crisis hotline resources.
5. Be supportive and non-judgmental of all parenting approaches.
6. Focus on evidence-based information from reputable sources (AAP, CDC, WHO).
7. Never provide specific medication dosages.
8. If asked about serious developmental delays, refer to professionals.

TONE:
- Warm, empathetic, and encouraging
- Clear and concise
- Non-judgmental and inclusive
- Supportive but honest

SCOPE:
- Childcare for ages 0-6 years
- Routines, tracking, organization
- General parenting tips and strategies
- Emotional support for parents
- Interpretation of tracked activity patterns

OUT OF SCOPE:
- Medical diagnosis or treatment
- Legal advice
- Financial planning
- Relationship counseling (beyond parenting partnership)
- Children outside 0-6 age range
```

### 6.2 Context-Specific Safety Additions

**When Medical Keywords Detected:**
```
MEDICAL SAFETY OVERRIDE:
This query contains medical concerns. Your response MUST:
1. Start with a medical disclaimer
2. Avoid definitive statements
3. Recommend professional consultation
4. Provide general information only
5. Include "when to seek immediate care" guidance
```

**When Crisis Keywords Detected:**
```
CRISIS RESPONSE OVERRIDE:
This query indicates a potential crisis. Your response MUST:
1. Acknowledge their feelings without judgment
2. Provide immediate crisis hotline numbers
3. Encourage reaching out for professional help
4. Be brief and focus on resources
5. Do NOT provide coping strategies that could be misinterpreted
```
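
Assembling the final system prompt from the base prompt plus any overrides could look roughly like this. The sketch makes several assumptions: the prompt strings are bracketed placeholders standing in for the full texts in 6.1 and 6.2, `PromptTrigger` and `buildSystemPrompt` are hypothetical names, and appending the crisis override before the medical one is an arbitrary fixed order chosen so that the same triggers always produce the same prompt.

```typescript
type PromptTrigger = "medical" | "crisis";

// Placeholders standing in for the full prompt texts in 6.1 and 6.2.
const BASE_SAFETY_PROMPT = "[base system prompt from 6.1]";
const OVERRIDES: Record<PromptTrigger, string> = {
  medical: "MEDICAL SAFETY OVERRIDE:\n[medical rules from 6.2]",
  crisis: "CRISIS RESPONSE OVERRIDE:\n[crisis rules from 6.2]",
};

// The base prompt always comes first; overrides are appended in a fixed
// (assumed) order so the same set of triggers always yields the same prompt.
function buildSystemPrompt(triggers: ReadonlySet<PromptTrigger>): string {
  const order: PromptTrigger[] = ["crisis", "medical"];
  const sections: string[] = [BASE_SAFETY_PROMPT];
  for (const trigger of order) {
    if (triggers.has(trigger)) sections.push(OVERRIDES[trigger]);
  }
  return sections.join("\n\n");
}
```
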

---

## 7. Monitoring & Analytics

### 7.1 Safety Metrics to Track

**Daily Monitoring:**
- Number of medical disclaimer triggers
- Number of crisis hotline responses sent
- Number of emergency keyword detections
- Number of queries blocked by content filter
- Average response moderation score

**Weekly Review:**
- Trending medical concerns
- Common crisis keywords
- False positive rate for disclaimers
- User feedback on safety responses

### 7.2 Alert Thresholds

**Immediate Alerts:**
- >10 emergency keywords in 1 hour (potential abuse or real emergency pattern)
- >5 crisis keywords from single user in 24 hours (high-risk user)
- Content filter blocking >50 queries/day (attack or misconfiguration)

**Daily Review Alerts:**
- >100 medical disclaimers triggered in 24 hours
- >20 crisis responses in 24 hours
- Pattern of similar medical queries (potential misinformation spread)
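
The numeric thresholds above could be evaluated against aggregated counters along these lines. This is a sketch only: the `SafetyCounts` fields, `Alert` shape, and `evaluateAlerts` name are hypothetical, the counters are assumed to be computed elsewhere, and the "pattern of similar medical queries" rule is omitted because it has no numeric threshold. Every comparison is strictly greater-than, matching the `>` in the list.

```typescript
// Hypothetical aggregated counters; how they are collected is out of scope here.
interface SafetyCounts {
  emergencyHitsLastHour: number;
  maxCrisisHitsPerUserLast24h: number; // highest count for any single user
  blockedQueriesLast24h: number;
  medicalDisclaimersLast24h: number;
  crisisResponsesLast24h: number;
}

type AlertLevel = "immediate" | "daily-review";

interface Alert {
  level: AlertLevel;
  reason: string;
}

// Compare each counter with its threshold from section 7.2.
function evaluateAlerts(c: SafetyCounts): Alert[] {
  const alerts: Alert[] = [];
  if (c.emergencyHitsLastHour > 10) {
    alerts.push({ level: "immediate", reason: ">10 emergency keywords in 1 hour" });
  }
  if (c.maxCrisisHitsPerUserLast24h > 5) {
    alerts.push({ level: "immediate", reason: ">5 crisis keywords from single user in 24 hours" });
  }
  if (c.blockedQueriesLast24h > 50) {
    alerts.push({ level: "immediate", reason: "content filter blocking >50 queries/day" });
  }
  if (c.medicalDisclaimersLast24h > 100) {
    alerts.push({ level: "daily-review", reason: ">100 medical disclaimers in 24 hours" });
  }
  if (c.crisisResponsesLast24h > 20) {
    alerts.push({ level: "daily-review", reason: ">20 crisis responses in 24 hours" });
  }
  return alerts;
}
```
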

---

## 8. User Education

### 8.1 First-Time User Guidance

**On First AI Query, Show:**
```
👋 Welcome to Your AI Parenting Assistant!

I'm here to help with:
✓ Childcare tips and organization
✓ Understanding your baby's patterns
✓ Answering general parenting questions
✓ Providing encouragement and support

Important to Know:
⚠️ I'm NOT a medical professional
⚠️ Always consult your pediatrician for medical concerns
⚠️ Call 911 for emergencies

For the best experience:
💡 Ask specific questions about routines, tracking, or parenting tips
💡 Share your child's age for age-appropriate guidance
💡 Use your activity tracking data for personalized insights

Let's get started! How can I help you today?
```

### 8.2 In-App Safety Reminders

**Random Safety Tips (1 in 20 responses):**
- "Remember: I'm an AI assistant, not a doctor. Always consult your pediatrician for medical advice."
- "Having a tough day? It's okay to ask for help. Check out our Resources section."
- "Emergency? Call 911 immediately. I cannot provide emergency medical guidance."
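
The "1 in 20 responses" rule might be implemented as a simple probability check, sketched below. `SAFETY_TIPS` and `maybeAppendSafetyTip` are hypothetical names, and treating "1 in 20" as an independent 5% chance per response (rather than every 20th response exactly) is an assumption. The random source is a parameter so that the behavior can be tested deterministically.

```typescript
// The three reminder texts listed above.
const SAFETY_TIPS: string[] = [
  "Remember: I'm an AI assistant, not a doctor. Always consult your pediatrician for medical advice.",
  "Having a tough day? It's okay to ask for help. Check out our Resources section.",
  "Emergency? Call 911 immediately. I cannot provide emergency medical guidance.",
];

// rng is injectable so tests can be deterministic; Math.random is the default.
function maybeAppendSafetyTip(response: string, rng: () => number = Math.random): string {
  // First draw: with probability 19/20, return the response unchanged.
  if (rng() >= 1 / 20) return response;
  // Second draw: pick one of the tips uniformly at random.
  const tip = SAFETY_TIPS[Math.floor(rng() * SAFETY_TIPS.length)];
  return `${response}\n\n${tip}`;
}
```
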

---

## 9. Implementation Checklist

### 9.1 Backend Implementation
- [ ] Create `ai-safety.service.ts` with keyword detection
- [ ] Implement medical disclaimer injection
- [ ] Add crisis hotline response generator
- [ ] Enhance content moderation filters
- [ ] Update system prompts with safety guardrails
- [ ] Add safety metrics logging
- [ ] Create safety monitoring dashboard

### 9.2 Frontend Implementation
- [ ] Add safety disclaimer modal on first AI use
- [ ] Display crisis hotlines prominently when triggered
- [ ] Show medical disclaimer badges
- [ ] Add "Report Unsafe Response" button
- [ ] Create AI Safety FAQ page

### 9.3 Testing
- [ ] Unit tests for keyword detection (100+ test cases)
- [ ] Integration tests for crisis response flow
- [ ] Manual testing of all disclaimer triggers
- [ ] User acceptance testing with parents
- [ ] Safety audit by medical professional (recommended)

### 9.4 Documentation
- [ ] User-facing AI Safety policy
- [ ] Developer documentation for safety service
- [ ] Incident response playbook
- [ ] Safety metric reporting template

---

## 10. Incident Response

### 10.1 If Harmful Response is Reported

**Immediate Actions (Within 1 hour):**
1. Flag conversation in database
2. Review full conversation history
3. Identify failure point (keyword detection, content filter, prompt)
4. Temporarily disable AI for affected user if serious
5. Notify safety team

**Follow-up Actions (Within 24 hours):**
1. Root cause analysis
2. Update keyword lists or filters
3. Test fix with similar queries
4. Document incident and resolution
5. Update user with resolution

**Communication:**
- Acknowledge report within 1 hour
- Provide resolution timeline
- Follow up when fixed
- Offer support resources if appropriate

### 10.2 Emergency Escalation

**Escalate to Management If:**
- A user reports following harmful AI advice and experiencing a negative outcome
- Multiple similar harmful responses occur in a short timeframe
- A media or regulatory inquiry about AI safety is received
- Suicidal ideation is not properly handled by crisis detection

---

## 11. Regular Safety Audits

### 11.1 Monthly Audits
- Review 100 random AI conversations
- Test all crisis and medical keyword triggers
- Analyze false positive/negative rates
- Update keyword lists based on new patterns
- Review user reports and feedback

### 11.2 Quarterly Reviews
- External safety audit (if possible)
- Update crisis hotline numbers (verify still active)
- Review and update safety policies
- Train team on new safety features
- Benchmark against industry standards

---

## 12. Compliance & Legal

### 12.1 Disclaimers (Required)
- [ ] Terms of Service mention AI limitations
- [ ] Privacy Policy covers AI data usage
- [ ] Medical disclaimer in AI interface
- [ ] Crisis resources easily accessible

### 12.2 Recommended Legal Review
- AI response liability
- Sufficiency of medical advice disclaimers
- Appropriateness of crisis responses
- Age-appropriate content standards

---

## Status: READY FOR IMPLEMENTATION

**Priority:** HIGH - User Safety Critical
**Estimated Implementation Time:** 3-5 days
**Testing Time:** 2 days
**Total:** 5-7 days with testing

**Next Steps:**
1. Implement `AISafetyService` with keyword detection
2. Update `AIService` to integrate safety checks
3. Add crisis hotline database/configuration
4. Create frontend safety components
5. Comprehensive testing with real scenarios
6. User documentation

This strategy ensures the Maternal App AI is helpful, supportive, and above all, SAFE for parents seeking guidance.