Implement Azure OpenAI vector embeddings for Romanian Bible
- Add pgvector support with bible_passages table for vector search
- Create Python ingestion script for Azure OpenAI embed-3 embeddings
- Implement hybrid search combining vector similarity and full-text search
- Update AI chat to use vector search with Azure OpenAI gpt-4o
- Add floating chat component with Material UI design
- Import complete Romanian Bible (FIDELA) with 30K+ verses
- Add vector search library for semantic Bible search
- Create multi-language implementation plan for future expansion

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@@ -11,6 +11,12 @@ JWT_SECRET=development-jwt-secret-change-in-production
AZURE_OPENAI_KEY=4DhkkXVdDOXZ7xX1eOLHTHQQnbCy0jFYdA6RPJtyAdOMtO16nZmFJQQJ99BCACYeBjFXJ3w3AAABACOGHgNC
 AZURE_OPENAI_ENDPOINT=https://azureopenaiinstant.openai.azure.com
 AZURE_OPENAI_DEPLOYMENT=gpt-4o
+AZURE_OPENAI_API_VERSION=2024-05-01-preview
+AZURE_OPENAI_EMBED_DEPLOYMENT=embed-3
+EMBED_DIMS=3072
+BIBLE_MD_PATH=./bibles/Biblia-Fidela-limba-romana.md
+LANG_CODE=ro
+TRANSLATION_CODE=FIDELA
 
 # API Bible
API_BIBLE_KEY=7b42606f8f809e155c9b0742c4f1849b
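Review note: the code added in this commit reads these variables through `process.env` with non-null assertions, so a missing value only surfaces as a failed request. The following is a minimal sketch of validating them once at startup. The helper `readAzureConfig` and the `AzureConfig` shape are hypothetical and not part of this commit; only the variable names come from the hunk above.

```typescript
// Hypothetical helper (not in the commit): collect the Azure OpenAI settings
// from an env-like record and fail fast when any of them is missing.
interface AzureConfig {
  key: string
  endpoint: string
  chatDeployment: string
  embedDeployment: string
  apiVersion: string
  embedDims: number
}

// Variable names as they appear in the environment file hunk.
const REQUIRED_VARS = [
  'AZURE_OPENAI_KEY',
  'AZURE_OPENAI_ENDPOINT',
  'AZURE_OPENAI_DEPLOYMENT',
  'AZURE_OPENAI_EMBED_DEPLOYMENT',
  'AZURE_OPENAI_API_VERSION',
  'EMBED_DIMS',
]

function readAzureConfig(env: Record<string, string | undefined>): AzureConfig {
  // Report every missing variable at once instead of one per restart.
  const missing = REQUIRED_VARS.filter((name) => !env[name])
  if (missing.length > 0) {
    throw new Error('Missing environment variables: ' + missing.join(', '))
  }

  // EMBED_DIMS arrives as a string and must be a positive whole number.
  const embedDims = Number(env['EMBED_DIMS'])
  if (!isFinite(embedDims) || embedDims <= 0 || Math.floor(embedDims) !== embedDims) {
    throw new Error('EMBED_DIMS must be a positive integer')
  }

  return {
    key: env['AZURE_OPENAI_KEY'] as string,
    endpoint: env['AZURE_OPENAI_ENDPOINT'] as string,
    chatDeployment: env['AZURE_OPENAI_DEPLOYMENT'] as string,
    embedDeployment: env['AZURE_OPENAI_EMBED_DEPLOYMENT'] as string,
    apiVersion: env['AZURE_OPENAI_API_VERSION'] as string,
    embedDims: embedDims,
  }
}
```

In the application this would presumably be called once with `process.env`; taking the record as a parameter keeps the sketch testable without touching the real environment.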
@@ -1,5 +1,6 @@
 import { NextRequest, NextResponse } from 'next/server'
 import { z } from 'zod'
+import { searchBibleHybrid, BibleVerse } from '@/lib/vector-search'
 
 const chatRequestSchema = z.object({
   message: z.string().min(1),
@@ -49,73 +50,81 @@ export async function POST(request: NextRequest) {
 }
 
 async function generateBiblicalResponse(message: string, history: any[]): Promise<string> {
-  // Mock biblical responses for common questions
-  const lowerMessage = message.toLowerCase()
+  try {
+    // Search for relevant Bible verses using vector search
+    const relevantVerses = await searchBibleHybrid(message, 5)
 
-  if (lowerMessage.includes('dragoste') || lowerMessage.includes('iubire')) {
-    return `Întrebarea ta despre dragoste este foarte frumoasă! Biblia ne învață că "Dumnezeu este dragoste" (1 Ioan 4:8). De asemenea, în 1 Corinteni 13:4-7 găsim descrierea perfectă a dragostei: "Dragostea este îndelung răbdătoare, dragostea este binevoitoare; dragostea nu pizmuiește; dragostea nu se fălește, nu se semeață, nu face nimic necuviincios, nu caută ale sale, nu se mânie, nu ține seama de răul făcut..."
+    // Create context from relevant verses
+    const versesContext = relevantVerses
+      .map(verse => `${verse.ref}: "${verse.text_raw}"`)
+      .join('\n\n')
 
-Isus ne-a dat cea mai mare poruncă: "Să iubești pe Domnul Dumnezeul tău cu toată inima ta, cu tot sufletul tău și cu tot cugetul tău" și "să-ți iubești aproapele ca pe tine însuți" (Matei 22:37-39).`
+    // Create conversation history for context
+    const conversationHistory = history
+      .slice(-3) // Last 3 messages for context
+      .map(msg => `${msg.role}: ${msg.content}`)
+      .join('\n')
+
+    // Construct prompt for Azure OpenAI
+    const systemPrompt = `Ești un asistent AI pentru întrebări biblice în limba română. Răspunde pe baza Scripturii, fiind respectuos și înțelept.
+
+Instrucțiuni:
+- Folosește versurile biblice relevante pentru a răspunde la întrebare
+- Citează întotdeauna referințele biblice (ex: Ioan 3:16)
+- Răspunde în română
+- Fii empatic și încurajator
+- Dacă nu ești sigur, încurajează studiul personal și rugăciunea
+
+Versuri relevante pentru această întrebare:
+${versesContext}
+
+Conversația anterioară:
+${conversationHistory}
+
+Întrebarea curentă: ${message}`
+
+    // Call Azure OpenAI
+    const response = await fetch(
+      `${process.env.AZURE_OPENAI_ENDPOINT}/openai/deployments/${process.env.AZURE_OPENAI_DEPLOYMENT}/chat/completions?api-version=${process.env.AZURE_OPENAI_API_VERSION}`,
+      {
+        method: 'POST',
+        headers: {
+          'api-key': process.env.AZURE_OPENAI_KEY!,
+          'Content-Type': 'application/json',
+        },
+        body: JSON.stringify({
+          messages: [
+            {
+              role: 'system',
+              content: systemPrompt
+            },
+            {
+              role: 'user',
+              content: message
+            }
+          ],
+          max_tokens: 800,
+          temperature: 0.7,
+          top_p: 0.9
+        }),
+      }
+    )
+
+    if (!response.ok) {
+      throw new Error(`Azure OpenAI API error: ${response.status}`)
+    }
+
+    const data = await response.json()
+    return data.choices[0].message.content
+
+  } catch (error) {
+    console.error('Error calling Azure OpenAI:', error)
+
+    // Fallback to simple response if AI fails
+    return `Îmi pare rău, dar întâmpin o problemă tehnică în acest moment. Te încurajez să cercetezi acest subiect în Scripturi și să te rogi pentru înțelegere.
+
+"Cercetați Scripturile, pentru că socotiți că în ele aveți viața veșnică, și tocmai ele mărturisesc despre Mine" (Ioan 5:39).
+
+"Dacă vreunul dintre voi duce lipsă de înțelepciune, să ceară de la Dumnezeu, care dă tuturor cu dărnicie și fără mustrare, și i se va da" (Iacov 1:5).`
   }
-
-  if (lowerMessage.includes('rugăciune') || lowerMessage.includes('rog')) {
-    return `Rugăciunea este comunicarea noastră directă cu Dumnezeu! Isus ne-a învățat să ne rugăm prin "Tatăl nostru" (Matei 6:9-13).
-
-Iată câteva principii importante pentru rugăciune:
-• "Rugați-vă neîncetat" (1 Tesaloniceni 5:17)
-• "Cerceți și veți găsi; bateți și vi se va deschide" (Matei 7:7)
-• "Nu vă îngrijorați de nimic, ci în toate, prin rugăciune și cerere, cu mulțumire, să fie cunoscute cererile voastre înaintea lui Dumnezeu" (Filipeni 4:6)
-
-Rugăciunea poate include laudă, mulțumire, spovedanie și cereri - Dumnezeu vrea să audă totul din inima ta!`
-  }
-
-  if (lowerMessage.includes('credință') || lowerMessage.includes('cred')) {
-    return `Credința este fundamentul vieții creștine! "Fără credință este cu neputință să fim plăcuți lui Dumnezeu; căci cine se apropie de Dumnezeu trebuie să creadă că El este și că răsplătește pe cei ce Îl caută" (Evrei 11:6).
-
-"Credința este o încredere neclintită în lucrurile nădăjduite, o dovadă a lucrurilor care nu se văd" (Evrei 11:1).
-
-Isus a spus: "Adevărat vă spun că, dacă aveți credință cât un grăunte de muștar, veți zice muntelui acestuia: 'Mută-te de aici acolo!' și se va muta" (Matei 17:20).
-
-Credința crește prin ascultarea Cuvântului lui Dumnezeu: "Credința vine din ascultare, iar ascultarea vine din Cuvântul lui Hristos" (Romani 10:17).`
-  }
-
-  if (lowerMessage.includes('speranță') || lowerMessage.includes('sper')) {
-    return `Speranța creștină nu este o dorință vagă, ci o certitudine bazată pe promisiunile lui Dumnezeu!
-
-"Fie ca Dumnezeul speranței să vă umple de toată bucuria și pacea în credință, pentru ca să prisosiți în speranță, prin puterea Duhului Sfânt!" (Romani 15:13).
-
-Speranța noastră este ancorata în Isus Hristos: "Hristos în voi, nădejdea slavei" (Coloseni 1:27).
-
-"Binecuvântat să fie Dumnezeu, Tatăl Domnului nostru Isus Hristos, care, după îndurarea Sa cea mare, ne-a născut din nou, printr-o înviere a lui Isus Hristos din morți, pentru o moștenire care nu se poate strica" (1 Petru 1:3-4).`
-  }
-
-  if (lowerMessage.includes('iertare') || lowerMessage.includes('iert')) {
-    return `Iertarea este una dintre cele mai puternice învățături ale lui Isus! El ne-a învățat să ne rugăm: "Iartă-ne greșelile noastre, precum și noi iertăm greșiților noștri" (Matei 6:12).
-
-"Dacă iertați oamenilor greșelile lor, și Tatăl vostru cel ceresc vă va ierta greșelile voastre" (Matei 6:14).
-
-Petru a întrebat pe Isus: "De câte ori să iert?" Isus a răspuns: "Nu îți zic până la șapte ori, ci până la șaptezeci de ori câte șapte" (Matei 18:21-22) - adică mereu!
-
-Iertarea nu înseamnă că minimalizăm răul, ci că alegem să nu ținem seama de el, așa cum Dumnezeu face cu noi prin Hristos.`
-  }
-
-  if (lowerMessage.includes('pace') || lowerMessage.includes('liniște')) {
-    return `Pacea lui Dumnezeu este diferită de pacea lumii! Isus a spus: "Pace vă las, pacea Mea vă dau; nu cum dă lumea, vă dau Eu. Să nu vi se tulbure inima și să nu vă fie frică!" (Ioan 14:27).
-
-"Pacea lui Dumnezeu, care întrece orice pricepere, vă va păzi inimile și gândurile în Hristos Isus" (Filipeni 4:7).
-
-Pentru a avea pace:
-• "În toate, prin rugăciune și cerere, cu mulțumire, să fie cunoscute cererile voastre înaintea lui Dumnezeu" (Filipeni 4:6)
-• "Aruncați toată grija voastră asupra Lui, căci El îngrijește de voi" (1 Petru 5:7)
-• "Isus le-a zis: 'Veniți la Mine, toți cei trudiți și împovărați, și Eu vă voi da odihnă'" (Matei 11:28)`
-  }
-
-  // Default response for other questions
-  return `Mulțumesc pentru întrebarea ta! Aceasta este o întrebare foarte importantă din punct de vedere biblic.
-
-Te încurajez să cercetezi acest subiect în Scriptură, să te rogi pentru înțelegere și să discuți cu lideri spirituali maturi. "Cercetați Scripturile, pentru că socotiți că în ele aveți viața veșnică, și tocmai ele mărturisesc despre Mine" (Ioan 5:39).
-
-Dacă ai întrebări mai specifice despre anumite pasaje biblice sau doctrine, voi fi bucuros să te ajut mai detaliat. Dumnezeu să te binecuvânteze în căutarea ta după adevăr!
-
-"Dacă vreunul dintre voi duce lipsă de înțelepciune, să ceară de la Dumnezeu, care dă tuturor cu dărnicie și fără mustrare, și i se va da" (Iacob 1:5).`
 }
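Review note: the new handler returns `data.choices[0].message.content` directly. If the payload carries no choices, that expression throws and the user gets the generic fallback text from the `catch` block. The sketch below makes that case explicit. `extractAssistantText` is a hypothetical helper, not part of this commit, and it assumes only the `{ choices: [{ message: { content } }] }` shape the handler already relies on.

```typescript
// Hypothetical helper (not in the commit): read the assistant text from a
// chat-completions style payload without assuming the payload is well formed.
function extractAssistantText(data: unknown): string | null {
  if (typeof data !== 'object' || data === null) return null

  const choices = (data as { choices?: unknown }).choices
  if (!Array.isArray(choices) || choices.length === 0) return null

  // Only the first choice is used, matching the handler in the hunk above.
  const first = choices[0] as { message?: { content?: unknown } } | null
  const content = first && first.message ? first.message.content : undefined

  // Treat empty or whitespace-only text the same as a missing answer.
  return typeof content === 'string' && content.trim() !== '' ? content : null
}
```

A caller could then decide between the model text and a fallback with `extractAssistantText(data) ?? fallbackText`, where `fallbackText` is whatever the route already returns on error.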
@@ -1,6 +1,7 @@
 import './globals.css'
 import type { Metadata } from 'next'
 import { MuiThemeProvider } from '@/components/providers/theme-provider'
+import FloatingChat from '@/components/chat/floating-chat'
 
 export const metadata: Metadata = {
   title: 'Ghid Biblic - Biblical Guide',
@@ -17,6 +18,7 @@ export default function RootLayout({
       <body>
         <MuiThemeProvider>
           {children}
+          <FloatingChat />
         </MuiThemeProvider>
       </body>
     </html>
426
components/chat/floating-chat.tsx
Normal file
@@ -0,0 +1,426 @@
+'use client'
+import {
+  Fab,
+  Drawer,
+  Box,
+  Typography,
+  TextField,
+  Button,
+  Paper,
+  Avatar,
+  Chip,
+  IconButton,
+  Divider,
+  List,
+  ListItem,
+  ListItemText,
+  useTheme,
+  Slide,
+  Grow,
+  Zoom,
+} from '@mui/material'
+import {
+  Chat,
+  Send,
+  Close,
+  SmartToy,
+  Person,
+  ContentCopy,
+  ThumbUp,
+  ThumbDown,
+  Minimize,
+  Launch,
+} from '@mui/icons-material'
+import { useState, useRef, useEffect } from 'react'
+
+interface ChatMessage {
+  id: string
+  role: 'user' | 'assistant'
+  content: string
+  timestamp: Date
+}
+
+export default function FloatingChat() {
+  const theme = useTheme()
+  const [isOpen, setIsOpen] = useState(false)
+  const [isMinimized, setIsMinimized] = useState(false)
+  const [messages, setMessages] = useState<ChatMessage[]>([
+    {
+      id: '1',
+      role: 'assistant',
+      content: 'Bună ziua! Sunt asistentul tău AI pentru întrebări biblice. Cum te pot ajuta astăzi să înțelegi mai bine Scriptura?',
+      timestamp: new Date(),
+    }
+  ])
+  const [inputMessage, setInputMessage] = useState('')
+  const [isLoading, setIsLoading] = useState(false)
+  const messagesEndRef = useRef<HTMLDivElement>(null)
+
+  const scrollToBottom = () => {
+    messagesEndRef.current?.scrollIntoView({ behavior: 'smooth' })
+  }
+
+  useEffect(() => {
+    scrollToBottom()
+  }, [messages])
+
+  const handleSendMessage = async () => {
+    if (!inputMessage.trim() || isLoading) return
+
+    const userMessage: ChatMessage = {
+      id: Date.now().toString(),
+      role: 'user',
+      content: inputMessage,
+      timestamp: new Date(),
+    }
+
+    setMessages(prev => [...prev, userMessage])
+    setInputMessage('')
+    setIsLoading(true)
+
+    try {
+      const response = await fetch('/api/chat', {
+        method: 'POST',
+        headers: {
+          'Content-Type': 'application/json',
+        },
+        body: JSON.stringify({
+          message: inputMessage,
+          history: messages.slice(-5),
+        }),
+      })
+
+      if (!response.ok) {
+        throw new Error('Failed to get response')
+      }
+
+      const data = await response.json()
+
+      const assistantMessage: ChatMessage = {
+        id: (Date.now() + 1).toString(),
+        role: 'assistant',
+        content: data.response || 'Îmi pare rău, nu am putut procesa întrebarea ta. Te rog încearcă din nou.',
+        timestamp: new Date(),
+      }
+
+      setMessages(prev => [...prev, assistantMessage])
+    } catch (error) {
+      console.error('Error sending message:', error)
+      const errorMessage: ChatMessage = {
+        id: (Date.now() + 1).toString(),
+        role: 'assistant',
+        content: 'Îmi pare rău, a apărut o eroare. Te rog verifică conexiunea și încearcă din nou.',
+        timestamp: new Date(),
+      }
+      setMessages(prev => [...prev, errorMessage])
+    } finally {
+      setIsLoading(false)
+    }
+  }
+
+  const handleKeyPress = (event: React.KeyboardEvent) => {
+    if (event.key === 'Enter' && !event.shiftKey) {
+      event.preventDefault()
+      handleSendMessage()
+    }
+  }
+
+  const copyToClipboard = (text: string) => {
+    navigator.clipboard.writeText(text)
+  }
+
+  const suggestedQuestions = [
+    'Ce spune Biblia despre iubire?',
+    'Explică-mi parabola semănătorului',
+    'Care sunt fructele Duhului?',
+    'Ce înseamnă să fii născut din nou?',
+    'Cum pot să mă rog mai bine?',
+  ]
+
+  const toggleChat = () => {
+    setIsOpen(!isOpen)
+    if (isMinimized) setIsMinimized(false)
+  }
+
+  const minimizeChat = () => {
+    setIsMinimized(!isMinimized)
+  }
+
+  const openFullChat = () => {
+    window.open('/chat', '_blank')
+  }
+
+  return (
+    <>
+      {/* Floating Action Button */}
+      <Zoom in={!isOpen} unmountOnExit>
+        <Fab
+          color="primary"
+          onClick={toggleChat}
+          sx={{
+            position: 'fixed',
+            bottom: 24,
+            right: 24,
+            zIndex: 1000,
+            background: 'linear-gradient(45deg, #2C5F6B 30%, #8B7355 90%)',
+            '&:hover': {
+              background: 'linear-gradient(45deg, #1e4148 30%, #6d5a43 90%)',
+            }
+          }}
+        >
+          <Chat />
+        </Fab>
+      </Zoom>
+
+      {/* Chat Overlay */}
+      <Slide direction="up" in={isOpen} mountOnExit>
+        <Paper
+          elevation={8}
+          sx={{
+            position: 'fixed',
+            bottom: 0,
+            right: 0,
+            width: { xs: '100vw', sm: '50vw', md: '40vw' },
+            height: isMinimized ? 'auto' : '100vh',
+            zIndex: 1200,
+            borderRadius: { xs: 0, sm: '12px 0 0 0' },
+            overflow: 'hidden',
+            display: 'flex',
+            flexDirection: 'column',
+            background: 'linear-gradient(to bottom, #f8f9fa, #ffffff)',
+          }}
+        >
+          {/* Header */}
+          <Box
+            sx={{
+              p: 2,
+              background: 'linear-gradient(45deg, #2C5F6B 30%, #8B7355 90%)',
+              color: 'white',
+              display: 'flex',
+              alignItems: 'center',
+              justifyContent: 'space-between',
+            }}
+          >
+            <Box sx={{ display: 'flex', alignItems: 'center', gap: 1 }}>
+              <Avatar sx={{ bgcolor: 'rgba(255,255,255,0.2)' }}>
+                <SmartToy />
+              </Avatar>
+              <Box>
+                <Typography variant="subtitle1" fontWeight="bold">
+                  Chat AI Biblic
+                </Typography>
+                <Typography variant="caption" sx={{ opacity: 0.9 }}>
+                  Asistent pentru întrebări biblice
+                </Typography>
+              </Box>
+            </Box>
+            <Box>
+              <IconButton
+                size="small"
+                onClick={minimizeChat}
+                sx={{ color: 'white', mr: 0.5 }}
+              >
+                <Minimize />
+              </IconButton>
+              <IconButton
+                size="small"
+                onClick={openFullChat}
+                sx={{ color: 'white', mr: 0.5 }}
+              >
+                <Launch />
+              </IconButton>
+              <IconButton
+                size="small"
+                onClick={toggleChat}
+                sx={{ color: 'white' }}
+              >
+                <Close />
+              </IconButton>
+            </Box>
+          </Box>
+
+          {!isMinimized && (
+            <>
+              {/* Suggested Questions */}
+              <Box sx={{ p: 2, borderBottom: 1, borderColor: 'divider' }}>
+                <Typography variant="body2" color="text.secondary" sx={{ mb: 1 }}>
+                  Întrebări sugerate:
+                </Typography>
+                <Box sx={{ display: 'flex', flexWrap: 'wrap', gap: 0.5 }}>
+                  {suggestedQuestions.slice(0, 3).map((question, index) => (
+                    <Chip
+                      key={index}
+                      label={question}
+                      size="small"
+                      variant="outlined"
+                      onClick={() => setInputMessage(question)}
+                      sx={{
+                        fontSize: '0.75rem',
+                        cursor: 'pointer',
+                        '&:hover': {
+                          bgcolor: 'primary.light',
+                          color: 'white',
+                        },
+                      }}
+                    />
+                  ))}
+                </Box>
+              </Box>
+
+              {/* Messages */}
+              <Box
+                sx={{
+                  flexGrow: 1,
+                  overflow: 'auto',
+                  p: 1,
+                }}
+              >
+                {messages.map((message) => (
+                  <Box
+                    key={message.id}
+                    sx={{
+                      display: 'flex',
+                      justifyContent: message.role === 'user' ? 'flex-end' : 'flex-start',
+                      mb: 2,
+                    }}
+                  >
+                    <Box
+                      sx={{
+                        display: 'flex',
+                        flexDirection: message.role === 'user' ? 'row-reverse' : 'row',
+                        alignItems: 'flex-start',
+                        maxWidth: '85%',
+                        gap: 1,
+                      }}
+                    >
+                      <Avatar
+                        sx={{
+                          width: 32,
+                          height: 32,
+                          bgcolor: message.role === 'user' ? 'primary.main' : 'secondary.main',
+                        }}
+                      >
+                        {message.role === 'user' ? <Person fontSize="small" /> : <SmartToy fontSize="small" />}
+                      </Avatar>
+
+                      <Paper
+                        elevation={1}
+                        sx={{
+                          p: 1.5,
+                          bgcolor: message.role === 'user' ? 'primary.light' : 'background.paper',
+                          color: message.role === 'user' ? 'white' : 'text.primary',
+                          borderRadius: 2,
+                          maxWidth: '100%',
+                        }}
+                      >
+                        <Typography
+                          variant="body2"
+                          sx={{
+                            whiteSpace: 'pre-wrap',
+                            lineHeight: 1.4,
+                          }}
+                        >
+                          {message.content}
+                        </Typography>
+
+                        {message.role === 'assistant' && (
+                          <Box sx={{ display: 'flex', gap: 0.5, mt: 1, justifyContent: 'flex-end' }}>
+                            <IconButton
+                              size="small"
+                              onClick={() => copyToClipboard(message.content)}
+                            >
+                              <ContentCopy fontSize="small" />
+                            </IconButton>
+                            <IconButton size="small">
+                              <ThumbUp fontSize="small" />
+                            </IconButton>
+                            <IconButton size="small">
+                              <ThumbDown fontSize="small" />
+                            </IconButton>
+                          </Box>
+                        )}
+
+                        <Typography
+                          variant="caption"
+                          sx={{
+                            display: 'block',
+                            textAlign: 'right',
+                            mt: 0.5,
+                            opacity: 0.7,
+                          }}
+                        >
+                          {message.timestamp.toLocaleTimeString('ro-RO', {
+                            hour: '2-digit',
+                            minute: '2-digit',
+                          })}
+                        </Typography>
+                      </Paper>
+                    </Box>
+                  </Box>
+                ))}
+
+                {isLoading && (
+                  <Box sx={{ display: 'flex', justifyContent: 'flex-start', mb: 2 }}>
+                    <Box sx={{ display: 'flex', alignItems: 'flex-start', gap: 1 }}>
+                      <Avatar sx={{ width: 32, height: 32, bgcolor: 'secondary.main' }}>
+                        <SmartToy fontSize="small" />
+                      </Avatar>
+                      <Paper elevation={1} sx={{ p: 1.5, borderRadius: 2 }}>
+                        <Typography variant="body2">
+                          Scriu răspunsul...
+                        </Typography>
+                      </Paper>
+                    </Box>
+                  </Box>
+                )}
+
+                <div ref={messagesEndRef} />
+              </Box>
+
+              <Divider />
+
+              {/* Input */}
+              <Box sx={{ p: 2 }}>
+                <Box sx={{ display: 'flex', gap: 1 }}>
+                  <TextField
+                    fullWidth
+                    size="small"
+                    multiline
+                    maxRows={3}
+                    placeholder="Scrie întrebarea ta despre Biblie..."
+                    value={inputMessage}
+                    onChange={(e) => setInputMessage(e.target.value)}
+                    onKeyPress={handleKeyPress}
+                    disabled={isLoading}
+                    variant="outlined"
+                    sx={{
+                      '& .MuiOutlinedInput-root': {
+                        borderRadius: 2,
+                      }
+                    }}
+                  />
+                  <Button
+                    variant="contained"
+                    onClick={handleSendMessage}
+                    disabled={!inputMessage.trim() || isLoading}
+                    sx={{
+                      minWidth: 'auto',
+                      px: 2,
+                      borderRadius: 2,
+                      background: 'linear-gradient(45deg, #2C5F6B 30%, #8B7355 90%)',
+                    }}
+                  >
+                    <Send fontSize="small" />
+                  </Button>
+                </Box>
+                <Typography variant="caption" color="text.secondary" sx={{ mt: 0.5, display: 'block' }}>
+                  Enter pentru a trimite • Shift+Enter pentru linie nouă
+                </Typography>
+              </Box>
+            </>
+          )}
+        </Paper>
+      </Slide>
+    </>
+  )
+}
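Review note: `handleSendMessage` posts `messages.slice(-5)` as `history`, which serializes the full `ChatMessage` objects including `id` and `timestamp`, while the chat handler in this commit only interpolates `role` and `content`. A minimal sketch of trimming the payload follows. `toHistoryPayload` and `HistoryItem` are hypothetical names, not part of this commit, and the request schema for `history` is not visible in the diff, so this is an assumption about what the route needs.

```typescript
// Mirrors the ChatMessage interface declared in floating-chat.tsx.
interface ChatMessage {
  id: string
  role: 'user' | 'assistant'
  content: string
  timestamp: Date
}

// Hypothetical wire shape: only the fields the handler reads.
interface HistoryItem {
  role: 'user' | 'assistant'
  content: string
}

// Hypothetical helper (not in the commit): keep the last `maxItems` messages
// and drop the fields that are only used for rendering.
function toHistoryPayload(messages: ChatMessage[], maxItems: number = 5): HistoryItem[] {
  // slice(-0) would return the whole array, so handle non-positive limits first.
  if (maxItems <= 0) return []
  return messages.slice(-maxItems).map((m) => ({ role: m.role, content: m.content }))
}
```

The component could then send `history: toHistoryPayload(messages)` in place of `messages.slice(-5)`.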
@@ -24,7 +24,6 @@ import {
 import {
   Menu as MenuIcon,
   MenuBook,
-  Chat,
   Favorite as Prayer,
   Search,
   AccountCircle,
@@ -37,7 +36,6 @@ import { useRouter } from 'next/navigation'
 const pages = [
   { name: 'Acasă', path: '/', icon: <Home /> },
   { name: 'Biblia', path: '/bible', icon: <MenuBook /> },
-  { name: 'Chat AI', path: '/chat', icon: <Chat /> },
   { name: 'Rugăciuni', path: '/prayers', icon: <Prayer /> },
   { name: 'Căutare', path: '/search', icon: <Search /> },
 ]
140
lib/vector-search.ts
Normal file
@@ -0,0 +1,140 @@
+import { Pool } from 'pg'
+
+const pool = new Pool({
+  connectionString: process.env.DATABASE_URL,
+})
+
+export interface BibleVerse {
+  id: string
+  ref: string
+  book: string
+  chapter: number
+  verse: number
+  text_raw: string
+  similarity?: number
+  combined_score?: number
+}
+
+export async function getEmbedding(text: string): Promise<number[]> {
+  const response = await fetch(
+    `${process.env.AZURE_OPENAI_ENDPOINT}/openai/deployments/${process.env.AZURE_OPENAI_EMBED_DEPLOYMENT}/embeddings?api-version=${process.env.AZURE_OPENAI_API_VERSION}`,
+    {
+      method: 'POST',
+      headers: {
+        'api-key': process.env.AZURE_OPENAI_KEY!,
+        'Content-Type': 'application/json',
+      },
+      body: JSON.stringify({
+        input: [text],
+      }),
+    }
+  )
+
+  if (!response.ok) {
+    throw new Error(`Embedding API error: ${response.status}`)
+  }
+
+  const data = await response.json()
+  return data.data[0].embedding
+}
+
+export async function searchBibleSemantic(
+  query: string,
+  limit: number = 10
+): Promise<BibleVerse[]> {
+  try {
+    const queryEmbedding = await getEmbedding(query)
+
+    const client = await pool.connect()
+    try {
+      const result = await client.query(
+        `
+        SELECT ref, book, chapter, verse, text_raw,
+               1 - (embedding <=> $1) AS similarity
+        FROM bible_passages
+        WHERE embedding IS NOT NULL
+        ORDER BY embedding <=> $1
+        LIMIT $2
+        `,
+        [JSON.stringify(queryEmbedding), limit]
+      )
+
+      return result.rows
+    } finally {
+      client.release()
+    }
+  } catch (error) {
+    console.error('Error in semantic search:', error)
+    throw error
+  }
+}
+
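Review note: the query above relies on pgvector's `<=>` operator, which returns cosine distance, so `1 - (embedding <=> $1)` is cosine similarity. The sketch below reproduces that arithmetic in plain TypeScript for small vectors. The function names are illustrative, and returning 0 for a zero-length vector is a choice made for this sketch, not a statement about how the database handles that case.

```typescript
// Cosine similarity of two equal-length vectors.
// Returns 0 when either vector has zero magnitude (a choice made for this sketch).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length')
  }
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Cosine distance: the quantity the query orders by (smaller means closer).
function cosineDistance(a: number[], b: number[]): number {
  return 1 - cosineSimilarity(a, b)
}
```

Ordering by distance ascending and reporting `1 - distance` as `similarity`, as the SQL does, therefore ranks the same rows first that a descending sort on similarity would.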
+export async function searchBibleHybrid(
+  query: string,
+  limit: number = 10
+): Promise<BibleVerse[]> {
+  try {
+    const queryEmbedding = await getEmbedding(query)
+
+    const client = await pool.connect()
+    try {
+      const result = await client.query(
+        `
+        WITH vector_search AS (
+          SELECT id, 1 - (embedding <=> $1) AS vector_sim
+          FROM bible_passages
+          WHERE embedding IS NOT NULL
+          ORDER BY embedding <=> $1
+          LIMIT 100
+        ),
+        text_search AS (
+          SELECT id, ts_rank(tsv, plainto_tsquery('romanian', $3)) AS text_rank
+          FROM bible_passages
+          WHERE tsv @@ plainto_tsquery('romanian', $3)
+        )
+        SELECT bp.ref, bp.book, bp.chapter, bp.verse, bp.text_raw,
+               COALESCE(vs.vector_sim, 0) * 0.7 + COALESCE(ts.text_rank, 0) * 0.3 AS combined_score
+        FROM bible_passages bp
+        LEFT JOIN vector_search vs ON vs.id = bp.id
+        LEFT JOIN text_search ts ON ts.id = bp.id
+        WHERE vs.id IS NOT NULL OR ts.id IS NOT NULL
+        ORDER BY combined_score DESC
+        LIMIT $2
+        `,
+        [JSON.stringify(queryEmbedding), limit, query]
+      )
+
+      return result.rows
+    } finally {
+      client.release()
+    }
+  } catch (error) {
+    console.error('Error in hybrid search:', error)
+    throw error
+  }
+}
+
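Review note: the final `SELECT` blends the two signals as `0.7 * vector_sim + 0.3 * text_rank`, with a missing side counted as 0. The sketch below restates that rule in TypeScript so the ranking can be checked on small inputs. `Candidate`, `combineScores`, and `rankHybrid` are hypothetical names; the weights are copied from the query. Note that cosine similarity and `ts_rank` are not guaranteed to be on the same scale, so the effective balance may differ from the nominal 70/30 split.

```typescript
// One row that appeared in the vector CTE, the text CTE, or both.
interface Candidate {
  id: string
  vectorSim?: number // absent when the row was not among the nearest vectors
  textRank?: number  // absent when the row did not match the text query
}

interface ScoredCandidate {
  id: string
  combinedScore: number
}

// Mirrors COALESCE(vector_sim, 0) * 0.7 + COALESCE(text_rank, 0) * 0.3.
function combineScores(vectorSim?: number, textRank?: number): number {
  return (vectorSim ?? 0) * 0.7 + (textRank ?? 0) * 0.3
}

// Mirrors ORDER BY combined_score DESC LIMIT $2.
function rankHybrid(candidates: Candidate[], limit: number): ScoredCandidate[] {
  return candidates
    .map((c) => ({ id: c.id, combinedScore: combineScores(c.vectorSim, c.textRank) }))
    .sort((x, y) => y.combinedScore - x.combinedScore)
    .slice(0, Math.max(0, limit))
}
```

With these weights a row found only by full-text search needs a `text_rank` more than twice the `vector_sim` of a vector-only row to outrank it.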
+export async function getContextVerses(
+  book: string,
+  chapter: number,
+  verse: number,
+  contextSize: number = 2
+): Promise<BibleVerse[]> {
+  const client = await pool.connect()
+  try {
+    const result = await client.query(
+      `
+      SELECT ref, book, chapter, verse, text_raw
+      FROM bible_passages
+      WHERE book = $1 AND chapter = $2
+        AND verse BETWEEN $3 AND $4
+      ORDER BY verse
+      `,
+      [book, chapter, verse - contextSize, verse + contextSize]
+    )
+
+    return result.rows
+  } finally {
+    client.release()
+  }
+}
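Review note: `getContextVerses` passes `verse - contextSize` and `verse + contextSize` straight into `BETWEEN`. A lower bound below 1 is harmless in SQL, but computing the window explicitly makes the intent clearer and guards against a negative `contextSize`. The sketch below assumes verse numbering starts at 1; `contextRange` is a hypothetical helper, not part of this commit.

```typescript
interface VerseRange {
  from: number
  to: number
}

// Hypothetical helper (not in the commit): the verse window around `verse`,
// clamped so it never starts before verse 1 and never has a negative size.
function contextRange(verse: number, contextSize: number = 2): VerseRange {
  const size = Math.max(0, Math.floor(contextSize))
  return {
    from: Math.max(1, verse - size),
    to: verse + size,
  }
}
```

The upper bound is left unclamped because the number of verses in a chapter is only known to the database.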
212
multi-language-implementation-plan.md
Normal file
@@ -0,0 +1,212 @@
+# Multi-Language Support Implementation Plan
+
+## Overview
+Add comprehensive multi-language support to the Ghid Biblic application, starting with English as the second language alongside Romanian.
+
+## Current State
+- **Database**: Already supports multiple languages (`lang` field) and translations (`translation` field)
+- **Frontend**: Hardcoded Romanian interface
+- **Vector Search**: Romanian-only search logic
+- **Bible Data**: Only Romanian (FIDELA) version imported
+
+## Implementation Phases
+
+### Phase 1: Core Infrastructure
+1. **Install i18n Framework**
+   - Add `next-intl` for Next.js internationalization
+   - Configure locale routing (`/ro/`, `/en/`)
+   - Set up translation file structure
+
+2. **Language Configuration**
+   - Create language detection and switching logic
+   - Add language persistence (localStorage/cookies)
+   - Configure default language fallbacks
+
+3. **Translation Files Structure**
+   ```
+   messages/
+   ├── ro.json (Romanian - existing content)
+   ├── en.json (English translations)
+   └── common.json (shared terms)
+   ```
+
+### Phase 2: UI Internationalization
+1. **Navigation Component**
+   - Translate all menu items and labels
+   - Add language switcher dropdown
+   - Update routing for locale-aware navigation
+
+2. **Chat Interface**
+   - Translate all UI text and prompts
+   - Add suggested questions per language
+   - Update loading states and error messages
+
+3. **Page Content**
+   - Home page (`/` → `/[locale]/`)
+   - Bible browser (`/bible` → `/[locale]/bible`)
+   - Search page (`/search` → `/[locale]/search`)
+   - Prayer requests (`/prayers` → `/[locale]/prayers`)
+
|
### Phase 3: Backend Localization
|
||||||
|
1. **Vector Search Updates**
|
||||||
|
- Modify search functions to filter by language
|
||||||
|
- Add language parameter to search APIs
|
||||||
|
- Update hybrid search for language-specific full-text search
|
||||||
|
|
||||||
|
2. **Chat API Enhancement**
|
||||||
|
- Language-aware Bible verse retrieval
|
||||||
|
- Localized AI response prompts
|
||||||
|
- Language-specific fallback responses
|
||||||
|
|
||||||
|
3. **API Route Updates**
|
||||||
|
- Add locale parameter to all API endpoints
|
||||||
|
- Update error responses for each language
|
||||||
|
- Configure language-specific search configurations
|
||||||
|
|
||||||
|
### Phase 4: Bible Data Management
|
||||||
|
1. **English Bible Import**
|
||||||
|
- Source: API.Bible or public domain English Bible (KJV/ESV)
|
||||||
|
- Adapt existing import script for English
|
||||||
|
- Generate English embeddings using Azure OpenAI
|
||||||
|
|
||||||
|
2. **Language-Aware Bible Browser**
|
||||||
|
- Add language selector in Bible interface
|
||||||
|
- Filter books/chapters/verses by selected language
|
||||||
|
- Show parallel verses when both languages available
|
||||||
|
|
||||||
|
### Phase 5: Enhanced Features
|
||||||
|
1. **Parallel Bible View**
|
||||||
|
- Side-by-side Romanian/English verse display
|
||||||
|
- Cross-reference linking between translations
|
||||||
|
- Language comparison in search results
|
||||||
|
|
||||||
|
2. **Smart Language Detection**
|
||||||
|
- Auto-detect query language in chat
|
||||||
|
- Suggest language switch based on user input
|
||||||
|
- Mixed-language search capabilities
|
||||||
|
|
||||||
|
3. **Advanced Search Features**
|
||||||
|
- Cross-language semantic search
|
||||||
|
- Translation comparison tools
|
||||||
|
- Language-specific biblical term glossaries
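
As a rough illustration of the "auto-detect query language in chat" idea from Phase 5, a heuristic along the following lines might be enough for a first version with only two languages. It is a sketch under stated assumptions, not the planned implementation: the stopword lists are small hand-picked samples, and `detect_query_language` is a hypothetical helper name.

```python
import re

# Assumption: tiny illustrative word lists; a real version would need larger ones.
RO_DIACRITICS = set("ăâîșțşţ")
RO_STOPWORDS = {"și", "si", "este", "despre", "ce", "cu", "pentru", "spune", "sunt"}
EN_STOPWORDS = {"the", "and", "is", "what", "about", "does", "of", "for", "say", "with", "are"}


def detect_query_language(text: str, default: str = "ro") -> str:
    """Guess 'ro' or 'en' for a chat query; fall back to `default` on a tie."""
    lowered = text.lower()
    # Romanian diacritics are a strong signal on their own
    if any(ch in RO_DIACRITICS for ch in lowered):
        return "ro"
    words = re.findall(r"[^\W\d_]+", lowered)
    ro_hits = sum(word in RO_STOPWORDS for word in words)
    en_hits = sum(word in EN_STOPWORDS for word in words)
    if ro_hits == en_hits:
        return default
    return "ro" if ro_hits > en_hits else "en"
```

Single-word queries such as "dragoste" carry no stopwords, which is why the function takes a `default`: the caller can pass the current UI locale rather than guessing.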

## Technical Implementation Details

### Routing Structure
```
Current: /page
New: /[locale]/page

Examples:
- /ro/biblia (Romanian Bible)
- /en/bible (English Bible)
- /ro/rugaciuni (Romanian Prayers)
- /en/prayers (English Prayers)
```

The localized slugs in these examples (`biblia`, `rugaciuni`) need a pathname mapping, because the `app/[locale]/` folders listed under File Structure Changes use the English names (`bible/`, `prayers/`).

### Database Schema Changes
**No changes needed** - current schema already supports:
- Multiple languages via `lang` field
- Multiple translations via `translation` field
- Unique constraints per translation/language

### Vector Search Updates
```typescript
// Current
searchBibleHybrid(query: string, limit: number)

// Enhanced
searchBibleHybrid(query: string, language: string, limit: number)
```
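
To make the language parameter concrete, here is one possible shape for the hybrid query once it filters by language, written in Python to match `scripts/bible_search.py`. It is a sketch with several assumptions: `build_hybrid_query` and `FTS_CONFIG` are hypothetical names, the mapping from locale codes to PostgreSQL text search configurations is assumed, and the `tsv` column is assumed to be populated with the matching configuration for each row's language. The 0.7/0.3 weights and the 100-row vector candidate limit are taken from the existing script.

```python
from typing import Sequence, Tuple

# Assumption: locale code -> PostgreSQL text search configuration
FTS_CONFIG = {"ro": "romanian", "en": "english"}

HYBRID_SQL = """
WITH vector_search AS (
    SELECT id, 1 - (embedding <=> %s::vector) AS vector_sim
    FROM bible_passages
    WHERE embedding IS NOT NULL AND lang = %s
    ORDER BY embedding <=> %s::vector
    LIMIT 100
),
text_search AS (
    SELECT id, ts_rank(tsv, plainto_tsquery(%s::regconfig, %s)) AS text_rank
    FROM bible_passages
    WHERE lang = %s AND tsv @@ plainto_tsquery(%s::regconfig, %s)
)
SELECT bp.ref, bp.book, bp.chapter, bp.verse, bp.text_raw,
       COALESCE(vs.vector_sim, 0) * 0.7 + COALESCE(ts.text_rank, 0) * 0.3 AS combined_score
FROM bible_passages bp
LEFT JOIN vector_search vs ON vs.id = bp.id
LEFT JOIN text_search ts ON ts.id = bp.id
WHERE vs.id IS NOT NULL OR ts.id IS NOT NULL
ORDER BY combined_score DESC
LIMIT %s
"""


def build_hybrid_query(embedding: Sequence[float], query: str,
                       lang: str, limit: int = 10) -> Tuple[str, tuple]:
    """Return (sql, params) for a language-filtered hybrid search."""
    if lang not in FTS_CONFIG:
        raise ValueError(f"Unsupported language: {lang}")
    fts = FTS_CONFIG[lang]
    # pgvector accepts the text form '[x,y,...]' when cast to vector
    vector_literal = "[" + ",".join(str(x) for x in embedding) + "]"
    params = (vector_literal, lang, vector_literal,
              fts, query, lang, fts, query, limit)
    return HYBRID_SQL, params
```

The returned pair can be passed straight to `cur.execute(sql, params)`. Filtering by `lang` inside both CTEs keeps the 100-row vector candidate set from being filled with verses in the other language.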

### Translation File Structure
`messages/en.json`:
```json
{
  "navigation": {
    "home": "Home",
    "bible": "Bible",
    "prayers": "Prayers",
    "search": "Search"
  },
  "chat": {
    "placeholder": "Ask your biblical question...",
    "suggestions": [
      "What does the Bible say about love?",
      "Explain the parable of the sower",
      "What are the fruits of the Spirit?"
    ]
  }
}
```
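
The nested structure above implies dotted keys such as `navigation.home`. The lookup with a default-language fallback might look like the sketch below. It is written in Python purely to show the logic (in the app, `next-intl` provides this); `MESSAGES`, `lookup` and `translate` are hypothetical names, and the Romanian strings are illustrative placeholders rather than the project's actual translations.

```python
from typing import Any, Dict, Optional

# Assumption: trimmed-down message catalogs; Romanian values are placeholders.
MESSAGES: Dict[str, Dict[str, Any]] = {
    "en": {
        "navigation": {"home": "Home", "bible": "Bible"},
        "chat": {"placeholder": "Ask your biblical question..."},
    },
    "ro": {
        "navigation": {"home": "Acasă", "bible": "Biblia", "prayers": "Rugăciuni"},
    },
}


def lookup(tree: Dict[str, Any], dotted_key: str) -> Optional[Any]:
    """Walk a nested dict along 'a.b.c'; return None if any part is missing."""
    node: Any = tree
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def translate(locale: str, key: str, default_locale: str = "ro") -> str:
    """Resolve a message in `locale`, then in the default locale, else echo the key."""
    for candidate in (locale, default_locale):
        value = lookup(MESSAGES.get(candidate, {}), key)
        if isinstance(value, str):
            return value
    return key
```

Returning the key itself for a missing message makes untranslated strings visible during testing instead of rendering as empty text.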

### Language Switcher Component
- Dropdown in navigation header
- Flag icons for visual identification
- Persist language choice across sessions
- Redirect to equivalent page in new language
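
The "redirect to equivalent page" behaviour depends on mapping localized slugs between languages. A minimal sketch of that mapping follows, in Python for illustration only. The slug table is an assumption built from the routing examples in this plan (`biblia`/`bible`, `rugaciuni`/`prayers`), and `switch_locale_path` is a hypothetical helper name.

```python
from typing import Dict

LOCALES = ("ro", "en")

# Assumption: localized first-level slugs, taken from the routing examples.
SLUGS: Dict[str, Dict[str, str]] = {
    "bible": {"ro": "biblia", "en": "bible"},
    "prayers": {"ro": "rugaciuni", "en": "prayers"},
}


def switch_locale_path(path: str, target: str) -> str:
    """Rewrite a path such as /ro/biblia to its equivalent in `target`."""
    if target not in LOCALES:
        raise ValueError(f"Unsupported locale: {target}")
    segments = [segment for segment in path.split("/") if segment]
    current = segments[0] if segments and segments[0] in LOCALES else None
    rest = segments[1:] if current else segments
    if current and rest:
        # Translate the first page segment if it is a known localized slug
        for names in SLUGS.values():
            if names[current] == rest[0]:
                rest[0] = names[target]
                break
    return "/" + "/".join([target, *rest])
```

Segments after the page slug (for example a book and chapter) are passed through unchanged, so `/ro/biblia/Geneza/1` would become `/en/bible/Geneza/1`; whether book names should also be localized in URLs is left open here.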

## Dependencies to Add
```json
{
  "next-intl": "^3.x",
  "@formatjs/intl-localematcher": "^0.x",
  "negotiator": "^0.x"
}
```

## File Structure Changes
```
app/
├── [locale]/
│   ├── page.tsx
│   ├── bible/
│   ├── prayers/
│   ├── search/
│   └── layout.tsx
├── api/ (unchanged)
└── globals.css

messages/
├── en.json
├── ro.json
└── index.ts

components/
├── language-switcher.tsx
├── navigation.tsx (updated)
└── chat/ (updated)
```

## Testing Strategy
1. **Unit Tests**: Translation loading and language switching
2. **Integration Tests**: API endpoints with locale parameters
3. **E2E Tests**: Complete user flows in both languages
4. **Performance Tests**: Vector search with language filtering

## Rollout Plan
1. **Development**: Implement Phases 1-3 (core infrastructure, UI, and backend localization)
2. **Testing**: Deploy to staging with Romanian/English support
3. **Beta Release**: Limited user testing with feedback collection
4. **Production**: Full release with both languages
5. **Future**: Add additional languages based on user demand

## Estimated Timeline
- **Phases 1-2**: 2-3 days (i18n setup and UI translation)
- **Phase 3**: 1-2 days (backend localization)
- **Phase 4**: 2-3 days (English Bible import and embeddings)
- **Phase 5**: 3-4 days (enhanced features)
- **Total**: 8-12 days for complete implementation

## Success Metrics
- Language switching keeps the user on the equivalent page and persists across sessions
- Vector search returns accurate results in both languages
- AI chat responses are contextually appropriate per language
- Users can browse the Bible in their preferred language
- Search performance with language filtering stays comparable to the current Romanian-only search

## Future Considerations
- Spanish, French, German language support
- Regional dialect variations
- Audio Bible integration per language
- Collaborative translation features for community contributions
|
||||||
169
package-lock.json
generated
@@ -24,6 +24,7 @@
|
|||||||
"@tailwindcss/postcss": "^4.1.13",
|
"@tailwindcss/postcss": "^4.1.13",
|
||||||
"@types/node": "^24.5.2",
|
"@types/node": "^24.5.2",
|
||||||
"@types/pdf-parse": "^1.1.5",
|
"@types/pdf-parse": "^1.1.5",
|
||||||
|
"@types/pg": "^8.15.5",
|
||||||
"@types/react": "^19.1.13",
|
"@types/react": "^19.1.13",
|
||||||
"@types/react-dom": "^19.1.9",
|
"@types/react-dom": "^19.1.9",
|
||||||
"autoprefixer": "^10.4.21",
|
"autoprefixer": "^10.4.21",
|
||||||
@@ -35,6 +36,8 @@
|
|||||||
"next": "^15.5.3",
|
"next": "^15.5.3",
|
||||||
"openai": "^5.22.0",
|
"openai": "^5.22.0",
|
||||||
"pdf-parse": "^1.1.1",
|
"pdf-parse": "^1.1.1",
|
||||||
|
"pg": "^8.16.3",
|
||||||
|
"pgvector": "^0.2.1",
|
||||||
"postcss": "^8.5.6",
|
"postcss": "^8.5.6",
|
||||||
"prisma": "^6.16.2",
|
"prisma": "^6.16.2",
|
||||||
"react": "^19.1.1",
|
"react": "^19.1.1",
|
||||||
@@ -4182,6 +4185,17 @@
|
|||||||
"@types/node": "*"
|
"@types/node": "*"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
|
"node_modules/@types/pg": {
|
||||||
|
"version": "8.15.5",
|
||||||
|
"resolved": "https://registry.npmjs.org/@types/pg/-/pg-8.15.5.tgz",
|
||||||
|
"integrity": "sha512-LF7lF6zWEKxuT3/OR8wAZGzkg4ENGXFNyiV/JeOt9z5B+0ZVwbql9McqX5c/WStFq1GaGso7H1AzP/qSzmlCKQ==",
|
||||||
|
"license": "MIT",
|
||||||
|
"dependencies": {
|
||||||
|
"@types/node": "*",
|
||||||
|
"pg-protocol": "*",
|
||||||
|
"pg-types": "^2.2.0"
|
||||||
|
}
|
||||||
|
},
|
||||||
"node_modules/@types/prop-types": {
|
"node_modules/@types/prop-types": {
|
||||||
"version": "15.7.15",
|
"version": "15.7.15",
|
||||||
"resolved": "https://registry.npmjs.org/@types/prop-types/-/prop-types-15.7.15.tgz",
|
"resolved": "https://registry.npmjs.org/@types/prop-types/-/prop-types-15.7.15.tgz",
|
||||||
@@ -9639,6 +9653,104 @@
|
|||||||
"integrity": "sha512-xCy9V055GLEqoFaHoC1SoLIaLmWctgCUaBaWxDZ7/Zx4CTyX7cJQLJOok/orfjZAh9kEYpjJa4d0KcJmCbctZA==",
|
"integrity": "sha512-xCy9V055GLEqoFaHoC1SoLIaLmWctgCUaBaWxDZ7/Zx4CTyX7cJQLJOok/orfjZAh9kEYpjJa4d0KcJmCbctZA==",
|
||||||
"license": "MIT"
|
"license": "MIT"
|
||||||
},
|
},
|
||||||
|
"node_modules/pg": {
|
||||||
|
"version": "8.16.3",
|
||||||
|
"resolved": "https://registry.npmjs.org/pg/-/pg-8.16.3.tgz",
|
||||||
|
"integrity": "sha512-enxc1h0jA/aq5oSDMvqyW3q89ra6XIIDZgCX9vkMrnz5DFTw/Ny3Li2lFQ+pt3L6MCgm/5o2o8HW9hiJji+xvw==",
|
||||||
|
"license": "MIT",
|
||||||
|
"dependencies": {
|
||||||
|
"pg-connection-string": "^2.9.1",
|
||||||
|
"pg-pool": "^3.10.1",
|
||||||
|
"pg-protocol": "^1.10.3",
|
||||||
|
"pg-types": "2.2.0",
|
||||||
|
"pgpass": "1.0.5"
|
||||||
|
},
|
||||||
|
"engines": {
|
||||||
|
"node": ">= 16.0.0"
|
||||||
|
},
|
||||||
|
"optionalDependencies": {
|
||||||
|
"pg-cloudflare": "^1.2.7"
|
||||||
|
},
|
||||||
|
"peerDependencies": {
|
||||||
|
"pg-native": ">=3.0.1"
|
||||||
|
},
|
||||||
|
"peerDependenciesMeta": {
|
||||||
|
"pg-native": {
|
||||||
|
"optional": true
|
||||||
|
}
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/pg-cloudflare": {
|
||||||
|
"version": "1.2.7",
|
||||||
|
"resolved": "https://registry.npmjs.org/pg-cloudflare/-/pg-cloudflare-1.2.7.tgz",
|
||||||
|
"integrity": "sha512-YgCtzMH0ptvZJslLM1ffsY4EuGaU0cx4XSdXLRFae8bPP4dS5xL1tNB3k2o/N64cHJpwU7dxKli/nZ2lUa5fLg==",
|
||||||
|
"license": "MIT",
|
||||||
|
"optional": true
|
||||||
|
},
|
||||||
|
"node_modules/pg-connection-string": {
|
||||||
|
"version": "2.9.1",
|
||||||
|
"resolved": "https://registry.npmjs.org/pg-connection-string/-/pg-connection-string-2.9.1.tgz",
|
||||||
|
"integrity": "sha512-nkc6NpDcvPVpZXxrreI/FOtX3XemeLl8E0qFr6F2Lrm/I8WOnaWNhIPK2Z7OHpw7gh5XJThi6j6ppgNoaT1w4w==",
|
||||||
|
"license": "MIT"
|
||||||
|
},
|
||||||
|
"node_modules/pg-int8": {
|
||||||
|
"version": "1.0.1",
|
||||||
|
"resolved": "https://registry.npmjs.org/pg-int8/-/pg-int8-1.0.1.tgz",
|
||||||
|
"integrity": "sha512-WCtabS6t3c8SkpDBUlb1kjOs7l66xsGdKpIPZsg4wR+B3+u9UAum2odSsF9tnvxg80h4ZxLWMy4pRjOsFIqQpw==",
|
||||||
|
"license": "ISC",
|
||||||
|
"engines": {
|
||||||
|
"node": ">=4.0.0"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/pg-pool": {
|
||||||
|
"version": "3.10.1",
|
||||||
|
"resolved": "https://registry.npmjs.org/pg-pool/-/pg-pool-3.10.1.tgz",
|
||||||
|
"integrity": "sha512-Tu8jMlcX+9d8+QVzKIvM/uJtp07PKr82IUOYEphaWcoBhIYkoHpLXN3qO59nAI11ripznDsEzEv8nUxBVWajGg==",
|
||||||
|
"license": "MIT",
|
||||||
|
"peerDependencies": {
|
||||||
|
"pg": ">=8.0"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/pg-protocol": {
|
||||||
|
"version": "1.10.3",
|
||||||
|
"resolved": "https://registry.npmjs.org/pg-protocol/-/pg-protocol-1.10.3.tgz",
|
||||||
|
"integrity": "sha512-6DIBgBQaTKDJyxnXaLiLR8wBpQQcGWuAESkRBX/t6OwA8YsqP+iVSiond2EDy6Y/dsGk8rh/jtax3js5NeV7JQ==",
|
||||||
|
"license": "MIT"
|
||||||
|
},
|
||||||
|
"node_modules/pg-types": {
|
||||||
|
"version": "2.2.0",
|
||||||
|
"resolved": "https://registry.npmjs.org/pg-types/-/pg-types-2.2.0.tgz",
|
||||||
|
"integrity": "sha512-qTAAlrEsl8s4OiEQY69wDvcMIdQN6wdz5ojQiOy6YRMuynxenON0O5oCpJI6lshc6scgAY8qvJ2On/p+CXY0GA==",
|
||||||
|
"license": "MIT",
|
||||||
|
"dependencies": {
|
||||||
|
"pg-int8": "1.0.1",
|
||||||
|
"postgres-array": "~2.0.0",
|
||||||
|
"postgres-bytea": "~1.0.0",
|
||||||
|
"postgres-date": "~1.0.4",
|
||||||
|
"postgres-interval": "^1.1.0"
|
||||||
|
},
|
||||||
|
"engines": {
|
||||||
|
"node": ">=4"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/pgpass": {
|
||||||
|
"version": "1.0.5",
|
||||||
|
"resolved": "https://registry.npmjs.org/pgpass/-/pgpass-1.0.5.tgz",
|
||||||
|
"integrity": "sha512-FdW9r/jQZhSeohs1Z3sI1yxFQNFvMcnmfuj4WBMUTxOrAyLMaTcE1aAMBiTlbMNaXvBCQuVi0R7hd8udDSP7ug==",
|
||||||
|
"license": "MIT",
|
||||||
|
"dependencies": {
|
||||||
|
"split2": "^4.1.0"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/pgvector": {
|
||||||
|
"version": "0.2.1",
|
||||||
|
"resolved": "https://registry.npmjs.org/pgvector/-/pgvector-0.2.1.tgz",
|
||||||
|
"integrity": "sha512-nKaQY9wtuiidwLMdVIce1O3kL0d+FxrigCVzsShnoqzOSaWWWOvuctb/sYwlai5cTwwzRSNa+a/NtN2kVZGNJw==",
|
||||||
|
"license": "MIT",
|
||||||
|
"engines": {
|
||||||
|
"node": ">= 18"
|
||||||
|
}
|
||||||
|
},
|
||||||
"node_modules/picocolors": {
|
"node_modules/picocolors": {
|
||||||
"version": "1.1.1",
|
"version": "1.1.1",
|
||||||
"resolved": "https://registry.npmjs.org/picocolors/-/picocolors-1.1.1.tgz",
|
"resolved": "https://registry.npmjs.org/picocolors/-/picocolors-1.1.1.tgz",
|
||||||
@@ -9726,6 +9838,45 @@
|
|||||||
"integrity": "sha512-1NNCs6uurfkVbeXG4S8JFT9t19m45ICnif8zWLd5oPSZ50QnwMfK+H3jv408d4jw/7Bttv5axS5IiHoLaVNHeQ==",
|
"integrity": "sha512-1NNCs6uurfkVbeXG4S8JFT9t19m45ICnif8zWLd5oPSZ50QnwMfK+H3jv408d4jw/7Bttv5axS5IiHoLaVNHeQ==",
|
||||||
"license": "MIT"
|
"license": "MIT"
|
||||||
},
|
},
|
||||||
|
"node_modules/postgres-array": {
|
||||||
|
"version": "2.0.0",
|
||||||
|
"resolved": "https://registry.npmjs.org/postgres-array/-/postgres-array-2.0.0.tgz",
|
||||||
|
"integrity": "sha512-VpZrUqU5A69eQyW2c5CA1jtLecCsN2U/bD6VilrFDWq5+5UIEVO7nazS3TEcHf1zuPYO/sqGvUvW62g86RXZuA==",
|
||||||
|
"license": "MIT",
|
||||||
|
"engines": {
|
||||||
|
"node": ">=4"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/postgres-bytea": {
|
||||||
|
"version": "1.0.0",
|
||||||
|
"resolved": "https://registry.npmjs.org/postgres-bytea/-/postgres-bytea-1.0.0.tgz",
|
||||||
|
"integrity": "sha512-xy3pmLuQqRBZBXDULy7KbaitYqLcmxigw14Q5sj8QBVLqEwXfeybIKVWiqAXTlcvdvb0+xkOtDbfQMOf4lST1w==",
|
||||||
|
"license": "MIT",
|
||||||
|
"engines": {
|
||||||
|
"node": ">=0.10.0"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/postgres-date": {
|
||||||
|
"version": "1.0.7",
|
||||||
|
"resolved": "https://registry.npmjs.org/postgres-date/-/postgres-date-1.0.7.tgz",
|
||||||
|
"integrity": "sha512-suDmjLVQg78nMK2UZ454hAG+OAW+HQPZ6n++TNDUX+L0+uUlLywnoxJKDou51Zm+zTCjrCl0Nq6J9C5hP9vK/Q==",
|
||||||
|
"license": "MIT",
|
||||||
|
"engines": {
|
||||||
|
"node": ">=0.10.0"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"node_modules/postgres-interval": {
|
||||||
|
"version": "1.2.0",
|
||||||
|
"resolved": "https://registry.npmjs.org/postgres-interval/-/postgres-interval-1.2.0.tgz",
|
||||||
|
"integrity": "sha512-9ZhXKM/rw350N1ovuWHbGxnGh/SNJ4cnxHiM0rxE4VN41wsg8P8zWn9hv/buK00RP4WvlOyr/RBDiptyxVbkZQ==",
|
||||||
|
"license": "MIT",
|
||||||
|
"dependencies": {
|
||||||
|
"xtend": "^4.0.0"
|
||||||
|
},
|
||||||
|
"engines": {
|
||||||
|
"node": ">=0.10.0"
|
||||||
|
}
|
||||||
|
},
|
||||||
"node_modules/pretty-format": {
|
"node_modules/pretty-format": {
|
||||||
"version": "27.5.1",
|
"version": "27.5.1",
|
||||||
"resolved": "https://registry.npmjs.org/pretty-format/-/pretty-format-27.5.1.tgz",
|
"resolved": "https://registry.npmjs.org/pretty-format/-/pretty-format-27.5.1.tgz",
|
||||||
@@ -10480,6 +10631,15 @@
|
|||||||
"url": "https://github.com/sponsors/wooorm"
|
"url": "https://github.com/sponsors/wooorm"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
|
"node_modules/split2": {
|
||||||
|
"version": "4.2.0",
|
||||||
|
"resolved": "https://registry.npmjs.org/split2/-/split2-4.2.0.tgz",
|
||||||
|
"integrity": "sha512-UcjcJOWknrNkF6PLX83qcHM6KHgVKNkV62Y8a5uYDVv9ydGQVwAHMKqHdJje1VTWpljG0WYpCDhrCdAOYH4TWg==",
|
||||||
|
"license": "ISC",
|
||||||
|
"engines": {
|
||||||
|
"node": ">= 10.x"
|
||||||
|
}
|
||||||
|
},
|
||||||
"node_modules/sprintf-js": {
|
"node_modules/sprintf-js": {
|
||||||
"version": "1.0.3",
|
"version": "1.0.3",
|
||||||
"resolved": "https://registry.npmjs.org/sprintf-js/-/sprintf-js-1.0.3.tgz",
|
"resolved": "https://registry.npmjs.org/sprintf-js/-/sprintf-js-1.0.3.tgz",
|
||||||
@@ -11638,6 +11798,15 @@
|
|||||||
"node": ">=0.4.0"
|
"node": ">=0.4.0"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
|
"node_modules/xtend": {
|
||||||
|
"version": "4.0.2",
|
||||||
|
"resolved": "https://registry.npmjs.org/xtend/-/xtend-4.0.2.tgz",
|
||||||
|
"integrity": "sha512-LKYU1iAXJXUgAXn9URjiu+MWhyUXHsvfp7mcuYm9dSUKK0/CjtrUwFAxD82/mCWbtLsGjFIad0wIsod4zrTAEQ==",
|
||||||
|
"license": "MIT",
|
||||||
|
"engines": {
|
||||||
|
"node": ">=0.4"
|
||||||
|
}
|
||||||
|
},
|
||||||
"node_modules/y18n": {
|
"node_modules/y18n": {
|
||||||
"version": "5.0.8",
|
"version": "5.0.8",
|
||||||
"resolved": "https://registry.npmjs.org/y18n/-/y18n-5.0.8.tgz",
|
"resolved": "https://registry.npmjs.org/y18n/-/y18n-5.0.8.tgz",
|
||||||
|
|||||||
@@ -37,6 +37,7 @@
|
|||||||
"@tailwindcss/postcss": "^4.1.13",
|
"@tailwindcss/postcss": "^4.1.13",
|
||||||
"@types/node": "^24.5.2",
|
"@types/node": "^24.5.2",
|
||||||
"@types/pdf-parse": "^1.1.5",
|
"@types/pdf-parse": "^1.1.5",
|
||||||
|
"@types/pg": "^8.15.5",
|
||||||
"@types/react": "^19.1.13",
|
"@types/react": "^19.1.13",
|
||||||
"@types/react-dom": "^19.1.9",
|
"@types/react-dom": "^19.1.9",
|
||||||
"autoprefixer": "^10.4.21",
|
"autoprefixer": "^10.4.21",
|
||||||
@@ -48,6 +49,8 @@
|
|||||||
"next": "^15.5.3",
|
"next": "^15.5.3",
|
||||||
"openai": "^5.22.0",
|
"openai": "^5.22.0",
|
||||||
"pdf-parse": "^1.1.1",
|
"pdf-parse": "^1.1.1",
|
||||||
|
"pg": "^8.16.3",
|
||||||
|
"pgvector": "^0.2.1",
|
||||||
"postcss": "^8.5.6",
|
"postcss": "^8.5.6",
|
||||||
"prisma": "^6.16.2",
|
"prisma": "^6.16.2",
|
||||||
"react": "^19.1.1",
|
"react": "^19.1.1",
|
||||||
|
|||||||
@@ -78,6 +78,26 @@ model BibleVerse {
|
|||||||
@@index([version])
|
@@index([version])
|
||||||
}
|
}
|
||||||
|
|
||||||
|
model BiblePassage {
|
||||||
|
id String @id @default(uuid())
|
||||||
|
testament String // 'OT' or 'NT'
|
||||||
|
book String
|
||||||
|
chapter Int
|
||||||
|
verse Int
|
||||||
|
ref String // Generated field: "book chapter:verse"
|
||||||
|
lang String @default("ro")
|
||||||
|
translation String @default("FIDELA")
|
||||||
|
textRaw String @db.Text
|
||||||
|
textNorm String @db.Text // Normalized text for embedding
|
||||||
|
embedding Unsupported("vector(3072)")?
|
||||||
|
createdAt DateTime @default(now())
|
||||||
|
updatedAt DateTime @updatedAt
|
||||||
|
|
||||||
|
@@unique([translation, lang, book, chapter, verse])
|
||||||
|
@@index([book, chapter])
|
||||||
|
@@index([testament])
|
||||||
|
}
|
||||||
|
|
||||||
model ChatMessage {
|
model ChatMessage {
|
||||||
id String @id @default(uuid())
|
id String @id @default(uuid())
|
||||||
userId String
|
userId String
|
||||||
|
|||||||
121
scripts/bible_search.py
Normal file
@@ -0,0 +1,121 @@
|
|||||||
|
import os
import asyncio
from typing import List, Dict
from dotenv import load_dotenv
import httpx
import psycopg
from psycopg.rows import dict_row

load_dotenv()

AZ_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT", "").rstrip("/")
AZ_API_KEY = os.getenv("AZURE_OPENAI_KEY")
AZ_API_VER = os.getenv("AZURE_OPENAI_API_VERSION", "2024-05-01-preview")
AZ_DEPLOYMENT = os.getenv("AZURE_OPENAI_EMBED_DEPLOYMENT", "embed-3")
DB_URL = os.getenv("DATABASE_URL")

EMBED_URL = f"{AZ_ENDPOINT}/openai/deployments/{AZ_DEPLOYMENT}/embeddings?api-version={AZ_API_VER}"

async def get_embedding(text: str) -> List[float]:
    """Get embedding for a text using Azure OpenAI"""
    payload = {"input": [text]}
    headers = {"api-key": AZ_API_KEY, "Content-Type": "application/json"}

    async with httpx.AsyncClient() as client:
        for attempt in range(3):
            try:
                r = await client.post(EMBED_URL, headers=headers, json=payload, timeout=30)
            except httpx.HTTPError:
                # Network-level failure: retry with backoff, re-raise on the last attempt
                if attempt == 2:
                    raise
                await asyncio.sleep(2 ** attempt)
                continue

            if r.status_code == 200:
                data = r.json()
                return data["data"][0]["embedding"]
            if r.status_code not in (429, 500, 503):
                # Non-retryable status: fail immediately instead of retrying
                raise RuntimeError(f"Embedding error {r.status_code}: {r.text}")
            # Retryable status: back off before the next attempt
            await asyncio.sleep(2 ** attempt)

    # All attempts returned a retryable status; do not silently return None
    raise RuntimeError("Embedding request failed after 3 attempts")

def to_vector_literal(embedding: List[float]) -> str:
    """Format an embedding as pgvector's text form, e.g. '[0.1,0.2]'"""
    return "[" + ",".join(str(x) for x in embedding) + "]"

async def search_bible_semantic(query: str, limit: int = 10) -> List[Dict]:
    """Search Bible using semantic similarity"""
    # Get embedding for the query
    query_embedding = to_vector_literal(await get_embedding(query))

    # Search for similar verses (<=> is pgvector's cosine distance operator)
    with psycopg.connect(DB_URL, row_factory=dict_row) as conn:
        with conn.cursor() as cur:
            cur.execute("""
                SELECT ref, book, chapter, verse, text_raw,
                       1 - (embedding <=> %s::vector) AS similarity
                FROM bible_passages
                WHERE embedding IS NOT NULL
                ORDER BY embedding <=> %s::vector
                LIMIT %s
            """, (query_embedding, query_embedding, limit))

            return cur.fetchall()

async def search_bible_hybrid(query: str, limit: int = 10) -> List[Dict]:
    """Search Bible using hybrid semantic + lexical search"""
    # Get embedding for the query
    query_embedding = to_vector_literal(await get_embedding(query))

    # plainto_tsquery parses the raw query text, so no manual tsquery string is needed
    with psycopg.connect(DB_URL, row_factory=dict_row) as conn:
        with conn.cursor() as cur:
            cur.execute("""
                WITH vector_search AS (
                    SELECT id, 1 - (embedding <=> %s::vector) AS vector_sim
                    FROM bible_passages
                    WHERE embedding IS NOT NULL
                    ORDER BY embedding <=> %s::vector
                    LIMIT 100
                ),
                text_search AS (
                    SELECT id, ts_rank(tsv, plainto_tsquery('romanian', %s)) AS text_rank
                    FROM bible_passages
                    WHERE tsv @@ plainto_tsquery('romanian', %s)
                )
                SELECT bp.ref, bp.book, bp.chapter, bp.verse, bp.text_raw,
                       COALESCE(vs.vector_sim, 0) * 0.7 + COALESCE(ts.text_rank, 0) * 0.3 AS combined_score
                FROM bible_passages bp
                LEFT JOIN vector_search vs ON vs.id = bp.id
                LEFT JOIN text_search ts ON ts.id = bp.id
                WHERE vs.id IS NOT NULL OR ts.id IS NOT NULL
                ORDER BY combined_score DESC
                LIMIT %s
            """, (query_embedding, query_embedding, query, query, limit))

            return cur.fetchall()

async def get_context_verses(book: str, chapter: int, verse: int, context_size: int = 2) -> List[Dict]:
    """Get surrounding verses for context"""
    with psycopg.connect(DB_URL, row_factory=dict_row) as conn:
        with conn.cursor() as cur:
            cur.execute("""
                SELECT ref, book, chapter, verse, text_raw
                FROM bible_passages
                WHERE book = %s AND chapter = %s
                  AND verse BETWEEN %s AND %s
                ORDER BY verse
            """, (book, chapter, verse - context_size, verse + context_size))

            return cur.fetchall()

if __name__ == "__main__":
    async def test_search():
        results = await search_bible_semantic("dragoste", 5)
        print("Semantic search results for 'dragoste':")
        for result in results:
            print(f"{result['ref']}: {result['text_raw'][:100]}... (similarity: {result['similarity']:.3f})")

        print("\nHybrid search results for 'dragoste':")
        hybrid_results = await search_bible_hybrid("dragoste", 5)
        for result in hybrid_results:
            print(f"{result['ref']}: {result['text_raw'][:100]}... (score: {result['combined_score']:.3f})")

    asyncio.run(test_search())
|
||||||
305
scripts/import-romanian-bible-md.ts
Normal file
@@ -0,0 +1,305 @@
|
|||||||
|
import { PrismaClient } from '@prisma/client'
|
||||||
|
import * as fs from 'fs'
|
||||||
|
import * as path from 'path'
|
||||||
|
|
||||||
|
const prisma = new PrismaClient()
|
||||||
|
|
||||||
|
// Book name mappings from Romanian to standardized names
|
||||||
|
const BOOK_MAPPINGS: Record<string, { name: string; abbreviation: string; testament: string; orderNum: number }> = {
|
||||||
|
'Geneza': { name: 'Geneza', abbreviation: 'GEN', testament: 'OT', orderNum: 1 },
|
||||||
|
'Exodul': { name: 'Exodul', abbreviation: 'EXO', testament: 'OT', orderNum: 2 },
|
||||||
|
'Leviticul': { name: 'Leviticul', abbreviation: 'LEV', testament: 'OT', orderNum: 3 },
|
||||||
|
'Numeri': { name: 'Numerii', abbreviation: 'NUM', testament: 'OT', orderNum: 4 },
|
||||||
|
'Deuteronom': { name: 'Deuteronomul', abbreviation: 'DEU', testament: 'OT', orderNum: 5 },
|
||||||
|
'Iosua': { name: 'Iosua', abbreviation: 'JOS', testament: 'OT', orderNum: 6 },
|
||||||
|
'Judecători': { name: 'Judecătorii', abbreviation: 'JDG', testament: 'OT', orderNum: 7 },
|
||||||
|
'Rut': { name: 'Rut', abbreviation: 'RUT', testament: 'OT', orderNum: 8 },
|
||||||
|
'1 Samuel': { name: '1 Samuel', abbreviation: '1SA', testament: 'OT', orderNum: 9 },
|
||||||
|
'2 Samuel': { name: '2 Samuel', abbreviation: '2SA', testament: 'OT', orderNum: 10 },
|
||||||
|
'1 Imparati': { name: '1 Împărați', abbreviation: '1KI', testament: 'OT', orderNum: 11 },
|
||||||
|
'2 Imparati': { name: '2 Împărați', abbreviation: '2KI', testament: 'OT', orderNum: 12 },
|
||||||
|
'1 Cronici': { name: '1 Cronici', abbreviation: '1CH', testament: 'OT', orderNum: 13 },
|
||||||
|
'2 Cronici': { name: '2 Cronici', abbreviation: '2CH', testament: 'OT', orderNum: 14 },
|
||||||
|
'Ezra': { name: 'Ezra', abbreviation: 'EZR', testament: 'OT', orderNum: 15 },
|
||||||
|
'Neemia': { name: 'Neemia', abbreviation: 'NEH', testament: 'OT', orderNum: 16 },
|
||||||
|
'Estera': { name: 'Estera', abbreviation: 'EST', testament: 'OT', orderNum: 17 },
|
||||||
|
'Iov': { name: 'Iov', abbreviation: 'JOB', testament: 'OT', orderNum: 18 },
|
||||||
|
'Psalmii': { name: 'Psalmii', abbreviation: 'PSA', testament: 'OT', orderNum: 19 },
|
||||||
|
'Proverbe': { name: 'Proverbele', abbreviation: 'PRO', testament: 'OT', orderNum: 20 },
|
||||||
|
'Eclesiastul': { name: 'Eclesiastul', abbreviation: 'ECC', testament: 'OT', orderNum: 21 },
|
||||||
|
'Cântarea Cântărilor': { name: 'Cântarea Cântărilor', abbreviation: 'SNG', testament: 'OT', orderNum: 22 },
|
||||||
|
'Isaia': { name: 'Isaia', abbreviation: 'ISA', testament: 'OT', orderNum: 23 },
|
||||||
|
'Ieremia': { name: 'Ieremia', abbreviation: 'JER', testament: 'OT', orderNum: 24 },
|
||||||
|
'Plângerile': { name: 'Plângerile', abbreviation: 'LAM', testament: 'OT', orderNum: 25 },
|
||||||
|
'Ezechiel': { name: 'Ezechiel', abbreviation: 'EZK', testament: 'OT', orderNum: 26 },
|
||||||
|
'Daniel': { name: 'Daniel', abbreviation: 'DAN', testament: 'OT', orderNum: 27 },
|
||||||
|
'Osea': { name: 'Osea', abbreviation: 'HOS', testament: 'OT', orderNum: 28 },
|
||||||
|
'Ioel': { name: 'Ioel', abbreviation: 'JOL', testament: 'OT', orderNum: 29 },
|
||||||
|
'Amos': { name: 'Amos', abbreviation: 'AMO', testament: 'OT', orderNum: 30 },
|
||||||
|
'Obadia': { name: 'Obadia', abbreviation: 'OBA', testament: 'OT', orderNum: 31 },
|
||||||
|
'Iona': { name: 'Iona', abbreviation: 'JON', testament: 'OT', orderNum: 32 },
|
||||||
|
'Mica': { name: 'Mica', abbreviation: 'MIC', testament: 'OT', orderNum: 33 },
|
||||||
|
'Naum': { name: 'Naum', abbreviation: 'NAM', testament: 'OT', orderNum: 34 },
|
||||||
|
'Habacuc': { name: 'Habacuc', abbreviation: 'HAB', testament: 'OT', orderNum: 35 },
|
||||||
|
'Țefania': { name: 'Țefania', abbreviation: 'ZEP', testament: 'OT', orderNum: 36 },
|
||||||
|
'Hagai': { name: 'Hagai', abbreviation: 'HAG', testament: 'OT', orderNum: 37 },
|
||||||
|
'Zaharia': { name: 'Zaharia', abbreviation: 'ZEC', testament: 'OT', orderNum: 38 },
|
||||||
|
'Maleahi': { name: 'Maleahi', abbreviation: 'MAL', testament: 'OT', orderNum: 39 },
|
||||||
|
|
||||||
|
// New Testament
|
||||||
|
'Matei': { name: 'Matei', abbreviation: 'MAT', testament: 'NT', orderNum: 40 },
|
||||||
|
'Marcu': { name: 'Marcu', abbreviation: 'MRK', testament: 'NT', orderNum: 41 },
|
||||||
|
'Luca': { name: 'Luca', abbreviation: 'LUK', testament: 'NT', orderNum: 42 },
|
||||||
|
'Ioan': { name: 'Ioan', abbreviation: 'JHN', testament: 'NT', orderNum: 43 },
|
||||||
|
'Faptele Apostolilor': { name: 'Faptele Apostolilor', abbreviation: 'ACT', testament: 'NT', orderNum: 44 },
|
||||||
|
'Romani': { name: 'Romani', abbreviation: 'ROM', testament: 'NT', orderNum: 45 },
|
||||||
|
'1 Corinteni': { name: '1 Corinteni', abbreviation: '1CO', testament: 'NT', orderNum: 46 },
|
||||||
|
'2 Corinteni': { name: '2 Corinteni', abbreviation: '2CO', testament: 'NT', orderNum: 47 },
|
||||||
|
'Galateni': { name: 'Galateni', abbreviation: 'GAL', testament: 'NT', orderNum: 48 },
|
||||||
|
'Efeseni': { name: 'Efeseni', abbreviation: 'EPH', testament: 'NT', orderNum: 49 },
|
||||||
|
'Filipeni': { name: 'Filipeni', abbreviation: 'PHP', testament: 'NT', orderNum: 50 },
|
||||||
|
'Coloseni': { name: 'Coloseni', abbreviation: 'COL', testament: 'NT', orderNum: 51 },
|
||||||
|
'1 Tesaloniceni': { name: '1 Tesaloniceni', abbreviation: '1TH', testament: 'NT', orderNum: 52 },
|
||||||
|
'2 Tesaloniceni': { name: '2 Tesaloniceni', abbreviation: '2TH', testament: 'NT', orderNum: 53 },
|
||||||
|
'1 Timotei': { name: '1 Timotei', abbreviation: '1TI', testament: 'NT', orderNum: 54 },
|
||||||
|
'2 Timotei': { name: '2 Timotei', abbreviation: '2TI', testament: 'NT', orderNum: 55 },
|
||||||
|
'Titus': { name: 'Titus', abbreviation: 'TIT', testament: 'NT', orderNum: 56 },
|
||||||
|
'Filimon': { name: 'Filimon', abbreviation: 'PHM', testament: 'NT', orderNum: 57 },
|
||||||
|
'Evrei': { name: 'Evrei', abbreviation: 'HEB', testament: 'NT', orderNum: 58 },
|
||||||
|
'Iacov': { name: 'Iacov', abbreviation: 'JAS', testament: 'NT', orderNum: 59 },
|
||||||
|
'1 Petru': { name: '1 Petru', abbreviation: '1PE', testament: 'NT', orderNum: 60 },
|
||||||
|
'2 Petru': { name: '2 Petru', abbreviation: '2PE', testament: 'NT', orderNum: 61 },
|
||||||
|
'1 Ioan': { name: '1 Ioan', abbreviation: '1JN', testament: 'NT', orderNum: 62 },
|
||||||
|
'2 Ioan': { name: '2 Ioan', abbreviation: '2JN', testament: 'NT', orderNum: 63 },
|
||||||
|
'3 Ioan': { name: '3 Ioan', abbreviation: '3JN', testament: 'NT', orderNum: 64 },
|
||||||
|
'Iuda': { name: 'Iuda', abbreviation: 'JUD', testament: 'NT', orderNum: 65 },
|
||||||
|
'Revelaţia': { name: 'Revelația', abbreviation: 'REV', testament: 'NT', orderNum: 66 },
|
||||||
|
}
|
||||||
|
|
||||||
|
interface ParsedVerse {
|
||||||
|
verseNum: number
|
||||||
|
text: string
|
||||||
|
}
|
||||||
|
|
||||||
|
interface ParsedChapter {
|
||||||
|
chapterNum: number
|
||||||
|
verses: ParsedVerse[]
|
||||||
|
}
|
||||||
|
|
||||||
|
interface ParsedBook {
|
||||||
|
name: string
|
||||||
|
chapters: ParsedChapter[]
|
||||||
|
}
|
||||||
|
|
||||||
|
async function parseRomanianBible(filePath: string): Promise<ParsedBook[]> {
  console.log(`Reading Romanian Bible from: ${filePath}`)

  const content = fs.readFileSync(filePath, 'utf-8')
  const lines = content.split('\n')

  const books: ParsedBook[] = []
  let currentBook: ParsedBook | null = null
  let currentChapter: ParsedChapter | null = null
  let isInBibleContent = false

  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim()

    // Start processing after "VECHIUL TESTAMENT"
    if (line === 'VECHIUL TESTAMENT' || line === 'TESTAMENT') {
      isInBibleContent = true
      continue
    }

    if (!isInBibleContent) continue

    // Book detection: … BookName …
    const bookMatch = line.match(/^…\s*(.+?)\s*…$/)
    if (bookMatch) {
      // Flush the last chapter of the previous book first; otherwise that chapter
      // (and every single-chapter book) would be dropped
      if (currentBook && currentChapter && currentChapter.verses.length > 0) {
        currentBook.chapters.push(currentChapter)
      }
      // Save previous book if exists
      if (currentBook && currentBook.chapters.length > 0) {
        books.push(currentBook)
      }

      const bookName = bookMatch[1].trim()
      console.log(`Found book: ${bookName}`)

      currentBook = {
        name: bookName,
        chapters: []
      }
      currentChapter = null
      continue
    }

    // Chapter detection: Capitolul X or CApitoLuL X (case-insensitive)
    const chapterMatch = line.match(/^capitolul\s+(\d+)$/i)
    if (chapterMatch && currentBook) {
      // Save previous chapter if exists
      if (currentChapter && currentChapter.verses.length > 0) {
        currentBook.chapters.push(currentChapter)
      }

      const chapterNum = parseInt(chapterMatch[1], 10)
      console.log(` Chapter ${chapterNum}`)

      currentChapter = {
        chapterNum,
        verses: []
      }
      continue
    }

    // Verse detection: starts with number
    const verseMatch = line.match(/^(\d+)\s+(.+)$/)
    if (verseMatch && currentChapter) {
      const verseNum = parseInt(verseMatch[1], 10)
      let verseText = verseMatch[2].trim()

      // Handle paragraph markers
      verseText = verseText.replace(/^¶\s*/, '')

      // Look ahead for continuation lines (lines that don't start with numbers or special markers)
      let j = i + 1
      while (j < lines.length) {
        const nextLine = lines[j].trim()

        // Stop if we hit a new verse, chapter, book, or empty line
        if (!nextLine ||
            nextLine.match(/^\d+\s/) ||               // New verse
            nextLine.match(/^capitolul\s+\d+$/i) ||   // New chapter
            nextLine.match(/^….*…$/) ||               // New book
            nextLine === 'TESTAMENT') {               // Testament marker
          break
        }

        // Add continuation line
        verseText += ' ' + nextLine
        j++
      }

      // Clean up the text
      verseText = verseText.replace(/\s+/g, ' ').trim()

      currentChapter.verses.push({
        verseNum,
        text: verseText
      })

      // Skip the lines we've processed
      i = j - 1
      continue
    }
  }

  // Save the last book and chapter
  if (currentChapter && currentChapter.verses.length > 0 && currentBook) {
    currentBook.chapters.push(currentChapter)
  }
  if (currentBook && currentBook.chapters.length > 0) {
    books.push(currentBook)
  }

  console.log(`Parsed ${books.length} books`)
  return books
}

async function importRomanianBible() {
  try {
    console.log('Starting Romanian Bible import...')

    // Clear existing data
    console.log('Clearing existing data...')
    await prisma.bibleVerse.deleteMany()
    await prisma.bibleChapter.deleteMany()
    await prisma.bibleBook.deleteMany()

    // Parse the markdown file
    const filePath = path.join(process.cwd(), 'bibles', 'Biblia-Fidela-limba-romana.md')
    const books = await parseRomanianBible(filePath)

    console.log(`Importing ${books.length} books into database...`)

    for (const book of books) {
      const bookInfo = BOOK_MAPPINGS[book.name]
      if (!bookInfo) {
        console.warn(`Warning: No mapping found for book "${book.name}", skipping...`)
        continue
      }

      console.log(`Creating book: ${bookInfo.name}`)

      // Create book
      const createdBook = await prisma.bibleBook.create({
        data: {
          id: bookInfo.orderNum,
          name: bookInfo.name,
          testament: bookInfo.testament,
          orderNum: bookInfo.orderNum
        }
      })

      // Create chapters and verses
      for (const chapter of book.chapters) {
        console.log(` Creating chapter ${chapter.chapterNum} with ${chapter.verses.length} verses`)

        const createdChapter = await prisma.bibleChapter.create({
          data: {
            bookId: createdBook.id,
            chapterNum: chapter.chapterNum
          }
        })

        // Create verses in batch (deduplicate by verse number)
        const uniqueVerses = chapter.verses.reduce((acc, verse) => {
          acc[verse.verseNum] = verse // This will overwrite duplicates
          return acc
        }, {} as Record<number, ParsedVerse>)

        const versesData = Object.values(uniqueVerses).map(verse => ({
          chapterId: createdChapter.id,
          verseNum: verse.verseNum,
          text: verse.text,
          version: 'FIDELA'
        }))

        if (versesData.length > 0) {
          await prisma.bibleVerse.createMany({
            data: versesData
          })
        }
      }
    }

    // Print summary
    const bookCount = await prisma.bibleBook.count()
    const chapterCount = await prisma.bibleChapter.count()
    const verseCount = await prisma.bibleVerse.count()

    console.log('\n✅ Romanian Bible import completed successfully!')
    console.log(`📚 Books imported: ${bookCount}`)
    console.log(`📖 Chapters imported: ${chapterCount}`)
    console.log(`📝 Verses imported: ${verseCount}`)

  } catch (error) {
    console.error('❌ Error importing Romanian Bible:', error)
    throw error
  } finally {
    await prisma.$disconnect()
  }
}

// Run the import
if (require.main === module) {
  importRomanianBible()
    .then(() => {
      console.log('Import completed successfully!')
      process.exit(0)
    })
    .catch((error) => {
      console.error('Import failed:', error)
      process.exit(1)
    })
}

export { importRomanianBible }
231
scripts/ingest_bible_pgvector.py
Normal file
@@ -0,0 +1,231 @@
import os, re, json, math, time, asyncio
from typing import List, Dict, Tuple, Iterable
from dataclasses import dataclass
from pathlib import Path
from dotenv import load_dotenv
import httpx
import psycopg
from psycopg.rows import dict_row

load_dotenv()

AZ_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT", "").rstrip("/")
AZ_API_KEY = os.getenv("AZURE_OPENAI_KEY")
AZ_API_VER = os.getenv("AZURE_OPENAI_API_VERSION", "2024-05-01-preview")
AZ_DEPLOYMENT = os.getenv("AZURE_OPENAI_EMBED_DEPLOYMENT", "embed-3")
EMBED_DIMS = int(os.getenv("EMBED_DIMS", "3072"))
DB_URL = os.getenv("DATABASE_URL")
BIBLE_MD_PATH = os.getenv("BIBLE_MD_PATH")
LANG_CODE = os.getenv("LANG_CODE", "ro")
TRANSLATION = os.getenv("TRANSLATION_CODE", "FIDELA")

assert AZ_ENDPOINT and AZ_API_KEY and DB_URL and BIBLE_MD_PATH, "Missing required env vars"

EMBED_URL = f"{AZ_ENDPOINT}/openai/deployments/{AZ_DEPLOYMENT}/embeddings?api-version={AZ_API_VER}"

BOOKS_OT = [
    "Geneza","Exodul","Leviticul","Numeri","Deuteronom","Iosua","Judecători","Rut",
    "1 Samuel","2 Samuel","1 Imparati","2 Imparati","1 Cronici","2 Cronici","Ezra","Neemia","Estera",
    "Iov","Psalmii","Proverbe","Eclesiastul","Cântarea Cântărilor","Isaia","Ieremia","Plângerile",
    "Ezechiel","Daniel","Osea","Ioel","Amos","Obadia","Iona","Mica","Naum","Habacuc","Țefania","Hagai","Zaharia","Maleahi"
]
BOOKS_NT = [
    "Matei","Marcu","Luca","Ioan","Faptele Apostolilor","Romani","1 Corinteni","2 Corinteni",
    "Galateni","Efeseni","Filipeni","Coloseni","1 Tesaloniceni","2 Tesaloniceni","1 Timotei","2 Timotei",
    "Titus","Filimon","Evrei","Iacov","1 Petru","2 Petru","1 Ioan","2 Ioan","3 Ioan","Iuda","Revelaţia"
]

BOOK_CANON = {b:("OT" if b in BOOKS_OT else "NT") for b in BOOKS_OT + BOOKS_NT}

@dataclass
class Verse:
    testament: str
    book: str
    chapter: int
    verse: int
    text_raw: str
    text_norm: str

def normalize_text(s: str) -> str:
    # Collapse every run of whitespace (in Python 3, \s also matches non-breaking spaces)
    s = re.sub(r"\s+", " ", s.strip())
    # Redundant after the collapse above; kept as an explicit non-breaking-space guard
    s = s.replace("\u00a0", " ")
    return s

BOOK_RE = re.compile(r"^(?P<book>[A-ZĂÂÎȘȚ][^\n]+?)\s*$")
CH_RE = re.compile(r"^(?i:Capitolul|CApitoLuL)\s+(?P<ch>\d+)\b")
VERSE_RE = re.compile(r"^(?P<v>\d+)\s+(?P<body>.+)$")

def parse_bible_md(md_text: str):
    cur_book, cur_ch = None, None
    testament = None
    is_in_bible_content = False

    for line in md_text.splitlines():
        line = line.rstrip()

        # Start processing after "VECHIUL TESTAMENT" or when we find book markers
        if line == 'VECHIUL TESTAMENT' or line == 'TESTAMENT' or '…' in line:
            is_in_bible_content = True

        if not is_in_bible_content:
            continue

        # Book detection: … BookName …
        book_match = re.match(r'^…\s*(.+?)\s*…$', line)
        if book_match:
            bname = book_match.group(1).strip()
            if bname in BOOK_CANON:
                cur_book = bname
                testament = BOOK_CANON[bname]
                cur_ch = None
                print(f"Found book: {bname}")
            continue

        # Chapter detection: Capitolul X or CApitoLuL X
        m_ch = CH_RE.match(line)
        if m_ch and cur_book:
            cur_ch = int(m_ch.group("ch"))
            print(f" Chapter {cur_ch}")
            continue

        # Verse detection: starts with number
        m_v = VERSE_RE.match(line)
        if m_v and cur_book and cur_ch:
            vnum = int(m_v.group("v"))
            body = m_v.group("body").strip()

            # Remove paragraph markers
            body = re.sub(r'^¶\s*', '', body)

            raw = body
            norm = normalize_text(body)
            yield {
                "testament": testament, "book": cur_book, "chapter": cur_ch, "verse": vnum,
                "text_raw": raw, "text_norm": norm
            }

async def embed_batch(client, inputs):
    payload = {"input": inputs}
    headers = {"api-key": AZ_API_KEY, "Content-Type": "application/json"}
    for attempt in range(6):
        backoff = 2 ** attempt + (0.1 * attempt)
        try:
            r = await client.post(EMBED_URL, headers=headers, json=payload, timeout=60)
        except httpx.HTTPError as e:
            # Transport-level failure (timeout, connection reset): retry with backoff
            print(f"Error on attempt {attempt + 1}: {e}, waiting {backoff:.1f}s...")
            await asyncio.sleep(backoff)
            continue

        if r.status_code == 200:
            data = r.json()
            ordered = sorted(data["data"], key=lambda x: x["index"])
            return [d["embedding"] for d in ordered]
        if r.status_code in (429, 500, 503):
            print(f"Retryable status {r.status_code}, waiting {backoff:.1f}s...")
            await asyncio.sleep(backoff)
            continue
        # Any other status is not retryable: fail now instead of being caught and retried
        raise RuntimeError(f"Embedding error {r.status_code}: {r.text}")
    raise RuntimeError("Failed to embed after retries")

# First, we need to create the table with proper SQL
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS bible_passages (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    testament TEXT NOT NULL,
    book TEXT NOT NULL,
    chapter INT NOT NULL,
    verse INT NOT NULL,
    ref TEXT GENERATED ALWAYS AS (book || ' ' || chapter || ':' || verse) STORED,
    lang TEXT NOT NULL DEFAULT 'ro',
    translation TEXT NOT NULL DEFAULT 'FIDELA',
    text_raw TEXT NOT NULL,
    text_norm TEXT NOT NULL,
    tsv tsvector,
    -- Must equal the length of the vectors the deployment returns;
    -- note that EMBED_DIMS (default 3072) is not applied here
    embedding vector(1536),
    created_at TIMESTAMPTZ DEFAULT now(),
    updated_at TIMESTAMPTZ DEFAULT now()
);
"""

CREATE_INDEXES_SQL = """
-- Uniqueness by canonical reference within translation/language
CREATE UNIQUE INDEX IF NOT EXISTS ux_ref_lang ON bible_passages (translation, lang, book, chapter, verse);

-- Full-text index
CREATE INDEX IF NOT EXISTS idx_tsv ON bible_passages USING GIN (tsv);

-- Other indexes
CREATE INDEX IF NOT EXISTS idx_book_ch ON bible_passages (book, chapter);
CREATE INDEX IF NOT EXISTS idx_testament ON bible_passages (testament);
"""

UPSERT_SQL = """
INSERT INTO bible_passages (testament, book, chapter, verse, lang, translation, text_raw, text_norm, tsv, embedding)
VALUES (%(testament)s, %(book)s, %(chapter)s, %(verse)s, %(lang)s, %(translation)s, %(text_raw)s, %(text_norm)s,
        to_tsvector(COALESCE(%(ts_lang)s,'simple')::regconfig, %(text_norm)s), %(embedding)s)
ON CONFLICT (translation, lang, book, chapter, verse) DO UPDATE
SET text_raw=EXCLUDED.text_raw,
    text_norm=EXCLUDED.text_norm,
    tsv=EXCLUDED.tsv,
    embedding=EXCLUDED.embedding,
    updated_at=now();
"""

async def main():
    print("Starting Bible embedding ingestion...")

    md_text = Path(BIBLE_MD_PATH).read_text(encoding="utf-8", errors="ignore")
    verses = list(parse_bible_md(md_text))
    print(f"Parsed verses: {len(verses)}")

    batch_size = 128

    # First create the table structure
    with psycopg.connect(DB_URL) as conn:
        with conn.cursor() as cur:
            print("Creating bible_passages table...")
            cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
            cur.execute(CREATE_TABLE_SQL)
            cur.execute(CREATE_INDEXES_SQL)
            conn.commit()
            print("Table created successfully")

    # Now process embeddings
    async with httpx.AsyncClient() as client:
        with psycopg.connect(DB_URL, autocommit=False) as conn:
            with conn.cursor() as cur:
                for i in range(0, len(verses), batch_size):
                    batch = verses[i:i+batch_size]
                    inputs = [v["text_norm"] for v in batch]

                    print(f"Generating embeddings for batch {i//batch_size + 1}/{(len(verses) + batch_size - 1)//batch_size}")
                    embs = await embed_batch(client, inputs)

                    rows = []
                    for v, e in zip(batch, embs):
                        rows.append({
                            **v,
                            "lang": LANG_CODE,
                            "translation": TRANSLATION,
                            "ts_lang": "romanian",
                            "embedding": e
                        })

                    cur.executemany(UPSERT_SQL, rows)
                    conn.commit()
                    print(f"Upserted {len(rows)} verses... {i+len(rows)}/{len(verses)}")

    # Create IVFFLAT index after data is loaded
    print("Creating IVFFLAT index...")
    with psycopg.connect(DB_URL, autocommit=True) as conn:
        with conn.cursor() as cur:
            cur.execute("VACUUM ANALYZE bible_passages;")
            cur.execute("""
                CREATE INDEX IF NOT EXISTS idx_vec_ivfflat
                ON bible_passages USING ivfflat (embedding vector_cosine_ops)
                WITH (lists = 200);
            """)

    print("✅ Bible embedding ingestion completed successfully!")

if __name__ == "__main__":
    asyncio.run(main())
372
temp/azure-embed3-bible-pgvector-guide.md
Normal file
@@ -0,0 +1,372 @@
# Azure OpenAI **embed-3** → Postgres + pgvector Ingestion Guide (Bible Corpus)

**Goal**: Create a production‑ready Python script that ingests the full Bible (Markdown source) into **Postgres** with **pgvector** and **full‑text** metadata, using **Azure OpenAI `embed-3`** embeddings. The vectors will power a consumer chat assistant (Q&A & conversations about the Bible) and a backend agent that generates custom prayers.

> Sample corpus used here: Romanian *Biblia Fidela* (Markdown). The file contains books, chapters, and verses (e.g., *Geneza 1:1…*) plus a table of contents.

---

## 0) Architecture at a glance

- **Input**: Bible in Markdown (`*.md`) → parser → normalized records: *(book, chapter, verse, text, lang=ro)*
- **Embedding**: Azure OpenAI **embed-3** (prefer `text-embedding-3-large`, 3072‑D). Batch inputs to cut cost/latency.
- **Storage**: Postgres with:
  - `pgvector` column `embedding vector(3072)`
  - `tsvector` column for hybrid lexical search (Romanian or English config as needed)
  - metadata columns for fast filtering (book, chapter, verse, testament, translation, language)
- **Indexes**: `ivfflat` over `embedding`, GIN over `tsv` (and btree over metadata). pgvector indexes `vector` columns only up to 2,000 dimensions, so a 3072‑D column needs shortened embeddings or a `halfvec` index.
- **Retrieval**:
  - Dense vector kNN
  - Hybrid: combine kNN score + `ts_rank` over the `tsvector` column (Postgres has no built‑in BM25)
  - Windowed context stitching (neighbor verses) for chat
- **Consumers**:
  - Chat assistant: answer + cite (book:chapter:verse).
  - Prayer agent: prompt‑compose with retrieved passages & user intents.
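
The batching mentioned under **Embedding** can be sketched as follows. This is a minimal illustration rather than the ingestion script itself: `chunked` and `batch_count` are hypothetical helper names, and any batch size passed in (the script later in this guide uses 128) is the caller's choice.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def batch_count(total: int, size: int) -> int:
    """Number of embedding requests needed for `total` inputs (ceiling division)."""
    return (total + size - 1) // size
```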

---

## 1) Prerequisites

### Postgres + pgvector
```bash
# Install PostgreSQL (on Ubuntu). pgvector itself ships separately, e.g. as the
# postgresql-<major>-pgvector package from the PGDG apt repository, or built from source.
sudo apt-get update && sudo apt-get install -y postgresql postgresql-contrib
# Then, in psql as superuser, run:
#   CREATE EXTENSION IF NOT EXISTS vector;
```

### Python deps
```bash
python -m venv .venv && source .venv/bin/activate
pip install "psycopg[binary]" pgvector pydantic python-dotenv httpx tqdm rapidfuzz
```

> `httpx` for HTTP (async‑capable), `pgvector` adapter, `rapidfuzz` for optional de‑dup or heuristic joins, `tqdm` for progress.

### Azure OpenAI
- Create **Embeddings** deployment for **`text-embedding-3-large`** (or `-small` if cost sensitive). Name it (e.g.) `embeddings`.
- Collect:
  - `AZURE_OPENAI_ENDPOINT=https://<your>.openai.azure.com/`
  - `AZURE_OPENAI_API_KEY=...`
  - `AZURE_OPENAI_API_VERSION=2024-05-01-preview` *(or your current stable)*
  - `AZURE_OPENAI_EMBED_DEPLOYMENT=embeddings` *(your deployment name)*

Create `.env`:
```env
DATABASE_URL=postgresql://user:pass@localhost:5432/bible
AZURE_OPENAI_ENDPOINT=https://YOUR_RESOURCE.openai.azure.com/
AZURE_OPENAI_API_KEY=YOUR_KEY
AZURE_OPENAI_API_VERSION=2024-05-01-preview
AZURE_OPENAI_EMBED_DEPLOYMENT=embeddings
EMBED_DIMS=3072
BIBLE_MD_PATH=./Biblia-Fidela-limba-romana.md
LANG_CODE=ro
TRANSLATION_CODE=FIDELA
```

---

## 2) Database schema

```sql
-- One-time setup in your database
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS bible_passages (
  id BIGSERIAL PRIMARY KEY,
  testament TEXT NOT NULL, -- 'OT' or 'NT'
  book TEXT NOT NULL,
  chapter INT NOT NULL,
  verse INT NOT NULL,
  ref TEXT GENERATED ALWAYS AS (book || ' ' || chapter || ':' || verse) STORED,
  lang TEXT NOT NULL DEFAULT 'ro',
  translation TEXT NOT NULL DEFAULT 'FIDELA',
  text_raw TEXT NOT NULL, -- exact verse text
  text_norm TEXT NOT NULL, -- normalized/cleaned text (embedding input)
  tsv tsvector,
  embedding vector(3072), -- 1536 if using embed-3-small
  created_at TIMESTAMPTZ DEFAULT now(),
  updated_at TIMESTAMPTZ DEFAULT now()
);

-- Uniqueness by canonical reference within translation/language
CREATE UNIQUE INDEX IF NOT EXISTS ux_ref_lang ON bible_passages (translation, lang, book, chapter, verse);

-- Full-text index. PostgreSQL normally ships a built-in 'romanian' Snowball configuration
-- (check with \dF in psql); if it is missing, fall back to 'simple'.
CREATE INDEX IF NOT EXISTS idx_tsv ON bible_passages USING GIN (tsv);

-- Vector index: create it after the table is populated (see the next statement).
-- For small data an exact, unindexed scan is fine; use IVFFLAT for scale.
-- Run ANALYZE first; if the planner still picks a sequential scan while you test kNN plans,
-- SET enable_seqscan = off can help.
```

After loading, build the IVFFLAT index (the table must be populated first). pgvector can index `vector` columns only up to 2,000 dimensions, so this statement fails on a `vector(3072)` column; either request shorter embeddings (the `text-embedding-3` models accept a `dimensions` request parameter) and size the column to match, or index a `halfvec` cast instead:
```sql
-- Example: around 31k verses ⇒ lists ~ 100–200 is reasonable; tune per EXPLAIN ANALYZE
CREATE INDEX IF NOT EXISTS idx_vec_ivfflat
ON bible_passages USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 200);
```
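
As a starting point for `lists`, the pgvector documentation suggests roughly `rows / 1000` for tables up to one million rows and `sqrt(rows)` beyond that, with `probes` around `sqrt(lists)` at query time. For about 31k rows that gives a smaller value than the 100–200 mentioned above, so treat both as baselines to validate against recall and latency. The sketch below only encodes that rule of thumb; `suggest_ivfflat_params` is a hypothetical helper, not part of pgvector.

```python
import math


def suggest_ivfflat_params(row_count: int) -> dict:
    """Rule-of-thumb IVFFLAT settings: lists from table size, probes from lists."""
    if row_count <= 1_000_000:
        lists = max(1, row_count // 1000)
    else:
        lists = max(1, round(math.sqrt(row_count)))
    probes = max(1, round(math.sqrt(lists)))
    return {"lists": lists, "probes": probes}
```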

Trigger to keep `updated_at` fresh:
```sql
CREATE OR REPLACE FUNCTION touch_updated_at() RETURNS TRIGGER AS $$
BEGIN NEW.updated_at = now(); RETURN NEW; END; $$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS trg_bible_updated ON bible_passages;
CREATE TRIGGER trg_bible_updated BEFORE UPDATE ON bible_passages
FOR EACH ROW EXECUTE PROCEDURE touch_updated_at();
```

---

## 3) Parsing & Chunking strategy (large, high‑quality)

**Why verse‑level?** It’s the canonical granular unit for Bible QA.

**Context‑stitching**: during retrieval, fetch neighbor verses (±N) to maintain narrative continuity.

**Normalization** steps (for `text_norm`):
- Strip verse numbers and sidenotes if present in raw lines.
- Collapse whitespace, unify quotes, remove page headers/footers and TOC artifacts.
- Preserve punctuation; avoid stemming before embeddings.
- Lowercasing optional (OpenAI embeddings are case-robust).
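
A minimal sketch of the normalization steps listed above, assuming curly quotes and the `¶` paragraph marker are the only decorations that need cleaning. The function name `normalize_verse` and the exact character set are illustrative choices, not part of the ingestion script.

```python
import re
import unicodedata

# Map typographic quotes to plain ASCII quotes
_QUOTES = str.maketrans({
    "\u201c": '"', "\u201d": '"', "\u201e": '"',
    "\u2018": "'", "\u2019": "'",
})
_LEADING_VERSE_NUM = re.compile(r"^\d+\s+")


def normalize_verse(line: str) -> str:
    """Return embedding-ready text: no verse number, unified quotes, single spaces."""
    s = unicodedata.normalize("NFC", line)
    s = _LEADING_VERSE_NUM.sub("", s.strip())
    s = s.lstrip("¶").strip()
    s = s.translate(_QUOTES)
    # \s also matches non-breaking spaces, so they collapse to a plain space
    return re.sub(r"\s+", " ", s)
```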

**Testament/book detection**: Derive the testament and book from the headings and TOC in the Markdown, then detect Book → Chapter → Verse boundaries via regex.

Example regex heuristics (tune to your file):
- Book headers: `^(?P<book>[A-ZĂÂÎȘȚ].+?)\s*$` (bounded by known canon order)
- Chapter headers: `^Capitolul\s+(?P<ch>\d+)` or `^CApitoLuL\s+(?P<ch>\d+)` (case variations)
- Verse lines: `^(?P<v>\d+)\s+(?P<body>.+)$`
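
The heuristics above can be exercised on a few synthetic lines. The sample strings are placeholders, not quotations from the corpus, and `classify_line` is a hypothetical helper. As an assumption of this sketch, book names are looked up directly in a (here truncated) canon set instead of using the book-header pattern, because that pattern cannot match names that start with a digit.

```python
import re

CANON = {"Geneza", "1 Samuel", "Matei"}  # truncated for illustration
CH_RE = re.compile(r"^(?i:capitolul)\s+(?P<ch>\d+)")
VERSE_RE = re.compile(r"^(?P<v>\d+)\s+(?P<body>.+)$")


def classify_line(line: str):
    """Return ('book'|'chapter'|'verse'|'other', value) for one input line."""
    line = line.strip()
    # Canon lookup first, so "1 Samuel" is not mistaken for verse 1
    if line in CANON:
        return ("book", line)
    m = CH_RE.match(line)
    if m:
        return ("chapter", int(m.group("ch")))
    m = VERSE_RE.match(line)
    if m:
        return ("verse", (int(m.group("v")), m.group("body")))
    return ("other", line)
```

Checking the canon before the verse pattern matters for numbered books: a heading such as `1 Samuel` would otherwise be read as verse 1 with the body `Samuel`.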

> The provided Markdown clearly shows book order (e.g., *Geneza*, *Exodul*, …; NT: *Matei*, *Marcu*, …) and verse lines like “**1** LA început…”.

---

## 4) Python ingestion script

> **Save as** `ingest_bible_pgvector.py`

```python
import os, re, json, math, time, asyncio
from typing import List, Dict, Tuple, Iterable
from dataclasses import dataclass
from pathlib import Path
from dotenv import load_dotenv
import httpx
import psycopg
from psycopg.rows import dict_row

load_dotenv()

AZ_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT", "").rstrip("/")
AZ_API_KEY = os.getenv("AZURE_OPENAI_API_KEY")
AZ_API_VER = os.getenv("AZURE_OPENAI_API_VERSION", "2024-05-01-preview")
AZ_DEPLOYMENT = os.getenv("AZURE_OPENAI_EMBED_DEPLOYMENT", "embeddings")
EMBED_DIMS = int(os.getenv("EMBED_DIMS", "3072"))
DB_URL = os.getenv("DATABASE_URL")
BIBLE_MD_PATH = os.getenv("BIBLE_MD_PATH")
LANG_CODE = os.getenv("LANG_CODE", "ro")
TRANSLATION = os.getenv("TRANSLATION_CODE", "FIDELA")

assert AZ_ENDPOINT and AZ_API_KEY and DB_URL and BIBLE_MD_PATH, "Missing required env vars"

EMBED_URL = f"{AZ_ENDPOINT}/openai/deployments/{AZ_DEPLOYMENT}/embeddings?api-version={AZ_API_VER}"

BOOKS_OT = [
    "Geneza","Exodul","Leviticul","Numeri","Deuteronom","Iosua","Judecători","Rut",
    "1 Samuel","2 Samuel","1 Imparati","2 Imparati","1 Cronici","2 Cronici","Ezra","Neemia","Estera",
    "Iov","Psalmii","Proverbe","Eclesiastul","Cântarea Cântărilor","Isaia","Ieremia","Plângerile",
    "Ezechiel","Daniel","Osea","Ioel","Amos","Obadia","Iona","Mica","Naum","Habacuc","Țefania","Hagai","Zaharia","Maleahi"
]
BOOKS_NT = [
    "Matei","Marcu","Luca","Ioan","Faptele Apostolilor","Romani","1 Corinteni","2 Corinteni",
    "Galateni","Efeseni","Filipeni","Coloseni","1 Tesaloniceni","2 Tesaloniceni","1 Timotei","2 Timotei",
    "Titus","Filimon","Evrei","Iacov","1 Petru","2 Petru","1 Ioan","2 Ioan","3 Ioan","Iuda","Revelaţia"
]

BOOK_CANON = {b:("OT" if b in BOOKS_OT else "NT") for b in BOOKS_OT + BOOKS_NT}

@dataclass
class Verse:
    testament: str
    book: str
    chapter: int
    verse: int
    text_raw: str
    text_norm: str

def normalize_text(s: str) -> str:
    # Collapse every run of whitespace (in Python 3, \s also matches non-breaking spaces)
    s = re.sub(r"\s+", " ", s.strip())
    # Redundant after the collapse above; kept as an explicit non-breaking-space guard
    s = s.replace("\u00a0", " ")
    return s

BOOK_RE = re.compile(r"^(?P<book>[A-ZĂÂÎȘȚ][^\n]+?)\s*$")
CH_RE = re.compile(r"^(?i:Capitolul|CApitoLuL)\s+(?P<ch>\d+)\b")
VERSE_RE = re.compile(r"^(?P<v>\d+)\s+(?P<body>.+)$")

def parse_bible_md(md_text: str):
    cur_book, cur_ch = None, None
    testament = None
    for line in md_text.splitlines():
        line = line.rstrip()

        # Book detection: only a line that exactly names a canonical book counts;
        # other capitalized lines (e.g. chapter headers) fall through to the checks below
        m_book = BOOK_RE.match(line)
        if m_book:
            bname = m_book.group("book").strip()
            if bname in BOOK_CANON:
                cur_book = bname
                testament = BOOK_CANON[bname]
                cur_ch = None
                continue

        m_ch = CH_RE.match(line)
        if m_ch and cur_book:
            cur_ch = int(m_ch.group("ch"))
            continue

        m_v = VERSE_RE.match(line)
        if m_v and cur_book and cur_ch:
            vnum = int(m_v.group("v"))
            body = m_v.group("body").strip()
            raw = body
            norm = normalize_text(body)
            yield {
                "testament": testament, "book": cur_book, "chapter": cur_ch, "verse": vnum,
                "text_raw": raw, "text_norm": norm
            }

async def embed_batch(client, inputs):
    payload = {"input": inputs}
    headers = {"api-key": AZ_API_KEY, "Content-Type": "application/json"}
    for attempt in range(6):
        backoff = 2 ** attempt + (0.1 * attempt)
        try:
            r = await client.post(EMBED_URL, headers=headers, json=payload, timeout=60)
        except httpx.HTTPError:
            # Transport-level failure (timeout, connection reset): retry with backoff
            await asyncio.sleep(backoff)
            continue

        if r.status_code == 200:
            data = r.json()
            ordered = sorted(data["data"], key=lambda x: x["index"])
            return [d["embedding"] for d in ordered]
        if r.status_code in (429, 500, 503):
            await asyncio.sleep(backoff)
            continue
        # Any other status is not retryable: fail now instead of being caught and retried
        raise RuntimeError(f"Embedding error {r.status_code}: {r.text}")
    raise RuntimeError("Failed to embed after retries")

UPSERT_SQL = """
INSERT INTO bible_passages (testament, book, chapter, verse, lang, translation, text_raw, text_norm, tsv, embedding)
VALUES (%(testament)s, %(book)s, %(chapter)s, %(verse)s, %(lang)s, %(translation)s, %(text_raw)s, %(text_norm)s,
        to_tsvector(COALESCE(%(ts_lang)s,'simple')::regconfig, %(text_norm)s), %(embedding)s)
ON CONFLICT (translation, lang, book, chapter, verse) DO UPDATE
SET text_raw=EXCLUDED.text_raw,
    text_norm=EXCLUDED.text_norm,
    tsv=EXCLUDED.tsv,
    embedding=EXCLUDED.embedding,
    updated_at=now();
"""

async def main():
    md_text = Path(BIBLE_MD_PATH).read_text(encoding="utf-8", errors="ignore")
    verses = list(parse_bible_md(md_text))
    print(f"Parsed verses: {len(verses)}")

    batch_size = 128
    # psycopg.connect() returns a synchronous connection, so it needs a plain `with`;
    # only the httpx client is an async context manager
    async with httpx.AsyncClient() as client:
        with psycopg.connect(DB_URL, autocommit=False) as conn:
            with conn.cursor() as cur:
                for i in range(0, len(verses), batch_size):
                    batch = verses[i:i+batch_size]
                    inputs = [v["text_norm"] for v in batch]
                    embs = await embed_batch(client, inputs)
                    rows = []
                    for v, e in zip(batch, embs):
                        rows.append({
                            **v,
                            "lang": LANG_CODE,
                            "translation": TRANSLATION,
                            "ts_lang": "romanian",
                            "embedding": e
                        })
                    cur.executemany(UPSERT_SQL, rows)
                    conn.commit()
                    print(f"Upserted {len(rows)} … {i+len(rows)}/{len(verses)}")
    print("Done. Build IVFFLAT index after ANALYZE.")

if __name__ == "__main__":
    asyncio.run(main())
```

**Notes**
- If `romanian` text search config is unavailable, set `ts_lang='simple'`.
- For `embed-3-small`, set `EMBED_DIMS=1536` and change column type to `vector(1536)`.

---

## 5) Post‑ingestion steps

```sql
VACUUM ANALYZE bible_passages;
CREATE INDEX IF NOT EXISTS idx_vec_ivfflat
ON bible_passages USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 200);
CREATE INDEX IF NOT EXISTS idx_book_ch ON bible_passages (book, chapter);
```

---

## 6) Retrieval patterns

### A) Pure vector kNN (cosine)
```sql
SELECT ref, book, chapter, verse, text_raw,
       1 - (embedding <=> $1) AS cosine_sim
FROM bible_passages
ORDER BY embedding <=> $1
LIMIT $2;
```
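
To run a query like this from Python, the query embedding has to reach Postgres in a form pgvector accepts. One option, sketched here under the assumption that no pgvector adapter is registered on the connection, is to send pgvector's text form (`[x1,x2,...]`) and cast it with `::vector`. `to_pgvector_literal` and `knn_query` are hypothetical helpers, and the placeholders follow psycopg's `%s` style rather than `$1`.

```python
from typing import List, Sequence, Tuple

KNN_SQL = """
SELECT ref, text_raw, 1 - (embedding <=> %s::vector) AS cosine_sim
FROM bible_passages
ORDER BY embedding <=> %s::vector
LIMIT %s
"""


def to_pgvector_literal(vec: Sequence[float]) -> str:
    """Format a vector in pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def knn_query(query_embedding: Sequence[float], k: int) -> Tuple[str, List[object]]:
    """Return SQL plus parameters suitable for cursor.execute(sql, params)."""
    literal = to_pgvector_literal(query_embedding)
    return KNN_SQL, [literal, literal, k]
```

The returned pair is meant to be passed straight to `cur.execute(sql, params)`; the literal appears twice because the distance expression is used in both the select list and the `ORDER BY`.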

### B) Hybrid lexical + vector (weighted)
```sql
WITH v AS (
  SELECT id, 1 - (embedding <=> $1) AS vsim
  FROM bible_passages
  ORDER BY embedding <=> $1
  LIMIT 100
),
l AS (
  SELECT id, ts_rank(tsv, $2) AS lrank
  FROM bible_passages
  WHERE tsv @@ $2
)
SELECT bp.ref, bp.book, bp.chapter, bp.verse, bp.text_raw,
       COALESCE(v.vsim, 0) * 0.7 + COALESCE(l.lrank, 0) * 0.3 AS score
FROM bible_passages bp
LEFT JOIN v ON v.id = bp.id
LEFT JOIN l ON l.id = bp.id
WHERE v.id IS NOT NULL OR l.id IS NOT NULL -- keep only rows that matched at least one branch
ORDER BY score DESC
LIMIT 20;
```
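
The same 0.7 / 0.3 weighting can be applied in application code when the two result sets are fetched separately. This sketch assumes each input maps a passage id to its score. Because `ts_rank` is not bounded to [0, 1], it rescales lexical ranks by their maximum before mixing, which is a design choice of this example and not something the SQL above does. `fuse_scores` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def fuse_scores(
    vec_sims: Dict[int, float],
    lex_ranks: Dict[int, float],
    w_vec: float = 0.7,
    w_lex: float = 0.3,
    limit: int = 20,
) -> List[Tuple[int, float]]:
    """Combine cosine similarities and ts_rank values into one ranked list."""
    max_rank = max(lex_ranks.values(), default=0.0)
    fused = {}
    for pid in set(vec_sims) | set(lex_ranks):
        lex = lex_ranks.get(pid, 0.0) / max_rank if max_rank > 0 else 0.0
        fused[pid] = w_vec * vec_sims.get(pid, 0.0) + w_lex * lex
    # Highest score first; ties broken by id for deterministic output
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]
```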

---

## 7) Chat & Prayer agent tips

- **Answer grounding**: always cite `ref` (e.g., *Ioan 3:16*).
- **Multilingual output**: keep quotes in Romanian; explain in the user’s language.
- **Prayer agent**: constrain tone & doctrine; inject retrieved verses as anchors.
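
One way to keep answers grounded is to hand the model a context block in which every retrieved verse carries its reference. The sketch below assumes rows shaped like the `bible_passages` columns and adds a ±N neighbor window for context stitching. `neighbor_range` and `build_context` are hypothetical helpers, and the bracketed line format is only one possible convention.

```python
from typing import Dict, Iterable, Tuple


def neighbor_range(verse: int, window: int, last_verse: int) -> Tuple[int, int]:
    """Inclusive verse range of +/- window around `verse`, clamped to the chapter."""
    return max(1, verse - window), min(last_verse, verse + window)


def build_context(rows: Iterable[Dict[str, object]]) -> str:
    """Render passages as '[Book C:V] text' lines, ordered by book, chapter, verse."""
    ordered = sorted(rows, key=lambda r: (r["book"], r["chapter"], r["verse"]))
    return "\n".join(
        f"[{r['book']} {r['chapter']}:{r['verse']}] {r['text_raw']}" for r in ordered
    )
```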

---

## 8) Ops

- Idempotent `UPSERT`.
- Backoff on 429/5xx.
- Consider keeping both `embed-3-large` and `-small` columns when migrating.
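
The retry delay used by `embed_batch` in this guide is `2 ** attempt + 0.1 * attempt` seconds over six attempts. The snippet below simply tabulates that schedule so the worst-case wait before giving up is visible; `backoff_schedule` is an illustrative name.

```python
from typing import List


def backoff_schedule(attempts: int = 6) -> List[float]:
    """Delay in seconds before each retry: 2**attempt plus a small linear term."""
    return [2 ** attempt + 0.1 * attempt for attempt in range(attempts)]


schedule = backoff_schedule()
total_wait = sum(schedule)  # worst case: every attempt fails
```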

---

## 9) License & attribution

This guide references the structure of *Biblia Fidela* Markdown for ingestion demonstration.