Summary of "I Built An AI Receptionist For A Dental Clinic (VAPI + Custom LLM)"
Concise summary — technology, features, how-to, and code overview
This project is an AI telephone receptionist for a dental clinic capable of booking, rescheduling, cancelling, confirming appointments; transferring/escalating to a human; taking messages; and answering clinic questions using a supplied knowledge base. The creator claims it can handle ~80–90% of calls 24/7. A demo call shows the agent (“Sarah”, TTS) collecting name, clarifying surname, capturing phone number, checking availability, booking a 4:00 PM appointment, and sending SMS/email confirmation. All call audio, call length, and LLM interactions are logged.
What this project does
- Handles common telephony tasks: book, reschedule, cancel, confirm appointments; transfer/escalate; take messages; answer knowledge-base questions.
- Works over a phone channel via a telephony provider, integrated through the VAPI voice-agent platform (the name is transcribed variously as "Vapy"/"Vapp" in the demo).
- Logs every LLM call, intent classification, and telemetry for debugging and improvement.
The demo claims ~80–90% of calls can be handled without human intervention.
Core architecture (high-level flow)
- Speech-to-text (STT) transcribes the caller → LLM pipeline → text-to-speech (TTS) synthesizes responses.
- Per user turn, the system runs two steps in parallel:
  - Intent classification (book / cancel / reschedule / confirm / escalate / etc.)
  - RAG (retrieval-augmented generation) retrieval from the knowledge base
- Prompt assembly combines:
  - Base prompt
  - State-specific prompt (e.g., book_appointment)
  - RAG content (scenarios / contextual facts)
  The assembled prompt is sent to the main LLM.
- Telephony integration: responses are synthesized and played back through the telephony provider.
- Observability: Langfuse collects and stores every input/output, LLM calls, intents, latencies, and other telemetry.
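The per-turn flow above can be sketched with `asyncio`, running intent classification and RAG retrieval concurrently so the slower of the two bounds the added latency. This is an illustrative sketch: the function names and placeholder logic are assumptions, not code from the repository.

```python
import asyncio

async def classify_intent(user_text: str) -> str:
    # Placeholder for a fast LLM call that maps the utterance to an intent.
    intents = {"book": "book_appointment", "cancel": "cancel_appointment"}
    for keyword, intent in intents.items():
        if keyword in user_text.lower():
            return intent
    return "general_question"

async def search_rag(user_text: str) -> list[str]:
    # Placeholder for an embedding search against the vector DB.
    return ["Opening hours: Mon-Fri 9:00-17:00"]

async def handle_turn(user_text: str) -> tuple[str, list[str]]:
    # Both steps run concurrently per user turn.
    intent, context = await asyncio.gather(
        classify_intent(user_text),
        search_rag(user_text),
    )
    return intent, context

intent, context = asyncio.run(handle_turn("I'd like to book a checkup"))
```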
Models and services referenced
- LLM in the demo: Gemini 2.5 Flash (preview). The code is OpenAI-compatible, so OpenAI or other providers can be swapped in.
- Vector DB (RAG): stores embeddings/collections for retrieval (the provider's name is garbled in the transcript as "CQRN"/"Centrun"; possibly Qdrant).
- Other providers/tools mentioned: OpenAI, Baseten (inference), Grok, ngrok (local testing), Render (production hosting).
- Langfuse: prompt and request observability.
Code structure and key files
- main.py: orchestration / application entry point.
- chat.py (router): chat completions endpoint (where VAPI calls land).
- knowledgebased.py: endpoints for knowledge-base operations (/search, /add, /delete).
- tools.py: definitions of agent tools (book appointment, confirm, check availability, transfer call, escalate/take message). The demo uses mock data; production must hook into real CRM/booking systems.
- services/: helper functions used by routers (e.g., embed_text, search_rag).
- prompts.py: intent definitions, tool metadata, prompt rendering functions (base + state + scenarios).
- config.py: environment variables, embedding selection, API keys, and toggles (e.g., whether tools run locally).
- schemas.py: data schemas for requests/responses.
- Helper script: create_vapy_tools.py — auto-registers tool definitions with VAPI based on prompts.py.
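Since VAPI consumes an OpenAI-compatible chat completions endpoint, the response chat.py returns would follow the standard `chat.completion` schema. A minimal sketch of that response shape, built with the standard library only (the function name and default model string are illustrative assumptions):

```python
import time
import uuid

def chat_completion_response(reply_text: str, model: str = "gemini-2.5-flash") -> dict:
    # Mimics the OpenAI /v1/chat/completions response schema that VAPI
    # consumes; any OpenAI-compatible backend can produce this shape.
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": reply_text},
                "finish_reason": "stop",
            }
        ],
    }

resp = chat_completion_response("Our next opening is at 4:00 PM.")
```

In the real router this dict would be returned from the chat completions endpoint that VAPI is pointed at.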
Knowledge base (RAG) format and management
Knowledge base entries are JSON objects with:
- user says: example utterances that should trigger the entry
- context: factual content to inject into prompts (e.g., opening hours)
- response guidelines: instructions on how the assistant should respond
- assistant says: example assistant utterances (helps RAG retrieval match dialogue patterns)
Endpoints:
- Add entries: POST /add
- Search: POST /search
- Delete: POST /delete
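A hypothetical entry in that format, serialized for POST /add. The key names follow the field list above; the exact JSON keys in the repository may differ.

```python
import json

# Illustrative knowledge-base entry; values are made up for the example.
entry = {
    "user_says": ["What time do you open?", "Are you open on Saturdays?"],
    "context": "The clinic is open Monday to Friday, 9:00 AM to 5:00 PM.",
    "response_guidelines": "State the hours plainly and offer to book an appointment.",
    "assistant_says": ["We're open weekdays from 9 to 5. Would you like to book a visit?"],
}

payload = json.dumps(entry)
# POST this payload to the /add endpoint, e.g.:
# requests.post("https://your-host/add", data=payload,
#               headers={"Content-Type": "application/json"})
```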
Only recent user messages are sent to the RAG query to limit token costs and focus retrieval on the current context.
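That windowing could look like the following sketch: only the last few user turns are concatenated into the RAG query (the function name and turn limit are assumptions for illustration).

```python
def rag_query_from_history(messages: list[dict], max_user_turns: int = 2) -> str:
    # Keep only the most recent user turns to bound token cost and
    # focus retrieval on what the caller is asking about right now.
    user_turns = [m["content"] for m in messages if m["role"] == "user"]
    return " ".join(user_turns[-max_user_turns:])

history = [
    {"role": "user", "content": "Hi, I'm calling about an appointment."},
    {"role": "assistant", "content": "Sure, how can I help?"},
    {"role": "user", "content": "Can I come in tomorrow?"},
    {"role": "user", "content": "Around 4 PM if possible."},
]
query = rag_query_from_history(history)
```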
Operational notes, latency, and reliability
- Intent classification runs on each turn. It adds roughly 200 ms of latency but improves reliability, especially when a user changes intent mid-call.
- The prompt is split into base + state + RAG to avoid very large monolithic prompts and keep behavior focused.
- Langfuse captures per-call data (example: a 2m29s call with associated LLM completions and intent classifications) to enable debugging and iterative improvements.
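The base + state + RAG prompt split described above can be sketched as a small render function; the section contents and function name here are illustrative, not taken from prompts.py.

```python
def assemble_prompt(base: str, state: str, rag_chunks: list[str]) -> str:
    # Base instructions + state-specific instructions + retrieved context,
    # instead of one monolithic prompt covering every scenario.
    sections = [base, state]
    if rag_chunks:
        sections.append("Relevant facts:\n" + "\n".join(f"- {c}" for c in rag_chunks))
    return "\n\n".join(sections)

prompt = assemble_prompt(
    "You are Sarah, the clinic's phone receptionist.",
    "Current task: book_appointment. Collect name, phone number, and preferred time.",
    ["Opening hours: Mon-Fri 9:00-17:00"],
)
```

Keeping the state-specific instructions separate means each call turn only carries the instructions relevant to its current task.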
Deployment / running guide (condensed)
- Clone the repository (the creator will publish the code for free).
- Provide environment variables / API keys:
- LLM provider key(s) (OpenAI, Gemini, Base Ten, Grok, etc.)
- Vector DB URL and API key; create a collection and set its name in the env vars
- Langfuse API key
- VAPI / telephony API key
- Port (recommended 8000)
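A sketch of how config.py might read those environment variables; all variable names here are assumptions chosen to mirror the list above, not the repository's actual names.

```python
import os
from dataclasses import dataclass, field

@dataclass
class Settings:
    # Env var names are illustrative; config.py defines the real ones.
    llm_api_key: str = field(default_factory=lambda: os.getenv("LLM_API_KEY", ""))
    vector_db_url: str = field(default_factory=lambda: os.getenv("VECTOR_DB_URL", ""))
    vector_db_collection: str = field(
        default_factory=lambda: os.getenv("VECTOR_DB_COLLECTION", "clinic_kb")
    )
    langfuse_api_key: str = field(default_factory=lambda: os.getenv("LANGFUSE_API_KEY", ""))
    vapi_api_key: str = field(default_factory=lambda: os.getenv("VAPI_API_KEY", ""))
    port: int = field(default_factory=lambda: int(os.getenv("PORT", "8000")))

settings = Settings()
```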
- Local testing: run via ngrok to expose a public URL and register that URL with VAPI.
- Register VAPI tools: run create_vapy_tools.py to auto-create the tools defined in prompts.py in VAPI.
- Production: push to GitHub and deploy with a host like Render to get a permanent URL; update VAPI to point to that URL; integrate with your telephony/CRM/appointment system.
- Populate the knowledge base via POST /add (JSON format described above) and connect the agent tools to your clinic's real systems instead of mock data.
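A mock tool pair in the spirit of the demo's tools.py: the in-memory "booking system" below stands in for the real CRM/booking API, and its data and function signatures are illustrative assumptions.

```python
# Mock in-memory "booking system"; replace both functions' bodies with
# real CRM/booking API calls in production.
AVAILABLE_SLOTS = {"2025-01-15": ["10:00", "14:00", "16:00"]}
BOOKINGS: list[dict] = []

def check_availability(date: str) -> list[str]:
    return AVAILABLE_SLOTS.get(date, [])

def book_appointment(name: str, phone: str, date: str, time: str) -> dict:
    if time not in AVAILABLE_SLOTS.get(date, []):
        return {"ok": False, "error": "slot unavailable"}
    AVAILABLE_SLOTS[date].remove(time)
    booking = {"name": name, "phone": phone, "date": date, "time": time}
    BOOKINGS.append(booking)
    return {"ok": True, "booking": booking}

result = book_appointment("Jane Doe", "+15551234567", "2025-01-15", "16:00")
```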
Customisation, caveats, and services
- Tools must be implemented to integrate with your clinic’s booking/CRM APIs (demo tools use mock data).
- Prompt engineering is a primary challenge; the project is structured to make prompt design easier and more reliable.
- Langfuse enables continuous observation and iterative improvement by storing prompts, completions, intents, and latencies.
- The creator offers a paid customization/managed service (prompt tuning, tools integration, monitoring) — contact via the website in the video description.
What’s provided / promised
- Full code and assets will be published free (link available in the video description).
- The creator plans to support users through a community to help troubleshoot issues when trying the code.
Main speaker / sources
- Hugo — creator/presenter (has ~2 years building voice AI systems).
- Platforms/tools referenced: VAPI, Langfuse, Gemini 2.5 Flash, a vector database, ngrok, Render, OpenAI, Baseten, Grok.