RAG Explained
RAG (Retrieval-Augmented Generation) is the technique that makes our food recommendation app both conversational and accurate.
The Problem RAG Solves
Imagine asking a regular LLM: "Recommend a high-protein South Indian breakfast."
The LLM might:
- Recommend dishes that don't exist
- Give incorrect nutrition information
- Suggest something with allergens you're avoiding
- Not know about regional specialties
RAG solves this by giving the LLM access to your own curated data.
How RAG Works
```
┌──────────────────────────────────────────────────────────────────┐
│                           RAG Pipeline                            │
└──────────────────────────────────────────────────────────────────┘
Step 1: Query
┌──────────────────────────────────────────────────────────────────┐
│ User: "I want something spicy and healthy for breakfast"          │
└──────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
Step 2: Retrieve
┌──────────────────────────────────────────────────────────────────┐
│ Search vector database for similar dishes                         │
│ Apply filters: spice_level=high, is_healthy=true, meal=breakfast  │
│                                                                    │
│ Results:                                                           │
│ 1. Pesarattu (score: 0.92) - Spicy green moong dal crepe          │
│ 2. Upma with Chili (score: 0.87) - Spiced semolina                │
│ 3. Masala Dosa (score: 0.85) - Crispy with spicy potato           │
└──────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
Step 3: Augment
┌──────────────────────────────────────────────────────────────────┐
│ Build prompt with retrieved context:                              │
│                                                                    │
│ System: You are an Indian food expert.                            │
│                                                                    │
│ Available dishes from our database:                               │
│ 1. Pesarattu - A protein-rich crepe from Andhra Pradesh...        │
│ 2. Upma with Chili - A savory semolina dish...                    │
│ 3. Masala Dosa - A crispy fermented crepe...                      │
│                                                                    │
│ User wants: something spicy and healthy for breakfast             │
└──────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
Step 4: Generate
┌──────────────────────────────────────────────────────────────────┐
│ LLM generates response using the context:                         │
│                                                                    │
│ "Based on your preferences, I recommend Pesarattu!                │
│ It's a protein-packed crepe from Andhra Pradesh made with         │
│ green moong dal. The green chilies give it a nice kick while      │
│ keeping it healthy. Serve it with ginger chutney for an           │
│ extra spice boost!"                                               │
└──────────────────────────────────────────────────────────────────┘
```
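To make Step 2 concrete, here is a minimal sketch of what the retrieval call could look like with ChromaDB, the vector database our app uses. The collection name and metadata fields (`spice_level`, `is_healthy`, `meal_type`) are illustrative assumptions rather than the app's actual schema, and note that ChromaDB returns distances (lower means more similar), from which similarity scores like those shown above would be derived:

```python
import chromadb

# Minimal retrieval sketch; the collection name and metadata fields are
# illustrative assumptions, not the app's final schema.
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(name="dishes")

results = collection.query(
    query_texts=["something spicy and healthy for breakfast"],  # semantic search
    n_results=3,
    where={  # metadata filters applied at retrieval time
        "$and": [
            {"spice_level": "high"},
            {"is_healthy": True},
            {"meal_type": "breakfast"},
        ]
    },
)

# One result list per query text: matched dish descriptions, their
# metadata, and distances (lower distance = more similar).
for doc, dist in zip(results["documents"][0], results["distances"][0]):
    print(f"{dist:.2f}  {doc}")
```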
Why RAG is Powerful
| Aspect | Without RAG | With RAG |
|---|---|---|
| Data source | LLM's training data | Your curated database |
| Accuracy | May hallucinate | Grounded in real data |
| Updates | Need to retrain model | Just update database |
| Control | None | Full control over content |
| Filtering | Limited | Precise metadata filters |
RAG in Our Application
Our RAG pipeline:
1. User asks a question → "What should I have for dinner?"
2. We search ChromaDB with:
   - Semantic similarity (meaning-based search)
   - Filters (exclude allergens, match dietary type)
   - User preferences (spice level, cuisines)
3. We get relevant dishes with their full details:
   - Name, description, ingredients
   - Nutrition info, prep time
   - Cuisine, spice level, allergens
4. We build a prompt (formatting sketched after this list) that includes:
   - System instructions (be a food expert)
   - User preferences (stored locally)
   - Retrieved dishes (from ChromaDB)
   - The user's question
5. The LLM generates a response using only our data
6. We stream the response back to the user
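Before looking at the pipeline code, here is one possible shape for the dish-formatting step from item 4, i.e. the `format_dishes_for_prompt` helper referenced in the Code Preview below. The dish field names (`name`, `description`, `cuisine`, `spice_level`, `allergens`) are assumptions for illustration:

```python
def format_dishes_for_prompt(dishes: list[dict]) -> str:
    """Turn retrieved dish records into a numbered context block for the LLM.

    The field names used here are illustrative; the real records come from
    ChromaDB documents and metadata and may differ.
    """
    lines = []
    for i, dish in enumerate(dishes, start=1):
        allergens = ", ".join(dish.get("allergens", [])) or "none"
        lines.append(
            f"{i}. {dish['name']} - {dish['description']} "
            f"(cuisine: {dish['cuisine']}, spice: {dish['spice_level']}, "
            f"allergens: {allergens})"
        )
    return "\n".join(lines)
```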
Code Preview
Here's a simplified version of our RAG pipeline:
```python
async def get_recommendation(query: str, preferences: dict):
    # Step 1: Retrieve relevant dishes
    dishes = retriever.search(
        query=query,
        exclude_allergens=preferences.get("allergies", []),
        spice_level=preferences.get("spice_level"),
        top_k=5
    )

    # Step 2: Build the prompt
    context = format_dishes_for_prompt(dishes)
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Context:\n{context}\n\nQuery: {query}"}
    ]

    # Step 3: Generate response
    response = await openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        stream=True
    )
    return response
```
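Because `get_recommendation` returns the raw token stream rather than a finished string, the caller is responsible for relaying tokens to the user as they arrive. Here is a rough sketch of that last step assuming a FastAPI backend; the route, query parameter, and the empty preferences dict are purely illustrative:

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.get("/recommend")
async def recommend(q: str):
    # Illustrative wiring only: preferences would normally be loaded from
    # the user's stored profile rather than passed as an empty dict.
    stream = await get_recommendation(query=q, preferences={})

    async def token_stream():
        # Relay each text delta from the OpenAI stream as it is produced.
        async for chunk in stream:
            if chunk.choices and chunk.choices[0].delta.content:
                yield chunk.choices[0].delta.content

    return StreamingResponse(token_stream(), media_type="text/plain")
```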
Key Takeaways
- RAG = Retrieve + Augment + Generate
- Your data stays in your control - not sent for training
- Filtering happens at retrieval - before the LLM sees it
- The LLM is just the "last mile" - making the data conversational
Next, let's understand how vector databases enable the "retrieve" step.