Production Tips

When deploying to production, consider these best practices.

Security

Environment Variables

Never commit secrets. Use proper secret management:

# Bad - in docker-compose.yml
environment:
  - OPENAI_API_KEY=sk-actual-key-here

# Good - from .env file (not committed)
environment:
  - OPENAI_API_KEY=${OPENAI_API_KEY}

# Better - from secrets manager
# Use Docker secrets, Kubernetes secrets, or cloud provider secrets
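
Docker mounts each secret as a file under /run/secrets/<name> inside the container. The sketch below shows one way the backend could read a key from such a file and fall back to the environment variable during local development. The helper `load_secret` and the secret name `openai_api_key` are hypothetical and not part of the app.

```python
import os
from pathlib import Path


def load_secret(name: str, secrets_dir: str = "/run/secrets") -> str:
    """Return a secret from a mounted file, else from the environment.

    Hypothetical helper: looks for <secrets_dir>/<name> first (the layout
    Docker secrets use), then for the upper-cased environment variable.
    """
    secret_file = Path(secrets_dir) / name
    if secret_file.is_file():
        # Secret files often end with a newline; strip it.
        return secret_file.read_text().strip()
    value = os.environ.get(name.upper())
    if not value:
        raise RuntimeError(f"Secret {name!r} is not configured")
    return value
```

With this in place, the backend would call `load_secret("openai_api_key")` once at startup instead of reading `os.environ` directly.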

API Key Protection

The OpenAI API key is only on the backend. The frontend never sees it:

Frontend → Backend → OpenAI
              ↑
         Key is here

Rate Limiting

Add rate limiting to prevent abuse:

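from fastapi import Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
# Respond with HTTP 429 when a client exceeds its limit
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.post("/api/chat")
@limiter.limit("10/minute")  # 10 requests per minute per client IP
async def chat(request: Request, ...):  # slowapi needs the request argument
    ...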
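
To make the meaning of a limit such as "10/minute" concrete, here is a minimal sliding-window limiter in plain Python. It is an illustration of the general technique, not slowapi's implementation, and the class name `SlidingWindowLimiter` is made up for this sketch.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key."""

    def __init__(self, limit: int, window: float, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested
        self.hits = defaultdict(deque)  # key -> timestamps of recent requests

    def allow(self, key: str) -> bool:
        now = self.clock()
        recent = self.hits[key]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False  # the caller would respond with HTTP 429
        recent.append(now)
        return True
```

The counters here live in one process, so each backend replica would count separately. When running several replicas, a shared store is likely needed for the limit to hold across all of them.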

Performance

ChromaDB Persistence

Ensure ChromaDB data persists across restarts:

chroma:
  volumes:
    - chroma_data:/chroma/chroma  # Named volume

Database Connection Pooling

For high traffic, use connection pooling:

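from sqlalchemy.ext.asyncio import create_async_engine

# Async engines use AsyncAdaptedQueuePool by default;
# the synchronous QueuePool cannot be used with asyncio.
engine = create_async_engine(
    DATABASE_URL,
    pool_size=5,
    max_overflow=10,
)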

Caching

Cache frequently accessed data:

from functools import lru_cache

@lru_cache(maxsize=100)
def get_food_by_name(name: str):
    return retriever.get_by_name(name)
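
An `lru_cache` lives inside one process and never expires entries on its own, so it can serve stale data after the underlying records change. The sketch below uses a stand-in lookup (the real `retriever` is assumed, not shown) to demonstrate how hits and misses can be inspected and how the cache can be cleared.

```python
from functools import lru_cache

calls = 0  # counts how often the underlying lookup actually runs


@lru_cache(maxsize=100)
def get_food_by_name(name: str) -> dict:
    global calls
    calls += 1
    # Stand-in for retriever.get_by_name(name); the real lookup is assumed.
    return {"name": name}


get_food_by_name("pizza")  # miss: runs the lookup
get_food_by_name("pizza")  # hit: served from the cache
info = get_food_by_name.cache_info()  # hit and miss counters so far

get_food_by_name.cache_clear()  # call after the underlying data changes
get_food_by_name("pizza")  # miss again, the lookup runs a second time
```

Each backend replica keeps its own copy of this cache, so clearing it in one instance does not affect the others.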

Scaling

Horizontal Scaling

Run multiple backend instances behind a load balancer:

backend:
  deploy:
    replicas: 3

Database Connection Limits

PostgreSQL has connection limits. With multiple replicas:

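# Each replica: 5 pooled connections, up to 15 with max_overflow=10
# 3 replicas = 15 pooled connections, up to 45 at peak
# Stay below PostgreSQL's max_connections (100 by default)
# and leave headroom for admin connections

pool_size = 5  # Per instance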
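
The arithmetic above can be checked with a few lines of Python. This is a rough sketch: `peak_connections` is a hypothetical helper, the headroom of 10 connections is an assumed value, and 100 is PostgreSQL's default `max_connections`.

```python
def peak_connections(replicas: int, pool_size: int, max_overflow: int) -> int:
    """Worst-case connections opened by all replicas together."""
    return replicas * (pool_size + max_overflow)


MAX_CONNECTIONS = 100    # PostgreSQL's default max_connections
RESERVED_FOR_ADMIN = 10  # assumed headroom, pick what suits your setup

# Pool settings taken from the connection pooling example on this page.
steady = peak_connections(replicas=3, pool_size=5, max_overflow=0)
peak = peak_connections(replicas=3, pool_size=5, max_overflow=10)
fits = peak <= MAX_CONNECTIONS - RESERVED_FOR_ADMIN
```

If `fits` turns out false after adding replicas, either lower `pool_size` and `max_overflow` per instance or put a connection pooler in front of the database.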

Monitoring

Health Checks

Add comprehensive health checks:

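from sqlalchemy import text

@app.get("/health")
async def health():
    checks = {
        "database": await check_database(),
        "chroma": await check_chroma(),
        "openai": await check_openai(),
    }
    # Report "healthy" only if every check passed; assumes each check
    # returns "ok" on success, like check_database below
    healthy = all(result == "ok" for result in checks.values())
    return {"status": "healthy" if healthy else "unhealthy", **checks}

async def check_database():
    try:
        async with async_session() as session:
            # SQLAlchemy 2.0 requires raw SQL to be wrapped in text()
            await session.execute(text("SELECT 1"))
        return "ok"
    except Exception as e:
        return f"error: {e}"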
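
A slow dependency can make the health endpoint itself hang. One possible refinement, sketched here with hypothetical helpers `run_check` and `run_checks`, is to run the checks concurrently and give each one a timeout. It assumes every check is a coroutine function that returns "ok" on success.

```python
import asyncio


async def run_check(check, timeout: float = 2.0) -> str:
    """Run one health check coroutine function with an upper time bound."""
    try:
        return await asyncio.wait_for(check(), timeout=timeout)
    except asyncio.TimeoutError:
        return "error: timed out"
    except Exception as exc:
        return f"error: {exc}"


async def run_checks(checks: dict, timeout: float = 2.0) -> dict:
    """Run all checks concurrently and summarise the result."""
    results = await asyncio.gather(
        *(run_check(check, timeout) for check in checks.values())
    )
    report = dict(zip(checks.keys(), results))
    healthy = all(result == "ok" for result in results)
    return {"status": "healthy" if healthy else "unhealthy", **report}
```

The endpoint could then return `await run_checks({"database": check_database, "chroma": check_chroma, "openai": check_openai})`, so the total response time is bounded by the timeout rather than by the sum of all checks.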

Logging

Use structured logging:

import json
import logging

class JSONFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
            "module": record.module,
        })

handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
# Install the JSON handler on the root logger so each record is emitted once
logging.basicConfig(level=logging.INFO, handlers=[handler])
logger = logging.getLogger(__name__)

Cost Optimization

OpenAI Costs

Monitor your usage:

# Log token usage
response = await client.chat.completions.create(...)
logger.info(f"Tokens used: {response.usage.total_tokens}")
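
Input and output tokens are usually priced differently, and `response.usage` reports them separately as `prompt_tokens` and `completion_tokens`. A rough cost estimate per request could look like the sketch below. The function `estimate_cost` is hypothetical and the prices are placeholders, not real rates.

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_price_per_1m: float, output_price_per_1m: float) -> float:
    """Estimate the cost of one request in US dollars."""
    return (prompt_tokens * input_price_per_1m
            + completion_tokens * output_price_per_1m) / 1_000_000


# Placeholder prices for illustration only; look up the current rates.
cost = estimate_cost(prompt_tokens=1200, completion_tokens=300,
                     input_price_per_1m=1.00, output_price_per_1m=2.00)
```

Logging this value next to the token counts makes it easier to spot which endpoints or prompts drive the bill.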

Use Appropriate Models

Model          Cost      Use Case
gpt-4o-mini    Low       General responses
gpt-4o         Medium    Complex reasoning
gpt-4          High      Rarely needed

Caching Responses

Cache common queries:

import hashlib
import json

def cache_key(message: str, preferences: dict) -> str:
    data = f"{message}:{json.dumps(preferences, sort_keys=True)}"
    return hashlib.md5(data.encode()).hexdigest()

# Check cache before calling OpenAI
cached = redis.get(cache_key(message, prefs))
if cached:
    return cached
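
The snippet above only reads from the cache. A fuller read-through flow also stores each new reply with an expiry time. The sketch below shows that flow against a small in-memory stand-in that mimics the `get` and `setex` calls of a Redis client, so it runs without a server. `InMemoryCache`, `cached_reply`, and the one-hour TTL are assumptions made for illustration.

```python
import hashlib
import json
import time


class InMemoryCache:
    """Tiny stand-in for a Redis client, offering get() and setex() only."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        value, expires_at = self._data.get(key, (None, 0.0))
        if value is not None and self._clock() >= expires_at:
            del self._data[key]  # entry has expired
            return None
        return value

    def setex(self, key, ttl_seconds, value):
        self._data[key] = (value, self._clock() + ttl_seconds)


def cache_key(message: str, preferences: dict) -> str:
    data = f"{message}:{json.dumps(preferences, sort_keys=True)}"
    return hashlib.md5(data.encode()).hexdigest()


def cached_reply(cache, message, preferences, generate, ttl_seconds=3600):
    """Return a cached reply, or call generate() once and store the result."""
    key = cache_key(message, preferences)
    cached = cache.get(key)
    if cached is not None:
        return cached
    reply = generate(message, preferences)
    cache.setex(key, ttl_seconds, reply)
    return reply
```

Here `generate` stands for whatever function calls OpenAI. With a real Redis client, keep in mind that `get` returns bytes unless the client is created with `decode_responses=True`.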

Deployment Options

Docker on VPS

Simple and cost-effective:

ssh your-server
git clone your-repo
cd your-repo
docker compose up -d

Kubernetes

For larger deployments, use Kubernetes with:

  • Horizontal Pod Autoscaler
  • Ingress for routing
  • PersistentVolumeClaims for data
  • Secrets for API keys

Cloud Platforms

Platform         Good For
Railway          Quick deploys
Fly.io           Edge deployment
AWS ECS          Enterprise scale
GCP Cloud Run    Serverless

Backup Strategy

Database Backup

# Backup
docker compose exec -T postgres pg_dump -U food food > backup.sql

# Restore
docker compose exec -T postgres psql -U food food < backup.sql

ChromaDB Backup

# Stop chroma, copy the volume
docker compose stop chroma
docker run --rm -v food_chroma_data:/data -v "$(pwd)":/backup alpine \
  tar czf /backup/chroma_backup.tar.gz /data
docker compose start chroma
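
Automated backups also need a retention rule, or the disk eventually fills up. The sketch below assumes a scheduled job writes dated files such as backup_2024-01-31.sql (a hypothetical naming scheme; the commands above write a single backup.sql) and keeps only the newest few. The helper `prune_backups` is made up for this example.

```python
from pathlib import Path


def prune_backups(directory: str, pattern: str = "backup_*.sql", keep: int = 7) -> list:
    """Delete all but the `keep` newest files matching `pattern`.

    Relies on names that sort chronologically, e.g. backup_2024-01-31.sql.
    Returns the names of the deleted files.
    """
    backups = sorted(Path(directory).glob(pattern), key=lambda p: p.name)
    stale = backups[:-keep] if keep > 0 else backups
    for path in stale:
        path.unlink()
    return [path.name for path in stale]
```

Running this after each successful dump keeps a rolling window of backups. Copying the surviving files to a different machine or storage bucket is still advisable.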

Checklist

Before going to production:

  • API keys in secret manager
  • Rate limiting enabled
  • Health checks configured
  • Logging set up
  • Backups automated
  • Monitoring in place
  • HTTPS configured
  • CORS restricted to your domain
  • Error tracking (Sentry, etc.)

Congratulations! You've built a complete AI-powered food recommendation app from scratch.