Menu-to-Insight: OCR & RAG Pipeline

Overview
This project was built to explore the backend design of food tracking applications. It includes a restaurant menu analysis feature that provides convenient, personalised food recommendations by grounding insights in verified nutritional data.
The system runs a RAG pipeline: menu images are ingested via OCR, nutritional context is retrieved through Qdrant vector search, and dietary recommendations are generated by a local Ollama model (Qwen-2.5).
The project is structured as a modular pipeline, where each service handles a focused responsibility - making the system easier to observe, scale and debug.
What it does
Menu Image Processing (OCR):
- Ingests an uploaded image of a restaurant menu
- Preprocesses and enhances the image to improve text clarity
- Detects and segments text regions within the image using the PaddleOCR detector
- Extracts text from each region individually using the PaddleOCR recogniser
- Reconstructs the extracted text into complete, ordered menu items
- Outputs structured menu items ready for analysis.
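The reconstruction step above amounts to turning a bag of detected text boxes back into reading order. A minimal sketch of that idea (the box format and tolerance value here are illustrative, not the project's actual code):

```python
def reconstruct_lines(boxes, y_tolerance=10):
    """Group OCR text boxes into ordered lines of text.

    Each box is (x, y, text), where (x, y) is the top-left corner of the
    detected region. Boxes whose y-coordinates fall within y_tolerance
    pixels of a line's first box are treated as the same menu line, then
    sorted left to right before joining.
    """
    lines = []
    for x, y, text in sorted(boxes, key=lambda b: b[1]):  # top-to-bottom
        if lines and abs(lines[-1][0][1] - y) <= y_tolerance:
            lines[-1].append((x, y, text))   # same visual line
        else:
            lines.append([(x, y, text)])     # start a new line
    # Sort each line's boxes by x and join their text fragments
    return [" ".join(t for _, _, t in sorted(line)) for line in lines]
```

For example, a dish name and its price detected as separate boxes at similar heights would be merged into one menu item string.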
Menu Analysis & Recommendation (RAG):
- Accepts extracted menu items, which do not contain nutritional information by default
- Converts each menu item into embeddings
- Semantically matches each item against a Qdrant database of verified foods with known nutritional values
- Infers calories and macros (protein, carbs, fats) based on the closest matching foods or ingredients
- Applies a similarity threshold to ensure only high-confidence matches are used
- Builds a prompt combining enriched menu data with the user’s dietary goals
- Generates personalised food recommendations with clear reasoning, constrained to the retrieved nutritional context.
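The matching-plus-threshold step can be sketched as a cosine-similarity search over verified foods, keeping only high-confidence hits. This is an illustration of the idea, not the project's Qdrant query code (names and the 0.75 threshold are assumptions):

```python
import math

def best_match(query_vec, candidates, threshold=0.75):
    """Return the highest-similarity candidate above threshold, else None.

    candidates: list of (name, nutrition_facts, embedding) tuples.
    The threshold keeps low-confidence matches out of the prompt, so the
    LLM only sees nutrition facts the retrieval step actually trusts.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    scored = [(cosine(query_vec, emb), name, facts) for name, facts, emb in candidates]
    score, name, facts = max(scored, key=lambda s: s[0])
    return (name, facts, score) if score >= threshold else None
```

Returning `None` for sub-threshold matches is what lets the prompt builder skip items whose nutrition cannot be inferred confidently, rather than letting the LLM guess.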
Food Logging:
- Allows users to search for foods and log them with a specified quantity
- Stores logged foods with their nutritional values for tracking
- Generates a short AI summary after each log, reflecting on the meal and suggesting small improvements
- Tracks daily progress against calorie and protein targets
- Allows users to edit or delete existing food entries
- Updates recommended intake dynamically when users change their goals.
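The daily-progress tracking boils down to scaling per-100g nutrition by logged quantity and comparing totals to targets. A self-contained sketch (field names are hypothetical, not the project's schema):

```python
def daily_progress(logged, targets):
    """Sum a day's logged foods and compare against daily targets.

    logged: list of dicts with per-100g values and a quantity in grams.
    targets: daily goals, e.g. {"calories": 2200, "protein": 150}.
    """
    totals = {"calories": 0.0, "protein": 0.0}
    for entry in logged:
        factor = entry["grams"] / 100          # scale per-100g values
        totals["calories"] += entry["kcal_per_100g"] * factor
        totals["protein"] += entry["protein_per_100g"] * factor
    return {k: {"consumed": round(totals[k], 1),
                "remaining": round(max(targets[k] - totals[k], 0), 1)}
            for k in totals}
```

Recomputing from the raw log on each request is what makes goal changes take effect immediately: the "remaining" numbers always reflect the current targets.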
Architecture
The system is built as a set of containerised services, where each service handles a distinct responsibility:
- API Service – Entry point that accepts menu images, handles the flow between services, and sends data to the LLM.
- Vision Service – Preprocesses images and extracts structured text using PaddleOCR and OpenCV.
- Vector Service – Generates embeddings and performs semantic retrieval using Qdrant.
- Relational Service – Stores nutritional data and user food logs using SQLite and SQLAlchemy.
- Local LLM – An Ollama-hosted model generates recommendations grounded in retrieved nutritional context.
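Wired together with Docker Compose, the topology above might look roughly like this (a hypothetical sketch: actual service names, build paths, images and ports may differ):

```yaml
# Hypothetical compose sketch of the service layout described above
services:
  api:
    build: ./api
    ports: ["8000:8000"]
    depends_on: [vision, vector, relational]
  vision:
    build: ./vision        # PaddleOCR + OpenCV
  vector:
    build: ./vector        # embeddings + retrieval
    depends_on: [qdrant]
  qdrant:
    image: qdrant/qdrant
  relational:
    build: ./relational    # SQLite + SQLAlchemy
  ollama:
    image: ollama/ollama   # hosts the local LLM
```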
Each service follows a consistent internal structure:
- Routes expose APIs and handle request/response flow
- Services encapsulate business logic, keeping routes clean and focused on orchestration
- Schemas enforce data structure and ensure consistency across services
This separation of concerns keeps the system modular - making services easier to scale, debug and replace independently.
Demonstrates
- Designed a containerised, service-based architecture orchestrated with Docker Compose.
- Implemented an OCR pipeline using PaddleOCR and OpenCV for layout detection and text extraction.
- Built a RAG pipeline combining Qdrant vector search with Ollama-based LLM inference.
- Developed a full-stack application with a clear separation between frontend and backend.
- Integrated structured logging across services to support debugging and system reliability.
- Used local LLMs to maintain user privacy and ground responses in retrieved data, reducing hallucinations.
Tech Stack
Python, FastAPI, Docker, Ollama, Qdrant, PaddleOCR, OpenCV, SQLAlchemy, Pandas
Possible Extensions
- Multi-user authentication and personal dietary profiles.
- Long-term nutritional analytics and progress tracking.
- Conversational meal planning powered by RAG.
- Faster food logging from photos or quick inputs.