Menu-to-Insight: OCR & RAG Pipeline


Overview

This project was built to explore the backend design of food tracking applications. It includes a restaurant menu analysis feature that provides convenient, personalised food recommendations by grounding insights in verified nutritional data.

The system pairs OCR with a retrieval-augmented generation (RAG) pipeline: it extracts menu text from images with OCR, retrieves nutritional context through Qdrant vector search, and generates dietary recommendations with a local Ollama-hosted model (Qwen2.5).

The project is structured as a modular pipeline in which each service handles a focused responsibility, making the system easier to observe, scale and debug.

What it does

Menu Image Processing (OCR):

  • Ingests an uploaded image of a restaurant menu
  • Preprocesses and enhances the image to improve text clarity
  • Detects and segments text regions within the image via the PaddleOCR text detector
  • Extracts text from each region individually via the PaddleOCR text recogniser
  • Reconstructs the extracted text into complete, ordered menu items
  • Outputs structured menu items ready for analysis.
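The reconstruction step can be sketched as a simple line-grouping pass. This is a minimal illustration rather than the project's code: it assumes each OCR result has already been reduced to an axis-aligned box with its recognised text and confidence (PaddleOCR's detector typically reports quadrilateral coordinates), and `TextRegion`, `reconstruct_lines`, `min_score` and `overlap_ratio` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TextRegion:
    """One recognised text region: an axis-aligned bounding box plus its text (hypothetical type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    text: str
    score: float  # recognition confidence in [0, 1]


def reconstruct_lines(regions, min_score=0.5, overlap_ratio=0.5):
    """Group regions into rows by vertical overlap, then read each row left to right."""
    # Drop low-confidence recognitions, then process regions top to bottom.
    kept = sorted((r for r in regions if r.score >= min_score), key=lambda r: r.y_min)
    rows = []
    for region in kept:
        for row in rows:
            anchor = row[0]
            overlap = min(anchor.y_max, region.y_max) - max(anchor.y_min, region.y_min)
            shorter = min(anchor.y_max - anchor.y_min, region.y_max - region.y_min)
            # Same row if the boxes share enough of their vertical extent.
            if shorter > 0 and overlap / shorter >= overlap_ratio:
                row.append(region)
                break
        else:
            rows.append([region])  # no existing row matched: start a new one
    # Rows were created top to bottom; order each row's fragments left to right.
    return [" ".join(r.text for r in sorted(row, key=lambda r: r.x_min)) for row in rows]


example = [
    TextRegion(200, 12, 260, 30, "8.50", 0.97),
    TextRegion(10, 10, 180, 32, "Grilled Chicken Salad", 0.95),
    TextRegion(10, 50, 150, 70, "Beef Burger", 0.93),
    TextRegion(200, 52, 260, 68, "11.00", 0.96),
    TextRegion(10, 90, 40, 100, "~~", 0.20),  # smudge, filtered out by min_score
]
print(reconstruct_lines(example))  # → ['Grilled Chicken Salad 8.50', 'Beef Burger 11.00']
```

Comparing each fragment only with the first box in a row keeps the logic simple; menus with multi-column layouts, or prices printed on a separate line, would need a column split or a smarter pairing step first.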

Menu Analysis & Recommendation (RAG):

  • Accepts extracted menu items, which do not contain nutritional information by default
  • Converts each menu item into embeddings
  • Semantically matches each item against a Qdrant database of verified foods with known nutritional values
  • Infers calories and macros (protein, carbs, fats) based on the closest matching foods or ingredients
  • Applies a similarity threshold to ensure only high-confidence matches are used
  • Builds a prompt combining enriched menu data with the user’s dietary goals
  • Generates personalised food recommendations with clear reasoning, constrained to the retrieved nutritional context.
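The matching and thresholding steps can be sketched in plain Python. In the project the similarity search runs in Qdrant (whose search API can also drop low-scoring points via a score threshold); here cosine similarity is computed by hand so the sketch runs without a Qdrant instance. Averaging the top-k confident matches, weighted by similarity, is one assumed way to "infer" macros from several close foods. `FoodRecord`, `infer_macros`, `threshold` and `top_k` are hypothetical, and the embeddings and nutrition values are toy placeholders, not verified data.

```python
import math
from dataclasses import dataclass


@dataclass
class FoodRecord:
    """A reference food with nutrition per 100 g and a precomputed embedding (hypothetical type)."""
    name: str
    embedding: list
    kcal: float
    protein_g: float
    carbs_g: float
    fat_g: float


MACRO_FIELDS = ("kcal", "protein_g", "carbs_g", "fat_g")


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def infer_macros(item_embedding, foods, threshold=0.75, top_k=3):
    """Similarity-weighted average of the top-k matches that clear the threshold.

    Returns None when no match is confident enough, so the item is reported
    as unknown rather than filled in with a weak guess.
    """
    scored = sorted(
        ((cosine_similarity(item_embedding, f.embedding), f) for f in foods),
        key=lambda pair: pair[0],
        reverse=True,
    )
    matches = [(score, food) for score, food in scored[:top_k] if score >= threshold]
    if not matches:
        return None
    total = sum(score for score, _ in matches)
    return {
        field: round(sum(score * getattr(food, field) for score, food in matches) / total, 1)
        for field in MACRO_FIELDS
    }


# Toy 3-dimensional embeddings and placeholder nutrition values, for illustration only.
foods = [
    FoodRecord("grilled chicken", [1.0, 0.0, 0.0], 165, 31, 0, 4),
    FoodRecord("green salad", [0.0, 1.0, 0.0], 20, 1, 3, 0),
    FoodRecord("chocolate cake", [0.0, 0.0, 1.0], 370, 5, 50, 17),
]
print(infer_macros([1.0, 0.0, 0.0], foods))  # only "grilled chicken" clears 0.75
print(infer_macros([1.0, 1.0, 0.0], foods))  # about 0.71 to two foods: below threshold, so None
```

Returning `None` instead of the nearest match is what lets the later prompt say "no verified data" for an item, which keeps the LLM from reasoning over a poor match.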

Food Logging:

  • Allows users to search for foods and log them with a specified quantity
  • Stores logged foods with their nutritional values for tracking
  • Generates a short AI summary after each log, reflecting on the meal and suggesting small improvements
  • Tracks daily progress against calorie and protein targets
  • Allows users to edit or delete existing food entries
  • Updates recommended intake dynamically when users change their goals.
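Logging and daily progress tracking map naturally onto two relational tables. The sketch below uses Python's built-in `sqlite3` module so it is self-contained; the project itself uses SQLAlchemy over SQLite, and the table names, column names and the functions `log_food` and `daily_progress` are hypothetical. Nutrition values are placeholders.

```python
import sqlite3
from datetime import date

# Hypothetical schema for illustration.
SCHEMA = """
CREATE TABLE foods (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    kcal_per_100g REAL NOT NULL,
    protein_per_100g REAL NOT NULL
);
CREATE TABLE food_logs (
    id INTEGER PRIMARY KEY,
    food_id INTEGER NOT NULL REFERENCES foods(id),
    grams REAL NOT NULL CHECK (grams > 0),
    logged_on TEXT NOT NULL  -- ISO date, e.g. '2024-05-01'
);
"""


def log_food(conn, food_name, grams, day):
    """Record a quantity of a known food against a given day."""
    row = conn.execute("SELECT id FROM foods WHERE name = ?", (food_name,)).fetchone()
    if row is None:
        raise ValueError(f"unknown food: {food_name!r}")
    conn.execute(
        "INSERT INTO food_logs (food_id, grams, logged_on) VALUES (?, ?, ?)",
        (row[0], grams, day.isoformat()),
    )


def daily_progress(conn, day, kcal_target, protein_target_g):
    """Sum the day's logged nutrition (scaled by grams) and compare it with the targets."""
    kcal, protein = conn.execute(
        """
        SELECT COALESCE(SUM(f.kcal_per_100g * l.grams / 100.0), 0),
               COALESCE(SUM(f.protein_per_100g * l.grams / 100.0), 0)
        FROM food_logs AS l
        JOIN foods AS f ON f.id = l.food_id
        WHERE l.logged_on = ?
        """,
        (day.isoformat(),),
    ).fetchone()
    return {
        "kcal": round(kcal, 1),
        "protein_g": round(protein, 1),
        "kcal_remaining": round(kcal_target - kcal, 1),
        "protein_remaining_g": round(protein_target_g - protein, 1),
    }


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO foods (name, kcal_per_100g, protein_per_100g) VALUES (?, ?, ?)",
    [("oats", 380, 13), ("chicken breast", 165, 31)],  # placeholder values
)
today = date(2024, 5, 1)
log_food(conn, "oats", 50, today)
log_food(conn, "chicken breast", 200, today)
print(daily_progress(conn, today, kcal_target=2000, protein_target_g=120))
# → {'kcal': 520.0, 'protein_g': 68.5, 'kcal_remaining': 1480.0, 'protein_remaining_g': 51.5}
```

Because totals are computed from the log rows at query time, editing or deleting an entry is reflected in the next progress check without extra bookkeeping, and a change of goals only changes the targets passed in.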

Architecture

The system is built as a set of containerised services, where each service handles a distinct responsibility:

  • API Service – Entry point that accepts menu images, orchestrates calls between the other services, and sends the enriched data to the LLM.
  • Vision Service – Preprocesses images and extracts structured text using PaddleOCR and OpenCV.
  • Vector Service – Generates embeddings and performs semantic retrieval using Qdrant.
  • Relational Service – Stores nutritional data and user food logs using SQLite and SQLAlchemy.
  • Local LLM – An Ollama-hosted model generates recommendations grounded in retrieved nutritional context.
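The hand-off from the API service to the local LLM can be sketched as building a grounded prompt and posting it to Ollama's REST endpoint `/api/generate` with `stream` set to false, which returns a single JSON object whose `response` field holds the text. This is an assumed shape, not the project's code: `build_prompt`, `build_request`, `generate`, the prompt wording and the dictionary keys are hypothetical, the model tag `qwen2.5` is an assumption, and `localhost:11434` is Ollama's default address (inside Docker Compose the host would be the Ollama service name). Only the stdlib is used, and `generate` needs a running Ollama server.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default port; host differs under Compose
MODEL = "qwen2.5"  # assumed model tag

SYSTEM_PROMPT = (
    "You are a nutrition assistant. Base every recommendation only on the "
    "nutritional data provided. If an item has no data, say so instead of guessing."
)


def build_prompt(menu_items, goals):
    """Combine enriched menu items with the user's goals into one grounded prompt."""
    lines = []
    for item in menu_items:
        n = item.get("nutrition")
        if n is None:
            lines.append(f"- {item['name']}: no verified nutritional match")
        else:
            lines.append(
                f"- {item['name']}: {n['kcal']} kcal, {n['protein_g']} g protein, "
                f"{n['carbs_g']} g carbs, {n['fat_g']} g fat"
            )
    return (
        f"Daily goals: {goals['kcal']} kcal, {goals['protein_g']} g protein.\n"
        "Menu items with retrieved nutrition:\n"
        + "\n".join(lines)
        + "\n\nRecommend the items that best fit these goals and explain why."
    )


def build_request(prompt, model=MODEL):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "system": SYSTEM_PROMPT, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model=MODEL, timeout=120):
    """Send the request; with stream=False Ollama replies with one JSON object."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Putting the "use only the provided data" instruction in the system prompt, and listing unmatched items explicitly, is what constrains the model to the retrieved context rather than its own nutritional guesses.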

Each service follows a consistent internal structure:

  • Routes expose APIs and handle request/response flow
  • Services encapsulate business logic, keeping routes clean and focused on orchestration
  • Schemas enforce data structure and ensure consistency across services
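The route/service/schema split can be illustrated with a dependency-free sketch. In a FastAPI service the schema would typically be a Pydantic model and the route a decorated path operation; plain frozen dataclasses and functions are used here so the example runs on its own. `MenuItem`, `MenuExtractionResponse`, `parse_menu_lines` and `extract_menu_route` are hypothetical names.

```python
from dataclasses import dataclass


# --- Schema: the shape of data passed between services ---
@dataclass(frozen=True)
class MenuItem:
    name: str

    def __post_init__(self):
        # Reject malformed items at the boundary instead of deep inside the pipeline.
        if not isinstance(self.name, str) or not self.name.strip():
            raise ValueError("menu item name must be a non-empty string")


@dataclass(frozen=True)
class MenuExtractionResponse:
    items: tuple


# --- Service: business logic with no HTTP concerns ---
def parse_menu_lines(lines):
    """Turn reconstructed OCR lines into validated MenuItem objects, skipping blanks."""
    return tuple(MenuItem(line.strip()) for line in lines if line.strip())


# --- Route: orchestration only (would be a FastAPI path operation in practice) ---
def extract_menu_route(ocr_lines):
    items = parse_menu_lines(ocr_lines)
    return MenuExtractionResponse(items=items)


response = extract_menu_route(["Beef Burger  ", "   ", "Soup of the Day"])
print([item.name for item in response.items])  # → ['Beef Burger', 'Soup of the Day']
```

Keeping validation in the schema and logic in the service means the route can be swapped (HTTP, queue consumer, test harness) without touching either.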

This separation of concerns keeps the system modular, making services easier to scale, debug and replace independently.

Demonstrates

  • Designed a containerised, service-based architecture orchestrated with Docker Compose.
  • Implemented an OCR pipeline using PaddleOCR and OpenCV for layout detection and text extraction.
  • Built a RAG pipeline combining Qdrant vector search with Ollama-based LLM inference.
  • Developed a full-stack application with a clear separation between frontend and backend.
  • Integrated structured logging across services to support debugging and system reliability.
  • Ran the LLM locally to keep user data private, and grounded its responses in retrieved data to reduce hallucinations.
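Structured logging across services can be sketched with the standard `logging` module and a JSON formatter, so every service emits one machine-readable object per line. The field names, the `service` tag and the optional `request_id` (passed via logging's `extra=` argument to correlate a request across services) are assumptions for illustration; `JsonFormatter` and `get_logger` are hypothetical names.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def __init__(self, service):
        super().__init__()
        self.service = service

    def format(self, record):
        entry = {
            "time": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "service": self.service,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed via `extra=` become attributes on the record.
        request_id = getattr(record, "request_id", None)
        if request_id is not None:
            entry["request_id"] = request_id
        if record.exc_info:
            entry["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def get_logger(service):
    """Return a logger for `service` with a single JSON handler attached."""
    logger = logging.getLogger(service)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter(service))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False  # don't duplicate lines through the root logger
    return logger


log = get_logger("vision")
log.info("extracted %d text regions", 12, extra={"request_id": "req-123"})
```

Tagging every line with the service name and a shared request ID lets logs from the API, vision, vector and relational services be filtered and joined for a single menu upload.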

Tech Stack

Python, FastAPI, Docker, Ollama, Qdrant, PaddleOCR, OpenCV, SQLAlchemy, Pandas

Possible Extensions

  • Multi-user authentication and personal dietary profiles.
  • Long-term nutritional analytics and progress tracking.
  • Conversational meal planning powered by RAG.
  • Faster food logging from photos or quick inputs.