Menu-to-Insight: OCR & RAG Pipeline

Overview

This project was built to gain experience deploying a multi-service, containerised application using Docker, as well as grounding a local LLM in a knowledge base.

The system is a RAG pipeline: it ingests menu images via OCR, retrieves relevant context through Qdrant vector search and generates food recommendations with a local Qwen2.5 model served by Ollama.

A limitation I encountered is that every query starts from scratch: the pipeline retrieves the raw data again and the LLM has to rediscover the same knowledge each time. To address this, I plan to add a wiki layer that accumulates the LLM's insights, summaries and contradictions, so later answers can draw on more context.

What it does

Image Processing (OCR):

  • Ingests images of restaurant menus
  • Preprocesses and enhances images to improve text clarity
  • Detects text regions using PaddleOCR
  • Extracts text from each region
  • Reconstructs the extracted text into structured menu items
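
The last two steps, extracting text per region and rebuilding it into menu items, could look roughly like the sketch below. It assumes each PaddleOCR detection has already been converted into a text string plus an axis-aligned box; `Region`, `group_into_rows`, `parse_menu_items`, the row tolerance and the price regex are all hypothetical, not the project's actual code. Regions are grouped into rows by vertical position, read left to right, and a trailing price splits each row into a name and a price.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical OCR region: text plus an axis-aligned box (x_min, y_min, x_max, y_max)."""
    text: str
    box: tuple[float, float, float, float]

    @property
    def y_center(self) -> float:
        return (self.box[1] + self.box[3]) / 2

    @property
    def height(self) -> float:
        return self.box[3] - self.box[1]


# Assumed format: a line ending in a price such as "9.50", "£8" or "$12".
PRICE_RE = re.compile(r"^(?P<name>.*?)\s*[£$€]?(?P<price>\d+(?:\.\d{1,2})?)$")


def group_into_rows(regions: list[Region], tolerance: float = 0.5) -> list[list[Region]]:
    """Group regions into text lines by vertical centre, then sort each line left to right."""
    rows: list[list[Region]] = []
    for region in sorted(regions, key=lambda r: r.y_center):
        if rows:
            anchor = rows[-1][0]  # first region of the current row
            limit = tolerance * max(region.height, anchor.height)
            if abs(region.y_center - anchor.y_center) <= limit:
                rows[-1].append(region)
                continue
        rows.append([region])
    return [sorted(row, key=lambda r: r.box[0]) for row in rows]


def parse_menu_items(regions: list[Region]) -> list[dict]:
    """Join each row's text and split off a trailing price; rows without a price are skipped."""
    items = []
    for row in group_into_rows(regions):
        line = " ".join(r.text.strip() for r in row)
        match = PRICE_RE.match(line)
        if match and match.group("name"):
            items.append({"name": match.group("name"), "price": float(match.group("price"))})
    return items
```

Rows without a price, such as section headings, are dropped here; a fuller version would probably keep them as categories for the items that follow.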

Analysis & Generation (RAG):

  • Converts menu items into embeddings
  • Semantically searches a vector database of verified foods
  • Infers nutritional information from the most relevant matches
  • Applies similarity thresholds to enforce high-confidence retrieval
  • Generates personalised recommendations with reasoning constrained to the retrieved context
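
The retrieval and thresholding steps can be illustrated without a running Qdrant instance. This sketch computes cosine similarity in plain Python over hypothetical `Food` records, keeps only matches at or above an assumed threshold of 0.75, and infers nutrition as a similarity-weighted average of the surviving matches. The weighting scheme is an assumption, not necessarily how the project combines matches. In the real service the ranking would happen inside Qdrant, whose search APIs accept a `score_threshold` argument that applies the same cut-off server-side.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Food:
    """Hypothetical verified food record with a precomputed embedding."""
    name: str
    embedding: list[float]
    calories: float  # kcal per serving
    protein: float   # grams per serving


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: list[float], foods: list[Food], k: int = 3,
             threshold: float = 0.75) -> list[tuple[float, Food]]:
    """Return up to k (score, food) pairs whose similarity clears the threshold, best first."""
    scored = [(cosine(query, f.embedding), f) for f in foods]
    confident = [pair for pair in scored if pair[0] >= threshold]
    confident.sort(key=lambda pair: pair[0], reverse=True)
    return confident[:k]


def infer_nutrition(query: list[float], foods: list[Food], k: int = 3,
                    threshold: float = 0.75) -> dict | None:
    """Similarity-weighted average of confident matches, or None if nothing qualifies."""
    matches = retrieve(query, foods, k, threshold)
    if not matches:
        return None  # decline to guess rather than pass weak context to the LLM
    total = sum(score for score, _ in matches)
    return {
        "calories": sum(s * f.calories for s, f in matches) / total,
        "protein": sum(s * f.protein for s, f in matches) / total,
        "sources": [f.name for _, f in matches],
    }
```

Returning `None` when nothing clears the threshold lets the API report a dish as unknown instead of handing the LLM low-confidence context.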

Food Logging:

  • Allows users to search and log foods with nutritional values
  • Provides an encouraging message after each log, reflecting on the meal and suggesting a small improvement
  • Dynamically tracks progress against calorie and protein targets
  • Allows users to edit entries and personalise their profile
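
Progress tracking reduces to summing a day's log entries and comparing the totals with the user's targets. A minimal sketch, assuming an in-memory list of hypothetical `LogEntry` records in place of the SQLite/SQLAlchemy tables. Because totals are recomputed from the log on every call, edited or deleted entries are reflected immediately.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass
class LogEntry:
    """Hypothetical food log row."""
    day: date
    food: str
    calories: float
    protein: float


def daily_progress(entries: list[LogEntry], day: date,
                   calorie_target: float, protein_target: float) -> dict:
    """Sum one day's entries and report totals, remaining amounts and % of each target."""
    todays = [e for e in entries if e.day == day]
    calories = sum(e.calories for e in todays)
    protein = sum(e.protein for e in todays)
    return {
        "calories": calories,
        "protein": protein,
        "calories_remaining": max(calorie_target - calories, 0),
        "protein_remaining": max(protein_target - protein, 0),
        "calories_pct": round(100 * calories / calorie_target, 1),
        "protein_pct": round(100 * protein / protein_target, 1),
    }
```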

Architecture

The application is built as a set of containerised services, each focused on a separate responsibility:

  • API – Accepts images, orchestrates services and sends context to the LLM.
  • Vision – Preprocesses images and extracts text using PaddleOCR and OpenCV.
  • Vector – Generates embeddings and performs vector search using Qdrant.
  • Database – Stores nutritional data and food logs using SQLite and SQLAlchemy.
  • Local LLM – An Ollama-hosted model that generates recommendations grounded in retrieved context.
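
The API service's final step, sending retrieved context to the LLM, might look like the following. It targets Ollama's `/api/generate` REST endpoint, which accepts `model`, `prompt`, `system`, `stream` and `options` fields and, with `stream` set to false, returns a single JSON object whose `response` field holds the generated text. The URL, the `qwen2.5` model tag, the system prompt wording, the temperature and the helper names are assumptions; inside Docker Compose the host would typically be the Ollama service's name rather than `localhost`.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed host; 11434 is Ollama's default port
MODEL = "qwen2.5"  # assumed model tag

# Hypothetical system prompt that constrains answers to the retrieved context.
SYSTEM_PROMPT = (
    "You recommend dishes from a restaurant menu. Use only the nutritional "
    "context provided. If the context does not cover a dish, say so instead of guessing."
)


def build_prompt(menu_items: list, context: list, goal: str) -> str:
    """Pack the user's goal, OCR'd menu items and retrieved matches into one prompt."""
    lines = [f"User goal: {goal}", "", "Menu items:"]
    lines += [f"- {item}" for item in menu_items]
    lines += ["", "Retrieved nutritional context:"]
    lines += [f"- {c['name']}: {c['calories']} kcal, {c['protein']} g protein" for c in context]
    return "\n".join(lines)


def build_payload(menu_items: list, context: list, goal: str, model: str = MODEL) -> dict:
    return {
        "model": model,
        "system": SYSTEM_PROMPT,
        "prompt": build_prompt(menu_items, context, goal),
        "stream": False,  # one JSON object instead of a token stream
        "options": {"temperature": 0.2},  # assumed: low temperature to stay close to the context
    }


def generate(payload: dict, url: str = OLLAMA_URL, timeout: float = 120) -> str:
    """POST the payload to Ollama and return the generated text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Keeping prompt construction in pure functions makes it straightforward to test and log without a running model.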

Each service follows a consistent internal structure:

  • Routes expose APIs and handle request/response flow
  • Services encapsulate business logic, keeping routes focused on orchestration
  • Schemas enforce data structure and ensure consistency across services

Each service is observable through structured logging and debugging endpoints. I kept the services modular to support scalability and maintainability, so that components, Docker images and LLMs can be swapped easily without destabilising the rest of the system.
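
Structured logging of this kind can be done with the standard library alone. This sketch, using a hypothetical `JsonFormatter` and `get_logger`, writes one JSON object per line to stdout, which `docker compose logs` then collects per container. The field names and the `extra={"fields": ...}` convention are assumptions, and the debugging endpoints are not shown.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object tagged with the service name."""

    def __init__(self, service: str):
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "service": self.service,
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed as extra={"fields": {...}} become top-level keys.
        entry.update(getattr(record, "fields", {}))
        if record.exc_info:
            entry["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def get_logger(service: str, stream=sys.stdout) -> logging.Logger:
    logger = logging.getLogger(service)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.StreamHandler(stream)
        handler.setFormatter(JsonFormatter(service))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False
    return logger


# Usage: get_logger("vision").info("ocr complete", extra={"fields": {"regions": 42}})
```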

Demonstrates

  • Deploying containerised services orchestrated with Docker Compose.
  • Detecting, segmenting and extracting text and layout using OCR.
  • Using RAG to ground the LLM in retrieved context, reducing hallucinations.
  • Building a full-stack app with a clean, modular frontend and backend.
  • Implementing structured logging and debugging endpoints across services.
  • Hosting and configuring local large language models.

Tech Stack

Python, FastAPI, Docker, Ollama, Qdrant, PaddleOCR, OpenCV, SQLAlchemy, Pandas

Possible Extensions

  • Long-term memory via a wiki layer that builds on the LLM's accumulated insights and context.
  • Conversational queries using previous messages and retrieved context to support follow-up questions.
  • Multi-user authentication.
  • Long-term nutritional analytics and progress tracking.