Menu-to-Insight: OCR & RAG Pipeline

Python, FastAPI, Docker, Ollama, Qdrant, PaddleOCR, OpenCV, SQL

Overview

This project was built to gain experience deploying a multi-service, containerised application using Docker, as well as grounding a local LLM in a knowledge base.

The system runs a RAG pipeline in three stages: ingest menu images via OCR -> retrieve context through Qdrant vector search -> generate food recommendations using a local Ollama model (Qwen2.5).

A limitation I encountered is that the LLM starts from scratch on every query: it retrieves the raw data again and rediscovers the same knowledge. To improve this, I plan to implement a wiki layer that accumulates the LLM's insights, summaries and contradictions, so that it can give more contextual answers.

What it does

Image Processing (OCR):

Ingests images of restaurant menus
Preprocesses and enhances images to improve text clarity
Detects text regions using PaddleOCR
Extracts text from each region
Reconstructs the extracted text into structured menu items
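
The reconstruction step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the OCR stage has already been reduced to a list of text fragments with a left edge and a vertical centre (this is not PaddleOCR's native output format), and `Detection`, `MenuItem`, `group_into_rows`, `parse_row` and `y_tolerance` are hypothetical names. Fragments that sit on roughly the same horizontal line are grouped into a row, and a trailing number is treated as the price.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Detection:
    """One recognised text fragment (assumed shape, not PaddleOCR's raw output)."""
    text: str
    x: float  # left edge of the text box, in pixels
    y: float  # vertical centre of the text box, in pixels


@dataclass
class MenuItem:
    name: str
    price: Optional[float]


# Optional currency symbol, then a number at the very end of the line.
PRICE_RE = re.compile(r"[$£€]?\s*(\d+(?:\.\d{1,2})?)\s*$")


def group_into_rows(detections: List[Detection], y_tolerance: float = 10.0) -> List[List[Detection]]:
    """Cluster fragments whose vertical centres are within y_tolerance of the previous one."""
    rows: List[List[Detection]] = []
    for det in sorted(detections, key=lambda d: d.y):
        if rows and abs(det.y - rows[-1][-1].y) <= y_tolerance:
            rows[-1].append(det)
        else:
            rows.append([det])
    # Within each row, read fragments left to right.
    return [sorted(row, key=lambda d: d.x) for row in rows]


def parse_row(row: List[Detection]) -> MenuItem:
    """Join a row into one line and split off a trailing price if there is one."""
    line = " ".join(d.text.strip() for d in row)
    match = PRICE_RE.search(line)
    if match:
        # Strip spaces and dotted leaders ("Soup ..... 4") from the name.
        name = line[: match.start()].strip(" .-")
        return MenuItem(name=name, price=float(match.group(1)))
    return MenuItem(name=line, price=None)
```

Rows without a trailing number (section headings, descriptions) come back with `price=None`, which leaves the decision about how to treat them to a later step.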

Analysis & Generation (RAG):

Converts menu items into embeddings
Semantically searches a vector database of verified foods
Infers nutritional information from the most relevant matches
Applies similarity thresholds to enforce high-confidence retrieval
Generates personalised recommendations with reasoning constrained to the retrieved context
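
The threshold and grounding steps above can be illustrated with a small, library-free sketch. It assumes embeddings are plain lists of floats and that each verified food carries a nutrition dictionary with `kcal` and `protein_g` keys; the function names, the `0.75` threshold and the prompt wording are all assumptions for illustration. In Qdrant itself, the same cut-off is normally expressed through the `score_threshold` search parameter rather than computed by hand.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Food = Tuple[str, Sequence[float], Dict[str, float]]   # (name, embedding, nutrition)
Match = Tuple[float, str, Dict[str, float]]            # (score, name, nutrition)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Sequence[float], foods: List[Food],
             threshold: float = 0.75, top_k: int = 3) -> List[Match]:
    """Return the best-scoring foods, dropping anything below the threshold."""
    scored = [(cosine_similarity(query_vec, vec), name, nutrition)
              for name, vec, nutrition in foods]
    kept = [s for s in scored if s[0] >= threshold]
    return sorted(kept, key=lambda s: s[0], reverse=True)[:top_k]


def build_prompt(menu_item: str, matches: List[Match]) -> Optional[str]:
    """Build a prompt restricted to retrieved context, or None if nothing passed."""
    if not matches:
        # No confident match: skip generation rather than let the model guess.
        return None
    context = "\n".join(
        f"- {name}: {nutrition['kcal']} kcal, {nutrition['protein_g']} g protein"
        for _, name, nutrition in matches
    )
    return (
        "Answer using ONLY the verified foods listed below. "
        "If they are not sufficient, say so.\n"
        f"Menu item: {menu_item}\n"
        f"Verified foods:\n{context}"
    )
```

Returning `None` when no match clears the threshold keeps low-confidence items out of the prompt entirely, which is one simple way to enforce the high-confidence retrieval described above.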

Food Logging:

Allows users to search and log foods with nutritional values
Provides an encouraging message after each log, reflecting on the meal and suggesting a small improvement
Dynamically tracks progress against calorie and protein targets
Allows users to edit entries and personalise their profile
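
Progress tracking against targets comes down to summing the logged entries and comparing the totals with the user's goals. The sketch below assumes a simple per-day list of entries; `LogEntry`, `Targets`, `progress` and the field names are hypothetical and not taken from the project's schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LogEntry:
    food: str
    kcal: float
    protein_g: float


@dataclass
class Targets:
    kcal: float
    protein_g: float


def _percent(total: float, target: float) -> float:
    """Percentage of target reached; 0 when no target is set."""
    return round(100 * total / target, 1) if target > 0 else 0.0


def progress(entries: List[LogEntry], targets: Targets) -> Dict[str, float]:
    """Summarise the day's totals against calorie and protein targets."""
    total_kcal = sum(e.kcal for e in entries)
    total_protein = sum(e.protein_g for e in entries)
    return {
        "kcal": total_kcal,
        "protein_g": total_protein,
        # Negative kcal_remaining means the calorie target was exceeded.
        "kcal_remaining": targets.kcal - total_kcal,
        "protein_remaining_g": max(targets.protein_g - total_protein, 0),
        "kcal_pct": _percent(total_kcal, targets.kcal),
        "protein_pct": _percent(total_protein, targets.protein_g),
    }
```

Because the summary is recomputed from the entries each time, editing or deleting a log entry updates the progress figures without any extra bookkeeping.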

Architecture

The system is built as a set of containerised services, each focused on a separate responsibility:

API – Accepts images, orchestrates services and sends context to the LLM.
Vision – Preprocesses images and extracts text using PaddleOCR and OpenCV.
Vector – Generates embeddings and performs vector search using Qdrant.
Database – Stores nutritional data and food logs using SQLite and SQLAlchemy.
Local LLM – An Ollama-hosted model that generates recommendations grounded in retrieved context.
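
To make the API service's orchestration role concrete, here is a hedged sketch of the request flow. The three collaborators are passed in as plain callables so the flow can be read (and tested) without the other containers running; `recommend`, `extract_items`, `search_foods` and the prompt text are hypothetical. `ollama_generate` uses Ollama's documented `/api/generate` endpoint, but the host name, port and model tag are assumptions: inside Docker Compose the host would usually be the Ollama service's name rather than `localhost`.

```python
import json
import urllib.request
from typing import Callable, List


def ollama_generate(prompt: str, model: str = "qwen2.5",
                    base_url: str = "http://localhost:11434") -> str:
    """Send one non-streaming generation request to an Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]


def recommend(image_bytes: bytes,
              extract_items: Callable[[bytes], List[str]],
              search_foods: Callable[[str], List[str]],
              generate: Callable[[str], str]) -> str:
    """Vision -> Vector -> LLM, keeping only menu items that have retrieved context."""
    items = extract_items(image_bytes)                        # Vision service
    context = {item: search_foods(item) for item in items}    # Vector service
    grounded = {item: hits for item, hits in context.items() if hits}
    if not grounded:
        return "No confident matches were found for this menu."
    lines = [f"{item}: {', '.join(hits)}" for item, hits in grounded.items()]
    prompt = "Recommend dishes using only this context:\n" + "\n".join(lines)
    return generate(prompt)                                   # Local LLM
```

Injecting the callables also reflects the swap-ability goal: replacing the model or the OCR engine means passing a different function, not changing the orchestration.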

Each service follows a consistent internal structure:

Routes expose APIs and handle request/response flow
Services encapsulate business logic, keeping routes focused on orchestration
Schemas enforce data structure and ensure consistency across services
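
The route / service / schema split can be shown in miniature. The sketch below is deliberately framework-free so it runs on the standard library alone; in a FastAPI service the schemas would typically be Pydantic models and the route a decorated handler. All names are hypothetical, and the lookup table is an illustrative stand-in for the Database service, not real project data.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Tuple


# --- Schemas: define and validate the data that crosses the service boundary ---
@dataclass(frozen=True)
class FoodLogRequest:
    food: str
    grams: float

    def __post_init__(self) -> None:
        if not isinstance(self.food, str) or not self.food.strip():
            raise ValueError("food must be a non-empty string")
        if self.grams <= 0:
            raise ValueError("grams must be positive")


@dataclass(frozen=True)
class FoodLogResponse:
    food: str
    kcal: float


# --- Service: business logic only, no knowledge of HTTP ---
KCAL_PER_100G: Dict[str, float] = {"banana": 89.0}  # illustrative value only


def log_food(request: FoodLogRequest) -> FoodLogResponse:
    per_100g = KCAL_PER_100G.get(request.food.lower())
    if per_100g is None:
        raise KeyError(request.food)
    return FoodLogResponse(food=request.food, kcal=round(per_100g * request.grams / 100, 1))


# --- Route: request/response flow, translating errors into status codes ---
def post_food_log(body: dict) -> Tuple[int, dict]:
    try:
        request = FoodLogRequest(**body)
    except (TypeError, ValueError) as exc:
        return 422, {"detail": str(exc)}
    try:
        response = log_food(request)
    except KeyError:
        return 404, {"detail": "unknown food"}
    return 200, asdict(response)
```

The route never computes anything itself and the service never sees a status code, which is the separation the list above describes.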

Each service is observable through structured logging and debugging endpoints. I focused on keeping the services modular to support scalability and maintainability, so that components, Docker images and LLMs can be swapped easily while the system stays robust.
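
One common way to get structured logs across several services is a JSON formatter on the standard `logging` module, sketched below. The field names (`ts`, `service`, `fields`) and the helper `get_logger` are assumptions for illustration; the project's actual log format may differ.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object tagged with the service name."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "service": self.service,
            "message": record.getMessage(),
        }
        # Extra key/value pairs passed via extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def get_logger(service: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stderr, configured only once."""
    logger = logging.getLogger(service)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter(service))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False
    return logger
```

A call such as `get_logger("vision").info("ocr complete", extra={"fields": {"regions": 14}})` then emits a single JSON line, so logs from all containers can be filtered by `service` in one place.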

Demonstrates

Deploying containerised services orchestrated with Docker Compose.
Detecting, segmenting and extracting text and layout using OCR.
Using RAG to ground the LLM in retrieved context, reducing hallucinations.
Building a full-stack app with a clean, modular frontend and backend.
Implementing structured logging and debugging endpoints across services.
Hosting and configuring local large language models.

Tech Stack

Python, FastAPI, Docker, Ollama, Qdrant, PaddleOCR, OpenCV, SQLAlchemy, Pandas

Possible Extensions

Long-term memory via a wiki layer that builds on the LLM's insights and context.
Conversational queries using previous messages and retrieved context to support follow-up questions.
Multi-user authentication.
Long-term nutritional analytics and progress tracking.