Architecture Overview

The Visual RAG pipeline consists of the following key components:

  1. Data Ingestion: PDFs are uploaded to Apolo’s object storage and processed by a job that uses ColPali to generate embeddings for text and images.

  2. Storage:

  • LanceDB serves as the vector database for storing embeddings.

  • Apolo’s storage backend is used to persist raw data and intermediate outputs.

  3. Query Handling:

  • User queries are embedded using ColPali.

  • LanceDB returns the PDF pages whose stored embeddings (covering text and images) are most similar to the query embedding.

  4. Response Generation: A visual LLM takes the retrieved pages and the user query as input and generates an answer grounded in those pages.

  5. Visualization: Results are displayed via a Streamlit dashboard, showing the top-matched images and the LLM’s response.
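The query-handling and response-generation steps can be illustrated with a small, dependency-free sketch. ColPali represents each page as many patch vectors and each query as many token vectors, and ranks pages with ColBERT-style late interaction (MaxSim): every query vector is matched to its most similar page vector, and those maxima are summed into the page score. Everything below is an assumption for illustration only. `PageRecord`, `InMemoryPageIndex`, and `build_llm_request` are hypothetical names, the brute-force in-memory scan stands in for the LanceDB table, and the toy 2-dimensional vectors stand in for real ColPali embeddings. None of it is the pipeline's actual code or the LanceDB API.

```python
from dataclasses import dataclass
from typing import Dict, List

Vector = List[float]


@dataclass
class PageRecord:
    """One indexed PDF page (hypothetical schema, not the real LanceDB table)."""
    pdf_name: str
    page_number: int
    # ColPali-style multi-vector embedding: one vector per page-image patch.
    patch_embeddings: List[Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_embeddings: List[Vector], page: PageRecord) -> float:
    """Late interaction: take the best-matching patch for each query vector, then sum."""
    return sum(max(dot(q, p) for p in page.patch_embeddings) for q in query_embeddings)


class InMemoryPageIndex:
    """Hypothetical stand-in for LanceDB: scores every page by brute force."""

    def __init__(self) -> None:
        self.pages: List[PageRecord] = []

    def add(self, page: PageRecord) -> None:
        self.pages.append(page)

    def search(self, query_embeddings: List[Vector], top_k: int = 3) -> List[PageRecord]:
        ranked = sorted(
            self.pages,
            key=lambda page: maxsim_score(query_embeddings, page),
            reverse=True,
        )
        return ranked[:top_k]


def build_llm_request(question: str, pages: List[PageRecord]) -> Dict[str, object]:
    """Hypothetical visual-LLM payload: the user question plus references to retrieved pages."""
    return {
        "question": question,
        "pages": [f"{p.pdf_name}#page={p.page_number}" for p in pages],
    }


if __name__ == "__main__":
    index = InMemoryPageIndex()
    # Toy 2-D vectors in place of real ColPali embeddings.
    index.add(PageRecord("report.pdf", 1, [[1.0, 0.0], [0.0, 0.2]]))
    index.add(PageRecord("report.pdf", 2, [[0.1, 0.9], [0.3, 0.3]]))
    index.add(PageRecord("manual.pdf", 7, [[0.5, 0.5]]))

    query_vectors = [[0.0, 1.0], [0.2, 0.8]]  # toy query-token embeddings
    top_pages = index.search(query_vectors, top_k=2)
    print(build_llm_request("What does the chart show?", top_pages))
    # {'question': 'What does the chart show?', 'pages': ['report.pdf#page=2', 'manual.pdf#page=7']}
```

With these toy vectors the pages score 0.4, 1.64, and 1.0, so pages 2 of `report.pdf` and 7 of `manual.pdf` are passed on to the LLM. In the actual pipeline the embeddings would come from ColPali and the search from LanceDB; the brute-force scan is used here only because it makes the scoring rule explicit.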
