Architecture Overview
The Visual RAG pipeline consists of the following key components:
Data Ingestion: PDFs are uploaded to Apolo’s object storage and processed by a job that uses ColPali to generate embeddings for text and images.
Storage:
LanceDB serves as the vector database for storing embeddings.
Apolo’s storage backend is used to persist raw data and intermediate outputs.
Query Handling:
User queries are embedded using ColPali.
LanceDB retrieves the PDF pages whose stored text and image embeddings best match the query embedding.
Response Generation: A visual LLM takes the retrieved pages and the user query as input and generates an answer based on those pages.
Visualization: Results are displayed via a Streamlit dashboard, showing the top-matched images and the LLM’s response.
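To make the Query Handling step more concrete, here is a minimal sketch of how page ranking can work with ColPali-style embeddings. ColPali-style models typically represent a query and a page as several vectors each and compare them with a late-interaction (MaxSim) score. The sketch assumes that scoring scheme and uses plain Python with toy two-dimensional vectors. The function names (`maxsim_score`, `top_k_pages`), the page identifiers, and the in-memory `index` dictionary are hypothetical stand-ins, not the pipeline's actual code or the LanceDB API.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
MultiVector = Sequence[Vector]  # one vector per query token or page patch


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query: MultiVector, page: MultiVector) -> float:
    """Late-interaction score: for each query vector, take the similarity
    of its best-matching page vector, then sum over all query vectors."""
    return sum(max(dot(q, p) for p in page) for q in query)


def top_k_pages(
    query: MultiVector, index: Dict[str, MultiVector], k: int = 3
) -> List[Tuple[str, float]]:
    """Score every page in the (hypothetical) in-memory index and
    return the k highest-scoring (page_id, score) pairs."""
    scored = [(page_id, maxsim_score(query, vecs)) for page_id, vecs in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy data: in the real pipeline these vectors would come from ColPali
# and be stored in LanceDB; here they are made up for illustration.
index = {
    "report.pdf#page=1": [[1.0, 0.0], [0.0, 1.0]],
    "report.pdf#page=2": [[0.5, 0.5], [0.5, 0.5]],
    "report.pdf#page=3": [[-1.0, 0.0], [0.0, -1.0]],
}
query = [[1.0, 0.0], [0.0, 1.0]]

results = top_k_pages(query, index, k=2)
for page_id, score in results:
    print(page_id, score)
```

The top-ranked pages returned by a step like this are what the visual LLM receives alongside the user query, and what the dashboard shows as top-matched images. In a deployment the scoring would normally be delegated to the vector database instead of a Python loop over every page.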