Architecture Overview
The Visual RAG pipeline consists of the following key components:
- Data Ingestion: PDFs are uploaded to Apolo’s object storage and processed by a job that uses ColPali to generate embeddings for text and images. 
- Storage:
  - LanceDB serves as the vector database for storing embeddings.
  - Apolo’s storage backend is used to persist raw data and intermediate outputs.
- Query Handling:
  - User queries are embedded using ColPali.
  - LanceDB retrieves the most relevant PDF pages (text and image embeddings).
- Response Generation: A visual LLM takes retrieved pages and the user query as input, generating a comprehensive answer. 
- Visualization: Results are displayed via a Streamlit dashboard, showing the top-matched images and the LLM’s response. 
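
To make the Query Handling step more concrete, the sketch below ranks pages against a query using late-interaction (MaxSim) scoring, the multi-vector comparison that ColPali is designed around. It is a minimal illustration in plain Python, not the pipeline's actual code: the `PageRecord` schema, the function names, and the toy vectors are all hypothetical, and how the search is executed inside LanceDB is not described in this overview.

```python
from dataclasses import dataclass
from typing import List, Sequence

Vector = Sequence[float]


@dataclass
class PageRecord:
    """One indexed PDF page (hypothetical schema, for illustration only)."""
    doc_name: str
    page_number: int
    patch_embeddings: List[Vector]  # one vector per page patch


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_embeddings: List[Vector],
                 page_embeddings: List[Vector]) -> float:
    """Late-interaction score: for each query vector, take its best match
    among the page vectors, then sum those maxima."""
    return sum(
        max(dot(q, p) for p in page_embeddings)
        for q in query_embeddings
    )


def retrieve_top_pages(query_embeddings: List[Vector],
                       pages: List[PageRecord],
                       k: int = 3) -> List[PageRecord]:
    """Return the k pages with the highest MaxSim score, best first."""
    ranked = sorted(
        pages,
        key=lambda page: maxsim_score(query_embeddings, page.patch_embeddings),
        reverse=True,
    )
    return ranked[:k]


# Toy 2-dimensional embeddings standing in for real ColPali output.
toy_pages = [
    PageRecord("report.pdf", 1, [[1, 0], [1, 0]]),
    PageRecord("report.pdf", 2, [[1, 0], [0, 1]]),
    PageRecord("manual.pdf", 7, [[0.5, 0.5]]),
]
toy_query = [[1, 0], [0, 1]]

top_pages = retrieve_top_pages(toy_query, toy_pages, k=2)
```

In the full pipeline, the pages returned by a step like `retrieve_top_pages` would be passed, together with the user query, to the visual LLM for response generation.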
