The Apolo Documentation Chatbot enables users to query Apolo’s technical documentation seamlessly, providing quick and accurate answers. Here’s how we built it.
Step 1: Set Up the Apolo RAG Architecture
The first step involves preparing the data infrastructure to support efficient querying and response generation. Here's what we'll do:
Define the data storage structure: Create a PostgreSQL schema with vector extensions to store embeddings and enable full-text indexing for fast retrieval.
Chunk the documentation: Preprocess the Apolo documentation into manageable text chunks for embeddings and efficient retrieval.
Generate embeddings: Use an embedding LLM to convert text chunks into numerical representations for semantic search.
Ingest data into PostgreSQL: Store the processed chunks and their embeddings in the database for future queries.
Here’s how we implemented this:
```python
import glob
import os
from itertools import chain

# LangChain imports (module paths may differ slightly between LangChain versions)
from langchain_community.document_loaders import UnstructuredMarkdownLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter


def build_apolo_docs_rag():
    table_name = "apolo_docs"
    chunk_size = 1024
    chunk_overlap = 100

    print("1. Processing data")
    apolo_docs_path = clone_repo_to_tmp(
        repo_url="https://github.com/neuro-inc/platform-docs.git"
    )
    markdown_files = glob.glob(
        os.path.join(apolo_docs_path, "**/*.md"), recursive=True
    )
    docs = list(
        chain.from_iterable(
            [UnstructuredMarkdownLoader(f).load() for f in markdown_files]
        )
    )
    chunks = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap
    ).split_documents(docs)

    print("2. Get embeddings")
    sentences = [x.page_content for x in chunks]
    embeddings = get_embeddings(sentences=sentences)

    print("3. Ingest data")
    create_schema(table_name=table_name, dimensions=len(embeddings[0]))
    insert_data(
        table_name=table_name,
        embeddings=embeddings,
        sentences=sentences,
        batch_size=64,
    )
```
Breaking Down the Steps
Processing Data:
The clone_repo_to_tmp() function pulls the Apolo documentation repository, and the UnstructuredMarkdownLoader processes .md files into raw text. The text is then chunked into overlapping segments using RecursiveCharacterTextSplitter, which ensures each chunk retains contextual relevance.
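The clone_repo_to_tmp() helper itself isn't shown in the snippet above. A minimal sketch of what it could look like, assuming a plain shallow `git clone` into a temporary directory (the function name and signature come from the call above; the body is an illustration, not the original implementation):

```python
import subprocess
import tempfile


def clone_repo_to_tmp(repo_url: str) -> str:
    """Clone a git repository into a temporary directory and return its path."""
    tmp_dir = tempfile.mkdtemp(prefix="apolo-docs-")
    # A shallow clone is enough: we only need the latest Markdown files.
    subprocess.run(["git", "clone", "--depth", "1", repo_url, tmp_dir], check=True)
    return tmp_dir
```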
Generating Embeddings:
To represent text chunks numerically, we use the get_embeddings() function. It leverages the embedding LLM hosted on Apolo’s platform to create vector representations for semantic search.
```python
from typing import List

from tqdm import tqdm


def get_embeddings(sentences: List[str], batch_size: int = 4) -> List[List[float]]:
    embeddings = []
    embedding_client = get_embedding_client()
    for i in tqdm(range(0, len(sentences), batch_size)):
        sentences_batch = sentences[i : i + batch_size]
        response_batch = embedding_client.embeddings.create(
            input=sentences_batch, model="tgi"
        )
        embeddings.extend([x.embedding for x in response_batch.data])
    return embeddings
```
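The get_embedding_client() helper isn't shown either. Since the call passes model="tgi", the embedding model is presumably served behind an OpenAI-compatible endpoint on Apolo, so the client is likely just the OpenAI SDK pointed at that endpoint. The base URL and environment variable names below are placeholders, not the actual values:

```python
import os

from openai import OpenAI


def get_embedding_client() -> OpenAI:
    # Placeholder configuration: point the OpenAI-compatible client at the
    # embedding server deployed on Apolo (the real URL/key live elsewhere).
    return OpenAI(
        base_url=os.environ["EMBEDDING_API_URL"],  # e.g. "https://<embedding-app>/v1"
        api_key=os.environ.get("EMBEDDING_API_KEY", "dummy"),
    )
```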
Ingesting Data: The processed chunks and embeddings are stored in PostgreSQL. Using vector extensions (pgvector), we create a table with a schema that supports vector-based operations for semantic search.
```python
def create_schema(table_name: str, dimensions: int):
    conn = get_db_connection()
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
    conn.execute("CREATE EXTENSION IF NOT EXISTS pg_trgm")
    register_vector(conn)
    conn.execute(f"DROP TABLE IF EXISTS {table_name}")
    conn.execute(
        f"CREATE TABLE {table_name} "
        f"(id bigserial PRIMARY KEY, content text, embedding vector({dimensions}))"
    )
    conn.execute(
        f"CREATE INDEX ON {table_name} USING GIN (to_tsvector('english', content))"
    )
```
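The insert_data() function called from build_apolo_docs_rag() isn't reproduced in the post. A minimal sketch, assuming the same psycopg 3 connection helper and pgvector registration used above, might look like this:

```python
from typing import List

import numpy as np
from pgvector.psycopg import register_vector  # pgvector adapter for psycopg 3


def insert_data(
    table_name: str,
    embeddings: List[List[float]],
    sentences: List[str],
    batch_size: int = 64,
):
    conn = get_db_connection()  # same helper as in create_schema()
    register_vector(conn)
    with conn.cursor() as cur:
        for i in range(0, len(sentences), batch_size):
            # Insert each batch of (chunk text, embedding vector) pairs.
            batch = list(
                zip(
                    sentences[i : i + batch_size],
                    (np.array(e) for e in embeddings[i : i + batch_size]),
                )
            )
            cur.executemany(
                f"INSERT INTO {table_name} (content, embedding) VALUES (%s, %s)",
                batch,
            )
    conn.commit()
```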
Step 2: Query the Apolo Documentation
Once the RAG architecture is set up, the next step is enabling queries. The system retrieves relevant documentation chunks, generates a response using a generative LLM, and logs the interaction for continuous improvement.
Here’s the query flow (a code sketch of these steps follows the list):
Retrieve relevant chunks:
Use semantic search to find embeddings closest to the query embedding.
Use keyword search for matching phrases or terms in the text.
Re-rank results: Combine results from semantic and keyword searches and sort them by relevance using a reranker model.
Generate the response: Augment the top-ranked chunks with the user query to create a context-rich prompt for the generative LLM.
Log results: Store the query, context, and response in Argilla for feedback and future fine-tuning.
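The post doesn't show the query-side code, but the flow above maps fairly directly onto pgvector plus an OpenAI-compatible chat endpoint. The sketch below is an approximation under those assumptions: it reuses get_db_connection() and get_embeddings() from Step 1, the cross-encoder used for re-ranking and the chat endpoint/model names are stand-ins, and the Argilla logging step is only indicated in a comment.

```python
import os

import numpy as np
from openai import OpenAI
from pgvector.psycopg import register_vector
from sentence_transformers import CrossEncoder


def answer_question(query: str, table_name: str = "apolo_docs", top_k: int = 5) -> str:
    conn = get_db_connection()
    register_vector(conn)

    # 1. Semantic search: nearest stored embeddings by cosine distance.
    query_embedding = np.array(get_embeddings(sentences=[query])[0])
    semantic_hits = conn.execute(
        f"SELECT content FROM {table_name} ORDER BY embedding <=> %s LIMIT %s",
        (query_embedding, top_k),
    ).fetchall()

    # 2. Keyword search: full-text match against the GIN-indexed content column.
    keyword_hits = conn.execute(
        f"SELECT content FROM {table_name} "
        "WHERE to_tsvector('english', content) @@ plainto_tsquery('english', %s) "
        "LIMIT %s",
        (query, top_k),
    ).fetchall()

    # 3. Re-rank the merged, deduplicated candidates with a cross-encoder (stand-in model).
    candidates = list(dict.fromkeys(row[0] for row in semantic_hits + keyword_hits))
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = reranker.predict([(query, doc) for doc in candidates])
    ranked = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)][:top_k]

    # 4. Generate the answer from a context-augmented prompt (placeholder endpoint/model).
    context = "\n\n".join(ranked)
    llm = OpenAI(base_url=os.environ["LLM_API_URL"], api_key="dummy")
    response = llm.chat.completions.create(
        model="llm",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    answer = response.choices[0].message.content

    # 5. Log the query, retrieved context, and answer to Argilla for feedback (omitted here).
    return answer
```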