Leveraging Retrieval‑Augmented Generation for Persian University Knowledge Retrieval
Published in 15th IKT (accepted – oral), 2024
In this work, the authors develop a two‑stage retrieval‑augmented generation (RAG) pipeline that answers questions about university resources using locally scraped documents. Each query is first categorized to identify the most relevant subset of documents; a Persian large language model then generates the answer from that subset using a carefully engineered prompt. The paper introduces UniversityQuestionBench (UQB), a benchmark built from questions frequently asked by students across disciplines, and evaluates the RAG system on three metrics: faithfulness, answer relevance, and context relevance. Experiments show that adding the retrieval steps significantly improves the precision and contextual relevance of generated answers compared with baseline models.
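The two-stage flow described above (categorize the query, then generate from the matching document subset) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy English corpus, the keyword-based categorizer, the token-overlap retriever, the prompt wording, and all function names are hypothetical. The actual system works on Persian documents and passes the prompt to a Persian large language model, a step this sketch leaves out.

```python
from collections import Counter

# Hypothetical toy corpus: category -> document snippets (not the paper's data).
CORPUS = {
    "registration": [
        "Course registration opens two weeks before the semester starts.",
        "Students can add or drop courses during the first week of the semester.",
    ],
    "library": [
        "The central library is open from 8 to 20 on weekdays.",
        "Books can be borrowed for up to 14 days with a student card.",
    ],
}

# Hypothetical keyword sets standing in for the paper's query categorizer.
CATEGORY_KEYWORDS = {
    "registration": {"course", "courses", "register", "registration", "drop", "semester"},
    "library": {"library", "book", "books", "borrow", "borrowed"},
}


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [word.strip(".,?!").lower() for word in text.split()]


def categorize(query):
    """Stage 1: pick the category whose keywords overlap the query most."""
    tokens = set(tokenize(query))
    scores = {cat: len(tokens & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    return max(scores, key=scores.get)


def retrieve(query, category, k=1):
    """Rank only the documents of the chosen category by token overlap."""
    query_counts = Counter(tokenize(query))

    def overlap(doc):
        # Multiset intersection counts the tokens shared by query and document.
        return sum((query_counts & Counter(tokenize(doc))).values())

    return sorted(CORPUS[category], key=overlap, reverse=True)[:k]


def build_prompt(query, contexts):
    """Stage 2 input: a prompt that restricts the model to the retrieved context."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}\n"
        "Answer:"
    )


def prepare(query, k=1):
    """Run categorization and retrieval, then assemble the generation prompt."""
    category = categorize(query)
    contexts = retrieve(query, category, k)
    return category, contexts, build_prompt(query, contexts)
```

Calling `prepare("How long can books be borrowed from the library?")` returns the chosen category, the retrieved snippets, and the prompt string that would be sent to the generator. Restricting retrieval to one category keeps the candidate set small, which is the motivation the abstract gives for categorizing queries first.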