Adaptive Chunking for VideoRAG Pipelines with a Newly Gathered Bilingual Educational Dataset

Published in 29th CSICC (accepted), 2025

This work extends retrieval-augmented generation (RAG) to the video domain by introducing adaptive chunking: a method for segmenting long videos into meaningful units and encoding those units before retrieval. Using a newly gathered bilingual educational video dataset, the authors build a video-to-text pipeline in which a vision-language model summarises each segment and a language model synthesises answers to questions. Adaptive chunking lets the system balance context length against relevance, and in the experiments presented at the 29th Computer Society of Iran Conference on Informatics and Computing (CSICC) it yields improved accuracy over fixed-length chunking strategies.
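The summary above does not state the exact criterion the authors use to place chunk boundaries, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's method. It assumes each video frame (or short clip) has already been embedded as a vector, and it starts a new chunk wherever the cosine distance between consecutive embeddings exceeds a threshold, subject to minimum and maximum chunk lengths. A fixed-length chunker is included for contrast. The function names `adaptive_chunks` and `fixed_chunks` and the parameters `threshold`, `min_len` and `max_len` are hypothetical.

```python
from __future__ import annotations

import numpy as np


def fixed_chunks(n_frames: int, size: int) -> list[tuple[int, int]]:
    """Baseline: split frames [0, n_frames) into consecutive windows of `size` frames."""
    return [(s, min(s + size, n_frames)) for s in range(0, n_frames, size)]


def adaptive_chunks(
    emb: np.ndarray, threshold: float, min_len: int, max_len: int
) -> list[tuple[int, int]]:
    """Hypothetical adaptive chunker: cut where consecutive frame embeddings diverge.

    emb has shape (n_frames, dim) with non-zero rows. Returns half-open
    (start, end) frame ranges that cover every frame exactly once.
    """
    if len(emb) == 0:
        return []
    # Normalise rows so the dot product of neighbouring frames is their cosine similarity.
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    chunks, start = [], 0
    for i in range(1, len(unit)):
        length = i - start
        dist = 1.0 - float(unit[i - 1] @ unit[i])  # cosine distance to the previous frame
        # Cut on a large content change (once the chunk is long enough),
        # or unconditionally when the chunk reaches the length cap.
        if (dist > threshold and length >= min_len) or length >= max_len:
            chunks.append((start, i))
            start = i
    chunks.append((start, len(unit)))  # close the final chunk
    return chunks


if __name__ == "__main__":
    # Two synthetic "scenes": 6 frames pointing one way, then 6 frames pointing another.
    emb = np.vstack([np.tile([1.0, 0.0], (6, 1)), np.tile([0.0, 1.0], (6, 1))])
    print(adaptive_chunks(emb, threshold=0.5, min_len=2, max_len=10))  # [(0, 6), (6, 12)]
    print(fixed_chunks(12, 5))  # [(0, 5), (5, 10), (10, 12)]
```

In this toy example the adaptive chunker places its single boundary at the scene change, whereas the fixed-length chunker splits the first scene and merges its tail with the second one. The `max_len` cap is one simple way to keep each chunk within a context budget, which mirrors the trade-off between context length and relevance mentioned above.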