MEENA (PersianMMMU): Multimodal‑Multilingual Educational Exams for N‑level Assessment
Under review (COLM 2025), 2025
Presents the first large‑scale Persian multimodal benchmark for evaluating vision‑language models on scientific reasoning, problem‑solving, and human‑level understanding. The benchmark contains 7,500 Persian and 3,000 English multimodal questions with rich metadata, including difficulty levels, descriptive answers, and student success rates. The paper evaluates GPT‑4, Gemini, and other models under zero‑shot, few‑shot, and hallucination‑detection settings.