Publications

MEENA (PersianMMMU): Multimodal‑Multilingual Educational Exams for N‑level Assessment

Under review at COLM, 2025

Presents the first large‑scale Persian multimodal benchmark for evaluating vision‑language models on scientific reasoning, problem‑solving and human‑level understanding. It contains 7,500 Persian and 3,000 English multimodal questions with rich metadata such as difficulty level, descriptive answers and student success rates, and evaluates GPT‑4, Gemini and other models under zero‑shot, few‑shot and hallucination‑detection settings.

Download here

Context Awareness Gate for Retrieval‑Augmented Generation

Published in 15th IKT (accepted), 2024

Introduces the Context Awareness Gate (CAG), a mechanism that dynamically decides whether a query requires external context retrieval in a retrieval‑augmented generation pipeline. Includes a vector‑candidates method for scalable, LLM‑independent semantic search and demonstrates that skipping unnecessary retrieval improves answer quality.
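The gating idea can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes a set of "vector candidates" (embeddings of queries known to need external context) and a similarity threshold, and uses a toy character‑count embedding in place of a real sentence‑embedding model. All names and the threshold value are illustrative.

```python
import math

def embed(text: str) -> list[float]:
    # Toy bag-of-characters embedding, normalized to unit length.
    # A real pipeline would use a sentence-embedding model here.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Both vectors are unit-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

def needs_retrieval(query: str, candidates: list[str], threshold: float = 0.8) -> bool:
    # Gate decision: retrieve only if the query is similar enough
    # to at least one candidate query that requires external context.
    q = embed(query)
    return max(cosine(q, embed(c)) for c in candidates) >= threshold

candidates = ["what is the tuition fee", "library opening hours"]
print(needs_retrieval("what are the tuition fees", candidates))  # → True
print(needs_retrieval("define gravity", candidates))             # → False
```

When the gate returns False, the pipeline answers directly from the LLM and skips the retrieval call entirely, which is the behavior the paper credits with improving answer quality.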

Download here

Leveraging Retrieval‑Augmented Generation for Persian University Knowledge Retrieval

Published in 15th IKT (accepted – oral), 2024

Proposes a two‑stage retrieval‑augmented generation pipeline that combines Persian large language models with tailored prompt engineering to answer university‑related queries. Introduces the UniversityQuestionBench dataset and evaluates performance using faithfulness, answer relevance and context relevance metrics.
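The two‑stage structure can be sketched as follows. This is a hedged illustration under stated assumptions: stage 1 stands in for dense retrieval with a simple word‑overlap ranker, and stage 2 shows only the prompt‑assembly step that would precede generation by a Persian LLM. Function names and the prompt template are hypothetical.

```python
def stage1_retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Stage 1: rank documents by word overlap with the query
    # (a stand-in for an embedding-based retriever) and keep the top k.
    q_words = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def stage2_build_prompt(query: str, contexts: list[str]) -> str:
    # Stage 2: assemble the retrieved passages into the tailored
    # prompt that the language model would answer.
    ctx = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only the context below.\nContext:\n{ctx}\nQuestion: {query}"

corpus = [
    "Registration for the fall semester opens in September.",
    "The library is open until midnight during exams.",
    "Tuition fees are due before the semester starts.",
]
query = "When does fall registration open?"
print(stage2_build_prompt(query, stage1_retrieve(query, corpus)))
```

Separating retrieval from prompt construction keeps each stage independently testable, which is what makes metrics like context relevance (stage 1) and faithfulness (stage 2) measurable in isolation.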

Download here

Hidden in Plain Sight: Evaluating Abstract Shape Recognition in Vision‑Language Models

Published in NeurIPS 2024 (Datasets & Benchmarks Track), 2024

Introduces IllusionBench, a dataset that hides letters, faces and animals inside everyday scenes to audit whether modern vision‑language models can recognize abstract shapes. Human subjects achieve near‑perfect accuracy on the tasks, whereas state‑of‑the‑art models score below 40% in the zero‑shot setting, revealing significant robustness gaps.

Download here

CLIP Exhibits Improved Compositional Generalization Through Representation Disentanglement

Preprint (submitted to ICLR 2024), 2023

Investigates how the compositional out‑of‑distribution generalization of CLIP models emerges from training data diversity and representation disentanglement. Demonstrates that richer attribute–object combinations in the training set lead to improved performance and that disentangling image and text representations enhances compositional generalization.

Download here