Arshia Hemmat

Hello, I'm Arshia Hemmat

I am a researcher in the Department of Computer Science at the University of Oxford and a Computer Engineering student at the University of Isfahan. My research focuses on trustworthy multimodal AI, including generative models, retrieval‑augmented pipelines and evaluation benchmarks for vision‑language systems. I am passionate about developing models that generalise from limited data and about building open datasets for the research community.

Selected Publications

Hidden in Plain Sight

NeurIPS 2024 – Datasets & Benchmarks

IllusionBench hides letters, faces and animals inside everyday scenes to audit whether vision‑language models recognise abstract shapes. Humans achieve near‑perfect scores, while state‑of‑the‑art models struggle.

Retrieval‑Augmented Generation for Persian University Knowledge

15th IKT 2024 – Oral

Presents a two‑stage RAG pipeline that combines Persian large language models with prompt engineering to answer university‑related queries with high faithfulness, and introduces the UniversityQuestionBench benchmark.

Context Awareness Gate

15th IKT 2024 – Accepted

Proposes a gate that decides, before retrieval, whether a query requires external context. Skipping retrieval for queries that do not need it improves the quality of RAG systems.

LLM Requirement Engineering Survey

Frontiers in Computer Science 2025

A systematic review of how large language models support requirement engineering, highlighting applications, challenges and future research directions.

Adaptive Chunking for VideoRAG

29th CSICC 2025 – Accepted

Extends retrieval‑augmented generation to the video domain using adaptive chunking, which segments long videos into meaningful units before retrieval.

Advanced Mutation Testing

29th CSICC 2025 – Under review

Explores zero‑ and few‑shot mutation testing using GPT‑4 to automatically evaluate software test suites and highlight weak spots in code bases.

MEENA (PersianMMMU)

COLM 2025 – Under review

Introduces a large‑scale multimodal benchmark of Persian and English educational exams for assessing scientific reasoning and problem solving in vision‑language models.

Get in Touch

I'm always happy to discuss research collaborations or answer questions. Feel free to reach out!

Email (University): arshia@robots.ox.ac.uk

Email (Gmail): Arshiahemmat6@gmail.com

LinkedIn: linkedin.com/in/arshiahemmat

Location: Tehran, Iran