IndicIFEval is an evaluation dataset to assess the instruction-following capability of LLMs in Indic languages with verifiable, rule-based constraints.
It currently covers the following 14 Indic languages, in addition to English:
| Language (code) | Language (code) |
|-----------------|-----------------|
| Assamese (as) | Nepali (ne) |
| Bengali (bn) | Odia (or) |
| Gujarati (gu) | Punjabi (pa) |
| Hindi (hi) | Sanskrit (sa) |
| Kannada (kn) | Tamil (ta) |
| Malayalam (ml) | Telugu (te) |
| Marathi (mr) | Urdu (ur) |
This repository contains the complete codebase for the IndicIFEval benchmark, from data creation to benchmark evaluation. It includes the scripts used to create indicifeval-trans, the pipeline used to create indicifeval-ground, and the custom task configurations required to evaluate models with lm-evaluation-harness.
Clone the repository and install the required dependencies. Please refer to the README in each directory for detailed usage.
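A typical setup might look like the following; note that the repository URL and the requirements file name are assumptions, so substitute the actual ones for this project:

```shell
# Clone the repository (URL is a placeholder) and enter it.
git clone https://github.com/<org>/IndicIFEval.git
cd IndicIFEval

# Install dependencies; the requirements file name is an assumption.
pip install -r requirements.txt
```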
The indicifeval-trans directory contains scripts to translate the English IFEval dataset into 14 Indic languages. Navigate to this directory and execute the main translation script to generate the localized prompts.
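As a sketch, the translation step might be invoked as below; the script name and its flags are hypothetical, so check the directory's README for the actual interface:

```shell
cd indicifeval-trans

# Hypothetical entry point and flags -- translate the English IFEval
# prompts into one target Indic language (Hindi here).
python translate.py --source-lang en --target-lang hi --output data/ifeval_hi.jsonl
```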
The indicifeval-ground directory houses the pipeline for synthetically generating instructions from native Indic content.
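A minimal sketch of running the generation pipeline, assuming a single entry-point script; the script name and options shown are hypothetical and should be taken from this directory's README:

```shell
cd indicifeval-ground

# Hypothetical pipeline invocation -- generate verifiable instructions
# from native Tamil content.
python run_pipeline.py --language ta --output data/ground_ta.jsonl
```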
We use the Language Model Evaluation Harness for benchmarking. The lm-evaluation-harness directory contains the custom task configurations required for our tasks. Run the evaluation harness, specifying the model to evaluate and the relevant task configuration.
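An evaluation run might look like the following. The `lm_eval` flags shown are standard lm-evaluation-harness options, but the task name (`indicifeval_hi`), the config path, and the model are assumptions used for illustration; use the task names defined in this repository:

```shell
# Register the custom task configs via --include_path, then run one task.
lm_eval --model hf \
    --model_args pretrained=meta-llama/Llama-3.1-8B-Instruct \
    --include_path ./lm-evaluation-harness/tasks \
    --tasks indicifeval_hi \
    --batch_size 8
```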
If you use IndicIFEval in your work, please cite us:
@article{jayakumar2026indicifeval,
  title={IndicIFEval: A Benchmark for Verifiable Instruction-Following Evaluation in 14 Indic Languages},
  author={Thanmay Jayakumar and Mohammed Safi Ur Rahman Khan and Raj Dabre and Ratish Puduppully and Anoop Kunchukuttan},
  year={2026},
  eprint={2602.22125},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2602.22125},
}

This dataset is released under the CC BY 4.0 license.