Research Engineer at MBZUAI focused on shipping reasoning-first and multilingual LLMs with rigorous evaluation. I build the eval harnesses, data pipelines, and release tooling behind models like K2 and Nanda, and I care about making them safe and reliable.
Chain-of-thought, parameter-efficient reasoning, 70B scale, K2 series
Hindi, Arabic, Kazakh — bilingual data curation and strategy
Eval harnesses, safety benchmarks, regression testing, long-context
CI/CD for models, automated reporting, HuggingFace, open-weight releases
Large-scale collection, curation, quality filtering, pretraining data
Multi-agent simulation, factuality scoring, bias detection
Open to research engineering & applied ML roles.