Aaryamonvikram Singh

Research Engineer at MBZUAI focused on shipping reasoning-first and multilingual LLMs with rigorous evaluation. I build the eval harnesses, data pipelines, and release tooling behind models like K2 and Nanda — and I care about making them safe, reliable, and multilingual.

Experience

Research Engineer Sep 2025 – Present
MBZUAI — Institute of Foundation Models, Abu Dhabi
  • Co-authored K2-V2 (70B) and K2-Think (32B); supported the K2-Think V2 (70B) release and evaluation
  • Built evaluation tooling for long-context, math, code, and safety benchmarks — prompting, deterministic scoring, reporting
  • Added regression tests and automated reports to catch quality/safety regressions pre-release
  • Delivered technical talks and office hours for the K2-Think Hackathon series
Research Assistant Oct 2024 – Aug 2025
MBZUAI — Institute of Foundation Models, Abu Dhabi
  • Led development and release of the Nanda family (10B/87B) models (EACL 2026); drove the bilingual Hindi–English data strategy end-to-end
  • Contributed dataset curation and evaluation for Jais-2 (Arabic) and Sherkala-Chat (Kazakh, COLM 2025)
  • Curated Suraksha Eval, a Hindi safety benchmark, and built the Hindi TxT360 pretraining dataset
  • Co-developed FinChain, a financial reasoning benchmark spanning 12 domains, with 30+ LLMs benchmarked
Research Fellow Mar 2024 – Sep 2024
SimPPL (supervised by Swapneel Mehta), Remote
  • Designed multi-agent experiments to measure and reduce fake-news sharing between LLM agents
  • Collaborated with postdocs at MIT, Princeton, and Oxford on intervention design and evaluation
NLP Intern Jun 2023 – Jan 2024
MBZUAI — Prof. Preslav Nakov, Abu Dhabi
  • Built a multithreaded Python pipeline that collected 160K+ news articles from 5K+ sources
  • Shipped an end-to-end media factuality and bias scoring system (Streamlit, FastAPI, SQLite)
  • Trained transformers and NELA+CatBoost ensembles for article-level prediction and source profiling

Publications

  • 70B reasoning model with open weights — arXiv 2025
  • 32B reasoning without massive compute — arXiv 2025
  • 10B and 87B models with bilingual data strategy — EACL 2026
  • LLM for a moderately-resourced language — COLM 2025
  • Open model for Hindi conversation — arXiv 2025
  • Framing analysis on raw text — arXiv 2025
  • Open-weight models and datasets on HuggingFace — Open Source

Expertise

Reasoning LLMs — chain-of-thought, parameter-efficient reasoning, 70B scale, K2 series
Multilingual NLP — Hindi, Arabic, Kazakh; bilingual data curation and strategy
Evaluation & Safety — eval harnesses, safety benchmarks, regression testing, long-context
Release Engineering — CI/CD for models, automated reporting, HuggingFace, open-weight releases
Data Pipelines — large-scale collection, curation, quality filtering, pretraining data
Misinformation Research — multi-agent simulation, factuality scoring, bias detection

Open to research engineering & applied ML roles.