Minglai Yang

📧 minglai.yang@scale.com

📍 San Francisco, CA, USA

I am a Research Scientist at Scale AI, where I work on agents and RL environments. Earlier in 2026, I was a Senior Member of Technical Staff at Abaka AI. I received my B.S. in Computer Science from the University of Arizona (GPA: 4.0/4.0) in Fall 2025, graduating summa cum laude in just over 2 years and receiving the Best Senior Award.

My research focuses on building LLMs that are trustworthy: robust (EMNLP 25), explainable (TMLR 26) and useful (EMNLP 25). Ultimately, I’m interested in these two overarching questions:

🔍 Deconstruction of LLMs: How can we open the black box to reveal the internal mechanisms?
🛠️ Reconstruction toward Trustworthy LLMs: How do we translate mechanistic insight into models that are robust, explainable, and useful in practice?

During my undergraduate years, I was fortunate to conduct research co-advised by Profs. Mihai Surdeanu, Liangming Pan, Kobus Barnard and Steven Bethard, in CLULAB, IVILAB and ML4AI LAB. I also collaborated with Profs. Adarsh Pyarelal, William Yang Wang and Chicheng Zhang. As Founder & President of AI Club at UA, I ran workshops, hosted invited speakers, and led industry collaborations—raising $14K+ to support student AI research and education.

In summer 2025, I was a research intern at the Knowledge Engineering Group (KEG), Tsinghua University, supervised by Prof. Juanzi Li, working on LLM reasoning mechanisms. Before that, I worked as a Machine Learning Engineer intern at CoreTechs.

news

Jul 03, 2026	AlignSAE was accepted to TMLR 🎉 — the action editor recommended “Accept as is”. Grateful to all my co-authors!
Jun 15, 2026	I officially joined Scale AI as an L4 Machine Learning Research Scientist on the Agents team, working on agents and RL environments! 🎉
May 01, 2026	EchoRL was accepted to ICML 2026 🎉 — reviving advantage-degenerated prompts in RLVR via rollout echoing. Congrats to all my co-authors!
Jan 05, 2026	New chapter: I joined Abaka AI as a Senior Member of Technical Staff! 🚀
Dec 19, 2025	I graduated from the University of Arizona with a B.S. in Computer Science, summa cum laude (GPA: 4.0/4.0), in just over 2 years, and received the Best Senior Award 🎓
Oct 19, 2025	We took 2nd place at the Reddit Wildcat Hackathon 2025!
Oct 17, 2025	Honored to earn UA’s Top 10 Undergraduate Research Travel Grant 🎓—headed to my EMNLP oral; see you in Suzhou. ✈️
Aug 20, 2025	Both of my submissions were accepted to EMNLP 2025 Main (Oral) 🎉 (Acceptance Rate: 22.16%). Grateful to all my co-authors, with special thanks to Profs. Liangming Pan, Mihai Surdeanu and William Wang.
Jun 05, 2025	This summer I will be a research intern at THUKEG, in the Department of Computer Science at Tsinghua University, advised by Prof. Juanzi Li and focusing on LLM reasoning mechanisms.
May 09, 2025	Galileo Circle Scholar, University of Arizona — Top 0.8% academic award.
Feb 18, 2025	As President of the AI Club at the University of Arizona, I led the club to raise over $14,000.

selected publications

TMLR
AlignSAE: Concept-Aligned Sparse Autoencoders

Minglai Yang^*, Xinyu Guo, Zhengliang Shi, Jinhe Bi, Steven Bethard, Mihai Surdeanu^*, and Liangming Pan^*

Transactions on Machine Learning Research (TMLR), 2026

Large Language Models (LLMs) encode factual knowledge within hidden parametric spaces that are difficult to inspect or control. While Sparse Autoencoders (SAEs) can decompose hidden activations into more fine-grained, interpretable features, they often struggle to reliably align these features with human-defined concepts, resulting in entangled and distributed feature representations. To address this, we introduce AlignSAE, a method that aligns SAE features with a predefined ontology through a "pre-train, then post-train" curriculum. After an initial unsupervised training phase, we apply supervised post-training to bind specific concepts to dedicated latent slots while preserving the remaining capacity for general reconstruction. This separation creates an interpretable interface where specific concepts can be inspected and controlled without interference from unrelated features. Empirical results demonstrate that AlignSAE enables precise causal interventions, such as reliable "concept swaps", by targeting single, semantically aligned slots, and further supports multi-hop reasoning and a mechanistic probe of grokking-like generalization dynamics.

@article{yang2025alignsaeconceptalignedsparseautoencoders,
  title = {AlignSAE: Concept-Aligned Sparse Autoencoders},
  author = {Yang, Minglai and Guo, Xinyu and Shi, Zhengliang and Bi, Jinhe and Bethard, Steven and Surdeanu, Mihai and Pan, Liangming},
  journal = {Transactions on Machine Learning Research (TMLR)},
  year = {2026},
  eprint = {2512.02004},
  archiveprefix = {arXiv},
  primaryclass = {cs.LG},
  url = {https://arxiv.org/abs/2512.02004},
  google_scholar_id = {Tyk-4Ss8FVUC},
}

EMNLP
How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark

Minglai Yang^*, Ethan Huang, Liang Zhang, Mihai Surdeanu, William Wang, and Liangming Pan^*

Oral Presentation

EMNLP Main Conference, 2025

We introduce Grade School Math with Distracting Context (GSM-DC), a synthetic benchmark to evaluate Large Language Models’ (LLMs) reasoning robustness against systematically controlled irrelevant context (IC). GSM-DC constructs symbolic reasoning graphs with precise distractor injections, enabling rigorous, reproducible evaluation. Our experiments demonstrate that LLMs are significantly sensitive to IC, affecting both reasoning path selection and arithmetic accuracy. Additionally, training models with strong distractors improves performance in both in-distribution and out-of-distribution scenarios. We further propose a stepwise tree search guided by a process reward model, which notably enhances robustness in out-of-distribution conditions.

@article{yang2025llmreasoningdistractedirrelevant,
  title = {How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark},
  author = {Yang, Minglai and Huang, Ethan and Zhang, Liang and Surdeanu, Mihai and Wang, William and Pan, Liangming},
  year = {2025},
  eprint = {2505.18761},
  archiveprefix = {arXiv},
  primaryclass = {cs.CL},
  url = {https://arxiv.org/pdf/2505.18761},
  google_scholar_id = {UeHWp8X0CEIC},
}

EMNLP
CopySpec: Accelerating LLMs with Speculative Copy-and-Paste Without Compromising Quality

Razvan-Gabriel Dumitru, Minglai Yang, Vikas Yadav, and Mihai Surdeanu

Oral Presentation

EMNLP Main Conference, 2025

We introduce CopySpec, a simple yet effective technique to tackle the inefficiencies LLMs face when generating responses that closely resemble previous outputs or responses that can be verbatim extracted from context. CopySpec identifies repeated sequences in the model’s chat history or context and speculates that the same tokens will follow, enabling seamless copying without compromising output quality and without requiring additional GPU memory. To evaluate the effectiveness of our approach, we conducted experiments using seven LLMs and five datasets: MT-Bench, CNN/DM, GSM8K, HumanEval, and our newly created dataset, MT-Redundant. MT-Redundant, introduced in this paper, transforms the second turn of MT-Bench into a request for variations of the first turn’s answer, simulating real-world scenarios where users request modifications to prior responses. Our results demonstrate significant speed-ups: up to 2.35x on CNN/DM, 3.08x on the second turn of select MT-Redundant categories, and 2.66x on the third turn of GSM8K’s self-correction tasks. Importantly, we show that CopySpec integrates seamlessly with speculative decoding, yielding an average 49% additional speed-up over speculative decoding for the second turn of MT-Redundant across all eight categories. While LLMs, even with speculative decoding, suffer from slower inference as context size grows, CopySpec leverages larger contexts to accelerate inference, making it a faster complementary solution. Our code and dataset are publicly available at https://github.com/razvandu/copyspec.

@article{dumitru2025copyspecacceleratingllmsspeculative,
  title = {CopySpec: Accelerating LLMs with Speculative Copy-and-Paste Without Compromising Quality},
  author = {Dumitru, Razvan-Gabriel and Yang, Minglai and Yadav, Vikas and Surdeanu, Mihai},
  year = {2025},
  eprint = {2502.08923},
  archiveprefix = {arXiv},
  primaryclass = {cs.CL},
  url = {https://arxiv.org/abs/2502.08923},
  google_scholar_id = {u-x6o8ySG0sC},
}

ArXiv
Dr. DocBench: A Comprehensive Benchmark for Expert-Level and Difficult Document Parsing

Minglai Yang^*, Xinyan Velocity Yu^*, Pengyuan Li, Xinyu Guo, Zhenting Qi, Konwoo Kim, Longtian Ye, Xiaolong Luo, Jinhe Bi, Henry Zhang, and 15 more authors

In submission to EMNLP, 2026

Document parsing and recognition are fundamental capabilities for vision-language models (VLMs) and document processing systems. However, existing Optical Character Recognition (OCR) and document parsing benchmarks are increasingly limited in coverage and difficulty: many focus on common document genres or uniformly sampled pages where modern parsers already perform strongly. Dr. DocBench provides expert-level, difficult document parsing evaluation with thousands of annotated pages from long documents, spanning 52 BISAC subject domains, selecting challenging documents through parser-failure-based sampling.

@misc{yang2026drdocbench,
  title = {Dr. DocBench: A Comprehensive Benchmark for Expert-Level and Difficult Document Parsing},
  author = {Yang, Minglai and Yu, Xinyan Velocity and Li, Pengyuan and Guo, Xinyu and Qi, Zhenting and Kim, Konwoo and Ye, Longtian and Luo, Xiaolong and Bi, Jinhe and Zhang, Henry and Riaz, Haris and Zhang, Xuan and Xiao, Yunze and Liu, Bangya and Tang, Tom and Zhao, Yunfei and Lin, Qunshu and Wang, Zihan and Liu, Minghao and Li, Michael Lingzhi and Du, Yilun and Thomason, Jesse and Feris, Rogerio and Pentland, Alex and He, Zexue},
  year = {2026},
  eprint = {2606.01393},
  archiveprefix = {arXiv},
  primaryclass = {cs.CL},
  url = {https://arxiv.org/abs/2606.01393},
}