We introduce Grade School Math with Distracting Context (GSM-DC), a synthetic benchmark to evaluate Large Language Models’ (LLMs) reasoning robustness against systematically controlled irrelevant context (IC). GSM-DC constructs symbolic reasoning graphs with precise distractor injections, enabling rigorous, reproducible evaluation. Our experiments demonstrate that LLMs are highly sensitive to IC, which degrades both reasoning-path selection and arithmetic accuracy. Additionally, training models with strong distractors improves performance in both in-distribution and out-of-distribution scenarios. We further propose a stepwise tree search guided by a process reward model, which notably enhances robustness in out-of-distribution conditions.
@article{yang2025llmreasoningdistractedirrelevant,title={How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark},author={Yang, Minglai and Huang, Ethan and Zhang, Liang and Surdeanu, Mihai and Wang, William and Pan, Liangming},year={2025},eprint={2505.18761},archiveprefix={arXiv},primaryclass={cs.CL},url={https://arxiv.org/abs/2505.18761},google_scholar_id={9yKSN-GCB0IC},}
We investigate the use of Large Language Models (LLMs) for collecting high-quality data to warm-start Reinforcement Learning (RL) algorithms in classical Markov Decision Process (MDP) environments. In this work, we focus on using an LLM to generate an off-policy dataset that sufficiently covers the state-actions visited by optimal policies, and then using an RL algorithm to explore the environment and improve the policy suggested by the LLM. Our algorithm, LORO, can both converge to an optimal policy and achieve high sample efficiency thanks to the LLM’s good starting policy. On multiple OpenAI Gym environments, such as CartPole and Pendulum, we empirically demonstrate that LORO outperforms baseline algorithms such as pure LLM-based policies, pure RL, and a naive combination of the two, achieving up to 4x the cumulative rewards of the pure RL baseline.
@article{duong2025improvingdataefficiencyreinforcementlearning,title={Improving the Data-efficiency of Reinforcement Learning by Warm-starting with LLM},author={Duong, Thang and Yang, Minglai and Zhang, Chicheng},year={2025},eprint={2505.10861},archiveprefix={arXiv},primaryclass={cs.LG},url={https://arxiv.org/abs/2505.10861},google_scholar_id={d1gkVwhDpl0C},}
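The two-phase warm-start recipe described above can be sketched on a toy chain MDP. Everything here is an illustrative assumption, not the paper's actual algorithm or environments: the "LLM policy" is a hand-coded heuristic standing in for LLM-suggested actions, and the refinement phase is plain tabular Q-learning.

```python
import random

random.seed(0)

N_STATES, ACTIONS = 5, [0, 1]   # actions: 0 = move left, 1 = move right on a chain
GOAL = N_STATES - 1

def step(s, a):
    """Deterministic chain dynamics: reward 1 only on reaching the goal."""
    s2 = max(0, min(GOAL, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def llm_policy(s):
    return 1  # heuristic stand-in for the LLM's suggested (near-optimal) action

def collect_warm_start(episodes=20):
    """Phase 1: roll out the LLM policy to build an off-policy dataset."""
    data = []
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = llm_policy(s)
            s2, r, done = step(s, a)
            data.append((s, a, r, s2))
            s = s2
    return data

def q_learning(data, episodes=50, alpha=0.5, gamma=0.9, eps=0.1):
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for (s, a, r, s2) in data:          # warm-start updates from the LLM dataset
        Q[s, a] += alpha * (r + gamma * max(Q[s2, b] for b in ACTIONS) - Q[s, a])
    for _ in range(episodes):           # Phase 2: explore and improve with eps-greedy
        s, done = 0, False
        while not done:
            a = random.choice(ACTIONS) if random.random() < eps else max(ACTIONS, key=lambda b: Q[s, b])
            s2, r, done = step(s, a)
            Q[s, a] += alpha * (r + gamma * max(Q[s2, b] for b in ACTIONS) - Q[s, a])
            s = s2
    return Q

Q = q_learning(collect_warm_start())
greedy = [max(ACTIONS, key=lambda a: Q[s, a]) for s in range(N_STATES)]
```

After training, the greedy policy moves right from every non-terminal state, illustrating how a decent LLM starting policy lets RL avoid exploring from scratch.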
We introduce CopySpec, a simple yet effective technique to tackle the inefficiencies LLMs face when generating responses that closely resemble previous outputs or responses that can be verbatim extracted from context. CopySpec identifies repeated sequences in the model’s chat history or context and speculates that the same tokens will follow, enabling seamless copying without compromising output quality and without requiring additional GPU memory. To evaluate the effectiveness of our approach, we conducted experiments using seven LLMs and five datasets: MT-Bench, CNN/DM, GSM8K, HumanEval, and our newly created dataset, MT-Redundant. MT-Redundant, introduced in this paper, transforms the second turn of MT-Bench into a request for variations of the first turn’s answer, simulating real-world scenarios where users request modifications to prior responses. Our results demonstrate significant speed-ups: up to 2.35x on CNN/DM, 3.08x on the second turn of select MT-Redundant categories, and 2.66x on the third turn of GSM8K’s self-correction tasks. Importantly, we show that CopySpec integrates seamlessly with speculative decoding, yielding an average 49% additional speed-up over speculative decoding for the second turn of MT-Redundant across all eight categories. While LLMs, even with speculative decoding, suffer from slower inference as context size grows, CopySpec leverages larger contexts to accelerate inference, making it a faster complementary solution. Our code and dataset are publicly available at https://github.com/razvandu/copyspec.
@article{dumitru2025copyspecacceleratingllmsspeculative,title={CopySpec: Accelerating LLMs with Speculative Copy-and-Paste Without Compromising Quality},author={Dumitru, Razvan-Gabriel and Yang, Minglai and Yadav, Vikas and Surdeanu, Mihai},year={2025},eprint={2502.08923},archiveprefix={arXiv},primaryclass={cs.CL},url={https://arxiv.org/abs/2502.08923},google_scholar_id={u-x6o8ySG0sC},}
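The core copy-speculation idea can be illustrated with a toy sketch over word tokens. This is an assumption-laden simplification: the real method operates on model tokens and verifies speculated spans with the LLM itself, and the function name, match length, and copy length below are invented for illustration.

```python
# Toy sketch of copy-speculation: if the tail of what we've generated
# matches a span earlier in the context, speculate that the tokens
# following that span will be copied next.

def speculate_copy(context, generated, match_len=3, max_copy=8):
    """Return up to `max_copy` tokens speculated to follow, by matching
    the last `match_len` generated tokens against the context."""
    tail = generated[-match_len:]
    if len(tail) < match_len:
        return []
    # Scan from the most recent occurrence backward.
    for start in range(len(context) - match_len, -1, -1):
        if context[start:start + match_len] == tail:
            return context[start + match_len:start + match_len + max_copy]
    return []

ctx = "the quick brown fox jumps over the lazy dog".split()
gen = "we saw the quick brown".split()
print(speculate_copy(ctx, gen))  # ['fox', 'jumps', 'over', 'the', 'lazy', 'dog']
```

In a real decoder, the speculated span would be scored by the model in one forward pass and accepted only up to the first disagreement, which is why copying costs no extra GPU memory and never degrades output quality.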
2024
MCM
Dynamic Balances: Modelling Variable Sex Ratios in Lamprey Populations for Ecosystem Management
M. Yang, M. Abbasi, and S. Soporboev
International Mathematical Contest in Modeling (MCM), Tucson, Arizona. The Problem can be found here
May 2024
First, we develop an Adaptive Population Control (ACC) model that explains the skewed sex ratio at high and low levels of resource availability as a control measure lampreys use to maximize their growth rate. We find that the advantage of a variable sex ratio is greatest under extremely poor or extremely favorable environmental conditions. Next, to account for the male-skewed lamprey populations at high densities, we develop a modified logistic growth model that allows separate carrying capacities for males and females. This Differential Carrying Capacity (DCC) model accurately predicts the sex ratio that achieves the maximal growth rate at various population densities. Finally, building on ACC, we develop the Aquatic Nutritional Network (ANN), a niche-width differential equation model that integrates concepts from the Lotka-Volterra equations, to study the dynamics of lampreys and their native prey and predators. We find that variable sex ratios allow time for the lampreys' native prey to regenerate.
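The differential-carrying-capacity idea can be sketched numerically with a forward-Euler simulation of two logistic equations, one per sex. All parameter values and the exact functional form are illustrative assumptions, not the contest model itself.

```python
# Minimal sketch: males and females each grow logistically toward their
# own carrying capacity (Km, Kf), so the equilibrium sex ratio is set by
# the ratio of capacities rather than fixed at 1:1.

def simulate(Km=400.0, Kf=600.0, r=0.1, m0=50.0, f0=50.0, dt=0.1, steps=2000):
    m, f = m0, f0
    for _ in range(steps):
        dm = r * m * (1 - m / Km)   # male logistic growth
        df = r * f * (1 - f / Kf)   # female logistic growth
        m += dm * dt
        f += df * dt
    return m, f

m, f = simulate()
ratio = m / (m + f)   # equilibrium male fraction -> Km / (Km + Kf) = 0.4
```

With these illustrative capacities the population settles at a 40% male fraction, showing how unequal carrying capacities alone can produce a stable skewed sex ratio.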