Haolin Liu

Department of Computer Science, University of Virginia

Hi! I am Haolin Liu, a third-year PhD student at the University of Virginia, where I am fortunate to be advised by Prof. Chen-Yu Wei. Before that, I received my bachelor’s degree in Computer Science from ShanghaiTech University, where I studied chemistry for 1.5 years before switching to computer science for the remaining 2.5 years.

I am interested in developing principled and practical algorithms for Reinforcement Learning (RL) and in understanding the training dynamics of these algorithms. Recently, I have focused mainly on RL theory and RL for LLM reasoning.

  • On the theoretical side, I aim to uncover unified principles for RL algorithm design and to identify the minimal structural assumptions needed for sample-efficient RL. My recent works ([1], [2]) propose what are, to date, the most unified frameworks for RL theory, capable of handling both model-based and model-free RL in both stationary and non-stationary environments.
  • On the practical side, I study the limitations of existing RL algorithms and develop new methods to overcome them. Recently, I have focused on better leveraging process supervision in RL and designing new exploration strategies to enhance LLM reasoning.

selected publications

  1. Preprint
    An Improved Model-Free Decision-Estimation Coefficient with Applications in Adversarial MDPs
    (α-β) Haolin Liu, Chen-Yu Wei, and Julian Zimmert
    2025
  2. MATH-AI
    One Token to Fool LLM-as-a-Judge
    Yulai Zhao*, Haolin Liu*, Dian Yu, S.Y. Kung, Haitao Mi, and Dong Yu
    NeurIPS 2025 MATH-AI Workshop, 2025
  3. COLT
    Decision Making in Hybrid Environments: A Model Aggregation Approach
    (α-β) Haolin Liu, Chen-Yu Wei, and Julian Zimmert
    COLT, 2025
  4. NeurIPS
    Beating Adversarial Low-Rank MDPs with Unknown Transition and Bandit Feedback
    (α-β) Haolin Liu, Zakaria Mhammedi, Chen-Yu Wei, and Julian Zimmert
    NeurIPS, 2024
  5. NeurIPS
    Corruption-Robust Linear Bandits: Minimax Optimality and Gap-Dependent Misspecification
    (α-β) Haolin Liu, Artin Tajdini, Andrew Wagenmaker, and Chen-Yu Wei
    NeurIPS, 2024
  6. ICLR
    Towards Optimal Regret in Adversarial Linear MDPs with Bandit Feedback (Spotlight)
    (α-β) Haolin Liu, Chen-Yu Wei, and Julian Zimmert
    ICLR, 2024
  7. NeurIPS
    Bypassing the Simulator: Near-Optimal Adversarial Linear Contextual Bandits
    (α-β) Haolin Liu, Chen-Yu Wei, and Julian Zimmert
    NeurIPS, 2023