Yiping Wang 王宜平


Yiping Wang
Ph.D. student
Paul G. Allen School of Computer Science & Engineering,
University of Washington
Email: ypwang61@cs.washington.edu

Google Scholar / X / Github / LinkedIn

About me

I'm a Ph.D. student at the Paul G. Allen School of Computer Science & Engineering at the University of Washington, where I started in Fall 2023. I feel very fortunate to work with Prof. Simon Shaolei Du, and I am supported by the Amazon AI Ph.D. Fellowship. From June 2024 to November 2025, I interned at Microsoft, where I was fortunate to be advised by Yelong Shen and Shuohang Wang. Prior to UW, I studied Computer Science and Mathematics at Zhejiang University, where I earned an honors degree from Chu Kochen Honors College.

My long-term research goal is to develop safe and scalable AI systems with superhuman capabilities that can drive significant scientific progress. I believe a key challenge toward this goal is generalizing well with much less supervision, which is essential for preventing reward hacking. Recently, I have been focusing on reinforcement learning for reasoning in large language models and AI for math. I have also explored diverse topics, including multimodal learning and machine learning theory.

Key News

  • 11/2025: Release ThetaEvolve. It enables RL training on dynamic environments like AlphaEvolve and scales test-time learning (pure inference or RL training) for evolution on open optimization problems. We show that with ThetaEvolve, an 8B model can improve on the best-known bounds for open problems such as Circle Packing.

  • 11/2025: Release RLVE, where we scale RL with adaptive, verifiable environments that automatically tune their difficulty to the model's capability frontier.

  • 10/2025: Honored to receive the Amazon AI Ph.D. Fellowship!

  • 05/2025: Release Spurious Rewards, which uses RLVR with random rewards to incentivize the reasoning capabilities of pretrained models.

  • 05/2025: Present One-Shot RLVR in a BAAI Talk.

  • 04/2025: Release One-Shot RLVR (Code, X), ranked as the #1 Paper of the Day on HuggingFace Daily Papers! We find that, with a strong base model, RLVR can improve LLM reasoning with only one well-chosen training example.

  • 12/2024: Release a new video generation benchmark, StoryEval, showing that current top video generative models cannot present multi-event stories like "How to Put an Elephant in a Refrigerator".

  • 06/2024: Start my internship at Microsoft!

  • 05/2024: Release CLIPLoss, which designs a simple but efficient data selection method for CLIP pretraining and achieves new SOTA on the DataComp benchmark.

  • 10/2023: Release JoMA, which analyzes the training dynamics of multilayer transformers and characterizes the roles of self-attention and MLP nonlinearity.

  • 09/2023: Become a Husky at UW!

  • 05/2023: Release Scan&Snap, which analyzes the training dynamics of a 1-layer linear transformer under next-token prediction loss.