|
Yiping Wang 王宜平
About me
I'm a Ph.D. student in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, where I started in Fall 2023.
I feel very fortunate to work with Prof. Simon Shaolei Du.
Prior to UW, I studied Computer Science and Mathematics at Zhejiang University, where I earned an honors degree from Chu Kochen Honors College.
I'm grateful to all my collaborators and mentors along the way.
I've been privileged to work closely with Dr. Yuandong Tian since Spring 2023.
I've been interning at Microsoft since June 2024, where I'm fortunate to be advised by Yelong Shen and Shuohang Wang.
During my undergraduate studies, I was fortunate to work with Prof. Huaxiu Yao and Prof. Linjun Zhang.
My long-term research goal is to develop safe and scalable AI systems with super-human capabilities that can drive significant scientific progress.
Recently, I have been focusing on reinforcement learning for reasoning in large language models and on AI for mathematics.
I have also explored diverse topics, including multimodal learning and machine learning theory.
News
10/2025: Received the Amazon AI Ph.D. Fellowship. Thanks, Amazon!
08/2025: Gave a talk on One-Shot RLVR at a group meeting at Tsinghua University.
05/2025: Released Spurious Rewards, showing that RLVR with random rewards can incentivize the reasoning capabilities of pretrained models.
05/2025: Presented One-Shot RLVR in a BAAI Talk.
04/2025: Released One-Shot RLVR (Code, X), ranked #1 Paper of the Day on HuggingFace Daily Papers! We find that, given a strong base model, RLVR can improve LLM reasoning with only one well-chosen training example.
12/2024: Released StoryEval, a new video generation benchmark showing that current top video generative models cannot present multi-event stories such as "How to Put an Elephant in a Refrigerator".
09/2024: Attended MoDL 2024 in New York, sponsored by the Simons Foundation, and presented CLIPLoss (NeurIPS 2024 Spotlight).
06/2024: Started my internship at Microsoft!
05/2024: Released CLIPLoss, a simple but efficient data selection method for CLIP pretraining that achieves new SOTA on the DataComp benchmark.
10/2023: Released JoMA, which analyzes the training dynamics of multilayer transformers and characterizes the roles of self-attention and MLP nonlinearity.
09/2023: Became a Husky at UW!
05/2023: Released Scan&Snap, which analyzes the training dynamics of a 1-layer linear transformer under next-token prediction loss.
Main Research
(* denotes equal contribution or alphabetic ordering, † denotes corresponding author)
LLM RL
We analyze empirical phenomena of Reinforcement Learning with Verifiable Rewards (RLVR) on large language models (LLMs).
|
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang†, Qing Yang, Zhiyuan Zeng, Liliang Ren, Lucas Liu, Baolin Peng, Hao Cheng, Xuehai He, Kuan Wang, Jianfeng Gao, Weizhu Chen, Shuohang Wang†, Simon Shaolei Du†, Yelong Shen†
Preprint
[Arxiv]
[Code]
[W&B]
[X]
tl;dr: We show that for RLVR on LLMs, a single well-chosen training example can already bring non-trivial improvements.
|
Multimodal
We study data selection for multimodal contrastive learning and investigate problems in long video generation.
|
Is Your World Simulator a Good Story Presenter? A Consecutive Events-Based Benchmark for Future Long Video Generation
Yiping Wang, Xuehai He, Kuan Wang, Luyao Ma, Jianwei Yang, Shuohang Wang, Simon Shaolei Du, Yelong Shen
CVPR 2025
[Arxiv]
[Code]
[Poster]
[X]
[Website]
tl;dr: Current top video generative models cannot present multi-event stories such as "How to Put an Elephant in a Refrigerator".
|
|
CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning
Yiping Wang*, Yifang Chen*, Wendan Yan, Alex Fang, Wenjing Zhou, Kevin Jamieson, Simon Shaolei Du
NeurIPS 2024 (Spotlight)
[Arxiv]
[Code]
[Poster]
[X]
[Previous Versions]
tl;dr: We design simple but efficient data selection methods for CLIP pretraining, achieving new SOTA on the DataComp benchmark.
|
Theory of Transformer Dynamics
We analyze the training dynamics of transformers with mathematical rigor.
|
Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer
Yuandong Tian, Yiping Wang, Beidi Chen, Simon Shaolei Du
NeurIPS 2023
(Oral presentation @ ICML2023-HiDL)
[Arxiv]
[Poster]
[X]
tl;dr: We analyze the 1-layer transformer under next-token prediction loss and rigorously characterize its training dynamics.
|
|
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
Yuandong Tian, Yiping Wang, Zhenyu Zhang, Beidi Chen, Simon Shaolei Du
ICLR 2024
[Arxiv]
[X]
tl;dr: We analyze the training dynamics of multilayer transformers, characterizing the roles of self-attention and MLP nonlinearity.
|
|