Selected Publications

For a comprehensive list, see my Google Scholar page.

(* denotes equal contribution or alphabetical ordering.)

CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning [arXiv] [Code] [Poster] [Twitter] [Previous Versions]
Yiping Wang*, Yifang Chen*, Wendan Yan, Alex Fang, Wenjing Zhou, Kevin Jamieson, Simon S. Du
NeurIPS 2024 (Spotlight)

tl;dr: We design universal data selection methods for CLIP pretraining that achieve near-SOTA results with less than 10% of the preprocessing resources, and reach a new SOTA on the DataComp benchmark when combined with other approaches.
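
A minimal numpy sketch of the batch-normalized scoring idea (in the spirit of the paper's negCLIPLoss): score each image-text pair by its negated CLIP contrastive loss averaged over random batches, then keep the top-scoring fraction. The function name, temperature, and batch size below are illustrative, not the released implementation:

```python
import numpy as np
from scipy.special import logsumexp

def neg_clip_loss_scores(img_emb, txt_emb, tau=0.01, n_passes=5,
                         batch_size=1024, seed=0):
    """Score each (image, text) pair by its negated CLIP contrastive loss,
    averaged over several random batchings. Embeddings are assumed to be
    L2-normalized rows of shape (n, d)."""
    rng = np.random.default_rng(seed)
    n = len(img_emb)
    scores, counts = np.zeros(n), np.zeros(n)
    for _ in range(n_passes):
        order = rng.permutation(n)
        for s in range(0, n, batch_size):
            b = order[s:s + batch_size]
            S = img_emb[b] @ txt_emb[b].T / tau        # similarity logits
            i2t = S.diagonal() - logsumexp(S, axis=1)  # image -> text loss term
            t2i = S.diagonal() - logsumexp(S, axis=0)  # text -> image loss term
            scores[b] += 0.5 * (i2t + t2i)
            counts[b] += 1
    return scores / counts

# e.g. keep the top 30% of pairs for pretraining (fraction is illustrative):
# keep_idx = np.argsort(-scores)[: int(0.3 * len(scores))]
```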

JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention [arXiv] [Twitter]
Yuandong Tian, Yiping Wang, Zhenyu Zhang, Beidi Chen, Simon S. Du
ICLR 2024

tl;dr: We analyze the training dynamics of multilayer transformers, characterizing the roles of self-attention and MLP nonlinearity and how hierarchical structure is learned when the data follow hierarchical generative models.

Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer [arXiv] [Poster] [Twitter]
Yuandong Tian, Yiping Wang, Beidi Chen, Simon S. Du
NeurIPS 2023
Oral presentation at the High-dimensional Learning Dynamics Workshop @ ICML 2023

tl;dr: We analyze a 1-layer transformer trained with next-token prediction loss, rigorously characterize its training dynamics, and reveal how tokens are combined by the self-attention layer and the nature of its inductive bias.

Improved Active Multi-Task Representation Learning via Lasso [arXiv]
Yiping Wang, Yifang Chen, Kevin Jamieson, Simon S. Du
ICML 2023

tl;dr: We improve the sample complexity of active multi-task representation learning by proposing a new Lasso-based strategy.
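
An illustrative toy sketch of the flavor of such a strategy (not the paper's algorithm): fit an L1-regularized regression of the target-task representation on the source-task representations to obtain a sparse relevance vector, then split the labeling budget proportionally. The function, the scikit-learn Lasso usage, and the inputs are all assumptions for exposition:

```python
import numpy as np
from sklearn.linear_model import Lasso

def allocate_source_samples(Z_sources, z_target, budget, reg=0.1):
    """Z_sources: (num_tasks, d) matrix, one representation per source task.
    z_target: (d,) target-task representation. Returns per-task sample counts."""
    lasso = Lasso(alpha=reg, positive=True, fit_intercept=False)
    lasso.fit(Z_sources.T, z_target)      # sparse relevance weights per task
    nu = lasso.coef_
    if nu.sum() == 0:                     # degenerate fit: fall back to uniform
        nu = np.ones_like(nu)
    return np.round(budget * nu / nu.sum()).astype(int)
```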

C-Mixup: Improving Generalization in Regression [arXiv] [Code]
Huaxiu Yao*, Yiping Wang*, Linjun Zhang, James Zou, Chelsea Finn
NeurIPS 2022

tl;dr: We propose a simple yet effective data augmentation method to improve generalization on regression tasks.
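
A minimal numpy sketch of the core idea, assuming scalar labels y of shape (n,) and features X of shape (n, d): draw each example's mixing partner with probability given by a Gaussian kernel on label distance, then apply standard mixup. The bandwidth and Beta parameter below are illustrative:

```python
import numpy as np

def c_mixup_batch(X, y, bandwidth=1.0, alpha=2.0, seed=None):
    """C-Mixup-style augmentation: examples with close labels are more
    likely to be mixed together than under vanilla mixup."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Gaussian kernel on pairwise label distances -> sampling distribution.
    P = np.exp(-((y[:, None] - y[None, :]) ** 2) / (2 * bandwidth ** 2))
    P /= P.sum(axis=1, keepdims=True)
    j = np.array([rng.choice(n, p=P[i]) for i in range(n)])  # mixing partners
    lam = rng.beta(alpha, alpha, size=n)                     # mixup weights
    X_mix = lam[:, None] * X + (1 - lam[:, None]) * X[j]
    y_mix = lam * y + (1 - lam) * y[j]
    return X_mix, y_mix
```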