250 Hutchison Rd, Rochester, NY 14620

"Towards Newtonian Understanding of Deep Learning Training Dynamics"

 

Abstract: Deep learning models have shown impressive performance in practice, gaining their capabilities by learning relevant patterns from data during training. In this talk, I will describe some recent progress on analyzing the training dynamics of transformers. First, I will present a dynamical system framework that we use to study how transformers learn a word co-occurrence task. Then, I will show how this framework can be generalized to study more complicated training dynamics on a mixture of linear classification task. Finally, I will discuss applications of deep learning theory to understanding practical phenomena in sparse neural networks.

 

Bio: Hongru Yang is a final-year PhD candidate in Computer Science at UT Austin, advised by Atlas Wang. His research focuses on the theoretical aspects of deep learning and optimization. He spent two years as a visiting student at Princeton University, hosted by Jason D. Lee. His work has been published in NeurIPS, ICLR, JMLR, and AISTATS.

