Monday, March 10, 2025 12:00pm to 1:00pm
About this Event
250 Hutchison Rd, Rochester, NY 14620
"Towards Newtonian Understanding of Deep Learning Training Dynamics"
Abstract: Deep learning models have shown impressive performance in practice, gaining their capabilities by learning relevant patterns from data during training. In this talk, I will describe some recent progress on analyzing the training dynamics of transformers. First, I will present a dynamical system framework that we use to study how transformers learn a word co-occurrence task. Then, I will show how this framework can be generalized to study more complicated training dynamics on a mixture-of-linear-classification task. Finally, I will talk about applying deep learning theory to understand practical phenomena in sparse neural networks.
Bio: Hongru Yang is currently a final-year PhD candidate in Computer Science at UT Austin, advised by Atlas Wang. His research focuses on the theoretical aspects of deep learning and optimization. He has spent two years as a visiting student at Princeton University, hosted by Jason D. Lee. His work has been published in NeurIPS, ICLR, JMLR, and AISTATS.