Prof. Daniel Soudry - Generalization and Efficiency in Deep Learning
EE Systems Department Seminar
(The talk will be given in English)
Speaker: Prof. Daniel Soudry
Faculty of Electrical Engineering, Technion
Hall 011, Electrical Engineering Kitot Building
Monday, March 11th, 2024
15:00 - 16:00
Generalization and efficiency in deep learning
Abstract
The talk will have four separate parts:
(1) We examine neural networks (NNs) with uniform random weights, conditioned on zero training loss. We prove that they typically generalize well if there exists an underlying narrow "teacher NN" that agrees with the labels.
(2) We characterize the functions realized by shallow ReLU NN denoisers, in the common theoretical scenario of zero training loss with minimal weight norm.
(3) We present a simple method that enables, for the first time, the use of 12-bit accumulators in deep learning with no significant degradation in accuracy. We also show that, when the accumulation precision is decreased further, fine-grained gradient approximations can improve DNN accuracy.
(4) We find an analytical relation between compute-time properties and the scalability limitations caused by the compute variance of straggling workers in a distributed setting. We then propose "DropCompute", a simple yet effective decentralized method that reduces the variation among workers and thus improves the robustness of standard synchronous training.
Relevant papers (*Indicates equal contribution):
[1] G. Buzaglo*, I. Harel*, M. Shpigel Nacson* et al., "How Uniform Random Weights Induce Non-uniform Bias: Typical Interpolating Neural Networks Generalize with Narrow Teachers", preprint.
[2] C. Zeno et al., "How do Minimum-Norm Shallow Denoisers Look in Function Space?", NeurIPS 2023.
[3] Y. Blumenfeld et al., "Towards Cheaper Inference in Deep Networks with Lower Bit-Width Accumulators", ICLR 2024.
[4] N. Giladi*, S. Gottlieb* et al., "DropCompute: Simple and More Robust Distributed Synchronous Training via Compute Variance Reduction", NeurIPS 2023.
Short Bio:
Daniel Soudry is an associate professor and Schmidt Career Advancement Chair in AI in the Electrical and Computer Engineering Department at the Technion, working in the areas of machine learning and neural networks. His recent works focus on resource efficiency and implicit bias in neural networks. He did his post-doc in the Department of Statistics and the Center for Theoretical Neuroscience at Columbia University, and his Ph.D. in the Electrical Engineering Department at the Technion. He is a member of Israel’s Young Academy, and the recipient of the Gruss Lipper fellowship, the Goldberg Award, the ERC starting grant, and Intel's Rising Star Faculty Award.
Seminar attendance credit will be granted based on registration of your full name and ID number on the attendance form circulated in the hall during the seminar.