03.02, 10:00 - 10:45, USI East Campus, Room C1.03
Abstract: In the theory part of the talk, we study the statistical performance of Maximum Likelihood Estimation (MLE). While MLE is known to be minimax optimal for low-complexity models, classical work showed that it can be suboptimal over “large” function classes. First, we develop a technique for detecting and quantifying the suboptimality of MLE in regression over high-dimensional nonparametric classes. Second, we show that the variance term of MLE is always upper-bounded by the minimax rate, implying that any minimax suboptimality must arise from bias. In the applied part of the talk, we propose an explanation for the empirical success of Test-Time Training (TTT) in foundation models, which we primarily validate through experiments with sparse autoencoders (SAEs).
Host: Prof. Ernst Wit
Bio: Gil Kur is a postdoctoral fellow at ETH Zürich, primarily hosted by Andreas Krause. He completed his PhD in Electrical Engineering and Computer Science at MIT under the supervision of Sasha Rakhlin and earned an MSc from the Weizmann Institute of Science under the supervision of Boaz Nadler. His research focuses on statistical learning theory, nonparametric and high-dimensional statistics, and methodology for foundation models.