Archive / INF Seminars / INF_2024_06_06_Zhang

Unifying Vision Representation


Host: Prof. Laura Pozzi




USI East Campus, Room C1.03
10:30 - 11:15

Tong Zhang
EPFL, Switzerland
The rapid advancement of artificial intelligence (AI), propelled by foundation models such as large language models (LLMs), has driven transformative change. Foundation models in other modalities, however, are trailing behind, which hinders the expansion of AI systems to diverse applications. Vision, unlike language, consists of natural signals captured from the environment and spans diverse representations: 3D structures such as point clouds and meshes, as well as 2D images and videos. Consequently, vision involves a more intricate and redundant representation, making the development of a foundation model a formidable task for the vision community. This seminar explores vision systems through the lens of self-supervised representation learning, a cornerstone of many foundation models, including LLMs. The talk will assess existing challenges in mainstream self-supervised learning methods for vision, propose feasible solutions, and discuss promising directions for further investigation. It will also cover future work on developing versatile representations across modalities, tasks, and architectures, which can advance the development of vision foundation models.

Tong Zhang received the B.S. degree from Beihang University, Beijing, China, in 2011, the M.S. degree from New York University, New York, United States, in 2014, and the Ph.D. degree from the Australian National University, Canberra, Australia, in 2020. He is a postdoctoral researcher at the Image and Visual Representation Lab (IVRL), EPFL. He received the ACCV 2016 Best Student Paper Honorable Mention and was a CVPR 2020 Paper Award nominee. His research interests include subspace clustering, deep geometric learning, 3D vision, and representation learning.