How can we narrow the gap between human and machine vision?
Video recording here
Modern deep neural networks (DNNs) routinely achieve human-level object recognition performance. However, their complexity makes it notoriously hard to understand how they arrive at a decision. This carries the risk that machine learning applications outpace our understanding of machine decisions, leaving us without knowing when machines will fail, and why; when machines will be biased, and why; when machines will be successful, and why. To improve our understanding of machine decision-making, I will present two behavioral comparisons of biological and artificial vision. The first reveals striking discrepancies between machine vision and robust human perception: standard DNNs are prone to shortcut learning, a tendency to exploit unintended patterns that fail to generalize to out-of-distribution input. The second, a large-scale distortion robustness benchmark, gives reason for cautious optimism: while there is still much room for improvement, the behavioral difference between human and machine vision is narrowing, with the best models now matching or exceeding human performance on most out-of-distribution distortion datasets. The single most important factor behind this success turns out to be a very simple one: not the type of training (e.g., self-supervised learning), not the type of model architecture, but the mere size of the training dataset. I will conclude by briefly discussing these findings in the context of the "bitter lesson" formulated by Rich Sutton, who argued that "building in how we think we think does not work in the long run".
Biosketch Robert Geirhos
Robert Geirhos recently obtained his PhD from the University of Tübingen and the International Max Planck Research School for Intelligent Systems, where he is working with Felix Wichmann, Matthias Bethge and Wieland Brendel. Robert holds an MSc degree in Computer Science, with distinction, and a BSc degree in Cognitive Science from the University of Tübingen. His studies were complemented by exchange semesters and research stints at the University of Glasgow and the University of Amsterdam, as well as a research internship at Meta AI/FAIR. In his research, Robert aims to develop a better understanding of the hypotheses, biases and assumptions of modern machine vision systems, and to use this understanding to make them more robust, interpretable and reliable.