Data Science Seminar

Improving Transfer Learning via Large-Scale Model Pre-Training

Watch the seminar here.

Transfer learning was already employed successfully in the early days of deep neural networks to obtain strong models from only small amounts of data. Recently, evidence has emerged that increasing model size together with the amount of data and compute used for pre-training yields very large models with even stronger generalization and transfer capabilities. In this talk, we will review evidence on the quality of transfer to natural and medical images and how it varies with the model and data size used during pre-training. We present evidence that, especially in the low-data regime and for few-shot transfer, large models pre-trained on large datasets (e.g. ImageNet-21k or larger) can provide strong benefits. We will then motivate the need for systematic experiments that may deliver scaling laws for transfer performance as a function of model size, data size, compute budget and the composition of the large source dataset used for pre-training. Such experiments require vast compute resources and proper utilization of supercomputing facilities. As an outlook, we will introduce the COVIDNetX initiative, which aims to study large-scale transfer learning on a specific use case: medical imaging based COVID-19 diagnostics.
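To make the notion of a scaling law concrete, the following is a minimal, purely illustrative Python sketch (not from the talk): it fits a power-law trend err(N) ≈ a · N^(−alpha) relating downstream transfer error to pre-training dataset size N by linear regression in log-log space. All data points below are hypothetical placeholders, not measured results.

    import numpy as np

    # Hypothetical (pre-training set size, downstream transfer error) pairs -- placeholders only.
    n = np.array([1e5, 3e5, 1e6, 3e6, 1e7, 3e7])
    err = np.array([0.42, 0.35, 0.29, 0.25, 0.22, 0.20])

    # Fit err ~ a * n^(-alpha) as a straight line in log-log coordinates.
    slope, intercept = np.polyfit(np.log(n), np.log(err), 1)
    alpha, a = -slope, np.exp(intercept)
    print(f"fitted exponent alpha = {alpha:.3f}, prefactor a = {a:.3f}")

    # Extrapolate the fitted trend to a larger (hypothetical) pre-training set size.
    n_new = 1e8
    print(f"predicted transfer error at N = {n_new:.0e}: {a * n_new ** (-alpha):.3f}")

In practice, such fits are run over grids of model sizes, dataset sizes and compute budgets, which is what makes systematic scaling-law studies so resource intensive.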

Joint work with Mehdi Cherti.

Biosketch Jenia Jitsev

Jenia Jitsev is a senior researcher at the Juelich Supercomputing Centre (JSC) and head of the Cross-Sectional Team Deep Learning (CST-DL), which works on large-scale transferable deep learning. His background is in machine learning, neuroscience and computer science, with research at the intersection of machine learning and computational neuroscience. In his Ph.D. studies, he investigated different forms of plasticity and unsupervised learning in visual cortex pathway circuits using hierarchical recurrent neural networks, with applications to face and object recognition, under the supervision of Prof. von der Malsburg. Follow-up work included studies of self-generated network memory replay and its role in unsupervised learning, as well as modelling of reward-driven reinforcement learning in cortico-basal ganglia brain circuits. For his work on reinforcement learning, he won the Best Paper Award from the IEEE and the International Neural Network Society. His current research focus is on various types of large-scale neural network training to obtain models that can be efficiently transferred across datasets of different sizes, domains and tasks. To enable such large-scale learning and transfer experiments, he also works on methods for efficient distributed training of deep learning models across multiple GPUs or other accelerators on supercomputers such as the JUWELS Booster at JSC.
