This is a lecture series about deep learning methods used in computer vision, which I held as part of the AIA course at TUB.
The series is heavily inspired by Fei-Fei Li’s CS231n and Joseph Redmon’s Ancient Secrets of Computer Vision.
The sources for all slides are in the GitHub repository (see link above). If you reuse the material, please credit the original sources. References can be found in the respective slides.
Schedule
Lecture 01: Introduction to Machine Learning (slides)
References and further reading:
- An Introduction to Statistical Learning; James, Witten, Hastie, Tibshirani
- Pattern Recognition and Machine Learning; Bishop
- Pattern Classification; Duda, Hart, Stork
Lecture 02: Neural Networks (slides)
References and further reading:
- Universal Approximation Theorem (a graphical proof)
- Backpropagation (colah’s blog)
- Backpropagation (in depth read)
- Backpropagation (multi dimensional case)
- Backpropagation (the math for a linear layer)
- Automatic Differentiation in Machine Learning: a Survey
- Efficient BackProp
- Why Momentum Really Works
Lecture 03: Convolutional Neural Networks I (slides)
References and further reading:
- Dive into Deep Learning: Convolutional Neural Network
- Dive into Deep Learning: Modern Convolutional Neural Networks
- Feature Visualization
- Gradient-based learning applied to document recognition, LeCun et al, 1998
- ImageNet Classification with Deep Convolutional Neural Networks, Krizhevsky et al, 2012
- Very Deep Convolutional Networks for Large-Scale Image Recognition, Simonyan & Zisserman, 2015
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, Ioffe & Szegedy, 2015
- Going Deeper with Convolutions, Szegedy et al, 2015
- Deep Residual Learning for Image Recognition, He et al, CVPR 2016
- Identity Mappings in Deep Residual Networks, He et al, ECCV 2016
Lecture 04: Convolutional Neural Networks II (slides)
References and further reading:
- Dive into Deep Learning: Computer Vision
- A guide to convolution arithmetic for deep learning
- Adversarial examples
- YOLO9000: Better, Faster, Stronger
- Focal Loss for Dense Object Detection
- YOLOv3: An Incremental Improvement
- Deconvolution Network for Semantic Segmentation, Noh et al, ICCV 2015
- U-Net: Convolutional Networks for Biomedical Image Segmentation, Ronneberger et al, MICCAI 2015
- SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation, Badrinarayanan et al, TPAMI 2017
- Pyramid Scene Parsing Network, Zhao et al, CVPR 2017
- Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation, Chen et al, ECCV 2018
- The PASCAL Visual Object Classes Challenge: A Retrospective, Everingham et al, IJCV 2014
- Rich feature hierarchies for accurate object detection and semantic segmentation, Girshick et al, CVPR 2014
- You Only Look Once: Unified, Real-Time Object Detection, Redmon et al, CVPR 2016
- Mask R-CNN, He et al, ICCV 2017
- ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models, Barbu et al, NeurIPS 2019
- Shortcut Learning in Deep Neural Networks, Geirhos et al, Nature Machine Intelligence 2020
- Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps, Simonyan et al, 2013
- Visualizing and Understanding Convolutional Networks, Zeiler & Fergus, ECCV 2014
- Unmasking Clever Hans Predictors and Assessing What Machines Really Learn, Lapuschkin et al, Nature Communications 2019
Lecture 05: Video Analysis (slides)
References and further reading:
- Dive into Deep Learning: Recurrent Neural Networks
- Dive into Deep Learning: Modern Recurrent Neural Networks
- Deep Learning Book: Sequence Modeling: Recurrent and Recursive Nets
- Chris Olah’s Blog: Understanding LSTM Networks
- Andrej Karpathy’s Blog: The Unreasonable Effectiveness of Recurrent Neural Networks
- all the papers referenced in the PDF.
Things we didn’t cover:
- Non-local Neural Networks
- A Closer Look at Spatiotemporal Convolutions for Action Recognition (R(2+1)D)
Pre Deep Learning:
Lecture 06: Self Supervised Learning (slides)
References and further reading:
- Dive into Deep Learning: Generative Adversarial Networks
- Ishan Misra: Lecture about Energy-based SSL
- Fei-Fei Li: Lecture about Generative Methods
- Ali Ghodsi: Lecture with derivation of Variational Autoencoders
- Jeremy Jordan: Blog post about Variational Autoencoders
- Revisiting Self-Supervised Visual Representation Learning, Kolesnikov et al, CVPR 2019
- Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey, Jing & Tian, TPAMI 2019
- all papers referenced in the PDF
We didn’t talk about generating images with Transformers: