This is a lecture series about deep learning methods used in computer vision, which I held as part of the AIA course at TUB.
The series is heavily inspired by Fei-Fei Li’s CS231n and Joseph Redmon’s Ancient Secrets of Computer Vision.
The sources for all slides are in the GitHub repository (see link above). If you reuse the material, please credit the original sources. References can be found in the respective slides.
Schedule
Lecture 01: Introduction to Machine Learning (slides)
References and further reading:
- An Introduction to Statistical Learning; James, Witten, Hastie, Tibshirani
- Pattern Recognition and Machine Learning; Bishop
- Pattern Classification; Duda, Hart, Stork
Lecture 02: Neural Networks (slides)
References and further reading:
- Universal Approximation Theorem (a graphical proof)
- Backpropagation (colah’s blog)
- Backpropagation (in depth read)
- Backpropagation (multi dimensional case)
- Backpropagation (the math for a linear layer)
- Automatic Differentiation in Machine Learning: a Survey
- Efficient BackProp
- Why Momentum Really Works
Lecture 03: Convolutional Neural Networks I (slides)
References and further reading:
- Dive into Deep Learning: Convolutional Neural Network
- Dive into Deep Learning: Modern Convolutional Neural Networks
- Feature Visualization
- Gradient-based learning applied to document recognition, LeCun et al, 1998
- ImageNet Classification with Deep Convolutional Neural Networks, Krizhevsky et al, 2012
- Very Deep Convolutional Networks for Large-Scale Image Recognition, Simonyan & Zisserman, 2015
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, Ioffe & Szegedy, 2015
- Going Deeper with Convolutions, Szegedy et al, 2015
- Deep Residual Learning for Image Recognition, He et al, CVPR 2016
- Identity Mappings in Deep Residual Networks, He et al, ECCV 2016
Lecture 04: Convolutional Neural Networks II (slides)
References and further reading:
- Dive into Deep Learning: Computer Vision
- A guide to convolution arithmetic for deep learning
- Adversarial examples
- YOLO9000: Better, Faster, Stronger
- Focal Loss for Dense Object Detection
- YOLOv3: An Incremental Improvement
- Deconvolution Network for Semantic Segmentation, Noh et al, ICCV 2015
- U-Net: Convolutional Networks for Biomedical Image Segmentation, Ronneberger et al, MICCAI 2015
- SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation, Badrinarayanan et al, TPAMI 2017
- Pyramid Scene Parsing Network, Zhao et al, CVPR 2017
- Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation, Chen et al, ECCV 2018
- The PASCAL Visual Object Classes Challenge: A Retrospective, Everingham et al, IJCV 2014
- Rich feature hierarchies for accurate object detection and semantic segmentation, Girshick et al, CVPR 2014
- You Only Look Once: Unified, Real-Time Object Detection, Redmon et al, CVPR 2016
- Mask R-CNN, He et al, ICCV 2017
- ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models, Barbu et al, NeurIPS 2019
- Shortcut Learning in Deep Neural Networks, Geirhos et al, Nature Machine Intelligence 2020
- Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps, Simonyan et al, 2013
- Visualizing and Understanding Convolutional Networks, Zeiler & Fergus, ECCV 2014
- Unmasking Clever Hans Predictors and Assessing What Machines Really Learn, Lapuschkin et al, Nature Communications 2019
Lecture 05: Video Analysis (slides)
References and further reading:
- Dive into Deep Learning: Recurrent Neural Networks
- Dive into Deep Learning: Modern Recurrent Neural Networks
- Deep Learning Book: Sequence Modeling: Recurrent and Recursive Nets
- Chris Olah’s Blog: Understanding LSTM Networks
- Andrej Karpathy’s Blog: The Unreasonable Effectiveness of Recurrent Neural Networks
- all the papers referenced in the PDF.
Things we didn’t cover:
- Non-local Neural Networks
- A Closer Look at Spatiotemporal Convolutions for Action Recognition (R(2+1)D)
Pre Deep Learning:
Lecture 06: Self Supervised Learning (slides)
References and further reading:
- Dive into Deep Learning: Generative Adversarial Networks
- Ishan Misra: Lecture about Energy-based SSL
- Fei-Fei Li: Lecture about Generative Methods
- Ali Ghodsi: Lecture with derivation of Variational Autoencoders
- Jeremy Jordan: Blog post about Variational Autoencoders
- Revisiting Self-Supervised Visual Representation Learning, Kolesnikov et al, CVPR 2019
- Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey, Jing & Tian, TPAMI 2019
- all papers referenced in the PDF
We didn’t talk about generating images with Transformers: