Created
March 8, 2022 21:18
Computer Vision and Perception for Self-Driving Cars (Deep Learning Course)
https://www.youtube.com/watch?v=cPOtULagNnI
Python + Deep Learning
Robotics with Sakshay
https://www.youtube.com/c/roboticswithsakshay/videos
* Road Segmentation
* 2D Object Detection (YOLO)
* Object Tracking (Deep SORT)
* 3D Data Visualisation
* Multi Task Learning (depth estimation & semantic segmentation)
* 3D Object Detection
* Bird's Eye View (spatial transformers)
(0:02:16) Fully Convolutional Network | Road Segmentation
Kaggle Dataset: https://www.kaggle.com/sakshaymahna/kittiroadsegmentation
Kaggle Notebook: https://www.kaggle.com/sakshaymahna/fully-convolutional-network
KITTI Dataset: http://www.cvlibs.net/datasets/kitti/
Fully Convolutional Network Paper: https://arxiv.org/abs/1411.4038
Hand Crafted Road Segmentation: https://www.youtube.com/watch?v=hrin-qTn4L4 (Udacity Self Driving Cars Advanced Lane Detection)
Deep Learning and CNNs: https://www.youtube.com/watch?v=aircAruvnKk (But what is a neural network? | Chapter 1, Deep learning)
Transposed convolutions: learned upsampling, better than fixed interpolation upscaling!
Architecture: VGG16 encoder followed by a decoder
Experiment: replace Add with Concatenate in the skip connections
Experiment: replace Concatenate with Conv2DTranspose (did not seem to work that well!)
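
To make the comparison between transposed convolution and interpolation concrete, here is a minimal 1D sketch in plain Python. It is not taken from the course notebook; the function names and kernels are chosen only for illustration. It shows that a transposed convolution with a fixed kernel of ones reproduces nearest-neighbour upsampling, so a learned kernel can represent that and more.

```python
def upsample_nearest(x, factor):
    """Fixed upsampling: repeat every sample `factor` times."""
    return [v for v in x for _ in range(factor)]


def conv_transpose_1d(x, kernel, stride):
    """1D transposed convolution without padding.

    Every input sample scatters a scaled copy of `kernel` into the output;
    consecutive copies are shifted by `stride` and overlapping values add up.
    """
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, v in enumerate(x):
        for k, w in enumerate(kernel):
            out[i * stride + k] += v * w
    return out


signal = [1.0, 2.0, 3.0]
nearest = upsample_nearest(signal, 2)

# A kernel of ones with stride 2 gives exactly nearest-neighbour upsampling.
same_as_nearest = conv_transpose_1d(signal, [1.0, 1.0], stride=2)

# A triangular kernel blends neighbours, similar to linear interpolation.
blended = conv_transpose_1d(signal, [0.5, 1.0, 0.5], stride=2)

print(nearest)
print(same_as_nearest)
print(blended)
```

In a network such as the one in the notes, the kernel weights would be trained rather than fixed, which is the likely reason for the better results.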
(0:20:45) YOLO | 2D Object Detection
Kaggle Competition/Dataset: https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles
Visualization Notebook: https://www.kaggle.com/sakshaymahna/lyft-3d-object-detection-eda
YOLO Notebook: https://www.kaggle.com/sakshaymahna/yolov3-keras-2d-object-detection
Playlist on Fundamentals of Object Detection: https://www.youtube.com/playlist?list=PL_IHmaMAvkVxdDOBRg2CbcJBq9SY7ZUvs
Blog on YOLO: https://www.section.io/engineering-education/introduction-to-yolo-algorithm-for-object-detection/
YOLO Paper: https://arxiv.org/abs/1506.02640
(0:35:51) Deep SORT | Object Tracking
Dataset: https://www.kaggle.com/sakshaymahna/kittiroadsegmentation
Notebook/Code: https://www.kaggle.com/sakshaymahna/deepsort/notebook
Blog on Deep SORT: https://medium.com/analytics-vidhya/object-tracking-using-deepsort-in-tensorflow-2-ec013a2eeb4f
Deep SORT Paper: https://arxiv.org/abs/1703.07402
Kalman Filter: https://www.youtube.com/playlist?list=PLn8PRpmsu08pzi6EMiYnR-076Mh-q3tWr
Hungarian Algorithm: https://www.geeksforgeeks.org/hungarian-algorithm-assignment-problem-set-1-introduction/
Cosine Distance Metric: https://www.machinelearningplus.com/nlp/cosine-similarity/
Mahalanobis Distance: https://www.machinelearningplus.com/statistics/mahalanobis-distance/
YOLO Algorithm: https://youtu.be/C3qmhPVUXiE
SORT = Simple Online and Realtime Tracking
Building blocks: bounding box prediction, Kalman filters, linear approximation, IoU matching
Deals with occlusion; linear velocity motion model
Mahalanobis distance -> distance between a point and a probability distribution
Longer occlusion periods remain a problem
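
The tracking ingredients listed above can be sketched in a few lines of plain Python. This is an illustration under simplifying assumptions, not the Deep SORT implementation: the prediction step is only the constant-velocity part of a Kalman filter (no covariance update), the Mahalanobis distance is written out for the 2D case, and all function names and numbers are made up for the example.

```python
import math


def predict_constant_velocity(box, velocity, dt=1.0):
    """Shift a box (x1, y1, x2, y2) by a constant linear velocity (vx, vy)."""
    vx, vy = velocity
    x1, y1, x2, y2 = box
    return (x1 + vx * dt, y1 + vy * dt, x2 + vx * dt, y2 + vy * dt)


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mahalanobis_2d(point, mean, cov):
    """Mahalanobis distance between a 2D point and a Gaussian (mean, 2x2 cov)."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    # The inverse of [[a, b], [c, d]] is [[d, -b], [-c, a]] / det.
    q = (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det
    return math.sqrt(q)


# Track seen at (0, 0, 10, 10), moving 2 px per frame to the right.
predicted = predict_constant_velocity((0.0, 0.0, 10.0, 10.0), (2.0, 0.0))
good_match = iou(predicted, (2.0, 0.0, 12.0, 10.0))
poor_match = iou(predicted, (7.0, 0.0, 17.0, 10.0))

# Euclidean distance is 2, but the variance along x is 4, so the point is
# only one standard deviation away from the mean.
gate = mahalanobis_2d((2.0, 0.0), (0.0, 0.0), ((4.0, 0.0), (0.0, 1.0)))

print(predicted, good_match, poor_match, gate)
```

In a full tracker, scores like these would fill a cost matrix that the Hungarian algorithm then solves for the track-to-detection assignment.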
(0:52:37) KITTI 3D Data Visualization | Homogeneous Transformations
Dataset: https://www.kaggle.com/garymk/kitti-3d-object-detection-dataset
Notebook/Code: https://www.kaggle.com/sakshaymahna/lidar-data-visualization/notebook
LIDAR: https://geoslam.com/what-is-lidar/
Tesla doesn't use LIDAR: https://towardsdatascience.com/why-tesla-wont-use-lidar-57c325ae2ed5
Homogeneous transformations; point clouds
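
As a reminder of what a homogeneous transformation does to a point cloud, here is a small self-contained sketch. The sensor pose (a 90 degree yaw and a 1 m offset) and the names `make_transform`, `transform_points` and `lidar_to_vehicle` are hypothetical; the real KITTI calibration matrices come from the dataset's calibration files.

```python
import math


def make_transform(yaw, translation):
    """4x4 homogeneous matrix: rotate about z by `yaw` (radians), then translate."""
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform_points(matrix, points):
    """Apply a 4x4 homogeneous transform to a list of (x, y, z) points."""
    out = []
    for x, y, z in points:
        h = (x, y, z, 1.0)  # append 1 to get homogeneous coordinates
        row = [sum(m * v for m, v in zip(r, h)) for r in matrix]
        # Divide by the last component to return to 3D coordinates.
        out.append((row[0] / row[3], row[1] / row[3], row[2] / row[3]))
    return out


# Hypothetical pose: sensor rotated 90 degrees about z, shifted 1 m along x.
lidar_to_vehicle = make_transform(math.pi / 2, (1.0, 0.0, 0.0))
cloud = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.5)]
moved = transform_points(lidar_to_vehicle, cloud)
print(moved)
```

Packing rotation and translation into one matrix means a chain of frame changes (for example LIDAR to camera to image) reduces to a product of matrices.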
(1:06:45) Multi Task Attention Network (MTAN) | Multi Task Learning
Dataset: https://www.kaggle.com/sakshaymahna/cityscapes-depth-and-segmentation
Notebook/Code: https://www.kaggle.com/sakshaymahna/mtan-multi-task-attention-network
Data Visualization: https://www.kaggle.com/sakshaymahna/exploratory-data-analysis
MTAN Paper: https://arxiv.org/abs/1803.10704
Blog on Multi Task Learning: https://ruder.io/multi-task/
Image Segmentation and FCN: https://youtu.be/U_v0Tovp4XQ
Encoder / Decoder / Attention submodules
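
The core idea of the attention submodules is that each task applies its own soft mask, with values between 0 and 1, element-wise to features from the shared network. The sketch below illustrates only that masking step. The logits are hard-coded, hypothetical values; in MTAN they would be produced by learned convolutional layers, and the features would be image-shaped tensors rather than short lists.

```python
import math


def sigmoid(x):
    """Logistic function, squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def apply_attention(shared_features, mask_logits):
    """Element-wise soft attention: sigmoid(mask logit) times shared feature."""
    return [f * sigmoid(l) for f, l in zip(shared_features, mask_logits)]


# One shared feature vector, two task-specific masks (illustrative values).
shared = [2.0, -1.0, 4.0]
seg_logits = [0.0, 10.0, -10.0]    # segmentation keeps feature 1, drops feature 2
depth_logits = [10.0, 0.0, 0.0]    # depth keeps feature 0, halves the others

seg_features = apply_attention(shared, seg_logits)
depth_features = apply_attention(shared, depth_logits)
print(seg_features)
print(depth_features)
```

Both tasks read the same shared features but end up with different task-specific views of them, which is what lets one backbone serve depth estimation and segmentation at once.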
(1:20:58) SFA 3D | 3D Object Detection
Dataset: https://www.kaggle.com/garymk/kitti-3d-object-detection-dataset
Notebook/Code: https://www.kaggle.com/sakshaymahna/sfa3d
Data Visualization: https://www.kaggle.com/sakshaymahna/l...
Data Visualization Video: https://www.youtube.com/watch?v=tb1H42kE0eE
SFA3D GitHub Repository: https://github.com/maudzung/SFA3D
Feature Pyramid Networks: https://jonathan-hui.medium.com/understanding-feature-pyramid-networks-for-object-detection-fpn-45b227b9106c
Keypoint Feature Pyramid Network: https://arxiv.org/pdf/2001.03343.pdf
Heat Maps: https://en.wikipedia.org/wiki/Heat_map
Focal Loss: https://medium.com/visionwizard/understanding-focal-loss-a-quick-read-b914422913e7
L1 Loss: https://afteracademy.com/blog/what-are-l1-and-l2-loss-functions
Balanced L1 Loss: https://paperswithcode.com/method/balanced-l1-loss
Learning Rate Decay: https://medium.com/analytics-vidhya/learning-rate-decay-and-methods-in-deep-learning-2cee564f910b
Cosine Annealing: https://paperswithcode.com/method/cosine-annealing
SFA3D = Super Fast and Accurate 3D Object Detection
Feature Pyramid Network
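
Two of the training ingredients linked in this section, focal loss and cosine annealing, are compact enough to write out. The sketch below uses the standard textbook forms (binary focal loss for a single prediction, and a cosine schedule without restarts). It is not copied from the SFA3D repository, whose heat-map loss may differ in detail, and the default values for `gamma`, `alpha` and the learning rates are assumptions.

```python
import math


def focal_loss(p, target, gamma=2.0, alpha=0.25):
    """Binary focal loss for one predicted probability `p` and a 0/1 target.

    The factor (1 - p_t) ** gamma shrinks the loss of well-classified
    examples so that training focuses on the hard ones.
    """
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def cosine_annealing(step, total_steps, lr_max, lr_min=0.0):
    """Learning rate that decays from lr_max to lr_min along half a cosine."""
    cos = math.cos(math.pi * step / total_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cos)


easy = focal_loss(0.9, 1)   # confident and correct: small loss
hard = focal_loss(0.1, 1)   # confident and wrong: much larger loss
schedule = [cosine_annealing(s, 100, lr_max=0.1) for s in (0, 50, 100)]
print(easy, hard, schedule)
```

With `gamma=0` and `alpha=1` the focal loss reduces to ordinary cross-entropy on the positive class, which is a convenient sanity check.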
(1:40:24) UNetXST | Camera to Bird's Eye View
Dataset: https://www.kaggle.com/sakshaymahna/semantic-segmentation-bev
Dataset Visualization: https://www.kaggle.com/sakshaymahna/data-visualization
Notebook/Code: https://www.kaggle.com/sakshaymahna/unetxst
UNetXST Paper: https://arxiv.org/pdf/2005.04078.pdf
UNetXST Github Repository: https://github.com/ika-rwth-aachen/Cam2BEV
UNet: https://towardsdatascience.com/understanding-semantic-segmentation-with-unet-6be4f42d4b47
Image Transformations: https://kevinzakka.github.io/2017/01/10/stn-part1/
Spatial Transformer Networks: https://kevinzakka.github.io/2017/01/18/stn-part2/
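
The image-transformation links above cover projective transformations, the kind of mapping that can warp a camera view toward a top-down view. As a minimal sketch, the code below applies a 3x3 homography to pixel coordinates. The matrix values are invented for illustration and do not come from the Cam2BEV calibration; a real pipeline would also resample the image or feature map instead of transforming a handful of points.

```python
def apply_homography(h, points):
    """Map 2D points through a 3x3 homography using homogeneous coordinates."""
    out = []
    for x, y in points:
        u = h[0][0] * x + h[0][1] * y + h[0][2]
        v = h[1][0] * x + h[1][1] * y + h[1][2]
        w = h[2][0] * x + h[2][1] * y + h[2][2]
        # The perspective divide is what makes the mapping projective.
        out.append((u / w, v / w))
    return out


# Hypothetical homography: points with larger y are scaled down more.
H = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.5, 1.0],
]

pixels = [(4.0, 0.0), (2.0, 2.0), (4.0, 2.0)]
warped = apply_homography(H, pixels)
print(warped)
```

Because the divisor `w` depends on `y`, points in different image rows are scaled by different amounts, which an affine transform cannot do.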