{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# YOLO (You Only Look Once)\n",
"Nishan Pantha:\n",
"- email: [email protected]\n",
"- Twitter: @nishparadox\n",
"- Github: NISH1001\n",
"- Linkedin: nishparadox\n",
"- Site: http://nishanpantha.com.np/"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What is Object Classification\n",
"Object classification is the task of predicting the label of an image. \n",
"In simple terms, if you have a **cat-dog** classifier, the system tries to predict the label of any test image fed to it. Here, the output will be either dog or cat. \n",
"<br/>\n",
"*Remember: a classifier can only predict the class labels it was trained on. So, in the above case, even when shown an image of an elephant, the classifier will still output either dog or cat. To recognize elephants, the whole system would have to be retrained from scratch (or fine-tuned from a pre-trained model) on an elephant dataset.* \n",
"<br/>\n",
"\n",
"#### Approaches for object classification\n",
"We can apply many existing ML/DL techniques to the classification problem, some of which are:\n",
"- Binary classification on image features\n",
"- SVM on image features\n",
"- Artificial Neural Networks on image features\n",
"- Convolutional Neural Networks on raw images\n",
"\n",
"Among these, CNNs have proven to be among the best architectures for image classification. To get a sense of *how good CNNs are*, note that the latest state-of-the-art models incorporate CNNs in some form."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What is Object Localization\n",
"Object localization is the task of predicting the location of the object within the image being classified. \n",
"That is, if we feed an image of a cat to a cat-dog classifier system, in addition to outputting the label **cat**, the system also outputs the region of the image where the cat actually lies. \n",
"<br/>\n",
"\n",
"#### Approaches for Object Localization\n",
"- Sliding Window Technique\n",
"- Bounding Box Prediction\n",
"\n",
"*Remember: object localization only predicts one object in the image. If there are multiple objects, it becomes an object detection problem.*"
]
},
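{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the sliding-window idea concrete, here is a minimal sketch (not an implementation from any particular library) that enumerates the candidate windows a classifier would be run on. The window size and stride values are hypothetical:\n",
"```python\n",
"# Sketch: coordinates visited by a sliding-window localizer (illustration only).\n",
"def sliding_windows(img_w, img_h, win=64, stride=32):\n",
"    boxes = []\n",
"    for top in range(0, img_h - win + 1, stride):\n",
"        for left in range(0, img_w - win + 1, stride):\n",
"            # Each box is (left, top, right, bottom) in pixels.\n",
"            boxes.append((left, top, left + win, top + win))\n",
"    return boxes\n",
"\n",
"boxes = sliding_windows(128, 96, win=64, stride=32)\n",
"# 3 horizontal x 2 vertical positions = 6 candidate windows\n",
"```\n",
"A classifier is then run on the crop inside each window, and the high-scoring windows are kept as the object's location."
]
},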
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Object Detection\n",
"\n",
"Object detection is a computer technology related to computer vision and image processing that deals with detecting instances of semantic objects of a certain class (such as humans, buildings, or cars) in digital images and videos. Well-researched domains of object detection include face detection and pedestrian detection. Object detection has applications in many areas of computer vision, including image retrieval and video surveillance.\n",
"\n",
"More concretely, it is the process of detecting every object in the scene, classifying its label, and finding the bounding box (or polygon) of that object.\n",
"\n",
"The latest techniques include:\n",
"- YOLO\n",
"- RetinaNet\n",
"- RCNN\n",
"- Fast-RCNN\n",
"- Faster-RCNN\n",
"- Mask RCNN"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## YOLO\n",
"You Only Look Once is a state-of-the-art, real-time object detection system. It was originally developed around 2015 and outperformed the other techniques of that time. \n",
"YOLO has its own neat architecture based on CNNs and anchor boxes, and has proven to be a go-to object detection technique for a wide range of problems. Over time, it has become faster and better across its versions:\n",
"- YOLO V1\n",
"- YOLO V2\n",
"- YOLO V3\n",
"\n",
"YOLO V2 is better than V1 in terms of both accuracy and speed. \n",
"YOLO V3 is not faster than V2, but it is more accurate. \n",
"\n",
"#### References\n",
"The original paper can be found [here](https://arxiv.org/abs/1506.02640). \n",
"The YOLO V3 paper, which describes an incremental improvement over V2, can be found [here](https://pjreddie.com/media/files/papers/YOLOv3.pdf)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How Does YOLO Work\n",
"Prior detection systems repurpose classifiers or localizers to perform detection. They apply the model to an image at multiple locations and scales, and high-scoring regions of the image are considered detections.\n",
"\n",
"YOLO takes a totally different approach. It applies a single neural network to the full image. This network divides the image into regions and predicts bounding boxes and probabilities for each region. These bounding boxes are weighted by the predicted probabilities."
]
},
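{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the region idea concrete, here is a minimal sketch (not darknet's actual code) of how a box center, normalized by the image size, maps onto a cell of a hypothetical SxS grid, which is the cell responsible for predicting that box:\n",
"```python\n",
"# Sketch: assign a box center to a YOLO-style grid cell (illustration only).\n",
"def grid_cell(cx, cy, S=7):\n",
"    # cx, cy: box center, normalized to [0, 1) by the image size.\n",
"    col = int(cx * S)  # grid column containing the center\n",
"    row = int(cy * S)  # grid row containing the center\n",
"    # Offsets of the center within its cell, each in [0, 1).\n",
"    x_off = cx * S - col\n",
"    y_off = cy * S - row\n",
"    return row, col, x_off, y_off\n",
"\n",
"row, col, x_off, y_off = grid_cell(0.52, 0.31, S=7)\n",
"# center (0.52, 0.31) falls in row 2, column 3 of a 7x7 grid\n",
"```\n",
"Each grid cell predicts a fixed number of boxes plus class probabilities, and the final detections are the boxes whose weighted scores survive thresholding."
]
},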
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Where to get YOLO\n",
"YOLO binaries (and sources) can be downloaded from the following sources:\n",
"- https://pjreddie.com/darknet/yolo/\n",
"- Directly from GitHub [here](https://github.com/pjreddie/darknet)\n",
"\n",
"YOLO is based on darknet, built in C. \n",
"[Darknet](https://pjreddie.com/darknet/) is an open source neural network framework written in C and CUDA."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How to Use YOLO\n",
"\n",
"### Clone the Repository\n",
"```bash\n",
"git clone https://github.com/pjreddie/darknet\n",
"```\n",
"\n",
"### Compile The Source\n",
"We can directly compile the source using `make`. Just go to the directory where darknet is cloned and run the command:\n",
"```bash\n",
"make\n",
"```\n",
"*Remember: make makes use of the **Makefile**, which consists of instructions to compile the C source files.* \n",
"After the make process completes, you will get a file named **darknet**, which is the executable binary. \n",
"You can use this executable to run YOLO.\n",
" \n",
"### Make darknet Executable\n",
"While running the command `./darknet`, if you get a `permission` error, it means the user does not have execute permission on the binary. Just run the following command:\n",
"```bash\n",
"chmod u+x darknet\n",
"```\n",
"After this, you will be able to run the darknet executable."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## YOLO Structure\n",
"\n",
"### Configuration Files\n",
"YOLO is entirely plug-and-play; that is, you can configure YOLO to detect any type of object. In fact, you can also modify \n",
"the CNN architecture itself and play around. YOLO does this by making use of configuration files under **cfg/**. \n",
"The configuration files end with the `.cfg` extension, which YOLO can parse. \n",
"These configuration files mainly consist of:\n",
"- CNN architecture (layers and activations)\n",
"- Anchor boxes\n",
"- Number of classes\n",
"- Learning rate\n",
"- Optimization technique\n",
"- Input size\n",
"- Probability score threshold\n",
"- Batch sizes\n",
"\n",
"Across versions, there are many configurations, from V1 to V3 and from full models to tiny ones.\n",
"You can download different configurations, two of which are:\n",
"- YOLO V3 (full): https://github.com/pjreddie/darknet/blob/master/cfg/yolov3.cfg\n",
"- Tiny YOLO V3: https://github.com/pjreddie/darknet/blob/master/cfg/yolov3-tiny.cfg\n",
"\n",
"### Weights\n",
"Each configuration has corresponding pre-trained weights. \n",
"Here, only YOLO V3 is referenced. \n",
"<br/>\n",
"**Full Weight** \n",
"To get the full weights for YOLO V3, download them from https://pjreddie.com/media/files/yolov3.weights. \n",
"These are the weights of the full model, trained on the 80-class COCO dataset. \n",
"<br/>\n",
"**Tiny Weight** \n",
"These are the weights of the tiny model, trained on the same 80 classes. You can get them from https://pjreddie.com/media/files/yolov3-tiny.weights."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test YOLO\n",
"As mentioned earlier, everything is run using the `darknet` executable. \n",
"Suppose we have an image named **test.jpeg**; then we can try predicting the objects in it as: \n",
"```bash\n",
"./darknet detect yolov3-tiny.cfg yolov3-tiny.weights test.jpeg\n",
"```\n",
"<br/>\n",
"\n",
"Normally, the `.cfg` files are inside the `cfg/` directory. \n",
"Suppose you have the **yolov3-tiny** weights inside the directory `weights/`; then the command will be:\n",
"```bash\n",
"./darknet detect cfg/yolov3-tiny.cfg weights/yolov3-tiny.weights test.jpeg\n",
"```\n",
"\n",
"Once done, there will be an image named **predictions.jpeg** in the same directory as the `darknet` binary. \n",
"You can view the predicted classes along with their bounding boxes."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train\n",
"Training is a bit more complex because we have to get the data and configurations **right**. \n",
"The following command does everything:\n",
"```bash\n",
"./darknet detector train custom/cfg/obj.data custom/cfg/tiny-yolo.cfg custom/tiny-yolo_100.weights \n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training Command Breakdown\n",
"Here, *.cfg* and *.weights* are what they appear to be: the configuration and weight files mentioned earlier. \n",
"Everything happens through the **obj.data** file, which has content like:\n",
"```bash\n",
"classes= 1 \n",
"train = custom/cfg/train.txt\n",
"valid = custom/cfg/test.txt\n",
"names = obj.names \n",
"backup = backup/\n",
"```\n",
"#### obj.names\n",
"This file consists of the list of class names. Example:\n",
"```\n",
"cat\n",
"dog\n",
"background\n",
"bike\n",
"```\n",
"\n",
"#### train.txt\n",
"This file consists of the list of training images that we are going to feed into the network. \n",
"Its content looks like:\n",
"```bash\n",
"custom/train-images/11.jpg\n",
"custom/train-images/12.jpg\n",
"custom/train-images/13.jpg\n",
"...\n",
"...\n",
"```\n",
"Here, **train-images/** contains all the training images. \n",
"Along with the images, this directory also contains a text file with the bounding boxes for each image. \n",
"So, you will have `custom/train-images/11.txt`, whose content can be:\n",
"```bash\n",
"0 0.32502480158730157 0.3950066137566138 0.12896825396825398 0.09523809523809523\n",
"```\n",
"Here, the first number is the 0-indexed id of the class in **obj.names**. \n",
"The remaining four numbers represent the bounding box as center x, center y, width, and height, each normalized by the image size. If there were multiple boxes of multiple classes, it would look like:\n",
"```\n",
"0 0.32502480158730157 0.3950066137566138 0.12896825396825398 0.09523809523809523\n",
"0 0.52502480158730157 0.3950066137566138 0.12896825396825398 0.09523809523809523\n",
"1 0.32502480158730157 0.3950066137566138 0.12896825396825398 0.09523809523809523\n",
"```\n",
"\n",
"#### test.txt\n",
"This file consists of the list of test images.\n",
"\n",
"\n",
"#### Note on .cfg\n",
"Note: in the **.cfg**, you have to change the number of classes to the total found in **obj.names**."
]
}
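,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the label format above is normalized, here is a minimal sketch (not part of darknet itself) that converts one label line back to pixel corner coordinates, assuming a hypothetical image size:\n",
"```python\n",
"# Sketch: convert a darknet label line to pixel coordinates (illustration only).\n",
"def yolo_to_pixels(line, img_w, img_h):\n",
"    parts = line.split()\n",
"    cls = int(parts[0])  # 0-indexed class id from obj.names\n",
"    # Center x, center y, width, height, each normalized by image size.\n",
"    cx, cy, w, h = (float(v) for v in parts[1:])\n",
"    left = int((cx - w / 2) * img_w)\n",
"    top = int((cy - h / 2) * img_h)\n",
"    right = int((cx + w / 2) * img_w)\n",
"    bottom = int((cy + h / 2) * img_h)\n",
"    return cls, (left, top, right, bottom)\n",
"\n",
"cls, box = yolo_to_pixels(\"0 0.5 0.5 0.25 0.25\", 640, 480)\n",
"# class 0, box (240, 180, 400, 300) on a 640x480 image\n",
"```\n",
"This is handy for sanity-checking annotations before training."
]
}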
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
} |