Object Detection Tutorial

In this tutorial, we will build a person detector from a linear SVM trained on Histogram of Oriented Gradients (HOG) feature descriptors. We will first train a person classifier and then apply it with a sliding window to identify and localize people in an image.

The key challenge in creating a classifier is that it needs to cope with variations in illumination, pose and occlusion in the image. To achieve this, we will train the classifier on an intermediate representation of the image instead of the raw pixel values. An ideal representation (commonly called a feature vector) captures the information that is useful for classification but is invariant to small changes in illumination and occlusion. The HOG descriptor is a gradient-based representation that is largely invariant to local geometric and photometric changes (i.e. small shape and illumination changes), which makes it a good choice for our problem. In fact, HOG descriptors are widely used for object detection.
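
To make the idea of a gradient-based representation concrete, here is a toy sketch of the orientation histogram that HOG builds for a single cell. It is not the ImageFeatures implementation: the name `orientation_histogram`, the central-difference gradients and the hard assignment into 9 unsigned orientation bins are simplifying assumptions (real HOG implementations typically also interpolate votes between bins and normalize over blocks of cells).

```julia
# Toy orientation histogram for one cell (hypothetical helper, for illustration only).
function orientation_histogram(cell::AbstractMatrix{<:Real}; nbins::Int = 9)
    rows, cols = size(cell)
    hist = zeros(Float64, nbins)
    for j in 2:cols-1, i in 2:rows-1
        gx = (cell[i, j+1] - cell[i, j-1]) / 2        # horizontal central difference
        gy = (cell[i+1, j] - cell[i-1, j]) / 2        # vertical central difference
        magnitude = hypot(gx, gy)
        theta = mod(atan(gy, gx), π)                  # unsigned orientation in [0, π)
        bin = min(floor(Int, theta / π * nbins) + 1, nbins)
        hist[bin] += magnitude                        # each pixel votes with its gradient magnitude
    end
    return hist
end

# A horizontal intensity ramp, and the same ramp with a constant brightness offset.
cell = [Float64(j) for i in 1:8, j in 1:8]
orientation_histogram(cell) == orientation_histogram(cell .+ 0.5)
```

Because the histogram depends only on intensity differences, adding a constant brightness offset leaves it unchanged, which is the kind of photometric invariance described above.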

We will use [this data](//Add link) for training our classifier. We will start by loading the data and computing HOG features of all the images.

using Images, ImageFeatures

pos_examples = "path_to_data/human/"
neg_examples = "path_to_data/not_humans/"

n_pos = length(readdir(pos_examples))   # number of positive training examples
n_neg = length(readdir(neg_examples))   # number of negative training examples
n = n_pos + n_neg                       # number of training examples 
examples = Array{Float64}(undef, 3780, n)   # HOG descriptors, one column per image. Each image in our training data has size 128x64 and so has a 3780 length HOG descriptor
labels = Vector{Int}(undef, n)              # Vector to store label (1=human, 0=not human) of each image.

for (i, file) in enumerate([readdir(pos_examples); readdir(neg_examples)])
    # The first n_pos files come from the positive directory, the rest from the negative one
    filename = "$(i <= n_pos ? pos_examples : neg_examples)$file"
    img = load(filename)
    examples[:, i] = create_descriptor(img, HOG())
    labels[i] = (i <= n_pos ? 1 : 0)
end
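
The descriptor length of 3780 follows from the HOG layout. As a sanity check, the sketch below recomputes it, assuming the commonly used parameters of 8x8-pixel cells, blocks of 2x2 cells moved one cell at a time, and 9 orientation bins. These values are assumptions (check the `HOG` defaults in your ImageFeatures version), and `hog_length` is a hypothetical helper, not part of the library.

```julia
# Length of a HOG descriptor for a rows x cols image (hypothetical helper).
function hog_length(rows::Int, cols::Int; cell_size::Int = 8, block_size::Int = 2,
                    block_stride::Int = 1, orientations::Int = 9)
    cells_r = rows ÷ cell_size                               # cells along the height
    cells_c = cols ÷ cell_size                               # cells along the width
    blocks_r = (cells_r - block_size) ÷ block_stride + 1     # block positions along the height
    blocks_c = (cells_c - block_size) ÷ block_stride + 1     # block positions along the width
    return blocks_r * blocks_c * block_size^2 * orientations
end

hog_length(128, 64)   # 15 * 7 blocks, 4 cells per block, 9 bins per cell = 3780
```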

We now have an encoded version of every image in our training data. This encoding keeps the information that is useful for classification and discards extraneous detail (illumination changes, small pose variations, etc.). We will train a linear SVM on this data.

using LIBSVM

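# svmtrain uses an RBF kernel by default, so we request a linear kernel explicitly
model = svmtrain(examples, labels; kernel = Kernel.Linear);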

Now let's test this classifier on some new data.

img = load("human_example.jpg")
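des = reshape(create_descriptor(img, HOG()), :, 1)   # svmpredict expects a matrix with one descriptor per column

predicted_label, _ = svmpredict(model, des);
print(predicted_label)                          # prints [1] if the image is classified as human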

Try testing the trained model on more images from the test directory. It should classify most of them correctly.
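
One way to put a number on how well the classifier performs is to compare its predicted labels with the known labels of a held-out set. The helper below is a hypothetical sketch: `accuracy` is not part of LIBSVM, and the label vectors in the usage line are made-up placeholders rather than results from this classifier.

```julia
# Fraction of predicted labels that match the known labels (hypothetical helper).
function accuracy(predicted::AbstractVector, actual::AbstractVector)
    length(predicted) == length(actual) ||
        throw(ArgumentError("label vectors must have the same length"))
    isempty(actual) && throw(ArgumentError("need at least one label"))
    return count(predicted .== actual) / length(actual)
end

accuracy([1, 0, 1, 1], [1, 0, 0, 1])   # 3 of 4 placeholder labels match, so 0.75
```

With the variables used in this tutorial, `predicted` would be the first value returned by `svmpredict` for a matrix holding one test descriptor per column.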

#To do: Add images and predicted_label

Next, we will use our trained classifier with a sliding window to localize people in an image.
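
As a rough sketch of the sliding-window step, the code below enumerates window positions and keeps those that a classifier accepts. Everything here is an assumption for illustration: the helper names `window_origins` and `detect`, the 10-pixel stride, and the `is_person` callback, which in this tutorial's setting would wrap `create_descriptor` and `svmpredict`.

```julia
# Top-left corners of every win_h x win_w window that fits inside a rows x cols image.
function window_origins(rows::Int, cols::Int, win_h::Int, win_w::Int, step::Int)
    return vec([(i, j) for i in 1:step:(rows - win_h + 1), j in 1:step:(cols - win_w + 1)])
end

# Run `is_person` on each window and collect the origins of the accepted windows.
function detect(is_person, img::AbstractMatrix; win_h::Int = 128, win_w::Int = 64, step::Int = 10)
    hits = Tuple{Int,Int}[]
    for (i, j) in window_origins(size(img, 1), size(img, 2), win_h, win_w, step)
        window = @view img[i:i + win_h - 1, j:j + win_w - 1]
        is_person(window) && push!(hits, (i, j))
    end
    return hits
end

# Usage with a stand-in classifier that accepts windows that are almost entirely bright.
img = zeros(200, 100)
img[41:168, 21:84] .= 1.0                        # one bright 128x64 region
detect(w -> sum(w) / length(w) > 0.99, img)      # only the window at (41, 21) is accepted
```

The window size matches the 128x64 training images, so every window yields a descriptor of the same length as the ones the classifier was trained on.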

tejus-gupta/detection.md
