@tejus-gupta
Created August 26, 2017 15:51
Object Detection Tutorial

In this tutorial, we will build a person detector from a linear SVM trained on Histogram of Oriented Gradients (HOG) feature descriptors. We will first create a person classifier and then use this classifier with a sliding window to identify and localize people in an image.

The key challenge in creating a classifier is that it needs to work despite variations in illumination, pose and occlusions in the image. To achieve this, we will train the classifier on an intermediate representation of the image instead of the raw pixel values. An ideal representation (commonly called a feature vector) captures information that is useful for classification but is invariant to small changes in illumination and occlusion. The HOG descriptor is a gradient-based representation that is robust to small local geometric and photometric changes (i.e. shape and illumination changes), so it is a good choice for our problem. In fact, HOG descriptors are widely used for object detection.
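To make the idea of a gradient-based representation concrete, here is a minimal sketch of the orientation histogram for a single cell, which is the basic building block of HOG. The function name `cell_histogram`, the 9-bin default and the use of central differences are assumptions made for illustration. A full HOG implementation also interpolates between bins and normalizes histograms over blocks, which this sketch leaves out.

```julia
# Gradient-orientation histogram for one cell (illustrative sketch, not the ImageFeatures implementation).
# `cell` is a grayscale patch given as a matrix of intensities; nbins = 9 is an assumed default.
function cell_histogram(cell::AbstractMatrix{<:Real}; nbins::Int = 9)
    hist = zeros(Float64, nbins)
    rows, cols = size(cell)
    binwidth = 180 / nbins
    for r in 2:rows-1, c in 2:cols-1
        gx = cell[r, c+1] - cell[r, c-1]            # horizontal central difference
        gy = cell[r+1, c] - cell[r-1, c]            # vertical central difference
        magnitude = hypot(gx, gy)
        angle = mod(rad2deg(atan(gy, gx)), 180)     # unsigned orientation in [0, 180)
        bin = min(floor(Int, angle / binwidth) + 1, nbins)   # guard against rounding at 180
        hist[bin] += magnitude                      # each pixel votes with its gradient magnitude
    end
    return hist
end

# A horizontal intensity ramp: every interior pixel has the same gradient.
patch = [10 * c for r in 1:8, c in 1:8]

# Adding a constant brightness offset does not change the gradients,
# so the histogram is unchanged.
cell_histogram(patch) == cell_histogram(patch .+ 50)
```

Because the histogram is built from intensity differences rather than intensities, a uniform change in brightness has no effect on it, which is the kind of invariance described above.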

We will use [this data](//Add link) for training our classifier. We will start by loading the data and computing HOG features of all the images.

using Images, ImageFeatures

pos_examples = "path_to_data/human/"
neg_examples = "path_to_data/not_humans/"

n_pos = length(readdir(pos_examples))   # number of positive training examples
n_neg = length(readdir(neg_examples))   # number of negative training examples
n = n_pos + n_neg                       # number of training examples
examples = Array{Float64}(undef, 3780, n)   # HOG descriptors, one column per image (the layout LIBSVM.jl expects). Each image in our training data has size 128x64 and so has a 3780 length HOG descriptor
labels = Vector{Int}(undef, n)              # Vector to store label (1=human, 0=not human) of each image.

# Positive examples come first, so the first n_pos columns are labelled 1.
for (i, file) in enumerate([readdir(pos_examples); readdir(neg_examples)])
    filename = joinpath(i <= n_pos ? pos_examples : neg_examples, file)
    img = load(filename)
    examples[:, i] = create_descriptor(img, HOG())
    labels[i] = (i <= n_pos ? 1 : 0)
end

We now have an encoded version of every image in our training data. This encoding keeps the information that is useful for classification and discards much of the extraneous variation (illumination changes, small pose variations, etc.). We will train a linear SVM on this data.
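The descriptor length of 3780 used when allocating the array can be checked by hand. The sketch below assumes the conventional HOG layout of 8x8-pixel cells, 2x2-cell blocks that advance one cell at a time, and 9 orientation bins. The helper `hog_length` is a hypothetical name for illustration and is not part of ImageFeatures.

```julia
# Number of values in a HOG descriptor, under the assumed conventional parameters.
function hog_length(height, width; cell_size = 8, block_size = 2, block_stride = 1, orientations = 9)
    cells_y = div(height, cell_size)                       # cells along the vertical axis
    cells_x = div(width, cell_size)                        # cells along the horizontal axis
    blocks_y = div(cells_y - block_size, block_stride) + 1 # overlapping block positions, vertical
    blocks_x = div(cells_x - block_size, block_stride) + 1 # overlapping block positions, horizontal
    return blocks_y * blocks_x * block_size^2 * orientations
end

hog_length(128, 64)   # 15 * 7 blocks, each with 2 * 2 * 9 = 36 values, gives 3780
```

A 128x64 image has 16x8 cells, which yields 15x7 overlapping blocks of 36 values each.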

using LIBSVM

model = svmtrain(examples, labels; kernel=Kernel.Linear);   # LIBSVM defaults to an RBF kernel, so request a linear one explicitly

Now let's test this classifier on some new data.

img = load("human_example.jpg")
des = reshape(create_descriptor(img, HOG()), :, 1)   # svmpredict expects a matrix with one column per image

predicted_label, _ = svmpredict(model, des);
print(predicted_label[1])                       # predicted_label[1] = 1

Try testing the trained model on more images from the test directory. You should find that it performs quite well.

#To do: Add images and predicted_label

Next we will use our trained classifier with a sliding window to localize people in an image.
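As a rough sketch of what the sliding-window step could look like, the function below moves a 128x64 window across a 2-D image and records the top-left corner of every window that a classifier accepts. The name `sliding_window_detect`, the `is_person` callback and the stride of 16 pixels are all assumptions made for illustration. The toy usage substitutes a simple brightness test for the trained SVM so that the example runs on its own.

```julia
# Slide a fixed-size window over a 2-D image and collect the windows the classifier accepts.
# `is_person` is a hypothetical callback that takes a patch and returns true or false.
function sliding_window_detect(is_person, img::AbstractMatrix; window = (128, 64), stride = 16)
    win_h, win_w = window
    rows, cols = size(img)
    detections = Tuple{Int,Int}[]                 # (top, left) corner of each accepted window
    for top in 1:stride:(rows - win_h + 1), left in 1:stride:(cols - win_w + 1)
        patch = @view img[top:top+win_h-1, left:left+win_w-1]
        if is_person(patch)
            push!(detections, (top, left))
        end
    end
    return detections
end

# Toy usage: a dark image containing one bright 128x64 rectangle,
# with a stand-in classifier that accepts almost fully bright windows.
img = zeros(256, 128)
img[65:192, 33:96] .= 1.0
hits = sliding_window_detect(p -> sum(p) / length(p) > 0.99, img)
```

In the setting of this tutorial, `is_person` would instead compute `create_descriptor(patch, HOG())` for each window and pass the result to `svmpredict`.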

timholy commented Aug 26, 2017

This looks very good. To save rounds of review, here are a couple of things I noticed:

  • provide an external link explaining HOG algorithm
  • oclussions->occlusions (2 places)
  • Infact -> In fact,
  • in the same sentence, pluralize "descriptors"
  • In the demo code, the for loop needs an end
