About:
This page is for the demonstration I had to give for my Computer Vision class in Spring 2009. It is a modified version of R. Fergus's Simple Parts and Structure Object Detector.
It learns to detect objects from a few examples that you train by hand, using the alignment of individual parts. It also performs Expectation Maximization (EM) to improve the model.
Download:
Usage:
To run, extract all the files into a directory, then edit the value BASE_PATH in "common/config.m" to point to the new directory.
Then execute "run_test" from within the "common" dir. This will prompt you to train the initial model on several training examples, and then run the "parts_overlap" detection method to find faces. The program will run for a few iterations, then plot the final results.
Almost everything can be controlled from the config.m file. To test the various detection models, uncomment the appropriate EXPERIMENT_TYPE in config.m.
Additions/changes:
- Iterative EM - run models iteratively, using best scores from previous iteration to train next iteration.
- HOG - get_hog.m computes a Histogram of Oriented Gradients (HOG) descriptor for any image. HOG similarity computes a similarity metric between two HOG descriptors.
- Use of HOG similarity to prevent part filters from degenerating over iterations.
- Overlap test - a variant of the Hough transform that shifts the response images and finds the best overlap of response peaks for the different filters. This method works best with EM.
- Support for Caltech datasets -- just make a new directory under images, and it will convert the groundtruth for you when it resizes the images.
- A few bug fixes, lots of abstraction and removal of redundant code.
- More and better plots.
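To make the HOG items in the list above more concrete, here is a minimal Python sketch of a heavily simplified HOG descriptor and a similarity score between two descriptors. It is not a port of get_hog.m: the single whole-image histogram, the 9 orientation bins, and the use of cosine similarity are all assumptions made for illustration, and the function names are hypothetical.

```python
import math


def hog_descriptor(image, n_bins=9):
    """Simplified HOG (assumption: one histogram for the whole image).

    `image` is a list of rows of grayscale values. Gradients use central
    differences on interior pixels; each pixel votes for its unsigned
    orientation bin, weighted by gradient magnitude.
    """
    rows, cols = len(image), len(image[0])
    hist = [0.0] * n_bins
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = image[r][c + 1] - image[r][c - 1]
            gy = image[r + 1][c] - image[r - 1][c]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            angle = math.atan2(gy, gx) % math.pi  # unsigned, in [0, pi]
            # Clamp so that an angle rounding up to pi lands in the last bin.
            b = min(int(angle / math.pi * n_bins), n_bins - 1)
            hist[b] += mag
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


def hog_similarity(a, b):
    """Cosine similarity between two descriptors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


# A vertical edge and a horizontal edge have orthogonal descriptors.
vertical = [[0, 0, 1, 1, 1] for _ in range(5)]
horizontal = [[0] * 5, [0] * 5, [1] * 5, [1] * 5, [1] * 5]
same = hog_similarity(hog_descriptor(vertical), hog_descriptor(vertical))
different = hog_similarity(hog_descriptor(vertical), hog_descriptor(horizontal))
```

A score like this could be compared against a threshold to reject an updated part filter that no longer resembles the one from the previous iteration, which is one way to read the "prevent part filters from degenerating" item.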
Screenshots:
Example of training face:

Heatmap responses. The final plot shows the face correctly detected by shifting all responses and taking the max of mins across all responses.
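The "max of mins" step can be sketched in a few lines of Python. This is an illustration under assumptions, not the code behind "parts_overlap": the response maps are plain nested lists, each part has a hypothetical (row, column) offset from the object center, and a part that would fall outside the image scores negative infinity.

```python
def shifted_value(response, r, c, offset):
    """Response of one part at the location predicted by object center (r, c)."""
    rr, cc = r + offset[0], c + offset[1]
    if 0 <= rr < len(response) and 0 <= cc < len(response[0]):
        return response[rr][cc]
    return float("-inf")  # assumption: parts outside the image disqualify the center


def detect_by_overlap(responses, offsets):
    """Return (center, score) maximizing the minimum shifted part response."""
    rows, cols = len(responses[0]), len(responses[0][0])
    best_center, best_score = None, float("-inf")
    for r in range(rows):
        for c in range(cols):
            # The weakest part decides how well this center is supported.
            score = min(
                shifted_value(resp, r, c, off)
                for resp, off in zip(responses, offsets)
            )
            if score > best_score:
                best_center, best_score = (r, c), score
    return best_center, best_score


def blank(n=5):
    return [[0.0] * n for _ in range(n)]


# Two parts: one expected one pixel left of center, one pixel right.
part_a, part_b = blank(), blank()
part_a[2][1] = 0.9  # true part location
part_a[0][0] = 1.0  # stronger but isolated distractor peak
part_b[2][3] = 0.8  # true part location
center, score = detect_by_overlap([part_a, part_b], [(0, -1), (0, 1)])
```

Taking the minimum means a single strong but unsupported peak, like the distractor in `part_a`, cannot win on its own: every part has to agree on the same center.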

Results ranked by score and colored by category:

Model part positions over iterations. Variance generally decreases with iterations:
Filters over iterations. Noisy filters converge into smoothed composites over iterations, with the largest change happening between iterations 1 and 2.

3D plot of recall-precision curves (RPC) over iterations. Because only one guess is taken per image, recall does not necessarily reach 1 (an incorrect match may score highest and be the one taken).
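The reason recall can stall below 1 is easy to see in a small Python sketch of a recall-precision curve with one guess per image. The function name and the input format, a list of (score, is_correct) pairs with one entry per image, are assumptions for illustration and are not taken from the project's code.

```python
def recall_precision_curve(detections, n_positives):
    """Return a list of (recall, precision) points, one per ranked detection.

    `detections` holds one (score, is_correct) pair per image (assumed format);
    `n_positives` is the number of images that actually contain the object.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    curve, true_positives = [], 0
    for k, (_, is_correct) in enumerate(ranked, start=1):
        if is_correct:
            true_positives += 1
        curve.append((true_positives / n_positives, true_positives / k))
    return curve


# Four images, each containing the object; the guess for one image is wrong.
guesses = [(0.9, True), (0.8, False), (0.7, True), (0.4, True)]
curve = recall_precision_curve(guesses, n_positives=4)
final_recall = curve[-1][0]
```

With a single guess per image, the image whose guess is wrong can never contribute a true positive, so in this example recall tops out at 3 of 4.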


