EM of GMMs with GPU Acceleration (CUDA)

EM converging on correct parameters

Sample Expectation Maximization; red Xs represent initial guessed means.

 

About:

This is a parallel implementation of the Expectation Maximization algorithm for Gaussian Mixture Models, designed to run on NVidia graphics cards supporting CUDA.  On my machine*, it provides up to 170x performance increases versus a CPU reference version.  

See the report for more information.

The interesting code is all in gpugaumixmod.h and gpugaumixmod_kernel.h. The reference CPU implementation is included in cpugaumixmod.h.

It can be integrated into any C program on a CUDA enabled system.  Additionally, Matlab integration is provided in gmm.cu.

 

Expectation Maximization is a powerful method for converging to a local maximum.  K-means clustering is a special case of expectation maximization of Gaussian clusters.

 

It was created originally as a term project for my Computational Statistics with Application to Bioinformatics class.  See the report for more information.

 

E-mail me with any questions or comments!

*CUDA 2.1, GTX285, C2D E8400 @ 3GHZ, 4GB ram, Vista 64-bit

 

Downloads:

 

Links:

 

Implementation

The interesting code is all in gpugaumixmod.h and gpugaumixmod_kernel.h.
The reference CPU implementation is in gaumixmod.h, which is from Numerical Recipes 3.

It can be integrated into any C program on a CUDA-enabled system.  Additionally,
Matlab Mex integration is provided in gmm.cu.

 

Compiling

You may have trouble compiling as-is, as the config files are set up to
run on my Windows Vista 64bit machine, but it's just a standard Cuda kernel
underneath so it should be portable.  A precompiled Windows 64-bit version is
included.

See compile.m for the command I use to compile the CUDA/Mex files.

You'll need the CUDA drivers and toolkit from Nvidia:
http://www.nvidia.com/object/cuda_get.html

You'll also want to install the MATLAB plug-in for CUDA:
http://developer.nvidia.com/object/matlab_cuda.html

 

Running

Once compiled, start off by running gmm_example in Matlab to see it in action.

See experiment1, experiment2, experiment3 for ready-to-run experiments

  CUDA kernel performance

Updates:

Initial release May 6, 2009

Updated May 8, 2009:
-Fixed synchronization issues in kernel
-Cleaned up mex wrapper
-Cleaned up some potential memory issues
-Combined GpuGauMixmod and GauMixmod classes into single class

Updated May 10, 2009:
-Performance increases, doubling speed vs previous version.
 Best case it performs 170x reference version speed
 (16 dims, 16 clusters, 1000000 data points).
    *Unrolled some loop tails
    *Decomposed problem even more
    *Changed unnecessary double to float
    *Using fast exp and log functions now.
  *Memcopy to GPU is somewhat lazier.
-Updated plotting and experiment code
-Uncombined GpuGauMixmod and GauMixmod classes again.

Updated May 21, 2009:
-Now handles simultaneous random restarts.  Just make the
 matrix passed to it 3-dimensional, with the 3rd dimension containing different
 restarts.
  *The step function will return a list of log likelihoods for the different restarts.
  *The functions for querying response and mean/sigmas now take an optional second
   parameter for getting the data for the corresponding restart.