1. Initialize the n x K cluster membership probability matrix U.
2. Maximization step: calculate K N-component mean vectors, K NxN covariance matrices, and K mixture coefficients.
3. Expectation step: recalculate U and normalize.
4. If U hasn't changed significantly, stop; else go to 2.
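To make steps 2 and 3 concrete, here is a minimal CPU-only sketch of one iteration in plain IDL, using the standard Gaussian mixture update formulas. It is not the GPU code from this post: the function name em_step and the data array G (pixels x bands) are hypothetical, and the band count is called nb rather than N because IDL is case-insensitive and N would collide with n. Row normalization of the result is left as a separate step.

```idl
; Hypothetical CPU sketch of steps 2 and 3 for one iteration.
; Assumes G is fltarr(n, nb) (pixels x bands) and U is fltarr(n, K).
function em_step, G, U
  dimsG = size(G, /dimensions)
  n  = dimsG[0]
  nb = dimsG[1]
  K  = (size(U, /dimensions))[1]
  Unew = fltarr(n, K)
  for j = 0, K-1 do begin
    ; --- maximization step for cluster j ---
    w  = U[*, j]                        ; memberships of cluster j
    nj = total(w)                       ; effective cluster size
    pj = nj / n                         ; mixture coefficient
    W2 = rebin(w, n, nb, /sample)       ; weights replicated per band
    mu = total(G * W2, 1) / nj          ; nb-component mean vector
    D  = G - rebin(reform(mu, 1, nb), n, nb, /sample)  ; centered data
    C  = (transpose(D) # (D * W2)) / nj ; nb x nb covariance matrix
    ; --- expectation step for cluster j ---
    Cinv = invert(C)                    ; covariance inversion on the CPU
    q = total((D # Cinv) * D, 2)        ; squared Mahalanobis distances
    Unew[*, j] = pj * exp(-0.5 * q) / sqrt((2 * !pi)^nb * determ(C))
  endfor
  return, Unew                          ; unnormalized memberships
end
```

The returned matrix still has to be divided by its row sums, with zero sums guarded, before the convergence test in step 4.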
Step 3 requires inversion of the cluster covariance matrices, which has to be done on the CPU. Apart from that, I still get into trouble every time I use gpuWhere (I don't know why yet). Since I have to check U for zeroes before dividing, I currently have to do its normalization on the CPU as well. Here is my attempt at normalization on the GPU, which usually crashes IDL:
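; normalize U
; row sums of U, replicated across the K columns
gpuMatrix_Multiply,U_gpu,onesKxK_gpu,den_gpu
; flag the zero denominators and find their indices
gpuEq,den_gpu,fltarr(n,K),zeroes_gpu
gpuWhere,zeroes_gpu,ind_gpu,count
; set zero denominators to 1 to avoid division by zero
if count gt 0 then $
   gpuSubscript,ind_gpu,den_gpu,$
      fltarr(count)+1,/LHS
gpuDiv,U_gpu,den_gpu,U_gpu
gpuFree,[den_gpu,ind_gpu,zeroes_gpu]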
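For comparison, the CPU fallback for the normalization could look roughly like the following sketch in plain IDL. The function name normalize_memberships is hypothetical, and it assumes U has already been copied back to the host as fltarr(n, K).

```idl
; Hypothetical CPU version of the normalization above.
; Assumes U is fltarr(n, K) on the host.
function normalize_memberships, U
  dims = size(U, /dimensions)
  n = dims[0]
  K = dims[1]
  den = total(U, 2)                      ; n-element vector of row sums
  ind = where(den eq 0, count)           ; rows whose memberships sum to zero
  if count gt 0 then den[ind] = 1.0      ; avoid division by zero
  return, U / rebin(den, n, K, /sample)  ; replicate row sums across K columns
end
```

Rows that sum to zero are left as all zeroes, which matches what the GPU attempt does by replacing those denominators with 1.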
Speedup (so far) on a 1000 x 1000 x 4 image clustered into 8 classes (i.e. n=1,000,000, N=4, K=8) is a factor of 2.8.
