Monday, 29 December 2008

Using GPULib 1.0

Now that I've got GPULib 1.0 up and running, I'm making use
of the new functional forms of the library routines to write
kernel-based algorithms. Given a set of m n-dimensional
observation vectors G_i

G_i = ( g_1, g_2 ... g_n )_i, i=1,2,...m,

in the form of a m x n data matrix:

G = ( G_1, G_2, ... G_m)^T

where ^T denotes transposition, I want to calculate the symmetric
m x m Gaussian kernel matrix

K_ij = exp(-gma ||G_i-G_j||^2 ), i,j = 1,2,...m.

In ordinary IDL:

function kernel_matrix, G, gma=gma
m = n_elements(G[0,*])
K = transpose(total(G^2,1))##(intarr(m)+1)
K = K + transpose(K) - 2*G##transpose(G)
return, exp(-gma*K)
end


WIth GPULib:

function gpukernel_matrix, G_gpu, gma=gma
m = G_gpu.dimensions[1]
n = G_gpu.dimensions[0]
onesm_gpu = gpuputarr(fltarr(m)+1)
onesn_gpu = gpuputarr(fltarr(1,n)+1)
G2_gpu = gpumult(G_gpu,G_gpu)
tmp1_gpu = gpumatrix_multiply(onesn_gpu,G2_gpu)
gpufree,G2_gpu
K_gpu = gpumatrix_multiply(onesm_gpu,tmp1_gpu)
tmp2_gpu = gpumatrix_multiply(tmp1_gpu, $
onesm_gpu,/atranspose,/btranspose)
gpufree,[onesm_gpu,onesn_gpu,tmp1_gpu]
K_gpu = gpuadd(K_gpu,tmp2_gpu,lhs=K_gpu)
gpufree,tmp2_gpu
tmp3_gpu = gpumatrix_multiply(G_gpu,G_gpu, $
/atranspose)
K_gpu = gpuadd(1.0,K_gpu,-2.0,tmp3_gpu,0.0,$
lhs=K_gpu)
gpufree,tmp3_gpu
K_gpu = gpuexp(1.0,-gma,K_gpu,0.0,0.0,$
lhs=K_gpu)
return, K_gpu
end

Since this routine is called many times in the
processing of a multispectral image, a memory
leak would be disasterous. Note the careful use
of the LHS keyword and of the GPUFREE procedure
to avoid this.

The code is now more readable than for the
procedural forms, but would be more readable
still if there were a GPUTRANSPOSE() function and
if GPUTOTAL() had a dimension parameter.

Friday, 12 December 2008

More on GPULib and Kernel Methods

My last post was way back in August and I'm writing this one really just to keep the blog alive, so to speak. So far I haven't been able to get GPULib 1.0 going on my system, so I have no idea if the programs on my website will still run under 1.0. I'm getting help from Tech-X and haven't given up hope.

Anyway, I'm very keen to apply CUDA (via GPULib) to the kind of nonlinear algorithms (kernel methods) I've described previously. These algorithms involve two steps: training and generalization. In remote sensing applications, the training data set can be kept fairly small (order 10^3), but generalization invariably means processing all the pixel vectors in an image, typically of the order 10^6-10^8. Since generalization can be safely done in single precision and involves manipulation of large matrices, it is a natural task for the GPU.