Monday, 29 December 2008

Using GPULib 1.0

Now that I've got GPULib 1.0 up and running, I'm making use
of the new functional forms of the library routines to write
kernel-based algorithms. Given a set of m n-dimensional
observation vectors G_i

G_i = ( g_1, g_2 ... g_n )_i, i=1,2,...m,

in the form of a m x n data matrix:

G = ( G_1, G_2, ... G_m)^T

where ^T denotes transposition, I want to calculate the symmetric
m x m Gaussian kernel matrix

K_ij = exp(-gma ||G_i-G_j||^2 ), i,j = 1,2,...m.

In ordinary IDL:

function kernel_matrix, G, gma=gma
m = n_elements(G[0,*])
K = transpose(total(G^2,1))##(intarr(m)+1)
K = K + transpose(K) - 2*G##transpose(G)
return, exp(-gma*K)
end


WIth GPULib:

function gpukernel_matrix, G_gpu, gma=gma
m = G_gpu.dimensions[1]
n = G_gpu.dimensions[0]
onesm_gpu = gpuputarr(fltarr(m)+1)
onesn_gpu = gpuputarr(fltarr(1,n)+1)
G2_gpu = gpumult(G_gpu,G_gpu)
tmp1_gpu = gpumatrix_multiply(onesn_gpu,G2_gpu)
gpufree,G2_gpu
K_gpu = gpumatrix_multiply(onesm_gpu,tmp1_gpu)
tmp2_gpu = gpumatrix_multiply(tmp1_gpu, $
onesm_gpu,/atranspose,/btranspose)
gpufree,[onesm_gpu,onesn_gpu,tmp1_gpu]
K_gpu = gpuadd(K_gpu,tmp2_gpu,lhs=K_gpu)
gpufree,tmp2_gpu
tmp3_gpu = gpumatrix_multiply(G_gpu,G_gpu, $
/atranspose)
K_gpu = gpuadd(1.0,K_gpu,-2.0,tmp3_gpu,0.0,$
lhs=K_gpu)
gpufree,tmp3_gpu
K_gpu = gpuexp(1.0,-gma,K_gpu,0.0,0.0,$
lhs=K_gpu)
return, K_gpu
end

Since this routine is called many times in the
processing of a multispectral image, a memory
leak would be disasterous. Note the careful use
of the LHS keyword and of the GPUFREE procedure
to avoid this.

The code is now more readable than for the
procedural forms, but would be more readable
still if there were a GPUTRANSPOSE() function and
if GPUTOTAL() had a dimension parameter.

6 comments:

toto said...

Hi,

I was excited by your "Kernel PCA" code under IDL and wanted to use it. Unfortunately I was not able to install the gpulib properly. This installation turns out to be painful and is a serious obstacle to use this code. Can anyone give me some advises ?

1. gpulib is difficult to download:
On the download page http://www.txcorp.com/products/GPULib/download-oss.php,the CAPTCHA security refused me several times before succes. I did not see why.

2. In the gpulib-1.0.6/README file, I learned that I would also need CUDA SDK and Toolkit from NVIDIA.

3. I downloaded CUDA SDK and Toolkit from NVIDIA for my Mac... and then I realised that it only works on leopard (the new Mac OS since few months). Bad luck: I work under Tiger (the previous Mac OS). That is weird: Tiger is still very popular

4. I made another try and installed CUDA SDK and Toolkit on a linux machine. Back to gpulib: During the installation, I get some error messages concerning "autoheader" and "automake" (and the installation is not complete)

Mort Canty said...

I've installed successfully under XP with a GEFORCE 8600GT 512 MB graphics card. No experience with Mac OS or Linux. You must install the nvidia CUDA driver and CUDA toolkit in that order. As far as I can tell, you don't need the SDK. It just consists of demos.

Good luck

toto said...

* another try on mac Tiger (10.4.11)--> definitely impossible (no NVIDIA packages)

* another try on a PC under XP --> I can not make it. The NVIDIA installation seems ok. but I do not know what to do with the gpulib-1.0.6 directory. I do not see any binary for window. How did you do ?

Mort Canty said...

OK, assuming you're running IDL 7 with the eclipse workbench on Windows XP and CUDA is correctly installed (I'm using CUDA 2.1 beta), you locate

gpulib.dlm
gpulib.dll

in the GPULIB1.0.6 folder and put them in your IDL DLM path. E.g., copy them to

...\ITT\IDL70\bin\bin.386.

Then point your IDL path to the GPULib routines in

...\GPULIB1.0.6\IDL\lib

Then run GPUINIT from the command line. It should respond with

% LOADED DLM: GPULIB

Then you're up and running.

Hope this helps.

PatC said...

Hi,

I'm testing out speeds using GPUlib vs native IDL functions and was wondering whether you see any speed differences with GPUlib and the gpuMatrix_Multiply function.

Mort Canty said...

I haven't timed individual functions, but I get speedup factors between 8 and 16 on the code listed in the blog.