Monday, 30 June 2008

GPULIB rude awakening

It turns out I had left the GPU in some kind of undefined state by using the gpuWhere procedure incorrectly in the session shown in the last post. Here's the correct output from test_gpuMatrix_Multiply:

Length : 1000
CPU time: 0.047000170
GPU time: 0.030999899
Speedup : 1.5161395
Length : 10000
CPU time: 0.34399986
GPU time: 0.26500010
Speedup : 1.2981122
Length : 100000
CPU time: 3.0780001
GPU time: 2.6250000
Speedup : 1.1725715
Length : 1000000
CPU time: 30.734000
GPU time: 26.282000
Speedup : 1.1693935

It was way too good to be true. Ah well.

OK, what about simple element-by-element array products?

pro test_gpuMult
; initialize
gpuinit
; array of 1000000 3-element observation vectors (rows)
A = randomu(s,3,1000000)
gpuPutArr,A,A_gpu
; calculate square of A on the CPU
start = systime(2)
for j=0L,99 do C = A*A
CPUtime = systime(2)-start
print, 'CPU time: ',CPUtime
; now on the GPU
start = systime(2)
for j=0L,99 do gpuMult,A_gpu,A_gpu,C_gpu
GPUtime = systime(2)-start
print,'GPU time: ', GPUtime
print,'Speedup : ', CPUtime/GPUtime
gpuFree,A_gpu
; check that the results are the same
gpuGetArr,C_gpu,C1
print,'Check:'
print, total(C-C1)
gpuFree,C_gpu
end

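This gives a speedup of about 10. Note that the transfers to and from the GPU (gpuPutArr, gpuGetArr) sit outside the timed loops, so they are not counted: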

% Compiled module: TEST_GPUMULT.
CPU time: 2.1100001
GPU time: 0.20300007
Speedup : 10.394086
Check:
0.000000
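
One caveat on the check: total(C-C1) sums signed differences, so errors of opposite sign could cancel and still print zero. Here is a minimal sketch of a stricter comparison, using made-up values rather than the GPU results (demo_check and the small C and C1 arrays are hypothetical, for illustration only):

```idl
pro demo_check
  ; hypothetical results whose errors have opposite sign
  C  = [1.0, 2.0, 3.0]
  C1 = [1.5, 1.5, 3.0]
  ; signed sum: the -0.5 and +0.5 differences cancel
  print, 'total(C-C1)   : ', total(C - C1)
  ; largest absolute element-wise difference: no cancellation
  print, 'max(abs(C-C1)): ', max(abs(C - C1))
end
```

With these values the signed sum is zero even though two elements differ by 0.5, while the maximum absolute difference picks that up. In test_gpuMult the last print could be swapped for print, max(abs(C-C1)) in the same way.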
