Length : 1000
CPU time: 0.047000170
GPU time: 0.030999899
Speedup : 1.5161395
Length : 10000
CPU time: 0.34399986
GPU time: 0.26500010
Speedup : 1.2981122
Length : 100000
CPU time: 3.0780001
GPU time: 2.6250000
Speedup : 1.1725715
Length : 1000000
CPU time: 30.734000
GPU time: 26.282000
Speedup : 1.1693935
It was way too good to be true: as the length grows, the speedup settles at about 1.17. Ah well.
OK, what about simple element-wise array products?
pro test_gpuMult
; initialize
gpuinit
; array of 1000000 3-element observation vectors (rows)
A = randomu(s,3,1000000)
gpuPutArr,A,A_gpu
; calculate square of A on the CPU
start = systime(2)
for j=0L,99 do C = A*A
CPUtime = systime(2)-start
print, 'CPU time: ',CPUtime
; now on the GPU
start = systime(2)
for j=0L,99 do gpuMult,A_gpu,A_gpu,C_gpu
GPUtime = systime(2)-start
print,'GPU time: ', GPUtime
print,'Speedup : ', CPUtime/GPUtime
gpuFree,A_gpu
; check that the results are the same
gpuGetArr,C_gpu,C1
print,'Check:'
print, total(C-C1)
gpuFree,C_gpu
end
This gives a speedup of about 10 over the 100 iterations:
% Compiled module: TEST_GPUMULT.
CPU time: 2.1100001
GPU time: 0.20300007
Speedup : 10.394086
Check:
0.000000
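One caveat about the check: total(C-C1) is a signed sum, so positive and negative differences could cancel and still print zero. A stricter test looks at the largest absolute difference instead. The following is only a minimal sketch with made-up stand-in values for C and C1 (they are not taken from the run above), using nothing but the standard IDL routines total, max and abs:

```idl
; hypothetical stand-ins for C (CPU result) and C1 (result copied back from the GPU)
C  = [1.0, 2.0, 3.0]
C1 = [1.5, 1.5, 3.0]
; signed sum: the errors -0.5 and +0.5 cancel, so this sums to zero
print, total(C - C1)
; largest element-wise error: this is 0.5, so the mismatch is visible
print, max(abs(C - C1))
```

In test_gpuMult the same idea would amount to printing max(abs(C-C1)) next to the total.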