PeakStream CG example works on AMD, Intel, and NVIDIA
February 2, 2012
Fixed the bug with the PeakStream CG example. Root caused all symptoms. Now it is working on both AMD and NVIDIA GPUs as well as Intel CPUs.
As you might guess, there were several effects which obscured the cause.
I normally leave the scheduler configured to use any compatible compute device. GPU devices have a higher priority than CPUs. This can give the appearance of intermittent success on the GPU when traces are really being scheduled to an OpenCL CPU device. That was just carelessness on my part.
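To make the masking effect concrete, here is a toy model in Python. It is not the real scheduler. The `Device` class, the `usable` flag and the `schedule` function are all made up for illustration, and `usable` is only a stand-in for whatever makes the scheduler skip a device.

```python
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    kind: str      # "gpu" or "cpu"
    usable: bool   # hypothetical stand-in: can this device take the trace?

def schedule(devices, allowed_kinds=("gpu", "cpu")):
    """Return the first usable device, trying GPUs before CPUs."""
    priority = {"gpu": 0, "cpu": 1}
    candidates = [d for d in devices if d.kind in allowed_kinds]
    for dev in sorted(candidates, key=lambda d: priority[d.kind]):
        if dev.usable:
            return dev
    return None

devices = [
    Device("Core i7 920", "cpu", usable=True),
    Device("HD 5870", "gpu", usable=False),  # pretend the GPU path is skipped
]

fallback = schedule(devices)                          # CPU quietly takes the trace
gpu_only = schedule(devices, allowed_kinds=("gpu",))  # nothing left to hide behind
print(fallback.name if fallback else None)
print(gpu_only)
```

With the CPU in the pool the run looks like a success. Restrict the pool to GPUs and the problem is out in the open.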
The configuration comes from a simple text file. Here’s what it looks like right now on my develop-integration-test GPU server:
@HD5870@AMD Cypress Advanced Micro Devices
@GTX480@NVIDIA GeForce GTX 480 NVIDIA
@Corei7920@AMD@INTEL Intel Core i7 920
@Core2Duo@AMD Intel Core 2 Duo
@PentiumM@AMD Intel Pentium M
@HD5870 Evergreen FP64 Images
@GTX480 Evergreen FP64 ShutdownNOP
@Corei7920 Evergreen FP64
@Core2Duo Evergreen FP64
@PentiumM Evergreen FP64
@HD5870 Paranoid=0.1 SearchTrials=5 TimingTrials=10 Watchdog=60
@GTX480 Paranoid=0.1 SearchTrials=5 TimingTrials=10 Watchdog=60
@Corei7920 Paranoid=0.1 SearchTrials=5 TimingTrials=10 Watchdog=120
@Core2Duo Paranoid=0.1 SearchTrials=5 TimingTrials=10 Watchdog=60
@PentiumM Paranoid=0.1 SearchTrials=5 TimingTrials=10 Watchdog=60
Just remove all of the Intel CPU lines and the scheduler will only use GPUs.
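Editing by hand is easy enough, but the same thing can be scripted. This is only a sketch: it assumes every line for a device starts with that device's `@Tag`, which is what the file above looks like, and the function name `drop_device_lines` is mine rather than anything in the project.

```python
import re

def drop_device_lines(config_text, tags_to_drop):
    """Remove every config line whose leading @Tag is in tags_to_drop."""
    kept = []
    for line in config_text.splitlines():
        # The leading tag runs from the first "@" to the next "@" or whitespace.
        match = re.match(r"@([^@\s]+)", line)
        tag = match.group(1) if match else None
        if tag not in tags_to_drop:
            kept.append(line)
    return "\n".join(kept)

# A shortened copy of the configuration shown above.
sample = """\
@HD5870@AMD Cypress Advanced Micro Devices
@Corei7920@AMD@INTEL Intel Core i7 920
@HD5870 Evergreen FP64 Images
@Corei7920 Evergreen FP64
@HD5870 Paranoid=0.1 SearchTrials=5 TimingTrials=10 Watchdog=60
@Corei7920 Paranoid=0.1 SearchTrials=5 TimingTrials=10 Watchdog=120"""

cpu_tags = {"Corei7920", "Core2Duo", "PentiumM"}
gpu_only_config = drop_device_lines(sample, cpu_tags)
print(gpu_only_config)
```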
The second part of the bug was more pernicious. There are separate code paths for synthesized and autotuned kernels, and the two paths managed memory slightly differently.
Autotuned kernels were ignoring the static analysis done by the JIT that determines memory transfers. These kernels were always sending buffer data from the CPU to the GPU, even when the CPU buffer was garbage and the GPU memory object was current.
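The shape of the mistake is easy to show in a few lines of Python. This is an illustration of the logic only, not the actual code. The `Buffer` class, the `device_is_current` flag (standing in for the result of the JIT's static analysis) and both function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Buffer:
    host: list                # CPU-side copy
    device: list              # GPU-side memory object
    device_is_current: bool   # hypothetical flag from the static analysis

def prepare_always_upload(buf):
    """Buggy behavior: copy host to device no matter what."""
    buf.device = list(buf.host)
    buf.device_is_current = True

def prepare_respecting_analysis(buf):
    """Intended behavior: upload only when the device copy is stale."""
    if not buf.device_is_current:
        buf.device = list(buf.host)
        buf.device_is_current = True

# The GPU holds the latest results; the CPU buffer is garbage.
buggy = Buffer(host=[0, 0, 0], device=[1, 2, 3], device_is_current=True)
fixed = Buffer(host=[0, 0, 0], device=[1, 2, 3], device_is_current=True)

prepare_always_upload(buggy)
prepare_respecting_analysis(fixed)
print(buggy.device)
print(fixed.device)
```

The unconditional upload clobbers good GPU data with stale CPU data. The guarded version leaves the current GPU copy alone.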
I had nightmares of scheduler race conditions dancing in my imagination. It turned out to be a simple logic error. This is wonderful.