there are so many cores

PeakStream CG example works on AMD, Intel, and NVIDIA

I fixed the bug in the PeakStream CG example and root-caused all of the symptoms. It now works on both AMD and NVIDIA GPUs as well as Intel CPUs.

As you might guess, there were several effects which obscured the cause.

I normally leave the scheduler configured to use any compatible compute device, with GPU devices given a higher priority than CPUs. This can give the appearance of intermittent success on the GPU when traces are really being scheduled to an OpenCL CPU device. That was just careless on my part.
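To make the masking effect concrete, here is a minimal sketch of priority-based scheduling. It is not the actual scheduler: the `Device` class, the `schedule` function, and the fallback-on-failure behavior are all assumptions made for illustration. The point is only that a "success" says nothing about which device produced it unless the chosen device is reported.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Device:
    name: str
    kind: str      # "GPU" or "CPU"
    priority: int  # higher value is tried first (hypothetical scheme)


def schedule(trace: str,
             devices: List[Device],
             try_run: Callable[[Device, str], bool]) -> Optional[Device]:
    """Try devices from highest to lowest priority; return the first that succeeds."""
    for dev in sorted(devices, key=lambda d: d.priority, reverse=True):
        if try_run(dev, trace):
            return dev
    return None


devices = [Device("HD5870", "GPU", 2), Device("Corei7920", "CPU", 1)]


# Hypothetical failure mode: the GPU path is broken for this trace, the CPU path works.
def gpu_broken(dev: Device, trace: str) -> bool:
    return dev.kind == "CPU"


chosen = schedule("cg_trace", devices, gpu_broken)
print(chosen.name, chosen.kind)  # prints "Corei7920 CPU"
```

The trace "works", but only because it quietly landed on the CPU. Logging the returned device, or removing the CPU devices from the configuration, exposes the GPU failure.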

The configuration comes from a simple text file. Here’s what it looks like right now on my develop-integration-test GPU server:

#device_definition
@HD5870@AMD           Cypress Advanced Micro Devices
@GTX480@NVIDIA        GeForce GTX 480 NVIDIA
@Corei7920@AMD@INTEL  Intel Core i7 920
@Core2Duo@AMD         Intel Core 2 Duo
@PentiumM@AMD         Intel Pentium M

#device_capabilities
@HD5870        Evergreen FP64 Images
@GTX480        Evergreen FP64 ShutdownNOP
@Corei7920     Evergreen FP64
@Core2Duo      Evergreen FP64
@PentiumM      Evergreen FP64

#device_settings
@HD5870        Paranoid=0.1 SearchTrials=5 TimingTrials=10 Watchdog=60
@GTX480        Paranoid=0.1 SearchTrials=5 TimingTrials=10 Watchdog=60
@Corei7920     Paranoid=0.1 SearchTrials=5 TimingTrials=10 Watchdog=120
@Core2Duo      Paranoid=0.1 SearchTrials=5 TimingTrials=10 Watchdog=60
@PentiumM      Paranoid=0.1 SearchTrials=5 TimingTrials=10 Watchdog=60
@Core2Duo@AMD  PragmaFP64=cl_amd_fp64
@PentiumM@AMD  PragmaFP64=cl_amd_fp64

Just remove all of the Intel CPU lines and the scheduler will only use GPUs.
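If you would rather not edit the file by hand, the filtering can be scripted. The sketch below assumes only what is visible in the listing above: section headers start with `#`, and device lines start with `@Name`, optionally followed by more `@Vendor` tags. The `gpu_only` function and the `CPU_DEVICES` set are hypothetical helpers, not part of the project.

```python
# Device names treated as CPUs, taken from the listing above.
CPU_DEVICES = {"Corei7920", "Core2Duo", "PentiumM"}


def gpu_only(config_text: str, cpu_devices=CPU_DEVICES) -> str:
    """Return the configuration text with every CPU device line dropped."""
    kept = []
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@"):
            # "@Core2Duo@AMD  PragmaFP64=..." -> device name "Core2Duo"
            device = stripped.split()[0].split("@")[1]
            if device in cpu_devices:
                continue
        kept.append(line)
    return "\n".join(kept)


sample = """#device_definition
@HD5870@AMD           Cypress Advanced Micro Devices
@Corei7920@AMD@INTEL  Intel Core i7 920

#device_settings
@HD5870        Paranoid=0.1 SearchTrials=5 TimingTrials=10 Watchdog=60
@Core2Duo@AMD  PragmaFP64=cl_amd_fp64
"""

print(gpu_only(sample))
```

Section headers and blank lines pass through untouched, so the output keeps the same shape as the original file.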

The second part of the bug was more pernicious. There are separate code paths for synthesized and autotuned kernels, and the two paths managed memory slightly differently.

Autotuned kernels were ignoring the static analysis done by the JIT that determines memory transfers. These kernels always sent buffer data from the CPU to the GPU, even when the CPU buffer was garbage and the GPU memory object was current.
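The shape of the logic error can be shown with a toy model. This is a sketch, not the real code: host and device memory are simulated with plain Python lists, and the `Buffer` class, the `Current` flag, and both upload functions are hypothetical names. The flag stands in for whatever the JIT's static analysis records about which copy is valid.

```python
from dataclasses import dataclass
from enum import Enum


class Current(Enum):
    HOST = "host"      # only the CPU copy is valid
    DEVICE = "device"  # only the GPU copy is valid
    BOTH = "both"      # the copies agree


@dataclass
class Buffer:
    host_data: list
    device_data: list
    current: Current


def upload_always(buf: Buffer) -> None:
    """Buggy path: copy host to device without checking which copy is valid."""
    buf.device_data = list(buf.host_data)
    buf.current = Current.BOTH


def upload_if_needed(buf: Buffer) -> bool:
    """Fixed path: copy only when the host holds the sole valid copy."""
    if buf.current is Current.HOST:
        buf.device_data = list(buf.host_data)
        buf.current = Current.BOTH
        return True
    return False


# The GPU holds the result of a previous kernel; the CPU copy is stale garbage.
buggy = Buffer(host_data=[0, 0, 0], device_data=[1, 2, 3], current=Current.DEVICE)
fixed = Buffer(host_data=[0, 0, 0], device_data=[1, 2, 3], current=Current.DEVICE)

upload_always(buggy)
upload_if_needed(fixed)

print(buggy.device_data)  # prints [0, 0, 0], the GPU result was clobbered
print(fixed.device_data)  # prints [1, 2, 3], the GPU result survived
```

The unconditional upload is harmless for the first kernel in a chain, which is part of why this kind of bug can hide: it only corrupts data once a later kernel depends on results that exist solely on the GPU.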

I had nightmares of scheduler race conditions dancing in my imagination. It turned out to be a simple logic error. This is wonderful.
