there are so many cores

PeakStream CG example works on AMD, Intel, and NVIDIA

Posted by fastkor on February 2, 2012

I fixed the bug in the PeakStream CG example and root-caused every symptom. It now works on AMD and NVIDIA GPUs as well as Intel CPUs.

As you might guess, there were several effects which obscured the cause.

I normally leave the scheduler configured to use any compatible compute device, with GPU devices given a higher priority than CPUs. This can give the appearance of intermittent success on the GPU when traces are really being scheduled to an OpenCL CPU device. Overlooking that was just careless.
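To illustrate how this can mislead, here is a minimal sketch of priority-based device selection. The `Device` class, the `pick_device` function, the numeric priorities, and the idea that a trace falls back to a CPU whenever the GPU is unavailable are all assumptions made for illustration, not the actual scheduler code.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    kind: str          # "GPU" or "CPU"
    busy: bool = False


# Assumed priorities: the lower number is tried first.
PRIORITY = {"GPU": 0, "CPU": 1}


def pick_device(devices):
    """Return the highest-priority idle device, or None if all are busy."""
    idle = [d for d in devices if not d.busy]
    if not idle:
        return None
    return min(idle, key=lambda d: PRIORITY[d.kind])


devices = [Device("Corei7920", "CPU"), Device("HD5870", "GPU")]

first = pick_device(devices)    # the GPU wins while it is idle
first.busy = True
second = pick_device(devices)   # the next trace quietly lands on the CPU
```

In this sketch both traces "succeed", so a GPU-only bug can look intermittent unless the log records which device actually ran each trace.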

The configuration comes from a simple text file. Here’s what it looks like right now on my develop-integration-test GPU server:

#device_definition
@HD5870@AMD           Cypress Advanced Micro Devices
@GTX480@NVIDIA        GeForce GTX 480 NVIDIA
@Corei7920@AMD@INTEL  Intel Core i7 920
@Core2Duo@AMD         Intel Core 2 Duo
@PentiumM@AMD         Intel Pentium M

#device_capabilities
@HD5870        Evergreen FP64 Images
@GTX480        Evergreen FP64 ShutdownNOP
@Corei7920     Evergreen FP64
@Core2Duo      Evergreen FP64
@PentiumM      Evergreen FP64

#device_settings
@HD5870        Paranoid=0.1 SearchTrials=5 TimingTrials=10 Watchdog=60
@GTX480        Paranoid=0.1 SearchTrials=5 TimingTrials=10 Watchdog=60
@Corei7920     Paranoid=0.1 SearchTrials=5 TimingTrials=10 Watchdog=120
@Core2Duo      Paranoid=0.1 SearchTrials=5 TimingTrials=10 Watchdog=60
@PentiumM      Paranoid=0.1 SearchTrials=5 TimingTrials=10 Watchdog=60
@Core2Duo@AMD  PragmaFP64=cl_amd_fp64
@PentiumM@AMD  PragmaFP64=cl_amd_fp64

Just remove all of the Intel CPU lines and the scheduler will use only GPUs.
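Editing the file by hand is easy enough, but the same filtering can be sketched in a few lines. The helper `drop_devices` is hypothetical, and the sketch assumes only what the listing above shows: every device line starts with `@`, followed by the device name, with optional `@vendor` qualifiers after it.

```python
def drop_devices(config_text, names):
    """Return config_text without the lines that refer to a device in names."""
    kept = []
    for line in config_text.splitlines():
        if line.startswith("@"):
            # "@Core2Duo@AMD  PragmaFP64=..." -> device name "Core2Duo"
            device = line.split()[0].split("@")[1]
            if device in names:
                continue
        kept.append(line)
    return "\n".join(kept)


# Shortened sample in the same format as the listing above.
sample = """#device_definition
@HD5870@AMD           Cypress Advanced Micro Devices
@Core2Duo@AMD         Intel Core 2 Duo

#device_settings
@HD5870        Paranoid=0.1 SearchTrials=5 TimingTrials=10 Watchdog=60
@Core2Duo@AMD  PragmaFP64=cl_amd_fp64
"""

gpu_only = drop_devices(sample, {"Corei7920", "Core2Duo", "PentiumM"})
```

Section headers such as `#device_definition` and blank lines pass through unchanged, so the result keeps the same layout with only the CPU entries gone.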

The second part of the bug was more pernicious. There are separate code paths for synthesized and autotuned kernels, and the two paths managed memory slightly differently.

Autotuned kernels were ignoring the static analysis the JIT performs to determine memory transfers. These kernels always sent buffer data from the CPU to the GPU, even when the CPU buffer was garbage and the GPU memory object was current.
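The shape of the error can be sketched as follows. This is not the real code: the JIT decides transfers through static analysis, while the sketch substitutes simple validity flags, and `BufferState`, `needs_upload`, and `buggy_needs_upload` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class BufferState:
    host_valid: bool     # the CPU buffer holds current data
    device_valid: bool   # the GPU memory object holds current data


def needs_upload(state):
    """Upload only when the host copy is current and the device copy is stale."""
    return state.host_valid and not state.device_valid


def buggy_needs_upload(state):
    """The faulty behavior: always send, whatever the analysis says."""
    return True


# Host buffer is garbage, GPU memory object is current.
stale_host = BufferState(host_valid=False, device_valid=True)

correct = needs_upload(stale_host)        # False: leave the GPU data alone
faulty = buggy_needs_upload(stale_host)   # True: garbage overwrites good data
```

With the unconditional upload, a current GPU memory object is overwritten by a stale host buffer before the kernel runs, which is enough to corrupt the result without any race condition being involved.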

I had nightmares of scheduler race conditions dancing in my imagination. It turned out to be a simple logic error. This is wonderful.
