ATI and NVIDIA OpenCL chimera binaries
November 30, 2011
If both the ATI and NVIDIA OpenCL SDKs are installed on a host, it is very easy to accidentally create a chimera binary. An application may show correct linkage to shared libraries under ldd, but linking also happens at runtime, so it's necessary to take precautions with shared library paths.
Specifically, an NVIDIA OpenCL application will find, dynamically load, and link the ATI shared libraries if they appear anywhere in LD_LIBRARY_PATH. It doesn't matter that the CUDA paths come before the ATI ones; the ATI paths must be removed from LD_LIBRARY_PATH entirely.
Going in the other direction, an ATI OpenCL application that mistakenly (due to a bug) tries to use an NVIDIA GPU compute device can end up linking the NVIDIA CUDA shared libraries into the running process.
These runtime chimeras will often run (although I am not sure what they are actually doing) but exhibit mysterious failures in memory buffer management, usually terminating with a segmentation fault or an abort signal.
I’ve been very foolish. This should have been obvious. Mixing happens easily and silently unless there are deliberate mechanisms to detect it. Each vendor SDK assumes it is the only one present on the host, so naturally neither checks for cross-vendor linkage.
Everything is working now. ATI works. NVIDIA works. They are both using the same JIT and autotuned kernel templates (in the sense of parameterized design, not C++ code). There are still JIT bugs. But the calculated results are consistent between the reference interpreter and the GPUs for both ATI and NVIDIA.
This is great, as the whole point of having a managed platform is:
- higher level application code
- automated performance tuning
- write once, run on ATI and NVIDIA
So it’s looking like all three points are working out.