I was overly optimistic about the PeakStream conjugate gradient sample code. It only works on some OpenCL compute devices, and success depends on the vendor SDK used. It works on a Core 2 Duo with the AMD SDK. It works on a Core i7 920 with the Intel SDK. It doesn’t work properly on any other combination I’ve tried; specifically, it fails on the Radeon HD 5870 and the GTX 480. I have a few more discrete GPUs to test with, too.
This makes me suspicious of my code.
This last week was more cleanup. The build is much better, and the configure script is quite a bit more robust. Logging in the virtual machine now goes through an object (instead of print statements scattered everywhere) and can be compiled in or out with a preprocessor option. I need this to follow what’s going on.
Now that I’ve started regularly testing with the big three vendors (AMD, Intel, NVIDIA), I’ve begun to suspect that GPGPU compilers cheat. I know I did. There are certainly happy paths through my JIT. It’s easy to write code that makes the compiler fail or generate something wrong.
Programming with OpenCL against one compute device and one vendor SDK gives a very different impression than developing for half a dozen compute devices across three vendor SDKs. When you see a single failure, you think it’s an isolated bug. When you see a pattern of failures, something systematic is going on.
It is difficult to write an OpenCL compiler that generates fast and correct code in all cases. Every vendor has the same problem. It’s hard for everyone.
Changing the subject…
Yesterday, I had a moment of inspiration. I saw the next evolutionary step for this project (besides making it work right).
The virtual machine and JIT must themselves be configurable and generated. This doesn’t necessarily mean metaprogramming. It may be something more like Buildroot. It certainly does not mean metacircularity.
This is necessary to support the market trend towards CPU/GPU hybrids (e.g. AMD Fusion APUs). GPUs are starting to become standard in low-end portable and mobile devices. Intel processors incorporate graphics now. Power and memory efficiency are important. That’s why customized virtual machines and JITs that strip out everything an application doesn’t need make sense (if nothing else, this reduces the memory footprint).
The functional guys would say this is nothing new. Everything is language, including the software platform itself. I’m not trying to go that far.
Why didn’t Java do this? At my first job out of grad school, there was a top secret project for one of the big telecoms companies on a Java phone. Remember, this was back in 1997! So the prototype was a mobile handset tethered to a workstation under a desk. People made jokes about this. (Ten years later, that workstation really did fit inside the handset!)
There was a lot of interest in embedded Java back then. Actually, that was where Java started, with Oak. It was originally intended to be an embedded platform.
The issue with Java, then as now, is that the footprint is large. Back in 1997, it was too fat to fit in a phone. So there were cut-down versions of Java available for embedded applications. They were so reduced from the full language that it made more sense just to write in C/C++ (which is probably what everyone did).
What they should have done is create a “Buildjava” that allowed configuring a runtime platform tailor-made to a user’s embedded application. Maybe that was just too hard. It also works against the “write once, run anywhere” model: everyone would speak their own dialect of Java.
So people just waited a few years until the hardware became powerful enough to support their software. Still, ten years is a long time to wait.