there are so many cores

Just another site

Feature complete

Fixed all of the problems with the auto-tuned GEMV. It wasn’t as bad as expected. Yesterday, I was rattled because every kernel specialization was failing. Not reassuring.

I’ve learned to be paranoid about numerical correctness, so automated testing is built into the auto-tuning process. Each kernel specialization is stress tested with random data before it is accepted as good.
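To make the idea concrete, here is a minimal sketch of that kind of gate, in plain Python rather than OpenCL. It is not the actual auto-tuner code: the names `reference_gemv`, `stress_test` and `buggy_gemv`, the trial count, the size limit and the tolerance are all assumptions made up for illustration. A candidate is compared against a straightforward reference on random inputs and rejected on the first mismatch or exception.

```python
import random

def reference_gemv(alpha, a, x, beta, y):
    """Straightforward y = alpha*A*x + beta*y, used as ground truth."""
    return [alpha * sum(aij * xj for aij, xj in zip(row, x)) + beta * yi
            for row, yi in zip(a, y)]

def stress_test(candidate, trials=20, max_dim=16, tol=1e-9, seed=0):
    """Return True only if candidate matches the reference on every trial."""
    rng = random.Random(seed)  # fixed seed so a failure can be reproduced
    for _ in range(trials):
        m, n = rng.randint(1, max_dim), rng.randint(1, max_dim)
        a = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
        x = [rng.uniform(-1, 1) for _ in range(n)]
        y = [rng.uniform(-1, 1) for _ in range(m)]
        alpha, beta = rng.uniform(-2, 2), rng.uniform(-2, 2)
        expected = reference_gemv(alpha, a, x, beta, y)
        try:
            actual = candidate(alpha, a, x, beta, y)
        except Exception:
            return False  # a variant that raises is rejected
        if len(actual) != len(expected):
            return False
        # relative comparison, with a floor of 1.0 for values near zero
        if any(abs(p - q) > tol * max(1.0, abs(q))
               for p, q in zip(actual, expected)):
            return False
    return True

def buggy_gemv(alpha, a, x, beta, y):
    """Hypothetical broken variant: forgets the beta*y term."""
    return [alpha * sum(aij * xj for aij, xj in zip(row, x)) for row in a]
```

Calling `stress_test(buggy_gemv)` rejects the broken variant, while `stress_test(reference_gemv)` passes. In a real tuner the candidate would presumably be a compiled kernel launched on the device, and the tolerance would have to allow for single precision and a different summation order.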

What makes this trickier is that extensive auto-tuning, in which hundreds or thousands of kernel variations are tested, runs into limitations in vendor runtimes. The GPU driver might crash, or the device might enter a bad state. The OpenCL compiler may hang, segfault, or fail with internal error messages. Despite coming from the same design template, some specializations work perfectly on a device while others fail.

All of this adds enough ambiguity that distinguishing your own bugs from toolchain and platform issues is difficult. My experience so far: my code often has more bugs than I think it does. It’s probably a cognitive bias to blame known vendor bugs for other, as yet undiagnosed, bugs.

There are still some serious bugs in the JIT. However, even with those, I see output that agrees between the generated OpenCL on ATI, NVIDIA, and x86 and a reference CPU interpreter. The numbers are all the same. That gives me confidence it is really working and not just producing garbage.
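A cross-check like the one described above could be sketched as follows. This is only an illustration in Python: the function name `disagreeing_backends`, the backend labels, and the sample numbers are hypothetical, and the default of exact equality is an assumption based on "the numbers are all the same".

```python
import math

def disagreeing_backends(results, reference="interpreter",
                         rel_tol=0.0, abs_tol=0.0):
    """List the backends whose output differs from the reference output.

    results maps a backend label to its output vector. With both
    tolerances at 0.0 the comparison is exact equality.
    """
    ref = results[reference]
    bad = []
    for name, out in results.items():
        if name == reference:
            continue
        same = len(out) == len(ref) and all(
            math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
            for a, b in zip(out, ref))
        if not same:
            bad.append(name)
    return bad

# Made-up outputs, purely to show the call shape.
sample = {
    "interpreter": [1.0, 2.5],
    "ati": [1.0, 2.5],
    "nvidia": [1.0, 2.5],
    "x86": [1.0, 2.5000001],
}
```

With the sample above, `disagreeing_backends(sample)` flags only `"x86"`, and loosening to `rel_tol=1e-6` flags nothing. An empty list across independent vendors is much harder to explain away as coincidence than a single device agreeing with itself.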

