Vectorized gather support added
April 13, 2012
Gathering now supports vectorized memory access. This works only for memory access patterns that are simple stencil translations: the offsets from the current item ID must be literal constants. For example:
Arrayf32 X = index_f32(0, N, N);
Arrayf32 Y = index_f32(1, N, N);
B += gather2_floor(A, X, Y);
B += gather2_floor(A, X - 1, Y);
B += gather2_floor(A, X + 1, Y);
B += gather2_floor(A, X, Y - 1);
B += gather2_floor(A, X, Y + 1);
The stencils can be arbitrarily large, though.
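One plausible reading of why the offsets must be literal constants is sketched below as a CPU-only C++ model; it is not the library's implementation. The names `gather2_floor_model`, `stencil_scalar`, and `stencil_vec4` are hypothetical. The sketch assumes an N x N row-major grid, that X indexes columns and Y indexes rows, that B starts at zero, and that out-of-range coordinates are clamped to the edge (the post does not say how borders are handled). The point it shows: with a constant offset (dx, dy), four adjacent outputs read four adjacent inputs, so the gather can become one contiguous four-wide load. A data-dependent offset would break that contiguity.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU model. Assumptions: row-major N x N grid, X = column,
// Y = row, out-of-range coordinates clamped to the nearest edge.
using Grid = std::vector<float>;

// Scalar model of a gather with floor: floor the coordinates, clamp, read.
float gather2_floor_model(const Grid& A, int N, float x, float y) {
    int xi = static_cast<int>(std::floor(x));
    int yi = static_cast<int>(std::floor(y));
    xi = xi < 0 ? 0 : (xi >= N ? N - 1 : xi);
    yi = yi < 0 ? 0 : (yi >= N ? N - 1 : yi);
    return A[static_cast<std::size_t>(yi) * N + xi];
}

// Reference: one output element per item, mirroring the 5-point example.
Grid stencil_scalar(const Grid& A, int N) {
    assert(A.size() == static_cast<std::size_t>(N) * N);
    Grid B(A.size(), 0.0f);
    for (int y = 0; y < N; ++y)
        for (int x = 0; x < N; ++x) {
            float fx = static_cast<float>(x), fy = static_cast<float>(y);
            B[y * N + x] = gather2_floor_model(A, N, fx, fy)
                         + gather2_floor_model(A, N, fx - 1, fy)
                         + gather2_floor_model(A, N, fx + 1, fy)
                         + gather2_floor_model(A, N, fx, fy - 1)
                         + gather2_floor_model(A, N, fx, fy + 1);
        }
    return B;
}

// Vectorized-style version: in the interior, each constant offset maps
// outputs x..x+3 to inputs x+dx..x+dx+3, i.e. one contiguous 4-wide read.
Grid stencil_vec4(const Grid& A, int N) {
    assert(A.size() == static_cast<std::size_t>(N) * N);
    Grid B = stencil_scalar(A, N);  // borders and leftover columns: scalar path
    const int offs[5][2] = {{0, 0}, {-1, 0}, {1, 0}, {0, -1}, {0, 1}};
    for (int y = 1; y + 1 < N; ++y)
        for (int x = 1; x + 4 < N; x += 4) {  // x+3 and its right neighbor stay in range
            std::array<float, 4> acc{};
            for (const auto& o : offs) {
                const float* src = &A[(y + o[1]) * N + (x + o[0])];
                for (int k = 0; k < 4; ++k) acc[k] += src[k];  // contiguous "float4" read
            }
            for (int k = 0; k < 4; ++k) B[y * N + x + k] = acc[k];
        }
    return B;
}
```

Both functions produce the same grid; the vectorized one only changes how interior reads are grouped, which is why it needs the offsets to be known when the kernel is generated.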
I haven’t tested this with images and texture sampling yet. I’m currently working out of a hospital on a laptop with wireless. That’s also why I haven’t updated this blog for almost two weeks. Real life stuff happens sometimes. Interesting trivia: I used to think of “code” as source code. For a doctor, “full code” and “no code” have very different meanings.
After I get this working with images, the next thing to add is random number generation on the GPU. I’ve realized the JIT really needs to be redesigned. It’s too inflexible to support the translations and optimizations I would like.
So my priority is to make the language feature complete for the beta, a few months from now. The production release will include the JIT redesign and should generate higher quality GPGPU code.