there are so many cores

Vectorized gather support added

Gather operations now support vectorized memory access. This only works for memory access patterns that are simple stencil translations: each offset from the current item ID must be a literal constant. For example:

Arrayf32 X = index_f32(0, N, N);
Arrayf32 Y = index_f32(1, N, N);

B += gather2_floor(A, X, Y);
B += gather2_floor(A, X - 1, Y);
B += gather2_floor(A, X + 1, Y);
B += gather2_floor(A, X, Y - 1);
B += gather2_floor(A, X, Y + 1);

The stencils can be arbitrarily large, though.
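To make the access pattern concrete, here is a rough scalar sketch in plain C++ of what the five-point stencil above computes. It is not the library's implementation. It assumes A is a row-major N x N array, that X is the column index and Y the row index, and that out-of-range reads clamp to the edge. The names load_clamped and stencil5_reference are made up for this illustration, and the real boundary behavior of gather2_floor may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the library): read element (x, y) of a
// row-major N x N array. Clamping at the edges is an assumption made for
// this sketch only.
inline float load_clamped(const std::vector<float>& A, int N, int x, int y) {
    x = std::max(0, std::min(x, N - 1));
    y = std::max(0, std::min(y, N - 1));
    return A[static_cast<std::size_t>(y) * N + x];
}

// Scalar reference for the five-point stencil: each output item sums the
// input at its own position and at four constant offsets from it.
std::vector<float> stencil5_reference(const std::vector<float>& A, int N) {
    // Every offset is a compile-time constant, the same for all items.
    static const int offsets[5][2] = {
        {0, 0}, {-1, 0}, {1, 0}, {0, -1}, {0, 1}
    };
    std::vector<float> B(A.size(), 0.0f);
    for (int y = 0; y < N; ++y) {
        for (int x = 0; x < N; ++x) {
            for (const auto& d : offsets) {
                B[static_cast<std::size_t>(y) * N + x] +=
                    load_clamped(A, N, x + d[0], y + d[1]);
            }
        }
    }
    return B;
}
```

Because each offset is the same constant for every item, neighboring items read neighboring addresses. That regularity is presumably what lets the generated code use wide vector loads instead of one scalar load per item.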

I haven’t tested this with images and texture sampling yet. I’m currently working out of a hospital on a laptop with wireless. That’s also why I haven’t updated this blog for almost two weeks. Real life stuff happens sometimes. Interesting trivia: I used to think of “code” as source code. For a doctor, “full code” and “no code” have very different meanings.

After I get this working with images, the next thing to add is random number generation on the GPU. I’ve realized the JIT really needs to be redesigned. It’s too inflexible to support the translations and optimizations I would like.

So my priority is for the beta, due in a few months, to be feature complete from a language viewpoint. The production release will include the JIT redesign and should generate higher quality GPGPU code.

