there are so many cores

Just another WordPress.com site

AMD supports images on CPU devices

The AMD OpenCL SDK supports images on CPU devices. This is a surprise. Two years ago, images only worked on GPU devices. I never bothered to retest. I only noticed by accident when testing gather operations with images.

Here is some sample code:

const size_t N = 12;

void filter(float* cpuA, float* cpuB, const size_t N)
{
    Arrayf32 B;
    {
        Arrayf32 X = index_f32(0, N, N);
        Arrayf32 Y = index_f32(1, N, N);

        Arrayf32 A = make2(N, N, cpuA);

        B = zeros_f32(N, N);
        B += gather2_floor(A, X, Y);
        B += gather2_floor(A, X - 1, Y);
        B += gather2_floor(A, X + 1, Y);
        B += gather2_floor(A, X, Y - 1);
        B += gather2_floor(A, X, Y + 1);
    }
    B.read1(cpuB, N * N * sizeof(float));
}

When images are configured as a device capability, the JIT generates:

__kernel void f_9583626628478369588_1(
  __read_only image2d_t a1,
  __global float4 * a2)
{
  const sampler_t sampler = CLK_FILTER_NEAREST | CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_NONE;
  const float4 a10 = read_imagef(a1, sampler, (int2)(((get_global_id(0) + 0) % 3), ((get_global_id(1) + 0) % 12)));
  const float4 a11 = read_imagef(a1, sampler, (int2)(((get_global_id(0) + 2) % 3), ((get_global_id(1) + 0) % 12)));
  const float4 a12 = read_imagef(a1, sampler, (int2)(((get_global_id(0) + 1) % 3), ((get_global_id(1) + 0) % 12)));
  const float4 a13 = read_imagef(a1, sampler, (int2)(((get_global_id(0) + 0) % 3), ((get_global_id(1) + 11) % 12)));
  const float4 a14 = read_imagef(a1, sampler, (int2)(((get_global_id(0) + 0) % 3), ((get_global_id(1) + 1) % 12)));
  float4 r1;
  r1= 0 ;
  r1= (r1 + (float4)(a10.s0, a10.s1, a10.s2, a10.s3)) ;
  r1= (r1 + (float4)(a11.s3, a10.s0, a10.s1, a10.s2)) ;
  r1= (r1 + (float4)(a10.s1, a10.s2, a10.s3, a12.s0)) ;
  r1= (r1 + (float4)(a13.s0, a13.s1, a13.s2, a13.s3)) ;
  r1= (r1 + (float4)(a14.s0, a14.s1, a14.s2, a14.s3)) ;
  a2[(get_global_id(0)) + get_global_id(1) * 3] = r1;
}

When the device is configured for memory buffers only, the JIT generates:

__kernel void f_9583626628478369588_1(
  __global const float4 * a1,
  __global float4 * a2)
{
  const float4 a10 = a1[((get_global_id(0) + 0) % 3) + ((get_global_id(1) + 0) % 12) * 3];
  const float4 a11 = a1[((get_global_id(0) + 2) % 3) + ((get_global_id(1) + 0) % 12) * 3];
  const float4 a12 = a1[((get_global_id(0) + 1) % 3) + ((get_global_id(1) + 0) % 12) * 3];
  const float4 a13 = a1[((get_global_id(0) + 0) % 3) + ((get_global_id(1) + 11) % 12) * 3];
  const float4 a14 = a1[((get_global_id(0) + 0) % 3) + ((get_global_id(1) + 1) % 12) * 3];
  float4 r1;
  r1= 0 ;
  r1= (r1 + (float4)(a10.s0, a10.s1, a10.s2, a10.s3)) ;
  r1= (r1 + (float4)(a11.s3, a10.s0, a10.s1, a10.s2)) ;
  r1= (r1 + (float4)(a10.s1, a10.s2, a10.s3, a12.s0)) ;
  r1= (r1 + (float4)(a13.s0, a13.s1, a13.s2, a13.s3)) ;
  r1= (r1 + (float4)(a14.s0, a14.s1, a14.s2, a14.s3)) ;
  a2[(get_global_id(0)) + get_global_id(1) * 3] = r1;
}

Right now, image support is configured per device in a file read by the virtual machine on startup. For gather operations, the JIT uses images or memory buffers based on the configuration file entry for the compute device.

GEMM and GEMV are auto-tuned. The JIT will pick between images and memory buffers based on kernel execution time. Note this is also not quite right. There is a tradeoff between: image creation overhead; the speed of texture sampling; the length/depth of the calculation graph. The optimal choice of images or memory buffers depends on more than the speed of one kernel. It depends on the total performance across a sequence of kernels and memory transfers.

All of this is less than ideal. At this point, I will leave it alone as I intend to rewrite much of the JIT anyway in a few months. For now, I’m just adding features. Next up is random number generation.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: