there are so many cores

Just another WordPress.com site

Finally working: PeakStream conjugate gradient demo code

I’ve been working on the memory management and JIT interactions this last week. The PeakStream conjugate gradient demo code works correctly now (compiled by the JIT to OpenCL and not only interpreted as before). So do the rest of my (quite meager) set of samples. It’s all working.

// Sample code from a presentation given by Matthew Papakipos at Stanford in
// 2007 (page 13 of "Public Google PeakStream.ppt")
// http://www.stanford.edu/class/ee380/Abstracts/070926-PeakStream.pdf
int Conj_Grad_GPU_PS(int N, float *cpuA, float *cpux, float *cpub)
{
    int iter;
    Arrayf32 x = Arrayf32::zeros(N);
    {
        Arrayf32 A = Arrayf32::make2(N, N, cpuA);
        Arrayf32 b = Arrayf32::make1(N, cpub);
        Arrayf32 residuals = b - matmul(A, x);
        Arrayf32 p = residuals;
        Arrayf32 newRR = dot_product(residuals, residuals);

        for (iter = 0; iter < N; iter++) {
            Arrayf32 oldRR = newRR;
            Arrayf32 newX, newP, newResiduals;
            Arrayf32 Ap = matmul(A, p);
            Arrayf32 dp = dot_product(p, Ap);
            newX = x + p * oldRR / dp;
            newResiduals = residuals - Ap * oldRR / dp;
            newRR = dot_product(newResiduals, newResiduals);
            newP = newResiduals + p * newRR / oldRR;

            p = newP;
            residuals = newResiduals;

            float oldRRcpu = oldRR.read_scalar();
            if (oldRRcpu <= TOLERANCE) {
                break;
            }
            x = newX;
        }
    }
    x.read1(cpux, N * sizeof(float));

    return iter;
}

This example is relatively complicated and caused me some grief. I’ve learned lessons about code generation for GPUs. These are still not clear to me. So I won’t go into details. I can say that the solutions I ended up with are unexpected.

So maybe that is one lesson I can relate. Have an open mind about solutions. What is unorthodox may just not appear immediately sensible. If a solution is cheap to try, it is worth investigation.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: