there are so many cores

Monthly Archives: May 2011

Fixed bugs causing multi-threaded failures

Posted by fastkor on May 2, 2011

There were ~~four~~ five bugs.

reference counted array memory not thread-safe
zero elapsed kernel run time treated as compute device failure
scheduler assumed there is always a fastest known compute device
memory manager confused by incremental trace with too much history
interpreter left uninitialized data in arrays of ones and zeros

It’s kind of shocking, actually.

The last bug, in the interpreter, took the most work to find. Yet it was the dumbest: a for-loop that looked like this:

for (int j = 0; i < W * H; i++)
    m->floatPtr()[i] = _floatValue;

That is so obviously wrong. It so happens that this works just fine in the single-threaded case.
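
For comparison, a corrected loop might look like the sketch below. The function name `fillConstant` and the `std::vector` buffer are hypothetical stand-ins for `m->floatPtr()` and `_floatValue`. The original presumably compiled only because some other `i` was visible in an enclosing scope, which is an assumption about code not shown here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the interpreter's fill routine: write `value`
// into every element of a W x H array of floats.
std::vector<float> fillConstant(std::size_t W, std::size_t H, float value)
{
    std::vector<float> data(W * H);

    // The loop variable is declared, tested, and incremented under one name,
    // so the loop no longer depends on a variable from an enclosing scope.
    for (std::size_t i = 0; i < W * H; i++)
        data[i] = value;

    return data;
}
```

Declaring the index inside the `for` statement keeps it local to the loop, so each call starts counting from zero regardless of what any other code has done.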

Some lessons from this:

there is no magic, only bugs and stuff not understood yet
a stress testing and validation suite is really necessary
if the initial release includes bugs like this, it will be dead on arrival

Now I can get back to the JIT.

What prompted the discovery of so many bugs was the middle-end JIT optimization work I’m doing. One optimization is “lifting” BLAS level 2 operations (GEMV) to BLAS level 3 (GEMM). If there are N threads/traces each doing a matrix/vector multiply with the same matrix, they can be transformed into a single matrix/matrix multiply. This is a huge optimization. For a discrete GPU that is I/O bound by data transfer over the PCIe bus, it can easily mean a 100x increase in throughput.

BEFORE WITH N TRACES, WHERE i = 0 .. N-1    AFTER WITH 1 VECTORIZED TRACE

Arrayf64 A  = Arrayf64::make2(N, N, cpuA);  Arrayf64 A  = Arrayf64::make2(N, N, cpuA);
Arrayf64 p  = Arrayf64::make1(N, cpuP[i]);  Arrayf64 P  = Arrayf64::make2(N, N, cpuP);
Arrayf64 Ap = matmul(A, p);                 Arrayf64 AP = matmul(A, P);

It actually works a little differently than the PeakStream code above suggests. But the transformation is conceptually the same.
