there are so many cores


Next post will be the alpha release

I think I’m done with this alpha release. The next post will include a link to a GitHub download. The source will be checked in there too. I’ll probably create a new WordPress page just for releases to avoid forcing people to search through this blog narrative.

This release has shifted my mindset from self-absorbed virtual machine language builder to considering users. So although it isn't much, and some of it makes me look bad, I've decided to be more open about the state of this project and technology. I'm taking the long view: if this technology has merit and is good, then it will live. Otherwise, it will die. How I portray it doesn't matter much.

So…

Here’s my test matrix. It’s a meagre collection of small demo applications run on every compute device I have.

OK - everything works
AF - autotuning fails for batched kernels, scheduled to interpreter
IR - interpreter support only (compute device RNG not implemented)
IS - interpreter support only (single precision only compute device)
OF - everything works except may fail rarely (enough to be noticed?)
SV - segmentation fault (also bus error)
KF - segmentation fault in kernel generated by shader compiler
UC - usually correct results, sometimes wrong
UW - results generally wrong or out of order

Sample        PentiumM  Core2Duo  Corei7_920  Corei7_920  5440  5670  5770  5870  480
Application   AMD       AMD       AMD         INTEL       AMD   AMD   AMD   AMD   NVIDIA
============  ========  ========  ==========  ==========  ====  ====  ====  ====  ======
cg            KF        OK        OK          OK          OK    OK    OK    OK    OK
cg64          KF        OK        OK          OK          IS    IS    IS    OK    OK
index         OK        OK        OK          OK          OK    OK    OK    OK    OK
kirch         SV        SV        SV          SV          OK    OK    OK    OK    OK
loopsum_omp   KF        UC        SV          SV          UC    UC    UW    OK    OK
loopsum_pth   KF        UC        SV          SV          UC    UC    UC    OK    OK
loopsum_uni   KF        OK        OK          OK          OK    OK    OK    OK    OK
loopsum_vec   KF        UW        UW          UW          UW    UC    UW    OK    OK
matmul_omp    KF        AF        AF          AF          OK    OK    OK    OK    OK
matmul_pth    KF        AF        AF          AF          OK    OK    OK    OK    OK
matmul_uni    KF        OK        OK          OK          OK    OK    OK    OK    OK
matmul_vec    KF        AF        AF          AF          OK    OK    OK    OK    OK
matmul64_omp  KF        AF        AF          AF          IS    IS    IS    OK    OK
matmul64_pth  KF        AF        AF          AF          IS    IS    IS    OK    OK
matmul64_uni  KF        OK        OK          OK          IS    IS    IS    OK    OK
matmul64_vec  KF        AF        AF          AF          IS    IS    IS    OK    OK
mingle        KF        OK        OK          OK          OK    OK    OK    OK    OK
mingle64      KF        OK        OK          OK          IS    IS    IS    OK    OK
monte         IR        IR        IR          IR          IR    IR    IR    IR    IR
pi            IR        IR        IR          IR          IR    IR    IR    IR    IR
sum_omp       KF        OK        SV          SV          OF    OF    OF    OF    OK
sum_pth       KF        OK        SV          SV          OF    OF    OF    OF    OK
sum_uni       KF        OK        OK          OK          OK    OK    OK    OK    OK
sum_vec       KF        OK        OK          OK          OK    OK    OK    OK    OK

You can see some of the skeletons in the virtual machine's closet in this table.

Some of that is the inherently complex nature of the technology. Vendors have to market GPGPU as a solved problem and a mature technology. If that were true, it wouldn't be interesting (easy stuff generally lacks mystique).

Some of that is also bugs, especially race conditions and logic errors. I’ve come to rely on testing as an integral part of development. That is just being realistic. I can’t keep the entire design in my head at the same time any more.

Other stuff…

  1. Tomorrow, I’ll read The Java Native Interface Programmer’s Guide and Specification by Sheng Liang. I’ve done JNI before. That was ten years ago. I haven’t done any Java for over two years either. I am very rusty. My thinking is to create a simple bridge from Java to native code libraries, analogous to how BerkeleyDB works.
  2. I am flying out to Savannah for SIAM Parallel Processing 2012 this coming Tuesday (Valentine’s Day).
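The Java-to-native bridge mentioned in item 1 could look roughly like the sketch below. This is only an illustration of the JNI pattern under assumptions of my own: the class name `ChaiBridge`, the library name `chai_jni`, and the `evaluate` method are hypothetical placeholders, not the project's actual API.

```java
// Sketch of a Java-to-native bridge, analogous to how Berkeley DB's Java API
// wraps a native library. All names here are hypothetical placeholders.
public class ChaiBridge {
    // Declared in Java, implemented in a native shared library
    // (e.g. libchai_jni.so) built against the JNI-generated headers.
    private static native double[] evaluate(double[] input);

    private static boolean nativeLoaded;

    static {
        try {
            System.loadLibrary("chai_jni");
            nativeLoaded = true;
        } catch (UnsatisfiedLinkError e) {
            // Degrade gracefully when the native library is absent.
            nativeLoaded = false;
        }
    }

    public static void main(String[] args) {
        if (nativeLoaded) {
            double[] out = evaluate(new double[] {1.0, 2.0, 3.0});
            System.out.println("native result length: " + out.length);
        } else {
            System.out.println("native library not available");
        }
    }
}
```

The try/catch around `System.loadLibrary` keeps the pure-Java side usable (or at least testable) on machines where the native library hasn't been built.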

2 responses to “Next post will be the alpha release”

  1. Jan Tore Korneliussen February 13, 2012 at 12:25 pm

    This seems like impressive work, congratulations on the alpha release.

Much of what was said about the rationale for PeakStream in 2007 still applies today. I quote a slide from a presentation on PeakStream given after the Google acquisition:

    “””Automatic Stream Kernel Synthesis
    » Identifying the streaming kernel
    • What’s the granularity of the inner loop?
    • How many GPU passes are optimal?
    » It’s inappropriate for the application to pick
    • It is very processor-dependent
    • Depends on processor family, model, memory, …
    » This is a good task for compilers
    • This is what the PeakStream JIT compiler does
    • Ensures portability of application code
    • Ensures scalable performance over many processors”””

Programming directly in OpenCL or CUDA involves a lot of the inappropriate picking mentioned above if you care about high performance.

    Question: will Chai support per-element specification of computational kernels, or is it specified as a series of array operations only? The first is quite a natural way to express image processing kernels, for instance.

  2. fastkor February 13, 2012 at 4:01 pm

    Answer: I agree. There should be auto-tuned convolution support in 1D, 2D, and perhaps 3D (image pyramids). This sort of spatial domain filtering is a basic signal processing operation.

    I hope this answers your question. My viewpoint is to subsume per-element specification into array operations as auto-tuned kernels rather than directly expose them (e.g. MATLAB style).

    One of the motivations in leaving stealth mode is to let the market decide what is important. The biggest risk I see is building the wrong thing, not failing to solve engineering design problems. It’s hard to know what is important.
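The per-element versus array-operation distinction above can be sketched in a few lines. This is an illustrative contrast only, not Chai's actual API: the user writes the per-element rule as a scalar function, and a whole-array operation (here a hypothetical `map` helper) subsumes it, leaving granularity, pass count, and scheduling to the runtime.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

// Illustrative only (not Chai's API): a per-element kernel expressed as a
// scalar function, subsumed into a whole-array operation.
public class ArrayOps {
    // Whole-array operation applying a per-element kernel across the array.
    // A JIT like the one described for PeakStream could fuse chains of such
    // operations into a single GPU kernel behind the scenes.
    static double[] map(double[] a, DoubleUnaryOperator kernel) {
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = kernel.applyAsDouble(a[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 4.0, 9.0};
        // The user specifies only the scalar (per-element) rule; the array
        // operation decides how it is scheduled onto the device.
        double[] y = map(x, v -> Math.sqrt(v));
        System.out.println(Arrays.toString(y)); // [1.0, 2.0, 3.0]
    }
}
```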
