there are so many cores

Just another site

Ready to start on the GPU back-end

The C++ API front-end and CPU interpreter back-end are solid now. Next up is the interesting stuff with scheduling, JIT and GPU back-end.

  • 3253 lines of C++
  • 54 files
  • keyword template appears 37 times
  • preprocessor directive #define appears 58 times
  • bytecode stack virtual machine with 57 instructions
  • reference counted array buffers
  • tunable lazy evaluation (very important!)

The last point about lazy evaluation is significant. The advantage of a JIT comes from the ability to optimize and schedule code at runtime. Code does not always execute immediately. It is queued, dynamically compiled and scheduled. Then metrics are collected to inform the system, improve management and enable higher performance over time.

There is a balance between eager and lazy evaluation. Eagerness permits less optimization but also has lower overhead. In the limit, this becomes an interpreter. Laziness allows more optimization but has higher overhead from boxing computations. In the limit, this becomes Haskell!

Before dealing with this eager/lazy trade-off, the first thing is a back-end for the GPU. This will be a non-optimizing (no auto-tuning, that comes later) JIT that fuses streams and synthesizes OpenCL kernels dynamically. New virtual machine instructions will be added to the interpreter dispatch table corresponding to the synthesized kernels. So one way of thinking about the JIT is as an accelerator for the interpreter.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: