Ready to start on the GPU back-end
December 29, 2010
The C++ API front-end and CPU interpreter back-end are solid now. Next up is the interesting stuff: scheduling, the JIT, and the GPU back-end.
- 3253 lines of C++
- 54 files
- keyword template appears 37 times
- preprocessor directive #define appears 58 times
- bytecode stack virtual machine with 57 instructions
- reference counted array buffers
- tunable lazy evaluation (very important!)
The last point about lazy evaluation is significant. The advantage of a JIT comes from the ability to optimize and schedule code at runtime. Code does not always execute immediately: it is queued, dynamically compiled, and scheduled. Metrics collected along the way feed back into the system, improving scheduling decisions and raising performance over time.
There is a balance between eager and lazy evaluation. Eagerness permits less optimization but also has lower overhead. In the limit, this becomes an interpreter. Laziness allows more optimization but has higher overhead from boxing computations. In the limit, this becomes Haskell!
Before dealing with this eager/lazy trade-off, the first step is a back-end for the GPU. This will be a non-optimizing JIT (no auto-tuning; that comes later) that fuses streams and synthesizes OpenCL kernels dynamically. New virtual machine instructions will be added to the interpreter dispatch table, corresponding to the synthesized kernels. So one way of thinking about the JIT is as an accelerator for the interpreter.