<gobbledygook> Last week, I realized that syntactic analysis is not entirely free of operation code semantics. </gobbledygook>
Or: Last week, I realized I am ignorant of prior art and wasting time reinventing wheels.
So I stopped coding and started reading about PyPy and Psyco which both use tracing JITs. As Mr. Spock would say, “Fascinating.” Knowing what something is not helps to understand what it is.
PeakStream is not a dynamic language. Obvious? The API suggests static typing:
RNGf32 G(SP_RNG_DEFAULT, 271828); // create an RNG
Arrayf32 X = rng_uniform_make(G, NSET, 1, 0.0, 1.0);
Arrayf64 A = make1(100, cpuA);
Every array variable is typed by precision. Single precision is “f32” while double precision is “f64”. There’s no variable polymorphism or need for type inference. All variable types are known statically. There is also no way to modify the platform runtime from within user code. Everything necessary is known from the source code.
Despite not being dynamic, PeakStream uses a JIT to compile shader kernels at runtime. Why? There are two reasons.
- Compiling kernels statically would require a special toolchain (as in BrookGPU). Embedding the platform in libraries allows standard C++ development tools. This is mostly a market constraint rather than a technical decision.
- Foreign C++ freely intermingles with the PeakStream DSL. The runtime is ignorant of program control flow. Only the execution trace is available.
The tracing JITs of Psyco and PyPy rely on control flow hints passed through the interpreter at backward jumps. The major optimization is identifying hot loops and compiling these. Hooks between the interpreter and JIT are at points where control flow changes.
PeakStream does not have control flow information like a conventional tracing JIT. Any loops and even common subexpressions must be detected using heuristics and hashing (transpositions). PeakStream knows the computational graph but has no direct knowledge of the user source code that generated it. The runtime must work backwards to infer loops, i.e. “loop rolling”.
From an abstract point of view, the PeakStream JIT is a specializer performing dynamic partial evaluation like Psyco and PyPy. The JIT accelerates hot spots found while interpreting. However, this comparison is misleading, as optimization is completely different for CPUs and GPUs. My experience auto-tuning GPGPU kernels with GATLAS is that specialization requires searching a neighborhood around each problem. It’s not the straightforward series of transformations that we would make as human programmers.
This project seems much smaller to me now. PeakStream is a simple language. It is not dynamic. The base language, including a reasonable JIT and scheduler, should not be that difficult to implement. Most development costs will come later with good compiler optimizations across accelerator devices (GPUs) and high performance math kernel libraries.