there are so many cores

Scheduling connects all major subsystems

Prior to this rewrite, I had focused on interpreting the PeakStream language and then on JIT compilation using an OpenCL back-end. I had to stop when it became obvious that scheduling was upstream of the JIT. In fact, it turns out the scheduler is the hub that connects all major subsystems.

I normally think of scheduling as a checkout line from which customers are directed to one of many cashiers. The objective in that case is some combination of high throughput, low latency, and high utilization. Note that each customer is serviced individually and remains isolated from the others.

The PeakStream scheduler is not like this.

It is more like passengers waiting at an airport gate to fly somewhere. The objective in this case is to have planes as full as possible. Empty seats lose money. Passengers are packaged together in planes and serviced together as a group when the plane takes off and flies to a destination.

Scheduling for PeakStream is a way of collecting stream computations and packaging them together in GPU kernels. Together with the scheduler, the JIT finds an efficient way to do this. Computations are performed when the kernel executes. Memory and data layouts are a side-effect of the packaging solutions found by the scheduler and JIT.
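To make that concrete, here is a minimal sketch in C++ of deferred, batched scheduling. The names (StmtNode, Trace, Scheduler, record, flush) are my own illustrative assumptions, not PeakStream's API or this project's actual classes; the point is only that array operations get recorded rather than executed, and a read-back flushes the whole batch to the JIT at once.

    #include <cstddef>
    #include <cstdio>
    #include <string>
    #include <utility>
    #include <vector>

    // One deferred stream operation, e.g. "add", "matmul", "sum".
    struct StmtNode {
        std::string op;
        std::vector<std::size_t> argIds;  // references to earlier statements
    };

    // Everything recorded since the last flush.
    struct Trace {
        std::vector<StmtNode> stmts;
        bool empty() const { return stmts.empty(); }
    };

    class Scheduler {
    public:
        // The API front-end calls this for every array operation;
        // nothing touches the GPU yet.
        std::size_t record(std::string op, std::vector<std::size_t> args) {
            trace_.stmts.push_back({std::move(op), std::move(args)});
            return trace_.stmts.size() - 1;
        }

        // A read-back (or explicit sync) forces the batch through the JIT,
        // which packages the trace into as few kernels as it can.
        void flush() {
            if (trace_.empty()) return;
            std::printf("compiling %zu deferred statements as one batch\n",
                        trace_.stmts.size());
            trace_ = Trace{};
        }

    private:
        Trace trace_;
    };

    int main() {
        Scheduler s;
        std::size_t a = s.record("input", {});
        std::size_t b = s.record("input", {});
        std::size_t c = s.record("mul", {a, b});
        s.record("sum", {c});  // a reduction would force a read-back
        s.flush();             // only now would kernels be generated and run
    }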

The upshot is that scheduling, JIT compilation, and memory management are closely related in the PeakStream approach.

At this point, I’ve written working prototypes for all of the major subsystems, the minimal set necessary for end-to-end functionality. However, I have not written them all at the same time within the same code base. This project has undergone several complete rewrites.

  1. API
  2. interpreter
  3. JIT with OpenCL
  4. scheduler
  5. memory manager
  6. executor

With the latest major rewrite (which I expect is the last one, at least for the core platform), there is now a connected API, scheduler, memory manager, and executor. The next effort is merging back in the interpreter and then the very primitive JIT prototype I had written earlier.
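As a rough picture of how those pieces might plug together, here is a hedged sketch with the scheduler as the hub. Every name here (Backend, MemoryManager, dispatch, and so on) is an assumption for illustration, not the real code: the idea is simply that the API hands traces to the scheduler, which owns the memory manager and chooses whether the interpreter or the OpenCL JIT/executor services each batch.

    #include <memory>
    #include <utility>

    struct Trace { /* batched stream statements, as in the sketch above */ };

    // Owns host and device (OpenCL) buffers and moves data between them.
    struct MemoryManager { };

    // A backend services a whole trace: either the interpreter or the
    // OpenCL JIT plus executor.
    struct Backend {
        virtual ~Backend() = default;
        virtual void execute(const Trace& t, MemoryManager& mm) = 0;
    };

    struct Interpreter : Backend {
        void execute(const Trace&, MemoryManager&) override { /* evaluate on the CPU */ }
    };

    struct JitExecutor : Backend {
        void execute(const Trace&, MemoryManager&) override { /* emit and run kernels */ }
    };

    // The scheduler is the hub: the API hands it traces, and it decides
    // which backend runs them and where the data lives.
    class Scheduler {
    public:
        Scheduler(std::unique_ptr<Backend> interp, std::unique_ptr<Backend> jit)
            : interp_(std::move(interp)), jit_(std::move(jit)) { }

        void dispatch(const Trace& t, bool worthCompiling) {
            Backend& b = worthCompiling ? *jit_ : *interp_;
            b.execute(t, mm_);
        }

    private:
        MemoryManager mm_;
        std::unique_ptr<Backend> interp_;
        std::unique_ptr<Backend> jit_;
    };

    int main() {
        Scheduler s(std::make_unique<Interpreter>(), std::make_unique<JitExecutor>());
        s.dispatch(Trace{}, /*worthCompiling=*/true);
    }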

Even then, it won't be good enough to be useful.

The JIT must have auto-tuning capability with a kernel cache (I did this before in GATLAS) to be worth anything. There must be a math kernel library, even if it contains nothing but GEMM.
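As a hedged illustration of the kernel cache idea (names invented, GATLAS-style auto-tuning reduced to a placeholder), the JIT could key the cache on the generated OpenCL source and the device, tune work-group sizes on a miss, and reuse the winning configuration on every later hit:

    #include <cstddef>
    #include <map>
    #include <string>
    #include <tuple>

    // Tuned parameters are device-specific, so the key includes the device.
    struct KernelKey {
        std::string source;      // generated OpenCL kernel source
        std::string deviceName;
        bool operator<(const KernelKey& o) const {
            return std::tie(source, deviceName)
                 < std::tie(o.source, o.deviceName);
        }
    };

    // What the cache remembers; a real entry would also hold the compiled
    // cl_program/cl_kernel handles.
    struct TunedKernel {
        std::size_t localSize[2];  // winning work-group shape
    };

    class KernelCache {
    public:
        const TunedKernel& get(const KernelKey& key) {
            auto it = cache_.find(key);
            if (it == cache_.end()) {
                // Cache miss: benchmark a handful of candidate work-group
                // shapes and keep the fastest one (tuning elided here).
                it = cache_.emplace(key, autotune(key)).first;
            }
            return it->second;
        }

    private:
        TunedKernel autotune(const KernelKey&) {
            return TunedKernel{{16, 16}};  // placeholder "winner"
        }
        std::map<KernelKey, TunedKernel> cache_;
    };

    int main() {
        KernelCache cache;
        KernelKey gemm{"__kernel void gemm(/* ... */) { }", "SomeGPU"};
        const TunedKernel& tk = cache.get(gemm);  // first call tunes, later calls reuse
        return tk.localSize[0] == 16 ? 0 : 1;
    }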

One thing I am rather ignorant of is random number generation, on GPUs or otherwise. I have to admit that I have no experience with Monte Carlo methods in real life (which are often a lot dirtier, more hand-crafted, and more heuristic than in theory). This forces me to again rely heavily on PeakStream's trail of artifacts: Pseudorandom Number Generation on the GPU.
