there are so many cores

Download

The project is on GitHub.

Alpha5 release, July 14, 2012 – Source code restructured and cleaned up, motivated by future embedded platform support (a configurable build in Buildroot style? i.e. Buildchai). Interpreter and JIT are now separate paths. Significant false sharing of common code was removed, recognizing that the interpreter is mostly for debugging, as the full language API is supported through the JIT.

Alpha4 release, June 24, 2012 – Mixed language programming with OpenCL kernels and the PeakStream DSL. The very large .cpp files for the API, enqueue trace, and memory manager are broken up into more manageably sized pieces. Fixed one unhappy path: array variable width can now be arbitrary and does not need to be a multiple of the underlying vector length.
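To illustrate the width fix in the Alpha4 note, here is a minimal sketch of one common way to handle a width that is not a multiple of the vector length: round the allocation up and track the leftover tail. This is an assumption for illustration, not the project's actual code; the function names `padded_width`, `vector_count`, and `tail_lanes` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the project): round an arbitrary array
// width up to the next multiple of the vector length, so a vectorized
// kernel can process whole vectors over a padded buffer.
inline std::size_t padded_width(std::size_t width, std::size_t vector_length) {
    assert(vector_length > 0);
    return (width + vector_length - 1) / vector_length * vector_length;
}

// Number of whole vectors needed to cover `width` scalar elements.
inline std::size_t vector_count(std::size_t width, std::size_t vector_length) {
    return padded_width(width, vector_length) / vector_length;
}

// Number of valid scalar lanes in the last vector; 0 means the width
// already is a multiple of the vector length and no tail handling is needed.
inline std::size_t tail_lanes(std::size_t width, std::size_t vector_length) {
    assert(vector_length > 0);
    return width % vector_length;
}
```

With a vector length of 4, a width of 10 would be padded to 12 elements (3 vectors), and the last vector would hold 2 valid lanes.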

Alpha3 release, May 25, 2012 – Gather operations with constant translation stencils (as in many image processing filters) are optimized to use images when possible. This takes advantage of the high-speed L1 texture cache. Random number generation on the GPU is supported with the Random123 counter-based PRNG. This includes uniform and normal distributions (using the Box-Muller transform).
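For readers unfamiliar with the Box-Muller transform mentioned in the Alpha3 note, the following host-side sketch shows how two uniform samples become two normal samples, and why a counter-based generator suits data parallel code: each output depends only on a (key, counter) pair, with no shared state. The mixing function `mix64` is a simple stand-in (a SplitMix64-style finalizer) and is an assumption for illustration; it is not Random123, and none of these names come from the project.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <utility>

// Stand-in for a counter-based generator (NOT Random123): a stateless
// integer mix of (key, counter), so any element can be computed
// independently of all others.
inline std::uint64_t mix64(std::uint64_t key, std::uint64_t counter) {
    std::uint64_t z = key + (counter + 1) * 0x9E3779B97F4A7C15ULL;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

// Map 64 random bits to a double in (0, 1], which keeps log() finite.
inline double to_unit_interval(std::uint64_t bits) {
    // Top 53 bits plus one, scaled by 2^-53.
    return (static_cast<double>(bits >> 11) + 1.0) * (1.0 / 9007199254740992.0);
}

// Box-Muller: two uniforms (u1 in (0, 1], u2 in [0, 1]) give two
// independent standard normal samples.
inline std::pair<double, double> box_muller(double u1, double u2) {
    assert(u1 > 0.0 && u1 <= 1.0);
    const double two_pi = 6.283185307179586;
    const double r = std::sqrt(-2.0 * std::log(u1));
    const double theta = two_pi * u2;
    return {r * std::cos(theta), r * std::sin(theta)};
}

// Two normal samples for a given (key, counter); each pair consumes
// two consecutive counter values.
inline std::pair<double, double> normal_pair(std::uint64_t key, std::uint64_t counter) {
    const double u1 = to_unit_interval(mix64(key, 2 * counter));
    const double u2 = to_unit_interval(mix64(key, 2 * counter + 1));
    return box_muller(u1, u2);
}
```

Because the result is a pure function of the key and counter, a GPU work item could call something like `normal_pair(seed, global_id)` without coordinating with any other work item. A real implementation would swap `mix64` for a properly tested generator such as those in Random123.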

Alpha2 release, March 25, 2012 – JIT code is reorganized into several subdirectories. The main new feature is support for unsigned and signed integer array types. Mixed integer and floating point calculation is supported (including autotuned GEMM and GEMV). Generated kernels make better use of private registers. Gathering now works in the kirch.cpp sample. The md5.cpp sample performs vectorized MD5 hash code calculation on the GPU.

Alpha release, February 12, 2012 – Code is in a working state with useful end-to-end functionality. Happy path functionality appears to be reliable. Autotuned GEMM and GEMV work with dynamically generated kernels from the JIT. Single threaded data parallel vectorization works for GPU compute devices. Multi-threaded gather/scatter vectorized scheduling works but is unstable (depends on compute device). Trace continuation (typically loops) with multiple readouts is working. Autotuning cold and warm start times are excessive.
