there are so many cores

Almost at feature freeze for first release

Added this week:

  • index arrays (index_f32, index_f64 – both 1D and 2D variants along width or height)
  • gather data shuffling (gather1_floor, gather2_floor – allows array subscripts to come from data values instead of only from loop indexes and work-item IDs)
  • outer product matrix multiply in generated kernel
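To make the three additions concrete, here is a rough sketch in plain Python of what each one computes. It is only an illustration: the function names and signatures below (index_1d, index_2d, the Python gather1_floor, outer_product_matmul) are assumptions made for this example and are not the actual API or the generated kernel code.

```python
import math

def index_1d(n):
    """Index array: element i holds the value i (as a float)."""
    return [float(i) for i in range(n)]

def index_2d(width, height, along_width=True):
    """2D index array: values count along the width or along the height."""
    return [[float(x if along_width else y) for x in range(width)]
            for y in range(height)]

def gather1_floor(data, subscripts):
    """Gather: out[i] = data[floor(subscripts[i])].

    The subscripts are themselves data, not loop indexes.
    """
    return [data[int(math.floor(s))] for s in subscripts]

def outer_product_matmul(a, b):
    """C = A * B accumulated as a sum of outer products, one per k.

    For each k, column k of A times row k of B is added into C.
    """
    m, inner, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for k in range(inner):
        for i in range(m):
            aik = a[i][k]
            for j in range(n):
                c[i][j] += aik * b[k][j]
    return c
```

Combining an index array with a gather gives data-dependent shuffles. For example, reversing an array of length n amounts to gathering with subscripts n - 1 - i, where i comes from index_1d(n).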

The last remaining major feature is auto-tuned GEMV (matrix-vector multiply). That shouldn’t be too hard, as I’ve done this before (and still have the old code) and GEMM is already integrated. I want to get this done tomorrow.
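For readers unfamiliar with the idea, auto-tuning means timing several candidate implementations of the same operation and keeping the fastest one that gives the right answer. The sketch below shows that loop for GEMV in plain Python. The candidates (a row-wise version and a column-blocked version with a few block widths) and all names are assumptions chosen for illustration; the real tuner generates and times GPU kernels, which this does not attempt.

```python
import time

def gemv_rows(a, x):
    """y = A x, one dot product per row."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def gemv_blocked(a, x, block):
    """y = A x, accumulated over column blocks of the given width."""
    y = [0.0] * len(a)
    for start in range(0, len(x), block):
        xs = x[start:start + block]
        for i, row in enumerate(a):
            y[i] += sum(aij * xj for aij, xj in zip(row[start:start + block], xs))
    return y

def timed(fn, a, x):
    """Wall-clock time of a single call."""
    t0 = time.perf_counter()
    fn(a, x)
    return time.perf_counter() - t0

def autotune_gemv(a, x, blocks=(1, 4, 16), repeats=3):
    """Time each candidate and return (best_name, best_function)."""
    candidates = {"rows": gemv_rows}
    for b in blocks:
        # b=b binds the current block width to this candidate
        candidates[f"blocked_{b}"] = lambda a, x, b=b: gemv_blocked(a, x, b)

    reference = gemv_rows(a, x)
    best_name, best_fn, best_time = None, None, float("inf")
    for name, fn in candidates.items():
        # Correctness first: skip any candidate that disagrees with the reference.
        result = fn(a, x)
        if any(abs(r - e) > 1e-9 * max(1.0, abs(e))
               for r, e in zip(result, reference)):
            continue
        elapsed = min(timed(fn, a, x) for _ in range(repeats))
        if elapsed < best_time:
            best_name, best_fn, best_time = name, fn, elapsed
    return best_name, best_fn
```

Which candidate wins depends on the machine and the matrix shape, which is the whole reason to measure instead of guessing.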

As mentioned earlier, there’s no time to work on GPU random number support in the immediate future. Any execution traces that use the RNG API will be scheduled on the CPU interpreter.
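The scheduling rule amounts to a simple check on the operations in a trace. Here is a minimal sketch in Python, assuming a trace is just a list of operation names. The operation names, the backend labels, and the choose_backend function are all hypothetical and only illustrate the fallback; they are not the scheduler's real interface.

```python
# Hypothetical names for RNG operations that have no GPU support yet.
RNG_OPS = {"rng_uniform", "rng_normal"}

def choose_backend(trace_ops, gpu_available=True):
    """Return (backend, reason) for an execution trace.

    The reason is always reported, so a fallback to the interpreter
    is visible to the caller instead of happening silently.
    """
    if not gpu_available:
        return "interpreter", "no GPU device available"
    used = sorted(op for op in set(trace_ops) if op in RNG_OPS)
    if used:
        return "interpreter", "trace uses RNG operations: " + ", ".join(used)
    return "jit", "all operations supported on GPU"
```

A trace such as ["index_f32", "rng_uniform", "matmul"] would be routed to the interpreter, with the reason naming rng_uniform.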

The JIT does work but has many dark corners. That’s why it is important to stop adding new features and start fixing bugs. For the first release, not everything will work – which makes it more important to know what does work.

Correctness is the prime requirement. After that comes stable and consistent behavior. Managed platforms sometimes fall short on this (e.g. unpleasant surprises from a database execution plan optimizer). Failures are OK as long as they are known and not silent.

That’s the mindset I have for this technology. It must be useful. It doesn’t have to be perfect. I want to add value, not uncertainty.
