There are now four basic types in the following promotion hierarchy (operations on mixed types are promoted to the highest one).
This definitely moves beyond PeakStream, which supported only single and double precision floating point. The Chai API now has “u32” and “i32” functions alongside the legacy PeakStream “f32” and “f64” calls. The OpenCL built-in integer functions have also been added to the API. My intention is first-class support for integer, floating point, and mixed-precision, mixed-type calculation. So I am trying to do everything.
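As a rough illustration of what "promoted to the highest one" means, here is a minimal sketch. It is not Chai's implementation. The enum, the function name, and above all the ordering u32 < i32 < f32 < f64 are assumptions made for the example. The real ordering is whatever the hierarchy above specifies.

```cpp
// Hypothetical type tags, not part of the Chai API.
// ASSUMPTION: enumerator order stands in for the promotion rank
// (u32 lowest, f64 highest). The actual hierarchy may differ.
enum class BasicType { U32 = 0, I32 = 1, F32 = 2, F64 = 3 };

// Result type of a mixed-type binary operation: the higher-ranked operand wins.
constexpr BasicType promote(BasicType a, BasicType b) {
    return static_cast<int>(a) > static_cast<int>(b) ? a : b;
}

// Compile-time checks of the assumed rule.
static_assert(promote(BasicType::U32, BasicType::F64) == BasicType::F64,
              "mixed integer and double promotes to double");
static_assert(promote(BasicType::F32, BasicType::I32) == BasicType::F32,
              "argument order does not matter");
static_assert(promote(BasicType::I32, BasicType::I32) == BasicType::I32,
              "same types stay unchanged");
```

Under these assumptions, `promote(BasicType::I32, BasicType::F32)` yields `BasicType::F32`, so an operation mixing those two operands would be carried out in single precision floating point.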
Implementing this turned out to be a miniature nightmare. There are exactly 7900 more lines of code since the alpha release three weeks ago. That’s much more than anticipated.
I still plan on a monthly release cycle – the alpha2 code should be uploaded in about a week. Given the amount of change, the next week will be spent testing.
I originally assumed data could only be two types: single and double precision floating point. This cross-cutting assumption was scattered through the code. The concepts of vector length, floating point precision, and data type were aliased and confused.
The worst part was the interpreter. With only two types, the Cartesian product of types for a binary operator has four combinations (2 x 2 = 4). With four types, there are now 16 combinations (4 x 4 = 16). It gets worse when an operator accepts three arguments: now there are 4 x 4 x 4 = 64 combinations. It gets worse still for operations like matrix multiply, which have four cases: vector * vector; vector * matrix; matrix * vector; matrix * matrix. So that makes 4 x 64 = 256 combinations.
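The counting above can be checked with a few lines of code. This is only a sketch of the arithmetic. The names `typeCombinations` and `kMatmulShapes` are invented for the example and do not come from the Chai source.

```cpp
#include <cstddef>

// Number of type combinations for an operator taking `arity` arguments,
// each of which may be any of `numTypes` basic types: numTypes^arity.
constexpr std::size_t typeCombinations(std::size_t numTypes, unsigned arity) {
    return arity == 0 ? 1 : numTypes * typeCombinations(numTypes, arity - 1);
}

// Hypothetical constant: the four shape cases of matrix multiply
// (vector * vector, vector * matrix, matrix * vector, matrix * matrix).
constexpr std::size_t kMatmulShapes = 4;

static_assert(typeCombinations(2, 2) == 4,  "two types, binary operator");
static_assert(typeCombinations(4, 2) == 16, "four types, binary operator");
static_assert(typeCombinations(4, 3) == 64, "four types, three arguments");
static_assert(kMatmulShapes * typeCombinations(4, 3) == 256,
              "four shape cases times 64 type combinations");
```

The growth is exponential in the number of arguments, which is why each added type costs the interpreter so much more than the previous one did.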
The JIT turned out to be much less work. That was a surprise.
One issue that has become obvious after doing this is code bloat. The API has grown large enough that a single monolithic header file and library is impractical. It needs to be factored into modular components that applications can selectively use. Even if the code bloat is removed, the Chai platform language is big enough that it should be partitioned.
During the long years when C++ was a draft standard, I read of a proposal to have a single massive header file for the entire language. Applications would have a single #include and get the STL and everything else. I don’t know if this could work – but I think everyone has the same gut feeling I do – it seems very wrong.
It’s already March.
My plan for this year is a beta release sometime this summer and a production release by the end of the year. That is not much time at all. The beta should include every major feature in the production 1.0 release. From the summer beta release to the production release, the focus should be on bug fixing, stability and quality.
So from now until the beta, I will be throwing new features in rapidly.
These features include:
- auto-tuned filter kernels with pre-fetching into local memory (original motivation behind adding integer type support – so the JIT could distinguish constant subscripts in gather operations)
- (pseudo) random number generation
- modularized platform