there are so many cores


Chai and Lua have similar roles

Interoperability with OpenCL now works. Chai managed code can be mixed freely with third-party OpenCL libraries, even closed-source ones (e.g. AMD’s OpenCL BLAS). Here’s an example.

// regular OpenCL C API code
void directOpenCL(cl_mem argA, float argS, const size_t indexN)
{
    const char *src[1];
    src[0] = "__kernel void add_scalar(__global float* a, float s)"
             "{"
             "  a[get_global_id(0)] += s;"
             "}";

    cl_int status;

    const cl_program prog = clCreateProgramWithSource(
                                GetContext(), // cl_context from Chai VM
                                1,
                                src,
                                NULL,
                                &status );

    cl_device_id device = GetDevice(); // cl_device_id from Chai VM

    clBuildProgram(prog, 1, &device, "", NULL, NULL);

    const cl_kernel krnl = clCreateKernel(prog, "add_scalar", &status);

    clSetKernelArg(krnl, 0, sizeof(cl_mem), &argA);
    clSetKernelArg(krnl, 1, sizeof(float), &argS);

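    // One work-item per array element. A work-group size of 1 is the
    // simplest valid choice, though not the fastest.
    size_t global[1], local[1];
    global[0] = indexN;
    local[0] = 1;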

    cl_event event;

    clEnqueueNDRangeKernel( GetCommandQueue(), // cl_command_queue from Chai VM
                            krnl,
                            1,
                            NULL,
                            global,
                            local,
                            NumEvents(), // event list size from Chai VM
                            GetEventList(), // cl_event* from Chai VM
                            &event );

    clWaitForEvents(1, &event);

    clReleaseEvent(event);
    clReleaseKernel(krnl);
    clReleaseProgram(prog);
}

// Chai managed code
Arrayf32 C;
{
    Arrayf32 A = make1(N, cpuA);
    Arrayf32 B = make1(N, cpuB);
    C = A + B;

    ForceSchedule(); // schedule to compute device without read back
    directOpenCL(C, 1.5f, N);

    C += A - 3.3f;
}
C.read1(cpuC, N * sizeof(float));

directOpenCL(C, 1.5f, N);

C -= 5;
C.read1(cpuC, N * sizeof(float));
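
As a sanity check on the example above, here is a minimal CPU-only sketch of what the two read-backs into cpuC should contain. It assumes that Chai's array operators are elementwise and that the add_scalar kernel touches all N elements. The names addScalarRef, ReadBacks and referenceResult are hypothetical helpers for illustration, not part of Chai or OpenCL.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU stand-in for the add_scalar kernel: adds s to every element.
// (Hypothetical helper, not part of Chai.)
void addScalarRef(std::vector<float>& c, float s)
{
    for (float& x : c)
        x += s;
}

// Holds the contents of cpuC after each of the two C.read1 calls.
struct ReadBacks
{
    std::vector<float> first;
    std::vector<float> second;
};

// Replays the managed code step by step on the CPU,
// assuming elementwise semantics for the array operators.
ReadBacks referenceResult(const std::vector<float>& a,
                          const std::vector<float>& b)
{
    assert(a.size() == b.size());
    const std::size_t n = a.size();

    std::vector<float> c(n);
    for (std::size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];              // C = A + B

    addScalarRef(c, 1.5f);               // directOpenCL(C, 1.5f, N)

    for (std::size_t i = 0; i < n; ++i)
        c[i] += a[i] - 3.3f;             // C += A - 3.3f

    ReadBacks out;
    out.first = c;                       // first C.read1

    addScalarRef(c, 1.5f);               // directOpenCL(C, 1.5f, N)

    for (std::size_t i = 0; i < n; ++i)
        c[i] -= 5.0f;                    // C -= 5

    out.second = c;                      // second C.read1
    return out;
}
```

For A = {1, 2} and B = {10, 20}, the first read-back comes out at approximately {10.2, 22.2} and the second at approximately {6.7, 18.7}. Comparing against a reference like this with a small tolerance is one way to confirm that the hand-written kernel and the managed code are operating on the same buffer.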

The next, and possibly last, major feature is OpenGL integration (i.e. buffer sharing for primitive restart). After that, I will spend three to four months on documentation and quality, which includes optimizing the code generated by the JIT.

It’s become clear to me that Chai is really like Lua: a lightweight, embeddable managed language. Just as Lua is used for dynamic scripting, Chai could be used for game physics and signal processing.

Five years ago, I made this autonomous robot. It used an ARM9 single board computer running Lua as the control loop. The computer vision and motor control were done in C. However, just as in a computer game, being able to easily change high-level system configuration without recompiling (in this case, cross-compiling) made life much easier.

Here’s another robot from five years ago. It was the winner of the 2007 Robothon in Seattle. It relied on stereo cameras for terminal guidance to the orange cones that marked waypoints. The onboard computer ran interpreted MATLAB.

Today, that onboard computer would be an embedded SoC design capable of GPGPU. Signal processing and control require crunching numbers on the GPU. Lua makes sense as a dynamic scripting language for the CPU. Chai makes sense as an array programming language for the GPU.
