there are so many cores


PeakStream’s Monte Carlo Finance demo working

Chai has been designed to run PeakStream demo code. In fact, language grammar and platform semantics were deduced from the example code in marketing presentations and whitepapers. That’s all I had to go on.

  • Conjugate gradient
  • Monte Carlo PI
  • Kirchhoff migration
  • Monte Carlo Finance (with slight modification)

The PeakStream language is cunningly designed. There’s usually a good reason the API is the way it is. So I have tried to follow it very closely.

I just got the options pricing demo working. However, to do this, I had to change the code slightly.

float MonteCarloAntithetic(float price,
                           float strike,
                           float vol,
                           float rate,
                           float div,
                           float T)
{
    float deltat        = T / N;
    float muDeltat      = (rate - div - 0.5 * vol * vol) * deltat;
    float volSqrtDeltat = vol * sqrt(deltat);
    float meanCPU       = 0.0f;
    Arrayf32 meanSP; // result
    {                // a new scope to hold temporary arrays
        RNGf32 rng_hndl(RNG_DEFAULT);
        Arrayf32 U = zeros_f32(M);
        for (int i = 0; i < N; i++) {
            U += rng_normal_make(rng_hndl, M);
        }
        Arrayf32 values;
        {
            Arrayf32 lnS1 = log(price) + N * muDeltat + volSqrtDeltat * U;
            Arrayf32 lnS2 = log(price) + N * muDeltat + volSqrtDeltat * (-U);
            Arrayf32 S1 = exp(lnS1);
            Arrayf32 S2 = exp(lnS2);
            values = (0.5 * (max(0, S1 - strike) + max(0, S2 - strike))
                          * exp(-rate * T));
        }
        meanSP = mean(values);
    }                // all temporaries released as we exit scope
    meanCPU = meanSP.read_scalar();

    return meanCPU;
}

The curly braces in bold italics are original PeakStream code. They cause problems for Chai. There’s something about the random array U going out of scope before readout that affects memory management.
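To make the lifetime ordering concrete, here is a minimal sketch in plain C++ of what a block scope does to temporaries. TrackedArray, ScopeReport and scopedTemporaries are hypothetical names made up for illustration. They are not PeakStream or Chai types, and the sketch says nothing about how either runtime manages device memory. It only shows that everything declared inside the braces is destroyed at the closing brace, before the result is read.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an array handle. It only counts live instances.
struct TrackedArray {
    static int live;                       // number of handles currently alive
    std::vector<float> data;

    explicit TrackedArray(std::size_t n = 0, float v = 0.0f) : data(n, v) { ++live; }
    TrackedArray(const TrackedArray& other) : data(other.data) { ++live; }
    TrackedArray& operator=(const TrackedArray&) = default;
    ~TrackedArray() { --live; }
};
int TrackedArray::live = 0;

struct ScopeReport {
    int   liveInside;   // handles alive just before the closing brace
    int   liveAfter;    // handles alive just after the closing brace
    float value;        // value read from the surviving result
};

ScopeReport scopedTemporaries() {
    ScopeReport report{};
    TrackedArray result;                   // declared outside, like meanSP
    {                                      // a new scope to hold temporaries
        TrackedArray u(4, 1.5f);           // plays the role of U
        TrackedArray values(4, 2.0f);      // plays the role of values
        result = TrackedArray(1, u.data[0] + values.data[0]);
        report.liveInside = TrackedArray::live;   // result, u and values
    }                                      // u and values are destroyed here
    report.liveAfter = TrackedArray::live;        // only result remains
    report.value = result.data[0];                // readout happens after release
    return report;
}
```

In this eager sketch the readout is safe because the result already holds its data. A runtime that defers evaluation until readout would have to keep the inputs alive some other way, which is one plausible reading of the problem described above.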

Generated code quality is not good enough either. At present, Chai has a lot of special-case code transformations for different kinds of loops. For example, loop rolling and reductions have independent representations. For the demo code above, there’s no way to nest a rolled loop (the sum of normals in array variable U) inside a reduction loop (implied by the mean() operation). As a result, the generated code doesn’t use enough registers and relies on global buffer memory.
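As a rough picture of the nesting in question, here is a CPU sketch in plain C++ with the rolled loop placed inside the reduction loop. It is not Chai output. The function name monteCarloAntitheticFused, the extra N, M and seed parameters, and the use of std::mt19937 with std::normal_distribution as the generator are all assumptions for illustration. With this shape, U is a single scalar per path that can live in a register rather than an array of length M in memory.

```cpp
#include <algorithm>
#include <cmath>
#include <random>

// Hypothetical CPU reference: a reduction over M paths (the mean) with the
// N-step sum of normals (the rolled loop) nested inside it.
float monteCarloAntitheticFused(float price, float strike, float vol,
                                float rate, float div, float T,
                                int N, int M, unsigned seed) {
    const float deltat        = T / N;
    const float muDeltat      = (rate - div - 0.5f * vol * vol) * deltat;
    const float volSqrtDeltat = vol * std::sqrt(deltat);
    const float drift         = std::log(price) + N * muDeltat;
    const float discount      = std::exp(-rate * T);

    std::mt19937 gen(seed);                              // stand-in generator
    std::normal_distribution<float> normal(0.0f, 1.0f);

    double sum = 0.0;                                    // accumulator for mean()
    for (int path = 0; path < M; ++path) {               // reduction loop
        float u = 0.0f;                                  // scalar version of U
        for (int i = 0; i < N; ++i) {                    // rolled loop
            u += normal(gen);
        }
        const float s1 = std::exp(drift + volSqrtDeltat * u);
        const float s2 = std::exp(drift - volSqrtDeltat * u);   // antithetic path
        sum += 0.5f * (std::max(0.0f, s1 - strike) + std::max(0.0f, s2 - strike))
                    * discount;
    }
    return static_cast<float>(sum / M);
}
```

Calling it with price 100, strike 100, vol 0.2, rate 0.05, div 0 and T 1 should land close to the Black-Scholes call value of about 10.45 once M is large. The draws are consumed in a different order than in the array version (N per path rather than M per step), so individual results will differ even though the distribution is the same.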

Another issue is the need for more PRNG types. At present, there are only two: the Random123 Philox and Threefry generators. I have found that Threefry does not work with the Box-Muller transform. Although Threefry output does appear uniformly distributed, the components of each random vector are not independent enough, and the output from Box-Muller is clearly wrong.
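For context, the basic Box-Muller transform takes two uniforms and returns two normals, and it is only correct when those two uniforms are independent. The sketch below is plain C++ and is not the Chai implementation. The names boxMuller, Moments and sampleMoments are hypothetical, and std::mt19937 stands in for the generator (it is neither Philox nor Threefry). A check of the sample mean and variance like this one is a crude way to spot clearly wrong output.

```cpp
#include <cmath>
#include <random>
#include <utility>

// Basic Box-Muller: u1 in (0, 1] and u2 in [0, 1) must be independent uniforms.
std::pair<double, double> boxMuller(double u1, double u2) {
    const double twoPi = 6.283185307179586;
    const double r = std::sqrt(-2.0 * std::log(u1));
    return std::make_pair(r * std::cos(twoPi * u2), r * std::sin(twoPi * u2));
}

struct Moments {
    double mean;
    double variance;
};

// Draws `pairs` pairs of uniforms from a stand-in generator and returns the
// sample mean and variance of the resulting 2 * pairs normals.
Moments sampleMoments(int pairs, unsigned seed) {
    std::mt19937 gen(seed);                      // 32-bit outputs
    const double scale = 1.0 / 4294967296.0;     // 2^-32
    double sum = 0.0;
    double sumSq = 0.0;
    for (int i = 0; i < pairs; ++i) {
        // Map each 32-bit integer to the open interval (0, 1) so log() is finite.
        const double u1 = (static_cast<double>(gen()) + 0.5) * scale;
        const double u2 = (static_cast<double>(gen()) + 0.5) * scale;
        const std::pair<double, double> z = boxMuller(u1, u2);
        sum   += z.first + z.second;
        sumSq += z.first * z.first + z.second * z.second;
    }
    const double n = 2.0 * pairs;
    const double mean = sum / n;
    return Moments{mean, sumSq / n - mean * mean};
}
```

For a healthy generator the mean should be near 0 and the variance near 1. Matching moments do not prove independence, so a generator can pass this check and still fail a stricter test.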

