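PRK has an nstream kernel, which is Triad with a += update:
a[i] += b[i] + scalar * c[i]. This means there are 3 reads and 1 write
per element, a higher read/write ratio than Triad's 2 reads and 1 write.
In addition, non-temporal stores for the write will not be beneficial on
CPUs, because the destination array is also read and so its cache lines
are already loaded. Compilers should therefore take care to emit
non-temporal stores for the other kernels, but not for this one.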
This allows each model to initialise its arrays in parallel, which
provides the first-touch page placement required for good performance
on NUMA architectures.
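
As a rough illustration of the points above, here is a minimal C++/OpenMP
sketch. It is not taken from PRK or from any particular model's source: the
function names (init_arrays, triad, nstream, alloc_untouched) and the choice
of OpenMP with a static schedule are assumptions made for the example. It
shows a parallel initialisation that performs the first touch, next to the
Triad and nstream loops so that the difference in reads per element is
visible.

```cpp
#include <cstddef>
#include <memory>

// Illustrative sketch only; all names here are hypothetical.
// Build with OpenMP enabled (e.g. -fopenmp) for the pragmas to take effect.
// Without it the loops simply run serially and give the same results.

// Allocate without value-initialisation, so that no element is written
// (and hence no page is touched) before init_arrays runs.
std::unique_ptr<double[]> alloc_untouched(std::ptrdiff_t n)
{
  return std::unique_ptr<double[]>(new double[n]);
}

// Parallel initialisation: each thread writes the elements it will later
// process, so under a first-touch policy the pages land on that thread's
// NUMA node. The static schedule matches the one used by the kernels.
void init_arrays(double* a, double* b, double* c, std::ptrdiff_t n,
                 double a0, double b0, double c0)
{
  #pragma omp parallel for schedule(static)
  for (std::ptrdiff_t i = 0; i < n; ++i) {
    a[i] = a0;
    b[i] = b0;
    c[i] = c0;
  }
}

// Triad: 2 reads (b, c) and 1 write (a) per element.
void triad(double* a, const double* b, const double* c,
           std::ptrdiff_t n, double scalar)
{
  #pragma omp parallel for schedule(static)
  for (std::ptrdiff_t i = 0; i < n; ++i) {
    a[i] = b[i] + scalar * c[i];
  }
}

// nstream: 3 reads (a, b, c) and 1 write (a) per element.
// Because a[i] is read before it is written, its cache line is already
// loaded when the store happens.
void nstream(double* a, const double* b, const double* c,
             std::ptrdiff_t n, double scalar)
{
  #pragma omp parallel for schedule(static)
  for (std::ptrdiff_t i = 0; i < n; ++i) {
    a[i] += b[i] + scalar * c[i];
  }
}
```

A caller would obtain the three arrays from alloc_untouched, pass the raw
pointers to init_arrays, and then run the kernels. A container that
value-initialises its elements on construction (std::vector<double>(n), for
example) would write every element serially before the parallel loop runs,
which defeats the purpose of the parallel initialisation.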