The vector of devices is a global object, which destruction order is
undefined. In some platforms, the OpenCL library has been unloaded
before this destructor is hit, which causes a segmentation fault after
the program ends. By clearing the global vector of devices on
destruction of the OpenCL and SYCL Stream benchmarks we avoid the
problem.
This allows each model to initialise their arrays with a parallel
approach, which yields the first touch required for good performance
on NUMA architectures.
Using integers for maths gets unstable past 38 interations even
in double precision. Using the original values/10 is safe up to
the default 100 iterations.