Commit Graph

19 Commits

Author SHA1 Message Date
Tom Deakin
05e3e5a127 Add CUDA nstream kernel 2021-02-02 12:32:33 +00:00
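The commit above does not spell out what the nstream kernel computes. The sketch below is a plain C++ host reference. It assumes the usual nstream update, a[i] += b[i] + scalar * c[i], which is not confirmed by this log. A reference like this can be used to check the results of a device kernel. The function name `nstream` and its signature are illustrative.

```cpp
#include <cstddef>
#include <vector>

// Host-side reference for the nstream update. Assumes the kernel computes
// a[i] += b[i] + scalar * c[i] for every element (an assumption, not taken
// from the commit itself).
void nstream(std::vector<double>& a, const std::vector<double>& b,
             const std::vector<double>& c, double scalar) {
  for (std::size_t i = 0; i < a.size(); ++i) {
    a[i] += b[i] + scalar * c[i];
  }
}
```

In a one-dimensional CUDA version, the loop body would typically run once per thread, with `i = blockDim.x * blockIdx.x + threadIdx.x`.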
Tom Deakin
693a7e7478 use signed array size for CUDA 2021-01-12 10:20:44 +00:00
Tom Deakin
3bd65a0716 Merge branch 'master' into cuda-memory 2017-05-11 11:28:33 +01:00
James Price
94e0900377 Use static shared memory in dot for CUDA and HIP 2017-02-28 13:24:45 +00:00
Tom Deakin
8d66a27131 [CUDA] If using managed memory, use device pointer for host reduction 2016-12-19 05:08:19 -07:00
Tom Deakin
62860284b2 [CUDA] Add Managed memory and Page fault options 2016-12-19 05:00:15 -07:00
    To use managed memory, compile the code defining MANAGED
    To use CUDA 8 page-fault memory, compile the code defining PAGEFAULT
Tom Deakin
b9c514fd9b [CUDA] Free the sum device buffer 2016-12-19 11:42:45 +00:00
Tom Deakin
d42bcd4675 Merge remote-tracking branch 'origin/init-arrays' into devel 2016-11-04 09:17:54 +00:00
James Price
7f4761ae52 Replace write_arrays with init_arrays 2016-11-02 11:22:01 +00:00
    This allows each model to initialise their arrays with a parallel
    approach, which yields the first touch required for good performance
    on NUMA architectures.
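The first-touch argument in the commit above can be illustrated with a CPU-side sketch. It assumes an OpenMP-style model: memory is allocated without being written, and the same static parallel schedule that the kernels use later initialises it. The helper `alloc_untouched` and the exact `init_arrays` signature are hypothetical. The initial values 0.1, 0.2 and 0.3 come from commit 31cb567e21.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical helper: new double[n] default-initialises (i.e. does not
// write) the elements, so for large arrays the pages are typically not yet
// placed on any NUMA node.
std::unique_ptr<double[]> alloc_untouched(std::size_t n) {
  return std::unique_ptr<double[]>(new double[n]);
}

// Assumed signature. With OpenMP enabled (e.g. -fopenmp), each thread writes,
// and therefore first-touches, the chunk it will later process, provided the
// kernels use the same static schedule. Without OpenMP the pragma is ignored
// and the loop runs serially.
void init_arrays(double* a, double* b, double* c, std::size_t n,
                 double init_a, double init_b, double init_c) {
#pragma omp parallel for schedule(static)
  for (long long i = 0; i < static_cast<long long>(n); ++i) {
    a[i] = init_a;
    b[i] = init_b;
    c[i] = init_c;
  }
}
```

By contrast, a serial `write_arrays`-style loop on the main thread would place every page on that thread's NUMA node.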
James Price
dfc79eeb4d Improve performance of CUDA dot implementation 2016-10-24 21:42:39 +01:00
Tom Deakin
f32cf3bad3 Merge branch 'master' into kernel-dot 2016-10-24 13:53:58 +01:00
    Conflicts:
        main.cpp
Tom Deakin
5b1e67f666 [CUDA] Use new value of scalar 2016-10-24 13:19:54 +01:00
James Price
8a8f44b4ce Fix CUDA host code for dot kernel 2016-10-24 12:47:25 +01:00
    Wrong number of blocks was being copied and summed.
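The fix above concerns the second stage of a two-stage dot product. Each thread block writes one partial sum, and the host must copy back and add exactly one value per block. The C++ sketch below simulates both stages on the CPU. `kNumBlocks`, `kBlockSize` and the function names are hypothetical values chosen for illustration, not taken from the repository.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical launch configuration for illustration only.
constexpr std::size_t kNumBlocks = 4;
constexpr std::size_t kBlockSize = 8;

// Stand-in for the device kernel: each "block" accumulates a grid-stride
// slice of a[i] * b[i] and writes one partial sum to sums[block].
void dot_partials(const std::vector<double>& a, const std::vector<double>& b,
                  std::vector<double>& sums) {
  const std::size_t stride = kNumBlocks * kBlockSize;
  for (std::size_t block = 0; block < kNumBlocks; ++block) {
    double block_sum = 0.0;
    for (std::size_t t = 0; t < kBlockSize; ++t) {
      for (std::size_t i = block * kBlockSize + t; i < a.size(); i += stride)
        block_sum += a[i] * b[i];
    }
    sums[block] = block_sum;
  }
}

// Host-side finish: copy back and sum exactly kNumBlocks partial sums,
// one per block, no more and no fewer.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
  std::vector<double> sums(kNumBlocks, 0.0);
  dot_partials(a, b, sums);
  double total = 0.0;
  for (std::size_t block = 0; block < kNumBlocks; ++block) total += sums[block];
  return total;
}
```

Summing fewer entries than there are blocks drops part of the result. Summing more entries reads values no block wrote.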
Tom Deakin
d3b497a9ca Add a CUDA dot kernel 2016-10-14 17:51:40 +01:00
James Price
f94e36f320 [CUDA] Fix device name output (OpenCL->CUDA) 2016-07-06 17:16:35 +01:00
Tom Deakin
31cb567e21 Switch data from 1.0, 2.0 and 3.0 to 0.1, 0.2, and 0.3 resp. 2016-05-11 15:51:19 +01:00
    Using integers for maths gets unstable past 38 iterations even
    in double precision. Using the original values/10 is safe up to
    the default 100 iterations.
Tom Deakin
2462023ed9 Set thread block size in CUDA with a #define, and check that array size is multiple of it 2016-05-11 12:21:29 +01:00
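The commit above fixes the thread block size with a `#define` and rejects array sizes that are not a multiple of it. The sketch below shows that check in host C++. The macro name `TBSIZE`, its value 1024 and the helper `grid_size_for` are assumptions for illustration.

```cpp
#include <stdexcept>
#include <string>

// Thread block size fixed at compile time (name and value assumed).
#define TBSIZE 1024

// Hypothetical helper: reject array sizes that are not a multiple of TBSIZE,
// so every thread in every block maps to a valid element and the kernels
// need no bounds check. Returns the number of blocks to launch.
int grid_size_for(int array_size) {
  if (array_size % TBSIZE != 0) {
    throw std::invalid_argument("Array size must be a multiple of " +
                                std::to_string(TBSIZE));
  }
  return array_size / TBSIZE;
}
```

A kernel would then be launched as `kernel<<<grid_size_for(n), TBSIZE>>>(...)`. This launch creates exactly n threads.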
Tom Deakin
530b2adda2 Add License text to all files 2016-05-03 12:32:03 +01:00
Tom Deakin
a355acf2ee Move source files to top level directory 2016-05-03 11:43:25 +01:00