Tom Deakin
f5ba77f4bd
List CUDA devices function
2016-04-28 23:20:10 +01:00
Tom Deakin
d1f8cd1b48
Implement some CUDA routines for device info
2016-04-28 23:06:06 +01:00
Tom Deakin
a1cab96c57
Define the implementation strings in each implementation header
2016-04-28 17:20:40 +01:00
Tom Deakin
7006871cbe
Get version from CMake configured header and only build implementations which have the runtime around
2016-04-28 17:10:14 +01:00
Tom Deakin
b9e70e11ab
Add CMakeLists.txt file with CUDA and OCL builds
2016-04-28 16:58:32 +01:00
Tom Deakin
088778977b
Add OCL copy functions
2016-04-28 15:11:02 +01:00
Tom Deakin
b514969193
Create OCL device buffers
2016-04-28 15:08:12 +01:00
Tom Deakin
77f6df856c
Call kernels in OCL
2016-04-28 15:05:01 +01:00
Tom Deakin
eeaf9358ab
Create OCL kernel functors
2016-04-28 15:01:43 +01:00
Tom Deakin
38e1e3b704
Add starts of OpenCL implementation
2016-04-28 12:59:14 +01:00
Tom Deakin
a745ffc724
Add more keywords to CUDA header
2016-04-28 12:07:09 +01:00
Tom Deakin
59fe9738b6
Add a templated run function to make double/float switch easy
2016-04-28 12:03:50 +01:00
Tom Deakin
8d88afdedb
Tidy up timing printing to reduce code duplication
2016-04-28 11:57:09 +01:00
Tom Deakin
377b348748
Move implementation string to the common header file
2016-04-28 11:15:25 +01:00
Tom Deakin
daa7f643b9
Print out timing results
2016-04-27 13:18:06 +01:00
Tom Deakin
3d5a49317e
Free CUDA buffers in destructor
2016-04-27 12:11:19 +01:00
Tom Deakin
c28e70ae70
Add timers and run multiple times
2016-04-27 12:08:49 +01:00
Tom Deakin
40c787d040
Check buffers fit on CUDA device
2016-04-27 11:52:15 +01:00
Tom Deakin
9aa27cd91d
Print out average error on check if there is an error
2016-04-27 11:42:23 +01:00
Tom Deakin
6225ae90a7
Add start of check results function
2016-04-27 11:35:12 +01:00
Tom Deakin
6522d9114a
Add new line at end of file
2016-04-27 11:35:04 +01:00
Tom Deakin
9730cd071e
Overridden functions should have more keywords
2016-04-27 11:34:42 +01:00
pensun
a8ebdc1438
change the warning, stating the rounding error on float does not apply to AMD devices
2016-04-26 14:21:52 -05:00
pensun
9989852401
Remove CLUMP_SIZE options; update warning message regarding rounding errors on float that do not apply to HIP version
2016-04-26 14:10:32 -05:00
Tom Deakin
9c673317a7
Store array size in class so can use it for kernel launches
2016-04-26 16:09:51 +01:00
Tom Deakin
319e11011c
Add triad kernel
2016-04-26 16:07:32 +01:00
Tom Deakin
7a3a546a6e
Add mul CUDA kernel
2016-04-26 16:06:17 +01:00
Tom Deakin
dec0237353
Add mul kernel
2016-04-26 16:03:28 +01:00
Tom Deakin
c22b74ba47
Add read_arrays definition for CUDA
2016-04-26 15:30:37 +01:00
Tom Deakin
8e534daf8b
Add methods to copy data between host and device
2016-04-26 15:02:41 +01:00
Tom Deakin
ae679a5775
Fix indentation in Stream.h
2016-04-26 14:50:58 +01:00
Tom Deakin
ee4820b5e4
Create CUDA device buffers
2016-04-26 14:50:22 +01:00
Tom Deakin
03b01e190f
Add cuda constructor declaration and error checking function
2016-04-26 14:49:04 +01:00
Tom Deakin
6169bdb7b5
Add some global variables
2016-04-26 14:40:49 +01:00
Tom Deakin
0bf68f9909
Make a copy kernel using the private variables
2016-04-26 14:34:25 +01:00
Tom Deakin
1a259d4fc8
Add a copy kernel
2016-04-26 14:24:04 +01:00
Tom Deakin
2234841b16
Initial commit of new design with classes
2016-04-26 14:08:59 +01:00
pensun
066f667e4a
Merge branch 'pull-request-HIP' of https://github.com/sunway513/GPU-STREAM into pull-request-HIP
2016-04-03 06:53:34 -05:00
pensun
e16123222d
Add results of HIP on Nvidia Titan X device.
2016-04-03 06:52:31 -05:00
pensun
ef48e0448a
Add result of hip on amd FIJI Nano.
2016-04-03 06:51:51 -05:00
pensun
d73917ec85
Add cuda results for titan x device.
2016-04-03 06:50:53 -05:00
pensun
207701219a
Add looper optimization for cuda-stream.cu, remove result files
2016-04-03 06:49:56 -05:00
pensun
8e9ab4d20a
Submit results for NV Titan X with CUDA, AMD FIJI NANO and NV Titan X with HIP
2016-03-23 05:29:10 -05:00
pensun
89fec9c8d2
Remove results submission for separate commits
2016-03-23 05:26:34 -05:00
sunway513
11053798ff
Improved GPU-STREAM benchmark for HIP version:
1. Add optional looper kernels to take command line input for the number of groups and groupSize
2. Add GEOMEAN value calculation of the kernels
3. Instructions on configuring the HIP environment in the README.md
4. Add results for HIP on FIJI Nano, TITAN X; CUDA on TITAN X
5. Run script to optionally run HIP version with groups and groupSize options
2016-03-15 07:56:32 -05:00
Tom Deakin
bbee439985
Add citation information to README
2016-03-15 09:17:46 +00:00
sunway513
fdeb20601f
Pull request for HIP version
2016-03-14 11:44:30 -05:00
Tom Deakin
71d5813484
Update to latest OpenCL C++ header from Khronos
2016-02-25 20:50:27 +00:00
Tom Deakin
b575332b4c
Specify CUDA needs to be 6.5 or greater in README
2015-10-20 16:29:21 +01:00
Tom Deakin
70330c7b9b
Display CUDA driver version in output header
This mimics the OpenCL change in issue #4.
2015-09-24 12:03:44 +01:00