Got this on GH200; the actual bandwidth should be ~3 TB/s:
```
BabelStream
Version: 5.0
Implementation: Julia; src/CUDAStream.jl
Running kernels 100 times
Precision: double
Array size: 268.4 MB(=0.3 GB)
Total size: 805.3 MB(=0.8 GB)
Using CUDA device: GH200 120GB (CuDevice(0))
Kernel parameters: <<<32768,1024>>>
Init: 4.85317 s (=165.93425 MBytes/sec)
Read: 1.90727 s (=422.2296 MBytes/sec)
Function MBytes/sec Min (sec) Max Average
Copy 2.979008268e60.00018 0.34157 0.00513
Mul 2.890738861e60.00019 0.16855 0.00352
Add 3.301423655e60.00024 0.15224 0.00315
Triad 3.314891033e60.00024 0.18389 0.00335
Dot 1.837655013e60.00029 1.26299 0.01428
```
The MBytes/sec and Min (sec) columns are printed too close together, so `2.979008268e6` looks like `2.979008268e60`. The bandwidth column also shouldn't use scientific notation.
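For illustration, here is a minimal sketch of one possible fix, assuming Julia's standard `Printf` library. `format_row` and `format_header` are hypothetical helper names, not functions from `src/CUDAStream.jl`. The `%f` conversion prints fixed-point instead of scientific notation, and the literal space between fields keeps adjacent columns apart even if a value outgrows its width.

```julia
using Printf

# Hypothetical helper (not BabelStream's actual code): render one results row.
# %f forces fixed-point notation (no "e6" exponent), and the literal spaces
# between fields keep columns from merging even if a value overflows its width.
format_row(name, mbytes_per_sec, tmin, tmax, tavg) =
    @sprintf("%-10s %15.3f %10.5f %10.5f %10.5f", name, mbytes_per_sec, tmin, tmax, tavg)

# Matching header, using the same field widths as format_row.
format_header() =
    @sprintf("%-10s %15s %10s %10s %10s", "Function", "MBytes/sec", "Min (sec)", "Max", "Average")

println(format_header())
println(format_row("Copy", 2.979008268e6, 0.00018, 0.34157, 0.00513))
```

With the Copy numbers from the run above, `2979008.268` and `0.00018` come out as separate, space-delimited fields, and the header lines up with the rows because both use the same widths.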