Hello everyone! My uni has given me access to a really fancy cluster of 8x H100 GPUs to test and benchmark for a few days. Please share any ideas or suggestions for what I should test, or what you'd like to see tested! (IDK why Fastfetch shows them as GA103 GPUs lol)
Here is a run of the NVIDIA HPC Benchmarks tool.
nvidia@TRY-60384-gpu01:~/Downloads/benchmark4$ docker run --gpus all --shm-size=1g nvcr.io/nvidia/hpc-benchmarks:25.02 \
mpirun --bind-to none -np 8 \
./hpl.sh --dat /workspace/hpl-linux-x86_64/sample-dat/HPL-8GPUs.dat
=========================================================
================= NVIDIA HPC Benchmarks =================
=========================================================
NVIDIA Release 25.02
Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
WARNING: No InfiniBand devices detected.
Multi-node communication performance may be reduced.
Ensure /dev/infiniband is mounted to this container.
================================================================================
HPL-NVIDIA 25.2.0 -- NVIDIA accelerated HPL benchmark -- NVIDIA
================================================================================
HPLinpack 2.1 -- High-Performance Linpack benchmark -- October 26, 2012
Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 264192
NB : 1024
PMAP : Column-major process mapping
P : 4
Q : 2
PFACT : Left
NBMIN : 2
NDIV : 2
RFACT : Left
BCAST : 2ringM
DEPTH : 1
SWAP : Spread-roll (long)
L1 : no-transposed form
U : transposed form
EQUIL : no
ALIGN : 8 double precision words
--------------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
HPL-NVIDIA ignores the following parameters from input file:
* Broadcast parameters
* Panel factorization parameters
* Look-ahead value
* L1 layout
* U layout
* Equilibration parameter
* Memory alignment parameter
HPL-NVIDIA settings from environment variables:
--- DEVICE INFO ---
Peak clock frequency: 1785 MHz
SM version : 90
Number of SMs : 132
-------------------
[HPL TRACE] cuda_nvshmem_init: max=5.9740 (3) min=5.9740 (6)
[HPL TRACE] ncclCommInitRank: max=12.3365 (0) min=12.3293 (5)
[HPL TRACE] cugetrfs_mp_init: max=12.4442 (5) min=12.4441 (1)
--- MEMORY INFO ---
DEVICE
System = 2.82895 GiB (MIN) 3.06723 GiB (MAX) 2.90610 GiB (AVG)
HPL buffers = 69.67312 GiB (MIN) 70.68094 GiB (MAX) 70.17703 GiB (AVG)
Used = 72.50207 GiB (MIN) 73.67004 GiB (MAX) 73.08313 GiB (AVG)
Total = 93.58398 GiB (MIN) 93.58398 GiB (MAX) 93.58398 GiB (AVG)
HOST
HPL buffers = 0.00057 GiB (MIN) 0.00057 GiB (MAX) 0.00057 GiB (AVG)
-------------------
... Testing HPL components ...
**** Factorization, m = 66560, policy = 0 ****
avg time = 25.97 ms, avg = 2687.80. min = 2676.74 [rank 4, host 7ea74c28468b, gpuID 0000:AD:00.0], max = 2698.94 GFLOPS
**** Factorization, m = 66560, policy = 1 ****
avg time = 35.00 ms, avg = 1993.91. min = 1930.29 [rank 0, host 7ea74c28468b, gpuID 0000:2D:00.0], max = 2061.84 GFLOPS
**** Factorization, m = 32768, policy = 0 ****
avg time = 21.19 ms, avg = 1621.14. min = 1616.19 [rank 4, host 7ea74c28468b, gpuID 0000:AD:00.0], max = 1626.09 GFLOPS
**** Factorization, m = 32768, policy = 1 ****
avg time = 31.00 ms, avg = 1108.42. min = 1098.93 [rank 0, host 7ea74c28468b, gpuID 0000:2D:00.0], max = 1118.07 GFLOPS
**** Factorization, m = 16384, policy = 0 ****
avg time = 18.88 ms, avg = 909.85. min = 905.14 [rank 4, host 7ea74c28468b, gpuID 0000:AD:00.0], max = 914.58 GFLOPS
**** Factorization, m = 16384, policy = 1 ****
avg time = 28.72 ms, avg = 598.21. min = 597.28 [rank 0, host 7ea74c28468b, gpuID 0000:2D:00.0], max = 599.12 GFLOPS
**** Factorization, m = 1024, policy = 0 ****
avg time = 16.49 ms, avg = 65.12. min = 64.68 [rank 4, host 7ea74c28468b, gpuID 0000:AD:00.0], max = 65.57 GFLOPS
**** Factorization, m = 1024, policy = 1 ****
avg time = 26.44 ms, avg = 40.61. min = 40.53 [rank 4, host 7ea74c28468b, gpuID 0000:AD:00.0], max = 40.68 GFLOPS
**** ncclBcast( Row ) ****
avg time = 65.46 ms, avg = 8.33. min = 7.70 [rank 5, host 7ea74c28468b, gpuID 0000:AE:00.0], max = 10.73 GBS
**** ncclAllGather( Col ) ****
avg time = 4.01 ms, avg = 33.50. min = 24.63 [rank 6, host 7ea74c28468b, gpuID 0000:BD:00.0], max = 52.38 GBS
**** Latency ncclAllGather, m = 1 ****
avg time = 11.56 ms, avg = 0.00. min = 0.00 [rank 4, host 7ea74c28468b, gpuID 0000:AD:00.0], max = 0.00 GBS
**** Latency ncclAllGather, m = 2 ****
avg time = 11.48 ms, avg = 0.01. min = 0.01 [rank 4, host 7ea74c28468b, gpuID 0000:AD:00.0], max = 0.01 GBS
**** Latency ncclAllGather, m = 32 ****
avg time = 11.70 ms, avg = 0.09. min = 0.09 [rank 4, host 7ea74c28468b, gpuID 0000:AD:00.0], max = 0.09 GBS
**** Latency ncclAllGather, m = 1024 ****
avg time = 14.00 ms, avg = 2.40. min = 2.37 [rank 1, host 7ea74c28468b, gpuID 0000:3A:00.0], max = 2.42 GBS
**** Latency ncclAllGather, m = 2048 ****
avg time = 14.58 ms, avg = 4.60. min = 4.59 [rank 0, host 7ea74c28468b, gpuID 0000:2D:00.0], max = 4.61 GBS
**** Latency Host MPI_Allgather, m = 1 ****
avg time = 2.41 ms, avg = 0.01. min = 0.01 [rank 2, host 7ea74c28468b, gpuID 0000:3B:00.0], max = 0.01 GBS
**** Latency Host MPI_Allgather, m = 2 ****
avg time = 2.54 ms, avg = 0.03. min = 0.02 [rank 1, host 7ea74c28468b, gpuID 0000:3A:00.0], max = 0.03 GBS
**** Latency Host MPI_Allgather, m = 32 ****
avg time = 4.41 ms, avg = 0.24. min = 0.23 [rank 4, host 7ea74c28468b, gpuID 0000:AD:00.0], max = 0.25 GBS
**** Latency Host MPI_Allgather, m = 1024 ****
avg time = 19.67 ms, avg = 1.71. min = 1.69 [rank 2, host 7ea74c28468b, gpuID 0000:3B:00.0], max = 1.72 GBS
**** Latency Host MPI_Allgather, m = 2048 ****
avg time = 27.27 ms, avg = 2.46. min = 2.38 [rank 4, host 7ea74c28468b, gpuID 0000:AD:00.0], max = 2.54 GBS
**** Latency ncclBcast, m = 1 ****
avg time = 7.66 ms, avg = 0.00. min = 0.00 [rank 2, host 7ea74c28468b, gpuID 0000:3B:00.0], max = 0.00 GBS
**** Latency ncclBcast, m = 32 ****
avg time = 7.73 ms, avg = 0.03. min = 0.03 [rank 3, host 7ea74c28468b, gpuID 0000:3C:00.0], max = 0.03 GBS
**** Latency ncclBcast, m = 1024 ****
avg time = 11.93 ms, avg = 0.70. min = 0.70 [rank 3, host 7ea74c28468b, gpuID 0000:3C:00.0], max = 0.71 GBS
**** Latency Host MPI_Bcast, m = 1 ****
avg time = 1.56 ms, avg = 0.01. min = 0.01 [rank 4, host 7ea74c28468b, gpuID 0000:AD:00.0], max = 0.01 GBS
**** Latency Host MPI_Bcast, m = 32 ****
avg time = 1.59 ms, avg = 0.16. min = 0.15 [rank 4, host 7ea74c28468b, gpuID 0000:AD:00.0], max = 0.18 GBS
**** Latency Host MPI_Bcast, m = 1024 ****
avg time = 4.46 ms, avg = 1.88. min = 1.79 [rank 5, host 7ea74c28468b, gpuID 0000:AE:00.0], max = 1.98 GBS
**** GEMM ****
avg time = 56.65 ms, avg = 39426.61. min = 38903.21 [rank 0, host 7ea74c28468b, gpuID 0000:2D:00.0], max = 39889.91 GFLOPS
... End of Testing HPL components ...
[HPL TRACE] HPL_pdmatgen_gpu: max=0.0216 (4) min=0.0216 (1)
Prog= 2.31% N_left= 262144 Time= 1.08 Time_left= 45.74 iGF= 262561.48 GF= 262561.48 iGF_per= 32820.18 GF_per= 32820.18
Prog= 3.45% N_left= 261120 Time= 1.57 Time_left= 43.93 iGF= 287004.13 GF= 270171.41 iGF_per= 35875.52 GF_per= 33771.43
Prog= 4.58% N_left= 260096 Time= 2.08 Time_left= 43.26 iGF= 274231.60 GF= 271163.35 iGF_per= 34278.95 GF_per= 33895.42
Prog= 6.82% N_left= 258048 Time= 3.04 Time_left= 41.59 iGF= 284659.59 GF= 275448.33 iGF_per= 35582.45 GF_per= 34431.04
Prog= 7.92% N_left= 257024 Time= 3.52 Time_left= 40.87 iGF= 286556.89 GF= 276946.01 iGF_per= 35819.61 GF_per= 34618.25
Prog= 9.02% N_left= 256000 Time= 4.01 Time_left= 40.42 iGF= 274994.27 GF= 276707.26 iGF_per= 34374.28 GF_per= 34588.41
Prog= 11.18% N_left= 253952 Time= 4.95 Time_left= 39.28 iGF= 283494.09 GF= 277996.39 iGF_per= 35436.76 GF_per= 34749.55
Prog= 12.25% N_left= 252928 Time= 5.40 Time_left= 38.70 iGF= 286836.08 GF= 278746.59 iGF_per= 35854.51 GF_per= 34843.32
Prog= 13.31% N_left= 251904 Time= 5.88 Time_left= 38.29 iGF= 272997.45 GF= 278279.40 iGF_per= 34124.68 GF_per= 34784.92
Prog= 14.37% N_left= 250880 Time= 6.33 Time_left= 37.72 iGF= 289894.21 GF= 279098.84 iGF_per= 36236.78 GF_per= 34887.36
Prog= 16.45% N_left= 248832 Time= 7.24 Time_left= 36.77 iGF= 281197.09 GF= 279362.47 iGF_per= 35149.64 GF_per= 34920.31
Prog= 17.47% N_left= 247808 Time= 7.70 Time_left= 36.36 iGF= 273014.58 GF= 278981.14 iGF_per= 34126.82 GF_per= 34872.64
Prog= 18.49% N_left= 246784 Time= 8.14 Time_left= 35.86 iGF= 287792.89 GF= 279452.52 iGF_per= 35974.11 GF_per= 34931.56
Prog= 20.51% N_left= 244736 Time= 9.02 Time_left= 34.97 iGF= 279198.81 GF= 279427.60 iGF_per= 34899.85 GF_per= 34928.45
Prog= 21.50% N_left= 243712 Time= 9.47 Time_left= 34.58 iGF= 272617.97 GF= 279105.39 iGF_per= 34077.25 GF_per= 34888.17
Prog= 22.48% N_left= 242688 Time= 9.89 Time_left= 34.10 iGF= 287530.95 GF= 279464.26 iGF_per= 35941.37 GF_per= 34933.03
Prog= 23.46% N_left= 241664 Time= 10.33 Time_left= 33.69 iGF= 275791.13 GF= 279309.34 iGF_per= 34473.89 GF_per= 34913.67
Prog= 25.39% N_left= 239616 Time= 11.18 Time_left= 32.85 iGF= 278216.21 GF= 279225.97 iGF_per= 34777.03 GF_per= 34903.25
Prog= 26.34% N_left= 238592 Time= 11.59 Time_left= 32.39 iGF= 287384.33 GF= 279512.85 iGF_per= 35923.04 GF_per= 34939.11
Prog= 27.29% N_left= 237568 Time= 12.01 Time_left= 32.00 iGF= 274681.43 GF= 279342.82 iGF_per= 34335.18 GF_per= 34917.85
Prog= 29.15% N_left= 235520 Time= 12.84 Time_left= 31.19 iGF= 277199.42 GF= 279204.76 iGF_per= 34649.93 GF_per= 34900.60
Prog= 30.07% N_left= 234496 Time= 13.23 Time_left= 30.76 iGF= 286869.53 GF= 279433.19 iGF_per= 35858.69 GF_per= 34929.15
Prog= 30.98% N_left= 233472 Time= 13.64 Time_left= 30.38 iGF= 274423.71 GF= 279283.12 iGF_per= 34302.96 GF_per= 34910.39
Prog= 31.89% N_left= 232448 Time= 14.03 Time_left= 29.96 iGF= 284848.21 GF= 279437.90 iGF_per= 35606.03 GF_per= 34929.74
Prog= 33.67% N_left= 230400 Time= 14.82 Time_left= 29.19 iGF= 277893.21 GF= 279355.61 iGF_per= 34736.65 GF_per= 34919.45
Prog= 34.55% N_left= 229376 Time= 15.21 Time_left= 28.82 iGF= 273602.64 GF= 279206.02 iGF_per= 34200.33 GF_per= 34900.75
Prog= 35.43% N_left= 228352 Time= 15.59 Time_left= 28.42 iGF= 283774.68 GF= 279316.79 iGF_per= 35471.84 GF_per= 34914.60
Prog= 37.15% N_left= 226304 Time= 16.35 Time_left= 27.66 iGF= 279011.09 GF= 279302.61 iGF_per= 34876.39 GF_per= 34912.83
Prog= 38.00% N_left= 225280 Time= 16.74 Time_left= 27.31 iGF= 271227.95 GF= 279116.87 iGF_per= 33903.49 GF_per= 34889.61
Prog= 38.84% N_left= 224256 Time= 17.10 Time_left= 26.93 iGF= 283318.20 GF= 279206.59 iGF_per= 35414.77 GF_per= 34900.82
Prog= 39.67% N_left= 223232 Time= 17.48 Time_left= 26.58 iGF= 269655.26 GF= 278998.85 iGF_per= 33706.91 GF_per= 34874.86
Prog= 41.32% N_left= 221184 Time= 18.20 Time_left= 25.84 iGF= 282532.82 GF= 279137.87 iGF_per= 35316.60 GF_per= 34892.23
Prog= 42.13% N_left= 220160 Time= 18.55 Time_left= 25.47 iGF= 285784.49 GF= 279262.94 iGF_per= 35723.06 GF_per= 34907.87
Prog= 42.93% N_left= 219136 Time= 18.91 Time_left= 25.13 iGF= 271904.37 GF= 279121.52 iGF_per= 33988.05 GF_per= 34890.19
Prog= 44.52% N_left= 217088 Time= 19.60 Time_left= 24.43 iGF= 282086.12 GF= 279226.01 iGF_per= 35260.77 GF_per= 34903.25
Prog= 45.30% N_left= 216064 Time= 19.94 Time_left= 24.07 iGF= 285715.98 GF= 279335.46 iGF_per= 35714.50 GF_per= 34916.93
Prog= 46.07% N_left= 215040 Time= 20.29 Time_left= 23.75 iGF= 270660.64 GF= 279185.13 iGF_per= 33832.58 GF_per= 34898.14
Prog= 47.60% N_left= 212992 Time= 20.95 Time_left= 23.07 iGF= 281768.43 GF= 279267.22 iGF_per= 35221.05 GF_per= 34908.40
Prog= 48.35% N_left= 211968 Time= 21.28 Time_left= 22.73 iGF= 285270.59 GF= 279358.67 iGF_per= 35658.82 GF_per= 34919.83
Prog= 49.10% N_left= 210944 Time= 21.61 Time_left= 22.41 iGF= 272509.13 GF= 279252.18 iGF_per= 34063.64 GF_per= 34906.52
Prog= 49.83% N_left= 209920 Time= 21.93 Time_left= 22.07 iGF= 289370.89 GF= 279396.80 iGF_per= 36171.36 GF_per= 34924.60
Prog= 51.29% N_left= 207872 Time= 22.57 Time_left= 21.43 iGF= 278385.35 GF= 279368.03 iGF_per= 34798.17 GF_per= 34921.00
Prog= 52.01% N_left= 206848 Time= 22.89 Time_left= 21.13 iGF= 272721.96 GF= 279274.28 iGF_per= 34090.25 GF_per= 34909.29
Prog= 52.71% N_left= 205824 Time= 23.20 Time_left= 20.81 iGF= 285315.45 GF= 279353.87 iGF_per= 35664.43 GF_per= 34919.23
Prog= 54.11% N_left= 203776 Time= 23.81 Time_left= 20.19 iGF= 281031.27 GF= 279396.94 iGF_per= 35128.91 GF_per= 34924.62
Prog= 54.80% N_left= 202752 Time= 24.12 Time_left= 19.90 iGF= 269905.39 GF= 279273.58 iGF_per= 33738.17 GF_per= 34909.20
Prog= 55.48% N_left= 201728 Time= 24.41 Time_left= 19.59 iGF= 286958.24 GF= 279365.46 iGF_per= 35869.78 GF_per= 34920.68
Prog= 56.16% N_left= 200704 Time= 24.72 Time_left= 19.30 iGF= 274304.85 GF= 279303.57 iGF_per= 34288.11 GF_per= 34912.95
Prog= 57.48% N_left= 198656 Time= 25.30 Time_left= 18.71 iGF= 278325.07 GF= 279280.88 iGF_per= 34790.63 GF_per= 34910.11
Prog= 58.14% N_left= 197632 Time= 25.58 Time_left= 18.42 iGF= 285865.03 GF= 279353.27 iGF_per= 35733.13 GF_per= 34919.16
Prog= 58.79% N_left= 196608 Time= 25.88 Time_left= 18.14 iGF= 273838.41 GF= 279291.33 iGF_per= 34229.80 GF_per= 34911.42
Prog= 60.06% N_left= 194560 Time= 26.46 Time_left= 17.59 iGF= 269280.71 GF= 279071.17 iGF_per= 33660.09 GF_per= 34883.90
Prog= 60.69% N_left= 193536 Time= 26.74 Time_left= 17.32 iGF= 276458.68 GF= 279043.91 iGF_per= 34557.34 GF_per= 34880.49
Prog= 61.31% N_left= 192512 Time= 27.02 Time_left= 17.05 iGF= 271795.89 GF= 278968.59 iGF_per= 33974.49 GF_per= 34871.07
Prog= 61.92% N_left= 191488 Time= 27.29 Time_left= 16.78 iGF= 274192.22 GF= 278920.40 iGF_per= 34274.03 GF_per= 34865.05
Prog= 63.13% N_left= 189440 Time= 27.84 Time_left= 16.26 iGF= 273677.15 GF= 278818.13 iGF_per= 34209.64 GF_per= 34852.27
Prog= 63.73% N_left= 188416 Time= 28.10 Time_left= 16.00 iGF= 273051.70 GF= 278763.20 iGF_per= 34131.46 GF_per= 34845.40
Prog= 64.31% N_left= 187392 Time= 28.37 Time_left= 15.74 iGF= 273693.91 GF= 278715.98 iGF_per= 34211.74 GF_per= 34839.50
Prog= 65.47% N_left= 185344 Time= 28.89 Time_left= 15.24 iGF= 272497.02 GF= 278603.59 iGF_per= 34062.13 GF_per= 34825.45
Prog= 66.04% N_left= 184320 Time= 29.15 Time_left= 14.99 iGF= 272047.93 GF= 278545.75 iGF_per= 34005.99 GF_per= 34818.22
Prog= 66.60% N_left= 183296 Time= 29.40 Time_left= 14.74 iGF= 269150.24 GF= 278463.60 iGF_per= 33643.78 GF_per= 34807.95
Prog= 67.16% N_left= 182272 Time= 29.66 Time_left= 14.50 iGF= 266710.87 GF= 278361.94 iGF_per= 33338.86 GF_per= 34795.24
Prog= 68.25% N_left= 180224 Time= 30.16 Time_left= 14.03 iGF= 269886.52 GF= 278221.83 iGF_per= 33735.82 GF_per= 34777.73
Prog= 68.79% N_left= 179200 Time= 30.40 Time_left= 13.79 iGF= 270189.84 GF= 278157.16 iGF_per= 33773.73 GF_per= 34769.64
Prog= 69.32% N_left= 178176 Time= 30.65 Time_left= 13.56 iGF= 267155.77 GF= 278069.30 iGF_per= 33394.47 GF_per= 34758.66
Prog= 70.37% N_left= 176128 Time= 31.13 Time_left= 13.11 iGF= 266406.93 GF= 277888.53 iGF_per= 33300.87 GF_per= 34736.07
Prog= 70.88% N_left= 175104 Time= 31.37 Time_left= 12.89 iGF= 262025.62 GF= 277766.64 iGF_per= 32753.20 GF_per= 34720.83
Prog= 71.39% N_left= 174080 Time= 31.61 Time_left= 12.67 iGF= 260964.31 GF= 277639.49 iGF_per= 32620.54 GF_per= 34704.94
Prog= 72.39% N_left= 172032 Time= 32.06 Time_left= 12.23 iGF= 270659.32 GF= 277540.82 iGF_per= 33832.41 GF_per= 34692.60
Prog= 72.88% N_left= 171008 Time= 32.29 Time_left= 12.02 iGF= 265777.52 GF= 277458.24 iGF_per= 33222.19 GF_per= 34682.28
Prog= 73.36% N_left= 169984 Time= 32.52 Time_left= 11.81 iGF= 262130.23 GF= 277351.19 iGF_per= 32766.28 GF_per= 34668.90
Prog= 73.84% N_left= 168960 Time= 32.73 Time_left= 11.59 iGF= 276633.57 GF= 277346.52 iGF_per= 34579.20 GF_per= 34668.32
Prog= 74.78% N_left= 166912 Time= 33.16 Time_left= 11.18 iGF= 270679.62 GF= 277260.71 iGF_per= 33834.95 GF_per= 34657.59
Prog= 75.24% N_left= 165888 Time= 33.37 Time_left= 10.98 iGF= 264894.90 GF= 277181.39 iGF_per= 33111.86 GF_per= 34647.67
Prog= 75.70% N_left= 164864 Time= 33.57 Time_left= 10.78 iGF= 275626.44 GF= 277171.98 iGF_per= 34453.31 GF_per= 34646.50
Prog= 76.59% N_left= 162816 Time= 33.99 Time_left= 10.39 iGF= 265218.12 GF= 277026.17 iGF_per= 33152.26 GF_per= 34628.27
Prog= 77.03% N_left= 161792 Time= 34.20 Time_left= 10.20 iGF= 260268.52 GF= 276924.59 iGF_per= 32533.56 GF_per= 34615.57
Prog= 77.47% N_left= 160768 Time= 34.39 Time_left= 10.00 iGF= 278002.77 GF= 276930.60 iGF_per= 34750.35 GF_per= 34616.33
Prog= 77.89% N_left= 159744 Time= 34.59 Time_left= 9.82 iGF= 257433.52 GF= 276815.45 iGF_per= 32179.19 GF_per= 34601.93
Prog= 78.73% N_left= 157696 Time= 34.99 Time_left= 9.45 iGF= 262627.59 GF= 276656.11 iGF_per= 32828.45 GF_per= 34582.01
Prog= 79.14% N_left= 156672 Time= 35.17 Time_left= 9.27 iGF= 276424.53 GF= 276654.90 iGF_per= 34553.07 GF_per= 34581.86
Prog= 79.55% N_left= 155648 Time= 35.37 Time_left= 9.09 iGF= 251189.70 GF= 276511.74 iGF_per= 31398.71 GF_per= 34563.97
Prog= 80.35% N_left= 153600 Time= 35.75 Time_left= 8.74 iGF= 257652.19 GF= 276311.22 iGF_per= 32206.52 GF_per= 34538.90
Prog= 80.74% N_left= 152576 Time= 35.91 Time_left= 8.57 iGF= 289663.26 GF= 276372.82 iGF_per= 36207.91 GF_per= 34546.60
Prog= 81.12% N_left= 151552 Time= 36.10 Time_left= 8.40 iGF= 247043.21 GF= 276217.10 iGF_per= 30880.40 GF_per= 34527.14
Prog= 81.50% N_left= 150528 Time= 36.27 Time_left= 8.23 iGF= 275750.33 GF= 276214.92 iGF_per= 34468.79 GF_per= 34526.86
Prog= 82.25% N_left= 148480 Time= 36.62 Time_left= 7.90 iGF= 264114.90 GF= 276100.38 iGF_per= 33014.36 GF_per= 34512.55
Prog= 82.61% N_left= 147456 Time= 36.80 Time_left= 7.75 iGF= 244652.27 GF= 275943.77 iGF_per= 30581.53 GF_per= 34492.97
Prog= 82.97% N_left= 146432 Time= 36.95 Time_left= 7.58 iGF= 292980.20 GF= 276013.36 iGF_per= 36622.53 GF_per= 34501.67
Prog= 83.68% N_left= 144384 Time= 37.29 Time_left= 7.27 iGF= 259554.89 GF= 275866.08 iGF_per= 32444.36 GF_per= 34483.26
Prog= 84.02% N_left= 143360 Time= 37.47 Time_left= 7.12 iGF= 239990.34 GF= 275696.93 iGF_per= 29998.79 GF_per= 34462.12
Prog= 84.36% N_left= 142336 Time= 37.61 Time_left= 6.97 iGF= 285063.45 GF= 275733.44 iGF_per= 35632.93 GF_per= 34466.68
Prog= 84.70% N_left= 141312 Time= 37.79 Time_left= 6.83 iGF= 233749.29 GF= 275537.64 iGF_per= 29218.66 GF_per= 34442.21
Prog= 85.35% N_left= 139264 Time= 38.10 Time_left= 6.54 iGF= 259374.10 GF= 275405.78 iGF_per= 32421.76 GF_per= 34425.72
Prog= 85.67% N_left= 138240 Time= 38.23 Time_left= 6.39 iGF= 290666.94 GF= 275459.93 iGF_per= 36333.37 GF_per= 34432.49
Prog= 85.99% N_left= 137216 Time= 38.40 Time_left= 6.26 iGF= 228060.58 GF= 275249.69 iGF_per= 28507.57 GF_per= 34406.21
Prog= 86.61% N_left= 135168 Time= 38.70 Time_left= 5.99 iGF= 253507.83 GF= 275081.34 iGF_per= 31688.48 GF_per= 34385.17
Prog= 86.91% N_left= 134144 Time= 38.83 Time_left= 5.85 iGF= 297759.29 GF= 275154.18 iGF_per= 37219.91 GF_per= 34394.27
Prog= 87.21% N_left= 133120 Time= 39.00 Time_left= 5.72 iGF= 214776.21 GF= 274890.55 iGF_per= 26847.03 GF_per= 34361.32
Prog= 88.63% N_left= 128000 Time= 39.67 Time_left= 5.09 iGF= 259173.41 GF= 274623.71 iGF_per= 32396.68 GF_per= 34327.96
Prog= 90.19% N_left= 121856 Time= 40.48 Time_left= 4.40 iGF= 237461.08 GF= 273882.14 iGF_per= 29682.63 GF_per= 34235.27
Prog= 91.37% N_left= 116736 Time= 41.15 Time_left= 3.89 iGF= 217697.08 GF= 272968.00 iGF_per= 27212.13 GF_per= 34121.00
Prog= 92.46% N_left= 111616 Time= 41.75 Time_left= 3.41 iGF= 220994.53 GF= 272216.00 iGF_per= 27624.32 GF_per= 34027.00
Prog= 93.45% N_left= 106496 Time= 42.38 Time_left= 2.97 iGF= 195981.86 GF= 271097.87 iGF_per= 24497.73 GF_per= 33887.23
Prog= 94.35% N_left= 101376 Time= 42.93 Time_left= 2.57 iGF= 200846.20 GF= 270196.33 iGF_per= 25105.78 GF_per= 33774.54
Prog= 95.16% N_left= 96256 Time= 43.48 Time_left= 2.21 iGF= 179380.96 GF= 269031.94 iGF_per= 22422.62 GF_per= 33628.99
Prog= 95.90% N_left= 91136 Time= 43.98 Time_left= 1.88 iGF= 181854.89 GF= 268051.81 iGF_per= 22731.86 GF_per= 33506.48
Prog= 96.67% N_left= 84992 Time= 44.56 Time_left= 1.53 iGF= 164618.60 GF= 266707.47 iGF_per= 20577.32 GF_per= 33338.43
Prog= 97.24% N_left= 79872 Time= 45.03 Time_left= 1.28 iGF= 148945.53 GF= 265485.25 iGF_per= 18618.19 GF_per= 33185.66
Prog= 99.18% N_left= 53248 Time= 46.95 Time_left= 0.39 iGF= 124000.80 GF= 259676.25 iGF_per= 15500.10 GF_per= 32459.53
Prog= 99.89% N_left= 27648 Time= 48.23 Time_left= 0.06 iGF= 67945.90 GF= 254611.47 iGF_per= 8493.24 GF_per= 31826.43
================================================================================
T/V N NB P Q Time Gflops ( per GPU)
--------------------------------------------------------------------------------
WC0 264192 1024 4 2 49.22 2.498e+05 ( 3.122e+04)
HPL_pdgesv() start time Wed Apr 2 19:14:02 2025
HPL_pdgesv() end time Wed Apr 2 19:14:51 2025
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.000195009244 ...... PASSED
||Ax-b||_oo . . . . . . . . . . . . . . . . . = 0.0000000025425467
||A||_oo . . . . . . . . . . . . . . . . . . . = 66378.1881866107869428
||x||_oo . . . . . . . . . . . . . . . . . . . = 6.6966483875822718
||b||_oo . . . . . . . . . . . . . . . . . . . = 0.9741752868709390
================================================================================
Finished 1 tests with the following results:
1 tests completed and passed residual checks,
0 tests completed and failed residual checks,
0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------
End of Tests.
================================================================================
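As a quick sanity check on the summary line, here is a minimal sketch that recomputes the Gflops figure from N and the wall time. It assumes the conventional HPL operation count of 2/3·N³ + 3/2·N², and the even eight-way split of the FP64 matrix is my own simplification, so treat the memory figure as a rough lower bound on the "HPL buffers" line rather than an exact match.

```python
# Values copied from the HPL summary line above.
N = 264192        # order of the coefficient matrix
TIME_S = 49.22    # wall time in seconds
NUM_GPUS = 8      # P x Q = 4 x 2

# Conventional HPL operation count for the LU factorisation plus solve.
flops = (2.0 / 3.0) * N**3 + (3.0 / 2.0) * N**2

gflops_total = flops / TIME_S / 1e9
gflops_per_gpu = gflops_total / NUM_GPUS

# FP64 matrix footprint per GPU (assumption: split evenly, no workspace).
matrix_gib_per_gpu = N * N * 8 / NUM_GPUS / 2**30

print(f"total:   {gflops_total:.3e} Gflops")
print(f"per GPU: {gflops_per_gpu:.3e} Gflops")
print(f"matrix:  {matrix_gib_per_gpu:.2f} GiB per GPU")
```

The two rates come out at the same 2.498e+05 and 3.122e+04 shown in the log, and the matrix share of about 65 GiB per GPU sits a little under the 69.67 to 70.68 GiB of HPL buffers reported, with the remainder presumably being workspace.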
LLM Benchmarks Using Ollama and OpenWebUI
1B parameter model - FP16 (Llama 3.1 - Instruct)
Uses only one GPU for inference due to its small size.
Ollama optimises for hardware acceleration, so it defaults to FP16 compute.
response_token/s: 100
prompt_token/s: 15517.24
total_duration: 9573743183
load_duration: 9520186580
prompt_eval_count: 450
prompt_eval_duration: 29000000
eval_count: 1
completion_tokens: 1
eval_duration: 100000000
approximate_total: "0h0m9s"
total_tokens: 451
completion_tokens_details: {
reasoning_tokens: 0
accepted_prediction_tokens: 0
rejected_prediction_tokens: 0
}
7B Parameter Model
response_token/s: 268.13
prompt_token/s: 3230.77
total_duration: 1757000482
load_duration: 59201876
prompt_eval_count: 42
prompt_tokens: 42
prompt_eval_duration: 13000000
eval_count: 451
completion_tokens: 451
eval_duration: 1682000000
approximate_total: "0h0m1s"
total_tokens: 493
completion_tokens_details: {
reasoning_tokens: 0
accepted_prediction_tokens: 0
rejected_prediction_tokens: 0
}
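Notably, for a model 7x the size, prompt processing only drops to roughly a fifth of the speed (3230.77 vs 15517.24 prompt tokens/s), demonstrating very good scaling across the architectures. The response rates aren't directly comparable, since the 1B run only generated a single token.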
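The 1B vs 7B comparison can be checked straight from the reported prompt rates. This is just a rough sketch: the parameter counts are the nominal sizes from the headings, and the two runs used different prompt lengths (450 vs 42 tokens), so the ratio is indicative at best.

```python
# Figures copied from the 1B and 7B runs above.
runs = {
    "1B": {"params_b": 1, "prompt_tps": 15517.24},
    "7B": {"params_b": 7, "prompt_tps": 3230.77},
}

# How much bigger the model is vs how much slower prompt processing gets.
size_ratio = runs["7B"]["params_b"] / runs["1B"]["params_b"]
slowdown = runs["1B"]["prompt_tps"] / runs["7B"]["prompt_tps"]

print(f"model size ratio:    {size_ratio:.1f}x")
print(f"prompt slowdown:     {slowdown:.2f}x")
print(f"relative throughput: {1 / slowdown:.2f}")
```

A slowdown smaller than the size ratio means throughput falls off more slowly than the parameter count grows.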
Reasoning Models - Qwen 2.5 QwQ32B
response_token/s: 37.93
prompt_token/s: 481.93
total_duration: 34069477685
load_duration: 54297700
prompt_eval_count: 40
prompt_tokens: 40
prompt_eval_duration: 83000000
eval_count: 1287
completion_tokens: 1287
eval_duration: 33928000000
approximate_total: "0h0m34s"
total_tokens: 1327
completion_tokens_details: {
reasoning_tokens: 0
accepted_prediction_tokens: 0
rejected_prediction_tokens: 0
}
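For anyone wondering where the token/s numbers come from: they appear to be just the token counts divided by the matching durations, which Ollama reports in nanoseconds. A minimal sketch using the QwQ run above (the helper name `tokens_per_second` is made up for illustration):

```python
NS_PER_S = 1_000_000_000  # the *_duration fields are in nanoseconds


def tokens_per_second(count: int, duration_ns: int) -> float:
    """Tokens divided by elapsed seconds, rounded to two decimals."""
    return round(count / (duration_ns / NS_PER_S), 2)


# eval_count / eval_duration from the QwQ run
response_tps = tokens_per_second(1287, 33_928_000_000)
# prompt_eval_count / prompt_eval_duration from the same run
prompt_tps = tokens_per_second(40, 83_000_000)

print(response_tps, prompt_tps)
```

Both values line up with the response_token/s and prompt_token/s lines reported for that run.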
Reasoning Models - r1-1776:70b
response_token/s: 23.31
prompt_token/s: 239.73
total_duration: 50218727742
load_duration: 59170660
prompt_eval_count: 35
prompt_tokens: 35
prompt_eval_duration: 146000000
eval_count: 1166
completion_tokens: 1166
eval_duration: 50011000000
approximate_total: "0h0m50s"
total_tokens: 1201
completion_tokens_details: {
reasoning_tokens: 0
accepted_prediction_tokens: 0
rejected_prediction_tokens: 0
}
Reasoning Models - r1-1776:671b
response_token/s: 8.7
prompt_token/s: 50.72
total_duration: 117589459047
load_duration: 48891022
prompt_eval_count: 35
prompt_tokens: 35
prompt_eval_duration: 690000000
eval_count: 1016
completion_tokens: 1016
eval_duration: 116848000000
approximate_total: "0h1m57s"
total_tokens: 1051
completion_tokens_details: {
reasoning_tokens: 0
accepted_prediction_tokens: 0
rejected_prediction_tokens: 0
}
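The approximate_total field looks like total_duration converted from nanoseconds to whole seconds and split into hours, minutes and seconds. The sketch below is inferred from the numbers in this thread, not taken from the UI's source, so the truncation (rather than rounding) is an assumption, and `approximate_total` is a made-up helper name.

```python
NS_PER_S = 1_000_000_000


def approximate_total(total_duration_ns: int) -> str:
    """Format a nanosecond duration as e.g. '0h1m57s' (truncating, by assumption)."""
    seconds = total_duration_ns // NS_PER_S
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h{minutes}m{secs}s"


# total_duration values from the 671b and QwQ runs above
print(approximate_total(117_589_459_047))
print(approximate_total(34_069_477_685))
```

Truncating reproduces the value shown for every run in this thread, which is the only reason I picked it over rounding.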