Have a fancy H100 Cluster for 2-3 days, give me ideas to test

Hello everyone! My Uni has given me access to a really fancy cluster of 8 x H100 GPUs to test and benchmark for a few days. Please give me ideas/suggestions for what I should test, or tell me what you’d like to see tested! (IDK why Fastfetch shows them as GA103 GPUs lol)

Here is a run of the NVIDIA HPC Benchmarks tool.

nvidia@TRY-60384-gpu01:~/Downloads/benchmark4$ docker run --gpus all --shm-size=1g nvcr.io/nvidia/hpc-benchmarks:25.02 \
     mpirun --bind-to none -np 8 \
     ./hpl.sh --dat /workspace/hpl-linux-x86_64/sample-dat/HPL-8GPUs.dat

=========================================================
================= NVIDIA HPC Benchmarks =================
=========================================================
NVIDIA Release 25.02
Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

WARNING: No InfiniBand devices detected.
         Multi-node communication performance may be reduced.
         Ensure /dev/infiniband is mounted to this container.


================================================================================
HPL-NVIDIA 25.2.0  -- NVIDIA accelerated HPL benchmark -- NVIDIA
================================================================================
HPLinpack 2.1  --  High-Performance Linpack benchmark  --   October 26, 2012
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :  264192 
NB     :    1024 
PMAP   : Column-major process mapping
P      :       4 
Q      :       2 
PFACT  :    Left 
NBMIN  :       2 
NDIV   :       2 
RFACT  :    Left 
BCAST  :  2ringM 
DEPTH  :       1 
SWAP   : Spread-roll (long)
L1     : no-transposed form
U      : transposed form
EQUIL  : no
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0


HPL-NVIDIA ignores the following parameters from input file:
        * Broadcast parameters
        * Panel factorization parameters
        * Look-ahead value
        * L1 layout
        * U layout
        * Equilibration parameter
        * Memory alignment parameter

HPL-NVIDIA settings from environment variables:
--- DEVICE INFO ---
  Peak clock frequency: 1785 MHz
  SM version          : 90
  Number of SMs       : 132
-------------------
[HPL TRACE] cuda_nvshmem_init: max=5.9740 (3) min=5.9740 (6)
[HPL TRACE] ncclCommInitRank: max=12.3365 (0) min=12.3293 (5)
[HPL TRACE] cugetrfs_mp_init: max=12.4442 (5) min=12.4441 (1)
--- MEMORY INFO ---
DEVICE
  System           =      2.82895 GiB (MIN)      3.06723 GiB (MAX)      2.90610 GiB (AVG)
  HPL buffers      =     69.67312 GiB (MIN)     70.68094 GiB (MAX)     70.17703 GiB (AVG)
  Used             =     72.50207 GiB (MIN)     73.67004 GiB (MAX)     73.08313 GiB (AVG)
  Total            =     93.58398 GiB (MIN)     93.58398 GiB (MAX)     93.58398 GiB (AVG)
HOST
  HPL buffers      =      0.00057 GiB (MIN)      0.00057 GiB (MAX)      0.00057 GiB (AVG)
-------------------

 ... Testing HPL components ... 

 **** Factorization, m = 66560, policy = 0 **** 
avg time =    25.97 ms, avg =  2687.80. min =  2676.74 [rank 4, host 7ea74c28468b, gpuID 0000:AD:00.0], max =  2698.94 GFLOPS

 **** Factorization, m = 66560, policy = 1 **** 
avg time =    35.00 ms, avg =  1993.91. min =  1930.29 [rank 0, host 7ea74c28468b, gpuID 0000:2D:00.0], max =  2061.84 GFLOPS

 **** Factorization, m = 32768, policy = 0 **** 
avg time =    21.19 ms, avg =  1621.14. min =  1616.19 [rank 4, host 7ea74c28468b, gpuID 0000:AD:00.0], max =  1626.09 GFLOPS

 **** Factorization, m = 32768, policy = 1 **** 
avg time =    31.00 ms, avg =  1108.42. min =  1098.93 [rank 0, host 7ea74c28468b, gpuID 0000:2D:00.0], max =  1118.07 GFLOPS

 **** Factorization, m = 16384, policy = 0 **** 
avg time =    18.88 ms, avg =   909.85. min =   905.14 [rank 4, host 7ea74c28468b, gpuID 0000:AD:00.0], max =   914.58 GFLOPS

 **** Factorization, m = 16384, policy = 1 **** 
avg time =    28.72 ms, avg =   598.21. min =   597.28 [rank 0, host 7ea74c28468b, gpuID 0000:2D:00.0], max =   599.12 GFLOPS

 **** Factorization, m = 1024, policy = 0 **** 
avg time =    16.49 ms, avg =    65.12. min =    64.68 [rank 4, host 7ea74c28468b, gpuID 0000:AD:00.0], max =    65.57 GFLOPS

 **** Factorization, m = 1024, policy = 1 **** 
avg time =    26.44 ms, avg =    40.61. min =    40.53 [rank 4, host 7ea74c28468b, gpuID 0000:AD:00.0], max =    40.68 GFLOPS

 **** ncclBcast( Row ) **** 
avg time =    65.46 ms, avg =     8.33. min =     7.70 [rank 5, host 7ea74c28468b, gpuID 0000:AE:00.0], max =    10.73 GBS

 **** ncclAllGather( Col ) **** 
avg time =     4.01 ms, avg =    33.50. min =    24.63 [rank 6, host 7ea74c28468b, gpuID 0000:BD:00.0], max =    52.38 GBS

 **** Latency ncclAllGather, m = 1 **** 
avg time =    11.56 ms, avg =     0.00. min =     0.00 [rank 4, host 7ea74c28468b, gpuID 0000:AD:00.0], max =     0.00 GBS

 **** Latency ncclAllGather, m = 2 **** 
avg time =    11.48 ms, avg =     0.01. min =     0.01 [rank 4, host 7ea74c28468b, gpuID 0000:AD:00.0], max =     0.01 GBS

 **** Latency ncclAllGather, m = 32 **** 
avg time =    11.70 ms, avg =     0.09. min =     0.09 [rank 4, host 7ea74c28468b, gpuID 0000:AD:00.0], max =     0.09 GBS

 **** Latency ncclAllGather, m = 1024 **** 
avg time =    14.00 ms, avg =     2.40. min =     2.37 [rank 1, host 7ea74c28468b, gpuID 0000:3A:00.0], max =     2.42 GBS

 **** Latency ncclAllGather, m = 2048 **** 
avg time =    14.58 ms, avg =     4.60. min =     4.59 [rank 0, host 7ea74c28468b, gpuID 0000:2D:00.0], max =     4.61 GBS

 **** Latency Host MPI_Allgather, m = 1 **** 
avg time =     2.41 ms, avg =     0.01. min =     0.01 [rank 2, host 7ea74c28468b, gpuID 0000:3B:00.0], max =     0.01 GBS

 **** Latency Host MPI_Allgather, m = 2 **** 
avg time =     2.54 ms, avg =     0.03. min =     0.02 [rank 1, host 7ea74c28468b, gpuID 0000:3A:00.0], max =     0.03 GBS

 **** Latency Host MPI_Allgather, m = 32 **** 
avg time =     4.41 ms, avg =     0.24. min =     0.23 [rank 4, host 7ea74c28468b, gpuID 0000:AD:00.0], max =     0.25 GBS

 **** Latency Host MPI_Allgather, m = 1024 **** 
avg time =    19.67 ms, avg =     1.71. min =     1.69 [rank 2, host 7ea74c28468b, gpuID 0000:3B:00.0], max =     1.72 GBS

 **** Latency Host MPI_Allgather, m = 2048 **** 
avg time =    27.27 ms, avg =     2.46. min =     2.38 [rank 4, host 7ea74c28468b, gpuID 0000:AD:00.0], max =     2.54 GBS

 **** Latency ncclBcast, m = 1 **** 
avg time =     7.66 ms, avg =     0.00. min =     0.00 [rank 2, host 7ea74c28468b, gpuID 0000:3B:00.0], max =     0.00 GBS

 **** Latency ncclBcast, m = 32 **** 
avg time =     7.73 ms, avg =     0.03. min =     0.03 [rank 3, host 7ea74c28468b, gpuID 0000:3C:00.0], max =     0.03 GBS

 **** Latency ncclBcast, m = 1024 **** 
avg time =    11.93 ms, avg =     0.70. min =     0.70 [rank 3, host 7ea74c28468b, gpuID 0000:3C:00.0], max =     0.71 GBS

 **** Latency Host MPI_Bcast, m = 1 **** 
avg time =     1.56 ms, avg =     0.01. min =     0.01 [rank 4, host 7ea74c28468b, gpuID 0000:AD:00.0], max =     0.01 GBS

 **** Latency Host MPI_Bcast, m = 32 **** 
avg time =     1.59 ms, avg =     0.16. min =     0.15 [rank 4, host 7ea74c28468b, gpuID 0000:AD:00.0], max =     0.18 GBS

 **** Latency Host MPI_Bcast, m = 1024 **** 
avg time =     4.46 ms, avg =     1.88. min =     1.79 [rank 5, host 7ea74c28468b, gpuID 0000:AE:00.0], max =     1.98 GBS

 **** GEMM **** 
avg time =    56.65 ms, avg = 39426.61. min = 38903.21 [rank 0, host 7ea74c28468b, gpuID 0000:2D:00.0], max = 39889.91 GFLOPS

 ... End of Testing HPL components ... 

[HPL TRACE] HPL_pdmatgen_gpu: max=0.0216 (4) min=0.0216 (1)
 Prog= 2.31%    N_left=   262144        Time=   1.08    Time_left=  45.74       iGF= 262561.48  GF= 262561.48   iGF_per= 32820.18       GF_per= 32820.18
 Prog= 3.45%    N_left=   261120        Time=   1.57    Time_left=  43.93       iGF= 287004.13  GF= 270171.41   iGF_per= 35875.52       GF_per= 33771.43
 Prog= 4.58%    N_left=   260096        Time=   2.08    Time_left=  43.26       iGF= 274231.60  GF= 271163.35   iGF_per= 34278.95       GF_per= 33895.42
 Prog= 6.82%    N_left=   258048        Time=   3.04    Time_left=  41.59       iGF= 284659.59  GF= 275448.33   iGF_per= 35582.45       GF_per= 34431.04
 Prog= 7.92%    N_left=   257024        Time=   3.52    Time_left=  40.87       iGF= 286556.89  GF= 276946.01   iGF_per= 35819.61       GF_per= 34618.25
 Prog= 9.02%    N_left=   256000        Time=   4.01    Time_left=  40.42       iGF= 274994.27  GF= 276707.26   iGF_per= 34374.28       GF_per= 34588.41
 Prog= 11.18%   N_left=   253952        Time=   4.95    Time_left=  39.28       iGF= 283494.09  GF= 277996.39   iGF_per= 35436.76       GF_per= 34749.55
 Prog= 12.25%   N_left=   252928        Time=   5.40    Time_left=  38.70       iGF= 286836.08  GF= 278746.59   iGF_per= 35854.51       GF_per= 34843.32
 Prog= 13.31%   N_left=   251904        Time=   5.88    Time_left=  38.29       iGF= 272997.45  GF= 278279.40   iGF_per= 34124.68       GF_per= 34784.92
 Prog= 14.37%   N_left=   250880        Time=   6.33    Time_left=  37.72       iGF= 289894.21  GF= 279098.84   iGF_per= 36236.78       GF_per= 34887.36
 Prog= 16.45%   N_left=   248832        Time=   7.24    Time_left=  36.77       iGF= 281197.09  GF= 279362.47   iGF_per= 35149.64       GF_per= 34920.31
 Prog= 17.47%   N_left=   247808        Time=   7.70    Time_left=  36.36       iGF= 273014.58  GF= 278981.14   iGF_per= 34126.82       GF_per= 34872.64
 Prog= 18.49%   N_left=   246784        Time=   8.14    Time_left=  35.86       iGF= 287792.89  GF= 279452.52   iGF_per= 35974.11       GF_per= 34931.56
 Prog= 20.51%   N_left=   244736        Time=   9.02    Time_left=  34.97       iGF= 279198.81  GF= 279427.60   iGF_per= 34899.85       GF_per= 34928.45
 Prog= 21.50%   N_left=   243712        Time=   9.47    Time_left=  34.58       iGF= 272617.97  GF= 279105.39   iGF_per= 34077.25       GF_per= 34888.17
 Prog= 22.48%   N_left=   242688        Time=   9.89    Time_left=  34.10       iGF= 287530.95  GF= 279464.26   iGF_per= 35941.37       GF_per= 34933.03
 Prog= 23.46%   N_left=   241664        Time=  10.33    Time_left=  33.69       iGF= 275791.13  GF= 279309.34   iGF_per= 34473.89       GF_per= 34913.67
 Prog= 25.39%   N_left=   239616        Time=  11.18    Time_left=  32.85       iGF= 278216.21  GF= 279225.97   iGF_per= 34777.03       GF_per= 34903.25
 Prog= 26.34%   N_left=   238592        Time=  11.59    Time_left=  32.39       iGF= 287384.33  GF= 279512.85   iGF_per= 35923.04       GF_per= 34939.11
 Prog= 27.29%   N_left=   237568        Time=  12.01    Time_left=  32.00       iGF= 274681.43  GF= 279342.82   iGF_per= 34335.18       GF_per= 34917.85
 Prog= 29.15%   N_left=   235520        Time=  12.84    Time_left=  31.19       iGF= 277199.42  GF= 279204.76   iGF_per= 34649.93       GF_per= 34900.60
 Prog= 30.07%   N_left=   234496        Time=  13.23    Time_left=  30.76       iGF= 286869.53  GF= 279433.19   iGF_per= 35858.69       GF_per= 34929.15
 Prog= 30.98%   N_left=   233472        Time=  13.64    Time_left=  30.38       iGF= 274423.71  GF= 279283.12   iGF_per= 34302.96       GF_per= 34910.39
 Prog= 31.89%   N_left=   232448        Time=  14.03    Time_left=  29.96       iGF= 284848.21  GF= 279437.90   iGF_per= 35606.03       GF_per= 34929.74
 Prog= 33.67%   N_left=   230400        Time=  14.82    Time_left=  29.19       iGF= 277893.21  GF= 279355.61   iGF_per= 34736.65       GF_per= 34919.45
 Prog= 34.55%   N_left=   229376        Time=  15.21    Time_left=  28.82       iGF= 273602.64  GF= 279206.02   iGF_per= 34200.33       GF_per= 34900.75
 Prog= 35.43%   N_left=   228352        Time=  15.59    Time_left=  28.42       iGF= 283774.68  GF= 279316.79   iGF_per= 35471.84       GF_per= 34914.60
 Prog= 37.15%   N_left=   226304        Time=  16.35    Time_left=  27.66       iGF= 279011.09  GF= 279302.61   iGF_per= 34876.39       GF_per= 34912.83
 Prog= 38.00%   N_left=   225280        Time=  16.74    Time_left=  27.31       iGF= 271227.95  GF= 279116.87   iGF_per= 33903.49       GF_per= 34889.61
 Prog= 38.84%   N_left=   224256        Time=  17.10    Time_left=  26.93       iGF= 283318.20  GF= 279206.59   iGF_per= 35414.77       GF_per= 34900.82
 Prog= 39.67%   N_left=   223232        Time=  17.48    Time_left=  26.58       iGF= 269655.26  GF= 278998.85   iGF_per= 33706.91       GF_per= 34874.86
 Prog= 41.32%   N_left=   221184        Time=  18.20    Time_left=  25.84       iGF= 282532.82  GF= 279137.87   iGF_per= 35316.60       GF_per= 34892.23
 Prog= 42.13%   N_left=   220160        Time=  18.55    Time_left=  25.47       iGF= 285784.49  GF= 279262.94   iGF_per= 35723.06       GF_per= 34907.87
 Prog= 42.93%   N_left=   219136        Time=  18.91    Time_left=  25.13       iGF= 271904.37  GF= 279121.52   iGF_per= 33988.05       GF_per= 34890.19
 Prog= 44.52%   N_left=   217088        Time=  19.60    Time_left=  24.43       iGF= 282086.12  GF= 279226.01   iGF_per= 35260.77       GF_per= 34903.25
 Prog= 45.30%   N_left=   216064        Time=  19.94    Time_left=  24.07       iGF= 285715.98  GF= 279335.46   iGF_per= 35714.50       GF_per= 34916.93
 Prog= 46.07%   N_left=   215040        Time=  20.29    Time_left=  23.75       iGF= 270660.64  GF= 279185.13   iGF_per= 33832.58       GF_per= 34898.14
 Prog= 47.60%   N_left=   212992        Time=  20.95    Time_left=  23.07       iGF= 281768.43  GF= 279267.22   iGF_per= 35221.05       GF_per= 34908.40
 Prog= 48.35%   N_left=   211968        Time=  21.28    Time_left=  22.73       iGF= 285270.59  GF= 279358.67   iGF_per= 35658.82       GF_per= 34919.83
 Prog= 49.10%   N_left=   210944        Time=  21.61    Time_left=  22.41       iGF= 272509.13  GF= 279252.18   iGF_per= 34063.64       GF_per= 34906.52
 Prog= 49.83%   N_left=   209920        Time=  21.93    Time_left=  22.07       iGF= 289370.89  GF= 279396.80   iGF_per= 36171.36       GF_per= 34924.60
 Prog= 51.29%   N_left=   207872        Time=  22.57    Time_left=  21.43       iGF= 278385.35  GF= 279368.03   iGF_per= 34798.17       GF_per= 34921.00
 Prog= 52.01%   N_left=   206848        Time=  22.89    Time_left=  21.13       iGF= 272721.96  GF= 279274.28   iGF_per= 34090.25       GF_per= 34909.29
 Prog= 52.71%   N_left=   205824        Time=  23.20    Time_left=  20.81       iGF= 285315.45  GF= 279353.87   iGF_per= 35664.43       GF_per= 34919.23
 Prog= 54.11%   N_left=   203776        Time=  23.81    Time_left=  20.19       iGF= 281031.27  GF= 279396.94   iGF_per= 35128.91       GF_per= 34924.62
 Prog= 54.80%   N_left=   202752        Time=  24.12    Time_left=  19.90       iGF= 269905.39  GF= 279273.58   iGF_per= 33738.17       GF_per= 34909.20
 Prog= 55.48%   N_left=   201728        Time=  24.41    Time_left=  19.59       iGF= 286958.24  GF= 279365.46   iGF_per= 35869.78       GF_per= 34920.68
 Prog= 56.16%   N_left=   200704        Time=  24.72    Time_left=  19.30       iGF= 274304.85  GF= 279303.57   iGF_per= 34288.11       GF_per= 34912.95
 Prog= 57.48%   N_left=   198656        Time=  25.30    Time_left=  18.71       iGF= 278325.07  GF= 279280.88   iGF_per= 34790.63       GF_per= 34910.11
 Prog= 58.14%   N_left=   197632        Time=  25.58    Time_left=  18.42       iGF= 285865.03  GF= 279353.27   iGF_per= 35733.13       GF_per= 34919.16
 Prog= 58.79%   N_left=   196608        Time=  25.88    Time_left=  18.14       iGF= 273838.41  GF= 279291.33   iGF_per= 34229.80       GF_per= 34911.42
 Prog= 60.06%   N_left=   194560        Time=  26.46    Time_left=  17.59       iGF= 269280.71  GF= 279071.17   iGF_per= 33660.09       GF_per= 34883.90
 Prog= 60.69%   N_left=   193536        Time=  26.74    Time_left=  17.32       iGF= 276458.68  GF= 279043.91   iGF_per= 34557.34       GF_per= 34880.49
 Prog= 61.31%   N_left=   192512        Time=  27.02    Time_left=  17.05       iGF= 271795.89  GF= 278968.59   iGF_per= 33974.49       GF_per= 34871.07
 Prog= 61.92%   N_left=   191488        Time=  27.29    Time_left=  16.78       iGF= 274192.22  GF= 278920.40   iGF_per= 34274.03       GF_per= 34865.05
 Prog= 63.13%   N_left=   189440        Time=  27.84    Time_left=  16.26       iGF= 273677.15  GF= 278818.13   iGF_per= 34209.64       GF_per= 34852.27
 Prog= 63.73%   N_left=   188416        Time=  28.10    Time_left=  16.00       iGF= 273051.70  GF= 278763.20   iGF_per= 34131.46       GF_per= 34845.40
 Prog= 64.31%   N_left=   187392        Time=  28.37    Time_left=  15.74       iGF= 273693.91  GF= 278715.98   iGF_per= 34211.74       GF_per= 34839.50
 Prog= 65.47%   N_left=   185344        Time=  28.89    Time_left=  15.24       iGF= 272497.02  GF= 278603.59   iGF_per= 34062.13       GF_per= 34825.45
 Prog= 66.04%   N_left=   184320        Time=  29.15    Time_left=  14.99       iGF= 272047.93  GF= 278545.75   iGF_per= 34005.99       GF_per= 34818.22
 Prog= 66.60%   N_left=   183296        Time=  29.40    Time_left=  14.74       iGF= 269150.24  GF= 278463.60   iGF_per= 33643.78       GF_per= 34807.95
 Prog= 67.16%   N_left=   182272        Time=  29.66    Time_left=  14.50       iGF= 266710.87  GF= 278361.94   iGF_per= 33338.86       GF_per= 34795.24
 Prog= 68.25%   N_left=   180224        Time=  30.16    Time_left=  14.03       iGF= 269886.52  GF= 278221.83   iGF_per= 33735.82       GF_per= 34777.73
 Prog= 68.79%   N_left=   179200        Time=  30.40    Time_left=  13.79       iGF= 270189.84  GF= 278157.16   iGF_per= 33773.73       GF_per= 34769.64
 Prog= 69.32%   N_left=   178176        Time=  30.65    Time_left=  13.56       iGF= 267155.77  GF= 278069.30   iGF_per= 33394.47       GF_per= 34758.66
 Prog= 70.37%   N_left=   176128        Time=  31.13    Time_left=  13.11       iGF= 266406.93  GF= 277888.53   iGF_per= 33300.87       GF_per= 34736.07
 Prog= 70.88%   N_left=   175104        Time=  31.37    Time_left=  12.89       iGF= 262025.62  GF= 277766.64   iGF_per= 32753.20       GF_per= 34720.83
 Prog= 71.39%   N_left=   174080        Time=  31.61    Time_left=  12.67       iGF= 260964.31  GF= 277639.49   iGF_per= 32620.54       GF_per= 34704.94
 Prog= 72.39%   N_left=   172032        Time=  32.06    Time_left=  12.23       iGF= 270659.32  GF= 277540.82   iGF_per= 33832.41       GF_per= 34692.60
 Prog= 72.88%   N_left=   171008        Time=  32.29    Time_left=  12.02       iGF= 265777.52  GF= 277458.24   iGF_per= 33222.19       GF_per= 34682.28
 Prog= 73.36%   N_left=   169984        Time=  32.52    Time_left=  11.81       iGF= 262130.23  GF= 277351.19   iGF_per= 32766.28       GF_per= 34668.90
 Prog= 73.84%   N_left=   168960        Time=  32.73    Time_left=  11.59       iGF= 276633.57  GF= 277346.52   iGF_per= 34579.20       GF_per= 34668.32
 Prog= 74.78%   N_left=   166912        Time=  33.16    Time_left=  11.18       iGF= 270679.62  GF= 277260.71   iGF_per= 33834.95       GF_per= 34657.59
 Prog= 75.24%   N_left=   165888        Time=  33.37    Time_left=  10.98       iGF= 264894.90  GF= 277181.39   iGF_per= 33111.86       GF_per= 34647.67
 Prog= 75.70%   N_left=   164864        Time=  33.57    Time_left=  10.78       iGF= 275626.44  GF= 277171.98   iGF_per= 34453.31       GF_per= 34646.50
 Prog= 76.59%   N_left=   162816        Time=  33.99    Time_left=  10.39       iGF= 265218.12  GF= 277026.17   iGF_per= 33152.26       GF_per= 34628.27
 Prog= 77.03%   N_left=   161792        Time=  34.20    Time_left=  10.20       iGF= 260268.52  GF= 276924.59   iGF_per= 32533.56       GF_per= 34615.57
 Prog= 77.47%   N_left=   160768        Time=  34.39    Time_left=  10.00       iGF= 278002.77  GF= 276930.60   iGF_per= 34750.35       GF_per= 34616.33
 Prog= 77.89%   N_left=   159744        Time=  34.59    Time_left=   9.82       iGF= 257433.52  GF= 276815.45   iGF_per= 32179.19       GF_per= 34601.93
 Prog= 78.73%   N_left=   157696        Time=  34.99    Time_left=   9.45       iGF= 262627.59  GF= 276656.11   iGF_per= 32828.45       GF_per= 34582.01
 Prog= 79.14%   N_left=   156672        Time=  35.17    Time_left=   9.27       iGF= 276424.53  GF= 276654.90   iGF_per= 34553.07       GF_per= 34581.86
 Prog= 79.55%   N_left=   155648        Time=  35.37    Time_left=   9.09       iGF= 251189.70  GF= 276511.74   iGF_per= 31398.71       GF_per= 34563.97
 Prog= 80.35%   N_left=   153600        Time=  35.75    Time_left=   8.74       iGF= 257652.19  GF= 276311.22   iGF_per= 32206.52       GF_per= 34538.90
 Prog= 80.74%   N_left=   152576        Time=  35.91    Time_left=   8.57       iGF= 289663.26  GF= 276372.82   iGF_per= 36207.91       GF_per= 34546.60
 Prog= 81.12%   N_left=   151552        Time=  36.10    Time_left=   8.40       iGF= 247043.21  GF= 276217.10   iGF_per= 30880.40       GF_per= 34527.14
 Prog= 81.50%   N_left=   150528        Time=  36.27    Time_left=   8.23       iGF= 275750.33  GF= 276214.92   iGF_per= 34468.79       GF_per= 34526.86
 Prog= 82.25%   N_left=   148480        Time=  36.62    Time_left=   7.90       iGF= 264114.90  GF= 276100.38   iGF_per= 33014.36       GF_per= 34512.55
 Prog= 82.61%   N_left=   147456        Time=  36.80    Time_left=   7.75       iGF= 244652.27  GF= 275943.77   iGF_per= 30581.53       GF_per= 34492.97
 Prog= 82.97%   N_left=   146432        Time=  36.95    Time_left=   7.58       iGF= 292980.20  GF= 276013.36   iGF_per= 36622.53       GF_per= 34501.67
 Prog= 83.68%   N_left=   144384        Time=  37.29    Time_left=   7.27       iGF= 259554.89  GF= 275866.08   iGF_per= 32444.36       GF_per= 34483.26
 Prog= 84.02%   N_left=   143360        Time=  37.47    Time_left=   7.12       iGF= 239990.34  GF= 275696.93   iGF_per= 29998.79       GF_per= 34462.12
 Prog= 84.36%   N_left=   142336        Time=  37.61    Time_left=   6.97       iGF= 285063.45  GF= 275733.44   iGF_per= 35632.93       GF_per= 34466.68
 Prog= 84.70%   N_left=   141312        Time=  37.79    Time_left=   6.83       iGF= 233749.29  GF= 275537.64   iGF_per= 29218.66       GF_per= 34442.21
 Prog= 85.35%   N_left=   139264        Time=  38.10    Time_left=   6.54       iGF= 259374.10  GF= 275405.78   iGF_per= 32421.76       GF_per= 34425.72
 Prog= 85.67%   N_left=   138240        Time=  38.23    Time_left=   6.39       iGF= 290666.94  GF= 275459.93   iGF_per= 36333.37       GF_per= 34432.49
 Prog= 85.99%   N_left=   137216        Time=  38.40    Time_left=   6.26       iGF= 228060.58  GF= 275249.69   iGF_per= 28507.57       GF_per= 34406.21
 Prog= 86.61%   N_left=   135168        Time=  38.70    Time_left=   5.99       iGF= 253507.83  GF= 275081.34   iGF_per= 31688.48       GF_per= 34385.17
 Prog= 86.91%   N_left=   134144        Time=  38.83    Time_left=   5.85       iGF= 297759.29  GF= 275154.18   iGF_per= 37219.91       GF_per= 34394.27
 Prog= 87.21%   N_left=   133120        Time=  39.00    Time_left=   5.72       iGF= 214776.21  GF= 274890.55   iGF_per= 26847.03       GF_per= 34361.32
 Prog= 88.63%   N_left=   128000        Time=  39.67    Time_left=   5.09       iGF= 259173.41  GF= 274623.71   iGF_per= 32396.68       GF_per= 34327.96
 Prog= 90.19%   N_left=   121856        Time=  40.48    Time_left=   4.40       iGF= 237461.08  GF= 273882.14   iGF_per= 29682.63       GF_per= 34235.27
 Prog= 91.37%   N_left=   116736        Time=  41.15    Time_left=   3.89       iGF= 217697.08  GF= 272968.00   iGF_per= 27212.13       GF_per= 34121.00
 Prog= 92.46%   N_left=   111616        Time=  41.75    Time_left=   3.41       iGF= 220994.53  GF= 272216.00   iGF_per= 27624.32       GF_per= 34027.00
 Prog= 93.45%   N_left=   106496        Time=  42.38    Time_left=   2.97       iGF= 195981.86  GF= 271097.87   iGF_per= 24497.73       GF_per= 33887.23
 Prog= 94.35%   N_left=   101376        Time=  42.93    Time_left=   2.57       iGF= 200846.20  GF= 270196.33   iGF_per= 25105.78       GF_per= 33774.54
 Prog= 95.16%   N_left=    96256        Time=  43.48    Time_left=   2.21       iGF= 179380.96  GF= 269031.94   iGF_per= 22422.62       GF_per= 33628.99
 Prog= 95.90%   N_left=    91136        Time=  43.98    Time_left=   1.88       iGF= 181854.89  GF= 268051.81   iGF_per= 22731.86       GF_per= 33506.48
 Prog= 96.67%   N_left=    84992        Time=  44.56    Time_left=   1.53       iGF= 164618.60  GF= 266707.47   iGF_per= 20577.32       GF_per= 33338.43
 Prog= 97.24%   N_left=    79872        Time=  45.03    Time_left=   1.28       iGF= 148945.53  GF= 265485.25   iGF_per= 18618.19       GF_per= 33185.66
 Prog= 99.18%   N_left=    53248        Time=  46.95    Time_left=   0.39       iGF= 124000.80  GF= 259676.25   iGF_per= 15500.10       GF_per= 32459.53
 Prog= 99.89%   N_left=    27648        Time=  48.23    Time_left=   0.06       iGF= 67945.90   GF= 254611.47   iGF_per=  8493.24       GF_per= 31826.43
================================================================================
T/V                N    NB     P     Q         Time          Gflops (   per GPU)
--------------------------------------------------------------------------------
WC0           264192  1024     4     2        49.22       2.498e+05 ( 3.122e+04)

HPL_pdgesv() start time Wed Apr  2 19:14:02 2025
HPL_pdgesv() end time   Wed Apr  2 19:14:51 2025

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   0.000195009244 ...... PASSED
||Ax-b||_oo  . . . . . . . . . . . . . . . . . = 0.0000000025425467
||A||_oo . . . . . . . . . . . . . . . . . . . = 66378.1881866107869428
||x||_oo . . . . . . . . . . . . . . . . . . . = 6.6966483875822718
||b||_oo . . . . . . . . . . . . . . . . . . . = 0.9741752868709390
================================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================
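
For anyone who wants to sanity-check the headline number: as far as I know, HPL credits a run with 2/3*N^3 + 3/2*N^2 floating-point operations and divides by the wall time. The sketch below plugs in N, the time and the GPU count from the result line above. The even split of the matrix across GPUs is my own simplification (the block-cyclic layout is not perfectly even), so treat the memory figure as a rough lower bound on the "HPL buffers" line, not an exact match.

```python
def hpl_flops(n: int) -> float:
    """Operation count HPL credits for solving an n x n system."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


def matrix_gib(n: int, gpus: int) -> float:
    """GiB per GPU to hold the n x n FP64 matrix, assuming an even split."""
    bytes_per_double = 8
    return bytes_per_double * n * n / gpus / 2**30


# Values copied from the HPL result line above
N, SECONDS, GPUS = 264192, 49.22, 8

total_gflops = hpl_flops(N) / SECONDS / 1e9
per_gpu_gflops = total_gflops / GPUS

print(f"total  : {total_gflops:.4g} GFLOPS")
print(f"per GPU: {per_gpu_gflops:.4g} GFLOPS")
print(f"matrix : {matrix_gib(N, GPUS):.2f} GiB per GPU")
```

The totals land on the reported 2.498e+05 and 3.122e+04 GFLOPS, and the matrix alone comes to about 65 GiB per GPU, which fits under the roughly 70 GiB of HPL buffers shown in the memory info once workspace is added.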

LLM Benchmarks Using Ollama and OpenWebUI

1B parameter model - FP16 (Llama 3.1 - Instruct)

The model is small enough that inference runs on a single GPU.
Ollama optimises for hardware acceleration, so it defaults to FP16 compute.
response_token/s: 100
prompt_token/s: 15517.24
total_duration: 9573743183
load_duration: 9520186580
prompt_eval_count: 450
prompt_eval_duration: 29000000
eval_count: 1
completion_tokens: 1
eval_duration: 100000000
approximate_total: "0h0m9s"
total_tokens: 451
completion_tokens_details: {
	reasoning_tokens: 0
	accepted_prediction_tokens: 0
	rejected_prediction_tokens: 0
}

7B Parameter Model

response_token/s: 268.13
prompt_token/s: 3230.77
total_duration: 1757000482
load_duration: 59201876
prompt_eval_count: 42
prompt_tokens: 42
prompt_eval_duration: 13000000
eval_count: 451
completion_tokens: 451
eval_duration: 1682000000
approximate_total: "0h0m1s"
total_tokens: 493
completion_tokens_details: {
	reasoning_tokens: 0
	accepted_prediction_tokens: 0
	rejected_prediction_tokens: 0
}
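
The per-second rates in these dumps can be re-derived from the raw counts and durations. A minimal sketch, assuming the duration fields are in nanoseconds (which is how Ollama reports them, and it matches the approximate_total values); the function name is just my own helper:

```python
NS_PER_S = 1_000_000_000


def tokens_per_second(count: int, duration_ns: int) -> float:
    """Tokens per second from a token count and a duration in nanoseconds."""
    return round(count / (duration_ns / NS_PER_S), 2)


# Raw fields copied from the 7B run above
run_7b = {
    "prompt_eval_count": 42,
    "prompt_eval_duration": 13_000_000,
    "eval_count": 451,
    "eval_duration": 1_682_000_000,
}

prompt_rate = tokens_per_second(run_7b["prompt_eval_count"], run_7b["prompt_eval_duration"])
response_rate = tokens_per_second(run_7b["eval_count"], run_7b["eval_duration"])

print(f"prompt_token/s  : {prompt_rate}")
print(f"response_token/s: {response_rate}")
```

This reproduces the 3230.77 and 268.13 shown for the 7B model. Keep in mind that the 1B run above generated a single token, so its response rate is not a meaningful point of comparison.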

Notably, for a model 7x the size, prompt processing only drops to roughly 1/5th of the speed (3230.77 vs 15517.24 prompt_token/s), demonstrating exceptional scaling across the architectures.
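
A quick check of that comparison, assuming it refers to prompt_token/s (the only reported rate where the 7B model is slower than the 1B one) and taking the nominal 1B and 7B parameter counts at face value:

```python
# Reported prompt_token/s for the 1B and 7B runs above
prompt_tps_1b = 15517.24
prompt_tps_7b = 3230.77

size_ratio = 7 / 1  # nominal parameter counts, in billions
throughput_ratio = prompt_tps_7b / prompt_tps_1b

print(f"model size        : x{size_ratio:.0f}")
print(f"prompt throughput : x{throughput_ratio:.2f}")
```

So a 7x larger model keeps about a fifth of the prompt throughput. The two prompts were different lengths (450 vs 42 tokens), so this is only a rough indication.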

Reasoning Models - Qwen 2.5 QwQ32B

response_token/s: 37.93
prompt_token/s: 481.93
total_duration: 34069477685
load_duration: 54297700
prompt_eval_count: 40
prompt_tokens: 40
prompt_eval_duration: 83000000
eval_count: 1287
completion_tokens: 1287
eval_duration: 33928000000
approximate_total: "0h0m34s"
total_tokens: 1327
completion_tokens_details: {
	reasoning_tokens: 0
	accepted_prediction_tokens: 0
	rejected_prediction_tokens: 0
}

Reasoning Models - r1-1776:70b

response_token/s: 23.31
prompt_token/s: 239.73
total_duration: 50218727742
load_duration: 59170660
prompt_eval_count: 35
prompt_tokens: 35
prompt_eval_duration: 146000000
eval_count: 1166
completion_tokens: 1166
eval_duration: 50011000000
approximate_total: "0h0m50s"
total_tokens: 1201
completion_tokens_details: {
	reasoning_tokens: 0
	accepted_prediction_tokens: 0
	rejected_prediction_tokens: 0
}

Reasoning Models - r1-1776:671b

response_token/s: 8.7
prompt_token/s: 50.72
total_duration: 117589459047
load_duration: 48891022
prompt_eval_count: 35
prompt_tokens: 35
prompt_eval_duration: 690000000
eval_count: 1016
completion_tokens: 1016
eval_duration: 116848000000
approximate_total: "0h1m57s"
total_tokens: 1051
completion_tokens_details: {
	reasoning_tokens: 0
	accepted_prediction_tokens: 0
	rejected_prediction_tokens: 0
}
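
If anyone wants to repeat these runs without going through OpenWebUI, the same timing fields come back from Ollama's /api/generate endpoint when streaming is turned off. A rough sketch using only the standard library; the URL assumes a default local install on port 11434, and the helper names are my own, not part of Ollama:

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address
OLLAMA_URL = "http://localhost:11434/api/generate"

TIMING_KEYS = (
    "total_duration",
    "load_duration",
    "prompt_eval_count",
    "prompt_eval_duration",
    "eval_count",
    "eval_duration",
)


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the given model and prompt."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def timing_fields(response: dict) -> dict:
    """Keep only the timing and token-count fields of a generate response."""
    return {key: response.get(key) for key in TIMING_KEYS}


def run_once(model: str, prompt: str) -> dict:
    """Send one prompt to a running Ollama server and return its timing fields."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return timing_fields(json.load(resp))
```

With the server up, something like run_once("r1-1776:70b", "your prompt here") should return the raw counts and nanosecond durations, which can then be averaged over several runs instead of relying on a single sample per model.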