Big Navi ROCm Support

Praetorian · December 13, 2020, 4:14pm

Hello everyone,

My first post here. Does anyone have any information about whether Big Navi will be supported by ROCm? Thank you very much!

merry · December 13, 2020, 4:43pm

It’s meant to be supported in 4.0, but then they surprised us with a 3.10 release, so it may be a little while. Supposedly the OpenCL part of ROCm works, although it is not officially supported yet. Some people have managed to get the 5000 series working with recent ROCm OpenCL, which wasn’t the case until recently, so that’s a good sign.

Praetorian · December 13, 2020, 9:01pm

Thank you for the quick reply. I looked all over the internet, but all I found was speculation about ROCm support for Big Navi. What is your source?

I scored a 6900 XT and I am not sure if I should return it and just go NVIDIA, since AMD isn’t focusing too much on ML stuff. I must say that my Radeon VII works quite well with ROCm.

Praetorian · December 13, 2020, 10:14pm

Found my answer, you were right. Now I need my 6900 XT! Thank you!

merry · December 14, 2020, 2:39pm

Apologies, I should have cited a source but am such a news glutton that often the source gets forgotten.

You scored a 6900XT, as in you’re holding it in your hand? Are you sure it’s not merely a trick of the light? If you get ROCm working and have some time to spare, I’d appreciate a few quick benchmarks if you fancy it. It’d let me know how important (if at all) it is to try and get one myself: Benchmark Request Thread

Praetorian · December 14, 2020, 3:41pm

My ASRock Radeon RX 6900 XT PHANTOM GAMING D was already shipped from Newegg. Once I set it up I will be more than happy to provide more info. I will try to get a reference 6900 XT and will get rid of the ASRock one.

Praetorian · December 14, 2020, 3:47pm

One more thing: I have ROCm set up for my Radeon VII, so if you want any benchmarks for that card, it should be easy enough to run them before I change to the 6900 XT.

merry · December 14, 2020, 4:54pm

Thanks for the offer, but there’s no need for Radeon VII benchmarks. My niche is already saturated with Radeon VIIs, as it’s by far the most performant affordable card out there. Radeon VIIs are getting hard to come across for a decent price, so we’re on the lookout for a good replacement that will eventually have plentiful supply. The RTX 3000 series has already been shown to be unsuitable, so I have high hopes for Big Navi.

Praetorian · December 14, 2020, 5:34pm

Got it. Will keep you posted once I get my card.

Praetorian · December 16, 2020, 3:56pm

Happy to report that it arrived! Will do some testing this weekend. I need to disassemble my loop before I can do anything. Stay tuned for more.

merry · December 16, 2020, 4:00pm

Christmas has come early. If that’s a normal tree that box is massive.

Praetorian · December 16, 2020, 4:02pm

That is a normal tree lol.

merry · December 16, 2020, 4:03pm

Probably most of the volume of my entire desktop build.

merry · December 21, 2020, 2:01pm

ROCm 4.0 is out. Unfortunately the readme makes no mention of Big Navi, with the headline instead being that MI100 is supported in all of its unattainable glory. Still, worth a try. I definitely wouldn’t get rid of the card even if it doesn’t work just yet; that Navi 1 works unofficially is a good sign that they have been paving the way for Big Navi support. I wouldn’t be surprised if Big Navi was their next official new feature, hopefully ready before AMD supposedly sorts out the supply issues. Maybe that’s being too optimistic.

CybeastRaystriker · December 21, 2020, 2:42pm

Well, Big Navi has 40 fewer CUs than the MI100, plus AMD have supposedly diverged the architecture a bit. I’d expect the former to perform 50-60% as well as the latter at best. Hope AMD really brings ROCm 4.0 to Big Navi, pretty interested in it myself.
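
For reference, the raw CU ratio can be worked out from numbers in this thread. This is only a rough sketch: the 80 CU figure is what rocminfo reports for the 6900 XT further down, the MI100 count is derived from the "40 fewer" statement, and the calculation ignores clocks, architecture differences and memory, so it is not a performance prediction.

```python
# Rough CU-count comparison only; it ignores clocks, architecture and memory.
BIG_NAVI_CUS = 80              # "Compute Unit: 80" in the rocminfo output in this thread
MI100_CUS = BIG_NAVI_CUS + 40  # "40 fewer CUs than the MI100" implies 120


def cu_ratio(small: int, large: int) -> float:
    """Return the fraction of compute units the smaller part has."""
    return small / large


ratio = cu_ratio(BIG_NAVI_CUS, MI100_CUS)
print(f"CU ratio: {ratio:.1%}")  # prints "CU ratio: 66.7%"
```

So the 50-60% estimate sits a little below the plain CU ratio, which is consistent with expecting some loss from the architectural differences.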

merry · December 21, 2020, 11:45pm

CDNA and RDNA2 CUs are quite different: CDNA is an evolution of Vega20, while RDNA/2 is streamlined for gaming/rendering. They shed much of the compute potential when creating RDNA in order to better compete with Nvidia in the consumer space, which unfortunately brought over many of the drawbacks consumer Nvidia usually has, but luckily not to the same degree where it counts.

Which architecture is better really depends on workload. CDNA is the doomslayer when it comes to workloads that scale by FP64, memory bandwidth and/or memory capacity; I don’t know how much catching up, if any, AMD has to do for ML. RDNA2 is built for rendering/gaming, but I am surprised how well a 5700XT does with gpuowl (not the same class as a Radeon VII but not a million miles off). The good memory bandwidth and AMD-standard 1:16 FP64 ratio seem to allow it to punch above its weight in this particular workload. Even if a 6900XT ends up “just” performing close to twice as well as a 5700XT, it’s worth a closer look if it is as power efficient as it appears. The Infinity Cache could conceivably give an unnaturally large boost to this workload (low memory capacity but high bandwidth requirements) and tip it into the same class as a Radeon VII, but my optimism is leaking.

Praetorian · December 22, 2020, 5:19pm

Hi Everyone,

Beginning to test the ROCm support for the RX 6900 XT now. If it sucks, I will sell the GPU at the price I paid, if anyone is interested. If not, I will eBay it. Probably will get a 3090 after that.

Praetorian · December 22, 2020, 7:10pm

Unfortunately, I have nothing good to report so far. The GPU isn’t detected as an RX 6900 XT (see below), and rocm-smi and rocm-bandwidth-test both give “command not found”. I tried to run a tensorflow benchmark but I get an error:

/src/external/hip-on-vdi/rocclr/hip_code_object.cpp:120: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")
Fatal Python error: Aborted

Current thread 0x00007f4878539740 (most recent call first):
  File "/home/rys/vrocm20/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 711 in __init__
  File "/home/rys/vrocm20/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1596 in __init__
  File "/home/rys/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 3538 in setup
  File "tf_cnn_benchmarks.py", line 61 in main
  File "/home/rys/vrocm20/lib/python3.8/site-packages/absl/app.py", line 251 in _run_main
  File "/home/rys/vrocm20/lib/python3.8/site-packages/absl/app.py", line 303 in run
  File "tf_cnn_benchmarks.py", line 73 in <module>
Aborted (core dumped)

Agent 2                  
*******                  
  Name:                    gfx1030                            
  Uuid:                    GPU-XX                             
  Marketing Name:          Device 73bf                        
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          4096(0x1000)                       
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
  Chip ID:                 29631(0x73bf)                      
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2660                               
  BDFID:                   768                                
  Internal Node ID:        1                                  
  Compute Unit:            80                                 
  SIMDs per CU:            4                                  
  Shader Engines:          8                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      FALSE                              
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        64(0x40)                           
  Max Work-item Per CU:    2048(0x800)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16760832(0xffc000) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1030         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32

It feels like AMD will not support this card. I don’t want to own this card just for gaming; it is just not what I would do. Ehh… will see what else is out there…

Found some more information here: https://github.com/RadeonOpenCompute/ROCm/issues/1180
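
On the “command not found” for rocm-smi and rocm-bandwidth-test: one possible cause, and it is only a guess here, is that the tools are installed but their directory is not on PATH (or that the package providing them was never installed). A minimal sketch for checking this is below. It assumes the usual /opt/rocm/bin install prefix, which may differ on your system, and locate_tools is just a helper name made up for this example.

```python
import os
import shutil
from typing import Dict, Iterable, Optional

# Assumption: ROCm tools live under /opt/rocm/bin, the usual default prefix.
DEFAULT_EXTRA_DIRS = ("/opt/rocm/bin",)


def locate_tools(
    tools: Iterable[str],
    extra_dirs: Iterable[str] = DEFAULT_EXTRA_DIRS,
) -> Dict[str, Optional[str]]:
    """Map each tool name to its full path, or None if it cannot be found."""
    extra = list(extra_dirs)
    found: Dict[str, Optional[str]] = {}
    for tool in tools:
        path = shutil.which(tool)  # search the current PATH first
        if path is None and extra:
            # Fall back to the extra directories that may be missing from PATH.
            path = shutil.which(tool, path=os.pathsep.join(extra))
        found[tool] = path
    return found


if __name__ == "__main__":
    names = ["rocminfo", "rocm-smi", "rocm-bandwidth-test"]
    for name, path in locate_tools(names).items():
        print(f"{name}: {path or 'not found'}")
```

If a tool only shows up via the fallback directory, adding that directory to PATH should make the plain command work. If it is not found anywhere, the package is probably not installed.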

Praetorian · December 22, 2020, 7:48pm

Update 1
It seems that I was wrong: gfx1030 is the RX 6900 XT, and the GitHub issue linked above confirms future RDNA 2 support. Thus I think I will wait and see how everything develops and continue using my Radeon VII for ML tasks.

Stay tuned for more.
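
For anyone comparing their own card, here is a small sketch that pulls the GPU target name out of rocminfo-style text and checks it against a list of targets. The error message (“no binary for current devices”) suggests the TensorFlow build simply has no code compiled for gfx1030, but that reading is an assumption. The BUILT_FOR set below is hypothetical, made up for illustration and not taken from any real package, and parse_agents and gpu_targets are names invented for this example. SAMPLE is a trimmed copy of the output posted above.

```python
import re
from typing import Dict, List

# Trimmed copy of the rocminfo output posted earlier in this thread.
SAMPLE = """\
Agent 2
*******
  Name:                    gfx1030
  Uuid:                    GPU-XX
  Marketing Name:          Device 73bf
  Vendor Name:             AMD
  Device Type:             GPU
  Chip ID:                 29631(0x73bf)
  Compute Unit:            80
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx1030
"""

# Hypothetical list of targets a framework build was compiled for.
# This is NOT taken from any real package; check your own build.
BUILT_FOR = {"gfx803", "gfx900", "gfx906", "gfx908"}


def parse_agents(text: str) -> List[Dict[str, str]]:
    """Split rocminfo-style text into one dict per agent.

    Only the first value seen for each key is kept, so the agent-level
    "Name" is not overwritten by the nested ISA "Name" line.
    """
    agents: List[Dict[str, str]] = []
    current = None
    for line in text.splitlines():
        if re.match(r"^\s*Agent \d+\s*$", line):
            current = {}
            agents.append(current)
            continue
        if current is None or ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if value:
            current.setdefault(key, value)
    return agents


def gpu_targets(text: str) -> List[str]:
    """Return the agent names (e.g. gfx1030) of all GPU agents."""
    return [a["Name"] for a in parse_agents(text) if a.get("Device Type") == "GPU"]


for target in gpu_targets(SAMPLE):
    status = "covered" if target in BUILT_FOR else "no binary for this target"
    print(f"{target}: {status}")  # prints "gfx1030: no binary for this target"
```

To run it against a real machine, replace SAMPLE with the captured output of rocminfo. The “Marketing Name: Device 73bf” line just means the chip ID had no friendly name in that release; 0x73bf is the same value as the decimal 29631 shown under Chip ID.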

Agustin_Aguilar · January 24, 2021, 5:53pm

@Praetorian did you have any luck with ROCm 4.0? I managed to install it and it seems to work with OpenCL, but I get the same hipErrorNoBinaryForGpu error when trying to run tensorflow.