I am struggling to understand where I am going wrong with installing ROCm on Ubuntu 24. I am following the AMD guide for installing ROCm, but I seem to be cursed, as the MI100 cards are not detected. The install appears to complete without errors, but when I run rocminfo it returns an error that “ROCk module not loaded” and none of the MI100 cards are detected. Where this gets weird is that I had this working before, with all the cards functioning properly, but I was having OS issues with remote desktop, so I did a clean install and now I cannot get it working again.
I am not an Ubuntu native, so I may be missing something crucial, but for the life of me I am not sure what it would be. I am happy to provide more data, but at this point I am not sure what information would be helpful.
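For anyone hitting the same rocminfo error, a rough Python sketch like the one below gathers the basics that are usually worth posting: the running kernel, whether the amdgpu kernel module is loaded, whether /dev/kfd exists, and which render nodes are present under /dev/dri. The function names (loaded_modules, rocm_driver_report) are made up for this illustration, and it assumes a standard Linux /proc and /dev layout. It only reports state and does not fix anything.

```python
import os
import platform


def loaded_modules(proc_modules="/proc/modules"):
    """Return the set of kernel module names listed in /proc/modules."""
    try:
        with open(proc_modules) as fh:
            # The first whitespace-separated field on each line is the module name.
            return {line.split()[0] for line in fh if line.strip()}
    except OSError:
        # File missing or unreadable (for example, not running on Linux).
        return set()


def rocm_driver_report(proc_modules="/proc/modules", kfd="/dev/kfd", dri="/dev/dri"):
    """Collect a few facts about the amdgpu driver state (hypothetical helper)."""
    try:
        render_nodes = sorted(n for n in os.listdir(dri) if n.startswith("renderD"))
    except OSError:
        render_nodes = []
    return {
        "kernel": platform.release(),
        "amdgpu_loaded": "amdgpu" in loaded_modules(proc_modules),
        "kfd_present": os.path.exists(kfd),
        "render_nodes": render_nodes,
    }


if __name__ == "__main__":
    for key, value in rocm_driver_report().items():
        print(f"{key}: {value}")
```

If amdgpu_loaded comes back False, the kernel driver is the first place to look, before anything in the ROCm user-space packages.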
It may be worth live-booting another version of Ubuntu, or another distro, to see whether the device shows up properly there. What does lsblk return? The driver module may also not be loaded in the kernel that you’re running; you can use the same live-boot method to try different kernels, and maybe you get lucky.
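To check whether the running kernel sees the cards at all, independent of ROCm, a small sketch along these lines walks the PCI devices in sysfs and lists anything carrying AMD's GPU vendor ID (0x1002), along with the kernel driver bound to it, if any. The helper names (read_text, amd_pci_devices) are invented for the example, and it assumes the usual /sys/bus/pci/devices layout.

```python
import os

AMD_VENDOR_ID = "0x1002"  # PCI vendor ID used by AMD/ATI graphics devices


def read_text(path):
    """Return the stripped contents of a small sysfs file, or None on error."""
    try:
        with open(path) as fh:
            return fh.read().strip()
    except OSError:
        return None


def amd_pci_devices(sysfs_root="/sys/bus/pci/devices"):
    """List AMD PCI devices and the kernel driver bound to each (hypothetical helper)."""
    found = []
    try:
        slots = sorted(os.listdir(sysfs_root))
    except OSError:
        return found
    for slot in slots:
        dev_dir = os.path.join(sysfs_root, slot)
        if read_text(os.path.join(dev_dir, "vendor")) != AMD_VENDOR_ID:
            continue
        # "driver" is a symlink to the bound driver; it is absent if none is bound.
        driver_link = os.path.join(dev_dir, "driver")
        driver = None
        if os.path.exists(driver_link):
            driver = os.path.basename(os.path.realpath(driver_link))
        found.append({
            "slot": slot,
            "device_id": read_text(os.path.join(dev_dir, "device")),
            "driver": driver,
        })
    return found


if __name__ == "__main__":
    for dev in amd_pci_devices():
        print(dev)
```

Running this from a live session and from the installed system makes the two easy to compare: a card that is listed with driver None is visible on the bus but has no kernel driver bound to it.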
Hope you get your MI100 equipment running!
Thanks for the reply. I was able to load an old version of Ubuntu in live mode and it seemed to work, but I ran into installation issues that I have not been able to work out, so I can't say for sure.
I have too many variables right now. If I could get Ubuntu to install consistently, that would really help my troubleshooting.
I think I may have found my issue with reinstalling Ubuntu. It seems the installer would not properly wipe the drive and would error out. I rebuilt the array and was able to get the OS installed again. After that, I was able to get ROCm installed as well. I have no idea what changed; I followed the official docs as before, and for unknown reasons it worked this time.
My major issue with the OS right now is that remote desktop is not working, so I am having to do everything from the LOM >.<
Glad you got it working, minus the remote desktop issue.