Hello. I’m sorry if this is in the wrong category, but I was wondering if there is a way to do GPU VRAM error reporting in Linux. My GPU keeps kicking out errors, but I have no alternative to HWiNFO, since it is Windows-only. Is there a solution for this, or will I have to build my own?
And if so, can some of the more experienced coders assist me? I don’t have loads of experience with this sort of thing.
Thanks in advance!
Type this in a terminal:
dmesg | grep gpu
where gpu is the name of your GPU manufacturer, e.g. amd or nvidia.
Logs are also stored under /var/log.
For example, this is what I get.
luke@Mercury ~> dmesg | grep amd
[ 0.678709] [drm] amdgpu kernel modesetting enabled.
[ 0.681527] fb: switching to amdgpudrmfb from EFI VGA
[ 0.681723] amdgpu 0000:01:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
[ 0.681949] amdgpu 0000:01:00.0: VRAM: 4096M 0x0000000000000000 - 0x00000000FFFFFFFF (4096M used)
[ 0.681950] amdgpu 0000:01:00.0: GTT: 4096M 0x0000000100000000 - 0x00000001FFFFFFFF
[ 0.681992] [drm] amdgpu: 4096M of VRAM memory ready
[ 0.681992] [drm] amdgpu: 4096M of GTT memory ready.
[ 0.683166] amdgpu 0000:01:00.0: amdgpu: using MSI.
[ 0.683176] [drm] amdgpu: irq initialized.
[ 0.686924] amdgpu: powerplay initialized
[ 0.687634] amdgpu 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000100000008, cpu addr 0xffff9a42438f8008
[ 0.687651] amdgpu 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000100000018, cpu addr 0xffff9a42438f8018
[ 0.687663] amdgpu 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000100000028, cpu addr 0xffff9a42438f8028
[ 0.687679] amdgpu 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000100000038, cpu addr 0xffff9a42438f8038
[ 0.687692] amdgpu 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000100000048, cpu addr 0xffff9a42438f8048
[ 0.687743] amdgpu 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000100000058, cpu addr 0xffff9a42438f8058
[ 0.687756] amdgpu 0000:01:00.0: fence driver on ring 6 use gpu addr 0x0000000100000068, cpu addr 0xffff9a42438f8068
[ 0.687770] amdgpu 0000:01:00.0: fence driver on ring 7 use gpu addr 0x0000000100000078, cpu addr 0xffff9a42438f8078
[ 0.687784] amdgpu 0000:01:00.0: fence driver on ring 8 use gpu addr 0x0000000100000088, cpu addr 0xffff9a42438f8088
[ 0.687820] amdgpu 0000:01:00.0: fence driver on ring 9 use gpu addr 0x0000000100000098, cpu addr 0xffff9a42438f8098
[ 0.687834] amdgpu 0000:01:00.0: fence driver on ring 10 use gpu addr 0x00000001000000a8, cpu addr 0xffff9a42438f80a8
[ 0.688382] amdgpu 0000:01:00.0: fence driver on ring 11 use gpu addr 0x0000000000881e90, cpu addr 0xffffbc104343fe90
[ 0.688461] amdgpu 0000:01:00.0: fence driver on ring 12 use gpu addr 0x00000001000000c8, cpu addr 0xffff9a42438f80c8
[ 0.688475] amdgpu 0000:01:00.0: fence driver on ring 13 use gpu addr 0x00000001000000d8, cpu addr 0xffff9a42438f80d8
[ 0.688490] amdgpu 0000:01:00.0: fence driver on ring 14 use gpu addr 0x00000001000000e8, cpu addr 0xffff9a42438f80e8
[ 1.044424] fbcon: amdgpudrmfb (fb0) is primary device
[ 1.044470] amdgpu 0000:01:00.0: fb0: amdgpudrmfb frame buffer device
[ 1.059805] [drm] Initialized amdgpu 3.9.0 20150101 for 0000:01:00.0 on minor 0
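Most of that output is normal driver start-up chatter, so it can help to narrow it down to lines that look like problems. Here is a rough sketch of how that could be done. The function name gpu_errors and the keyword list are my own assumptions, not anything the driver defines, so adjust them to whatever your card actually logs.

```shell
#!/bin/sh
# Hypothetical helper: reads kernel log text on stdin and keeps only lines
# that mention the given driver name AND contain an error-like keyword.
# $1 = string to match for the driver (assumed examples: amdgpu, nvidia)
# The keyword list below is a guess, not an official list of error strings.
gpu_errors() {
    grep -i -- "$1" | grep -iE 'error|fail|fault|timeout|invalid'
}

# Usage (reading the kernel log may need root on some systems):
#   dmesg | gpu_errors amdgpu
# On systemd-based systems the kernel log is also available via journalctl:
#   journalctl -k | gpu_errors amdgpu
```

Run against the output above, this should leave only the "Invalid PCI ROM header signature" line. If nothing comes back, the driver has not logged anything matching those keywords since boot, which is not the same as the card being error-free.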
First of all, thanks for the reply! I did that and got what you showed, but I didn’t seem to find any GPU mem errors. Are my cards just not throwing any errors, or is there another place I should look?