GPU VRAM Error Reporting

Hello. I’m sorry if this is in the wrong category, but I was wondering if there is a way to do GPU VRAM error reporting on Linux. My GPU keeps kicking out errors, but I have no alternative to HWiNFO, since it is Windows-only. Is there a solution for this, or will I have to build my own?
And if so, can some of the more experienced coders assist me? I don’t have loads of experience with this sort of thing.

Thanks in advance!

Type this in a terminal:

dmesg | grep gpu

where gpu is the name of your GPU manufacturer, e.g. amd or nvidia.

Logs are also stored under /var/log.
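If the plain grep returns too much, a rough sketch like the one below can narrow it down to lines that mention a GPU driver and also look like a problem. The function name filter_gpu_errors and both keyword lists are my own assumptions, not anything standard, so adjust them for your driver. It only does text matching on the kernel log; it cannot detect errors the driver never reports.

```shell
#!/bin/sh
# filter_gpu_errors (hypothetical helper): reads kernel log text on stdin and
# keeps lines that mention a GPU driver AND an error-like keyword.
# Both patterns are assumptions; extend them to suit your hardware.
filter_gpu_errors() {
    grep -iE 'amdgpu|radeon|nvidia|NVRM|nouveau|i915' \
        | grep -iE 'error|fail|fault|timeout|invalid|xid'
}

# Typical use (reading the kernel log may need root on some systems):
#   dmesg | filter_gpu_errors
#   journalctl -k -b | filter_gpu_errors
```

An empty result only means nothing in the log matched these keywords, not that the memory is healthy.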


For example, this is what I get.

~> dmesg | grep amd
[    0.678709] [drm] amdgpu kernel modesetting enabled.
[    0.681527] fb: switching to amdgpudrmfb from EFI VGA
[    0.681723] amdgpu 0000:01:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
[    0.681949] amdgpu 0000:01:00.0: VRAM: 4096M 0x0000000000000000 - 0x00000000FFFFFFFF (4096M used)
[    0.681950] amdgpu 0000:01:00.0: GTT: 4096M 0x0000000100000000 - 0x00000001FFFFFFFF
[    0.681992] [drm] amdgpu: 4096M of VRAM memory ready
[    0.681992] [drm] amdgpu: 4096M of GTT memory ready.
[    0.683166] amdgpu 0000:01:00.0: amdgpu: using MSI.
[    0.683176] [drm] amdgpu: irq initialized.
[    0.686924] amdgpu: powerplay initialized
[    0.687634] amdgpu 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000100000008, cpu addr 0xffff9a42438f8008
[    0.687651] amdgpu 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000100000018, cpu addr 0xffff9a42438f8018
[    0.687663] amdgpu 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000100000028, cpu addr 0xffff9a42438f8028
[    0.687679] amdgpu 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000100000038, cpu addr 0xffff9a42438f8038
[    0.687692] amdgpu 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000100000048, cpu addr 0xffff9a42438f8048
[    0.687743] amdgpu 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000100000058, cpu addr 0xffff9a42438f8058
[    0.687756] amdgpu 0000:01:00.0: fence driver on ring 6 use gpu addr 0x0000000100000068, cpu addr 0xffff9a42438f8068
[    0.687770] amdgpu 0000:01:00.0: fence driver on ring 7 use gpu addr 0x0000000100000078, cpu addr 0xffff9a42438f8078
[    0.687784] amdgpu 0000:01:00.0: fence driver on ring 8 use gpu addr 0x0000000100000088, cpu addr 0xffff9a42438f8088
[    0.687820] amdgpu 0000:01:00.0: fence driver on ring 9 use gpu addr 0x0000000100000098, cpu addr 0xffff9a42438f8098
[    0.687834] amdgpu 0000:01:00.0: fence driver on ring 10 use gpu addr 0x00000001000000a8, cpu addr 0xffff9a42438f80a8
[    0.688382] amdgpu 0000:01:00.0: fence driver on ring 11 use gpu addr 0x0000000000881e90, cpu addr 0xffffbc104343fe90
[    0.688461] amdgpu 0000:01:00.0: fence driver on ring 12 use gpu addr 0x00000001000000c8, cpu addr 0xffff9a42438f80c8
[    0.688475] amdgpu 0000:01:00.0: fence driver on ring 13 use gpu addr 0x00000001000000d8, cpu addr 0xffff9a42438f80d8
[    0.688490] amdgpu 0000:01:00.0: fence driver on ring 14 use gpu addr 0x00000001000000e8, cpu addr 0xffff9a42438f80e8
[    1.044424] fbcon: amdgpudrmfb (fb0) is primary device
[    1.044470] amdgpu 0000:01:00.0: fb0: amdgpudrmfb frame buffer device
[    1.059805] [drm] Initialized amdgpu 3.9.0 20150101 for 0000:01:00.0 on minor 0

First of all, thanks for the reply! I did that and got output like what you showed, but I didn’t find any GPU memory errors. Are my cards just not throwing any errors, or is there another place I should look?

Thanks!