Hi. I’m not on Threadripper, but I was getting the same BadTLP, BadDLLP, etc. errors on a Ryzen 2700X build with an Asus Prime B450 Plus motherboard and Nvidia GTX 1660 graphics.
I tried setting pcie_aspm=off in GRUB, and while this did get rid of the errors, it caused a much worse problem: my M.2 NVMe drive controller would get stuck in a low-power state or something similar. When that happened, X would quit and my root partition would be remounted read-only, causing a whole lot of headache.
I found this problem was triggered for me mostly when running some mprime “P-1” workloads with a large percentage of memory allocated (8 of 16GB). This uses a lot of memory bandwidth, and for whatever reason seemed to be interfering with the NVMe like this.
Here is some dmesg output showing what it looked like when the NVMe controller went down for me:
[ 989.409598] perf: interrupt took too long (4979 > 4912), lowering kernel.perf_event_max_sample_rate to 40000
[ 1195.031765] fuse: init (API version 7.31)
[ 1327.328770] perf: interrupt took too long (6268 > 6223), lowering kernel.perf_event_max_sample_rate to 31750
[ 2238.284260] perf: interrupt took too long (7846 > 7835), lowering kernel.perf_event_max_sample_rate to 25250
[ 9117.462381] perf: interrupt took too long (9831 > 9807), lowering kernel.perf_event_max_sample_rate to 20250
[ 9261.476036] nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
[ 9261.603999] pci_raw_set_power_state: 19 callbacks suppressed
[ 9261.604009] nvme 0000:01:00.0: Refused to change power state, currently in D3
[ 9261.604430] nvme nvme0: Removing after probe failure status: -19
[ 9261.632241] print_req_error: I/O error, dev nvme0n1, sector 15247304 flags 100001
[ 9261.632255] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
[ 9261.729511] nvme nvme0: failed to set APST feature (-19)
[ 9261.739582] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
[ 9261.739591] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
[ 9261.739595] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
[ 9261.756670] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 4, rd 1, flush 0, corrupt 0, gen 0
[ 9261.756951] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 4, rd 2, flush 0, corrupt 0, gen 0
[ 9261.758061] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 4, rd 3, flush 0, corrupt 0, gen 0
[ 9261.758368] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 4, rd 4, flush 0, corrupt 0, gen 0
[ 9261.759112] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 4, rd 5, flush 0, corrupt 0, gen 0
[ 9261.759138] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 4, rd 6, flush 0, corrupt 0, gen 0
[ 9262.276359] Core dump to |/bin/false pipe failed
[ 9262.336595] resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
[ 9262.336817] caller _nv000939rm+0x1bf/0x1f0 [nvidia] mapping multiple BARs
[ 9262.975980] snd_hda_codec_hdmi hdaudioC0D0: HDMI: invalid ELD data byte 62
[ 9263.012987] Core dump to |/bin/false pipe failed
[ 9263.015801] Core dump to |/bin/false pipe failed
[ 9263.035986] snd_hda_codec_hdmi hdaudioC0D0: HDMI: invalid ELD data byte 1
[ 9263.134288] Core dump to |/bin/false pipe failed
[ 9265.580609] BTRFS: error (device nvme0n1p2) in btrfs_commit_transaction:2234: errno=-5 IO failure (Error while writing out transaction)
[ 9265.580610] BTRFS info (device nvme0n1p2): forced readonly
[ 9265.580611] BTRFS warning (device nvme0n1p2): Skipping commit of aborted transaction.
[ 9265.580612] BTRFS: error (device nvme0n1p2) in cleanup_transaction:1794: errno=-5 IO failure
[ 9265.580613] BTRFS info (device nvme0n1p2): delayed_refs has NO entry
[ 9292.708719] btrfs_dev_stat_print_on_error: 320 callbacks suppressed
[ 9292.708723] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 123, rd 208, flush 0, corrupt 0, gen 0
[ 9368.485780] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 124, rd 208, flush 0, corrupt 0, gen 0
[ 9577.728458] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 124, rd 209, flush 0, corrupt 0, gen 0
[ 9577.728508] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 124, rd 210, flush 0, corrupt 0, gen 0
[ 9577.728715] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 124, rd 211, flush 0, corrupt 0, gen 0
[ 9577.728768] Core dump to |/bin/false pipe failed
[ 9578.059425] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 124, rd 212, flush 0, corrupt 0, gen 0
[ 9578.059466] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 124, rd 213, flush 0, corrupt 0, gen 0
[ 9578.059531] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 124, rd 214, flush 0, corrupt 0, gen 0
[ 9578.059555] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 124, rd 215, flush 0, corrupt 0, gen 0
[ 9578.059574] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 124, rd 216, flush 0, corrupt 0, gen 0
[ 9578.059590] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 124, rd 217, flush 0, corrupt 0, gen 0
[ 9578.059604] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 124, rd 218, flush 0, corrupt 0, gen 0
[ 9608.872774] btrfs_dev_stat_print_on_error: 1 callbacks suppressed
[ 9608.872777] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 125, rd 219, flush 0, corrupt 0, gen 0
[ 9608.872797] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 126, rd 219, flush 0, corrupt 0, gen 0
[ 9608.872805] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 127, rd 219, flush 0, corrupt 0, gen 0
[11308.648706] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 127, rd 220, flush 0, corrupt 0, gen 0
[11308.648753] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 127, rd 221, flush 0, corrupt 0, gen 0
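For anyone wanting to check their own logs for the same pattern, here is a rough Python sketch that scans dmesg text for the "controller is down" line and the last BTRFS error counters. The function name `summarize` and the regexes are my own illustration, written against the line formats shown above; other kernel versions may word these messages differently.

```python
import re

# Patterns are based only on the dmesg lines quoted in this post;
# other kernels may format these messages differently (assumption).
NVME_DOWN = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\] nvme (?P<ctrl>\S+): controller is down"
)
BTRFS_ERRS = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\] BTRFS error \(device (?P<dev>\S+)\): "
    r"bdev \S+ errs: wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)
COUNTERS = ("wr", "rd", "flush", "corrupt", "gen")


def summarize(dmesg_text):
    """Return the first 'controller is down' timestamp and the last BTRFS counters."""
    controller_down_at = None
    last_btrfs_errs = None
    for line in dmesg_text.splitlines():
        line = line.strip()
        down = NVME_DOWN.match(line)
        if down:
            # Keep only the first occurrence: that is when the trouble started.
            if controller_down_at is None:
                controller_down_at = float(down.group("ts"))
            continue
        errs = BTRFS_ERRS.match(line)
        if errs:
            # Counters are cumulative, so the last matching line wins.
            last_btrfs_errs = {name: int(errs.group(name)) for name in COUNTERS}
    return {
        "controller_down_at": controller_down_at,
        "last_btrfs_errs": last_btrfs_errs,
    }
```

Feeding it the output above would report the controller going down at about 9261 seconds of uptime, followed by the write and read error counters climbing from there.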
This was on openSUSE Tumbleweed, which I was trying out, but I just switched over to Linux Mint 19.2 today since I’m more familiar with that. (It turns out the BadTLP, etc. errors show up on both distros, but I figured I would try Mint and see if the results were any different.)
So yeah, I’m definitely not going to try turning off ASPM again. I think I’ll just ignore these errors, as they don’t seem to be causing any real problems as far as I can tell. I might eventually try setting “pci=noaer” to hide them, but for now I’m fine just not thinking about them as long as my system isn’t crashing.
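If you are not sure which of these options your system actually booted with, a small sketch like the one below can tell you. It parses a kernel command line string and checks for pcie_aspm=off and for noaer, which is passed to the kernel as pci=noaer. The helper names `kernel_params` and `aer_reporting_disabled` are made up for this example, and the parsing is simplistic: it assumes no quoted values containing spaces.

```python
def kernel_params(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Bare flags such as 'quiet' map to None. If a name appears more than
    once, the last occurrence is kept. Quoted values with spaces are not
    handled (simplifying assumption).
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def aer_reporting_disabled(params):
    """True if 'noaer' is among the comma-separated pci= options."""
    pci_options = (params.get("pci") or "").split(",")
    return "noaer" in pci_options


def aspm_disabled(params):
    """True if the command line contains pcie_aspm=off."""
    return params.get("pcie_aspm") == "off"
```

On a running Linux system you could pass it the contents of /proc/cmdline, e.g. `kernel_params(open("/proc/cmdline").read())`, and then call the two check functions on the result.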
edit: BTW, my SSD is: Crucial P1 500GB 3D NAND NVMe PCIe M.2 SSD - CT500P1SSD8