First part yes, second part kind of. Desktop users' activity is mostly pretty predictable but, from a drive performance view that mainly looks at things in terms of either 1 MB sequential or 4 kB IOPS, there's a bunch of reading of files of a few hundred kB to a few MB that gets proxied as 1 MB sequential and a bunch of small file IO that gets proxied as 4 kB random access.
Neither model’s all that great but, as drive performance tends to be pretty flat above 128-256 kB, the sequential approximation’s often pretty decent. If you’re only going to pick one data point to represent the range from 1 byte to 128 kB, 4 kB isn’t a bad size either. The random part’s more questionable as IO from any one user’s likely pretty correlated. IOPS and random latency are measures that incline more towards cases like databases or badly fragmented files on hard drives than typical SSD activity.
There's also not much code doing multithreaded IO, or single threads capable of more than 1-2 GB/s. So many apps lack the IO performance to saturate a 3.5" hard drive or SATA SSD, much less the ~3.5 GB/s limit and lower latency of PCIe 3.0 x4 NVMes, and the ones that can often do bursty IO of a couple GB or less that's absorbed by OS or file system read prefetch and write caching, meaning drive latency and bandwidth are pretty well hidden.
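As a rough illustration of the caching point (file name and burst size are made up, and the numbers depend heavily on free RAM and OS writeback settings), a buffered write of a ~1 GB burst typically returns long before the data reaches the drive; it's the fsync that actually waits on the SSD:

```python
import os, time

# Illustrative only: push a ~1 GB burst through the OS page cache,
# then force it to the drive with fsync. The buffered writes usually
# complete long before the data hits the SSD; fsync is where the
# drive's write speed actually shows up.
burst = os.urandom(64 * 1024 * 1024)   # 64 MB chunk, written 16x ≈ 1 GB

with open("burst_test.bin", "wb") as f:
    t0 = time.perf_counter()
    for _ in range(16):
        f.write(burst)                 # lands in the page cache
    t1 = time.perf_counter()
    f.flush()
    os.fsync(f.fileno())               # now the drive has to keep up
    t2 = time.perf_counter()

print(f"buffered writes: {t1 - t0:.2f} s, fsync: {t2 - t1:.2f} s")
```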
The most common cases for typical(-ish) users to move sizeable amounts of data are probably initial backup syncs, video copies, and transferring image sets. Usually USB, Ethernet, SD card, and hard drive constraints mean those happen below 3.0 x4 performance, though USB4/Thunderbolt 3 is becoming more prevalent. Since these occasional transfers are usually in the hundreds of MB to tens of GB range, a corollary is that the ~400 and ~600 write cycle minimum lifespans typical of QLC and TLC specs are effectively infinite.
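To put a number on "effectively infinite": assuming a 1 TB drive averaging 20 GB of writes per day (both made-up figures for illustration; the 400/600 cycle ratings are the ones above), the rated endurance outlasts any realistic service life:

```python
# Back-of-envelope endurance math. Capacity and daily writes are
# illustrative assumptions; 400/600 are the P/E cycle ratings above.
capacity_gb = 1000          # 1 TB drive
daily_writes_gb = 20        # occasional transfers averaged per day

for name, cycles in [("QLC", 400), ("TLC", 600)]:
    tbw = capacity_gb * cycles / 1000               # rated terabytes written
    years = tbw * 1000 / daily_writes_gb / 365
    print(f"{name}: ~{tbw:.0f} TBW rating, ~{years:.0f} years at {daily_writes_gb} GB/day")
# QLC: ~400 TBW, ~55 years; TLC: ~600 TBW, ~82 years
```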
Next to no difference, yup. Speaking as someone who regularly does 1-2 TB sequential reads at the ~7 GB/s limit of PCIe 4.0 x4, it doesn’t mean much to me either. I have to get the 1-2 TB into place first, which is usually network or physical mail constrained (what is the bandwidth of a USB drive in a postal van?) and, in the best case of physically moving 4.0 x4 NVMes between M.2 sockets in different machines, drops to the upper bound of a ~1.5 GB/s cache folding rate once pSLC’s exhausted.
Once that's done there are a few initial processing steps that do 7 GB/s, with subsequent ones running out of desktop cores at 5-6 GB/s, getting stuck at ~3 GB/s because that's where the code libraries they need to call through peak with 24-32 threads, or hitting other constraints. With a 5.0 x4 it looks like the initial steps would run out of cores at 10-12 GB/s. But, if it's not an SM2508 or E31T drive, I'd probably have to throttle the workload below that to avoid overheating, even with good airflow and the drives under upper spec NGFF heatsinks.
I think 14 GB/s is not out of the question once 16 core Zen 6 is available. But reading 2 TB at 12 GB/s takes 167 s. At 14 GB/s it’s 143 s. Either way I’ll have switched to doing some other task while the job’s running. It’s unlikely I’ll happen to check back somewhere between 143 and 167 s.
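Spelled out, taking 2 TB as 2000 GB:

```python
# Read time for 2 TB at the two rates above (2 TB taken as 2000 GB)
size_gb = 2000
for rate_gb_s in (12, 14):
    print(f"{rate_gb_s} GB/s: {size_gb / rate_gb_s:.0f} s")   # 167 s and 143 s
```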
Diminishing returns are diminishing. The marketing numbers, not so much. 
100 MB/s is 25,000 4 kB IOPS. There aren't many real world workloads where one thread generates that many random requests for long enough that it matters. Like, when was the last time you said hey, I have 100 GB on this drive, I gotta have a 0.1% random sample of that data, and ZOMG I absolutely positively need this app code to have finished doing that five seconds ago?
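That conversion, for reference (drive specs use decimal units, so 1 MB = 1000 kB):

```python
# 100 MB/s of 4 kB random IO expressed as IOPS (decimal units, as in drive specs)
throughput_mb_s = 100
io_size_kb = 4
print(throughput_mb_s * 1000 / io_size_kb)   # 25000.0 IOPS
```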
Also, I think there's an erroneous conflation of Q1T1 and Q32T1 here. At Q1 DiskSpd/CrystalDiskMark issues IO synchronously, meaning the thread blocks until the 4 kB comes back, then issues the next IO. At Q2 and up it's async, meaning the thread issues multiple requests and then picks back up when the first of them returns. In IOPS-bottlenecked workloads that are implemented for performance (e.g. databases) the code's usually multithreaded async, so Q32T16's likely a more relevant measure.
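For a feel of the structural difference, a minimal Python sketch (file path, IO count, and thread count are arbitrary; it reads through the page cache, unlike DiskSpd/CrystalDiskMark, and a thread pool only approximates real async submission like io_uring or overlapped IO, so treat it as an illustration of queued vs serialized requests, not a benchmark):

```python
import os, random
from concurrent.futures import ThreadPoolExecutor

PATH = "testfile.bin"       # hypothetical pre-made test file
IO_SIZE = 4096
FILE_SIZE = os.path.getsize(PATH)
# 100,000 random 4 kB-aligned offsets
OFFSETS = [random.randrange(0, FILE_SIZE - IO_SIZE) & ~0xFFF for _ in range(100_000)]

def q1t1(fd):
    # Q1T1: one request in flight; the thread blocks on each 4 kB read
    # before issuing the next, so elapsed time ~= request count x latency.
    for off in OFFSETS:
        os.pread(fd, IO_SIZE, off)

def q1t16(fd, threads=16):
    # Rough stand-in for a Q32T16-style load: many requests outstanding
    # across threads, letting the drive service them in parallel.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(lambda off: os.pread(fd, IO_SIZE, off), OFFSETS))

fd = os.open(PATH, os.O_RDONLY)
q1t1(fd)
q1t16(fd)
os.close(fd)
```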