Speaking from somewhat earlier personal experience with one of MDPI’s apparently better journals (which I rather suspect Micromachines isn’t among), peer reviews are usually perfunctory: maybe skim the paper and jot down a few remarks, so perhaps 20 minutes of effort. That follows from MDPI management setting time from submission to online publication as their key performance metric. MDPI’s editorial staff isn’t dumb, so they’ve answered that mandate by increasingly optimizing for shoving articles onto the website.
Springer and Frontiers seem as bad as MDPI about quality control, if not worse. But they’ve been less clueless about managing the optics, so they haven’t taken as much of a reputational hit.
The Micron and Hynix datasheets I found don’t specifically say, but appear to give whole-stack HBM pJ/bit (not counting the IMC on the GPU/CPU). So I think those figures are probably comparable to calculating pJ/bit for a DIMM by dividing DDR5 SPD-reported power by the DIMM’s bandwidth. I was also able to find papers that look at Hopper power monitoring, including mentions of specific reporting for HBM, but unfortunately nothing I came across actually had results for just the HBM on its own.
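For concreteness, here’s a minimal sketch of that division. The ~10 W draw and the DDR5-4800 channel rate below are illustrative placeholders, not figures from any datasheet:

```python
def pj_per_bit(power_watts: float, bandwidth_gb_s: float) -> float:
    """Energy per transferred bit, in picojoules."""
    bits_per_second = bandwidth_gb_s * 1e9 * 8  # GB/s -> bit/s
    return power_watts / bits_per_second * 1e12  # J/bit -> pJ/bit

# Placeholder numbers, not from any datasheet: a DIMM drawing
# ~10 W while streaming at DDR5-4800's nominal 38.4 GB/s.
print(f"{pj_per_bit(10.0, 38.4):.1f} pJ/bit")  # -> 32.6 pJ/bit
```

The same division applied to a whole HBM stack’s power and bandwidth would give the kind of whole-stack pJ/bit figure the datasheets seem to be quoting.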
It’s not apparent to me that Synopsys’ PHY power reporting controls for process node. That’s partially appropriate given generational node shrinks, but it also leaves latitude to cherry-pick comparisons for marketing purposes, since (at least some of) the PHYs they sell support a fairly wide range of nodes. Their graphs seem generally relevant and broadly indicative, but I’ve occasionally found myself questioning how much weight to give the specifics.
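To show why that matters, a toy sanity check: assume (purely for illustration) that dynamic I/O energy scales roughly with VDD², and back the node advantage out of a cross-node comparison. The node/voltage pairs and pJ/bit figures are made up, not Synopsys data:

```python
# Toy normalization: rescale PHY pJ/bit figures quoted on
# different process nodes to a common reference. The VDD^2
# scaling and the example voltages are illustrative
# assumptions, not vendor data.
NOMINAL_VDD = {"16nm": 0.80, "7nm": 0.75, "5nm": 0.70}  # placeholder volts

def normalized_pj_bit(pj_bit: float, node: str, ref: str = "5nm") -> float:
    """Crudely rescale a pJ/bit figure to a reference node via VDD^2."""
    return pj_bit * (NOMINAL_VDD[ref] / NOMINAL_VDD[node]) ** 2

# A "newer PHY is 30% better" claim shrinks once the node
# advantage is backed out:
old = normalized_pj_bit(2.0, "16nm")  # older PHY on an older node
new = normalized_pj_bit(1.4, "5nm")   # newer PHY on a newer node
print(f"node-adjusted: {old:.2f} vs {new:.2f} pJ/bit")  # 1.53 vs 1.40
```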
If I put on a tinfoil hat, the answer’s Nvidia. Conspiracy theories aside, HBM only alleviates bandwidth-bound memory workloads if you can keep the right data in it at sufficiently low latency. Doing a good job of that presumably means reworking core designs to effectively utilize HBM alongside DDR, which suggests either the HBM won’t end up being all that effective or AMD and Intel would have to pull engineering time off less niche CPUs. I also suspect width scaling’s lower than with GPUs, both from the shape of CPU workloads and from having to add, or maybe reassign, memory channels for HBM3.
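Back-of-envelope arithmetic on that width point, using nominal spec rates (HBM3 at 6.4 Gb/s per pin on a 1024-bit stack interface, DDR5-4800 on 64-bit channels). The stack and channel counts are hypothetical, and framing HBM width in DDR5-channel equivalents is my assumption, not anything a vendor has described:

```python
# Nominal peak bandwidths from the public specs; the channel
# trade-off framing is a back-of-envelope assumption.
HBM3_STACK_GBS = 6.4 * 1024 / 8       # 6.4 Gb/s/pin * 1024 pins -> 819.2 GB/s
DDR5_4800_CHANNEL_GBS = 4.8 * 64 / 8  # 4.8 Gb/s/pin * 64 pins  -> 38.4 GB/s

stacks = 4     # hypothetical HBM3 stack count on a CPU package
channels = 12  # e.g. a 12-channel DDR5 socket
hbm = stacks * HBM3_STACK_GBS
ddr = channels * DDR5_4800_CHANNEL_GBS
print(f"HBM3: {hbm:.0f} GB/s, DDR5: {ddr:.0f} GB/s ({hbm / ddr:.1f}x), "
      f"~{hbm / DDR5_4800_CHANNEL_GBS:.0f} DDR5-channel equivalents of width")
```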
For EPYC it probably makes more sense to focus on v-cache, as it’s more versatile across CPU workloads and doesn’t compete with CDNA for HBM allocations. Plus I’d guess there aren’t a lot of customers willing to pay enough for an HBM EPYC that AMD could make the same margins on it as on Instinct; that seems to fit with how HBv5 turned out. For Xeon, Intel’s priorities look more like getting off Intel 10/Intel 7 to more competitive nodes, reducing costs, dealing with layoffs, and not having to compete with AMD and Nvidia for HBM allocation. Plus some of what I’ve read suggests Xeon Max was motivated by increasing HPC performance when clustered with Ponte Vecchio, so it might be end of line because of the switch to Jaguar.
GPU adoption seems to me to have been simpler, as HBM’s a straightforward upgrade from GDDR. Not exactly drop-in, but fairly close.