Funny you should mention that: the FEA benchmark can switch between the relatively new AOCL 4.1.1 that AMD made for Zen 4 and the 2022 version of the Intel MKL, and the Intel BLAS outperforms the AMD BLAS more often than not, even on EPYC/TR.
This is the thread:
To be fair, the expected performance difference isn't entirely down to cross-CCD communication bottlenecks; Amdahl's law is somewhat to blame too.
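
For anyone who wants to put rough numbers on the Amdahl's law part, here is a minimal Python sketch. The 95% parallel fraction and the core counts are made-up illustration values, not measurements from the FEA benchmark, and `amdahl_speedup` is just a name I picked.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup from Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical: assume 95% of the solver's runtime parallelizes perfectly.
    p = 0.95
    for cores in (8, 16, 32, 64, 96):
        print(f"{cores:3d} cores: at most {amdahl_speedup(p, cores):.2f}x")
```

With those assumed numbers, 64 cores tops out at roughly 15x and no core count can ever get past 20x, so a chunk of the scaling loss would show up even with zero cross-CCD latency.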