I tested performance with 4, 6 and 8 cores. I currently give all 8 cores to the guest, since that gave the best performance.
Storage and memory usage on the guest is low. In game, CPU usage ranges from 40% to 80% on both host and guest.
I tried different caching modes with a qcow2 image, then switched to an LVM partition, which didn't make a difference.
After reading your post, I set noatime on my partitions and selected the deadline I/O scheduler through GRUB. Unfortunately, it made no difference in my setup; disk I/O within the guest was very low to begin with.
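Since neither change helped, it may be worth confirming that both actually took effect. Below is a minimal Python sketch, assuming a Linux system with sysfs and procfs mounted: the active I/O scheduler is the entry shown in [brackets] in /sys/block/<device>/queue/scheduler, and mount options are the fourth field of each line in /proc/mounts. The function names and the default device `sda` are hypothetical. On newer multiqueue kernels the scheduler may be listed as `mq-deadline` rather than `deadline`.

```python
from pathlib import Path


def active_scheduler(sysfs_text: str) -> str:
    """Return the scheduler marked with [brackets] in a sysfs scheduler line."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active scheduler in {sysfs_text!r}")


def mount_options(mounts_text: str, mountpoint: str) -> set[str]:
    """Return the option set for `mountpoint` from /proc/mounts-style text."""
    found = None
    for line in mounts_text.splitlines():
        # Fields: device, mountpoint, fstype, options, dump, pass
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint:
            # Keep the last match: a later mount on the same path shadows earlier ones.
            found = set(fields[3].split(","))
    if found is None:
        raise KeyError(mountpoint)
    return found


def check_disk_tuning(device: str = "sda", mountpoint: str = "/") -> None:
    """Print the live scheduler for `device` and whether `mountpoint` uses noatime."""
    sched = Path(f"/sys/block/{device}/queue/scheduler").read_text()
    mounts = Path("/proc/mounts").read_text()
    print(device, "scheduler:", active_scheduler(sched))
    print(mountpoint, "noatime:", "noatime" in mount_options(mounts, mountpoint))
```

Calling `check_disk_tuning("sdb", "/home")` (substituting your own device and mount point) reads the live values rather than what GRUB or fstab were meant to apply.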
*Note, however, that I am sharing my drives (an SSD and a regular HDD) between the guest and host, and I have never been able to try a separate drive for the guest.
Honestly, I have not yet compared the CPU features enabled on the host with those exposed to the guest. I still have to check this.
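One way to do that comparison is to diff the two flag lists. This is a rough Python sketch, assuming both sides are available as /proc/cpuinfo text. The guest here presumably runs Windows (it runs DirectX games) and has no /proc/cpuinfo, so this assumes the guest's list is captured some other way in the same format, for example by booting a Linux live image under the identical VM definition. The function names are illustrative only.

```python
def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect CPU feature flags from the 'flags' lines of /proc/cpuinfo text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def compare_flags(host_text: str, guest_text: str) -> tuple[set[str], set[str]]:
    """Return (flags only on the host, flags only in the guest)."""
    host, guest = cpu_flags(host_text), cpu_flags(guest_text)
    return host - guest, guest - host
```

Flags that appear only on the host are features the guest CPU model is not exposing.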
I configured the guest CPU topology to match the output of lscpu and virsh capabilities. One reports the AMD FX-8350 as 4 cores with 2 threads each, the other as 8 cores with 1 thread each. I tried both layouts, with no difference.
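Both reported layouts expose the same 8 vCPUs and differ only in how the cores are grouped. As a hedged sketch, this Python snippet builds the libvirt `<cpu><topology .../></cpu>` fragment for a given layout and checks that sockets × cores × threads still equals 8. The function names are hypothetical; the `sockets`, `cores` and `threads` attributes are the ones used by libvirt's domain XML `<topology>` element.

```python
import xml.etree.ElementTree as ET


def cpu_topology_xml(sockets: int, cores: int, threads: int) -> str:
    """Build a libvirt <cpu> fragment containing only a <topology> element."""
    cpu = ET.Element("cpu")
    ET.SubElement(cpu, "topology", sockets=str(sockets),
                  cores=str(cores), threads=str(threads))
    return ET.tostring(cpu, encoding="unicode")


def vcpus_in(xml_text: str) -> int:
    """Multiply sockets, cores and threads from a <cpu> fragment."""
    topo = ET.fromstring(xml_text).find("topology")
    return int(topo.get("sockets")) * int(topo.get("cores")) * int(topo.get("threads"))


# The two layouts from the post: 4 cores x 2 threads, and 8 cores x 1 thread.
for layout in [(1, 4, 2), (1, 8, 1)]:
    print(cpu_topology_xml(*layout), vcpus_in(cpu_topology_xml(*layout)))
```

If the product does not match the domain's `<vcpu>` count, libvirt will typically reject the definition, so this is a quick sanity check before editing the XML with `virsh edit`.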
Another thing to note: enabling the motherboard's "auto" CPU overclock gave a very slight performance increase, around 1-2 fps. Increasing the guest's RAM allocation from 8 GB to 10 GB, with hugepages set to match, also gave a very minor increase.
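Setting hugepages to match is simple arithmetic: divide the guest's memory by the hugepage size. A minimal sketch, assuming the common x86-64 default of 2 MiB hugepages (the actual size is the Hugepagesize line in /proc/meminfo) and treating "10 GB" as 10 GiB. The function names are illustrative; the result is the value that would go into the `vm.nr_hugepages` sysctl.

```python
def hugepage_kib(meminfo_text: str) -> int:
    """Read the hugepage size in KiB from /proc/meminfo text."""
    for line in meminfo_text.splitlines():
        if line.startswith("Hugepagesize:"):
            return int(line.split()[1])  # e.g. "Hugepagesize:  2048 kB"
    raise ValueError("Hugepagesize not found")


def hugepages_needed(guest_mib: int, page_kib: int = 2048) -> int:
    """Number of hugepages that cover `guest_mib` of guest RAM, rounded up."""
    guest_kib = guest_mib * 1024
    return -(-guest_kib // page_kib)  # ceiling division


print(hugepages_needed(8 * 1024))   # 8 GiB guest
print(hugepages_needed(10 * 1024))  # 10 GiB guest
```

With 2 MiB pages this prints 4096 and 5120, so going from 8 GB to 10 GB means reserving 1024 more pages on the host.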
As for where the bottleneck lies, I am really not sure. Newer games using higher DirectX versions run very well, close to bare metal, and Doom, the only Vulkan game I own, runs like bare metal. The problem is only with older DirectX games, which run horribly.