Hyper-Threading vs. logical cores

This question is mostly aimed at Wendell and others with his in-depth knowledge of the inner workings of CPUs, but what specifically is the difference between hyperthreading and logical cores? I've had a couple of friends who say there is no difference, but when I talked to a senior DBA about hardware, he pointed out that when his company switched from a quad-core (4 physical cores) to a hyperthreaded dual-core, certain process runtimes lagged by almost 20%.

Also, would this comment be suitable for the INBOX.EXE forum?

Think of it this way. You have a bank, with four tellers (cores). Many customers (threads) are waiting to get serviced (CPU time on a core). These tellers are pretty bright; they can each multi-task (hyperthreading) between two customers (threads on each logical core). They can each have two customers (threads) at their window, but can only work with one customer at a time (there is, after all, only one physical core per two logical cores of the CPU). But if that customer has to fill out a form (the thread stalls, or sleeps, or has used its fair share of CPU time), the teller can work on the next customer (thread) in their line (the two hyperthreads per core). They simply make a note of what they are doing with the first customer (the thread's CPU state), put it aside (each core in the CPU has two areas to track state), and start working on the second customer (thread). The tellers (cores) can work very efficiently with these two customers (threads), and can do so for as long as the bank manager (Operating System) lets them, because they basically have everything they need to know about those two customers (threads) right at their fingertips (the two areas of each core that hold this information).


Above all of this is the bank manager (the OS). The manager decides which two customers (threads) are at each teller (core) at any given time, and can swap one of those customers (threads) with another waiting customer (thread) in the bigger line of waiting customers (the whole thread pool for the OS).


Now it so happens that this swapping slows the servicing of the customers (threads), so the manager (OS) avoids it at all costs. In addition, the manager (OS) knows that the more work (threads) he can keep on as few tellers (cores) as possible, the better. In fact, if the manager (OS) can, he'll put a teller (core, real or logical) on break (CPU core parking), not having to pay them during this time (energy savings for the CPU). Even more interesting, the manager (OS, actually OS and CPU features) knows that if he can push as much work as possible onto a few tellers (cores), those tellers drink a big cup of coffee and work even faster than normal (turbo boost). The manager (OS/CPU features) knows that there's not enough coffee to go around to all the tellers (cores), and that all of the tellers (cores) can't be working at the faster rate at once, so he tries to keep as few tellers (cores) active as possible, so long as he thinks it won't affect the overall servicing of the customers (thread pool).



Playing with the manager's decisions (thread scheduling) by overriding him (setting thread affinity) can force all of the tellers (cores) to do work, even when the manager (OS/CPU features) would dictate that this is not the most efficient way to do things. It will likely have no effect on the rate of servicing of the whole customer collection (thread pool), and may in fact cause all of the tellers (cores) to work at normal speed (no turbo boost), slowing things down in reality.


The manager (OS) knows best, that's why you hired him. Barring him being drunk (a bug in the OS or CPU scheduling logic), he'll usually make better decisions than you.


Cited from: http://pcgamingtips.blogspot.com/2010/03/hyperthreading-explained-to-ten-year.html
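
Side note, since the article brings up overriding the manager (setting affinity): if you want to poke at that yourself, here's a minimal sketch using Python's os.sched_setaffinity, which is available on Linux. Which logical CPU IDs share a physical core varies from system to system, so the {0, 1} mask below is just an assumption for illustration.

```python
# Minimal sketch: overriding the "manager" by pinning this process
# to specific logical cores (CPU affinity). Linux-only API.
import os

# The set of logical cores the OS is currently allowed to run us on.
print("Allowed logical cores:", os.sched_getaffinity(0))

# Pin the current process (pid 0 = self) to logical cores 0 and 1.
# Whether those two are siblings on one physical core depends on the
# machine; check /sys/devices/system/cpu/cpu0/topology/thread_siblings_list.
os.sched_setaffinity(0, {0, 1})

print("After pinning:", os.sched_getaffinity(0))
```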

Okay, that's nice, but the article refers to logical cores and hyperthreading as two sides of the same coin. As if the tellers that can multi-task are functioning as two cores (hence logical cores), and the act of utilising this is known as hyperthreading (the multitasking itself). Is this true? And if so, which is more correct: to say they function as two cores and can be treated as two (the AMD way, hence the 8350), or that one core is multitasking (the Intel way, hence the hyperthreaded i3s, i5s, and i7s)?

With AMD's FX series, they are all physical cores; they just share other resources like cache within the same module (2 cores). With Intel's Hyper-Threading, one physical core just switches between 2 logical cores very quickly. One physical core can never actually be doing 2 things at the same time, but the rapid switching between them increases the overall output of that one physical core. So AMD has all physical cores that can function simultaneously, and Intel does not.
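
If you want to see that logical-vs-physical split on your own machine, here's a quick sketch, assuming Linux (the sysfs topology files it reads are standard there):

```python
# Count logical cores and show which ones share a physical core.
# Assumes Linux; reads the kernel's sysfs topology files.
import glob
import os

print("Logical cores the OS sees:", os.cpu_count())

# Each file lists the logical siblings living on one physical core
# (or on one module, on some AMD FX setups). Repeated pairs mean
# two logical cores are sharing the same hardware.
for path in sorted(glob.glob("/sys/devices/system/cpu/cpu*/topology/thread_siblings_list")):
    cpu = path.split("/")[5]  # e.g. "cpu0"
    with open(path) as f:
        print(cpu, "shares hardware with:", f.read().strip())
```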

Wrong. There are four, six, or eight discrete integer cores, each with its own arithmetic logic units (ALUs); however, on top of being grouped in twos and forced to share things like their cache, each pair also shares a single floating-point unit. This reduces their effectiveness, because operations such as floating-point division, square roots, and other FPU work must have an available floating-point unit, as the integer ALUs aren't capable of those operations.
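
If anyone wants to poke at that shared-FPU claim, here's a crude, hypothetical test: run the same number of worker processes on an integer-only loop and on a floating-point loop, and compare how they fare. Fair warning: Python's interpreter overhead will blur the contention a lot (a C version would show it far more clearly), and the worker count and loop sizes are arbitrary assumptions.

```python
# Crude sketch: compare ALU-only work vs FPU work under contention.
# On an FX chip, 4 workers = 2 modules, so the float loop has two
# processes fighting over each shared FPU.
import math
import time
from multiprocessing import Pool

def int_work(n):
    total = 0
    for i in range(1, n):
        total += (i * 7) >> 1   # multiply + bit shift: integer ALU only
    return total

def float_work(n):
    total = 0.0
    for i in range(1, n):
        total += math.sqrt(i)   # square root: goes through the FPU
    return total

if __name__ == "__main__":
    for fn in (int_work, float_work):
        start = time.perf_counter()
        with Pool(4) as pool:   # assumption: 4 workers on a 2-module FX
            pool.map(fn, [1_000_000] * 4)
        print(f"{fn.__name__}: {time.perf_counter() - start:.2f}s")
```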

I was bound to look like a fool. I went out on a limb on a topic I wasn't 100% sure about and here it is biting me in the ass. My apologies.

www.youtube.com/watch?v=7c3CfJe_6kQ

and giving steroids to your tellers is like overclocking

Here's my attempt at an analogy to explain hyperthreading. :P

Let's say my mouth is one "physical core" and I have one hand feeding it food (feeding the core data to be processed). If my mouth finishes processing a chunk of food before my hand can feed it more, that is wasted time where my mouth is doing nothing, waiting for more food. Now if we introduce a second hand getting food ready, there's little to no wasted time, because the two hands together can prepare food as fast as the mouth can process it.

It's basically a more efficient way of scheduling the work to be done/processed.
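
Here's a toy version of that idea in code, purely for illustration (the names and timings are made up; it models the analogy, not actual CPU internals): two "hands" keep a small queue stocked so the "mouth" never sits idle waiting for the next bite.

```python
# Toy model of the analogy: two producers ("hands") hide their prep
# latency by overlapping, so the consumer ("mouth") rarely waits.
import queue
import threading
import time

food = queue.Queue(maxsize=2)   # the mouth can only queue up two bites

def hand(name):
    for i in range(5):
        time.sleep(0.2)         # "preparing food" (fetching data) is slow
        food.put((name, i))
    food.put((name, None))      # sentinel: this hand is done

def mouth():
    done = 0
    while done < 2:             # run until both hands have finished
        name, item = food.get() # only blocks if BOTH hands are behind
        if item is None:
            done += 1
        else:
            print(f"chewing {name}-{item}")

threading.Thread(target=hand, args=("left",)).start()
threading.Thread(target=hand, args=("right",)).start()
mouth()
```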

Linus did a really good video explaining it pretty quick :)

https://www.youtube.com/watch?v=wnS50lJicXc

Not every Intel CPU has a separate FPU for every core, many ARM CPUs don't have an FPU for every core, and a graphics card has a lot of FPUs that can do that work for a CPU without them... AMD shares resources, and for some tasks that will bottleneck the cores, but in general, when it comes to multi-core scaling, AMD chips scale better than Intel HT CPUs because they have separate CPU cores, even if the performance per core is lower.

Now, this is all pretty theoretical. Multi-core scaling is held back on Windows machines but not on Linux machines, and because of that there is a performance difference between Intel and AMD on both platforms. In the real world, for most applications, there will hardly be any performance difference between an i5-2500, an i7-4770, and an FX-8350: in Windows maybe 1-2% in games, maybe 5-10% in CPU-bound applications. Synthetic benchmarks are not that representative of real-life performance; they are mostly a marketing thing.
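
For anyone who wants numbers instead of hand-waving, here's a rough sketch of a scaling test (the chunk sizes and worker counts are arbitrary assumptions): time a fixed CPU-bound workload with 1, 2, 4, and 8 worker processes and see how far from linear the speedup falls once you pass the physical core count.

```python
# Rough multi-core scaling test: same total work, more workers.
# On an HT chip, expect near-linear gains up to the physical core
# count and much smaller gains from the extra logical (HT) cores.
import time
from multiprocessing import Pool

def burn(n):
    # pointless integer math, just to keep one core busy
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    work = [2_000_000] * 8              # 8 equal chunks of work
    for workers in (1, 2, 4, 8):
        start = time.perf_counter()
        with Pool(workers) as pool:
            pool.map(burn, work)
        print(f"{workers} workers: {time.perf_counter() - start:.2f}s")
```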

Sit someone in front of a PC with even a Phenom II X4 965 or an i5-750, and then in front of an i7-4770 or an FX-9590, all with the same graphics card, and let them play a game. Do you honestly think they will notice the difference in framerate? Maybe when running MLL at extreme settings, and even then.

For the sake of the argument though, HT is an algorithm-based optimization technology that works pretty well, but much like branch prediction, it can also make bad choices in some applications. This is less of a factor with AMD FX processors, which have real cores: these are full application processors, full CPUs; the FPU and cache are peripheral resources, and sharing them doesn't make them any less functional as CPU cores. However, there is a reason why the burst performance of a Phenom II X4 955/965, with full periphery per core, will be better in most applications than the burst performance of, for instance, an FX-4350. The end conclusion is still that AMD should make a Phenom III X8.

That's basically where I got my analogy from. ;) It's the best explanation I've heard yet and have been using it ever since seeing that video. Thanks Linus. :) 

^That's a great visual representation of it. 

thanks 4chan /g/... tis where I got it from.

Yes, they really should. I would buy into that with no qualms. But they would need a new socket for that, and they'd need to work something out to get those chips onto mini-ITX form-factor boards. Intel is hogging that market, and I want an AMD-powered mini-ITX build in the EVGA Hadron Hydro.

Wouldn't each pair of AMD "cores" have a single pair of legs?