Okay, let's go:
(These are the basics)
Branch predictor: predicts what to fetch next. Haswell has a far superior branch predictor to Piledriver. This can be critical when running larger loads. If the branch predictor guesses wrong, it causes a pipeline flush: the work already in the pipeline behind that branch gets thrown away (ironic, right?), which is a bad thing and causes things to slow down.
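To make the "guess wrong, pay for it" part concrete, here is a minimal sketch of a textbook 2-bit saturating counter predictor. This is NOT how Haswell or Piledriver actually predict branches (their real predictors are far more sophisticated and not documented in this detail); the function name and the two test patterns are made up for illustration.

```python
import random

def mispredict_rate(outcomes):
    """Run one 2-bit saturating counter over a sequence of branch outcomes.

    Counter states 0-1 predict "not taken", states 2-3 predict "taken".
    Returns the fraction of branches that were predicted wrong.
    """
    counter = 2  # start in "weakly taken"
    misses = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken != taken:
            misses += 1  # on real hardware this is where the flush happens
        # Move the counter towards the actual outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses / len(outcomes)

# A loop branch: taken 9 times, then falls through once, repeated.
loop_pattern = ([True] * 9 + [False]) * 1000

# A data-dependent branch on random data (seeded so the run is repeatable).
rng = random.Random(0)
random_pattern = [rng.random() < 0.5 for _ in range(10_000)]

print(f"loop branch:   {mispredict_rate(loop_pattern):.1%} mispredicted")
print(f"random branch: {mispredict_rate(random_pattern):.1%} mispredicted")
```

The loop branch only misses on the loop exit (1 in 10), while the random branch is wrong roughly half the time. Every one of those misses would be a flush, which is why a predictable instruction stream matters so much.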
Fetch: fetches instructions and queues them.
Decoders: decode instructions. Intel and AMD use different kinds of decoders. Intel has 2 kinds: simple decoders and complex decoders, generally in a 3:1 ratio. The simple ones handle smaller instructions which don't require much, whereas the complex one handles longer and more complex instructions, which require more time.
AMD is using 4 of the same decoder, which are slightly stronger than Intel's simple decoders.
Now the execution part.
A Haswell core contains: 4 ALUs, 3 256-bit SIMD units.
A Piledriver module (2 cores) contains: 4 ALUs, 2 128-bit SIMD units.
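To put those SIMD widths in perspective, here is a quick back-of-the-envelope calculation using only the unit counts quoted above. It counts how many 32-bit floats fit in the SIMD units at once, nothing more: it ignores clock speed, which operations each unit can actually do, and everything else, so treat it as a rough sketch and not a throughput figure. The function name is made up.

```python
FLOAT_BITS = 32  # one single-precision float

def float_lanes(units, width_bits):
    """Single-precision floats that `units` SIMD units of `width_bits` hold at once."""
    return units * (width_bits // FLOAT_BITS)

# Unit counts taken from the post above, not from a datasheet.
haswell_core = float_lanes(units=3, width_bits=256)
piledriver_module = float_lanes(units=2, width_bits=128)

print("Haswell core:       ", haswell_core, "float lanes")
print("Piledriver module:  ", piledriver_module, "float lanes")
print("Piledriver per core:", piledriver_module / 2, "float lanes")
```

With these numbers a single Haswell core has three times the SIMD width of a whole Piledriver module, and the module has to split its share between two cores.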
Intel and AMD have different kinds of cores, and different concepts of what a core is.
One may argue that the FX 8xx0 isn't an 8-core, but in fact a 4-core. AMD is using CMT (clustered multithreading). AMD has cluster cores: 1 cluster core is 1 module.
The FX 8xx0 has 4 modules. Each module is a core with some components duplicated, like the ALUs and memory pipeline. (SR, Steamroller, did add additional decoders.) This means that the '2 cores' share the fetch, decoders (not in SR), FPU and L2 cache.
The general cluster concept is really strong, but AMD made some mistakes which pull it down on heavier loads.
Piledriver's branch predictor has a hard time on heavier loads, and to function at its fullest it needs to be running a predictable stream of instructions.
The decoders also had a hard time not starving the cores (the reason they added more in SR), which was noticeable when both cores ran at higher loads.
If you noticed, Intel has far better SIMD. This is Intel's strongest side, and the reason why an i5-4670K can outperform an 8350 in gaming and streaming.
Piledriver having to share its FPU can in some scenarios cause a huge drop in performance.
AMD cannot compete on the SIMD level, which is the primary reason they bought themselves into the HSA Foundation.
Not to mention cache management, where AMD is far behind.
The units are generally never all in use at the same time, which is essentially why hyper-threading exists.
Hyper-threading is Intel's implementation of SMT. It is as simple as this: let another thread run on the same core. Neither thread is given priority over the other. This can cost some bandwidth, but 9.5 times out of 10 it is faster.
Then there is the whole GHz thing: the number of cycles per second. Here you will notice differences between architectures.
AMD is using a longer pipeline, which naturally allows higher clock speeds, but it takes more cycles to get an instruction done (higher CPI, cycles per instruction). A shorter pipeline gets more done per cycle, so it performs better at the same clock speed.
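The clock speed vs. CPI trade-off is just the classic formula: time = instructions x CPI / clock rate. Here is a small sketch of it. The CPI and GHz values are hypothetical numbers I picked to show the effect; they are not measurements of Piledriver or Haswell.

```python
def run_time_seconds(instructions, cpi, clock_ghz):
    """Execution time = instruction count * cycles per instruction / clock rate."""
    return instructions * cpi / (clock_ghz * 1e9)

INSTRUCTIONS = 10_000_000_000  # the same program on both hypothetical chips

# Hypothetical numbers, chosen only to show the trade-off.
long_pipeline = run_time_seconds(INSTRUCTIONS, cpi=1.0, clock_ghz=4.0)
short_pipeline = run_time_seconds(INSTRUCTIONS, cpi=0.8, clock_ghz=3.5)

print(f"long pipeline,  4.0 GHz, CPI 1.0: {long_pipeline:.2f} s")
print(f"short pipeline, 3.5 GHz, CPI 0.8: {short_pipeline:.2f} s")
```

With these made-up values the lower-clocked chip finishes first, because the CPI advantage outweighs the clock deficit. That is why comparing GHz across architectures tells you very little.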
But the processor is not the one handing out the work.
To support multi-threading, the developer needs to code the application to use it (unless the compiler supports it by default). So if you code your application to use 8 cores, it will use 8 cores.
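As a rough sketch of what "coding it to use 8 cores" looks like: the program itself has to split the work into pieces and hand them to workers. The example below uses Python's standard concurrent.futures module with worker processes (standard CPython threads don't run CPU-bound code on several cores at once). The function names, the sum-of-squares workload and the worker count of 8 are all just illustration.

```python
from concurrent.futures import ProcessPoolExecutor

def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks of near-equal size."""
    step, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        stop = start + step + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks

def sum_of_squares(bounds):
    """CPU-bound work for one chunk."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))

def parallel_sum_of_squares(n, workers):
    """Hand each chunk to its own worker process and combine the partial sums."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, split_range(n, workers)))

if __name__ == "__main__":
    WORKERS = 8  # the "8 cores" case from the text; an assumption, not detected
    print(WORKERS, "workers:", parallel_sum_of_squares(2_000_000, WORKERS))
```

Note that nothing here happens automatically: if the developer never writes the splitting part, the program runs on one core no matter how many the CPU has.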
You also have to understand how Windows schedules its processes.
Windows tends to use the fewest cores it can, and will stack processes on those cores. Many people think Windows always spreads work out over as many cores as possible, but that is incorrect, and it would mean the cores wouldn't have the ability to sleep.
Then there is the ISA, but it would take days to explain each and every part of it.
Many people think that Haswell wasn't a huge upgrade from Ivy Bridge; they are wrong. Haswell has the potential to perform 30-60% better than Ivy Bridge if carefully coded for, and to peak at over 100%.
This will surely become the norm once compilers support the various Haswell features by default, so they won't need to be hand-coded.
Remember, this is only the basics.
Would like to add: AMD simply cannot fix these things. AMD is not manufacturing its own products, which is the reason they have only just reached 28nm.
Why Intel is pricier: Intel manufactures its own products, which is very costly. AMD used to manufacture its own products too; back then Intel and AMD had similar prices, with AMD slightly cheaper because both were running on the same architecture line-up.
Also, Intel is a far bigger company than AMD, advertising much more than AMD and having more employees. Intel is starting to push AMD more and more, forcing AMD to lower the prices on their processors.
Both companies sell defective parts. This is where Piledriver shines. 1 core is defective (for example)? Disable it and sell the chip as a 6xx0. Intel is doing the same thing, just not on the same scale. A 4770K with defective cache can be sold as a 4670K (with hyper-threading disabled, of course).
Sorry for the long essay.