Could anyone explain how this is better or different than hyper-threading on a CPU? Or is it just a form of super-threading layer over a multi-threaded API?
I understand this will be implemented in the 300 series.
My knowledge is admittedly limited, but I'll try to work this out.
This appears to be a bit different from hyper-threading: instead of activating a virtual/logical thread to handle lower-dependency instructions on the core, it just utilizes the available shader threads to complete an output with a mix of preemptive and out-of-order tasking.
I imagine the current model like this: shaders #1 - #8 all start working on an input at the same time, meaning the final string of information is assembled in sequence to form a legible output. So all threads have completed a task at the same time, but shader #1 outputs instruction #1 first and this cascades down the line, while they work on the next 10 sets of outputs in lockstep. The idle time being described is the release time of the output, which translates into the signal time of the input: no thread can begin a new operation until the preceding thread has signaled that it is done and the output is finished. So 1-2-3-4-5-6-7-8.
I imagine the asynchronous model is just different in that all threads are signaled in sequence after the first thread has received the input, instead of waiting for an output to signal another input. On top of this, each thread is given its own string of information to deal with by the command processor, so we could see outputs like 1-6-2-4-7-5-8-3 and so on. The preemptive system recognizes it has 8 threads of info coming and arranges them in the order the command processor deals them. Like Tetris, it arranges the "random" blocks into neat little rows. :P
I guess the video shows it best. The old model left gaps in the timing; now, by changing when things are signaled and preempting the next signal, they can time it all to line up without a whole lot of idling while the next string forms (rough sketch at the end of this post).
Oh, and it should be better than hyper-threading because it's dealing with physical threads rather than splitting instructions into logical threads.
I'm no expert, so I'm probably wrong.
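Here's a rough C++ toy of the timing difference I'm describing above. It's not GPU code and not any real graphics API, just eight jobs with made-up millisecond costs (do_work and the numbers are invented for the example), first run strictly in order, then all launched at once so each finishes whenever it's ready:

// Toy comparison of "in order" vs "overlapped" completion. Everything here
// (do_work, the costs) is made up just to show the timing idea.
#include <chrono>
#include <future>
#include <iostream>
#include <thread>
#include <vector>

using namespace std::chrono;

// Pretend shader work: just sleep for the given number of milliseconds.
static void do_work(int ms) {
    std::this_thread::sleep_for(milliseconds(ms));
}

int main() {
    const std::vector<int> cost_ms = {30, 10, 50, 20, 40, 15, 25, 35};

    // Old model: nothing starts until the previous item has finished.
    auto t0 = steady_clock::now();
    for (int ms : cost_ms) do_work(ms);
    auto in_order = duration_cast<milliseconds>(steady_clock::now() - t0);

    // Async model: every item is launched right away and completes out of
    // order, so the total is roughly the single longest item.
    t0 = steady_clock::now();
    std::vector<std::future<void>> jobs;
    for (int ms : cost_ms)
        jobs.push_back(std::async(std::launch::async, do_work, ms));
    for (auto& j : jobs) j.wait();
    auto overlapped = duration_cast<milliseconds>(steady_clock::now() - t0);

    std::cout << "in order:   " << in_order.count()   << " ms\n";
    std::cout << "overlapped: " << overlapped.count() << " ms\n";
}

The in-order total is the sum of all the costs, while the overlapped total is close to the longest single item; that gap is the idle time being filled.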
In the old days (5 years ago? or more), as I mentioned, I was creating an additional 'super-threading' layer for rendering (in OpenGL, under 3ds Max) on top of multi-threaded rendering.
That way I was eliminating this kind of problem, but it hurt a little on calls, since I had just created a bottleneck.
I always thought to myself: why can't we present the threads as one and have the hardware figure it out on its own? It would be much faster than programming my own API to match one exact GPU (something like the sketch after this post). I see that async is kind of doing that, but it seems its approach is to lock you down to specific hardware.
Maybe that's just me, but that's the feeling I have, since the programming would have to differ for different GPUs with different numbers of threads.
We also know that no two tasks are the same or take the same amount of time to compute, so to utilize it as we should we need at least 2 lanes per thread, preferably 3 (but that may require 3D transistors?).
Still just my feeling, I'm a noob too. :(
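To show what I mean by presenting the threads as one, here is a rough sketch of such a super-threading layer in C++. It's just a toy, not my old code and not any real API; SuperThreadLayer and submit() are names invented for the example. The renderer only ever calls submit(), and a small pool of workers behind it spreads the jobs over however many hardware threads exist:

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class SuperThreadLayer {
public:
    explicit SuperThreadLayer(unsigned workers = std::thread::hardware_concurrency()) {
        for (unsigned i = 0; i < workers; ++i)
            pool_.emplace_back([this] { run(); });
    }
    ~SuperThreadLayer() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_all();
        // Workers finish any queued jobs, then exit and get joined here.
        for (auto& t : pool_) t.join();
    }
    // The only call the renderer ever sees; it never knows how many threads exist.
    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(m_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return done_ || !jobs_.empty(); });
                if (done_ && jobs_.empty()) return;
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();   // run outside the lock so the workers overlap
        }
    }
    std::vector<std::thread> pool_;
    std::queue<std::function<void()>> jobs_;
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
};

int main() {
    SuperThreadLayer layer;   // looks like "one thread" to the caller
    for (int i = 0; i < 16; ++i)
        layer.submit([] { /* pretend: one chunk of rendering work */ });
}

The point is just that the caller never has to know how many threads the hardware actually has, which is the part I wished the GPU would handle on its own.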
Aha! AnandTech has the slides and a much better explanation of what all is happening with the Async Shader model.
http://www.anandtech.com/show/9124/amd-dives-deep-on-asynchronous-shading
So what's actually happening here is that instead of completing things in a single string like both of my ideas did, it's actually completing different strings all at the same time. So, to use my 8-thread model:
INPUT --- #1 - #7 - #8 \
INPUT --- #4 - #5 -------- OUTPUT
INPUT --- #2 - #6 - #3 - OUTPUT
So what happens there is that threads 1, 7, 4, 5, and 8 work on two parts of one output, and 2, 6, and 3 work on a separate output. Of course they are all tied to their own ACE, and those are working with the Command Processor. I didn't model the ACE input system for the sake of space... (there's a rough code sketch at the end of this post).
Pretty cool. I love GPU architecture. :P
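To make that diagram a bit more concrete, here is a small C++ toy that runs three independent chains at the same time, roughly one per queue (think one per ACE). It's not real GPU code, just threads and sleeps; run_chain, the queue names, and the 10 ms-per-step cost are invented, and the step numbers match the diagram:

// Three independent "strings" of work, each running its own chain of steps
// concurrently with the others. Everything here is made up for illustration.
#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

std::mutex print_mutex;

// Run one chain of steps in order, but independently of the other chains.
void run_chain(const char* name, const std::vector<int>& steps) {
    for (int step : steps) {
        std::this_thread::sleep_for(std::chrono::milliseconds(10 * step));
        std::lock_guard<std::mutex> lock(print_mutex);
        std::cout << name << " finished shader step #" << step << "\n";
    }
    std::lock_guard<std::mutex> lock(print_mutex);
    std::cout << name << " -> OUTPUT\n";
}

int main() {
    // Three queues, roughly matching the 1-7-8 / 4-5 / 2-6-3 split above.
    std::thread a(run_chain, "queue A", std::vector<int>{1, 7, 8});
    std::thread b(run_chain, "queue B", std::vector<int>{4, 5});
    std::thread c(run_chain, "queue C", std::vector<int>{2, 6, 3});
    a.join(); b.join(); c.join();   // each chain finishes whenever it finishes
}

Each chain's own steps still finish in order, but the three chains interleave freely and each produces its output whenever it's done, which is the "different strings all at the same time" behaviour.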
That makes me wonder: will Nvidia support them right off the bat with DX12, or will it come with their new GPUs in Q2 2016?
(That also made me wonder where Nvidia will produce their dies, since they sued Samsung and most of the companies that make them. Same with HBM: how will they implement v2? This is fishy and funny at the same time.)
From what I read, Maxwell 2 will support it. I haven't seen Nvidia mention it, but the article states as much. So, Titan X and beyond support the async model... Interesting that AMD has supported it all the way back to 2011, even if the cards from that era (GCN 1.0) support it to a very limited extent. It's like they knew this was going to become a thing...
All great stuff. If DX12 and Vulkan both sport the features they advertise, the GPU landscape is set to change immensely. This is actually shaping up to be the biggest leap forward in GPU tech since unified shaders back in 2006-2007.
I love it. :)
Remember kids... this has been in graphics cards since 2011! :D
GCN = sadly way ahead of its time, with no current software supporting its 'fancy' stuff.
I think DX12 is going to make AMD GPUs + CPUs (6 cores and up) shine.
Price/performance-wise they are easily going to be back in the game (for gaming, at least).
Core count beats single-thread IPC on DX12, from everything I have read so far (see the sketch at the end of this post).
Considering that MS has a console with an AMD APU, I think they (Microsoft) should work hard on it. So it's possible the consoles will gain some performance from async shaders.
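As a hedged sketch of why the core count matters there: under a DX12-style model each CPU core can record its own list of commands in parallel, and only the final submit is serialized. This is plain C++ with invented names (Command, CommandList, record_scene_chunk), not the actual D3D12 API:

// Each CPU core records its own command list in parallel; only the final
// "submit" is a single serialized step. Names and structures are invented.
#include <algorithm>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

struct Command { std::string description; };   // stand-in for a real GPU command
using CommandList = std::vector<Command>;

// Each worker thread fills its own command list; no sharing, no locking.
void record_scene_chunk(CommandList& list, int first_draw, int count) {
    for (int i = 0; i < count; ++i)
        list.push_back({"draw object " + std::to_string(first_draw + i)});
}

int main() {
    const unsigned cores = std::max(1u, std::thread::hardware_concurrency());
    std::vector<CommandList> lists(cores);
    std::vector<std::thread> workers;

    // Recording scales with core count, which is where 6+ core CPUs gain.
    for (unsigned c = 0; c < cores; ++c)
        workers.emplace_back(record_scene_chunk, std::ref(lists[c]), int(c) * 100, 100);
    for (auto& w : workers) w.join();

    // A single cheap "submit" of everything the cores recorded.
    size_t total = 0;
    for (const auto& l : lists) total += l.size();
    std::cout << "submitted " << total << " commands recorded on "
              << cores << " threads\n";
}

Recording is usually the CPU-heavy part, so the more cores there are, the more of the frame can be prepared at once, which is where the 6+ core AMD chips should gain.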