Super-single instruction, multiple data (SIMD) Patent of AMD's from 2016 - GCN replacement

While watching the stream this afternoon, I noticed it mentioned a GCN replacement. Now, I’m not the best person to divine the finer intricacies of this topic or the potential function of this design, so I’m asking anyone who is interested and more capable to chime in. With that said, the information is presented “as is and with all faults.”



From patent

Id.

Id. - This can help in finding the text of the patent application at the USPTO, or you can use these sites that reposted the content of the patent:
http://www.freepatentsonline.com/20180121386.pdf
https://patents.justia.com/patent/20180121386
http://www.freepatentsonline.com/y2018/0121386.html

Official government site of patent application
http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=1&f=G&l=50&co1=AND&d=PG01&s1=super-SIMD&OS=super-SIMD&RS=super-SIMD


Article discussing patent and one person’s interpretation of the patent.

Here are a couple of patents some have suggested might be worth reading:
Low Power and Low Latency GPU Coprocessor for Persistent Computing
http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=1&f=G&l=50&co1=AND&d=PG01&s1="gpu+coprocessor"&s2="persistent+computing"&OS="gpu+coprocessor"+AND+"persistent+computing"&RS="gpu+coprocessor"+AND+"persistent+computing"

Identifying Primitives in Input Index Stream
http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=2&f=G&l=50&co1=AND&d=PG01&s1="identifying+primitives"&s2="input+index+stream"&OS="identifying+primitives"+AND+"input+index+stream"&RS="identifying+primitives"+AND+"input+index+stream"

Identifying Duplicate Indices in an Input Index Stream
http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=1&f=G&l=50&co1=AND&d=PG01&s1="identifying+primitives"&s2="input+index+stream"&OS="identifying+primitives"+AND+"input+index+stream"&RS="identifying+primitives"+AND+"input+index+stream"

Here is a video discussing it a bit at RedGamingTech:

List of AMD recent patent applications:
https://companyprofiles.justia.com/company/amd/patents/application

And an article discussing how AMD put roughly two-thirds of its engineering resources on Navi, which was supposedly designed for Sony and the PS5:



There isn’t a lot out there other than speculation, but this is the rumored GCN successor (which makes sense, hence reporting on the rumor). Anyone who wants to chime in on the speculation, please feel free!

@wendell - this is what I was referring to in chat.

Woah. You really went all in with this.

Here is my updated speculation (a response to a video posted by AdoredTV):

Aside from the main patent, I mention one other patent that has interesting implications (beyond the data handling changes, which are a bit past me as I’m not a programmer): a GPU Coprocessor for Persistent Computing. Now, this needs to be read together with a couple of other news stories to shine a light on what may be happening over at AMD (my full speculation follows the added facts). Suzanne Plummer, a key architect on Zen (as seen here:


) was moved over to RTG following Raja’s departure.


In this last article, it mentions that two focus areas are “significantly improving the clock speeds of AMD’s GPU designs and pushing them to be more power efficient.”

Now, about 3 months back, the RTG SVP David Wang gave an interview to PCGamesN.

In that interview, he discussed the use of an MCM approach and its implications for future products. He said that “‘we’ve yet to conclude that this is something that can be used for traditional gaming graphics type of application,’” that “‘The challenge is that unless we make it invisible to the ISVs [independent software vendors] you’re going to see the same sort of reluctance,’” and that “‘the GPU has unique constraints with this type of NUMA [non-uniform memory access] architecture, and how you combine features… The multithreaded CPU is a bit easier to scale the workload. The NUMA is part of the OS support so it’s much easier to handle this multi-die thing relative to the graphics type of workload.’” When asked if they could make an MCM design invisible to game developers, he said “anything’s possible.” Now, elsewhere in the article, he hinted that the commercial market will be getting multi-die GPUs, since it does not care about NUMA on a GPU.

So, to tie these all together, AMD may be designing a chiplet powerhouse that won’t hit consumers for a while, but may be the Arcturus architecture hinted at beyond Navi. Super-SIMD is like mixing single instruction, multiple data (with side ALUs supporting the main ALUs, new cache handling, etc.) with the Very Long Instruction Word approach that was used before going to GCN. This would be combined with a graphics coprocessor that is rumored to be a type of AI ASIC, which could do denoising similar to a tensor core. They might additionally use an uncore chip, which would incorporate something similar to what is rumored for Zen 2, or like the HBCC, while using the information protocol they are trying to develop that was mentioned in your discussions of interposer topology (something AMD proposed to combat Intel giving its AIB tech to DARPA;




(AMD - also discussed in part in some of the other data patents that I brushed aside earlier)). There is a question of whether centralizing the memory controller, as rumored for CPUs, rather than having each die carry its own, would work as well with a GPU, along with the technical complications of doing so (hence the “anything’s possible” statement from the SVP quoted earlier). But it leaves open the possibility of adapting such an uncore, combined with HBCC (which allows NVRAM to be used alongside the regular VRAM), to hide the multi-die design from ISVs’ programming while it is driven through other means, like a driver.

Maybe I’m just being too pie-in-the-sky hopeful on this and seeing what I want to see, since these pieces are all sitting in front of me. But this is where I see things going in the future, along with the GPU dies sitting on an interposer to reduce latencies for inter-die comms and for accessing memory and the coprocessor. It would also be interesting to find out whether, since Su mentioned wanting to change the WSA, that change would include guaranteeing the uncore and any legacy products on 12nm/14nm, but then adding GF interposers to what will be produced (


) to roughly ensure compliance with any prior AMD obligations. As I said, I am likely being too hopeful. But time will tell.