Questions that nobody has a straight answer to regarding VRAM and bus widths

NvJunkie · December 30, 2013, 12:17am

I've posted a question on the GPU forums both here and at Tom's Hardware that I can't seem to get a good answer to.

I was wondering if any of you knew what kind of configurations of video memory capacity and bus-widths are best for higher resolutions. Here's a link to the thread on the forum -> https://teksyndicate.com/forum/gpu/trying-figure-out-optimal-vram-configurations-sli-high-resolutions-57601080/166227

So. As per the post, here are the memory/bus configurations that I would like to see clarified as to which resolutions they would be best suited for.

384-bit@3GB, SLI & non-SLI @ 5760*1080 (abbreviated to NvS for Nvidia Surround), 1440, 1080

256-bit@4GB, SLI & non-SLI @ NvS, 1440, 1080

256-bit@2GB, SLI & non-SLI @ NvS, 1440, 1080

Hopefully you guys can answer this for me or at least get this specific issue tested.

Thanks.

JefferyD90 · December 30, 2013, 12:22am

I would opt for option #1. It would probably give you a solid delta over the other 2. Option #2 does have more VRAM, so maybe it wouldn't have to swap data in and out as much, but I just don't see that as reason enough. Also keep in mind that at this point the PCI-E bus limits your data transfer capability.

JefferyD90 · December 30, 2013, 12:25am

Another thing, the memory has a lot to do with the video card, yes... but the CHIP itself means SOOOOO much more. Of the options you have given us, I would grab a single 780 Ti and call it a day. Here in 6 months or so you can add another one at a fraction of the cost and be WAY better off than before.

So in a more conclusive answer: you are asking all the wrong questions. Memory bandwidth is literally only 10% of the equation.

SupaMesican · December 30, 2013, 12:30am

I must ask, is there a reason you are not considering the 290? It has a larger memory bus and more VRAM than the 780. It's a tad slower, but the extra VRAM and memory bus could help negate that at higher resolutions.

JefferyD90 · December 30, 2013, 12:53am

I totally agree with this statement. I would rather just get an R9 290, they DO perform much better at those higher resolutions. Hands down.

NvJunkie · December 30, 2013, 1:01am

I wish people would just read the damned question through. I'm not looking for a card.

I'm looking for a technical answer about how performance at high resolutions correlates with the ratio and configuration of memory bus width and available video memory.

Reason I'm asking this is because I know if I add another GTX 660 Ti to my system (at that point having a 3-way SLI system), it won't do a damn thing because there's not enough VRAM nor bus-width to run games at those resolutions anyway.

If anything, I'll be picking out either a GTX 780 SLI system or just a single GTX 780 Ti.

So please, answer the damned question.

wendell · December 30, 2013, 1:31am

Well, the answer is that it's complicated. :)

Strictly speaking, what you are asking is very, very implementation specific. There are a lot of "buts..." and exceptions. And there are two components to your question -- a hardware component, and a software component. The software component is probably at least as important as the hardware component (if not more so, as AMD's Mantle is teaching us).

I'll speak only to the hardware component. Usually, for memory bandwidth, you're talking about bytes per second. Throughput. But even between 256 and 384 it is not an automatic 50% increase in terms of bytes per second. Actually, different vendors/article writers/benchmarkers disagree on what the thing to even measure is -- could be raw throughput as measured by byte-for-byte copies from one place in ram to another, or a strictly read-and-discard-as-fast-as-possible operation or even a more exotic 'real world' test like how fast memory could do (for example) three reads then a write.
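
As a rough illustration of why a wider bus is not an automatic 50% gain: theoretical peak bandwidth is the bus width in bytes multiplied by the effective data rate per pin. The sketch below is only back-of-the-envelope arithmetic. The function name is made up for this example, and the 6 and 7 Gbps data rates are assumed illustrative values, not figures for any particular card discussed here.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak bandwidth in GB/s.

    bus_width_bits: memory bus width in bits (e.g. 256, 384).
    data_rate_gbps: effective data rate per pin in Gbit/s (assumed value).
    """
    return bus_width_bits / 8 * data_rate_gbps


# Hypothetical configurations: a narrow bus with faster memory
# versus a wide bus with slower or equally fast memory.
configs = {
    "256-bit @ 7 Gbps": (256, 7.0),
    "384-bit @ 6 Gbps": (384, 6.0),
    "384-bit @ 7 Gbps": (384, 7.0),
}

baseline = peak_bandwidth_gbs(*configs["256-bit @ 7 Gbps"])
for name, (width, rate) in configs.items():
    bw = peak_bandwidth_gbs(width, rate)
    print(f"{name}: {bw:.0f} GB/s ({bw / baseline:.2f}x the 256-bit baseline)")
```

Under these assumed numbers the 384-bit bus only reaches the full 1.5x when the memory runs at the same data rate. With slower memory the gap shrinks, and none of this says anything about latency or how well a given workload actually uses the bandwidth.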

Think about the mechanics of stream processors here, and how fast data can be loaded and unloaded to stream processors. And sometimes the memory is used for vector operations, and sometimes for texture operations.

H/W Designers have been careful (to a fault) to optimize for One Resolution, or another. Very fast video cards from even one generation ago suffer at resolutions like 2560x1440 more than the math would seem to indicate that they would because of the optimization (software too) that went into stuff. Sometimes having extra ram for more textures helps, but that's another conversation.

There is also latency, or delay, in how long it takes from the time you tell memory to do stuff to the time that it actually happens. Again, "real world" faster clocked (or lower latency) memory at a 256 bit bus width may outperform 384 bit bus widths for "real world" needs, while excelling at synthetic benchmarks. Some of the 192 vs 256 vs 384 debate really speaks to the hardware architecture and chip revision in play.

Here is a little more info about GDDR5 and how it can be organized:

http://www.elpida.com/pdfs/E1600E10.pdf

It also talks a little bit about the command latency and overhead of GDDR5 vs DDR3. Sort of interesting. You'd have to dig into the specific cards you're looking at and see how they've elected to arrange their GDDR5 configuration to know if they're optimizing for low latencies, ease of manufacturing or capacity.

Then, SLI on top of that... how fast/low latency is the SLI interface? Can the work reasonably be divided among multiple GPUs? Is the SLI/Crossfire link fast enough (bandwidth! and low latency!) to manage operations across multiple GPUs? At what point are there diminishing returns?
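
One way to get a feel for the diminishing-returns question is Amdahl's law, which is a general parallel-scaling model and not anything specific to SLI or Crossfire. The sketch below assumes that some fixed fraction of each frame's work divides cleanly across GPUs and the rest does not. The function name and the 0.9 fraction are made-up values for illustration, not measurements, and the model ignores link latency entirely.

```python
def multi_gpu_speedup(num_gpus: int, parallel_fraction: float) -> float:
    """Amdahl's law: ideal speedup when only `parallel_fraction`
    of the work can be split across `num_gpus`."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / num_gpus)


# Assumption: 90% of the frame time scales across GPUs.
PARALLEL_FRACTION = 0.9

for n in (1, 2, 3, 4):
    speedup = multi_gpu_speedup(n, PARALLEL_FRACTION)
    print(f"{n} GPU(s): {speedup:.2f}x")
```

Even in this idealized model each additional GPU adds less than the one before it. Real multi-GPU scaling also depends on the driver, the game, and the link itself.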

Skipping a bunch of stuff and doing some back-of-the-envelope I think the AMD 290x has more room for optimization and growth (from an architecture standpoint) but the 780Ti or 780Ti SLI system will be nothing to sneeze at. Neither can handle 4k gaming, both can handle 1440p gaming. The 290x is likely to do better at higher resolutions from a purely hardware perspective.

NvJunkie · December 30, 2013, 1:46am

Thanks for answering my question that's been racking my brain for the past week. It was a little long-winded, and I understood that there wouldn't be a nice cookie-cutter answer to it, as I knew there were more factors to consider than just the two, but it's nice to know that it can be explained.

Thank you.

jon666 · December 30, 2013, 3:01am

Holy shit my head hurts. So does data rate depend on how the command to memory is split up? Is this why dual channel, tri-channel, etc on DDR3 increases in speed? And is that comparable to the clamshell setup in the article Wendell posted? I don't pretend to have even a basic understanding of how this works, I'm just trying to wrap my head around the concept.

jon666 · December 30, 2013, 3:19am

This is leading to more questions for me. SLI/Crossfire doesn't double performance because of latency? Or is it something else?

Zoltan · December 30, 2013, 4:45am

As Wendell said, software and specific design are very important factors. The crucial question is always: what is it for, and with what software and applications will it be used? nVidia and AMD cards are not designed in the same way; bus width doesn't mean the same thing with AMD/Intel and nVidia. Those using linux have discovered that AMD/Intel doesn't unlock the entire GPU memory for buffering or pipelining OpenCL calls when the application that is making the calls is programmed towards nVidia cards (and thus optimized for CUDA logic instead of OpenCL). That has to do with the way AMD/Intel cards use the memory, which is basically like a CPU would use memory, whereas nVidia cards don't follow the normal logic, but use their own system, with less support for Khronos spec API calls. Because the memory bandwidth of AMD cards is bigger, AMD cards will often have less memory available for pipelining workload for streaming processors in compute applications that aren't open source compliant, but they will provide more raw graphics power. In applications that use the industry standard Khronos spec APIs as a guideline, AMD cards perform much better than nVidia cards (on average about 5 to 20 times better), because they can fully utilize the larger bus width and the normal instruction logic. nVidia uses faster memory than AMD, but the data is squeezed through a narrower bus, so the data throughput is about the same or less than with AMD cards.
nVidia cards use their own proprietary machine language, called PTX, which stacks up instructions in a proprietary format and processes them in parallel, which leads to very good benchmarks, but unfortunately, PTX is not compatible with any open source format, so the closest nVidia can come to working with open source formats like OpenCL and OpenGL, is to incorporate a "trojan horse" into an open source compiler (LLVM/Clang, the Apple compiler, which is liberally licensed with an Apache license, because the GNU compiler still doesn't accept the nVidia trojan horses, and hopefully never will because it's a very bad software practice), that compiles closed source binaries into the open source kernel that translate the Khronos spec API calls into PTX in real time, and ignores the non-supported calls. nVidia cards are basically typical Windows/DirectX cards. Intel and AMD graphics cards perform better than nVidia graphics cards in applications that are optimized with the industry standard open source Khronos spec APIs, nVidia benchmarks faster in Windows. In linux, 4k and above resolutions are pretty common, and the only thing missing right now is support for OpenGL 3 in the open source drivers for AMD and Intel. That is still a work in progress, and a final version is expected for mesa 10 release, which is in a couple of months. nVidia has clearly demonstrated that they will not support linux and open source unless linux and open source allow nVidia malware binaries into the open source code, and that they will continue to rely upon directx and CUDA, which allows nVidia to produce cheaper cards and sell them at a higher price because they benchmark higher in windows. AMD cards are much more expensive to make, because higher bus width means more memory modules, which are expensive, but also means a beefier power supply, because fast VRAM uses a lot of power.
Add to that the bigger lithography of AMD chips, and higher transistor counts for the sales price, which makes for a larger piece of silicon, which means fewer units per silicon wafer, which means more expensive production costs... AMD has a very small profit margin, nVidia has a very large one, but the customer gets more for his money with AMD hardware, and in linux, and with HSA coming in a few months, that's a big thing right now. According to NPD, 21% of all laptops sold in the US in 2013 were Chromebooks, which are linux preinstalled machines that have no Windows on them, have no NSA or Microsoft infected BIOS (they all use open source coreboot BIOS images), and they contain no binary drivers or binary kernel modules (aka proprietary drivers). Now you know why Microsoft and nVidia have mounted those huge slurring and sabotage campaigns in 2013 against Google, linux and AMD... do the math, the US is the market with the least linux-acceptance in the world, and in 2013, one in four new laptops was not running windows but linux as PC operating system, and SteamOS wasn't even out yet...

So yeah, bus width doesn't make much difference in closed source sabotageware, but in open source software, it suddenly makes a lot of difference, and with HSA, the difference will become even bigger, as GPU cores will have to access the system memory directly and use the GPU memory to pipeline instructions to boost the system performance, as the VRAM is much faster than the system RAM. At that point, the GPU's will need even more VRAM, or lose performance because they have to use the system RAM more. With HSA systems, for applications like Lightworks or Maya or 3D gaming engines with a lot of next-gen functionality, the typical performance system two years from now will probably have 8 GB of system RAM and 8 GB of VRAM, to get the most out of HSA acceleration and still offer fast 4k graphics, however, a lot of systems (Intel and AMD both are evolving towards APU's) won't have dedicated VRAM anymore, and Intel and AMD will probably offer 512-bit bus width GP-GPUs or co-CPUs for a lower price so that people can put multiple ones in a system to scale the performance, but each card will have less VRAM (which also keeps the price low), and the HSA systems will mainly rely on a large 16 GB or more DDR4 system RAM instead.

SheepInACart · December 30, 2013, 4:53am

-1 here. I agree the results of that question would be interesting, but if people are discussing the wrong thing it's more likely an issue of not understanding your intent than deliberately sidetracking the discussion, so consider either clarifying calmly or rewording the original question. Simply adding bolded "read the damned question" adds a level of aggressiveness that's totally unnecessary and hardly convinces people to give up their time to help you.

In answer to your question though, I'd say that FPS improvement and higher settings in games matter just as much as running multiple/large resolution monitors for gaming experience, so while memory bus width and VRAM do allow higher pixel counts and thus are worth thinking about, they are still minority factors in choosing a video card. For this reason I don't see AMD's clear bus width and VRAM advantage in high end graphics solutions as being as big a factor as the relative performance of the GPU on the cards or the level of optimization for a given resolution or set (Eyefinity etc.). Also remember you can run a single 1440p or three 1080p monitors off of a 2GB card without any dramas, so even the GTX 760 has enough memory unless you want 3x 1440p or 4k, but till then VRAM and bus width are generally non-binding constraints. Thus my pick is a single GTX 780 Ti if you are choosing high end video card setups; you can always upgrade to an SLI solution in future when card prices have fallen.
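
To put rough numbers on the pixel counts above, here is a small sketch that compares the resolutions mentioned in this thread and the raw size of one color buffer at each. It assumes 4 bytes per pixel (32-bit color), and the helper name is made up for illustration. Actual VRAM use is mostly textures, geometry, and anti-aliasing buffers, so this is a lower bound on the display side only, not a VRAM requirement.

```python
# Resolutions discussed in the thread (width, height in pixels).
RESOLUTIONS = {
    "1080p": (1920, 1080),
    "1440p": (2560, 1440),
    "3x1080p surround": (5760, 1080),
    "4K UHD": (3840, 2160),
}

BYTES_PER_PIXEL = 4  # assumption: 32-bit color, no anti-aliasing


def render_target_mib(width: int, height: int,
                      bytes_per_pixel: int = BYTES_PER_PIXEL) -> float:
    """Size of a single uncompressed color buffer in MiB."""
    return width * height * bytes_per_pixel / 2**20


for name, (w, h) in RESOLUTIONS.items():
    megapixels = w * h / 1e6
    print(f"{name}: {megapixels:.1f} MP, "
          f"{render_target_mib(w, h):.1f} MiB per buffer")
```

A single buffer is tens of MiB even at the largest of these resolutions, which is small next to 2 GB. What grows with resolution is mainly the bandwidth needed to fill those pixels every frame, plus whatever extra buffers and textures the game allocates.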

jon666 · December 30, 2013, 11:23pm

Looked up HSA http://hsafoundation.com/ so if you are able to dumb this down, does this mean that systems will use ALL available memory if using HSA? Does this mean, other than what Linux or what have you is using up, when you start up a program it will have no problem utilizing all of the RAM for whatever it wants?

I guess what I am asking is will a program pre-load as much as it can on VRAM, and RAM? At least that is what I think you said with "GPU cores will have to access the system memory directly and use the GPU memory to pipeline instructions to boost the system performance, as the VRAM is much faster than the system RAM. At that point, the GPU's will need even more VRAM, or lose performance because they have to use the system RAM more"

SupaMesican · December 31, 2013, 12:31am

So much information to take in in this thread.... I don't know if I should upgrade my gpu or ram now....