Threadripper and Geometry nodes in Blender?

I find it odd how many gaming benchmarks there are for $5000 Threadrippers, while I can’t find a single person who has publicly posted anything about Blender geometry nodes performance, for example.

Now, some of you might think of Blender and procedural modelling as not that interesting compared to industry-standard apps like Houdini, but Blender has taken fascinating steps toward becoming a solid replacement option in many areas, imo. Sure, it probably doesn’t excel in any single category (when compared to its for-profit counterparts), but as a Swiss Army knife, you can really leverage it for many things and to a great extent.

So, is there anyone with a 32/64-core Threadripper running complex geometry nodes in Blender? If so, please share your experience! I know that not everything uses every core available, but more often than not, my 7950X is really flexing everything it has. And it’s great. But I have quite ambitious plans for my workflows and I wonder how far I can realistically push them with the right hardware.





I’ll show you my use case and what I’m experimenting with currently (and what I keep learning and expanding upon with every Blender release since 3.0 a few years back).

This entire map was a testing sample for the private procedural modelling tools I’ve been working on for the last 2 years, ever since I decided to take Blender seriously.

To keep it simple, the idea was to create something like the Unreal Engine map editor, but tailored for export into RAGE (GTA V) or any other game engine later on (for my own games), and with certain procedural modelling benefits you can’t get with strictly splat-based terrain like in traditional level editors. The result is fantastic creative control even over macro details of the terrain and all objects generated alongside it. A huge inspiration behind all these systems were GDC talks (or their equivalents) from employees of companies such as Ubisoft, EA, Epic, CDPR, CIG and Rockstar, to name a few.

Procedural rock formations

Doing funny things with roads, bridges or tunnels (to see how my algorithms will behave under various conditions)

Seamless merging of terrain with other objects, like generated rock formations, roads, buildings, etc.

(view from bottom up/inside out)

Procedural distribution of grass and stone instances, based on a modular rule set and the capability of mixing unlimited foliage layers together seamlessly.

Decal generation

FiveM (GTA Online, but with mods) screenshots





Blender has improved a LOT.

I also find it amusing how many game benchmarks we are seeing. I think most of the problem is that the reviewers have no idea what someone would use one of these machines for. Also funny is how they seem to have no idea how to configure these systems for gaming performance.

I just completed my order for a pro today. Should be here in about 4 weeks.

I plan to use it for Unreal development and generating large terrains using proprietary compute software. I was still on Sapphire Rapids because Zen 4’s AVX-512 sucked.


If you’d like to give me a step-by-step of stuff to download and run to test, I will :slight_smile:


That would be great!

You can start by downloading Blender and then download demo files here.

In particular, I’d love to see you run simulations like Index of nearest, Mesh fracturing and especially my own file with some really heavy computations.

The original Blender files with Simulation nodes depend on Scene time. Activate the simulation by hitting the spacebar. To reset the simulation, you can either drag the Scene time cursor back to the 1st frame or use the shortcut Left Shift + Left arrow, then hit spacebar again.

Simulations automatically bake data on the first run, and most of the time every subsequent run is much faster, because no actual simulation is running anymore; it’s just an animation of the prebaked simulation at that point. However, if you change anything within the simulation node tree, the bake will be invalidated and the entire simulation will have to run and bake once again. This baked data is visually represented by a little purple bar at the bottom of the Scene time tab.

For basic interface navigation within the simulation node environment, click and hold the mouse wheel. To measure how many milliseconds each simulation/geometry node tree takes to execute, you can turn on Timings (hit the little white arrow pointing down, on the blue background). This is a per-file setting and will have to be turned on in each file individually.



For Index of nearest file:


At the top of the Geometry Nodes window, you’ll see the Distribute Points on Faces node, second from the left. The default value is 2500. After you run this, change it to 20 000 and run the simulation from the beginning again. I’d like to know how the Threadripper’s performance varies.

The simulation seems to run constantly, without being automatically baked for the next run. Not sure why that is.

2500 (default) with 7950X: ~27 ms during the simulation; Task Manager shows 68% CPU usage.
20 000 with 7950X: ~170 ms during the simulation on my system; Task Manager shows 50% CPU usage.
Values over 20 000 seem to visually break the entire simulation…
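For context on why going from 2500 to 20 000 points blows the time up the way it does: a naive “index of nearest” compares every point against every other point, so the cost grows quadratically with the point count. Here’s a minimal sketch of that brute-force approach (my own illustration, not Blender’s actual implementation, which almost certainly uses an acceleration structure like a KD-tree internally):

```python
import random
import time

def index_of_nearest(points):
    """Brute-force 'index of nearest': for each point, the index of the
    closest OTHER point. O(n^2) pairwise distance checks -- doubling the
    point count roughly quadruples the work."""
    result = []
    for i, (xi, yi, zi) in enumerate(points):
        best_j, best_d = -1, float("inf")
        for j, (xj, yj, zj) in enumerate(points):
            if i == j:
                continue
            d = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
            if d < best_d:
                best_j, best_d = j, d
        result.append(best_j)
    return result

random.seed(42)
pts = [(random.random(), random.random(), random.random()) for _ in range(500)]
start = time.perf_counter()
nearest = index_of_nearest(pts)
elapsed_ms = (time.perf_counter() - start) * 1000
```

With an acceleration structure the per-query cost drops to roughly O(log n), which is why point counts in the tens of thousands remain tractable at all.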


For Mesh fracturing file:


Same procedure: open the file and hit spacebar. Only this time the simulation will also bake during the first run, and every subsequent run will run off that baked simulation. The first one, however, results in a growing CPU performance spike like this one:


Threadripper will most likely have this curve much lower, but just for science, you can simply crank up the subdivisions in the Grid node from 2 to 200. Let’s see how the Threadripper reacts.


Lastly, my own file I prepared specifically for you.

I prepared several computations based on my own modelling workflow, which I already demonstrated at the beginning of this thread. To begin a computation, select one of the options from the drop-down menu highlighted in purple, the one with the default option “None”. You can go through them one by one.

Some will only take a few seconds. Some might take up to several minutes and push close to 200 GB of RAM.

Besides total computation time, I also wrote down individual nodes’ times, in case you feel like recording them as well for me or anyone else reading this thread. The nodes in question are highlighted in teal and yellow. 2 of the computations rely on 2 heavy node calculations each; to make those clearer to see, I used teal.

There’s not much to show off visually, because I wanted to keep the scaling as uniform as possible when preparing this file. So it’s not as interesting to look at as the previous examples, but definitely the most valuable in my eyes :slight_smile: .

For example, the first “Remesh” calculation generates an excessively large amount of geometry, specifically 210 124 802 vertices, and my system is unable to render it (or gives up). I wonder whether your system will be able to, or whether this is Blender’s own limitation. Please let me know if it renders all that geometry for you or not. And the computation time, of course.
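For a rough sense of why 210 million vertices is so punishing, even the position data alone is already gigabytes. The 12 bytes per vertex below (3 × 4-byte floats) is an assumed lower bound; Blender additionally stores edges, faces, normals, any attributes, and intermediate copies during evaluation, so real usage is many times this:

```python
VERTS = 210_124_802  # vertex count reported by the "Remesh" calculation

def approx_gib(verts, bytes_per_vert):
    """Back-of-the-envelope memory estimate in GiB."""
    return verts * bytes_per_vert / 2**30

# 3 x float32 position per vertex -- a lower bound, not the full cost
positions_only = approx_gib(VERTS, 12)
print(f"positions alone: {positions_only:.2f} GiB")
```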

The next two “Mesh boolean” calculations consist of the “Manifold” and “Exact” variants. The first is very fast (and new); the other is very slow, but precise.

“Sample nearest surface” and “Store named attribute” are nodes I often use for transferring UV maps between models, for example. Very handy, but also pretty demanding, especially if used in tandem or on multiple objects at once.

“Geometry proximity” is another useful tool to have; it speaks for itself, I hope. In this use case, the highlighted “Set position” node is where the load will show up.

“Blur attribute” is like a Swiss Army knife that can be used for countless things. In this case, I used it for relaxing an object’s vertices. This is often useful if you want to smooth out objects in various ways, with weight masks applied, etc.

“Distribute points on faces” seems to run on only a single thread. I wonder how the Threadripper will manage. Faster? Slower?

“Per each element” - this can take single-threaded nodes like “Distribute points on faces” and make them work in parallel, it seems! Can all 96 Threadripper cores be put under full load? By default, the number of runs is set to 192. On my system, running only one iteration jumps to ~10% CPU load, and it climbs around 2-3% with each additional iteration.
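The idea behind that zone can be sketched outside Blender: when each element’s computation is independent, the runs can be farmed out to worker processes. A toy model (not Blender code; `scatter_points` is a made-up stand-in for one single-threaded node evaluation):

```python
import math
from concurrent.futures import ProcessPoolExecutor

def scatter_points(seed, count=20_000):
    """Stand-in for one single-threaded node run (e.g. one
    'Distribute Points on Faces' evaluation): deterministic,
    CPU-bound busywork driven by a seed."""
    x = (seed * 2654435761) % (2**32) or 1
    acc = 0.0
    for _ in range(count):
        x = (1103515245 * x + 12345) % (2**31)
        acc += math.sin(x * 1e-9)
    return round(acc, 6)

def run_serial(n):
    return [scatter_points(s) for s in range(n)]

def run_parallel(n, workers=4):
    # Each element is independent, so separate processes can chew on
    # them simultaneously -- the same idea as wrapping a node in a
    # per-element zone. Results match the serial run exactly.
    with ProcessPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(scatter_points, range(n)))

if __name__ == "__main__":
    assert run_serial(8) == run_parallel(8, workers=2)
```

With 192 independent runs and enough workers, this pattern can in principle keep all cores busy, which would match the ~2-3% CPU load per extra iteration observed on a 16-core part.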

Here’s an example of the first two computations and the Menu switch for choosing between them.

Extra notes from within the file (with timings for anyone interested enough to check):

I took some time to gather all the ideas I hope will be relatively easily digestible for you. But I also have to admit I suffer from tunnel vision; for me, many things are quite natural, as this is my daily environment. So if you get stuck on something, please let me know. I appreciate your time!

Idk if you are familiar with Blender scripting. A good Blender script might be much simpler for Wendell to test on different setups and more repeatable.
I could try to help you, but honestly, these days I’ve found plenty of LLMs that can get you some good snippets (for my use cases, at least).
There’s so much you can do with Blender, it’s crazy.
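In that spirit, here’s a rough sketch of the harness part of such a script. It’s pure Python; inside Blender, the `bench` callable would be whatever `bpy` calls force a node-tree re-evaluation (e.g. stepping the frame), which is omitted here since it only runs inside Blender:

```python
import statistics
import time

def time_runs(bench, runs=5):
    """Call `bench` repeatedly and summarize wall-clock timings in ms.
    Inside Blender, `bench` would wrap a forced node-tree
    re-evaluation; here it is any zero-argument callable."""
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        bench()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return {
        "runs": runs,
        "mean_ms": statistics.fmean(samples),
        "stdev_ms": statistics.stdev(samples) if runs > 1 else 0.0,
        "min_ms": min(samples),
        "max_ms": max(samples),
    }
```

Reporting mean and spread over several runs would also help separate first-run bake cost from steady-state playback, which this thread keeps bumping into.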

Thank you for bringing us such an interesting topic to cover, providing thorough resources to get the tests done, and patiently awaiting the results!

On behalf of Wendell, I will now provide the benchmarking results we gathered from Blender’s official benchmark, Index of Nearest, Mesh Fracturing, and the thorough and wonderfully informative file you provided!

Let me know if you have any comments or questions on the results/testing, thank you for being a member of the community!

Benchmarks

Blender - Official Benchmark



Index of Nearest


Mesh Fracturing



Viewer Submitted file










Thank you very much for these results!

I have to say, I was hoping for more pronounced performance differences, but I’m also glad I’m not missing out on some insanely large leap in power from those pretty expensive CPUs. I might still end up buying one of them, based on a refined workflow in the future. One could really pinpoint their specialty and base their HW choice on that.

I expected the Epyc CPU to be further behind rather than out in front, but that just goes to show how important testing under the right conditions is! I had no idea! I was already under the impression that Blender is all about single-core performance (outside of CPU rendering, I believe). I’m also curious why Epyc has such a lead over the Threadrippers (in some cases) despite having a similar core and thread count to the 9995WX. Any idea why? Could that be because of the doubled, 12-channel memory?

I know of several crucial Blender updates over time that have fundamentally changed the performance of certain nodes. For example, in 4.4 there was an updated Triangulate node that sped up 30-100x (how that is influenced by various CPU architectures, I don’t know). And there were a few similar cases I can’t remember right now. I’d be happy to gather all the relevant information, and if there are major changes in geometry nodes over the next year, maybe you could run my bench file again? Possibly with some additional tests, even.

Although I’ve read that topics here close automatically 9 months after the last message. Could this thread stay open regardless? I don’t see that message here anymore; is it possible the planned closure is already disabled? If so, thanks for that!

Edit: Am I looking at the correct results for the Blur attribute test? Looking at the per-CPU scaling in your graph, the 16-core 7950X might be right behind, but my testing repeatedly shows ~130 000 ms, while your results start at ~13 000 ms. That’s 10x faster for just double the cores on the 9970X. Similar case with the Per Each Element test. I wouldn’t be surprised if it were a typo :slight_smile:.


Your suspicions about that graph being off were correct: the results of the Blur attribute stage got mixed up with the Geometry proximity results. Thank you very much for pointing that out! I have edited my previous message to correct the mistakes.

But yes, the improved memory capacity of the EPYC CPU is exactly why we tested it! The results surprised me as well, but the doubled memory amount truly has a strong effect on the results.

As for your moderation question, I’ll ask someone with more knowledge of the forum’s moderation. @Level1_Amber, this user has a question about threads closing after 9 months.


Thank you for those corrections!

the doubled memory amount truly has a strong affect on the results.

Just so I understand correctly, was Blender utilizing more than 256 GB with the Epyc CPU? I thought RAM usage would be universally the same across different platforms, and that if RAM made any difference, it would be due to speed or latency, not capacity.

Every time I pushed Blender over 192 GB utilization (give or take), my PC would start lagging massively and eventually Blender would crash. But that’s my system’s hard limit, so I assumed my test file would never go over 192 GB utilization on your workstation platforms. So is it that Blender can simply utilize more RAM capacity on different platforms, or is there something else I missed?

Maybe you meant bandwidth and not capacity?

I noticed some “2x” Epyc CPU entries on the Blender Open Data benchmark website. You don’t happen to have a dual-socket motherboard for two Epyc CPUs lying around, do you? :slight_smile: (I didn’t even realize this was a thing until now.)

He means that each chiplet in the 9575F has 2x GMI links, i.e. every core in the 9575F can, theoretically, use double the memory bandwidth of its Threadripper cousins. The platform only had 384 GB of memory total, which is only 50% more than the Threadripper. (Also, the TR Pro platform had 8 DIMMs half as big as the 4 DIMMs in the TR non-Pro.)

So it doesn’t necessarily use more than 256 GB, but it can access the memory it does have with less of a bottleneck. The cores run at 5.0 GHz as long as there is no power or thermal constraint, so lightly threaded at 5.0 GHz vs 5.4-ish on Threadripper can make a difference, too.

The 9575F is “only” 64 cores, but it’s a 12-memory-channel platform. Plus, at least until the 12 memory channels are saturated, each chiplet-to-IO-die connection has double the bandwidth of most other Epyc CPU configurations.

The 9575F is an incredibly awesome and special monster in its own right. Sometimes it helps tremendously. Sometimes it doesn’t make much of a difference.
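For anyone who wants the back-of-the-envelope numbers behind this: theoretical peak DRAM bandwidth is channels × transfer rate × 8 bytes per transfer. The DIMM speeds below are assumptions for illustration (check the actual configuration), and note that whether the cores can actually consume the peak also depends on those chiplet-to-IO-die GMI links:

```python
def peak_gb_s(channels, mega_transfers_per_s):
    """Theoretical peak DRAM bandwidth in GB/s: each transfer on a
    64-bit channel moves 8 bytes."""
    return channels * mega_transfers_per_s * 8 / 1000

epyc_12ch = peak_gb_s(12, 6000)   # e.g. 9575F platform at DDR5-6000 (assumed)
tr_pro_8ch = peak_gb_s(8, 6400)   # e.g. TR Pro platform at DDR5-6400 (assumed)
tr_4ch = peak_gb_s(4, 6400)       # non-Pro Threadripper (assumed)
print(epyc_12ch, tr_pro_8ch, tr_4ch)  # GB/s
```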


Hm, well, since it only closes after 9 months without a post, wouldn’t that indicate the topic has died and should probably close?

That’s understandable. But I was also suggesting the possibility of revisiting this very thread with either the same or maybe even new tests, due to possibly major Blender updates (and crucial new nodes). As I explained in an earlier post, there have been significant performance updates for geometry nodes (and major new ones added) in the recent past. Instead of asking for new tests every 2-4 months, I figured that maybe a year down the line (if I see the need for it), I could gather all the noteworthy updated nodes so we could see whether the behavior of some CPUs has changed; it might not be just a uniform speed gain. That way I won’t need to create another thread extremely similar to this one, and we can simply keep it all inside this one.

@SgtAwesomesauce is there a way to keep this thread open longer?

I’m curious about what’s going on in Index of Nearest, and to a lesser extent Mesh Fracturing, where the 9575F looks like it’s getting disproportionately beaten. I think the 9575F has only about 7% slower turbo clocks, so I wouldn’t expect that much of a handicap even on single-threaded tasks. Is there something about the platform that might be dragging it down?