Catsay plays with Xeon Phi

You’re doing the Lord’s work here, sir! I’ve been interested in these Intel coprocessors for years now. I’m really surprised to hear that the software support for them is so unstable.

HPC labs that shelled out the big bucks for these must have had some Intel software engineers on a hotline when the scientists were trying to deploy code onto these things. I can picture the conversations now, “We paid you a small fortune for all of these, now get our simulations to f*cking work!”

Oh god, what a time to be alive. This is great.

I never really posted this. So let me add it now.
I actually got somewhat optimized OpenCL code execution working on this thing a while ago.

OpenCL support with Xeon Phi 5110 under Windows 10.

First test run: you can see it chugging power. Getting any sort of memory bandwidth out of it with OpenCL kernels is honestly tough. It fights you every step.

Single precision compute numbers

Double precision compute characteristics. You can see it take a shit when running 16-wide doubles (1024-bit); with 8-wide (512-bit) it still managed.
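For anyone wondering where the 1024-bit and 512-bit figures come from, here is a quick sketch of the arithmetic. It assumes 64-bit doubles and a 512-bit native vector register (the width mentioned later in the thread); the helper names are made up for illustration.

```python
import math

REGISTER_BITS = 512  # assumption: native vector register width on the card
DOUBLE_BITS = 64     # size of one double-precision element


def vector_bits(lanes, element_bits=DOUBLE_BITS):
    """Total width in bits of a vector with the given number of lanes."""
    return lanes * element_bits


def registers_needed(lanes, element_bits=DOUBLE_BITS, register_bits=REGISTER_BITS):
    """How many native registers one such vector has to be spread across."""
    return math.ceil(vector_bits(lanes, element_bits) / register_bits)


for lanes in (8, 16):
    print(f"double{lanes}: {vector_bits(lanes)} bits, "
          f"{registers_needed(lanes)} register(s)")
```

An 8-wide double fits exactly in one register, while a 16-wide double has to be split across two. That split is one plausible reason for the drop in the 16-wide case, though nothing in the benchmark itself confirms the cause.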

Memory transfer operations (not great, not terrible)

Compute-wise, this is pretty much maxing out what the card is capable of.
Memory-bandwidth-wise, there’s a lot left on the table.
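As a rough yardstick for what "maxing out" means, theoretical peak compute is usually estimated as cores × clock × vector lanes × 2 (counting a fused multiply-add as two operations). A minimal sketch follows. The core count and clock are the commonly quoted spec-sheet figures for a 5110P and are assumptions here, not numbers taken from the runs above; `peak_gflops` is a hypothetical helper.

```python
def peak_gflops(cores, clock_ghz, vector_bits, element_bits, flops_per_lane=2):
    """Theoretical peak in GFLOPS: cores * clock * lanes * flops per lane.

    flops_per_lane=2 assumes one fused multiply-add per lane per cycle.
    """
    lanes = vector_bits // element_bits
    return cores * clock_ghz * lanes * flops_per_lane


# Assumed spec-sheet values for a Xeon Phi 5110P (not measured here).
CORES = 60
CLOCK_GHZ = 1.053

dp = peak_gflops(CORES, CLOCK_GHZ, vector_bits=512, element_bits=64)
sp = peak_gflops(CORES, CLOCK_GHZ, vector_bits=512, element_bits=32)
print(f"double precision peak: ~{dp:.0f} GFLOPS")
print(f"single precision peak: ~{sp:.0f} GFLOPS")
```

Under those assumptions the double precision ceiling comes out at roughly 1 TFLOPS and single precision at roughly twice that, which is the bar a benchmark result would be compared against.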

But if you’re seriously considering using a Xeon Phi for any modern work
FOR GOD’S SAKE STOP
USE A GPU

Hello there Phi-Natics :stuck_out_tongue:

Not a Phi-natic but still interested in how this all comes together. Have you found modern hardware to outperform this, or are there still use cases where this setup would be viable in 2023?

Not viable at all anymore.
Desktop CPUs outperform this now.
Nvidia Tesla K40s outperformed this by A LOT almost a decade ago.

It’s just a novelty item now.

I remember reading about Knights Corner (or Landing?) many years ago back on the OMPF forum. One of the devs was boasting about “1 billion rays per second” on a test scene rendered with one of these cards.

If it’s not too much trouble, could you try to get a ray/path-tracing benchmark running? Maybe something like LuxMark.

I’m quite curious to see how they compare to desktop Ryzen or Threadripper CPUs that we have now.

Thanks for demystifying how this hardware works! I’ve enjoyed reading your posts on it.

I include LuxMark results in this post, and they are not favorable to Xeon Phi because LuxMark targets SSE2, and so uses only a fraction of Phi’s 512-bit-wide vector units. Intel’s OpenCL driver produces even less favorable results.
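To put a number on "only a fraction": SSE2 operates on 128-bit registers, so code limited to it can touch at most a quarter of a 512-bit vector unit per instruction. A back-of-the-envelope sketch (the function name is made up for illustration):

```python
SSE2_BITS = 128    # SSE2 register width
NATIVE_BITS = 512  # Phi vector unit width, as stated above
FLOAT_BITS = 32    # single-precision element size


def vector_utilization(used_bits, native_bits=NATIVE_BITS):
    """Fraction of the native vector width an instruction set can fill."""
    return used_bits / native_bits


used_lanes = SSE2_BITS // FLOAT_BITS
native_lanes = NATIVE_BITS // FLOAT_BITS
print(f"SSE2 fills {used_lanes} of {native_lanes} float lanes "
      f"({vector_utilization(SSE2_BITS):.0%} of the vector width)")
```

This is only an upper bound on lane usage, not a performance prediction; real results also depend on clocks, memory behaviour, and how well the code threads.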

Mitsuba 2 shows very good results, although it is a pain for artistic use. You also have to be using the Intel compiler, modify some makefiles, and so on.

Untrue, if used with properly vectorized software. Though examples are admittedly few and far between. Generally if you’re making use of Intel libraries like MKL or Embree, you’ll see good scaling.

That kind of sounds like Itanium all over again.

That requirement is no different from a GPU, or for any HPC application. Spaghetti code is going to turn out poor performance numbers no matter where you run it; consumer x86 processors and applications are less sensitive in this regard.

With minimal changes I was able to make a Knights Landing Phi system blow a 64-core Epyc Milan out of the water performance-wise using perfectly normal open-source x86 code, while using less power. For this reason, Xeon Phi was used in plenty of HPC and Top500 machines to great effect.

Itanium’s performance was not convincing due to compiler shortcomings, and its technologies vanished once it failed to gain significant market share. In contrast:

  1. AVX-512 is still included in enterprise-grade processors and offers tangible benefits to software that takes advantage of its extensions.

  2. The Knights Landing core-to-core “mesh” type interconnect formed the basis of Skylake and later Intel processors, supplanting the previous ring-bus interconnect.

  3. The implementation of the MCDRAM on-chip memory caching scheme has been reused for the recently released Xeon Max series of processors.

Because Phi’s performance is convincing, and because technologies pioneered with Phi are still in active use, it is undeniably unlike Itanium in significant ways. Just like a GPU, “specialized hardware” is not synonymous with “bad hardware.”

I don’t do HPC yet, but I cannot agree with that statement. I saw Itanium in the wild until about 2017. The same arguments were made. Don’t get me wrong, it was a cool experiment and a lot of lessons learned from Itanium went into future technologies. Compilers only benefited.

But thanks for the information. It is all really neat. I am doing a master’s degree in computer vision, machine learning, and artificial intelligence. I would like to get into research eventually, so HPC actually interests me.

Thanks for contributing your benchmarks of the 7250 KNL.
I appreciate reading them and your effort. However, I’m still not convinced that the existing Xeon Phi chips have a better use case, or are better to use, than GPGPU hardware when it comes to efficiency, ease of use and the available software ecosystem.
My experience with the software ecosystem surrounding Phi was ‘not ideal’. Keep in mind that this thread and most of my experience are based on the Knights Corner (k1om) architecture, especially since I no longer have any of the KNL and later hardware available to me from my old work, and the boards still remain quite costly.

I agree. Between having to interface with the add-in cards through MPI, being limited to 16GB of local memory, and imperfect implementation of compute frameworks like OpenCL, I don’t see much value in Knights Corner generation Phi.

The bootable form factor of Knights Landing, its ability to supplement the 16GB cache with DRAM, far superior scalar cores, and full AVX-512F compliance mean that there is a small (but not non-existent) body of software that takes full advantage of the hardware’s features without modification.

I recently became the excited owner of a 7120A, and I’ve been trying to get the drivers for Linux back up to snuff, so I’m curious how your efforts with Linux went. Did you only ever get the card running on Windows?

I’ve managed to get the driver running on modern Linux! I’ll try to upload my patches at some point, after I get this cleaned up a bit lol. Enjoy a pic of my jank setup as well :slight_smile:

I presume you’ve patched and reintegrated the formerly deprecated in-kernel mic module? This was fairly limited from the start and had multiple issues even loading an OS onto the mic processor.

I was busy with the Intel MPSS stack, which was provided as source by Intel but never integrated into mainline in any way. That code is far more capable and about the only one that actually let one do anything meaningful with the hardware.

As for how my efforts went, I had the in-kernel mic module bits working, but it proved fruitless since it was missing so many elements.

I then went about patching the MPSS-provided modules and support utilities, but it became a never-ending nightmare of patches: the last kernel that actually supported the MPSS parts properly was 3.18 or something (not sure atm), and since then it had been cobbled together with bodges to port it to newer 4.x kernels (barely). By that time so much had changed, with so much legacy cruft to figure out, that I became too disheartened to put in the hours required to fix it.

Instead I resorted to just using the card in a virtual machine via PCIe passthrough with an old kernel.

Just wanted to follow up and see how things are going.
Still interested to see what you’ve managed to accomplish.
