Liqid Fabric Switch

Does anyone in this forum have Liqid hardware in production or in their home lab? I'd like to know how this hardware performs. I tried contacting Liqid and watched the whole Liqid YouTube channel looking for an in-depth hardware review, but found nothing.

Interview: https://youtu.be/zXlWeVrsk14

OK, it seems like it allows you to disaggregate PCIe, which gives you more flexibility in terms of PCIe-attached devices.

Still watching… not sure what the tradeoffs are (e.g., any additional latency from the PCIe device being farther away, or any scaling bottlenecks).
Are there any IP-over-PCIe protocols the switch can switch, or is it just a programmable-mirror optical switch (a MEMS device with only circuit switching and no packet switching)?
… let's see.

It is in production from what I understand, and hyperscalers are buying a lot of it.

They also just partnered with Western Digital.

I am actually in the process of finishing up a POC with Liqid and looking to implement it in production.

A lot of my workload is heavily GPU-accelerated but requires a mix of GPUs to keep costs down. We normally run multiple 2U boxes, either Dell or HPE, with two Quadro RTX 8000s and one Quadro RTX 4000. The 8000s are for computation and the 4000 is for image rendering.

During the POC I have been testing the performance of Liqid-provided GPUs versus locally installed GPUs by running calculations on each and comparing computation time.
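
Roughly the shape of that comparison (a generic sketch, not my exact test code; assumes PyTorch with CUDA, and the GPU indices are placeholders you'd map to whichever card is local vs. fabric-attached on your host):

```python
# Generic sketch of a local-vs-fabric GPU timing comparison.
# Assumes PyTorch with CUDA; GPU indices below are placeholders.
import torch

def time_matmul(gpu_index, size=8192, iters=50):
    """Average milliseconds per large matmul on one GPU."""
    device = torch.device("cuda", gpu_index)
    with torch.cuda.device(gpu_index):
        a = torch.randn(size, size, device=device)
        b = torch.randn(size, size, device=device)
        torch.cuda.synchronize()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(iters):
            _ = a @ b  # compute-bound work; result stays on-device
        end.record()
        torch.cuda.synchronize()
        return start.elapsed_time(end) / iters

local_ms = time_matmul(0)   # placeholder index: locally installed card
fabric_ms = time_matmul(1)  # placeholder index: Liqid-composed card
print(f"local {local_ms:.1f} ms, fabric {fabric_ms:.1f} ms, "
      f"delta {100 * (fabric_ms - local_ms) / local_ms:+.1f}%")
```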

I have run testing in the following scenarios.

I have a Liqid chassis with 5 RTX 8000s and 2 A40s.
The hosts I am running for this test are a Nutanix cluster running AOS 5.20 on Cisco UCS hardware.

I also have it hooked to a Dell R7525 running ESXi 7.0U3.

After a lot of playing with BIOS versions and settings, I was able to get Liqid to pass through GPUs (at least the RTX 8000s; more on this later) to each platform.

During performance testing, the Liqid-provided GPUs performed within the margin of error of the locally installed cards, on both the Nutanix hosts and the VMware box I am testing on.

Though performance is good, there are some caveats to using this platform.

First- OEM BIOS support can get tricky. Liqid really only validates on the R7525, so other hardware will take some legwork to get working and may not work right; e.g., on the Cisco UCSs I am only able to pass through 2 GPUs to each host while keeping it stable.
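
If you're repeating this on non-validated hardware, it's worth sanity-checking what the host actually enumerates after each compose. A quick sketch (assumes the NVIDIA driver and nvidia-smi are installed on the host):

```python
# Sanity check after composing GPUs into a host: count what the OS
# actually enumerates vs. what the fabric says it attached.
# Assumes nvidia-smi (NVIDIA driver) is installed on the host.
import subprocess

def visible_gpus():
    """Return the GPU names the host currently sees, per nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return [line.strip() for line in out.stdout.splitlines() if line.strip()]

expected = 4  # however many GPUs you composed into this host
gpus = visible_gpus()
print(f"host sees {len(gpus)} GPU(s): {gpus}")
if len(gpus) != expected:
    print("mismatch -- check BIOS settings / re-compose before trusting results")
```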

Second- Everything is PCIe, so hot-add and reboots can get weird. For example, if a GPU is composed into a running ESXi machine (so far only on 7.0U3, because that's what I have tested), it causes the system to purple screen and needs a hard reboot. On the Ciscos, if a GPU is composed into the machine and the machine is rebooted, it will hang on the POST screen. Either the PCIe LOM needs to be disabled in the BIOS and the GPU un-composed, or the GPU needs to be un-composed pre-reboot and re-composed after. Again, not really Liqid's fault, but in a production environment it will increase downtime during reboots.
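
The pre-reboot workaround is scriptable if you drive the fabric from its CLI or API. A rough shape only; the liqid_* helpers below are hypothetical placeholders for whatever your Liqid tooling actually exposes, not real Liqid calls:

```python
# Sketch of the reboot workaround: un-compose GPUs before rebooting,
# re-compose once the host is back. The liqid_* helpers are HYPOTHETICAL
# placeholders -- wire them to your actual Liqid CLI/API.
import subprocess
import time

def liqid_uncompose_gpus(host: str) -> None:
    """Placeholder: detach all fabric GPUs from `host` via your Liqid tooling."""
    raise NotImplementedError

def liqid_compose_gpus(host: str, gpu_ids: list[str]) -> None:
    """Placeholder: attach the listed fabric GPUs back to `host`."""
    raise NotImplementedError

def safe_reboot(host: str, gpu_ids: list[str]) -> None:
    liqid_uncompose_gpus(host)                        # avoid the POST hang
    subprocess.run(["ssh", host, "sudo", "reboot"], check=True)
    time.sleep(300)                                   # crude wait for the host to come back
    liqid_compose_gpus(host, gpu_ids)                 # reattach once the OS is up
```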

Third- At the moment the platform is not validated on NVIDIA's newest A-series GPUs, so while we were able to get the A40s to compose into a machine, we had to disable the ability to run the RTX 8000s at the same time. Therefore mixed-GPU builds would be an issue if running new cards alongside "old" cards.

Fourth- The only bottleneck I was able to find was with more than 4 GPUs composed into a single machine, all running at 100% for an extended period of time. The standard PCIe LOM that Liqid installs has 4 x16 PCIe connections to the Liqid switch. When I composed 5 RTX 8000s into a single machine and ran a complex calculation across all of them for an extended period, I was able to saturate all 4 links with the 5 cards. Granted, the performance loss was less than 10%, but with larger workloads and more cards in a single machine I can see this getting progressively worse as the PCIe bus gets slammed.
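
The back-of-the-envelope math on that bottleneck, assuming the links run PCIe Gen3 x16 (roughly 15.75 GB/s usable per direction each, which is what the RTX 8000 negotiates):

```python
# Back-of-the-envelope for the uplink bottleneck. Assumes Gen3 x16 links
# (~15.75 GB/s usable per direction each); adjust per_link if your
# HBA links at a different generation.
per_link = 15.75   # usable GB/s per x16 Gen3 link, after encoding overhead
uplinks = 4        # x16 links from the host LOM to the Liqid switch
gpus = 5           # composed RTX 8000s, each capable of a full x16

print(f"host uplink bandwidth: {uplinks * per_link:.1f} GB/s")
print(f"worst-case GPU demand: {gpus * per_link:.1f} GB/s")
print(f"oversubscription: {gpus / uplinks:.2f}x")  # 1.25x for 5 cards over 4 links
```

A 1.25x oversubscription lining up with under 10% measured loss seems plausible, since even compute-heavy kernels rarely drive the bus at 100% the whole time.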

Overall, though, if you are thinking about trying it out, contact their sales and get a POC set up. Depending on your workload it may be better to bring your own GPUs, as that will cut costs. Granted, there are some growing pains, but bleeding edge always has those, and most of the issues I ran into were not the fault of Liqid, just lacking OEM support.

Their support has been fantastic and very responsive during this entire process, which is always a plus, and everyone is US-based.
