Recently, I sat down with Lee Bushen from NVIDIA and Citrix’s Muhammad Dawood for a webinar where we discussed how, together, the latest Citrix HDX graphics stack and NVIDIA virtual GPU solutions can provide unmatched graphics acceleration. We talked about how the combination of two technologies is enabling designers, engineers, graphic artists, and others to take on big visualization challenges while also driving down costs.

We had a great turnout for the webinar, and there were many more questions than we had time to answer in the Q&A session. We wanted to cover some of those questions in this blog post to help you to enable a productive remote workspace with NVIDIA and Citrix HDX. The complete webinar is available on demand now.


What recommendations do you have for a Citrix policy for two M60 GPUs that are used for 3D CAD with PTC Creo? How can I configure the policy?

Lee Bushen: PTC Creo would be classed at the mid-range professional graphics level. If this were a new project, NVIDIA would recommend an A40 first, then probably an A10. The M60 has about the same power as one of the A16’s four GPUs, so it might struggle with anything but small models. I would recommend possibly even a single user per GPU (the M60 has two) using the M60-8Q profile. This would mean a 1:1 user-to-GPU mapping with no contention and as much performance as possible.
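If you’re running Citrix Hypervisor, here’s a minimal sketch of assigning that profile to a VM with the xe CLI; the UUIDs are placeholders you’d look up in your own environment:

```
# List available vGPU types and note the UUID of "GRID M60-8Q"
xe vgpu-type-list

# Find the GPU group that contains the M60
xe gpu-group-list

# Attach the vGPU to the VM (shut it down first); UUIDs are placeholders
xe vgpu-create vm-uuid=<vm-uuid> \
  gpu-group-uuid=<gpu-group-uuid> \
  vgpu-type-uuid=<m60-8q-type-uuid>
```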


What about driver update cadence and best practices on how often to update?

Lee Bushen: Check our website to see when your vGPU software release goes out of support. We have long-term (three-year) and short-term (one-year) releases. This could drive your decision to update.

In terms of how often you should update outside the release cycle, you might want to update for new hardware or software support or for a new feature. Check the “What’s New” section of each release to see if there are any compelling features you need. When you do update, always upgrade the GPU Manager first, then the VM drivers. We do offer backward compatibility if it takes a bit longer to re-spin new gold images (new GPU Manager with an old VM driver): generally one version back, and also LTSR to LTSR. See the hypervisor release notes for the new release for full details.
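As a quick sanity check after upgrading, you can compare the vGPU Manager version on the host with the guest driver version inside a VM. A minimal sketch with nvidia-smi (run on the host and in a guest, respectively):

```
# On the hypervisor host: the "Driver Version" shown here is the
# vGPU Manager version, which should always be upgraded first.
nvidia-smi

# Inside each VM: the guest driver version, which may run one
# version behind the host while gold images are being re-spun.
nvidia-smi --query-gpu=driver_version --format=csv,noheader
```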

Is there a limit on vApps FPS?

Lee Bushen: For vApps, the frame rate limit is actually 60 fps. Check out our detailed guide.

Is GPUProfiler a good tool to use to determine whether an app is using the GPU if the app vendor is not sure?

Lee Bushen: You could use GPUProfiler and see whether the GPU activates during your workflow. It really depends on the application as to which parts of the GPU it leverages and which APIs it uses. An app vendor should be able to provide a hardware compatibility list for GPUs if they do leverage GPUs. If this is an RDSH host, make sure to switch on hardware GPU use with Group Policy. Learn more in our documentation on GPU acceleration for Windows multi-session OS.
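For reference, the relevant Group Policy setting is “Use hardware graphics adapters for all Remote Desktop Services sessions.” A minimal sketch of its registry equivalent, run from an elevated command prompt on the RDSH host (test in a lab first):

```
REM Registry equivalent of the Group Policy setting that makes RDSH
REM sessions enumerate the hardware GPU before the software renderer.
reg add "HKLM\SOFTWARE\Policies\Microsoft\Windows NT\Terminal Services" ^
  /v bEnumerateHWBeforeSW /t REG_DWORD /d 1 /f
```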

As I understand it, the Ampere cards are built to the PCIe 4.0 specification, and I know that vendors such as Dell and HP are no longer selling T4s. Is the A2 compatible with PCIe 3 buses for slightly older HCI kit?

Lee Bushen: PCIe Gen 4 cards (like the A2) will work fine in Gen 3 slots. For VDI workloads, the PCIe bus is rarely the bottleneck. In fact, we see little difference between x8 and x16 slots for general purpose VDI. You would need to talk to your hardware vendor to see which cards they certify in their servers. It’s always important to have this certification because each GPU has its own power and cooling requirements, especially since the majority of datacenter GPUs are passively cooled by the server chassis (again, like the A2).
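If you want to confirm what a card has actually negotiated in a given slot, a quick nvidia-smi query on the host shows the current and maximum PCIe generation and the current link width:

```
# Report the negotiated PCIe generation/width for each GPU
nvidia-smi --query-gpu=name,pcie.link.gen.current,pcie.link.gen.max,pcie.link.width.current \
  --format=csv
```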

The T4 is not EOL and can still be found in the market. Stock levels for all our GPUs are very time- and vendor-specific. Bear in mind that the A2 delivers roughly 60 percent of the performance of a T4, so it’s not a like-for-like replacement (although it has the same form factor).

Please comment on the degradation in vGPU graphics performance versus a physical GPU, as shown by the Solidworks performance test.

Lee Bushen: I suspect the Solidworks tests are probably based on frame rates. Because we intentionally limit the frame rate within VMs to 45 or 60 fps (for the best experience for all users), the benchmark is very likely to report lower numbers than a physical workstation, where there is no limit. As I said in the webinar, be careful with benchmarks that champion FPS as a measure of performance. If you want a more like-for-like comparison, either change the GPU scheduling mode to Equal Share (which disables the FPS limiter) or try a direct PCIe passthrough to the VM to eliminate the vGPU stack. You can also turn off the frame rate limiter (FRL) on a VM-by-VM basis in the VM metadata.
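For reference, here’s a minimal sketch of both approaches, based on the NVIDIA vGPU documentation; the UUID is a placeholder, and the Equal Share scheduler is available only on Pascal and later GPUs:

```
# Disable the FRL for a single VM on Citrix Hypervisor
# (on VMware vSphere, the equivalent is the advanced VM setting
#  pciPassthru0.cfg.frame_rate_limiter = "0")
xe vgpu-param-set uuid=<vgpu-uuid> extra_args=frame_rate_limiter=0

# Switch the host to the Equal Share scheduler
# (RmPVMRL: 0x00 = best effort, 0x01 = equal share, 0x11 = fixed share)
echo 'options nvidia NVreg_RegistryDwords="RmPVMRL=0x01"' \
  > /etc/modprobe.d/nvidia-vgpu.conf
# Reboot the host for the scheduler change to take effect
```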

I am concerned that Citrix keeps giving me slightly lower Solidworks large-assembly rotate/zoom performance. Should I expect better large-assembly Solidworks rotate/zoom graphics performance using Citrix HDX? Another issue I am seeing with Citrix HDX is a “buffering” lag when rotating a large assembly after I leave my desktop for a couple of hours and then reconnect. My testing shows lower Solidworks large-assembly response/rotate/zoom performance.

Muhammad Dawood: The improvements we made in 2112 really shine when the endpoint (client) has an NVIDIA GPU. You may find that for Intel-based clients (including high-end Core i9 laptops), the performance isn’t as good, and that’s something we’re looking to resolve in the next Citrix Workspace app for Windows release or two. We’re actively shifting our focus from server to client to ensure that the entire end-to-end system is as optimized as it can be.

Lee Bushen: Normally both protocols work well for ProViz apps, but there are so many policy parameters for each that it’s hard to compare like with like. Also, because protocols tend to autotune, they may be using different defaults (like JPEG encoding vs. H.264), and the target frame rate could be set differently. You should try to tune both protocols to use equivalent settings and eliminate any other differences (such as profile size, the load from other VMs on the GPU, vCPUs/vRAM, or concurrent apps). Try running GPUProfiler with each to get better insight into what the GPU is doing.

You could also try upgrading your VDAs to VDA 2112. As we discussed in the webinar, 2112 and later have lots of extra HDX optimizations!

What is the expected degradation going from physical to virtual, assuming low, one-to-one density?

Lee Bushen: A few percentage points. Ampere GPUs use simplified plumbing for vGPU (SR-IOV), so we also expect that delta to be lower with Ampere. Don’t forget, this is a shared environment, with a GPU time-sliced between a number of VMs for cost benefits. It’s never going to be as fast as physical if users are doing the same things at the same time on the same GPU. VDI workload characteristics lend themselves well to GPU sharing because loads are generally spiky and users share well with each other. Workloads like offline rendering or AI don’t share well because the load from each VM tends to be constant and sustained.

Why does Adobe tell us that our VDI (Citrix-VMware-vGPU-NVIDIA) environment is not supported to run Photoshop? Will the work we covered in the webinar go to waste if we need to go back to the traditional PC environment?

Lee Bushen: We encounter this often in the field. Most software vendors cannot support all the different hypervisor, VDI, and protocol configurations out there, so it’s much easier for them to officially support only physical PCs. Of course, unofficially, even some of those vendors use virtualization in-house to run their own apps! In practical terms, when you log a support call with them, they will ask you to replicate the problem on a physical workstation if they suspect the issue lies with the VDI stack. There are thousands of companies out there virtualizing apps with few worries about support.

Is the demo we saw in the webinar on Citrix Hypervisor?

Lee Bushen: In my case, this was a VMware server, but vGPU is supported just the same on Citrix Hypervisor, too.

Muhammad Dawood: From an implementation perspective, the optimizations in 2112 are independent of the actual hypervisor being used. While we haven’t tested Citrix Hypervisor, we expect to see a comparable increase in performance there also.

How about a scenario where you are dealing with a medical device for electroencephalography (EEG), with video (at 25 fps) synchronized to the EEG signals, and you want to allow 15 doctors to connect simultaneously to the Citrix application server virtual machine where the medical software for real-time monitoring of patients’ EEG signals is installed?

In this scenario, the HW server needs an NVIDIA GPU and HDX 3D Pro. But how can I calculate the HW characteristics (CPU, RAM, GPU, etc.) needed to host the virtual machine application server for 15 doctors connected concurrently to the same server? Do Citrix or NVIDIA provide tools to calculate these needs?

Lee Bushen: You can use GPUProfiler to view GPU activity, and assuming the apps can leverage a GPU, it should provide better performance. I’m slightly concerned, though, that your app is a little too “high end” to be using the shared RDSH (virtual apps) model. A better approach would be pure VDI on Windows 10, with each doctor having their own VM. In terms of sizing, the only way is to do a PoC using different configurations and select the best vGPU profile and HW spec for your app and dataset. There’s no rule of thumb here because all apps are different.
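During such a PoC, it helps to log what the GPU is doing on the host while a representative group of users works; here’s a minimal sketch with nvidia-smi (the log file name is just an example):

```
# List the active vGPUs and their details on the host
nvidia-smi vgpu -q

# Log utilization and memory use every 5 seconds for later review
nvidia-smi --query-gpu=timestamp,utilization.gpu,utilization.memory,memory.used \
  --format=csv -l 5 > gpu_poc_log.csv
```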

Are the new 8K HDR monitors supported in Citrix DaaS with NVIDIA GPUs?

Muhammad Dawood: We are actively working on 10-bit / HDR support and seeing very promising results. We haven’t yet tested with 8K displays. However, internally we’ve been using multiple 4K screens for some of our test scenarios. If 10-bit/HDR color support is desired, H.265 will be used.

Lee Bushen: In terms of 8K, things are a bit more complicated. NVIDIA recommends using the H.265 video codec at resolutions above 4K because our encoder (NVENC) only supports H.264 up to 4K. Learn more here.

Another consideration for 8K is the endpoint machine. If the endpoint has an NVIDIA card too, bear in mind that we limit H.265 decode to 4K for 32-bit applications (Citrix Workspace app is 32-bit). You can either use Thinwire instead of the video codec or use a non-NVIDIA GPU in your endpoint.

Does the 2112 release for Linux also have those performance optimizations?

Muhammad Dawood: Due to the way frame capture is done on the Linux VDA, the specific improvements we made for the Windows VDA do not apply. We will, however, perform the same level of performance investigation and apply the lessons we learned optimizing the Windows VDA.

Miss the webinar? Want to leverage the combination of Citrix and NVIDIA technologies in your IT infrastructure? Our webinar is available on demand now and is a great place to get started.


Disclaimer: The development, release and timing of any features or functionality described for our products remains at our sole discretion and are subject to change without notice or consultation. The information provided is for informational purposes only and is not a commitment, promise or legal obligation to deliver any material, code or functionality and should not be relied upon in making purchasing decisions or incorporated into any contract.