Nvidia Talks About Asynchronous Compute; Is It Really Important for GeForce GPUs?

If you’re a gamer, there is a very good chance you’ve already heard of Asynchronous Compute. It is a fundamental part of the DirectX 12 API that lets GPUs make fuller use of their resources to deliver the highest possible performance.

Interestingly, however, Nvidia’s Maxwell architecture can’t handle asynchronous compute as well as AMD’s GCN-powered GPUs. Ever since the first DirectX 12 benchmarks were released last year, Maxwell has performed relatively worse than Radeon GPUs. The reason is that AMD graphics processors feature asynchronous compute engines (ACEs), dedicated hardware units that AMD added to its GPUs.

Nvidia Pascal & Async Compute

That being said, a lot of folks hoped that Nvidia’s next big architecture, Pascal, would fix Maxwell’s shortcomings and deliver a large bump in performance; however, that doesn’t seem to be the case.

A new rumor from Bits&Chips claims that Pascal’s hardware has not evolved sufficiently when it comes to async compute and still doesn’t offer a big performance jump relative to Maxwell. Instead, Pascal will reportedly rely heavily on raw power and driver optimizations to achieve high performance.

There was also a rumor that Nvidia GPUs could not execute asynchronous compute at all, but this theory doesn’t seem to hold true. According to a recent benchmark built by a Beyond3D forum user, the feature does work, but under high loads Nvidia GPUs take so long to process the workload that it causes “Windows to pull the trigger and kill the driver.”

AMD GPUs, on the other hand, can handle a much higher load, about 10 times what Nvidia GPUs can handle, once again thanks to their asynchronous compute engines.

Async Shaders – A Big Advantage for Radeon GPUs

So how does asynchronous compute actually work? Essentially, it looks for “bubbles” in the graphics pipeline: moments when some of the GPU’s resources sit idle. While you may not realize it, when your game is running, thousands of shader units in the GPU are processing different tasks in the form of instructions issued to them. What the ACEs do is help execute multiple command streams in parallel, which allows tasks to be processed simultaneously and independently of one another.


In other words, the technique can keep more of the GPU’s shader units busy at any given moment, improving performance.
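To make the “bubble” idea concrete, here is a deliberately simplified toy model. It is not how any real GPU or driver schedules work: it assumes a hypothetical GPU with a fixed number of identical shader units, a graphics queue whose busy-unit count per time slot is given as a list, and a compute queue whose work is measured in unit-slots. The function names and all numbers are illustrative assumptions.

```python
import math


def serial_time(graphics_usage, compute_work, units):
    """Toy model: the compute queue waits until all graphics work is done.

    graphics_usage: busy shader units in each time slot (hypothetical numbers)
    compute_work:   total compute work, in unit-slots
    units:          number of identical shader units (an assumption)
    """
    # Graphics occupies every slot in its list, idle units included.
    graphics_slots = len(graphics_usage)
    # Afterwards, the compute work is spread across all units.
    return graphics_slots + math.ceil(compute_work / units)


def concurrent_time(graphics_usage, compute_work, units):
    """Toy model: compute work fills idle units ("bubbles") during graphics."""
    remaining = compute_work
    for busy in graphics_usage:
        idle = units - busy
        # Use whatever idle units this slot has, up to the work still left.
        remaining -= min(idle, remaining)
    # Any leftover compute work runs after graphics on all units.
    return len(graphics_usage) + math.ceil(remaining / units)


if __name__ == "__main__":
    usage = [8, 5, 3, 8, 6, 2]  # hypothetical per-slot load on an 8-unit GPU
    print("serial:", serial_time(usage, 12, 8))
    print("concurrent:", concurrent_time(usage, 12, 8))
```

In this made-up example the graphics queue leaves 16 unit-slots idle, enough to absorb all 12 units of compute work, so the concurrent schedule finishes in 6 slots instead of 8.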

GeForce GPUs, by contrast, work on the principle of prioritizing some tasks over others. This technique, known as preemption, cannot handle multiple tasks simultaneously, leaving a deep well of untapped performance.
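The contrast drawn above can be illustrated with a second, equally hypothetical toy model: under priority-based preemption as described here, only one task owns the GPU in any time slot, so a higher-priority job interrupts the running one instead of sharing its idle units. The `Task` fields and the `preemptive_schedule` function are illustrative names, not any driver or DirectX interface, and context-switch costs are ignored.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    arrival: int   # time slot at which the task becomes ready
    priority: int  # higher value wins (an assumption of this toy model)
    duration: int  # time slots needed while it owns the whole GPU


def preemptive_schedule(tasks):
    """Toy model of priority preemption: one task owns the GPU per slot.

    Returns a dict mapping each task name to the slot at which it finished.
    """
    if any(task.duration <= 0 for task in tasks):
        raise ValueError("durations must be positive")
    remaining = {task.name: task.duration for task in tasks}
    finished = {}
    t = 0
    while len(finished) < len(tasks):
        ready = [task for task in tasks
                 if task.arrival <= t and remaining[task.name] > 0]
        if ready:
            # The highest-priority ready task preempts everything else;
            # no other task makes progress in this slot.
            current = max(ready, key=lambda task: task.priority)
            remaining[current.name] -= 1
            if remaining[current.name] == 0:
                finished[current.name] = t + 1
        t += 1
    return finished


if __name__ == "__main__":
    jobs = [Task("graphics", arrival=0, priority=1, duration=6),
            Task("compute", arrival=2, priority=2, duration=2)]
    print(preemptive_schedule(jobs))
```

With a 6-slot graphics task and a 2-slot compute task that arrives at slot 2 with higher priority, the compute task finishes at slot 4 and graphics at slot 8: the two durations simply add up, with no overlap to hide either one.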

What Nvidia Has To Say About It

So far, we haven’t gotten a real answer from Nvidia on this subject. The only statement the chip maker has issued to the technical press in recent months is that GeForce GPUs are capable of running multiple command streams concurrently, but that the feature is not yet enabled in its current driver. Any results, including Ashes of the Singularity benchmarks, should therefore be treated as inconclusive.

More recently, though, Rev Lebaredian, Sr. Director of Content & Technology at Nvidia, shared some interesting details about async compute. In an interview with Hardware.fr at GDC 2016, he made two arguments when asked whether async compute is of any significance to Maxwell-based GPUs.

First, while async compute is one way to increase performance, what matters in the end is overall performance. If GeForce GPUs are already more efficient than Radeon GPUs to begin with, using multiple engines in an attempt to boost their performance is not a top priority.

Second, if the various blocks of a GeForce GPU are already highly utilized to begin with, the potential gain from async compute is smaller. Nvidia says that, overall, there are far fewer holes (the “bubbles” of modern GPUs) in the activity of its GPU’s units than in its competitor’s. Since the purpose of concurrent execution is to exploit synergies between different tasks to fill these holes, fewer holes leave less to gain.
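Nvidia’s second argument boils down to simple arithmetic: concurrent execution can only hide work inside idle capacity, so the higher the baseline utilization, the smaller the best-case gain. The sketch below is a back-of-the-envelope model under stated assumptions (a single graphics phase with uniform utilization, compute work that fills idle units perfectly, no scheduling overhead); the function name and the numbers are hypothetical, not Nvidia’s or AMD’s figures.

```python
def best_case_speedup(graphics_ms, compute_ms, utilization):
    """Upper bound on the frame-time speedup from overlapping compute work.

    Toy assumptions (not measured data):
      graphics_ms -- time the graphics work takes on its own
      compute_ms  -- time the compute work takes with the whole GPU to itself
      utilization -- average share (0..1) of the GPU busy during graphics
    Only the idle share of graphics time can absorb compute work, and we
    assume it does so perfectly, with no scheduling overhead.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    serial_ms = graphics_ms + compute_ms         # no overlap at all
    idle_ms = (1.0 - utilization) * graphics_ms  # capacity left in bubbles
    hidden_ms = min(compute_ms, idle_ms)         # work that fits in bubbles
    return serial_ms / (serial_ms - hidden_ms)


if __name__ == "__main__":
    # Hypothetical frame: 10 ms of graphics plus 2 ms of compute.
    for u in (0.7, 0.9):
        print(f"utilization {u:.0%}: up to {best_case_speedup(10, 2, u):.2f}x")
```

For this made-up frame, the best case is a 1.20x speedup at 70% utilization but only about 1.09x at 90%, which is the shape of Nvidia’s argument: fewer bubbles leave less room for async compute to help.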

These arguments reflect Nvidia’s approach to designing its next-gen GPU architecture. The company might be planning to integrate one or more advanced control processors into its upcoming Pascal chips to stay ahead of the curve. Beyond that, the green team will rely on raw power, providing more compute units and tuning performance directly in games. After all, at the end of the day, what actually matters is overall performance rather than any single feature, right?

Despite all these arguments, Nvidia cannot entirely ignore the possibility that async compute brings a performance gain in some cases, which Lebaredian acknowledged by saying he wished Nvidia could implement the feature at the hardware level.