AMD CTO Discusses Design Goals for Zen – How to Achieve High Throughput & More Perf/Watt

Back in August, at a special press event, AMD detailed its much-anticipated Zen CPU core. The company claimed it had made significant progress in performance per watt with Zen and could finally “re-enter” the high-performance x86 desktop and server markets. Today, we have some interesting tidbits about the new CPU microarchitecture that come straight from Mark Papermaster, chief technology officer (CTO) at AMD.

Papermaster sat down with SemiEngineering and elaborated on the design goals for Zen: what’s different in Zen versus previous core designs and how the company was planning to achieve more Perf/Watt.

We just designed a brand new CPU core, Zen, from the ground up. We actually started this effort in late 2012, so we’ve been working on it for four years. It takes four years to get a brand new x86, high-performance CPU done. We are right on track. It’s a very modern core and very efficient in terms of driving that performance per watt of energy, and it’s very scalable. We also designed it to work very well with accelerators, like our GPUs. You can add more CPUs if you need to get more work done, and you can connect to GPUs, FPGAs, or other accelerators.

We know that a single CPU Complex (CCX), the basic building block of AMD Zen, comprises four independent CPU cores connected to a shared L3 cache. Each of those four cores can handle two threads through SMT (simultaneous multi-threading). According to Papermaster, this design approach allows AMD to simply add more CCXs to a chip, and to combine them with GCN (Graphics Core Next) cores to build powerful APU solutions.
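To make the building-block arithmetic concrete, here is a minimal Python sketch. The four-cores-per-CCX and two-threads-per-core figures come from the paragraph above; the `ZenChip` class and its `ccx_count` parameter are hypothetical names used only for illustration, not anything AMD has published.

```python
from dataclasses import dataclass

CORES_PER_CCX = 4     # from the article: four cores per CCX
THREADS_PER_CORE = 2  # from the article: two SMT threads per core


@dataclass(frozen=True)
class ZenChip:
    """Hypothetical model of a chip assembled from identical CCX blocks."""

    ccx_count: int  # assumed parameter: number of CCX blocks on the chip

    @property
    def cores(self) -> int:
        return self.ccx_count * CORES_PER_CCX

    @property
    def threads(self) -> int:
        return self.cores * THREADS_PER_CORE


two_ccx = ZenChip(ccx_count=2)
print(two_ccx.cores, two_ccx.threads)  # 8 16
```

Under these assumptions, two CCX blocks give eight cores and 16 threads, which lines up with the Summit Ridge figures quoted later in this article.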

In Zen, AMD wanted a modern core that could handle a range of workloads. Papermaster said the company already had both power-optimized and high-performance processor lines, so what it was looking for was an architecture that could scale from the low end to the mid and high end. That means the new CPU core has to offer high throughput, energy efficiency and floating-point efficiency.

That is done with both design and process. Design is microarchitecture, attacking every element of the execution units, of the cache subsystem, of the scheduling, every aspect to ensure you are removing bottlenecks. Technology is twofold. We’ve leveraged the new 14nm finFET technology. The scalability you have with finFETs is really quite a large range because it has very little leakage. When you turn off your clocks—when you are not doing active work—you can get very close to nil energy, and leakage is lower than previous technologies. Yet as you turn on your clocks and accelerate your workloads, you get very fast performance per watt.
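The idle-power point in the quote above can be illustrated with the textbook CMOS power model, in which switching power is roughly activity × capacitance × voltage² × frequency and leakage is a separate constant term. The sketch below is only an illustration: every number in it is made up, and none of them are AMD or GlobalFoundries figures.

```python
def dynamic_power(activity, capacitance_f, voltage_v, frequency_hz):
    """Textbook CMOS switching-power estimate: alpha * C * V^2 * f (watts)."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


def total_power(clock_enabled, leakage_w, **switching):
    """Switching power (zero when the clock is stopped) plus leakage."""
    dyn = dynamic_power(**switching) if clock_enabled else 0.0
    return dyn + leakage_w


# Illustrative, invented values for one block of logic.
block = dict(activity=0.2, capacitance_f=1e-9, voltage_v=1.0, frequency_hz=3e9)

active = total_power(True, leakage_w=0.05, **block)
idle = total_power(False, leakage_w=0.05, **block)
print(round(active, 3), round(idle, 3))  # 0.65 0.05
```

With the clock stopped, only the leakage term remains, which is why a low-leakage process matters so much for idle energy.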

At the unveiling event, AMD stated in its briefing that Zen features a micro-op cache, which stores already-decoded instructions and thus saves the CPU from decoding the same instructions over and over again. Moreover, AMD has implemented clock gating, a technique that lets the processor switch off the clock to idle sections of the chip to reduce power draw and heat output and improve efficiency.
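As a rough illustration of why caching decoded instructions pays off in loops, here is a toy Python model. It is not AMD's design: the `MicroOpCache` class, the least-recently-used eviction policy, the capacity and the fake "decoder" are all assumptions made for the sake of the example.

```python
from collections import OrderedDict


class MicroOpCache:
    """Toy model: keeps decoded micro-ops keyed by instruction address."""

    def __init__(self, capacity):
        self.capacity = capacity      # assumed size, in entries
        self.entries = OrderedDict()  # address -> list of micro-ops
        self.decodes = 0              # times the expensive decoder ran
        self.hits = 0                 # times decoding was skipped

    def _decode(self, instruction):
        # Stand-in for real x86 decoding: one fake micro-op per token.
        self.decodes += 1
        return [f"uop({token})" for token in instruction.split()]

    def fetch(self, address, instruction):
        if address in self.entries:
            self.hits += 1
            self.entries.move_to_end(address)  # mark as recently used
            return self.entries[address]
        uops = self._decode(instruction)
        self.entries[address] = uops
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used
        return uops


# A three-instruction loop body executed 100 times.
loop_body = [(0x10, "add eax ebx"), (0x14, "cmp eax ecx"), (0x18, "jne 0x10")]
cache = MicroOpCache(capacity=8)
for _ in range(100):
    for address, instruction in loop_body:
        cache.fetch(address, instruction)

print(cache.decodes, cache.hits)  # 3 297
```

In this toy run the decoder fires only on the first pass through the loop; the remaining 297 fetches are served from the cache.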

The AMD CTO then detailed how the company optimized bandwidth and latency, and shed light on the processor’s cache subsystem.

You need enough bandwidth and pipes to optimize your latency to ensure you don’t create bottlenecks.

We looked at what we could do to speed up both, ensuring no bottlenecks in terms of the execution flow. We’ve improved the micro-op cache, the efficiency of getting those instructions into the pipe. We’ve also made a number of efficiencies in terms of reducing the number of cycles executing through our execution units. In terms of memory and feeding it, we’ve optimized our cache subsystem.

Looking at the cache hierarchy, each Zen core packs a dedicated 512KB L2 cache with 8-way associativity. The 8MB L3 cache is shared among the four cores of a CCX, which works out to 2MB per core. Overall, AMD says Zen will deliver double the Level 1 and Level 2 cache bandwidth of Excavator, and up to 5x the L3 cache bandwidth.
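The cache figures above lend themselves to a quick back-of-the-envelope check. The sketch below takes the L2 size, associativity and L3 size from the article and the four-cores-per-CCX figure from earlier in the piece; the 64-byte cache line is an assumption (it is typical for x86 processors but is not stated in the source).

```python
KIB = 1024
MIB = 1024 * KIB


def cache_sets(size_bytes, ways, line_bytes):
    """Number of sets in a set-associative cache: size / (ways * line size)."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("cache size is not a multiple of ways * line size")
    return sets


L2_SIZE = 512 * KIB  # from the article
L2_WAYS = 8          # from the article
LINE_BYTES = 64      # assumption: not stated in the article
L3_SIZE = 8 * MIB    # from the article
CORES_PER_CCX = 4    # from the article

l2_sets = cache_sets(L2_SIZE, L2_WAYS, LINE_BYTES)
l3_mib_per_core = L3_SIZE // CORES_PER_CCX // MIB
print(l2_sets, l3_mib_per_core)  # 1024 2
```

Under the 64-byte-line assumption, the L2 would be organized as 1,024 sets of eight lines each, and the L3 share per core comes out at the 2MB quoted above.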

Papermaster also touted the 40 per cent IPC gains with Zen, saying:

When Zen comes out in early 2017, it is going to have a 40% improvement. The only way you can get that is to use a combination of every aspect of the design, of feeding the engine, of optimizing the engine itself and improving the throughput to the engine. Those are the three key elements in terms of how you get improvements. Anyone who has been around microprocessors design for a while will say it is not rocket science. They’re right, but those are the levers. It’s about breaking it down into dozens and dozens of specific changes you drive into a design.
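To put the 40% figure in perspective: to a first approximation, single-thread throughput scales with IPC multiplied by clock frequency. The sketch below applies that simplified model. The `clock_ratio` parameter is hypothetical, since the interview gives no clock speeds, and real workloads will not scale this cleanly.

```python
def relative_performance(ipc_gain, clock_ratio=1.0):
    """Throughput relative to the old core, modeled as IPC * clock.

    ipc_gain:    fractional IPC improvement (0.40 for the quoted 40%)
    clock_ratio: new clock / old clock (hypothetical; not in the source)
    """
    return (1.0 + ipc_gain) * clock_ratio


def clock_ratio_for_equal_performance(ipc_gain):
    """Clock ratio at which the new core merely matches the old one."""
    return 1.0 / (1.0 + ipc_gain)


same_clock = relative_performance(0.40)
break_even = clock_ratio_for_equal_performance(0.40)
print(round(same_clock, 3), round(break_even, 3))  # 1.4 0.714
```

In other words, under this model a 40% IPC gain means 1.4x the work at the same clock, or the same work at roughly 71% of the clock, which is where much of the performance-per-watt headroom would come from.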

Summit Ridge will be the first series of processors to use the Zen x86 core architecture. Based on the 14nm FinFET process technology from GlobalFoundries, Summit Ridge features eight Zen cores and 16 threads, and supports AMD’s brand new AM4 socket platform, which is also compatible with the company’s seventh-gen A-Series Bristol Ridge APUs. The AM4 platform supports a variety of features, including DDR4 memory, PCIe Gen 3, USB 3.1 Gen2, NVMe, and more.

Summit Ridge will rival Intel’s Broadwell-E chips and, assuming AMD can achieve decent yields on the 14nm FinFET process, will do so at a lower price. The first Zen-based products will appear in high-end desktops, which are reportedly scheduled for an early 2017 launch. These will be followed by the Zen-based Naples SoC aimed at the enterprise-class x86 server market.

For the full interview with Mark Papermaster, head over to SemiEngineering.