AMD’s APUs continue to be interesting because of how arguably over-architected their GPU sides often are and how strongly they react to memory speed. Trinity and Richland were already hitting memory bandwidth limitations, and Kaveri’s shift away from the older VLIW4 architecture to the more powerful and efficient GCN architecture puts ever-increasing pressure on the memory controller and the system memory.
When I last examined memory scaling on Kaveri, I concluded that there had to be more gas in the tank. Performance testing demonstrated no diminishing returns at any stage. AMD specs Kaveri with a maximum memory speed of DDR3-2400, and going above that means having to overclock the chip’s BClk. So let this be a lesson: ASRock may be the only vendor advertising A88X chipset boards with speeds above DDR3-2400, but they’re just as beholden to the limitations of Kaveri’s memory controller as anyone else is. I was able to get the memory controller up to DDR3-2640, but the chip simply wouldn’t run at any higher a BClk than 110MHz.
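For reference, the arithmetic behind that DDR3-2640 figure can be sketched as follows. This is only an illustration: it assumes the memory data rate scales linearly with BClk from a 100MHz stock reference clock, and the function name and parameters are made up for the example.

```python
def effective_memory_speed(rated_mt_s: float, bclk_mhz: float,
                           stock_bclk_mhz: float = 100.0) -> float:
    """Return the effective DDR3 data rate (MT/s) after a BClk overclock.

    Assumption: the memory ratio stays fixed, so the data rate scales
    linearly with the reference clock.
    """
    return rated_mt_s * bclk_mhz / stock_bclk_mhz


if __name__ == "__main__":
    # Highest stock ratio (DDR3-2400) combined with the 110MHz BClk ceiling.
    print(f"DDR3-{effective_memory_speed(2400, 110):.0f}")
```

Under those assumptions, the DDR3-2400 ratio at a 110MHz BClk works out to DDR3-2640, which matches the ceiling I hit.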
Today I’m doing a somewhat more thorough dive into Kaveri’s dependency on memory speed by testing both the effect of latency on performance and the effect of overclocking the GPU cores. The results are extremely interesting to say the least.
The test platform was:
- AMD A10-7850K APU (@ stock unless otherwise specified)
- 2x4GB Corsair Vengeance Pro DDR3-3000
- ASUS A88X-PRO Motherboard
- 240GB Corsair Neutron GTX SSD (System) & 240GB Corsair Force LS SSD (Game)
- Windows 7 Ultimate 64-bit SP1
During my initial testing I suspected Kaveri’s memory controller was a bit on the weak side, and this testing more or less confirms it. Regardless of memory speed, write performance stagnates. There’s some mild variation, but it’s practically within the margin of error. That said, there is a slight trend of write performance peaking at DDR3-2133 and then declining a little when we go faster. Also pay attention to the read performance: we see gains with each bump in raw clock speed until DDR3-2400, which offers virtually no improvement. DDR3-2640 does at least give us another spike.
It’s not unusual for a memory controller to run out of headroom at a certain point. For example, Haswell scales beautifully up to about DDR3-2400, but at DDR3-3000 even synthetic tests have a hard time demonstrating differences; Haswell’s IMC just isn’t powerful enough to handle those high speeds. By the same token, it looks like Kaveri starts to run into trouble after DDR3-2133.
Copy bandwidth tells essentially the same story. You can get a little extra pep out of DDR3-2400, but DDR3-2133 seems to be Kaveri’s sweet spot.
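To put those bandwidth results in context, here is a rough sketch of the theoretical peak bandwidth at each speed grade mentioned in this article. It assumes a dual-channel configuration with a 64-bit bus per channel and nominal data rates. The function name is hypothetical, and measured results will always land below these ceilings.

```python
def peak_bandwidth_gb_s(data_rate_mt_s: float, channels: int = 2,
                        bus_width_bits: int = 64) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumption: every channel moves bus_width_bits per transfer with
    no overhead, which real memory controllers never achieve.
    """
    bytes_per_transfer = bus_width_bits / 8
    return data_rate_mt_s * 1e6 * bytes_per_transfer * channels / 1e9


if __name__ == "__main__":
    for speed in (1333, 1600, 2133, 2400, 2640):
        print(f"DDR3-{speed}: {peak_bandwidth_gb_s(speed):.1f} GB/s peak")
```

The gap between these ceilings and the measured numbers is one way to gauge how much of the available bandwidth a memory controller actually delivers.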
The other trend to pay attention to is the effect of timings and latencies on Kaveri’s performance: it’s marginally present, but invariably dwarfed by raw clock speed.
Latency testing tells the same story. DDR3-2133 CAS 9 is probably the most balanced point unless you’re going to start overclocking, as DDR3-2400 offers very little in the way of performance benefit.
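One way to compare timings across speed grades is to convert CAS latency from clock cycles into nanoseconds. The sketch below uses the usual conversion for DDR memory, where the memory clock is half the data rate. The function name is made up for the example, and it covers only CAS latency, not the full round trip through the memory controller that a latency benchmark measures.

```python
def cas_latency_ns(data_rate_mt_s: float, cas_cycles: int) -> float:
    """Convert CAS latency from clock cycles to nanoseconds.

    DDR transfers twice per clock, so the memory clock in MHz is
    data_rate / 2 and one cycle lasts 2000 / data_rate nanoseconds.
    """
    return cas_cycles * 2000.0 / data_rate_mt_s


if __name__ == "__main__":
    # Speed and CAS pairs mentioned in this article.
    for speed, cl in ((1333, 7), (1600, 9), (2133, 9), (2400, 10)):
        print(f"DDR3-{speed} CAS {cl}: {cas_latency_ns(speed, cl):.2f} ns")
```

By this measure DDR3-2133 CAS 9 and DDR3-2400 CAS 10 land within roughly a tenth of a nanosecond of each other, so what separates those two settings is mostly raw transfer rate rather than timings.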
Synthetics are useful for getting a baseline idea of how the memory controller is operating, but they’re not the whole story. In real world testing, I found that Kaveri’s Steamroller-based x86 cores are still too underpowered to take advantage of faster system memory. A run in Cinebench R15 saw 307 points at DDR3-1333 CAS 7, a minor bump to 310 points at DDR3-1600 CAS 9, and then that 310 point score held fast all the way up to DDR3-2400 CAS 10. For gaming, I went to my standby of BioShock Infinite.
While I tested last time at 1080p Low, this time I went with 1600x900 Medium, both to start from a fairly playable baseline and to at least slightly ease the strain on the memory controller. In line with my synthetic testing, I found that BioShock Infinite saw its biggest gains from each step up in memory speed, with tighter timings offering very little performance improvement. The performance jump going from DDR3-2133 CAS 9 to DDR3-2400 CAS 10 was minimal, though. Initially this would suggest that we’re finally getting to a point where we’re GPU limited, but this isn’t the whole story.
To test the effect of overclocking the GPU on gaming performance and see if it could operate as an alternative to or in conjunction with using faster memory, I ran a series of tests with the GPU speed topping out at 950MHz (1GHz wasn’t stable).
Under any circumstances where Kaveri might be the slightest bit memory bandwidth limited, overclocking the GPU yields a whopping 1fps benefit at most. If you’re looking for a free performance boost by overclocking Kaveri’s GPU, I suspect you’ll only see it at 720p, if at all. On the graphics side, the chip is entirely memory bandwidth limited.
With all this testing done, there are a few conclusions to be drawn.
First, Kaveri’s sweet spot is DDR3-2133 CAS 9 or CAS 10. That appears to be the true point of diminishing returns; after that, the memory controller seems to have an increasingly difficult time handling memory speeds and latencies.
Second, while I haven’t seen other reviews mention this, I do believe that the memory controller on Kaveri is a dormant weak point. Under most circumstances this isn’t going to be exposed, but benchmarked memory bandwidth is consistently lower than Haswell’s at comparable speeds; an i7-4770K at DDR3-1333 CAS 9 will get better synthetic bandwidth results, and substantially lower latency, than the A10-7850K at DDR3-1600 CAS 7. The middling performance of AMD’s memory controller isn’t an issue for the Steamroller CPU cores, but it keeps a tight leash on the GCN GPU cores and prevents the A10-7850K from reaching its full potential.
This bandwidth problem is exacerbated by the poor memory ratio support. I understand why AMD caps it at DDR3-2400; however, I maintain that they should have exposed additional ratios the way Intel did on Haswell, so that enthusiasts could try to tune as much performance as they can out of the architecture.
Kaveri is still a very capable architecture with a lot to recommend it, but it’s disappointing to see that the memory controller itself is a weak point that prevents the graphics side of the chip from stretching its legs. Nonetheless, Kaveri was architected more for compute than for gaming, and in compute workloads these memory bandwidth limitations are going to be less crippling.
If you have any suggestions on alternative settings to tweak or test, please let me know. AMD’s latest APU, whatever its weaknesses are, is at least a fascinating specimen.