Original Link: https://www.anandtech.com/show/10446/the-amd-radeon-rx-480-preview
The AMD Radeon RX 480 Preview: Polaris Makes Its Mainstream Mark
by Ryan Smith on June 29, 2016 9:00 AM EST
Back in December of last year, AMD’s Radeon Technologies Group began slowly trickling out the plans for what would be their first GPU architecture built for the now-modern FinFET processes: Polaris. As part of a broader change in how GPU architectures have been handled – more information is now released ahead of launch – AMD laid out what they wanted to do with Polaris. Aim for the mainstream, radically improve power efficiency, lay the groundwork for HDR displays, and, of course, improve performance.
Now six months later we are seeing AMD’s plans come to fruition, as the Polaris GPUs are in full production, and the first retail products are launching today. Kicking off the Polaris generation in the desktop market will be AMD’s Radeon RX 480, which is aiming for the mainstream market. We’ve already seen the card, the price, and AMD’s marketing spiel back at Computex 2016, so now it’s time to take a look at the final, retail hardware.
AMD Radeon GPU Specification Comparison
| AMD Radeon RX 480 (8GB) | AMD Radeon RX 480 (4GB) | AMD Radeon R9 390 | AMD Radeon R9 380
Stream Processors | 2304 (36 CUs) | 2304 (36 CUs) | 2560 (40 CUs) | 1792 (28 CUs)
Texture Units | 144 | 144 | 160 | 112
ROPs | 32 | 32 | 64 | 32
Base Clock | 1120MHz | 1120MHz | N/A | N/A
Boost Clock | 1266MHz | 1266MHz | 1000MHz | 970MHz
Memory Clock | 7-8Gbps GDDR5 | 7Gbps GDDR5 | 5Gbps GDDR5 | 5.5Gbps GDDR5
Memory Bus Width | 256-bit | 256-bit | 512-bit | 256-bit
VRAM | 8GB | 4GB | 8GB | 2GB
Transistor Count | 5.7B | 5.7B | 6.2B | 5.0B
Typical Board Power | 150W | 150W | 275W | 190W
Manufacturing Process | GloFo 14nm FinFET | GloFo 14nm FinFET | TSMC 28nm | TSMC 28nm
Architecture | GCN 4 | GCN 4 | GCN 1.1 | GCN 1.2
GPU | Polaris 10 | Polaris 10 | Hawaii | Tonga
Launch Date | 06/29/16 | 06/29/16 | 06/18/15 | 06/18/15
Launch Price | $239 | $199 | $329 | $199
At the highest level, the RX 480 is based off of a fully enabled version of AMD’s Polaris 10 GPU. This is the first Polaris GPU to hit the market, and is the larger of the two GPUs. The total transistor count is 5.7 billion, which takes up 232mm2 on GlobalFoundries’ 14nm FinFET process. That this GPU is built at GloFo and not TSMC is a significant departure for AMD, who previously has used partner TSMC just shy of forever, and is the first time AMD and NVIDIA haven’t used the same fab in some 13 years. We’ll touch upon the foundry issue more in the full review, but the important thing to take away right now is that with the split in foundries, it’s no longer architecture alone that dictates whether a given NVIDIA or AMD GPU is better; process now plays a part, and the playing field is no longer even.
As it’s using a full Polaris 10 GPU, the RX 480 ships with all 36 CUs (2304 SPs) enabled. Ignoring architectural efficiency for the moment, this puts it somewhere between the Radeon R9 390 (Hawaii) and Radeon R9 380 (Tonga) in terms of CU count, with AMD having spent a good chunk of their 14nm density gains on adding CUs. Note that the CUs themselves have not substantially changed – it’s still 64 stream processors and 4 texture units per CU – which is where the 144 texture unit count comes from.
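As a quick check on those figures, the stream processor and texture unit counts follow directly from the CU count. The sketch below uses only the per-CU numbers quoted above and the CU counts from the specification table; the function name `cu_resources` is made up for illustration.

```python
# Per-CU resources as stated in the text: 64 stream processors and 4 texture units.
SPS_PER_CU = 64
TEX_UNITS_PER_CU = 4


def cu_resources(compute_units: int) -> tuple[int, int]:
    """Return (stream processors, texture units) for a given CU count."""
    return compute_units * SPS_PER_CU, compute_units * TEX_UNITS_PER_CU


# CU counts taken from the specification table.
for name, cus in [("RX 480", 36), ("R9 390", 40), ("R9 380", 28)]:
    sps, tex = cu_resources(cus)
    print(f"{name}: {cus} CUs -> {sps} SPs, {tex} texture units")
```

For the RX 480 this gives 2304 SPs and 144 texture units, matching the table.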
On the backend of things, RX 480 is equipped with 32 ROPs. This is fewer than Hawaii’s 64 ROPs, but it is consistent with mainstream parts, as ROP needs don’t scale nearly as quickly from one generation to the next like compute (CU) needs. These 32 ROPs are paired with 2MB of L2 cache, which is twice as much L2 cache per ROP as the bulk of AMD’s last-gen lineup. The increased L2 cache has a die space cost – which is now easier to pay with the 14nm process – and helps to improve performance and cut power consumption by keeping more data on-die.
However once you go off-die, you will run into RX 480’s VRAM, which is a small story in and of itself. Once again common for mainstream AMD cards, AMD has stuck with a 256-bit GDDR5 memory bus here. Attached to this bus is either 4GB or 8GB of VRAM, with AMD offering two capacities for RX 480. The reason for offering multiple capacities is that AMD wants to hit the $199 price point with the card – the traditional sweet spot for mainstream cards – which would be hard to do with an 8GB card at this time. By offering both, AMD can hit that price while offering a full 8GB card at a slightly higher price for buyers with a bit more flexibility and/or greater VRAM needs.
Where things get tricky here however is the memory speeds. Officially, 7Gbps GDDR5 is the minimum speed for both RX 480 capacities, and this is the speed that AMD’s 4GB reference card runs at. However for their 8GB reference card, AMD has opted to ship the card with faster 8Gbps memory in order to further boost performance. I suspect that AMD would have liked to have used 8Gbps memory throughout, but the aforementioned price target required AMD to make some concessions to comfortably reach it. Otherwise for the higher priced 8GB card, AMD didn’t need to pinch pennies, and as a result they were able to ship it with 8Gbps memory.
AMD Radeon RX 480 Memory Bandwidth
| AMD Radeon RX 480 8GB Reference | AMD Radeon RX 480 4GB Reference | AMD Radeon RX 480 Min Requirements
Memory Clock | 8Gbps GDDR5 | 7Gbps GDDR5 | 7Gbps GDDR5
Memory Bus Width | 256-bit | 256-bit | 256-bit
Total Mem Bandwidth | 256GB/sec | 224GB/sec | 224GB/sec
VRAM | 8GB | 4GB | 4/8GB
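The bandwidth row in the table above is simply the per-pin data rate multiplied by the bus width, divided by 8 to convert bits to bytes. A minimal sketch of that arithmetic (the function name is illustrative, not anything from AMD):

```python
def memory_bandwidth_gb_per_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/sec: per-pin data rate (Gbps) x bus width (bits) / 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8


# 8GB reference card: 8Gbps memory on a 256-bit bus.
print(memory_bandwidth_gb_per_s(8, 256))  # 256.0

# 4GB reference card and the minimum specification: 7Gbps on a 256-bit bus.
print(memory_bandwidth_gb_per_s(7, 256))  # 224.0
```

The 1Gbps difference in memory speed therefore amounts to 32GB/sec, or roughly 14% more peak bandwidth for the 8GB reference card.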
The end result is that we have an odd schism between AMD’s card requirements and what they actually ship. The reference 4GB RX 480 meets the RX 480 minimum specifications, whereas the reference 8GB card is de facto overclocked relative to those same specifications. As we’ll see in our benchmark results, the difference in performance isn’t too great, but I don’t think this is an ideal outcome for consumers. My biggest concern right now is what happens when AMD’s partners start shipping their custom cards; if they opt for slower 7Gbps memory, then this would mean that custom 8GB cards could end up slightly underperforming the official reference card. But we’ll have to see how that plays out.
Moving on, let’s talk about power consumption. As AMD has made clear over the last several months, one of the major goals of Polaris was power efficiency, and this is where we see some of the first payoffs from that decision. RX 480’s official Typical Board Power (TBP) is 150W, over 20% lower than the last-generation R9 380, and 45% lower than the otherwise performance-comparable R9 390. Consequently the card only requires a single 6-pin PCIe power connector for external power, making it a more friendly option for power-limited desktops that don’t offer additional power connectors.
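The percentages above can be reproduced from the TBP figures in the specification table. The sketch below also checks the single 6-pin connector against the board power; note that the 75W slot and 75W 6-pin limits are the usual PCI Express figures and are an assumption brought in here, not numbers from this article.

```python
def percent_lower(new_watts: float, old_watts: float) -> float:
    """How much lower new_watts is than old_watts, as a percentage of old_watts."""
    return (old_watts - new_watts) / old_watts * 100


RX_480_TBP = 150  # watts, from the specification table
R9_380_TBP = 190
R9_390_TBP = 275

print(f"vs R9 380: {percent_lower(RX_480_TBP, R9_380_TBP):.1f}% lower")  # 21.1% lower
print(f"vs R9 390: {percent_lower(RX_480_TBP, R9_390_TBP):.1f}% lower")  # 45.5% lower

# Assumed limits (PCI Express figures, not from the article):
PCIE_SLOT_WATTS = 75
SIX_PIN_WATTS = 75
power_budget = PCIE_SLOT_WATTS + SIX_PIN_WATTS
print(f"Slot + one 6-pin budget: {power_budget}W, TBP: {RX_480_TBP}W")
```

Under those assumed limits, the slot plus one 6-pin connector add up to exactly the card's 150W TBP.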
In terms of design, the reference RX 480 is a double-wide, blower-style card measuring 9.5-inches long. Notably, this is the first AMD retail reference card since the Radeon R9 290 series to use a blower, giving AMD the opportunity to show that they’ve learned from 290’s excesses and that the company can build a better blower. Given AMD’s mainstream ambitions, a blower makes a lot of sense for a $199, 150W card, as a fully exhausting card is going to be the most compatible with the wide variety of desktop designs out there. AMD doesn’t need to worry about whether the cooling built into the chassis can handle 150W of heat, since the card can remove the vast majority of the heat on its own. The blower design does add some length to the card though; the PCB is only 7-inches long, while the space requirements for the radial fan push the card out to the full 9.5-inches.
For connectivity, buyers will find 3 DisplayPorts and an HDMI port; AMD has done away with the DVI port for their reference design. As this is a new card on a new architecture, both port types support their latest respective standards. For DisplayPort this means support for the 1.3 and 1.4 standards, adding the newest, fastest HBR3 signaling mode, along with full HDR support. Meanwhile on the HDMI side, HDMI 2.0b is supported, offering 4Kp60 with HDR.
For today’s launch, this is going to be a full reference launch. All of AMD’s partners are shipping AMD’s reference design in 4GB and 8GB capacities, which means the differences between the vendors will come down to pack-in items, support, and whether anyone charges a premium for the aforementioned items. Card availability is said to be good, but at this point I’m going to be surprised if most retailers don’t sell out by the end of the day, as these days it’s rare for video cards not to sell out, even mainstream cards. Looking at the slightly longer term, AMD isn’t able to state exactly when we’ll see custom RX 480 boards hit the market, but from what I gather it will be sooner rather than later.
Moving on, with two different capacities there are two different prices for the RX 480. The entry level 4GB card will be launching at the previously unveiled price of $199. Meanwhile the 8GB card will launch at $239, a $40 price premium for the extra 4GB of memory and the higher memory frequency. I do not have a good idea of what the split is between 4GB and 8GB cards, but I suspect that it will be the 8GB cards that are more plentiful.
Finally, looking at the competitive landscape, just as was the case last month with NVIDIA’s GTX 1000 series and the high-end market, the Radeon RX 480 series is launching uncontested into the mainstream market. At least for the time being all of NVIDIA’s products are positioned well above the RX 480 – with GTX 1070 starting at $399 – which means what competition there is for AMD is composed of last-generation 28nm cards, particularly the GTX 970 and GTX 960. As these are last-generation cards, neither one is strictly comparable to the RX 480, and in the long run these cards have a limited shelf life as they’re due to be discontinued sooner rather than later.
Summer 2016 GPU Pricing Comparison
AMD | Price | NVIDIA
| $659 | GeForce GTX 1080
| $429 | GeForce GTX 1070
Radeon R9 390X | $329 |
| $259 | GeForce GTX 970
Radeon RX 480 (8GB) | $239 |
Radeon RX 480 (4GB) | $199 |
AMD's Path to Polaris
With the benefit of hindsight, I think that the 28nm generation started out better for AMD than it ended. The first Graphics Core Next card, Radeon HD 7970, had the advantage of launching more than a quarter before NVIDIA’s competing Kepler cards. And while AMD trailed in power efficiency from the start, at least for a time they could compete for the top spot in the market with products such as the Radeon HD 7970 GHz Edition, before NVIDIA rolled out their largest Kepler GPUs.
However I think where things really went off of the rails for AMD was mid-cycle, in 2014, when NVIDIA unveiled the Maxwell architecture. Kepler was good, but Maxwell was great; NVIDIA further improved their architectural and energy efficiency (at times immensely so), and this put AMD on the back foot for the rest of the generation. AMD had performant parts from the bottom R7 360 right up to the top Fury X, but they were never in a position to catch Maxwell’s efficiency, a quality that proved to resonate with both reviewers and gamers.
The lessons of the 28nm generation were not lost on AMD. Graphics Core Next was a solid architecture and opened the door to AMD in a number of ways, but the Radeon brand does not exist in a vacuum, and it needs to compete with the more successful NVIDIA. At the same time AMD is nothing if not scrappy, and they can surprise us when we least expect it. But sometimes the only way to learn is the hard way, and for AMD I think the latter half of the 28nm generation was for the Radeon Technologies Group learning the hard way.
So what lessons did AMD learn for Polaris? First and foremost, power efficiency matters. It matters quite a lot in fact. Every vendor – be it AMD, Intel, or NVIDIA – will play up their strongest attributes. But power efficiency caught on with consumers, more so than any other “feature” in the 28nm generation. Though its importance in the desktop market is forum argument fodder to this day, power efficiency and overall performance are two sides of the same coin. There are practical limits for how much power can be dissipated in different card form factors, so the greater the efficiency, the greater the performance at a specific form factor. This aspect is even more important in the notebook space, where GPUs are at the mercy of limited cooling and there is a hard ceiling on heat dissipation.
As a result a significant amount of the work that has gone into Polaris has been into improving power efficiency. To be blunt, AMD has to be able to better compete with NVIDIA here, but AMD’s position is more nuanced than simply beating NVIDIA. AMD largely missed the boat on notebooks in the last generation, and they don’t want to repeat their mistakes. At the same time, starting now with an energy efficient architecture means that when they scale up and scale out with bigger and faster chips, they have a solid base to work from, and ultimately, more chances to achieve better performance.
The other lesson AMD learned for Polaris is that market share matters. This is not an end-user problem – AMD’s market share doesn’t change the performance or value of their cards – but we can’t talk about what led to Polaris without addressing it. AMD’s share of the consumer GPU market is about as low as it ever has been; this translates not only into weaker sales, but it undermines AMD’s position as a whole. Consumers are more likely to buy what’s safe, and OEMs aren’t much different, never mind the psychological aspects of the bandwagon effect.
Consequently, with Polaris AMD made the decision to start with the mainstream market and then work up from there, a significant departure from the traditional top-down GPU rollouts. This means developing chips like Polaris 10 and 11 first, targeting mainstream desktops and laptops, and letting the larger enthusiast class GPUs follow. The potential payoff for AMD here is that this is the opposite of what NVIDIA has done, and that means AMD gets to go after the high volume mainstream market first while NVIDIA builds down. Should everything go according to plan, then this gives AMD the opportunity to grow out their market share, and ultimately shore up their business.
As we dive into Polaris, its abilities, and its performance, it’s these two lessons we’ll see crop up time and time again, as these were some of the guiding lessons in Polaris’s design. AMD has taken the lessons of the 28nm generation to heart and has crafted a plan to move forward with the FinFET generation, charting a different, and hopefully more successful path.
Though with this talk of energy efficiency and mainstream GPUs, let’s be clear here: this isn’t AMD’s small die strategy reborn. AMD has already announced their Vega architecture, which will follow up on the work done by Polaris. Though not explicitly stated by AMD, it has been strongly hinted at that these are the higher performance chips that in past generations we’d see AMD launch with first, offering performance features such as HBM2. AMD will have to live with the fact that for the near future they have no shot at the performance crown – and the halo effect that comes with it – but with any luck, it will put AMD in a better position to strike at the high-end market once Vega’s time does come.
The Polaris Architecture: In Brief
For today’s preview I’m going to quickly hit the highlights of the Polaris architecture.
In their announcement of the architecture this year, AMD laid out a basic overview of what components of the GPU would see major updates with Polaris. Polaris is not a complete overhaul of past AMD designs, but AMD has combined targeted performance upgrades with a chip-wide energy efficiency upgrade. As a result Polaris is a mix of old and new, and a lot more efficient in the process.
At its heart, Polaris is based on AMD’s 4th generation Graphics Core Next architecture (GCN 4). GCN 4 is not significantly different from GCN 1.2 (Tonga/Fiji), and in fact GCN 4’s ISA is identical to that of GCN 1.2. So everything we see here today comes not from broad, architectural changes, but from low-level microarchitectural changes that improve how instructions execute under the hood.
Overall AMD is claiming that GCN 4 (via RX 480) offers a 15% improvement in shader efficiency over GCN 1.1 (R9 290). This comes from two changes: instruction prefetching and a larger instruction buffer. In the case of the former, GCN 4 can, with the driver’s assistance, attempt to pre-fetch future instructions, something GCN 1.x could not do. When done correctly, this reduces or eliminates the need for a wave to stall while waiting on an instruction fetch, keeping the CU fed and active more often. Meanwhile the per-wave instruction buffer (which is separate from the register file) has been increased from 12 DWORDs to 16 DWORDs, allowing more instructions to be buffered and, according to AMD, improving single-threaded performance.
Outside of the shader cores themselves, AMD has also made enhancements to the graphics front-end for Polaris. AMD’s latest architecture integrates what AMD calls a Primitive Discard Accelerator. True to its name, the job of the discard accelerator is to remove (cull) triangles that are too small to be used, and to do so early enough in the rendering pipeline that the rest of the GPU is spared from having to deal with these unnecessary triangles. Degenerate triangles are culled before they even hit the vertex shader, while small triangles are culled a bit later, after the vertex shader but before they hit the rasterizer. There’s no visual quality impact to this (only triangles that can’t be seen/rendered are culled), and as claimed by AMD, the benefits of the discard accelerator increase with MSAA levels, as MSAA otherwise exacerbates the small triangle problem.
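To make the two culling stages more concrete, here is a rough software sketch of the idea. This is not AMD's hardware logic, which has not been disclosed in this detail; it assumes a degenerate triangle is one that repeats a vertex index (which can be detected from the index buffer alone, before any vertex shading), and that a "too small" triangle is one whose screen-space bounding box contains no pixel center, with a single sample per pixel at (x + 0.5, y + 0.5). All function names are hypothetical.

```python
import math


def is_degenerate(i0: int, i1: int, i2: int) -> bool:
    """Stage 1 (pre vertex shader): a triangle that repeats an index has zero area."""
    return i0 == i1 or i1 == i2 or i0 == i2


def _spans_pixel_center(lo: float, hi: float) -> bool:
    """True if some pixel center k + 0.5 (k an integer) lies within [lo, hi]."""
    return math.ceil(lo - 0.5) <= math.floor(hi - 0.5)


def misses_all_pixel_centers(verts: list[tuple[float, float]]) -> bool:
    """Stage 2 (post vertex shader): conservative bounding-box test in screen space.

    If the box contains no pixel center on either axis, the triangle cannot
    produce any pixels and can be dropped before rasterization. Triangles whose
    box does contain a center are kept, even if they turn out to miss it.
    """
    xs = [x for x, _ in verts]
    ys = [y for _, y in verts]
    covers_x = _spans_pixel_center(min(xs), max(xs))
    covers_y = _spans_pixel_center(min(ys), max(ys))
    return not (covers_x and covers_y)


# A sliver that sits between pixel centers is culled; a large triangle is kept.
tiny = [(0.6, 0.6), (0.9, 0.6), (0.6, 0.9)]
large = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
print(is_degenerate(7, 7, 9))           # True
print(misses_all_pixel_centers(tiny))   # True
print(misses_all_pixel_centers(large))  # False
```

The sketch only models one sample per pixel; it says nothing about how the real hardware handles MSAA sample positions.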
Along these lines, Polaris also implements a new index cache, again meant to improve geometry performance. The index cache is designed specifically to accelerate geometry instancing performance, allowing small instanced geometry to stay close by in the cache, avoiding the power and bandwidth costs of shuffling this data around to other caches and VRAM.
Finally, at the back-end of the GPU, the ROP/L2/Memory controller partitions have also received their own updates. Chief among these is that Polaris implements the next generation of AMD’s delta color compression technology, which uses pattern matching to reduce the size and resulting memory bandwidth needs of frame buffers and render targets. The result is a de facto increase in available memory bandwidth and a decrease in power consumption, at least so long as the buffer is compressible. With Polaris, AMD supports a larger pattern library to better compress more buffers more often, improving on GCN 1.2 color compression by around 17%.
Otherwise we’ve already covered the increased L2 cache size, which is now at 2MB. Paired with this is AMD’s latest generation memory controller, which can now officially go to 8Gbps, and even a bit more than that when overclocking.
Gaming Performance
So with the basics of the architecture and core configuration behind us, let’s dive into some numbers.
Overall, AMD is pitching the RX 480 as a card suitable for 1440p gaming as well as 1080p gaming and VR gaming. In the case of 1080p the card is clearly powerful enough, as even Crysis 3 at its highest quality setting is flirting with 60fps. However when it comes to 1440p, the RX 480 feels like it’s coming up a bit short; other than DiRT Rally, performance is a bit low for the 60fps PC gamer. Traditionally cards in the $199-$249 mainstream range have been 1080p gaming cards, and in the long run I think this is where RX 480 will settle as well.
Gaming Performance, Continued
While AMD’s launch drivers for the RX 480 have by and large been stable, the one outlier here has been Grand Theft Auto V. In the current drivers there is an issue that appears to affect the game’s built-in benchmark on GCN 1.1 and later cards, causing stuttering, reduced performance, and in the case of the 380X, complete crashes. AMD has told me that they’ve discovered the issue as well and will be issuing a fixed driver, but it was not ready in time for the review.
Continuing our look at gaming performance, it’s becoming increasingly clear that RX 480 trends closely to the last generation Radeon R9 390 and the GeForce GTX 970. Given their architectural similarity, in a lot of ways this is a repeat of 390 vs 970 in general; the two cards are sometimes equal, and sometimes far apart. But in the end, on average, they are close together on our 2016 benchmark suite.
For mainstream video card users, this means that last year’s enthusiast-level performance has come down to mainstream prices.
Power, Temperature, & Noise
Given AMD’s focus on power efficiency with Polaris – not to mention the overall benefits of the move to 14nm FinFET – there is a lot of interest in just how the RX 480 stacks up when it comes to power, temperature, and noise. So without further ado…
When it comes to idle power consumption I'm posting the results I've measured as-is, but I want to note that I have low confidence in these results for the AMD cards. Ever since the GPU testbed was updated from Windows 8.1 to Windows 10, AMD cards have idled 3-5W higher than they used to under Windows 8.1. I believe that this is an AMD driver bug – NVIDIA’s cards clearly have no problem – possibly related to the GPU testbed being an Ivy Bridge-E system. In this case I don’t believe RX 480’s idle power consumption is any higher than GTX 960’s, but for the moment the testbed is unable to prove it.
Traditionally we start with gaming load power before moving on to FurMark, but in this instance I want to flip that. As a power virus type workload, FurMark’s power requirements are greater than any game. But because it’s synthetic, it gives us a cleaner look at just GPU power consumption.
Among AMD’s cards, the RX 480 is second to only the Radeon HD 7850 in power consumption. Even then, as a GCN 1.0 card, the 7850 is one of the last AMD cards without fine-grained power states, so this isn’t a true apples-to-apples comparison. Instead a better point of reference is the GCN 1.2 based R9 Nano, which has a 175W TBP. Compared to the R9 Nano we find that the RX 480 draws about 30W less at the wall, which almost perfectly translates to the 25W difference in TBP. As a result we can see first-hand the progress AMD has made on containing power consumption with Polaris.
However things are a bit more mixed under Crysis 3. RX 480 is still near the top of our charts, and keeping in mind that higher performing cards draw more power on this test due to the additional CPU workload, the RX 480 compares very favorably to the rest of AMD’s lineup. System power consumption is very close to R9 280/380 for much improved performance, and against the performance-comparable R9 390, we’re looking at over 110W in savings. Hawaii was a solid chip from a performance standpoint, and Polaris 10 picks up where that left off by bringing down the power consumption to much lower levels.
The drawback for AMD here is that power consumption compared to NVIDIA still isn’t great. At the wall, RX 480 is only about 10W ahead of the performance-comparable GTX 970, a last-generation 28nm card. The GTX 1070 FE further complicates matters, as its performance is well ahead of RX 480, and yet its power consumption at the wall is within several watts of AMD’s latest card. Given what we saw with FurMark I have little reason to believe that card-level power consumption is this close, but it looks like AMD is losing out elsewhere, possibly with driver-related CPU load.
Moving on to idle GPU temperatures, there’s little to remark on. At 31C, the RX 480’s blower based design is consistent with the other cards in our lineup.
Meanwhile with load temperatures, we get to see the full impact of AMD’s new WattMan power management technology. The RX 480 has a temperature target of 80C, and it dutifully ramps up the fan to ensure it doesn’t exceed that temperature.
With idle noise levels RX 480 once again posts a good result. At 37.8dB, it’s in good company, only meaningfully trailing cards that idle silently due to their respective zero fan speed idle implementations.
Finally, with load noise levels, RX 480 produces middling (but acceptable) results. Given that we have a mix of blowers and open air coolers here, the RX 480 performs similarly to other mainstream blower based cards. The $199 price tag means that AMD can’t implement any exotic cooling or noise reduction technologies, though strictly speaking it doesn’t need them.
First Thoughts
Bringing our first look at AMD’s new architecture to a close, it’s exciting to see the field shape up for the FinFET generation. After over four years since the last great node transition, we once again are making a very welcome jump to a new manufacturing process, bringing us AMD’s Polaris.
AMD learned a lot from the 28nm generation – and more often than not the hard way – and they have put those lessons to good use in Polaris. Polaris’s power efficiency has been greatly increased thanks to a combination of GlobalFoundries 14nm FinFET process and AMD’s own design choices, and as a result, compared to AMD’s last-generation parts, Polaris makes significant strides where it needs to. And this goes not just for energy efficiency, but overall performance/resource efficiency as well.
Because AMD is launching with a mainstream part first they don’t get to claim to be charting any new territory on absolute performance. But by being the first vendor to address the mainstream market with a FinFET-based GPU, AMD gets the honor of redefining the price, performance, and power expectations of this market. And the end result is better performance – sometimes remarkably so – for this high volume market.
Relative to last-generation mainstream cards like the GTX 960 or the Radeon R9 380, with the Radeon RX 480 we’re looking at performance gains anywhere between 45% and 70%, depending on the card, the games, and the memory configuration. As the mainstream market was last refreshed less than 18 months ago, the RX 480 generally isn’t enough to justify an upgrade. However if we extend the window out to cards that are 2+ years old, such as the Radeon R9 280 and GeForce GTX 760, then we have a generational update and then some. AMD Pitcairn users (Radeon HD 7800, R9 270) should be especially pleased with the progress AMD has made from one mainstream GPU to the next.
Looking at the overall performance picture, averaged across all of our games, the RX 480 lands a couple of percent ahead of NVIDIA’s popular GTX 970, and similarly ahead of AMD’s own Radeon R9 390, which is consistent with our performance expectations based on AMD’s earlier hints. RX 480 can't touch GTX 1070, which is some 50% faster, but then it's 67% more expensive as well.
Given the 970/390 similarities, from a price perspective this means that 970/390 performance has come down by around $90 since these cards were launched, from $329 to $239 for the more powerful RX 480 8GB, or $199 when it comes to 4GB cards. In the case of the AMD card power consumption is also down immensely, in essence offering Hawaii-like performance at around half of the power. However against the GTX 970 power consumption is a bit more of a mixed bag – power consumption is closer than I would have expected under Crysis 3 – and this is something to further address in our full review.
Finally, when it comes to the two different memory capacities of the RX 480, for the moment I’m leaning strongly towards the 8GB card. Though the $40 price increase represents a 20% price premium, history has shown that when mainstream cards launch at multiple capacities, the smaller capacity cards tend to struggle far sooner than their larger counterparts. In that respect the 8GB RX 480 is far more likely to remain useful a couple of years down the road, making it a better long-term investment.
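The price comparisons in these closing thoughts are easy to reproduce. The sketch below uses the launch prices quoted in the article, including the $399 GTX 1070 starting price mentioned earlier rather than the higher figure in the pricing table, and treats the "some 50% faster" estimate as an exact 1.5x purely for illustration.

```python
def percent_more(price: float, baseline: float) -> float:
    """How much more expensive price is than baseline, as a percentage of baseline."""
    return (price - baseline) / baseline * 100


RX_480_4GB = 199
RX_480_8GB = 239
GTX_1070 = 399         # starting price quoted in the article
LAUNCH_970_390 = 329   # launch price quoted for the 970/390 class

print(f"8GB over 4GB: {percent_more(RX_480_8GB, RX_480_4GB):.1f}%")       # 20.1%
print(f"GTX 1070 over 8GB RX 480: {percent_more(GTX_1070, RX_480_8GB):.1f}%")  # 66.9%
print(f"970/390-class price drop: ${LAUNCH_970_390 - RX_480_8GB}")        # $90

# Assumption for illustration: GTX 1070 is exactly 1.5x the RX 480's performance.
relative_perf = 1.5
perf_per_dollar_ratio = relative_perf / (GTX_1070 / RX_480_8GB)
print(f"GTX 1070 perf per dollar vs RX 480 8GB: {perf_per_dollar_ratio:.2f}x")  # 0.90x
```

On those assumptions the GTX 1070 delivers roughly 10% less performance per dollar than the 8GB RX 480, which is in line with the usual premium for higher-end cards.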
Wrapping things up then, today’s launch of the Radeon RX 480 puts AMD in a good position. They have the mainstream market to themselves, and RX 480 is a strong showing for their new Polaris architecture. AMD will have to fend off NVIDIA at some point, but for now they can sit back and enjoy another successful launch.
Meanwhile we’ll be back in a few days with our full review of the RX 480, so be sure to stay tuned.