First of all, I would like to thank our sponsor Zed Up Gaming PCs for kindly providing us with a Ryzen 5 PRO 4650G. This cannot be taken for granted, as AMD did not initially plan Renoir for the retail market. In this test I'm not interested in the APU's Vega graphics unit, but in its special tuning potential with regard to the RAM. That is not the only special feature, though. Because AMD is pursuing a monolithic design with Renoir, the intercore latencies turn out better. I will demonstrate this with appropriate benchmarks. On the other hand, the Ryzen 5 PRO has comparatively little L3 cache at just 8 MiB. So the question I want to answer with this test is this: can Renoir, with its special tuning potential and lower intercore latencies, more than compensate for the small L3 cache and perhaps even beat Matisse?
The test systems
Matisse
- CPU: Ryzen 9 3950X in a 6-core configuration (1 CCD, 3+3 CCX)
- Cooler: be quiet! Dark Rock PRO 4
- Board: ASRock X470 Taichi
- RAM: 4x8 GB G.Skill Trident Z, 3800 MT/s CL16
- GPU: SAPPHIRE Pulse RX 5700@1800MHz +20% PL
Renoir
- CPU: Ryzen 5 PRO 4650G
- Cooler: be quiet! Dark Rock PRO 4
- Board: Gigabyte X570 Aorus Master (BIOS F21)
- RAM: 4x8 GB G.Skill Trident Z, 4200 MT/s CL16
- GPU: SAPPHIRE Pulse RX 5700@1800MHz +20% PL
The test systems are configured so that max OC competes against max OC. However, "max" is relative here. I tuned the RAM as well as I could. Although I'm not a professional, the results are impressive. Of course, I had help from the RAM OC community (Discord); thanks for that. The core clock potential of the Renoir APU is rather modest at around 4.3 GHz at 1.4 V. To my surprise, gaming performance with a fixed core clock was even slightly worse, so the APU ran on Auto in the gaming benchmarks, whereas the 3950X was clocked at 4.5 GHz. This might seem unfair at first glance, but it is exactly the configuration in which each opponent showed its best performance. In this setup the 3950X most closely corresponds to a 3600XT, which can be clocked at around 4.5 GHz thanks to the somewhat better manufacturing process. Neither the Renoir APU nor the Matisse CPU is pushed to the limit, and both offer further optimization potential. But this makes the comparison more realistic, because not everyone is an ambitious overclocker. It should be stressed that no IPC comparisons are made. The test aims at what can realistically be extracted from the hardware with normal OC. Leaving the Matisse CPU unoverclocked would not be fair, because Ryzen 3000 also offers solid optimization potential.
Application benchmarks
AIDA64 Extreme
Let's first take a look at the latencies and bandwidths of the memory hierarchy, as reported by the AIDA64 Cache & Memory Benchmark.
The tuned Ryzen 4000 clearly stands out, both in terms of latency and bandwidth. The RAM latency, which is important for gaming performance, is about 14% better than on the likewise tuned Matisse. The read bandwidth is even about 16% higher. Because the monolithic Renoir has no bottleneck in a connection (number of lanes) to a separate I/O die, its write throughput matches its read throughput, as we already know from Zen and Zen+.
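As a rough cross-check on the bandwidth figures, the theoretical peak of a dual-channel DDR4 setup can be computed from the transfer rate alone. The sketch below assumes two 64-bit channels (the usual AM4 layout) and uses the RAM speeds of the two test systems. The function name is made up for illustration, and measured AIDA64 results always land below these ceilings.

```python
def peak_bandwidth_gb_s(mt_per_s, channels=2, bus_bytes=8):
    """Theoretical peak DRAM bandwidth in GB/s.

    Assumption: dual-channel DDR4 with a 64-bit (8-byte) bus per channel.
    """
    return mt_per_s * 1e6 * channels * bus_bytes / 1e9


matisse = peak_bandwidth_gb_s(3800)  # RAM speed of the Matisse system
renoir = peak_bandwidth_gb_s(4200)   # RAM speed of the Renoir system

print(f"Matisse peak: {matisse:.1f} GB/s")                  # 60.8 GB/s
print(f"Renoir peak:  {renoir:.1f} GB/s")                   # 67.2 GB/s
print(f"Uplift:       {(renoir / matisse - 1) * 100:.1f}%")  # 10.5%
```

By this estimate the higher transfer rate accounts for roughly 10.5% of the gap, so the rest of the measured read advantage would have to come from elsewhere, for example the timings or the memory path itself.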
SiSoftware Sandra intercore latency test
7-Zip
Next we look at the compression and decompression performance.
Compression performance differs significantly, while decompression is comparable. Several factors influence the results here, and the APU's power limit should not be ignored. The main difference, however, is most likely down to the cache.
Packetwise memory accesses
This benchmark is my own development. The memory accesses are randomized, and the time needed to process packets of different sizes is measured. This makes the levels of the memory hierarchy visible.
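The benchmark's source is not shown here, so the following is only a minimal sketch of the general idea, not the actual implementation. It assumes a pointer-chasing approach: each packet is filled with a random single-cycle permutation, and the time per dependent access is measured. All names are hypothetical, and Python's interpreter overhead blurs the cache levels considerably; a serious measurement would use a compiled language.

```python
import random
import time
from array import array

SLOT_BYTES = 8  # assumption: one 64-bit index per slot


def build_cycle(n_slots, seed=0):
    """Return a random permutation forming one single cycle (Sattolo's algorithm)."""
    rng = random.Random(seed)
    nxt = array("q", range(n_slots))
    for i in range(n_slots - 1, 0, -1):
        j = rng.randrange(i)  # j < i guarantees a single cycle
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def ns_per_access(packet_bytes, steps=200_000):
    """Chase indices through a packet of the given size; return average ns per access."""
    nxt = build_cycle(max(2, packet_bytes // SLOT_BYTES))
    pos = 0
    start = time.perf_counter_ns()
    for _ in range(steps):
        pos = nxt[pos]  # each access depends on the previous one
    return (time.perf_counter_ns() - start) / steps


if __name__ == "__main__":
    # Small packet sizes keep the setup time short; extend the list as needed.
    for kib in (32, 512, 4096):
        print(f"{kib:>6} KiB: {ns_per_access(kib * 1024):.1f} ns/access")
```

The single-cycle permutation makes every slot of the packet get visited before any repeats, and the dependent accesses keep the hardware prefetcher from hiding the latency.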
In addition to the tabular form, the data is also displayed as a curve diagram, which better illustrates the differences from 4 MiB upwards.
As expected, from 4 MiB upwards the processing time on the 3950X is lower because the R9 has more L3 cache. Just before the 32 MiB mark, the curves converge again because the accesses then mostly go to RAM. The 4650G's faster memory shows its effect here: from this mark on, the blue curve runs above the orange one.
Cinebench R20
Of course, Cinebench cannot be left out of the application benchmarks.
Although the power limit slows the 4650G down, the scores are only marginally apart. Cinebench is indeed very cache-heavy, but the smaller cache is still completely sufficient. Something similar could already be observed with the 2400G and 3400G.
Geekbench 5
Geekbench is a popular suite that combines many different individual tests. Version 5 is not as RAM-sensitive as its predecessor, which, incidentally, makes the suite well suited to IPC comparisons.
As expected, Matisse's single-core score is higher due to the higher boost clock. Renoir's multi-core score is somewhat lower due to the power limit. The higher RAM performance does not help the Renoir APU here.
Game benchmarks
The main focus of this test is on the game benchmarks, which we will now take a closer look at. The question raised in the introduction, whether the better tuning potential of the memory can compensate for the smaller L3 cache, can now be answered. How important is the L3 cache really for gaming performance? The memory tests already give an idea of how the results will turn out. The test course consists of seven well-known AAA titles, which cover all common APIs. There is a YouTube video for each scene to provide maximum transparency, and all settings are listed. The resolution is 720p, so that the CPU and not the graphics card determines the ranking. Comparisons in which faster hardware is held back by other components, such as the graphics card, are not only unfair but essentially invalid.
The Division 2
- Scene: YouTube
- Settings: 720p, ultra preset, AA/AF/AO min, 50% render scale, 18:00 world time (photo mode)
Death Stranding
- Scene: YouTube
- Settings: 720p, very high, AA/AO min, post processing + CAS off
Ghost Recon Breakpoint
- Scene: YouTube
- Settings: 720p, 80% render scale, details very high, terrain and grass quality very high, everything else min or off
Battlefield V
- Scene: YouTube
- Settings: 720p, 25% render scale, AF/AA/AO min, post processing off, mesh quality low
Far Cry New Dawn
- Scene: YouTube
- Settings: 720p, ultra preset, AA/AF min, low texture quality, blur off
Star Wars: Jedi Fallen Order
- Scene: YouTube
- Settings: 720p, epic preset, AA/AF min, post processing off
Metro Exodus
- Scene: YouTube
- Settings: 720p, ultra preset, AF min, Hairworks off, tessellation off
The optimization potential of RAM OC is considerable. The clearly better latencies (RAM and intercore) are not enough to overtake Matisse, though. The CPU clock difference plays only a minor role (probably about 3%). The importance of the L3 cache for gaming performance becomes abundantly clear. This already hints at the performance gains Zen 3 could bring with its unified 32 MiB L3 cache, or "Game Cache", as AMD calls it.
Conclusion
The application performance of the Ryzen 4000 APU is impressive and falls only slightly behind the Matisse CPU. Only the 7-Zip compression benchmark differs significantly. The single-core performance is understandably lower due to the lower boost clock rates.
The gaming performance is comparable, but this has to be "fought for" with relatively tight RAM OC. The tuning potential is undoubtedly considerable and can be exploited by enthusiasts. Despite the significant extra performance compared to the stock-specification configuration, the "Game Cache" wins in the game benchmarks.