It’s unified memory indeed. Weirdly, MS’s Xbox Series X has asymmetric bandwidth even though it is unified memory.
PS5 just has a big single bus of GDDR6.
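To put rough numbers on "asymmetric" versus "one big bus", here is a back-of-the-envelope sketch. The bus widths and the 14 Gbps per-pin rate are the publicly quoted figures for the two consoles, not something stated above, so treat them as assumptions. The function name is made up for illustration.

```python
def peak_bandwidth_gbs(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s: bus width (bits) * per-pin rate (Gbit/s) / 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8


# Assumed figures (publicly quoted specs, not from this post):
# PS5: one 256-bit GDDR6 bus at 14 Gbps per pin.
ps5 = peak_bandwidth_gbs(256, 14)

# Series X: 10 GB is striped across the full 320-bit bus, while the
# remaining 6 GB is only reachable across 192 bits of it.
xsx_fast = peak_bandwidth_gbs(320, 14)
xsx_slow = peak_bandwidth_gbs(192, 14)

print(f"PS5: {ps5} GB/s, Series X fast pool: {xsx_fast} GB/s, slow pool: {xsx_slow} GB/s")
```

So it is the same physical pool of GDDR6 on the Xbox, but which address range you touch decides how many chips serve the request.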
Think of it this way… AMD’s SSG card was the first place they did anything like this. Basically they took a GPU + PLX chip + SSD and put them on the same card, and the GPU could memory-map anything on the SSD. From the OS’s perspective the SSD was still just a normal SSD, it simply sat behind a PLX chip.
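If "memory map anything on the SSD" sounds abstract, the CPU-side analogue is a plain `mmap` of a file: you get an address range, and pages are only pulled from storage when you touch them. This is just an illustration of the idea on the host, not how the SSG driver actually exposes it. The file layout and the helper names (`make_asset_file`, `read_window`) are invented for the example.

```python
import mmap
import os
import tempfile

PAGE = 4096  # assumed page size for the illustration


def make_asset_file(num_pages: int) -> str:
    """Write a stand-in 'asset' file where page N is filled with byte value N."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        for page in range(num_pages):
            f.write(bytes([page % 256]) * PAGE)
    return path


def read_window(path: str, offset: int, length: int) -> bytes:
    """Map the whole file read-only and copy out only the requested window.

    Only the pages backing the window need to be faulted in from storage.
    """
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
        return bytes(view[offset:offset + length])


asset_path = make_asset_file(8)
sample = read_window(asset_path, 3 * PAGE, 4)  # touches only page 3
os.remove(asset_path)
print(sample)
```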
This takes that a step further and moves the control of when things get moved back and forth out of the driver/OS/main CPU and into a coprocessor. The advantage is that you can build something small just to do that, saving die space and power. HBCC also touted some ability to move things back and forth based on your typical cache methods (scoreboarding etc., or something like that), but it was predominantly controlled in the driver. For instance, the ROCm drivers have some support for it, but it is effectively just virtual memory for OpenCL and nothing more; all the smarts were in the driver.
I think what they are doing on PS5 is this: they have an IO controller that profiles each frame and determines when mapped pages from the SSD are needed during a frame, before the GPU even needs them. It can then predictively load based on the last frame, so you get zero-latency intra-frame streaming from the SSD rather than on-demand loading. The big SRAM is not so much cache as it is profiling data for the IO controllers. All the CPU ends up doing is telling the IO controller whether data is going to be used or not. For instance, if your game engine knows a texture is offscreen, it could unmap the texture and potentially spill any profiling data to RAM. Interestingly, Zen 3 or 4 will likely be able to spill the micro-op cache to RAM… AMD released a patent on this recently.
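Here is a toy model of that guess, purely to show the mechanism: record which pages each stage of a frame touched, then load those same pages right before that stage on the next frame. Everything in it (the class, the stage names, the page numbers, clearing residency between frames) is made up for illustration; it is a sketch of the idea, not a description of what the PS5 hardware really does.

```python
from collections import defaultdict


class FramePrefetcher:
    """Hypothetical model of an IO controller that replays last frame's access profile."""

    def __init__(self):
        self.last_profile = {}            # stage -> pages touched in the previous frame
        self.current = defaultdict(set)   # stage -> pages touched so far this frame
        self.resident = set()             # pages currently loaded
        self.demand_loads = 0             # loads that stalled the consumer
        self.prefetches = 0               # loads issued ahead of time

    def begin_stage(self, stage):
        # Load whatever this stage used last frame before it asks for it.
        for page in self.last_profile.get(stage, ()):
            if page not in self.resident:
                self.resident.add(page)
                self.prefetches += 1

    def access(self, stage, page):
        self.current[stage].add(page)
        if page not in self.resident:     # the prediction missed: on-demand load
            self.resident.add(page)
            self.demand_loads += 1

    def unmap(self, page):
        # The engine says the page will not be used (e.g. texture is offscreen).
        self.resident.discard(page)
        for pages in self.last_profile.values():
            pages.discard(page)

    def end_frame(self):
        self.last_profile = {s: set(p) for s, p in self.current.items()}
        self.current = defaultdict(set)
        self.resident.clear()             # simplification: nothing survives between frames


def run_frame(pf, frame):
    """Run one frame and return (demand_loads, prefetches) for it."""
    pf.demand_loads = pf.prefetches = 0
    for stage, pages in frame.items():
        pf.begin_stage(stage)
        for page in pages:
            pf.access(stage, page)
    pf.end_frame()
    return pf.demand_loads, pf.prefetches


pf = FramePrefetcher()
frame = {"shadow": [1, 2], "gbuffer": [3, 4, 5], "post": [6]}
first = run_frame(pf, frame)    # cold: everything is loaded on demand
second = run_frame(pf, frame)   # warm: everything is prefetched per stage
pf.unmap(5)                     # engine knows page 5 went offscreen
third = run_frame(pf, {"shadow": [1, 2], "gbuffer": [3, 4], "post": [6]})
print(first, second, third)
```

The point of the per-stage split is that pages only need to be resident around the stage that uses them, which is where the memory and bandwidth savings would come from.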
Anyway all of that reduces how much you have to keep loaded during each stage of a frame, and may let you plan access patterns around GPU bandwidth usage.
Apparently “Sampler Feedback Streaming” is now part of DX12 Ultimate (aka DX13, but we can’t call it that). It’s something that requires a hardware access profiler to be fast, otherwise it would already be done today.
Also, I think it is not so much about necessarily making it faster. They could have made one of the big Zen cores do this and attached all the decompression/coherency/DMA hardware to it… but there is a lot of stuff in a Zen core you don’t need for this task, so they didn’t for the PS5. MS kind of did, and they also went with a more conventional cache.
Also, the coherency stuff is way more important than you’d think, since it is now able to prevent cache flushes by scrubbing memory instead of flushing (even Navi flushes at times). That results in a significant percentage improvement in bandwidth usage.