AMD V-Cache

For the past few days I've been researching a server upgrade (currently a Ryzen 9 5900X). Since I'd also like to upgrade my personal PC, I've decided to upgrade the server and reuse the current hardware as my PC, which is a BIG upgrade coming from an ancient 4th-gen i7. But I've gone down the rabbit hole of claims that V-Cache is useful not just for games but also for hosting servers like Factorio and possibly MC. That said, I've yet to find any benchmarks or results. As I'm already pinning Docker containers and VMs to cores, I don't think V-Cache only being present on the first CCD would cause an issue. In fact, I'd be able to test the server on both CCDs and draw my own conclusions, and spread other applications accordingly too. I would be getting a 7900X3D, unless someone tells me otherwise. I've thought about getting the 7800X3D, but that would of course remove the option to choose between CCDs, and I also don't see how I can get away with 8 fewer threads than I have now. What are your opinions on these V-Cache CPUs? Would it be worth the extra 110€ between the 7900X and 7900X3D? The price difference is not much of an issue for me. Or would you still recommend an Intel CPU instead?
73 Replies
ProGamingDk
ProGamingDk2y ago
Factorio sees a big difference with V-Cache from what I know (source: @Discount Milk). For Minecraft there isn't any evidence afaik?
Flexz
FlexzOP2y ago
That's why I'm still not sure. I see many people claiming it should help, but no one seems to actually own (and run) a server on those V-Cache CPUs.
ProGamingDk
ProGamingDk2y ago
uh a TON do, not as a physical machine they own, but Hetzner's Ryzen 9 7950X3D dedis are very popular
Flexz
FlexzOP2y ago
But then we get into the CCD issue: only the first CCD has V-Cache. The other one acts as a normal 7950X. If you don't pin the application to certain threads, it may end up running on what is effectively a normal 7950X. And if it is spread over both CCDs, you take a performance hit because of the cross-talk required between the CCDs.
I would love multiple spark reports of a server running pinned to one CCD, then the other, and maybe even one that is forced to run on both. Just to see if it would give any significant/noticeable change.
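The pinning described here can be sketched in Python on Linux. This is only a minimal sketch, assuming a 16-core, two-CCD chip where logical CPUs 0-7 belong to CCD0, 8-15 to CCD1, and each core's SMT sibling is numbered 16 higher. That numbering is an assumption and varies between systems, so check the real layout (for example with `lscpu -e`) first. `ccd_cpus` and `pin_to_ccd` are hypothetical helper names, not part of any tool mentioned in the thread.

```python
import os


def ccd_cpus(ccd, cores_per_ccd=8, total_cores=16, include_smt=True):
    """Return the logical CPU ids assumed to belong to one CCD.

    Assumption: physical cores are numbered 0..total_cores-1 in CCD order,
    and each core's SMT sibling is core_id + total_cores. Verify this on
    the real machine before using it.
    """
    first = ccd * cores_per_ccd
    cpus = set(range(first, first + cores_per_ccd))
    if include_smt:
        # Add the SMT sibling of every physical core in this CCD.
        cpus |= {cpu + total_cores for cpu in cpus}
    return cpus


def pin_to_ccd(pid, ccd, **layout):
    """Restrict process `pid` (0 means the calling process) to one CCD."""
    cpus = ccd_cpus(ccd, **layout)
    os.sched_setaffinity(pid, cpus)  # Linux-only call
    return cpus


# Example: the CPU set that would be used for each CCD under the assumed layout.
if __name__ == "__main__":
    print(sorted(ccd_cpus(0)))
    print(sorted(ccd_cpus(1, include_smt=False)))
```

Child processes inherit the affinity mask on Linux, so a server launched from a process pinned this way stays on the chosen CCD unless something resets it.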
AeonRemnant
AeonRemnant2y ago
Damn finally someone but me notices that.
Flexz
FlexzOP2y ago
https://www.reddit.com/r/factorio/comments/11dzrty/ryzen_9_7950x3d_is_not_better_in_factorio_then/ This is what I'm referring to. The results shown weren't great. But in the comments you find people pointing to another benchmark where it beat the 5800X3D, meaning the link is showing Factorio running on the wrong CCD or on a mix of both. They also point out that CPU affinity is important.
AeonRemnant
AeonRemnant2y ago
I mean what's MC actually limited by in the main thread? Where exactly is the bottleneck, do we know? Is it throughput? IPC? Efficiency? We're clearly gaining speed with IPC improvements, so that points to it being an instructions limitation, so idk how effective V Cache is for that.
Flexz
FlexzOP2y ago
I'm just shopping for a new CPU and trying to decide if it's worth the extra 110€. We ran into issues with Create and 4 players online (dropping to 10 TPS), but that was related to an issue within the Create mod that has since been patched.
AeonRemnant
AeonRemnant2y ago
@ProGamingDK Do we have any hardware engineers and/or software engineers in this discord that'd chat about this?
ProGamingDk
ProGamingDk2y ago
@Disconsented
AeonRemnant
AeonRemnant2y ago
Like good and knowledgeable ones? I mean other than him.
ProGamingDk
ProGamingDk2y ago
uh no clue then
AeonRemnant
AeonRemnant2y ago
Hmm.
Disconsented
Disconsented2y ago
😢 I'll read this tomorrow
Disconsented
Disconsented2y ago
But I've gone down the rabbit hole of claims that V-Cache is useful not just for games but also for hosting servers like Factorio and possibly MC. That said, I've yet to find any benchmarks or results.
https://images.anandtech.com/graphs/graph18795/132231.png As for MC, I have seen no evidence to suggest it'd benefit despite my own speculation. I'd presume it doesn't.
As I'm already pinning Docker containers and VMs to cores, I don't think V-Cache only being present on the first CCD would cause an issue. In fact, I'd be able to test the server on both CCDs and draw my own conclusions, and spread other applications accordingly too.
It does cause issues, but you can also manually wrangle applications onto a single CCD, and then it's a non-factor.
I would be getting a 7900X3D, unless someone tells me otherwise. I've thought about getting the 7800X3D, but that would of course remove the option to choose between CCDs, and I also don't see how I can get away with 8 fewer threads than I have now.
I don't really recommend the 7900X3D; it's in an awkward space. Generally speaking, for a gaming-focused machine you want a 7800X3D, because it's either the fastest or almost the fastest CPU and it's incredibly efficient, drawing as little as a third of the power that Intel will under load. 7950X3D for hybrid work/games, 7800X3D for primarily games. https://tpucdn.com/review/intel-core-i9-14900k/images/relative-performance-games-1280-720.png There's extra value here: the games that do actually scale with V-Cache scale really damn well. These tend to be sim games like Factorio, MSFS, or some MMOs like WoW and co. These generally aren't captured in benchmarks, so if they're relevant to you, it's worth considering.
Disconsented
Disconsented2y ago
Hetzner is using them because they're at much lower power draw stock lol
It's single-core performance: how fast you can process the main loop. The Yardstick paper even validates this academically.
Discount Milk
Discount Milk2y ago
Yardstick paper?
TheCubeN00B
TheCubeN00B2y ago
That’s not what he was asking, though. He was asking which component of the CPU was the bottleneck.
Disconsented
Disconsented2y ago
The problem is that it's not a component.
The part I replied to wasn't asking about components directly either.
TheCubeN00B
TheCubeN00B2y ago
A component, throughput, or something similar that is the limiting factor is the question he asked. Saying ‘single core’ doesn’t provide much insight into why this CPU is better or worse
Disconsented
Disconsented2y ago
It's really damn complicated, there isn't a single component or aspect of it. It's everything from the process it's built on, to the libraries it uses, to the PPA trade-offs of the design. To the compiler support, how well it does branch prediction, how it handles op caching, how ops are broken down... to the levels of caching in their size, throughput and latency, as well as the complicated trade-offs of those in relation to each other and their hierarchy
TheCubeN00B
TheCubeN00B2y ago
Yeah that does make it quite a difficult question to answer definitively then
Disconsented
Disconsented2y ago
Hence, single-core performance, because that's the abstraction that's relevant here
Snow Kit
Snow Kit2y ago
single core performance is important because generally, having multiple threads won't increase the performance of a single minecraft server
Disconsented
Disconsented2y ago
The point of extra cores for MC is to have something else process tasks besides the main loop, giving the main loop more time on its core without being interrupted.
Snow Kit
Snow Kit2y ago
or at least, the difference between 8 and 12 cores won't make a difference if you're only running minecraft
Disconsented
Disconsented2y ago
Amdahl's law
In computer architecture, Amdahl's law (or Amdahl's argument) is a formula which gives the theoretical speedup in latency of the execution of a task at fixed workload that can be expected of a system whose resources are improved. It states that "the overall performance improvement gained by optimizing a single part of a system is limited by the ...
Snow Kit
Snow Kit2y ago
The only things in Paper I'm aware of that are done on separate threads are async plugin tasks, chunk generation (limited to 8 cores max by default), reading chunks from disk, and networking. All entity logic, redstone logic and generic world logic is done on a single thread
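The Amdahl's law point can be made concrete with a small calculation. This is a minimal sketch: the formula is the standard one, but the 20% parallel fraction below is a purely hypothetical figure chosen for illustration, not a measurement of any Minecraft server.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Theoretical speedup when only part of the work can use extra cores.

    parallel_fraction: share of the runtime that can be spread over cores (0..1)
    workers: number of cores available to that parallel share
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical: 20% of a tick is off-thread work, 80% is the main loop.
    for cores in (8, 12):
        print(cores, "cores:", round(amdahl_speedup(0.2, cores), 3))
```

Under that assumed split, going from 8 to 12 cores barely moves the result, which is the point being made about core count versus single-core speed.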
Flexz
FlexzOP2y ago
Unfortunately I'm using Forge. I just need more than 16 threads because the server does more than MC alone, and on some occasions runs multiple MC servers. What I've learned from all the reactions so far: it's a waste of money if you don't use CPU affinity. But even then, chances are the first CCD with the additional V-Cache won't give additional performance. I should investigate whether anything else on the server could benefit from the additional cache. If not, I may just be better off with a 7900X. The only known benefit so far would be when I use the hardware as my PC in a few years.
Snow Kit
Snow Kit2y ago
yeah, for reference, forge has even less multithreading than paper does from my understanding
Flexz
FlexzOP2y ago
Didn't Paper exist to improve performance over Bukkit in the first place?
Snow Kit
Snow Kit2y ago
although maybe the increased core speeds of the 7900x would beat the benefits of the higher cache size from the 7800x3d, it's really hard to know unfortunately. Apparently most people who host game servers don't care enough to buy two nearly identical $400 cpus to find a 10% performance difference 😅
Flexz
FlexzOP2y ago
I think the 7800X3D is most likely the better option for hosting only MC. Less power draw. No CCD issues. It's basically a 7950X3D with the second CCD disabled.
Disconsented
Disconsented2y ago
Vcache is great at hiding poor RAM performance
ProGamingDk
ProGamingDk2y ago
yes, Spigot*. Also bug/exploit fixes
my internet is dying, don't worry about me
Disconsented
Disconsented2y ago
It also appears to reduce the load on the Infinity Fabric
The nice part about the V-Cache parts is that they're just running at sensible power targets
Flexz
FlexzOP2y ago
Also something I found online: would it change the situation knowing I'll be using 128GB of memory? According to AMD's site: 4x2R DDR5-3600. The only 128GB kit I found at the store where I would buy the components was 5600.
ProGamingDk
ProGamingDk2y ago
ddr5 at 3600?
Flexz
FlexzOP2y ago
Snow Kit
Snow Kit2y ago
Feel free to correct me, but the most important part of memory would specifically be the first word latency, which is how long it takes for the memory to return data to the CPU. It might not be listed, but it's a combination of CAS latency and memory speed
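The "combination of CAS latency and memory speed" can be worked out directly. A minimal sketch, assuming only the CAS portion is counted (row-activation timings are ignored here) and using illustrative kits; `cas_latency_ns` is a hypothetical helper name.

```python
def cas_latency_ns(cas_cycles, transfer_rate_mts):
    """Convert a CAS latency in clock cycles to nanoseconds.

    DDR memory transfers twice per clock, so the clock runs at half the
    advertised MT/s rate and one cycle lasts 2000 / MT/s nanoseconds.
    """
    if transfer_rate_mts <= 0:
        raise ValueError("transfer_rate_mts must be positive")
    return cas_cycles * 2000.0 / transfer_rate_mts


if __name__ == "__main__":
    # Illustrative kits: (label, CAS cycles, MT/s)
    for label, cl, rate in (("DDR5-6000 CL30", 30, 6000),
                            ("DDR5-5600 CL40", 40, 5600)):
        print(label, round(cas_latency_ns(cl, rate), 2), "ns")
```

A faster kit with a proportionally higher CL can end up with the same latency in nanoseconds, which is why the two numbers have to be read together.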
Disconsented
Disconsented2y ago
^
AeonRemnant
AeonRemnant2y ago
Interesting. So there’s not a relatively simple way to isolate what’s causing issues, even if it’s difficult to fix.
Ikr. No cache and you best be running DDR5-6000 CL30 or you WILL be noticing drawbacks. Cache on and you can yolo down to DDR5-4000 and it’s about fine.
Disconsented
Disconsented2y ago
Correct, CPU pipelines are incredibly complex and minecraft doesn't exactly do 1 thing.
Flexz
FlexzOP2y ago
So a 96GB 2-DIMM kit with CL32 would be better than a 128GB 4-DIMM kit with CL40? I should look into it, 96GB might be enough. The 96GB kits also go up to DDR5-6800
AeonRemnant
AeonRemnant2y ago
You want whatever your maximum capacity is in the number of RAM channels your CPU can handle, at CL30. Ryzen chips are dual-channel, but most boards have 4 RAM slots, so populating all of them runs 2 sticks per channel. This kills a smidge of your performance. As for speed? DDR5-6000 is the sweet spot for all DDR5 Ryzen, with lower CAS latency being better. The exception to this is chips with V-Cache, as it compensates for lower speed, but not higher latency. The problem here is that the only Ryzen 7000 chip to feature V-Cache on all CCXs is the 7800X3D; the higher-end SKUs only have V-Cache on half of the cores, so you need to be more careful about CPU assignment if you truly are trying to use it to compensate for the RAM itself.
Quinn
Quinn2y ago
I’m kinda confused by the logic here. If you’re referring to first word latency off the cuff, then that’s a combination of both CAS and RAS. Though why only focus on first word latency? Are you assuming that queries would only be 64 bits for some reason? Because if they’re more than 64 bits, then it’s fast page mode, so row access time might not matter. Though in that case the actual bus speed won’t matter, because the memory can set up the next operation at the same time as it’s bussing the information back. It’s not a time loss if you need to get several words from the memory. So saying that CAS and memory speed both matter especially here is a bit confusing. It would be either CAS and RAS + memory speed (I assume you mean bus speed) matter, or just CAS matters in fast page mode with a query of more than 64 bits.
Snow Kit
Snow Kit2y ago
No clue, I’ve also read a giant chunk of text that said reads depend on tRCD, tCL and BL, never mentioning tRAS
Quinn
Quinn2y ago
RAS would probably not be important for reads generally, because reads are often more than one word, so the row wouldn’t need to be specified every time. For first word latency, though, RAS definitely matters, because just like CAS you need to set RAS. But I think the problem is that first word latency isn’t the pivotal quantifier here, because reads are more often than not going to be more than one word.
^ more often than not meaning probably always in most situations
For OP, this kind of seems like a YOLO decision to me unless you have an estimate for the V-cache miss rate vs the non-V-cache miss rate, along with the miss penalty for non-V-cache and the added miss penalty for V-cache. I mean, if you are running 100 GB of memory it seems kind of hard to eyeball. Maybe the increased L3 hit chance would favor V-cache, but if V-cache is only 96 MB then who’s to say?
Personally, since the main bottleneck in Minecraft is the main thread having to wait, maybe it’s better to look at the worst-case scenario, which might be the main thread waiting on a memory request due to a cache miss. If the worst-case V-cache miss penalty is considerably worse, then maybe that’s a hard sell, because a cache miss is when the CPU really has the opportunity to just be stalling.
Personally I would optimize for miss penalty, because do you trust random plug-in developers to organize their plugins in ways which make sense? I don’t. Maybe it just takes one plug-in accessing way more memory than it needs to for all the benefits of the V-cache to go out the window.
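The miss rate versus miss penalty trade-off raised here is usually written as average access time = hit time + miss rate × miss penalty. A minimal sketch of that comparison follows. Every number in it is made up purely to show the arithmetic; none of them are measurements of a real V-Cache or non-V-Cache chip.

```python
def average_access_ns(hit_ns, miss_rate, miss_penalty_ns):
    """Average memory access time: hit cost plus the expected miss cost."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_ns + miss_rate * miss_penalty_ns


# Hypothetical figures only: a larger L3 is assumed to miss less often but to
# cost slightly more per hit and per miss.
HYPOTHETICAL = {
    "standard L3": {"hit_ns": 10.0, "miss_rate": 0.30, "miss_penalty_ns": 80.0},
    "larger L3": {"hit_ns": 12.0, "miss_rate": 0.15, "miss_penalty_ns": 85.0},
}

if __name__ == "__main__":
    for name, params in HYPOTHETICAL.items():
        print(name, round(average_access_ns(**params), 2), "ns average")
```

With these invented inputs the larger cache wins on average even though each individual miss is worse, but the outcome flips if the workload's miss rate does not actually drop, which is the uncertainty being described.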
Flexz
FlexzOP2y ago
Meaning I should spend the 110€ difference on better memory instead of the V-Cache version of the CPU. I'm already doing some research, but unless I go for a 96GB kit there just aren't many options, I think. And I should be fine with a 96GB kit anyway. The only time I get there is when I host multiple MC servers, and they are usually given too much memory anyway for the rare moments we do have many players online at the same time. But that's more of a 'because we can'
AeonRemnant
AeonRemnant2y ago
For the love of fuck just do the 96GB. I thought 64 was enough. It isn't. In no world is 64GB actually enough RAM. :Sadge:
Quinn
Quinn2y ago
Ya flex, that’s what makes sense to me, though I’m going off so little information that it’s like 🤷‍♂️ could make sense, could not
Flexz
FlexzOP2y ago
This was never an option as I'm always hovering around 64GB already. But 128GB kits just don't have many options
AeonRemnant
AeonRemnant2y ago
You can get 48 gig DIMMs for your platform.
Flexz
FlexzOP2y ago
!answered
Admincraft Meta
post closed!
The post/thread has been closed!
Requested by flexz#0
Flexz
FlexzOP2y ago
Just for more information: I ended up buying a 7950X3D on Black Friday, as it was cheaper than a 7950X and the same price as the 7900X3D. Even if V-Cache didn't help, it would still be better than my 5900X.
More information:
- 5900X has a 128GB 3200 kit
- 7950X3D has a 96GB 6400 kit
Same server (copy) of a relatively large base with the Create mod (Forge).
5900X: 11 ms/tick (4 cores / 8 threads of CCD1)
7950X3D (every Docker container had 7 threads of its CCD):
- CCD0: 4.9 ms/t (second run is about the same)
- CCD0HT: 5 ms/t
- CCD1: 9.9 ms/t (second run is stable around 6.3 ms/t)
- CCD1HT: 10.5 ms/t
It was a short test, so I'm not going to say there aren't other factors in play. The 7950X3D is also currently outside of a case, to test things before doing the upgrade. Nonetheless, I think we can say V-Cache has some influence on performance
Flexz
FlexzOP2y ago
I can already tell that there is basically no difference between HT or not on the same CCD. Currently getting 5.2 ms/t on CCD0. I think this is expected, of course.
Using 3 cores / 6 threads:
CCD0: 4.8 ms/t
CCD1: 6 ms/t
CCD0 stabilizes much quicker. It takes much more time to stabilize on CCD1.
I feel like a lot of the performance gain is due to faster memory, as a 7950X vs 5900X should only be about a 20% difference in single-core performance, but CCD1 (close to a normal 7950X) is 60% faster. That said, it's still impressive that V-Cache gives even better performance on top of that.
@AeonRemnant @Disconsented I'm sorry for the ping, but I assume you both would be interested in some limited test results of a V-Cache CPU with MC.
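The ms/tick figures reported here can be turned into relative speed and tick-budget headroom. A minimal sketch using the numbers as posted; the 50 ms budget follows from Minecraft's 20 ticks per second target. Note the runs are not strictly like-for-like (the 5900X figure used 4 cores / 8 threads, the 7950X3D figures 3 cores / 6 threads, with different memory kits), so the ratios are indicative at best.

```python
TICK_BUDGET_MS = 50.0  # 20 ticks per second leaves 50 ms per tick


def speedup(baseline_ms, new_ms):
    """How many times faster a tick completes compared with the baseline."""
    return baseline_ms / new_ms


def budget_used(ms_per_tick):
    """Fraction of the 50 ms tick budget consumed by one tick."""
    return ms_per_tick / TICK_BUDGET_MS


# Figures as posted in the thread (ms per tick).
REPORTED = {"5900X": 11.0, "7950X3D CCD0": 4.8, "7950X3D CCD1": 6.0}

if __name__ == "__main__":
    base = REPORTED["5900X"]
    for name, ms in REPORTED.items():
        print(f"{name}: {speedup(base, ms):.2f}x vs 5900X, "
              f"{budget_used(ms):.0%} of tick budget")
    print(f"CCD0 vs CCD1: "
          f"{speedup(REPORTED['7950X3D CCD1'], REPORTED['7950X3D CCD0']):.2f}x")
```

Looking at budget used rather than raw ms/tick also shows how far each configuration is from dropping below 20 TPS on this particular world.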
AeonRemnant
AeonRemnant2y ago
Oh hell yes.
Flexz
FlexzOP2y ago
If someone has a map that would create a larger load on the server. Feel free to share. Willing to run some tests.
AeonRemnant
AeonRemnant2y ago
Yeah it’ll be interesting to see how it scales up and out with players as well.
Flexz
FlexzOP2y ago
Any ideas on how to simulate a decent load while still being somewhat realistic? Is there some sort of benchmark somewhere?
I've never used a mod to generate chunks. Maybe that could be used to simulate a player traveling. I assume that's going to be roughly the same load each time if I start from the same copy.
I was also thinking: the copy of the server I was running allows chunkloading. I could do that for a few of the players' bases, giving a way larger load than 4 ms/t. Or a combination of the two. It doesn't simulate multiple players, though.
I could also do a spark report. Not sure if it has an option to automatically stop after x minutes?
If someone knows a Forge mod to generate chunks, that would be nice. Preferably one that also reports how much it has done at a regular interval, just to have an extra value to compare between runs other than TPS or ms/t
Skullians
Skullians2y ago
Chunky is a Forge pregenerator: https://www.curseforge.com/minecraft/mc-mods/chunky-pregenerator-forge
You can run multiple pregens at the same time (overworld, end and nether)
ProGamingDk
ProGamingDk2y ago
it does, /spark profiler start --timeout (time in seconds)
AeonRemnant
AeonRemnant2y ago
Spawn in a fuckload of Minecraft baritone bots and have them go mine diamonds.
ProGamingDk
ProGamingDk2y ago
well command not option
Flexz
FlexzOP2y ago
The only issue I would have with this is that no two runs would be equal. Correct me if I'm wrong
Discount Milk
Discount Milk2y ago
Could you try to repeat it on the same seed so it is equal? IE if you just script out the same behavior on the same seed, it would be equal "just"
AeonRemnant
AeonRemnant2y ago
What Leche said, yes. Not a premade solution for this.
Discount Milk
Discount Milk2y ago
@EterNity You were working on/know of work on some form of a world that can be used for some form of benchmarking, right? nodders
ProGamingDk
ProGamingDk2y ago
awesome, also hi Eternity
Disconsented
Disconsented2y ago
Mobs & other entities are what typically eat your TPS, that's what you need to simulate
