Stress Testing/Performance Testing
Hello, I'd like to get some input from you guys on stress testing and performance testing.
Some friends and I have been working together on various projects, and lately we have been researching and trying to optimize different forks and the performance of different systems. Along the way we've found a lot of conflicting arguments over which forks are better and what to do for performance. So we've started talking about testing the performance of different server software and different settings, and I'd like your input on how to make this as accurate and as automatable as possible.
Here are some questions I'd like addressed.
For stress testing, would it be most accurate to use a pregenned world of a specific size, with default settings on each server software tested?
Would using bots rather than actual players change the performance at all, or are bots seen and treated as regular players by the server? Specifically, bots that are created/run outside of the server and made to join.
What should be used to make these tests as accurate as possible at measuring performance differences between server software, as well as between having/not having plugins or datapacks, etc.?
I'm sure I'll have more questions, but I will post them here as needed.
Requested by lavaking46#0
Also, what are your thoughts on testing how performance differs between features on different server software, such as chunk gen or entity processing?
Chunkgen isn't really that great of a benchmark of real world usage. Best case scenario is you have like 60-70 autonomous agents running around playing Minecraft.
I've done something similar to this with impact client and baritone
With bots:
Yes, they’re more ‘accessible’, but they by no means simulate a normal player, who will be mining, making farms, breeding animals, etc.
Sure, you can set the spawn radius to something ridiculous so bots spawn very spread out, to find the upper limits of your server with a bunch of chunks loaded, but that will only do so much.
It’s better to test on a pregenerated world, of course, because when you go public you will be using a pregenerated world.
There’s only so much that testing before release can do, and as always you can over-optimise prematurely.
As with chunk gen…
When pregenerating, I would bump your worker threads higher than normal to speed it up, by the way, and decrease them again afterwards
Things like fabric w/ c2me are way faster last time I checked for chunk gen.
In general you’re better off using a paper fork like pufferfish.
If you can provide some estimates to how many players you’ll be expecting and what kind of server you’re running, that would be great.
Software like Fabric arguably has worse performance and fewer ‘essential’ mods for a public server, too
I should fr start making fabric mods to bring some parity 🤔
Fabric has kotlin right?
data packs… are weird
Avoid them if possible, use plugin alternatives if you really want to.
Pregenerating is DEFINITELY a must if you have generation data packs
yeah with fabric language Kotlin
(That’s the name of the library)
We have a variety of servers we are planning to run eventually and some that already exist but run more infrequently as they are events.
The bots would not simply exist or run around. I was planning on using bot software that allows programming the bots to do specific things, which would "replicate" player actions (not perfect, but better than having empty blank players).
Things like soul fire can do pathfinding, sure, but they’re incredibly CPU heavy, and you’ll struggle to do a good test depending on how many bots you try to use
A* pathfinding and similar is terrible in terms of cpu usage
The list of our types of servers is as follows
* An event server, typically on the latest version with a custom set map with up to 16 players (as is)
* A SMP like server that has a variety of custom features and systems that will be on a mostly if not entirely custom map
* A custom adventure like map that we have made that we may run on a few servers
* An SMP that will use a datapack with lots of custom features and mob stuff. It is datapack based because it will be designed for release for both singleplayer and multiplayer. It may get a plugin version, but we are going to keep it as a datapack as much as possible, as it's designed mostly for singleplayer and made to work with multiplayer after
The number of bots would probably be 16
if you’re doing ~20-30 players on each server then you’re generally fine sticking with paper or pufferfish
for initial testing*
The SMP-datapack combo will be something to look out for. Datapacks can be incredibly laggy especially when it comes to those which bring in a lot of mechanics. They scale QUICKLY for players
I have experience with that, unfortunately. I do, however, know some people who are good at optimizing those systems.
wonderful
Yo wait
Voyager 😂😂
Run like 100 instances
$5/second in API costs 🔥🔥
Ez stress test
GitHub - MineDojo/Voyager: An Open-Ended Embodied Agent with Large Language Models
huh
is this meant for general
Going back to the actual stress testing, the results of which I would hope could be used by more than just us for comparing different players.
For a baseline on the stress test:
Default paper with no changes made, with a pregenned overworld 10kx10k
All bots will collect basic resources from spawn, this being 8 logs and if available, 1 or more food pieces
16 bots, with 1 set to travel off in each major direction (4 total), with instructions to loot structures on the surface
The other 12 bots will be set to travel in random directions until they find caves, at which point they will start mining.
The test will last 1 hour, with spark profiler running the entire time so it can be analyzed after
This test will be repeated with the same parameters at least twice to ensure consistency.
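To make the baseline above repeatable, the bot roles could be written down as data before any bot software is involved. This is only a rough sketch: the role names, the `bot00` style naming, and the `build_bot_plan` helper are made up for illustration and are not tied to any particular bot framework. The fixed seed is an assumption on my part, so the "random" headings stay the same across the repeated runs.

```python
import random

# Shared first step for every bot, taken from the plan above
STARTUP_TASK = "collect 8 logs and, if available, 1 or more food pieces"

def build_bot_plan(total_bots=16, seed=1):
    """Return one dict per bot describing its role and heading (hypothetical format)."""
    rng = random.Random(seed)  # fixed seed so repeated runs reuse the same headings
    plan = []
    # 4 bots travel in the major directions and loot surface structures
    for i, heading in enumerate((0, 90, 180, 270)):
        plan.append({"bot": f"bot{i:02d}", "startup": STARTUP_TASK,
                     "role": "loot_structures", "heading_deg": heading})
    # the remaining bots pick a random heading and mine once they find a cave
    for i in range(4, total_bots):
        plan.append({"bot": f"bot{i:02d}", "startup": STARTUP_TASK,
                     "role": "find_cave_and_mine", "heading_deg": rng.randrange(360)})
    return plan

plan = build_bot_plan()
for entry in plan:
    print(entry["bot"], entry["role"], entry["heading_deg"])
```

Keeping the plan as plain data means the same list can be fed to whatever bot tool ends up being used, and each server software gets exactly the same assignments.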
Any additional thoughts?
Oh, no. This is a suggestion if they have a shitton of money to burn
Just an idea 👍
you’re better off doing rough optimising on a good confirmed server software like puffer
Then fine tune
obviously not too much
Imo add a bigger variety of bot tasks
Specifically building smth collectively, groups of 3-6 together
we do want to optimize, but we are also hoping to provide a good resource for others to look at so they hopefully don't get quite as confused when looking to optimize or wondering what exactly is gained by swapping between things
go for it, make another guide on top of the two that exist
You're referring to the optimization guides, such as what is provided when you post a spark profiler, right?
!optimize
You can follow these guides to optimize your server
Admincraft Canned Responses
I don't think I'm conveying what I'm really wanting to do here.
I don't want to make a guide on which settings to change in order to best optimize your server, as those kinda already exist, as you've pointed out. What I'm looking to do is actually test those and get actual numbers. E.g. "puffer provides 10% more performance than paper; we determined this through these tests, which combined show this data" (idk what the real difference is). I wouldn't be looking to go in and remake the existing optimization guides, just to gain actual insight into what gains are made where.
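For what it's worth, a number like "10% more performance" could come from something as simple as comparing the average MSPT (milliseconds per tick) across the repeated runs. A minimal sketch, assuming MSPT is the metric being compared; the values below are made-up placeholders, NOT real measurements, and `percent_change` is just an illustrative helper:

```python
from statistics import mean, stdev

def percent_change(baseline_runs, candidate_runs):
    """Percent change in mean MSPT versus the baseline (negative means faster)."""
    base = mean(baseline_runs)
    cand = mean(candidate_runs)
    return (cand - base) / base * 100.0

# Placeholder average MSPT per 1-hour run, invented purely for illustration
paper_mspt = [20.0, 21.0, 19.0]
puffer_mspt = [18.0, 18.5, 17.5]

change = percent_change(paper_mspt, puffer_mspt)
spread = stdev(paper_mspt)  # run-to-run variation of the baseline itself

print(f"change vs baseline: {change:.1f}%")
print(f"baseline run-to-run stdev: {spread:.2f} ms")
```

Reporting the run-to-run spread next to the percentage matters: if the baseline alone varies by about as much as the difference between two softwares, the difference probably isn't meaningful yet.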
Not useful for most people I know, but I think it'd be good
this sounds like a guide 😭
but yeah I get what you mean
Sorry I wasn't being clear.
When doing a stress test, the best you can do is run those tests in conditions as close as possible to what you expect to have in the production environment
You should also have a reference, for example a paper server without optimizations in the config files. This also helps if you can't replicate what you'll have in the production environment, for example if you're running those tests on a local network with PC hardware.
This means that it's actually a good idea to pregen the world, as you'd do it on production.
Chunk generation is too expensive anyways and will clutter your results.
However, you can pregen a lot less so you get information on bots loading chunks vs generating chunks after a while
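To get a feel for how much smaller "a lot less" is, the chunk counts are easy to estimate, since a chunk is 16x16 blocks. A quick sketch; the 2,000-block size is just a hypothetical example, and the helper assumes the pregenerated square lines up with chunk borders:

```python
import math

CHUNK_SIZE = 16  # a Minecraft chunk covers 16x16 blocks horizontally

def chunks_in_square(side_blocks):
    """Approximate number of chunks covering a square area with the given side length."""
    per_side = math.ceil(side_blocks / CHUNK_SIZE)
    return per_side * per_side

full = chunks_in_square(10_000)   # the 10k x 10k baseline world
small = chunks_in_square(2_000)   # hypothetical smaller pregen

print(full, small, full // small)
```

With a smaller pregenerated area, bots that travel outward will cross from loading existing chunks into generating new ones sooner, which is the transition you'd want to see in the profiler.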
As for bots, I use a project I found that I'm not sure I can share here (in other contexts, people can use them maliciously)
But looking for "minecraft stresser" or something similar in google will help you find one
It makes random accounts join and fly around, to sort of replicate player movement
Won't help to try out command execution or more complex tasks unless you modify the project