Trying to diagnose performance issues
Hello everyone, I am trying to figure out a strange hiccup that is happening on my server(s). Chunk loading performs normally most of the time, and then it will occasionally take an extremely long time.
You can see it in the bandwidth chart: https://i.imgur.com/7yewJbQ.png
It will just stop and hang for an undetermined amount of time. Sometimes it's just a few seconds, other times it's 60 seconds and the server shuts down. I am trying to figure out whether some bottleneck with my hosting provider (network, disk, etc.) could be causing this. The server is running on 14GB of memory and 3 CPU cores, which provides very fast chunk loading outside of these spikes. I am testing it by flying around in creative mode, but these spikes happen in regular gameplay too. This happens on vanilla, and it also occurs on Fabric, which I am using so that I can run spark. Thanks!
Spark profile: https://spark.lucko.me/6tQQGFKim6
Spark Profile Analysis
Processing Error
The bot cannot process this Spark profile. It appears that the platform is not supported for analysis. Platform: Fabric
Requested by p.uppy#0
have you pregenned?
looks like chunk saving
are you on an ssd or hdd?
It's SSD but over NFS
ouch
very ouch
huge yikes moment here
wait
no I misread
this is the wait for next tick
yes you do
but yeah ssd over nfs
not good
no wait wtf
this is on the netty thread, no?
no im wrong ffs
ill just shut up
bruh what
netty thread isn't profiled by default
ok ok ok so
that's what we have /spark profiler start --thread * for
Is that net pause we see caused by waiting on the NFS write?
the wait for next tick is up here
this is processing for the move packet
which leads to a getChunk call
which goes into the run tasks, which leads to the parkNanos
you see what I mean?
mesa tired, mesa go sleep
xd
what I think is happening is that it is trying to read a chunk that is unloaded, and since you're on NFS, it blocks the main thread until that chunk can be read over NFS, which is killing your perf
since normally you actually have the world on an ssd or something
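One way to check whether chunk reads over the NFS mount are what stalls the main thread is to time small reads against the region files directly. This is a minimal sketch, assuming the world lives in a world/region directory of .mca files (the vanilla layout; adjust the path for your setup). The name time_reads and its parameters are made up for illustration, and the OS page cache can hide latency for files that were read recently.

```python
import time
from pathlib import Path


def time_reads(directory, block_size=4096, max_files=50):
    """Time a small read from each .mca file in `directory`.

    Returns a list of (path, seconds) tuples, slowest first.
    `directory`, `block_size` and `max_files` are illustrative parameters.
    """
    timings = []
    # Sort for a stable order, then cap the number of files we touch.
    for path in sorted(Path(directory).glob("*.mca"))[:max_files]:
        start = time.perf_counter()
        with open(path, "rb") as fh:
            fh.read(block_size)  # only the first block; enough to force a round trip
        timings.append((str(path), time.perf_counter() - start))
    return sorted(timings, key=lambda item: item[1], reverse=True)
```

Running time_reads("world/region")[:5] on the node that hosts the server lists the five slowest files. Reads that take a noticeable fraction of the 50 ms tick budget would point at the storage layer rather than at the game itself.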
the problem is we are using NFS in order for server files to be available for dynamically scaling nodes and containers. is there a way to counter the effects of this?
get faster wifi
your bottleneck seems to be your network speed
you basically need network speed that is as fast as having the ssd locally on the device
which is why I'd never actually do NFS for server files myself xd
Get fast ethernet
You know what I meant
I do, some don't
But this 100%
To mitigate it you can also preload your chunks onto your local system further than what nms would use
You'd still have it bad when anyone does anything to move fast like elytras
But early game itd prob be fine assuming you preload enough
then you can also run into collisions of multiple servers having the same chunks
overall idk why you'd do it this way
Hi, server host owner here. For some context: NFS is definitely not a good solution for MC, but unfortunately it's sort of the best we've got considering our setup. We're a "pay by the minute" server host, so we dynamically start and stop servers when players are online/offline. We also dynamically add and remove nodes depending on how many servers are online. Because of this there's no guarantee that the assigned node will be mounted to the volume that we store the servers on, therefore we use NFS to mount to the "always online" node that has the servers volume. While there are definitely performance issues, it's not too bad considering all the nodes are in the same data center, but it's issues like this that occasionally crop up.
I guess I'm curious if yall have any suggestions for this particular use case. I'm wondering if there's a way to tell MC to more aggressively write chunks to disk rather than all at once so there's not these massive lag spikes where we're blocking the main thread, but I'm not sure that's possible. For some more context we're using Longhorn with RWX volumes https://longhorn.io/docs/1.6.2/nodes-and-volumes/volumes/rwx-volumes/
Maybe setting the sync-chunk-writes property to false would help improve performance? That way we're asynchronously writing instead of blocking the main thread
oh yeah this isn't Paper
Paper forces that to be false
For performance reasons, presumably? I've read there are possible issues with data corruption on a crash (makes sense considering this is an async write). But I'm surprised Paper would force this to be false considering that possibility.
yeah, performance
Cool. Paper disabling that by default makes me more confident that data loss is rare except in the case of a crash. I think what we'll do is advise our users to disable that property and install a backup plugin
for your case I don't see data loss being an issue since part of the shutdown process will include flushing chunks to disk
and then you do whatever you need to do to store that (s3 or whatever you do)
realistically there's always some bit of data loss on server crashes. that can't really be stopped
Sounds good, thanks. After some ad hoc tests this has significantly improved performance
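The sync-chunk-writes change that helped here is a single line in server.properties. For a host that manages many servers, it could be applied with a small script before each server starts. This is a rough sketch, assuming a standard key=value server.properties file; set_property is a hypothetical helper, not part of any Minecraft tooling.

```python
from pathlib import Path


def set_property(path, key, value):
    """Set key=value in a Java-style .properties file, keeping other lines as-is.

    Hypothetical helper for illustration; appends the key if it is missing.
    """
    p = Path(path)
    lines = p.read_text(encoding="utf-8").splitlines() if p.exists() else []
    out, found = [], False
    for line in lines:
        stripped = line.strip()
        is_comment = stripped.startswith(("#", "!"))
        if not is_comment and stripped.split("=", 1)[0].strip() == key:
            out.append(f"{key}={value}")  # overwrite the existing entry
            found = True
        else:
            out.append(line)
    if not found:
        out.append(f"{key}={value}")
    p.write_text("\n".join(out) + "\n", encoding="utf-8")
```

For example, set_property("server.properties", "sync-chunk-writes", "false") would make the change discussed above. The server reads this file at startup, so it needs a restart to take effect.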
NFS is notoriously bad for Minecraft. If you need to use remote storage, iSCSI is probably a better option.
I know SMB would likely be better for Minecraft if you were on Windows, but I'm unsure about the current state of SMB on Linux