Coder pod count explodes
When Coder runs, the pod count explodes and most pods get evicted immediately. How do I resolve this?
Category: Help needed
Product: Coder OSS (v2)
Platform: Linux
@Nax what template are you using? did you use any of the example ones?
I followed the k8s deployment guide w/ helm beat for beat
yeah this behavior is pretty weird, could you send the logs for your Coder daemon over?
Sure, how do I pull those?
or in DMs if you prefer
grab the pod's logs via kubectl (e.g. kubectl logs coder)
assuming you're also running Coder within k8s
Yes
to me this is one of two things: either the template is wrong (but that's probably not the case because that shouldn't cause anything to happen at startup), or there's a bug in provisionerd that causes it to spawn a lot of pods/respawn old pods
i've got to go but feel free to ping me so that I can look in a bit
When I scale it back up to one replica to try and view logs, the pods explode and get evicted before I'm able to view their logs.
even the Coder instance itself?
yeah
I wish I were more familiar with k8s internals, it's because of these that I'm not wholly confident this is a Coder issue.
oh yeah your node has disk pressure it seems
can you send the output of kubectl get componentstatus ?
how used is your disk? or the partition that hosts the root?
how should I check that?
oh i sent you the wrong command
df -h
let me find the right one
i forgot what it is
try kubectl describe nodes
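Disk pressure shows up as a node condition, the same one kubectl describe nodes lists in its Conditions table. A minimal sketch for pulling out just that condition, assuming kubectl is already pointed at the affected cluster (the function name is made up for illustration):

```shell
# List the nodes whose DiskPressure condition is currently "True".
# Assumes kubectl is configured for the cluster; the function name is hypothetical.
nodes_with_disk_pressure() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}{end}' |
    awk -F'\t' '$2 == "True" { print $1 }'
}

# Usage:
#   nodes_with_disk_pressure
```

Evicted pods usually stay around in the Failed phase, so kubectl get pods -A --field-selector=status.phase=Failed should list them after the fact, and kubectl describe pod on one of them shows the eviction reason.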
this is df -h
you're going to need to extend the disk
k8s starts evicting pods when the free disk space on the node gets too low (the node goes into DiskPressure)
What I don't understand is why it's filled?
right now the partition on your / (/dev/sda3) is 90% used
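For context on why 90% matters: the upstream kubelet default is to start evicting when a node filesystem has less than 10% available (nodefs.available<10%). Distributions such as k3s may ship different values, so treat the threshold below as an assumption. A rough sketch of the same check df -h gives by eye, with hypothetical function names:

```shell
# Print the Use% of the filesystem holding PATH (default /) as a bare number.
fs_used_percent() {
  df -P "${1:-/}" | awk 'NR == 2 { gsub(/%/, "", $5); print $5 }'
}

# Return non-zero when free space is below MIN_FREE percent.
# MIN_FREE defaults to 10, matching the upstream kubelet default
# nodefs.available<10%; your distribution may use another value.
check_disk_headroom() {
  cdh_path="${1:-/}"
  cdh_min_free="${2:-10}"
  cdh_used=$(fs_used_percent "$cdh_path")
  cdh_free=$((100 - cdh_used))
  if [ "$cdh_free" -lt "$cdh_min_free" ]; then
    echo "LOW: $cdh_path has ${cdh_free}% free (threshold ${cdh_min_free}%)"
    return 1
  fi
  echo "OK: $cdh_path has ${cdh_free}% free"
}

# Usage:
#   check_disk_headroom /        # root filesystem, 10% threshold
#   check_disk_headroom /var 15  # another path, stricter threshold
```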
that I don't know, are you using a lot of different container images in your templates?
images can be large
I'd recommend installing ncdu to track down what's taking up the most space
nothing here is huge in terms of image size
try to narrow down which directories are the biggest in your /
var @ 18G
var/lib @ 15G
var/lib/rancher @ 15G
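The drill-down above (/ to var to var/lib to var/lib/rancher) can be repeated one level at a time with du. A small sketch, assuming a du that supports -d (GNU coreutils and BusyBox both do); the function name is hypothetical:

```shell
# Print the N largest entries directly under DIR, biggest first, sizes in KiB.
# -x keeps du on one filesystem. The first line is the total for DIR itself.
biggest_dirs() {
  du -xk -d 1 "${1:-/}" 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# Usage (run as root to avoid permission gaps):
#   biggest_dirs /var/lib/rancher 5
```

Re-running it on the largest child each time gets to the offending directory in a few steps.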
what's the output of du -hxs /* ?
yeah honestly you're going to need to extend that disk
PersistentVolumeClaims have a maximum size; if that much space is not available, there's probably some safety mechanism that kicks in
you can try to see which volume is the biggest but really i'm not surprised by the space it's taking up since you have images and user data
But that's the thing, everything I'm running currently isn't using the disk space, only coder is
mm, seems I upped the disk space but didn't extend the partition, I still think 30g is plenty for what i had running but I'll expand it to 128gb and see whether the issue subsides
what do you mean?
do your workspaces not have persistent data?
Sorry, I meant everything else isn't using that much disk space
Most things are a couple of mbs, coder exploded into using gb
do you have a lot of user data ?
no
could you try to pinpoint what takes the most space in your volumes?
uhh no I tried to expand the disk and now the server won't boot
as seen w/ @Nax, it was indeed a disk space issue
@Phorcys closed the thread.