Coder.com • 7mo ago
nax

Coder pod count explodes

When Coder runs, the pod count explodes and most of the pods get evicted immediately. How do I resolve this?
[attachment not preserved]
40 Replies
Codercord • 7mo ago
Category: Help needed
Product: Coder OSS (v2)
Platform: Linux
Logs: Please post any relevant logs/error messages.
Phorcys • 7mo ago
@Nax what template are you using? did you use any of the example ones?
nax (OP) • 7mo ago
I followed the k8s deployment guide w/ helm beat for beat
Phorcys • 7mo ago
yeah this behavior is pretty weird, could you send the logs for your Coder daemon over?
nax (OP) • 7mo ago
Sure, how do I pull those?
Phorcys • 7mo ago
or in DMs if you prefer
grab the pod's logs via kubectl (e.g. kubectl logs coder)
assuming you're also running Coder within k8s
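The log-gathering step could be sketched roughly as follows. The namespace `coder` and the `deployment/coder` target are assumptions based on a typical Helm install and are not confirmed anywhere in this thread; `coder_logs` is a hypothetical helper name.

```shell
# Assumption: Coder was installed via Helm into a namespace called "coder"
# and its Deployment is also named "coder". Override NAMESPACE if yours differs.
NAMESPACE="${NAMESPACE:-coder}"

# coder_logs is a hypothetical helper, not part of Coder or kubectl.
# Extra arguments are passed through to kubectl, e.g. --previous to read the
# logs of the last terminated container when the pod keeps restarting.
coder_logs() {
  kubectl logs --namespace "$NAMESPACE" deployment/coder --tail=200 "$@"
}
```

Calling `coder_logs` prints the last 200 log lines; `coder_logs --previous` may help when the container has already been restarted.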
nax (OP) • 7mo ago
Yes
Phorcys • 7mo ago
to me this is one of two things: either the template is wrong (but that's probably not the case because that shouldn't cause anything to happen at startup), or there's a bug in provisionerd that causes it to spawn a lot of pods/respawn old pods
i've got to go but feel free to ping me so that I can look in a bit
nax (OP) • 7mo ago
Scaling it back up to one replica to try to view the logs: the pods explode and get evicted before I'm able to view their logs.
[attachment not preserved]
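When pods are evicted faster than their logs can be read, the evicted pod objects usually stay listed and still carry the eviction reason. A minimal sketch for picking them out of `kubectl get pods` output follows; `evicted_pods` is a hypothetical helper and the `coder` namespace is an assumption.

```shell
# evicted_pods is a hypothetical helper: it reads `kubectl get pods` output
# on stdin and prints the names of pods whose STATUS column is "Evicted".
evicted_pods() {
  awk 'NR > 1 && $3 == "Evicted" { print $1 }'
}

# Example (the namespace "coder" is an assumption):
#   kubectl get pods --namespace coder | evicted_pods
# Running `kubectl describe pod <name> --namespace coder` on one of the
# printed names then shows the eviction message for that pod.
```

The eviction message normally names the resource the node ran low on, which helps tell a node problem from a Coder problem.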
Phorcys • 7mo ago
even the Coder instance itself?
nax (OP) • 7mo ago
yeah
nax (OP) • 7mo ago
[attachment not preserved]
nax (OP) • 7mo ago
I wish I were more familiar with k8s internals. It's because of these that I'm not wholly confident this is a Coder issue.
[attachment not preserved]
Phorcys • 7mo ago
oh yeah your node has disk pressure it seems
can you send the output of kubectl get componentstatus?
how used is your disk? or the partition that hosts the root?
nax (OP) • 7mo ago
how should I check that?
nax (OP) • 7mo ago
[attachment not preserved]
Phorcys • 7mo ago
oh i sent you the wrong command
df -h
let me find the right one i forgot what it is
try kubectl describe nodes
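The `kubectl describe nodes` output is long, so a small filter can pull out just the disk pressure condition. This is a sketch only: `disk_pressure_status` is a hypothetical helper that assumes the usual Conditions table layout, with the condition type in the first column and its status in the second.

```shell
# disk_pressure_status is a hypothetical helper: it reads `kubectl describe nodes`
# output on stdin and prints the Status column of each DiskPressure condition row.
disk_pressure_status() {
  awk '$1 == "DiskPressure" { print $2 }'
}

# Example:
#   kubectl describe nodes | disk_pressure_status
# "True" means the kubelet considers the node low on disk; "False" means it does not.
```

One line is printed per node, so a single-node cluster yields a single `True` or `False`.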
nax (OP) • 7mo ago
[attachment not preserved]
nax (OP) • 7mo ago
this is df -h
Phorcys • 7mo ago
you're going to need to extend the disk
k8s disables itself when the free disk space is too low
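More precisely, the kubelet reports DiskPressure once available space drops below its eviction threshold, starts evicting pods, and the node stops accepting new ones. The sketch below approximates that check from `df` output. The 10% default is an assumption taken from the upstream kubelet's documented `nodefs.available<10%` hard eviction threshold; distributions such as k3s may configure a different value, and `df` percentages only roughly match what the kubelet measures. Both function names are hypothetical.

```shell
# Assumption: 10% mirrors the upstream kubelet default (nodefs.available<10%).
# Your distribution may use another threshold; override THRESHOLD_PCT to match.
THRESHOLD_PCT="${THRESHOLD_PCT:-10}"

# Reads `df -P <path>` output on stdin and prints 100 minus the Capacity (Use%) column.
available_pct() {
  awk 'NR == 2 { gsub("%", "", $5); print 100 - $5 }'
}

# Succeeds (exit status 0) when the given available percentage is under the threshold.
below_threshold() {
  [ "$1" -lt "$THRESHOLD_PCT" ]
}

# Example:
#   pct="$(df -P / | available_pct)"
#   below_threshold "$pct" && echo "root filesystem is under the assumed eviction threshold"
```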
nax (OP) • 7mo ago
What I don't understand is why it's full.
Phorcys • 7mo ago
right now the partition on your / (/dev/sda3) is 90% used
that I don't know, are you using a lot of different container images in your templates? images can be large
I'd recommend installing ncdu to track down what's taking up the most space
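To check how much space container images take on the node, the container runtime can be queried directly. The sketch assumes the node runs k3s, which bundles `crictl` as `k3s crictl`; on other distributions plain `crictl` may be the right command. `list_images` and `prune_unused_images` are hypothetical helper names.

```shell
# Assumption: the node runs k3s, so crictl is reachable as `k3s crictl`.

# Lists the images stored on this node together with their sizes.
list_images() {
  k3s crictl images
}

# Removes images that no container references. Destructive: pruned images
# have to be pulled again the next time a pod needs them.
prune_unused_images() {
  k3s crictl rmi --prune
}
```

For an interactive view of the whole root filesystem, `ncdu -x /` stays on one filesystem and sorts directories by size.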
nax (OP) • 7mo ago
[attachment not preserved]
nax (OP) • 7mo ago
nothing here is huge in terms of image size
nax (OP) • 7mo ago
[attachment not preserved]
Phorcys • 7mo ago
try to narrow down which directories are the biggest in your /
nax (OP) • 7mo ago
var @ 18G
var/lib @ 15G
var/lib/rancher @ 15G
Phorcys • 7mo ago
what's the output of du -hxs /*?
nax (OP) • 7mo ago
root@k3s:/# du -hxs /*
0 /bin
253M /boot
4.0K /cdrom
0 /dev
5.1M /etc
592K /home
0 /lib
0 /lib32
0 /lib64
0 /libx32
16K /lost+found
4.0K /media
4.0K /mnt
4.0K /opt
du: cannot access '/proc/1226/task/1347/fdinfo/327': No such file or directory
du: cannot access '/proc/1226/task/1472/fd/327': No such file or directory
du: cannot access '/proc/1226/task/1530/fdinfo/327': No such file or directory
du: cannot access '/proc/1846183/task/1846183/fd/4': No such file or directory
du: cannot access '/proc/1846183/task/1846183/fdinfo/4': No such file or directory
du: cannot access '/proc/1846183/fd/3': No such file or directory
du: cannot access '/proc/1846183/fdinfo/3': No such file or directory
0 /proc
26M /root
7.7M /run
0 /sbin
8.0K /snap
4.0K /srv
4.1G /swap.img
0 /sys
60K /tmp
3.0G /usr
18G /var
root@k3s:/#
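Sorting the same kind of output and drilling down one level at a time makes the largest directory easier to spot. A minimal sketch, where `biggest_dirs` is a hypothetical helper name:

```shell
# biggest_dirs is a hypothetical helper: it prints the N largest directories
# directly under a directory, largest first. -x stays on one filesystem, and
# 2>/dev/null hides the "cannot access" noise from vanishing /proc entries.
# Only directories are listed; add -a to du to include files such as /swap.img.
biggest_dirs() {
  dir="${1:-/}"
  count="${2:-10}"
  du -xh -d 1 "$dir" 2>/dev/null | sort -rh | head -n "$count"
}

# Example:
#   biggest_dirs /var/lib/rancher 5
```

The listing also includes a line with the total for the directory itself, so repeating the call on the largest child narrows things down quickly.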
Phorcys • 7mo ago
yeah honestly you're going to need to extend that disk
PersistentVolumeClaims have a maximum size, if that maximum is not available then there's probably some safety mechanism in place
you can try to see which volume is the biggest but really i'm not surprised by the space it's taking up since you have images and user data
nax (OP) • 7mo ago
But that's the thing, everything I'm running currently isn't using the disk space, only Coder is
mm, seems I upped the disk space but didn't extend the partition. I still think 30 GB is plenty for what I had running, but I'll expand it to 128 GB and see whether the issue subsides
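Enlarging a virtual disk does not grow the partitions or filesystems on it, which is easy to overlook. The read-only sketch below estimates how much of a disk is not covered by any partition. `unallocated_bytes` is a hypothetical helper, and `/dev/sda` is assumed from the `/dev/sda3` root partition mentioned earlier in the thread.

```shell
# unallocated_bytes is a hypothetical helper: it reads the output of
#   lsblk -bnro NAME,TYPE,SIZE <disk>
# on stdin and prints the disk size minus the sum of its partition sizes, in bytes.
unallocated_bytes() {
  awk '$2 == "disk" { disk = $3 }
       $2 == "part" { parts += $3 }
       END { printf "%.0f\n", disk - parts }'
}

# Example (device name is an assumption):
#   lsblk -bnro NAME,TYPE,SIZE /dev/sda | unallocated_bytes
```

A large result suggests the extra space was added to the disk but never given to a partition. Growing the partition and filesystem afterwards is distribution- and filesystem-specific (tools such as `growpart` and `resize2fs` are common for ext4 on a plain partition), so taking a snapshot or backup first is a sensible precaution.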
Phorcys • 7mo ago
what do you mean? do your workspaces not have persistent data?
nax (OP) • 7mo ago
Sorry, I meant: isn't using that much disk space
Most things are a couple of MB, Coder exploded into using GBs
Phorcys • 7mo ago
do you have a lot of user data?
nax (OP) • 7mo ago
no
Phorcys • 7mo ago
could you try to pinpoint what takes the most space in your volumes?
nax (OP) • 7mo ago
uhh no
I tried to expand the disk and now the server won't boot 🙃
Phorcys • 6mo ago
as seen w/ @Nax, it was indeed a disk space issue
Codercord • 6mo ago
@Phorcys closed the thread.
