Worker frozen during long running process

Request ID: sync-f144b2f4-f9cd-4789-8651-491203e84175-u1
Worker ID: g9y8icaexnzrlr

I have a process that should, in theory, take no longer than 90 seconds. The template is configured not to time out. When I test the process via the Requests tab in the UI, the logs print smoothly until about halfway through the process, and then they disappear. The job never completes, and the worker goes idle after a minute or two. I can't see the logs, so I can't tell whether there is a failure or error. Would someone mind checking on this for me?
22 Replies
nerdylive (2mo ago)
@yhlong00000
zfmoodydub (OP, 2mo ago)
A little more context: since it's a relatively long-running process, I started testing with the run endpoint, but I was not able to get a stream of logs past 15 seconds. So I switched to runsync, which I understand is meant for relatively quick processes, in an attempt to see more of the logs. I was able to see more of the logs, but then the original problem described above arose.
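For a job submitted through the asynchronous run endpoint, the final state can be checked by polling its status instead of watching the log stream. The following is a minimal sketch, not an official client. It assumes the status URL has the shape `https://api.runpod.ai/v2/<endpoint_id>/status/<job_id>`, that the API key is sent as a bearer token, and that the terminal statuses are named COMPLETED, FAILED, CANCELLED, and TIMED_OUT. The names `fetch_status` and `wait_for_job` are hypothetical helpers introduced only for illustration.

```python
import json
import time
import urllib.request

# Assumed terminal states; check the API documentation for the authoritative list.
TERMINAL = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}


def fetch_status(endpoint_id, job_id, api_key):
    """GET the job status document (the URL shape and auth header are assumptions)."""
    url = f"https://api.runpod.ai/v2/{endpoint_id}/status/{job_id}"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_job(get_status, timeout_s=180.0, interval_s=2.0, sleep=time.sleep):
    """Poll get_status() until the job reaches a terminal state or the deadline passes.

    Returns the last status document, or None if the deadline passed first,
    which is how a request that never leaves the queue looks from the client.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        doc = get_status()
        if doc.get("status") in TERMINAL:
            return doc
        sleep(interval_s)
    return None
```

It could be called as `wait_for_job(lambda: fetch_status("<endpoint_id>", "<job_id>", "<api_key>"))`. Passing the fetch function in as an argument keeps the polling loop testable without network access.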
nerdylive (2mo ago)
Which runpodctl version are you using there?
zfmoodydub (OP, 2mo ago)
It's going to take me 10 minutes to build the image again to run a version command and check, but I will post it here ASAP.

While I'm waiting to get you the exact version: I'm using Python 3.8, so whatever version gets installed automatically for that Python version.

import runpod
print(f"Runpod version: {runpod.__version__}")

Runpod version: 1.7.3

So, after reading about runpodctl: it is meant for streaming logs to your machine, and you were not asking which version of the runpod library I was using in my code. I have only been using the UI for testing. I will now try to use runpodctl to stream logs to see if I can get a better idea of what is happening later in my process.

runpodctl v1.14.4
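Because the thread mixes up two different things, the `runpod` pip package (the SDK inside the worker image) and the `runpodctl` binary, it can help to ask the Python environment directly which SDK version is installed. A minimal sketch using only the standard library (`importlib.metadata`, available from Python 3.8); `installed_version` is a hypothetical helper name:

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a pip distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints None when the package is not installed in this environment.
print(f"runpod SDK version: {installed_version('runpod')}")
```

Running this inside the container, rather than on the local machine, reports the version the worker actually uses.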
nerdylive (2mo ago)
Oh, I meant the runpod library, my bad.
I heard that new versions of the runpod library have bugs, but I'm not sure if they cause this.
Yeah, it might be.
zfmoodydub (OP, 2mo ago)
No luck getting a log stream with runpodctl.
Poddy (2mo ago)
@zfmoodydub
Escalated To Zendesk
The thread has been escalated to Zendesk!
zfmoodydub (OP, 2mo ago)
@yhlong00000 not sure if you can help... The requests are now getting lost as well: they get placed in a queue but then are never returned as failed. Here is another example:
Pod ID: h6uc0sa88m2n5t
Request ID: e2980df5-7861-46b7-8c9a-65a4171c30ad-u1
yhlong00000 (2mo ago)
Hey, this is caused by SDK 1.7.3. Downgrading to 1.6.2 should solve your problem. We should have a 1.7.4 release soon, and it will fix this.
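The SDK here is the `runpod` pip package inside the worker image, so the downgrade amounts to pinning `runpod==1.6.2` in the image's requirements and rebuilding. As an optional safeguard, the worker could refuse to start on the version reported as broken. This is only a sketch based on the versions named in this thread; `parse_version`, `KNOWN_BAD`, and `check_sdk_version` are hypothetical names, and the parser assumes plain numeric versions such as `1.7.3`.

```python
def parse_version(text):
    """Turn '1.7.3' into (1, 7, 3); assumes plain numeric release versions."""
    return tuple(int(part) for part in text.split("."))


# From this thread: 1.7.3 showed the problem, 1.6.2 was the suggested
# downgrade, and 1.7.4 was announced as the fix.
KNOWN_BAD = {(1, 7, 3)}


def check_sdk_version(version_text):
    """Raise early if the worker image ships the SDK version reported as broken."""
    if parse_version(version_text) in KNOWN_BAD:
        raise RuntimeError(
            f"runpod SDK {version_text} is reported to stall jobs; "
            "pin runpod==1.6.2 or use 1.7.4"
        )
    return version_text
```

Calling `check_sdk_version(...)` with the installed version string at the top of the worker script makes a bad pin fail at startup, where the error is easy to see, instead of partway through a job.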
zfmoodydub (OP, 2mo ago)
Great, thanks. Just to be clear, I'm not able to retrieve any logs via the ctl tool. When testing via the UI, the logs stop (perhaps they are too verbose), and I get a message in the container logs saying:

No Container logs yet, this usually means that the pod is still initializing

My process halts about halfway through, and the worker goes idle. So even if I downgrade the ctl tool to try to extract the logs, I'm not sure they will even be available. I will check, though.

Also, on GitHub there is no release for 1.6.2, only 1.6.1 and 1.7.0. brew install returns this:

brew install [email protected]
No available formula with the name "[email protected]". Did you mean runpodctl?
nerdylive (2mo ago)
Ah, the CLI tool isn't for retrieving logs.
No, it's the "runpod" pip package.
runpod is the SDK; runpodctl is the CLI you can use for creating resources (pods, network storage), deleting resources, etc.
zfmoodydub (OP, 2mo ago)
Got it.
nerdylive (2mo ago)
Nice
zfmoodydub (OP, 2mo ago)
I still think I am unable to retrieve container logs via the runpod library, though. Is that correct? That's what I'm trying to do, as there seems to be a bug in the UI.
nerdylive (2mo ago)
You can, via the website, while it's running.
When your worker is running, click on it, and there will be a button called Logs.
You can print messages from inside your worker, and they will show up in the logs.
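Printing from inside the worker might look like the sketch below. The handler body, the `steps` input field, and the `log` helper are all hypothetical; the only SDK call referenced is `runpod.serverless.start`, shown in a comment so the example runs without the package installed. Flushing after each line is a general precaution against buffered output and is not claimed to fix the disappearing logs in this thread, which were traced to the SDK version.

```python
import sys


def log(message):
    """Write one line to stdout and flush so it is not held in a buffer."""
    print(message, file=sys.stdout, flush=True)


def handler(job):
    """Hypothetical handler: runs a few placeholder steps and logs each one."""
    steps = job["input"].get("steps", 3)
    for i in range(1, steps + 1):
        log(f"step {i}/{steps} done")
    return {"steps_completed": steps}


# In a real worker the handler would be registered with the SDK, e.g.:
#   import runpod
#   runpod.serverless.start({"handler": handler})
```

Logging a line at each stage of a long job makes it easier to tell from the worker's Logs view where execution stopped.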
zfmoodydub (OP, 2mo ago)
Right, but this whole thread I've been talking about how those logs disappear halfway through the execution, about 15 seconds in, and I can't see any of my debug logs because of that.
nerdylive (2mo ago)
That's unexpected behavior. What do you find now after downgrading the runpod library to something like 1.6.2 and then redeploying the worker image?
How much do you print, by the way? Is it spammy, or a lot?
zfmoodydub (OP, 2mo ago)
The whole process would probably be about 200 lines.
I see what you mean now: downgrade the version in the container I build. Sorry for my misunderstanding.
nerdylive (2mo ago)
Ah yeah, it is inside the worker image. No problem, maybe I didn't make it clear enough for you.
zfmoodydub (OP, 2mo ago)
It seems as though downgrading the package worked. Thank you very much for your help.
nerdylive (2mo ago)
Nice, you're welcome.
deanQ (2mo ago)
SDK 1.7.4 has been released. Thank you for your patience.