Restarting without error message
I'm deploying some code to serverless, and it seems the code crashes and the process restarts, without any error message.
In the logs it just shows that it has restarted, I can tell by my own startup logging.
In the end I could make it work by using a specific version of CUDA and a specific version of a dependency, but I would like to know why it crashes so I can fix it properly.
Everything works fine locally with nvidia-docker...
I have a custom template that can reproduce the issue. I deleted the broken workers and logs.
It's impossible to tell unless you add error logging to your handler. Then you can view the error logs in your logs tab.
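Something like this works, as a minimal sketch (load_model and run_inference here are hypothetical placeholders for whatever your handler actually does):
```python
# Minimal sketch of a handler with explicit error logging;
# load_model and run_inference are hypothetical placeholders.
import traceback

import runpod


def handler(job):
    try:
        model = load_model()  # hypothetical: load your LlamaCpp model here
        return run_inference(model, job["input"])  # hypothetical inference call
    except Exception:
        traceback.print_exc()  # full traceback ends up in the logs tab
        raise


runpod.serverless.start({"handler": handler})
```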
I have error logging, but it shows nothing.
It prints the model path, and restarts.
Best to test it on GPU Cloud to determine what the issue is then; maybe it can't find the path.txt file or something.
I can fix it by downgrading llama-cpp-python (https://github.com/abetlen/llama-cpp-python/releases) to v0.2.23.
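For reference, roughly how that pin might look in the image. This is a sketch only: the base image tag and build flags are assumptions on my side (not the actual template), and it uses the base image's default python3 for simplicity rather than python3.11:
```dockerfile
# Sketch: pin both the CUDA version and the llama-cpp-python version that
# worked; base image tag and build flags are assumptions, not the real template.
FROM nvidia/cuda:11.8.0-devel-ubuntu22.04

RUN apt-get update && apt-get install -y python3 python3-pip build-essential cmake \
    && rm -rf /var/lib/apt/lists/*

# Build with CUDA (cuBLAS) support, pinned to the version that avoided the crash.
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip3 install llama-cpp-python==0.2.23 runpod

COPY handler.py /handler.py
CMD ["python3", "-u", "/handler.py"]
```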
The path etc. works correctly; I'm testing it locally with nvidia-docker too.
To me it feels like a bug in the serverless UI; it seems it can't report logs if the Python process crashes.
I did not try to reproduce this error with another process. The Docker command is
CMD python3.11 -u /handler.py
It can only report exceptions once you actually call
runpod.serverless.start()
; it is not aware of any exceptions before that's called, so it's not a bug.
It is very weird, because logs work: I can print "about to call the LlamaCpp constructor" and it shows this message in the logs in the UI. But it doesn't show the error; it just shows the NVIDIA CUDA Version 11.8.0 etc., which it shows when the Docker image starts.
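For what it's worth, one possible explanation (an assumption on my part, not confirmed in this thread): if the crash happens inside llama.cpp's native code, the process is killed by a signal such as SIGSEGV, so Python never raises an exception and nothing is printed before the worker restarts. Enabling Python's standard faulthandler module at the very top of handler.py would at least dump a traceback to stderr when that happens:
```python
# Minimal sketch: enable faulthandler before importing anything that loads
# native code, so a hard crash (e.g. a segfault inside llama.cpp) dumps the
# Python stack to stderr instead of the worker restarting silently.
import faulthandler

faulthandler.enable()  # dump tracebacks on SIGSEGV, SIGFPE, SIGABRT, SIGBUS

from llama_cpp import Llama

# The model path below is hypothetical; use whatever your handler actually loads.
llm = Llama(model_path="/models/model.gguf")
```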