RunPod, 3mo ago
Giray

Can't see training progress after reset

Hello, I started a new training run in a notebook and then my computer restarted. After restarting, I signed in to my RunPod account and opened the training instance, but I can't see any progress anymore. It shows GPU memory in use, but how can I see the training progress?
2 Replies
Giray, 3mo ago
I just added a new cell and ran it, but it doesn't give any output. The cell above is probably still running, but I can't see any progress of the training.
Solution
Madiator2011, 3mo ago
Jupyter notebooks do not save cell output when the browser tab is closed. The job continues to run, though, and once it finishes running it should update the cell.
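Editor's note: one common way to avoid depending on cell output is to have the training loop also write its progress to a file on the pod, so it can be read after reconnecting. The sketch below is only an illustration of that idea, not something from this thread: the function names, the log file path, and the placeholder loss update are all hypothetical, and it does not recover output from a run that is already in progress.

```python
import logging
from pathlib import Path


def make_progress_logger(log_path):
    """Return a logger that appends progress lines to log_path (hypothetical helper)."""
    logger = logging.getLogger(f"train_progress:{log_path}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep these lines out of the notebook's root logger
    if not logger.handlers:  # avoid duplicate handlers if the cell is re-run
        handler = logging.FileHandler(str(log_path), mode="a", encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        logger.addHandler(handler)
    return logger


def train(num_steps, log_path, log_every=10):
    """Stand-in training loop; the loss update is a placeholder, not a real model."""
    logger = make_progress_logger(log_path)
    loss = 1.0
    for step in range(1, num_steps + 1):
        loss *= 0.99  # placeholder for a real optimisation step
        if step % log_every == 0 or step == num_steps:
            # FileHandler flushes after each record, so the file stays current
            logger.info("step=%d/%d loss=%.4f", step, num_steps, loss)
    return loss


def tail(log_path, n=5):
    """Return the last n lines of the progress file."""
    lines = Path(log_path).read_text(encoding="utf-8").splitlines()
    return lines[-n:]
```

While a training cell is running the kernel is busy, so a new cell in the same notebook will just queue behind it. Reading the file from a terminal on the pod or from a separate notebook (for example by calling tail on the same path) sidesteps that.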