Shubham Patel DJ
RunPod
•Created by Shubham Patel DJ on 4/24/2025 in #⚡|serverless
Serverless instances are not being assigned GPUs, resulting in job errors in production. Assistance needed.
Error Message 1 with Stack Trace:
Task Failed [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization:
/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char, const char, ERRTYPE, const char, const char, int) [with ERRTYPE = cudnnStatus_t; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void]
/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char, const char, ERRTYPE, const char, const char, int) [with ERRTYPE = cudnnStatus_t; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void]
CUDNN failure 4: CUDNN_STATUS_INTERNAL_ERROR ; GPU=0 ; hostname=0220236a79a1 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=177 ; expr=cudnnCreate(&cudnnhandle);
Error Message 2:
Failed to get job. | Error Type: ClientOSError | Error Message: [Errno 104] Connection reset by peer
Will refreshing the worker help in this situation?
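To see whether these failures cluster on particular hosts, the trailing fields of Error Message 1 (GPU, hostname, file, line, expr) can be pulled out and logged per job. Below is a minimal sketch using only the Python standard library. The function name `parse_cudnn_failure` is hypothetical and is not part of ONNX Runtime or the RunPod SDK, and the parsing assumes the message keeps the ` ; key=value` layout shown above.

```python
import re

# Hypothetical helper, not part of ONNX Runtime or RunPod.
# Assumes the message ends with "CUDNN failure <code>: <status> ; key=value ; ..."
# as in Error Message 1 above.
_FAILURE_RE = re.compile(r"CUDNN failure (\d+): (\w+)")


def parse_cudnn_failure(message):
    """Return the cuDNN failure code, status name, and trailing key=value fields."""
    match = _FAILURE_RE.search(message)
    if match is None:
        return {}  # not a cuDNN failure message in the assumed layout

    details = {"code": int(match.group(1)), "status": match.group(2)}

    # Everything after the status name is a " ; "-separated list of key=value pairs.
    for part in message[match.end():].split(" ; "):
        key, sep, value = part.partition("=")
        if sep:
            details[key.strip()] = value.strip()
    return details


if __name__ == "__main__":
    sample = (
        "CUDNN failure 4: CUDNN_STATUS_INTERNAL_ERROR ; GPU=0 ; "
        "hostname=0220236a79a1 ; "
        "file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; "
        "line=177 ; expr=cudnnCreate(&cudnnhandle);"
    )
    info = parse_cudnn_failure(sample)
    print(info["hostname"], info["status"])
```

Logging the `hostname` value alongside each failed job would show whether the error follows specific workers, which is useful context when deciding whether refreshing a worker is likely to help.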