RunPod · 10mo ago
ashleyk

Broken serverless worker - can't find GPU

Serverless worker ptrh2jn7wjkcmd on endpoint qbw30nmknd6cmh is broken: it can't find the GPU.
{
"dt":"2024-02-19 23:34:37.252459",
"endpointid":"qbw30nmknd6cmh",
"level":"error",
"message":"An exception was raised: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedConv node. Name:'Conv_24' Status Message: CUDNN failure 4: CUDNN_STATUS_INTERNAL_ERROR ; GPU=0 ; hostname=acb6f843d220 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/nn/conv.cc ; line=382 ; expr=cudnnFindConvolutionForwardAlgorithmEx( GetCudnnHandle(context), s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.y_tensor, s_.y_data, 1, &algo_count, &perf, algo_search_workspace.get(), max_ws_size); ",
"workerId":"ptrh2jn7wjkcmd"
}
1 Reply
ashleyk (OP) · 10mo ago
It might also be worth mentioning that this is the first time I've seen this error in almost 12,000 requests to the endpoint.
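For anyone triaging something similar, here is a rough sketch of how the error message could be pulled apart and tallied per worker, to check whether a rare failure like this is tied to one machine. It assumes log entries are already available as dicts with "message" and "workerId" keys like the one in the original post; the function names and regexes are my own, written against that single message, and are not part of any RunPod or ONNX Runtime API.

```python
import re
from collections import Counter

# Patterns written against the one message in this thread (an assumption,
# not a documented ONNX Runtime log format).
CUDNN_RE = re.compile(r"CUDNN failure (\d+): (CUDNN_STATUS_\w+)")
NODE_RE = re.compile(r"running (\w+) node\. Name:'([^']+)'")
FIELD_RE = re.compile(r"\b(GPU|hostname|file|line)=(\S+)")


def parse_ort_cuda_error(message):
    """Return the fields found in a cuDNN failure message, or None otherwise."""
    cudnn = CUDNN_RE.search(message)
    if cudnn is None:
        return None
    info = {"cudnn_code": int(cudnn.group(1)), "cudnn_status": cudnn.group(2)}
    node = NODE_RE.search(message)
    if node:
        info["op_type"] = node.group(1)
        info["node_name"] = node.group(2)
    # Trailing "key=value ;" fields; values are kept as strings.
    info.update({key.lower(): value for key, value in FIELD_RE.findall(message)})
    return info


def tally_by_worker(entries):
    """Count cuDNN failures per workerId across a list of log-entry dicts."""
    counts = Counter()
    for entry in entries:
        if parse_ort_cuda_error(entry.get("message", "")) is not None:
            counts[entry.get("workerId", "unknown")] += 1
    return counts
```

Running `tally_by_worker` over an endpoint's error logs would show whether the failures cluster on a single worker (pointing at a bad host or GPU) or are spread across workers (pointing at the image or model instead).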