RunPod · 12mo ago
ashleyk

Broken serverless worker - can't find GPU

Serverless worker qbw30nmknd6cmh is broken; it can't find the GPU.
{
"dt":"2024-02-19 23:34:37.252459"
"endpointid":"qbw30nmknd6cmh"
"level":"error"
"message":"An exception was raised: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedConv node. Name:'Conv_24' Status Message: CUDNN failure 4: CUDNN_STATUS_INTERNAL_ERROR ; GPU=0 ; hostname=acb6f843d220 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/nn/conv.cc ; line=382 ; expr=cudnnFindConvolutionForwardAlgorithmEx( GetCudnnHandle(context), s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.y_tensor, s_.y_data, 1, &algo_count, &perf, algo_search_workspace.get(), max_ws_size); "
"workerId":"ptrh2jn7wjkcmd"
}
1 Reply
ashleyk (OP) · 12mo ago
It might also be worth mentioning that this is the first time I've seen this error in almost 12,000 requests to the endpoint.
