RunPod · 4mo ago
caseus

Linux kernel version is 5.4.0

per accelerate: https://github.com/huggingface/accelerate/blob/85a75d4c3d0deffde2fc8b917d9b1ae1cb580eb2/src/accelerate/utils/other.py#L314C1-L331C1
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
Basically, H100s are currently unusable for me: training models with accelerate hangs.
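For anyone who wants to check their own pod, here is a minimal sketch of this kind of kernel version comparison. It is not accelerate's actual implementation (see the linked source for that); the function names `parse_kernel_version` and `kernel_is_too_old` are hypothetical, and the only value taken from the warning above is the 5.5.0 minimum.

```python
import platform
import re

# Minimum kernel version quoted in the warning above.
MIN_KERNEL = (5, 5, 0)


def parse_kernel_version(release: str) -> tuple:
    """Extract (major, minor, patch) from a string like '5.4.0-150-generic'.

    Hypothetical helper for illustration; a missing patch number is read as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release string: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def kernel_is_too_old(release: str, minimum: tuple = MIN_KERNEL) -> bool:
    """Return True if the release is below the given minimum version."""
    return parse_kernel_version(release) < minimum


if __name__ == "__main__":
    # Only meaningful on Linux; other platforms report unrelated release strings.
    if platform.system() == "Linux":
        release = platform.release()
        if kernel_is_too_old(release):
            print(f"Kernel {release} is below 5.5.0; training may hang.")
        else:
            print(f"Kernel {release} meets the 5.5.0 minimum.")
```

Running this inside the pod should tell you whether the host kernel is below the threshold named in the warning. Note that a container shares the host's kernel, so the kernel cannot be upgraded from inside the pod.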
caseus · 4mo ago
@Alpay Ariyak