caseus
RunPod
Created by caseus on 3/24/2024 in #⛅|pods
Linux kernel version is 5.4.0
per accelerate: https://github.com/huggingface/accelerate/blob/85a75d4c3d0deffde2fc8b917d9b1ae1cb580eb2/src/accelerate/utils/other.py#L314C1-L331C1
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
Basically, H100s are currently unusable for me: training models with accelerate hangs.
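For anyone who wants to confirm what kernel their pod reports, here is a minimal sketch of the same kind of comparison. It is not accelerate's actual code. The helper names `parse_kernel_release` and `kernel_is_below_minimum` are made up for illustration, and the 5.5.0 threshold is taken from the warning text above.

```python
import platform
import re

# Minimum kernel version quoted in the warning above.
MIN_KERNEL = (5, 5, 0)


def parse_kernel_release(release: str) -> tuple:
    """Extract (major, minor, patch) from a release string such as '5.4.0-162-generic'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"Unrecognized kernel release string: {release!r}")
    major, minor, patch = match.groups()
    # A missing patch component (e.g. '5.4') is treated as 0.
    return int(major), int(minor), int(patch or 0)


def kernel_is_below_minimum(release: str, minimum: tuple = MIN_KERNEL) -> bool:
    """Return True if the given kernel release is older than the minimum."""
    return parse_kernel_release(release) < minimum


if __name__ == "__main__":
    info = platform.uname()
    if info.system != "Linux":
        print(f"Not a Linux host ({info.system}); the kernel check does not apply.")
    elif kernel_is_below_minimum(info.release):
        print(f"Kernel {info.release} is below 5.5.0.")
    else:
        print(f"Kernel {info.release} meets the 5.5.0 minimum.")
```

Running this inside the pod prints the kernel release the container sees. Containers share the host's kernel, so as far as I can tell the kernel cannot be upgraded from inside the pod.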
3 replies