caseus
My Linux kernel version is 5.4.0.
Per accelerate:
https://github.com/huggingface/accelerate/blob/85a75d4c3d0deffde2fc8b917d9b1ae1cb580eb2/src/accelerate/utils/other.py#L314C1-L331C1
Basically, H100s are currently unusable for me: training models with accelerate hangs.
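For anyone who wants to check their own machine, here is a minimal sketch of a kernel version check similar to the one linked above. It assumes the minimum recommended kernel is 5.5.0, which is my reading of the accelerate source and not something confirmed in this post. `parse_kernel_version` and `kernel_is_too_old` are hypothetical helpers, not part of accelerate's API.

```python
import platform
import re

# Assumption: 5.5.0 is the minimum recommended kernel (my reading of the
# linked accelerate check; not confirmed by this post).
MIN_KERNEL = (5, 5, 0)


def parse_kernel_version(release: str) -> tuple[int, int, int]:
    """Hypothetical helper: extract (major, minor, patch) from a release
    string such as '5.4.0-150-generic'. A missing patch number becomes 0."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"Unrecognized kernel release string: {release!r}")
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def kernel_is_too_old(release: str, minimum: tuple[int, int, int] = MIN_KERNEL) -> bool:
    """Hypothetical helper: True if the kernel is below the assumed minimum.
    Tuples compare element by element, so (5, 4, 0) < (5, 5, 0)."""
    return parse_kernel_version(release) < minimum


if __name__ == "__main__":
    if platform.system() == "Linux":
        release = platform.release()
        if kernel_is_too_old(release):
            print(f"Kernel {release} is below {'.'.join(map(str, MIN_KERNEL))}; "
                  "multi-GPU training may hang.")
        else:
            print(f"Kernel {release} meets the assumed minimum.")
```

On a 5.4.0 kernel like mine, `kernel_is_too_old("5.4.0-150-generic")` returns `True`.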