Cajoek
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memor
Hi, I keep getting
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
when trying to train a model on RunPod with a large batch size. I can't reproduce the error locally.
I found https://github.com/pytorch/pytorch#docker-image and https://pytorch.org/docs/stable/multiprocessing.html#strategy-management, but I'm not sure how to apply either of them to fix the problem.
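A minimal sketch for comparing the RunPod container with the local machine, assuming Linux with shared memory mounted at /dev/shm. The helper names (shm_stats, estimate_inflight_bytes) are hypothetical and not part of PyTorch. The estimate assumes each DataLoader worker keeps up to prefetch_factor batches (DataLoader's default is 2) in shared memory at once, and the 3x224x224 float32 sample size is made up for illustration. Docker's default /dev/shm is 64 MB unless the container is started with --shm-size or --ipc=host, which is what the linked Docker notes suggest; whether RunPod keeps that default is an assumption worth checking.

```python
import os
import shutil

# Hypothetical helpers for diagnosing shm pressure; nothing here is a PyTorch API.


def shm_stats(path="/dev/shm"):
    """Return (total, used, free) in bytes for the shared-memory mount, or None if missing."""
    if not os.path.isdir(path):
        return None
    usage = shutil.disk_usage(path)
    return usage.total, usage.used, usage.free


def estimate_inflight_bytes(batch_size, sample_bytes, num_workers, prefetch_factor=2):
    """Rough estimate of shared memory held by prefetched DataLoader batches.

    Assumes each of `num_workers` workers holds up to `prefetch_factor` batches of
    `batch_size` samples, each `sample_bytes` large. With num_workers=0 the data is
    loaded in the main process, so the estimate is 0.
    """
    return batch_size * sample_bytes * num_workers * prefetch_factor


def mib(n):
    """Convert bytes to mebibytes."""
    return n / 2**20


if __name__ == "__main__":
    # Illustrative numbers only: float32 images of shape 3x224x224 (4 bytes per element).
    need = estimate_inflight_bytes(batch_size=256, sample_bytes=3 * 224 * 224 * 4, num_workers=8)
    print(f"estimated prefetched batches: {mib(need):.0f} MiB")

    stats = shm_stats()
    if stats is None:
        print("/dev/shm not found; this check assumes Linux")
    else:
        total, used, free = stats
        print(f"/dev/shm: total {mib(total):.0f} MiB, free {mib(free):.0f} MiB")
        if need > free:
            print("estimate exceeds free /dev/shm; a bus error in a worker is plausible")
```

Running this in both environments would show whether the larger batch size simply outgrows the container's /dev/shm. If it does, enlarging shm (--shm-size or --ipc=host) or lowering batch_size, num_workers, or prefetch_factor all shrink the gap in this rough model.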