RunPod
Created by Cajoek on 4/11/2024 in #⛅|pods
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memor
Hi, I keep getting "ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm)." when trying to train a model on RunPod with a large batch size. I can't reproduce the error locally. I found https://github.com/pytorch/pytorch#docker-image and https://pytorch.org/docs/stable/multiprocessing.html#strategy-management but I'm not sure how to fix the problem.
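Since the error message points at shared memory, one way to narrow this down is to compare the size of the shared-memory mount on the pod with the local machine. The snippet below is only a minimal diagnostic sketch, not a fix: it assumes a Linux environment where shared memory is mounted at /dev/shm, and the helper name `shm_usage_gib` is made up for illustration.

```python
import os
import shutil


def shm_usage_gib(path="/dev/shm"):
    """Return (total, used, free) in GiB for the given mount, or None if it is absent.

    Assumption: shared memory is exposed at /dev/shm, as on most Linux containers.
    """
    if not os.path.isdir(path):
        return None
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return tuple(round(v / gib, 2) for v in (usage.total, usage.used, usage.free))


if __name__ == "__main__":
    # Run this both locally and on the pod and compare the numbers.
    print(shm_usage_gib())
```

If the pod reports a much smaller total than the local machine, that would be consistent with the error only showing up there. The two links above describe the usual knobs: the PyTorch README suggests starting the container with `--ipc=host` or a larger `--shm-size`, and the multiprocessing docs describe `torch.multiprocessing.set_sharing_strategy("file_system")`. Whether any of these can be set on a given pod is not something this sketch checks.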