Too many Open Files Error on CPU Pod - Easy Repro
@flash-singh
I think I found an easy repro for the "too many open files" error on CPU Pod:
1) Use the following Docker image (you don't necessarily need to use this exact one, it's just what I am using for an exact repro):
justinwlin/runpod_pod_and_serverless:1.0
https://github.com/justinwlin/Runpod-GPU-And-Serverless-Base
https://hub.docker.com/layers/justinwlin/runpod_pod_and_serverless/1.0/images/sha256-b350b9e75cc9f32b6ca38fda623bd9b02072869611f8304c47ed33ddc4f37094?context=repo
Port 8888 for HTTP; Port 22 for TCP
Launch a CPU Pod
2) Open a terminal, either directly or through Jupyter notebook > terminal.
3) Run:
pip install setuptools-rust
4) Run:
pip install whisperx
Error: the install fails with the "too many open files" error.
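For convenience, here is the whole repro as one shell session, with an optional watch on the kernel's open-file counter so you can see it climb during the install (just a sketch; the watch is only for observation and isn't needed to trigger the error):

# Inside the CPU Pod terminal (or Jupyter notebook > terminal)
pip install setuptools-rust
# Optional, in a second terminal: watch the system-wide open-file count
#   watch -n1 cat /proc/sys/fs/file-nr
pip install whisperx   # this is the step that hits "too many open files" on CPU Pods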
Just posting this because it has been a problem with CPU Pods in general, and there was previously no easy way to repro this error since it usually only shows up with a large amount of file processing.
This looks similar to previous issues reported to support.
Just want to say: even when running a whisperx script after building a Dockerfile with all the dependencies preinstalled, to try to get around this, I get the same errors:
root@09a589886e1e:/app# python preload.py
Traceback (most recent call last):
File "/app/preload.py", line 1, in <module>
File "/app/venv/lib/python3.10/site-packages/whisperx/init.py", line 1, in <module>
File "/app/venv/lib/python3.10/site-packages/whisperx/transcribe.py", line 7, in <module>
File "/app/venv/lib/python3.10/site-packages/torch/init.py", line 1631, in <module>
File "/app/venv/lib/python3.10/site-packages/torch/quantization/init.py", line 7, in <module>
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 879, in exec_module
File "<frozen importlib._bootstrap_external>", line 1016, in get_code
File "<frozen importlib._bootstrap_external>", line 1073, in get_data
OSError: [Errno 23] Too many open files in system: '/app/venv/lib/python3.10/site-packages/torch/quantization/quant_type.py'
root@09a589886e1e:/app#
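For what it's worth, errno 23 on Linux is ENFILE ("Too many open files in system"), the kernel-wide file table limit, as opposed to errno 24 (EMFILE), the per-process limit that ulimit -n governs. A few standard checks from inside the pod (nothing RunPod-specific here, just /proc and shell builtins):

ulimit -n                   # per-process soft limit on open file descriptors
ulimit -Hn                  # per-process hard limit
cat /proc/sys/fs/file-max   # kernel-wide maximum number of open file handles
cat /proc/sys/fs/file-nr    # currently allocated handles, unused handles, and the max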
😦 Kind of makes CPU Pod hard to use compared to GPU Pod.
I can provide a repro for this if needed, but I think the installation repro in this post is good enough.
Old issue link for reference:
https://discord.com/channels/912829806415085598/1211077846513090640
testing this now
not able to reproduce it
I have noticed ulimit is set to 1 million; you can increase it since you have root access with CPU nodes:
ulimit -n unlimited
Is there a specific CPU flavor you're using that causes this? I tested with 2 vCPU and no issues.
Hm. I'll try again later! Yeah, I was using 2 vCPU 🧐 weird.
If I can repro, I'll try to record it and get a better understanding of the conditions.
Huh, I am unable to reproduce it now too... 😅 Well, I guess that is great then haha. I can go ahead and create my whisperx CPU endpoint then~ yay! 😄 (It does work now!! AH EXCITED! Can't wait to share this template!~) Having a whisperx CPU serverless option is amazing!!
This is great to know! Thanks! I'll resolve this question!
For those in the future, the right command is a slightly different flag:
Example:
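(The exact command wasn't preserved above, so treat this as a guess rather than the confirmed flag: ulimit -n sets the soft per-process limit, and "unlimited" is often rejected for open files, so the usual variants use the explicit soft/hard flags with a concrete number.)

# Guess at the intended command; -S and -H select the soft vs. hard limit explicitly
ulimit -Hn            # print the hard limit on open file descriptors
ulimit -Sn 1048576    # raise the soft limit to a concrete value, at most the hard limit
# Note: errno 23 (ENFILE, "Too many open files in system") is the kernel-wide limit,
# which ulimit does not control; that one lives in /proc/sys/fs/file-max.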