Nafi
Network Volume Integrity
Ever since last night, on every pod I deploy with my network volume (fpomddpaq0), there are certain files that I cannot open (I believe they have been corrupted). I get a 'launcher error 524' (timeout) when I try to open these specific files (.ipynb). I have tried switching to the latest PyTorch image, but that did not help. I have cross-checked with a fresh volume in the same region, and the error does not occur there. I have also confirmed the issue with the file command in the web terminal: it times out when reading those files, but not any others. I am writing this post because those files contained a lot of code that I will now have to rewrite from bits and pieces, which is a big waste of time. I am quite annoyed about this and am reporting it to help prevent future incidents.
For some additional context, I was running a CPU-intensive training job when the pod suddenly stopped responding (there was a yellow exclamation warning on it on the pod deployments page). After waiting about an hour, I terminated the pod. When I tried to redeploy, I couldn't (it was stuck on 'waiting for logs'), so I slept on it, and when I woke up the corruption was there.
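To list exactly which files on the volume are affected without the terminal hanging on each one, a small script can try to read every notebook under a time limit. This is a minimal sketch, not a RunPod tool: it assumes the volume is mounted at /workspace (pass another path as the first argument otherwise), and it reads each file with cat in a child process so that a stuck read blocks only the child, not the script.

```python
import os
import subprocess
import sys


def read_times_out(path: str, timeout_s: float = 10.0) -> bool:
    """Return True if reading the whole file takes longer than timeout_s seconds."""
    # Read in a child process so a read that hangs on the network filesystem
    # cannot block this script. Only hangs are detected; contents are not checked.
    proc = subprocess.Popen(
        ["cat", path], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        proc.wait(timeout=timeout_s)
        return False
    except subprocess.TimeoutExpired:
        # The kill may not take effect until the blocked I/O returns,
        # so we do not wait for the child here.
        proc.kill()
        return True


def find_slow_files(root: str, suffix: str = ".ipynb", timeout_s: float = 10.0) -> list:
    """Walk root and return the paths of files with the given suffix whose reads time out."""
    slow = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                path = os.path.join(dirpath, name)
                if read_times_out(path, timeout_s):
                    slow.append(path)
    return slow


if __name__ == "__main__":
    # Assumed mount point of the network volume; override with the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "/workspace"
    for path in find_slow_files(root):
        print("read timed out:", path)
```

A list like this makes it easier to report the specific affected paths alongside the volume ID.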
Cron task delay
I have a Railway cron job container scheduled to run at 54 minutes past every hour. For some reason, the container sometimes starts late, and the delay can be very long (on the most recent run, for example, the job actually started at :06 past the next hour, 12 minutes late). The container image is not large.
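To put numbers on the delay across many runs, each start time from the logs can be compared against the most recent :54 slot. A minimal sketch, assuming the schedule is the crontab expression 54 * * * * and that a run is never more than an hour late; the dates in the usage note are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed schedule: minute 54 of every hour ("54 * * * *").
SCHEDULED_MINUTE = 54


def start_delay(actual_start: datetime, minute: int = SCHEDULED_MINUTE) -> timedelta:
    """Time between the latest scheduled slot at or before actual_start and actual_start."""
    slot = actual_start.replace(minute=minute, second=0, microsecond=0)
    if slot > actual_start:
        # Started before this hour's slot, so the run belongs to the previous hour's slot.
        slot -= timedelta(hours=1)
    return actual_start - slot
```

For example, start_delay(datetime(2024, 6, 1, 11, 6)) resolves to the 10:54 slot and returns a 12-minute delay, matching the :06 case described in the post.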
RunPod
Created by Nafi on 6/29/2024 in #⚡|serverless
What is meant by a runner?
I have created my worker template and am configuring GitHub Actions. I am just unsure what RUNNER_24GB is supposed to be: creating a serverless endpoint requires a container image, but isn't building and testing that image the point of the CI/CD pipeline?
0 GPU pod makes no sense
I have network storage attached to my pods. I don't care if a GPU gets taken from me, but it's very inconvenient that I then have to spin up a completely new pod. I am automating RunPod via the CLI, and at the moment I don't see any way to deploy a fresh instance and then get its SSH endpoint. I think it would make much more sense to show a warning that the pod has to start fresh when its GPU is taken and then assign the next available one, especially when using network storage.
requests dependency warning
I am running the project on Railway.
Warning:
venv/lib/python3.11/site-packages/requests/__init__.py:109: RequestsDependencyWarning: urllib3 (2.2.1) or chardet (2.3.0)/charset_normalizer (3.3.2) doesn't match a supported version!
Environment: Docker, Ubuntu. Building from Dockerfile.
Relevant Dockerfile lines:
Requirements:
Other requirements omitted for brevity.
The warning is shown regardless of whether the requests library is used.
Reading online, the consensus is that updating the requests library solves the problem, but I am already using the latest version (2.31.0) and the warning is still there.
Originally, this was the pip install line in my Dockerfile:
RUN /usr/src/app/venv/bin/pip install --no-cache-dir -r /usr/src/app/requirements.txt
I added the --upgrade and --force-reinstall flags, to no avail.
It could be an issue with a library I have omitted, but there are 95 in total and many are likely unrelated.
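The warning is raised by a version check that requests runs at import time. Below is a simplified re-implementation based on my reading of check_compatibility in requests 2.31.0; compare it with requests/__init__.py in the venv before relying on it. One detail worth checking against the versions in the warning: if chardet is importable at all, it appears to be the one that gets checked, even when a supported charset_normalizer is also installed, and the accepted chardet range is 3.0.2 up to (but not including) 6.0.0.

```python
from importlib import metadata


def _version_tuple(version: str) -> tuple:
    # Keep the first three numeric components, padding e.g. "16.1" to (16, 1, 0).
    parts = version.split(".")[:3]
    parts += ["0"] * (3 - len(parts))
    return tuple(int(p) for p in parts)


def is_supported(urllib3_version, chardet_version=None, charset_normalizer_version=None) -> bool:
    """Simplified sketch of requests' dependency check (not the library's own code)."""
    try:
        # urllib3 must be at least 1.21.
        if _version_tuple(urllib3_version) < (1, 21, 0):
            return False
        # If chardet is installed, it is checked first, even if charset_normalizer exists.
        if chardet_version:
            return (3, 0, 2) <= _version_tuple(chardet_version) < (6, 0, 0)
        if charset_normalizer_version:
            return (2, 0, 0) <= _version_tuple(charset_normalizer_version) < (4, 0, 0)
        return False
    except ValueError:
        # Unparseable version strings also lead to the warning in requests.
        return False


def installed(name: str):
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    versions = {n: installed(n) for n in ("urllib3", "chardet", "charset-normalizer")}
    print(versions)
    print("supported:", is_supported(
        versions["urllib3"] or "0", versions["chardet"], versions["charset-normalizer"]
    ))
```

Running this inside the container (with the venv's python) shows which of the three installed versions the check sees.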
Can't connect to redis internal but can over ipv4 addr
I am trying to use the Railway internal network for a Redis store in a NodeJS server, but I can only connect using the external IPv4 address. Here is the JS connection logic:
REDIS_URL is a service variable defined in the NodeJS service.
IPv4 address: redis://default:(password here)@viaduct.proxy.rlwy.net:27261
IPv6 address: redis://default:(password here)@redis.railway.internal:6379
I have also tried redis://default:(password here)@redis:6379, as this is also a valid format.
Error Log:
Allow dynamic endpoint extensions
I am running a Telegram Bot API server to handle large files: https://github.com/aiogram/telegram-bot-api
I can successfully use the server and Telegram's getFile method to retrieve the file location; however, when I navigate to the URL (the file is stored on an attached volume), I get a '404 method not allowed' error. How can I fix this?
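For reference, the standard Bot API serves downloads at <base>/file/bot<token>/<file_path>, where file_path comes from the getFile result. The sketch below builds that location, with one loudly labeled assumption: when the self-hosted server runs with its --local option, file_path is (as I understand the telegram-bot-api README) an absolute path on the server's own filesystem, meaning the file would be read from the volume directly rather than fetched over HTTP. file_location is a hypothetical helper, and the token and paths in the tests are placeholders.

```python
import os


def file_location(base_url: str, token: str, file_path: str) -> str:
    """Hypothetical helper: where the bytes for a getFile result can be fetched from."""
    # Assumption: in --local mode, file_path is an absolute path on the server's
    # filesystem (e.g. on the attached volume), so return it for reading from disk.
    if os.path.isabs(file_path):
        return file_path
    # Standard Bot API download URL: <base>/file/bot<token>/<file_path>
    return f"{base_url.rstrip('/')}/file/bot{token}/{file_path}"
```

If the returned value is an absolute path, the file has to be served or read from that volume by the application itself.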
Railway occasionally fails to detect open ports
I've recently created the SurrealDB template for Railway, and on further experimentation I've found that making changes to the original GitHub repository, or publishing a new Docker image, sometimes causes Railway to stop handling incoming requests via any networking option (Railway TCP proxy, HTTPS Cloudflare proxying). The issue goes away after I detach the database volume, create a completely new service for the database, and then reattach the volume (but this workaround is not guaranteed to work). I'm building an app that will ship to production very soon, and critical outages like this cannot happen every time there is a SurrealDB update. I've tested everything on my local machine with no issues, so I'm starting to wonder whether it's a server-side latency issue on Railway.
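When the outage happens, it can help to separate "the database is not accepting connections" from "the proxy is not routing to it". A minimal TCP probe sketch, not a Railway feature; host and port are placeholders for wherever the probe runs (for example, another service in the same project). Keep in mind that a process bound only to 127.0.0.1 inside its container accepts local connections but refuses connections from other hosts, so where the probe runs matters.

```python
import socket
import time


def port_is_open(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Refused, unreachable, DNS failure, or timed out.
        return False


def wait_for_port(host: str, port: int, attempts: int = 10, delay_s: float = 1.0) -> bool:
    """Retry port_is_open up to `attempts` times, sleeping between failed attempts."""
    for attempt in range(attempts):
        if port_is_open(host, port):
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

If the probe succeeds from inside the project while the public proxy still fails, that points toward routing rather than the database process.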