Nafi
RunPod
Created by Nafi on 10/30/2024 in #⛅|pods
Network Volume Integrity
Since last night, every pod I deploy on my network volume (fpomddpaq0) has certain files that I cannot open; I believe they have been corrupted. I get a 'launcher error 524' (timeout) when I try to open these specific files (.ipynb). I tried switching to the latest PyTorch image, but that did not help. I cross-checked with a fresh volume in the same region and the error does not occur there. I have now confirmed the issue from the web terminal: running the file command on those files times out, but it works on all the others. I am writing this post because those files held a lot of code that I will now have to rewrite from bits and pieces, which is a big waste of time. I am quite annoyed and am reporting this to help prevent future incidents. For additional context: I was running a CPU-intensive training job when the pod suddenly stopped responding (there was a yellow exclamation warning on it on the pod deployments page). After waiting about an hour I terminated the pod, but when I tried to redeploy I couldn't (it was stuck on "waiting for logs"). I slept on it, and when I woke up the corruption was there.
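For anyone wanting to reproduce the check described above without hanging their terminal, here is a minimal sketch that tries to read the first few KB of every notebook under a directory and flags any read that does not finish in time. The function names, the 5-second timeout, and the /workspace path in the usage note are assumptions for illustration, not anything RunPod provides.

```python
import threading
from pathlib import Path


def probe_file(path, timeout_s=5.0, nbytes=4096):
    """Return 'ok', 'timeout', or 'error: ...' for a single file."""
    result = {}

    def _read():
        try:
            with open(path, "rb") as fh:
                fh.read(nbytes)  # only the first few KB are needed
            result["status"] = "ok"
        except OSError as exc:
            result["status"] = f"error: {exc}"

    # A daemon thread is used so that a read stuck on the volume
    # cannot keep the interpreter from exiting.
    worker = threading.Thread(target=_read, daemon=True)
    worker.start()
    worker.join(timeout_s)
    return result.get("status", "timeout")


def scan(root, pattern="*.ipynb", timeout_s=5.0):
    """Probe every file under root that matches pattern."""
    return {
        str(p): probe_file(p, timeout_s)
        for p in sorted(Path(root).rglob(pattern))
    }
```

Calling scan("/workspace") (a hypothetical mount point) returns a dict mapping each path to its status, so the files that time out can be listed in one pass and reported alongside the volume ID.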
3 replies
Railway
Created by Nafi on 9/14/2024 in #✋|help
Indestructible Project
Been trying to delete a project for a couple weeks and it just won’t go away.
4 replies
Railway
Created by Nafi on 8/19/2024 in #✋|help
cron runs fine, rare but inconvenient bug
4 hours ago I had a project-wide cron timeout, which has prevented further deploys since (I had to deploy manually). How can I fix this without having to run cron inside the container?
22 replies
Railway
Created by Nafi on 8/5/2024 in #✋|help
Cron deployment failure
Every time I add a cron schedule to a specific instance in a project it fails. It works fine when I remove the cron.
15 replies
Railway
Created by Nafi on 7/18/2024 in #✋|help
Cron task delay
I have a Railway cron container running at 54 minutes past each hour. Sometimes there is a delay before the container starts, and this delay can be very long (for example, the last execution actually started the job at :06 past the next hour). The container image is not large.
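To put a number on the delay, a small sketch like the one below computes how late a run started relative to the most recent :54 slot. The helper name and the example date are made up for illustration; only the :54 schedule and the :06 start time come from the post.

```python
from datetime import datetime, timedelta


def start_delay(actual_start, minute=54):
    """Time between the most recent scheduled slot and the actual start."""
    scheduled = actual_start.replace(minute=minute, second=0, microsecond=0)
    if scheduled > actual_start:
        # The start happened after the top of the hour, so the slot
        # it belongs to is in the previous hour.
        scheduled -= timedelta(hours=1)
    return actual_start - scheduled


# Hypothetical date; the run was due at 10:54 and started at 11:06.
delay = start_delay(datetime(2024, 7, 18, 11, 6))
print(delay)  # 0:12:00
```

Logging this value at the top of the job on every run would show whether the 12-minute case is an outlier or part of a pattern.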
6 replies
RunPod
Created by Nafi on 6/29/2024 in #⚡|serverless
What is meant by a runner?
I have created my worker template and I am configuring GitHub Actions. I am just unsure what RUNNER_24GB is supposed to be: creating a serverless endpoint requires a container image, but isn't building and testing that image the point of the CI/CD pipeline?
19 replies
RunPod
Created by Nafi on 6/23/2024 in #⛅|pods
0 GPU pod makes no sense
I have network storage attached to my pods. I don't care if a GPU gets taken from me, but it's very inconvenient that I have to spin up a completely new pod when it does. I am automating RunPod via the CLI, and at the moment I don't see any way to deploy a fresh instance and get its SSH endpoint. I think simply showing a warning that you have to start fresh when a GPU gets taken, and then finding the next available one, makes much more sense, especially when using network storage.
87 replies
Railway
Created by Nafi on 4/13/2024 in #✋|help
requests dependency warning
I am running the project on Railway. Warning: venv/lib/python3.11/site-packages/requests/__init__.py:109: RequestsDependencyWarning: urllib3 (2.2.1) or chardet (2.3.0)/charset_normalizer (3.3.2) doesn't match a supported version! Environment: Docker, Ubuntu. Building from Dockerfile. Relevant Dockerfile lines:
RUN apt-get update
RUN apt-get install -y python3 python3-pip python3-venv libcairo2-dev pkg-config python3-dev tesseract-ocr ffmpeg poppler-utils libportaudio2 swig libpulse-dev libpango1.0-dev
RUN apt-get clean

# Use the virtual environment
RUN python3 -m venv /usr/src/app/venv

# Install dependencies from requirements.txt
RUN /usr/src/app/venv/bin/pip install --no-cache-dir --upgrade --force-reinstall -r /usr/src/app/requirements.txt
Requirements:
wheel

# -- Problematic section --
requests
urllib3>=2.1.0
charset_normalizer
# -------------------------
Other requirements omitted for brevity. The warning is shown regardless of whether the requests library is used. Reading online, the consensus is that updating the requests library solves the problem, but I am using the latest version (2.31.0) and the warning is still there. Originally, this was my pip install line in the Dockerfile:
RUN /usr/src/app/venv/bin/pip install --no-cache-dir -r /usr/src/app/requirements.txt
I added the --upgrade and --force-reinstall flags to no avail. It could be an issue with a library I have omitted, but there are 95 libraries in total and many are likely unrelated.
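One way to narrow this down is to test each of the three versions named in the warning separately. The sketch below is a paraphrase, not a copy, of the compatibility check that requests runs at import time. The version ranges are as I recall them from requests 2.31.0 and should be verified against the installed requests/__init__.py. The function names are my own.

```python
def parse(version):
    """Turn '2.2.1' into (2, 2, 1); assumes purely numeric parts."""
    parts = version.split(".")[:3]
    parts += ["0"] * (3 - len(parts))
    return tuple(int(p) for p in parts)


def find_mismatch(urllib3_version, chardet_version=None,
                  charset_normalizer_version=None):
    """List which installed versions fall outside the assumed ranges."""
    problems = []
    # Assumed range: urllib3 >= 1.21.1
    if parse(urllib3_version) < (1, 21, 1):
        problems.append(f"urllib3 {urllib3_version}")
    # Assumption: when chardet is importable it is checked first, and
    # charset_normalizer is only checked when chardet is absent.
    if chardet_version:
        # Assumed range: 3.0.2 <= chardet < 6.0.0
        if not (3, 0, 2) <= parse(chardet_version) < (6, 0, 0):
            problems.append(f"chardet {chardet_version}")
    elif charset_normalizer_version:
        # Assumed range: 2.0.0 <= charset_normalizer < 4.0.0
        if not (2, 0, 0) <= parse(charset_normalizer_version) < (4, 0, 0):
            problems.append(f"charset_normalizer {charset_normalizer_version}")
    else:
        problems.append("neither chardet nor charset_normalizer installed")
    return problems


# Versions taken from the warning text in the post.
print(find_mismatch("2.2.1", "2.3.0", "3.3.2"))  # ['chardet 2.3.0']
```

If those ranges are right, the old chardet (2.3.0) is what triggers the warning, not urllib3 or charset_normalizer. Since chardet is not in the requirements shown, it is presumably pulled in or pinned by one of the omitted packages; running pip show chardet inside the venv lists what requires it.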
23 replies
Railway
Created by Nafi on 3/27/2024 in #✋|help
Can't connect to redis internal but can over ipv4 addr
I am trying to use the Railway internal network for a Redis store in a Node.js server, but I can only connect using the external IPv4 address. Here is the JS connection logic:
const redisClient = createClient({ url: process.env.REDIS_URL });
await redisClient.connect();
REDIS_URL is a service variable defined in the Node.js service.
IPv4 address: redis://default:(password here)@viaduct.proxy.rlwy.net:27261
IPv6 address: redis://default:(password here)@redis.railway.internal:6379
I have also tried redis://default:(password here)@redis:6379, as this is a valid URL format.
Error log:
node:internal/process/promises:289
            triggerUncaughtException(err, true /* fromPromise */);
            ^

Error: getaddrinfo ENOTFOUND redis
    at GetAddrInfoReqWrap.onlookupall [as oncomplete] (node:dns:118:26)
Emitted 'error' event on Commander instance at:
    at RedisSocket.<anonymous> (/usr/src/app/node_modules/@redis/client/dist/lib/client/index.js:412:14)
    at RedisSocket.emit (node:events:519:28)
    at RedisSocket._RedisSocket_connect (/usr/src/app/node_modules/@redis/client/dist/lib/client/socket.js:166:18)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Commander.connect (/usr/src/app/node_modules/@redis/client/dist/lib/client/index.js:185:9)
    at async /usr/src/app/index.js:38:3 {
  errno: -3008,
  code: 'ENOTFOUND',
  syscall: 'getaddrinfo',
  hostname: 'redis'
}

Node.js v21.7.1
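The ENOTFOUND in the log is a name-resolution failure for the bare hostname redis, so it happens before any Redis traffic is attempted. A quick way to separate DNS problems from Redis problems is to resolve each candidate hostname from inside the service and see which address families come back. The sketch below does that with Python's standard library. The helper name is made up, and it assumes, as the post's own labels suggest, that the internal hostname is reachable over IPv6 only.

```python
import socket


def resolve_families(host, port):
    """Return the address families a host resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # Same class of failure as ENOTFOUND in the Node.js log.
        return []
    names = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    return sorted({names[info[0]] for info in infos if info[0] in names})


if __name__ == "__main__":
    # Hostnames taken from the post; run this inside the deployed service.
    for host in ("redis", "redis.railway.internal", "viaduct.proxy.rlwy.net"):
        print(host, resolve_families(host, 6379))
```

If redis.railway.internal returns only IPv6 while redis returns nothing, the short name is simply not a valid internal hostname, and the client needs to be able to connect over IPv6 to use the internal one.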
13 replies
Railway
Created by Nafi on 3/8/2024 in #✋|help
Service IP
Are service IPs rotating? I need to whitelist one of my services IPs to a proxy provider.
9 replies
Railway
Created by Nafi on 2/28/2024 in #✋|help
Allow dynamic endpoint extensions
I am running a telegram bot api server to handle large files: https://github.com/aiogram/telegram-bot-api I am successfully able to use the server and use telegram's getFile method to retrieve the file location, however when I navigate to the URL (the file is stored on an attached volume) I get a 404 method not allowed error. How can I fix this?
17 replies
Railway
Created by Nafi on 2/23/2024 in #✋|help
Cannot delete unpublished template
When I try to delete an old template, I get a "Not Authorized" error:
{"response":{"errors":[{"message":"Not Authorized","locations":[{"line":2,"column":3}],"path":["templateDelete"],"extensions":{"code":"INTERNAL_SERVER_ERROR","exception":{"status":400}}}],"data":null,"status":200,"headers":{}},"request":{"query":"mutation templateDelete($id: String!) {\n templateDelete(id: $id)\n}","variables":{"id":"1822d872-71ac-496f-9a36-5cdefab7617c"}}}
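The payload above is easier to read once the nested fields are pulled out: the outer status is 200, while the error itself carries a code of INTERNAL_SERVER_ERROR and an inner status of 400. Below is a minimal sketch that summarizes such a payload. The function name and the trimmed SAMPLE string are mine; the field names are the ones visible in the error above.

```python
import json


def summarize_graphql_errors(payload):
    """Flatten the errors of a response shaped like the one above."""
    body = json.loads(payload)
    response = body.get("response", {})
    summary = []
    for err in response.get("errors") or []:
        ext = err.get("extensions", {})
        summary.append({
            "message": err.get("message"),
            "path": ".".join(str(p) for p in err.get("path", [])),
            "code": ext.get("code"),
            "inner_status": ext.get("exception", {}).get("status"),
            "http_status": response.get("status"),
        })
    return summary


# Trimmed copy of the payload from the post (request part left out).
SAMPLE = (
    '{"response":{"errors":[{"message":"Not Authorized",'
    '"path":["templateDelete"],"extensions":{"code":"INTERNAL_SERVER_ERROR",'
    '"exception":{"status":400}}}],"data":null,"status":200}}'
)

for item in summarize_graphql_errors(SAMPLE):
    print(item)
```

Seeing the three status-like values side by side makes it clearer that the failure is an authorization rejection on the templateDelete mutation, despite the misleading INTERNAL_SERVER_ERROR code.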
28 replies
Railway
Created by Nafi on 2/23/2024 in #✋|help
Kickback Scope
Is kickback Railway account scoped or team scoped? If team scoped, is it possible to make it account scoped?
10 replies
Railway
Created by Nafi on 2/2/2024 in #✋|help
Replicas on services with attached volumes
Is this possible?
10 replies
Railway
Created by Nafi on 1/31/2024 in #✋|help
Europe Outage
There are a lot of posts about this; I just wanted to report it on my end as well. I have some containers in the EU and I've been getting timeouts for the last 10-20 minutes. I have not tested other regions.
14 replies
Railway
Created by Nafi on 1/22/2024 in #✋|help
Not Authorized to delete my own template (not published)
No description
4 replies
Railway
Created by Nafi on 12/23/2023 in #✋|help
Railway occasionally fails to detect open ports
I've recently created the SurrealDB template for Railway, and upon further experimentation I've found that making changes to the original GitHub repository, or publishing a new Docker image, sometimes causes Railway to stop handling incoming requests via any networking option (Railway TCP proxy, HTTPS Cloudflare proxying). The issue seems to resolve itself after detaching the database volume, creating a completely new service for the database, and then reattaching the volume, but this is not a guaranteed fix. I'm building an app that will be shipped to production very soon, and critical outages like this cannot happen every time there is a SurrealDB update. I've tested everything on my local machine with no issues, so I'm starting to wonder whether it's a server-side latency issue on Railway?
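To tell "the database is not listening" apart from "the proxy is not forwarding", it can help to probe the TCP port directly, both from inside the private network and through the public proxy address. The following is a minimal sketch using only the standard library. The function names, retry counts, and any host or port passed in are placeholders, not values from the template.

```python
import socket
import time


def is_port_open(host, port, timeout_s=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def wait_for_port(host, port, attempts=5, delay_s=1.0):
    """Retry a few times, since a freshly deployed service may be slow to bind."""
    for _ in range(attempts):
        if is_port_open(host, port):
            return True
        time.sleep(delay_s)
    return False
```

If the port answers from inside the service but not through the proxy after a redeploy, that points at routing rather than at SurrealDB itself, which is useful detail to include in a report.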
71 replies