Maximum number of A40s that can run at one time
I'm looking to run as many A40s as possible to finish a large-scale inference/LLM generation job. How many could I run at one time? 40, 80, 100?
In practice, many setups use between 2 and 8 GPUs, but some high-performance computing environments may use even more, depending on the specific needs and configuration of the system.
We split our inference jobs into batches
So we can run on any number of GPUs we like, but we just need someone from RunPod to confirm this is allowed.
it sounds interesting
I am familiar with RunPod
I wanna know how large your inference job is
200 million rows
can run maybe... 120,000 rows per hour
per instance
i think so
but i need to check
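rough math with those numbers (both are estimates, so treat this as a sketch of how GPU count maps to wall-clock time):
```python
# Back-of-envelope from the figures quoted above; both numbers are rough estimates.
total_rows = 200_000_000
rows_per_hour_per_gpu = 120_000

for gpus in (40, 80, 100):
    hours = total_rows / (rows_per_hour_per_gpu * gpus)
    print(f"{gpus} GPUs -> ~{hours:.1f} hours")
# 100 GPUs -> ~16.7 hours, so the GPU count directly sets the wall-clock time.
```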
could u gimme a runpod account?
?
i mean a paid account
Just to be clear, you don't work for runpod
and all I need is an answer from runpod that I can run 100x A40s
so I'm not gonna give you access to my runpod account
lol
i can understand u
maybe go to https://contact.runpod.io/hc/en-us/requests/new
And ask your question there 🙂 they would be able to give you official info about that
yup
We have a good amount of A40 GPUs available. In the ticket, let us know when you usually need them, how long you plan to run them, and whether this is just for your current project or a longer-term, ongoing need. This information will help us better plan our capacity.😄
Ah, thanks! We just spun up 100, seems like we may have consumed all of them, lol.
I got paged by the alert policy we set up internally for A40 availability, as our product currently relies on that.
Of course it is your right to spin up as many as you want, but can you kindly let me know if this is going to be a one-off thingie or something you will be running long-term? We've been happily enjoying the high availability of A40s, but there are now only ~27 GPUs left lol
Our job should finish in an hour.
Sorry about that!
Support told me it was okay haha
I’m happy to ping y’all ahead of time
lol yeah for sure it won't be an issue of course, not your fault no need to say sorry!
it's just our thing that we've only added A40s to the autoscaling pool for now, cuz it seemed like there were plenty of A40s a couple days/weeks back.
I think we need to add more GPU types to the pool anyway to handle cases like this.
and 🤞 for your batch job 😉
There’s a bug though!
Back to figuring it out and spinning up tomorrow
I’m sorry, I don’t have specific details about future plans, but I know that we’re continuously working with suppliers to add more based on demand. The more you use a particular one, the more likely we are to expand it.😀
We're just splitting our database into chunks; each worker downloads a chunk, processes it, then uploads when complete
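it's basically this pattern (the helper names here are just placeholders for our own download/model/upload code, nothing RunPod-specific):
```python
# Minimal sketch of the chunked batch pattern described above.
# fetch_chunk / run_inference / upload_results are hypothetical stand-ins.
from typing import List

def fetch_chunk(chunk_id: int) -> List[str]:
    # Placeholder: download one slice of the database (e.g. from object storage).
    return [f"row-{chunk_id}-{i}" for i in range(3)]

def run_inference(rows: List[str]) -> List[str]:
    # Placeholder: run the model over the slice.
    return [row.upper() for row in rows]

def upload_results(chunk_id: int, outputs: List[str]) -> None:
    # Placeholder: upload finished outputs; once uploaded, the chunk is done.
    print(f"chunk {chunk_id}: {outputs}")

def worker(worker_index: int, num_workers: int, total_chunks: int) -> None:
    # Each pod claims every num_workers-th chunk, so any number of GPUs can
    # share the job without coordinating with each other.
    for chunk_id in range(worker_index, total_chunks, num_workers):
        upload_results(chunk_id, run_inference(fetch_chunk(chunk_id)))

if __name__ == "__main__":
    worker(worker_index=0, num_workers=100, total_chunks=1000)
```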
BTW, are you in the same region? It might be worth checking availability in other regions as well.😆
I just spun down our job so released 100 GPUs back
@yhlong00000 is there a way to see the quantity of GPUs, rather than just high/low?
For customers, it’s not available. Let me check if there’s a specific reason why we don’t display it. Will get back to you later.
We're pooling from CA-MTL-1 and EU-SE-1, as they are the only datacenters with both A40s and network volume support.
There is a way to do that if you use GraphQL. The docs state that there are totalCount and rentedCount fields. If you run the query with the datacenter id as a variable, you will be able to see the rented count and total count.
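something along these lines — a sketch only: totalCount/rentedCount are from the docs, but the exact query shape and where the dc filter goes are my guesses, so check the current schema:
```python
# Sketch: ask RunPod's GraphQL API for A40 counts in one datacenter.
# totalCount/rentedCount come from the docs; the nesting under lowestPrice
# and the dataCenterId filter are assumptions about the schema.
import os
import requests

QUERY = """
query GpuAvailability($gpuTypeId: String!, $dataCenterId: String!) {
  gpuTypes(input: { id: $gpuTypeId }) {
    id
    lowestPrice(input: { gpuCount: 1, dataCenterId: $dataCenterId }) {
      totalCount
      rentedCount
    }
  }
}
"""

resp = requests.post(
    "https://api.runpod.io/graphql",
    params={"api_key": os.environ["RUNPOD_API_KEY"]},  # API key from account settings
    json={
        "query": QUERY,
        "variables": {"gpuTypeId": "NVIDIA A40", "dataCenterId": "EU-SE-1"},
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```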
but seems like the rented count and total count are not strictly from that specific datacenter, but aggregated tho..
Yeah, cool. I can work off that
😂 ok, you guys are smarter than me
haha soulmind has an idea of how many GPUs are in a dc, damn
👍 the only thing is, it seems like the GraphQL API is responding with the combined # of GPUs, not the # of GPUs in the specific dc...
ah that shouldn't be right if you're filtering with a dc id
if it does that, please report it as a bug in #🧐|feedback
yeah I will, cuz I've been monitoring the values for a while, and it seems like the totalCount and rentedCount for the same GPU but different dcs show the same value
Starting that batch job again
We might take all the A40 capacity
or the remainder of it
how long u guys gonna be running for?
3h
~3h
okies
is this gonna be an ongoing thing?
yes
ok
but mostly 1-2 times per week
all good dude, i specifically set my workflow up in eu because when i get to use it no one is using them
lol
are you using spot?
I feel like it's releasing spot instances right now lol
@sluzorz I see you're using up all the A40s! Do you know if there's a way to transfer all my data from one pod to another? I'm happy using another gpu, but I have a lot of stuff downloaded to my current pod, which is on an A40
Cloud sync and rclone
I think their cloud sync is just rclone
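yeah, roughly like this (the remote name and bucket path are placeholders — set up your own remote with rclone config first):
```python
# Sketch: move a pod's /workspace to another pod via object storage with rclone.
# "s3remote:my-bucket/pod-backup" is a placeholder remote/path, not a real default.
import subprocess

# On the old pod: push everything up to the remote.
subprocess.run(
    ["rclone", "copy", "/workspace", "s3remote:my-bucket/pod-backup", "--progress"],
    check=True,
)

# On the new pod: pull everything back down.
subprocess.run(
    ["rclone", "copy", "s3remote:my-bucket/pod-backup", "/workspace", "--progress"],
    check=True,
)
```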
wew bro what are you doing with the a40s
training something?
hey also sluzorz, if this happens regularly, try to tell runpod about it so they can accommodate other users' needs on a40
Large batch inference jobs with BART
Oooh
Yeah, we’ve let runpod know.
@sluzorz will you be finished about now?
Some of our batches are finishing now
But we still have about 30 remaining since we couldn't spin up 100 A40s
Oooh
okay please let me know when you've finished
We're mostly done
But we will probably consume more in a few hours for embedding
The A40s on runpod are just too good of an offering