RunPod3mo ago
sluzorz

Maximum number of A40s that can run at one time

I'm looking to run as many A40s to finish a large-scale inference/LLM generation job. How many could I run at one time? 40, 80, 100?
51 Replies
Steve Wozniak
Steve Wozniak3mo ago
In practice, many setups use between 2 to 8 GPUs, but some high-performance computing environments may use even more, depending on the specific needs and configuration of the system.
sluzorz
sluzorz3mo ago
We split our inference jobs into batches, so we can run on as many GPUs as we like, but just need someone from RunPod to confirm this is allowed.
Steve Wozniak
Steve Wozniak3mo ago
it sounds interesting. I am familiar with RunPod, I wanna know how large your inference job is
sluzorz
sluzorz3mo ago
200 million rows. Each instance can process maybe ~120,000 rows per hour
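A quick sanity check on the scale being described (the fleet size of 100 is taken from the original question, used here as an assumption):

```python
# Back-of-the-envelope for the job size quoted in the thread.
total_rows = 200_000_000
rows_per_hour_per_instance = 120_000
instances = 100  # assumed fleet size from the original question

instance_hours = total_rows / rows_per_hour_per_instance  # total work to do
wall_clock_hours = instance_hours / instances             # with the whole fleet

print(round(instance_hours))    # ~1667 instance-hours of work
print(round(wall_clock_hours))  # ~17 hours of wall-clock time on 100 GPUs
```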
Steve Wozniak
Steve Wozniak3mo ago
i think so but i need to check could u gimme runpod account?
sluzorz
sluzorz3mo ago
?
Steve Wozniak
Steve Wozniak3mo ago
i mean paid account
sluzorz
sluzorz3mo ago
Just to be clear, you don't work for runpod, and all I need is an answer from runpod that I can run 100x A40s, so I'm not gonna give you access to my runpod account lol
Steve Wozniak
Steve Wozniak3mo ago
i can understand u
nerdylive
nerdylive3mo ago
maybe go to https://contact.runpod.io/hc/en-us/requests/new And ask your question there 🙂 they would be able to give you an official info about that
sluzorz
sluzorz3mo ago
yup
yhlong00000
yhlong000003mo ago
We have a good amount of A40 GPUs available. In the ticket let us know the time you usually need them, the duration you plan to run them, and whether this is just for your current project or a longer-term, ongoing need. This information will help us better plan our capacity.😄
sluzorz
sluzorz3mo ago
Ah, thanks! We just spun up 100, seems like we may have consumed all of them, lol.
Soulmind
Soulmind3mo ago
I got paged by the alert policy we set up internally for A40 availability, as our product currently relies on that. Of course it is your right to spin up as many as you want, but can you kindly let me know if this is going to be a one-off thingie or something you will be running long-term? We've been happily enjoying the high availability of A40s but there are now only ~27 GPUs left lol
sluzorz
sluzorz3mo ago
Our job should finish in an hour. Sorry about that! Support told me it was okay haha I’m happy to ping y’all ahead of time
Soulmind
Soulmind3mo ago
lol yeah for sure it won't be an issue of course, not your fault no need to say sorry! it's just our thing that we only have added A40 to the autoscaling pool for now, cuz seemed like there were plenty of A40s couple days/weeks back. I think we anyways need to add more GPU types to the pool to adapt to any case. and 🤞 for your batch job 😉
sluzorz
sluzorz3mo ago
There's a bug, so we're back to figuring it out and will spin up again tomorrow
Soulmind
Soulmind3mo ago
hope it's easy to debug! seems like now there are ~121 GPUs available. btw which backend are you using for your batch job? I heard SGLang is pretty good for batch jobs. @yhlong00000 any plans on adding more A40s to the pool?
yhlong00000
yhlong000003mo ago
I’m sorry, I don’t have specific details about the future plans, but I know that we’re continuously working with suppliers to add more based on demand. The more you use a particular one, the more likely we are to expand it.😀
sluzorz
sluzorz3mo ago
We're just splitting our database into chunks; each instance downloads a chunk, processes it, then uploads when complete
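A hypothetical sketch of that pipeline shape (function names are illustrative, not sluzorz's actual code):

```python
def make_chunks(total_rows, chunk_size):
    """Split the row range [0, total_rows) into contiguous chunks."""
    return [(start, min(start + chunk_size, total_rows))
            for start in range(0, total_rows, chunk_size)]

def run_worker(chunks, download, process, upload):
    """One GPU instance: download each chunk, process it, upload the result."""
    for start, end in chunks:
        rows = download(start, end)    # pull the chunk from the database
        results = process(rows)        # run batch inference on the chunk
        upload(start, end, results)    # write results back when complete

# Example: 1,000 rows in chunks of 300
print(make_chunks(1_000, 300))  # [(0, 300), (300, 600), (600, 900), (900, 1000)]
```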
yhlong00000
yhlong000003mo ago
BTW, are you in the same region? It might be worth checking availability in other regions as well.😆
sluzorz
sluzorz3mo ago
I just spun down our job, so released 100 GPUs back. @yhlong00000 is there a way to see the quantity of GPUs available, rather than just high/low?
yhlong00000
yhlong000003mo ago
For customers, it’s not available. Let me check if there’s a specific reason why we don’t display it. Will get back to you later.
Soulmind
Soulmind3mo ago
We're pooling from CA-MTL-1 and EU-SE-1, as they are the only datacenters with network volume support for A40s. There is a way to do that if you use GraphQL. The docs state that there are totalCount and rentedCount fields. If you run the query:
query gpuAvailability($gpuTypesInput: GpuTypeFilter, $lowestPriceInput: GpuLowestPriceInput) {
  gpuTypes(input: $gpuTypesInput) {
    lowestPrice(input: $lowestPriceInput) {
      uninterruptablePrice
      rentalPercentage
      rentedCount
      totalCount
    }
  }
}
with variables:
variables: {
  gpuTypesInput: {
    id: 'NVIDIA A40',
  },
  lowestPriceInput: {
    gpuCount: 1,
    secureCloud: true,
    dataCenterId: 'CA-MTL-1',
  },
}
you will be able to see the rented count and total count:
{
  "data": {
    "gpuTypes": [
      {
        "lowestPrice": {
          "uninterruptablePrice": 0.35,
          "rentalPercentage": 0.8745,
          "rentedCount": 885,
          "totalCount": 1012
        }
      }
    ]
  }
}
but it seems like the rented count and total count are not strictly from that specific datacenter, but aggregated tho..
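For anyone wanting to poll this programmatically, a minimal stdlib-only sketch; the endpoint URL and the `api_key` query parameter are assumptions based on RunPod's public GraphQL docs, so verify them before relying on this:

```python
import json
import urllib.parse
import urllib.request

# Same query as shown above in the thread.
QUERY = """
query gpuAvailability($gpuTypesInput: GpuTypeFilter, $lowestPriceInput: GpuLowestPriceInput) {
  gpuTypes(input: $gpuTypesInput) {
    lowestPrice(input: $lowestPriceInput) {
      uninterruptablePrice
      rentalPercentage
      rentedCount
      totalCount
    }
  }
}
"""

def build_payload(gpu_id, datacenter):
    """GraphQL request body matching the query/variables shown above."""
    return {
        "query": QUERY,
        "variables": {
            "gpuTypesInput": {"id": gpu_id},
            "lowestPriceInput": {
                "gpuCount": 1,
                "secureCloud": True,
                "dataCenterId": datacenter,
            },
        },
    }

def gpu_counts(api_key, gpu_id="NVIDIA A40", datacenter="CA-MTL-1"):
    """Return (rentedCount, totalCount). Endpoint URL is an assumption."""
    url = "https://api.runpod.io/graphql?" + urllib.parse.urlencode({"api_key": api_key})
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(gpu_id, datacenter)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        data = json.load(resp)
    price = data["data"]["gpuTypes"][0]["lowestPrice"]
    return price["rentedCount"], price["totalCount"]
```

Note the caveat above: the counts may be aggregated across datacenters rather than per-DC, so treat the `dataCenterId` filter with suspicion until that is confirmed.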
sluzorz
sluzorz3mo ago
Yeah, cool. I can work off that
yhlong00000
yhlong000003mo ago
😂 ok, you guys are smarter than me
nerdylive
nerdylive3mo ago
haha soulmind has an idea of how many GPUs are in a DC, damn
Soulmind
Soulmind3mo ago
👍 the only thing is, it seems like the GraphQL API is responding with the combined # of GPUs, not the # of GPUs in the specific DC...
nerdylive
nerdylive3mo ago
ah, that shouldn't be right if you're filtering with a DC id. If it does that, please report it as a bug in #🧐|feedback
Soulmind
Soulmind3mo ago
yeah I will, cuz I've been monitoring the values for a while, and it seems like the totalCount and rentedCount for the same GPU but different DCs show the same value:
Datacenter: CA-MTL-1
GPU Types: NVIDIA A40
{
  "data": {
    "gpuTypes": [
      {
        "lowestPrice": {
          "uninterruptablePrice": 0.7,
          "rentalPercentage": 0.8423,
          "rentedCount": 844,
          "totalCount": 1002,
          "stockStatus": "High"
        },
        "oneMonthPrice": 0.35,
        "threeMonthPrice": 0.35,
        "sixMonthPrice": null
      }
    ]
  }
}
Datacenter: EU-SE-1
GPU Types: NVIDIA A40
{
  "data": {
    "gpuTypes": [
      {
        "lowestPrice": {
          "uninterruptablePrice": 0.7,
          "rentalPercentage": 0.8423,
          "rentedCount": 844,
          "totalCount": 1002,
          "stockStatus": "Medium"
        },
        "oneMonthPrice": 0.35,
        "threeMonthPrice": 0.35,
        "sixMonthPrice": null
      }
    ]
  }
}
sluzorz
sluzorz3mo ago
Starting that batch job again. We might take all the A40 capacity, or the remainder of it
utmostmick0
utmostmick03mo ago
how long u guys gonna be running for ?
sluzorz
sluzorz3mo ago
~3h
utmostmick0
utmostmick03mo ago
okies is this gonna b an ongoing thing ?
sluzorz
sluzorz3mo ago
yes
utmostmick0
utmostmick03mo ago
ok
sluzorz
sluzorz3mo ago
but mostly 1-2 times per week
utmostmick0
utmostmick03mo ago
all good dude, i specifically set my workflow up in EU because when i get to use it no one is using them lol
sluzorz
sluzorz3mo ago
are you using spot? I feel like it's releasing spot instances right now lol
Flynn
Flynn3mo ago
@sluzorz I see you're using up all the A40s! Do you know if there's a way to transfer all my data from one pod to another? I'm happy using another gpu, but I have a lot of stuff downloaded to my current pod which is on A40
sluzorz
sluzorz3mo ago
Cloud sync and rclone. I think their cloud sync is just rclone
nerdylive
nerdylive3mo ago
wew bro what are you doing with the A40s, training something? hey also sluzorz, if this goes on commonly, try to tell RunPod about this so they can accommodate other users' needs on A40s
sluzorz
sluzorz3mo ago
Large batch inference jobs with Bart
nerdylive
nerdylive3mo ago
Oooh
sluzorz
sluzorz3mo ago
Yeah, we’ve let runpod know.
Flynn
Flynn3mo ago
@sluzorz will you be finished about now?
sluzorz
sluzorz3mo ago
Some of our batches are finishing now, but we still have about 30 remaining since we couldn't spin up all 100 A40s
nerdylive
nerdylive3mo ago
Oooh
Flynn
Flynn3mo ago
okay please let me know when you've finished
sluzorz
sluzorz3mo ago
We're mostly done, but we will probably consume more in a few hours for embedding. The A40s on RunPod are just too good of an offering