Acrack · 4w ago

Real Performance Comparison: H100 vs RTX 6000 Ada

Hi, I'm experiencing some confusion, or perhaps a misunderstanding, regarding the performance of the H100 and RTX 6000 Ada GPUs during model training. Lately, I've been working with both GPUs to train a model using 9 GB of training data and 8 GB of testing data. The model has 2.6M parameters. On the RTX 6000 Ada, I'm observing an average speed of around 200 ms/step in my current tests:

Epoch 2/15
601/1427 ━━━━━━━━━━━━━━━━━━━━ 2:41 195ms/step - binary_accuracy: 0.8878 - loss: 0.2556

Yesterday, using the H100, I was getting more than 300 ms/step, sometimes 400 ms/step, and rarely 200 ms/step, with the same script, same data, same everything. Are the H100 and RTX 6000 Ada the same thing? Regards.
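(Aside: those progress bars read like Keras output, so here is a minimal sanity check, assuming a TensorFlow/Keras setup, which the thread never confirms, to verify which GPU the script actually sees before comparing ms/step across pods:)

```python
# Sanity check, assuming a TensorFlow/Keras setup: print the GPU the
# script actually sees before comparing ms/step across machines.
import tensorflow as tf

print("TensorFlow:", tf.__version__)
for gpu in tf.config.list_physical_devices("GPU"):
    # Details include the device name, e.g. "NVIDIA H100 PCIe" or
    # "NVIDIA RTX 6000 Ada Generation".
    print(gpu.name, tf.config.experimental.get_device_details(gpu))
```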
6 Replies
nerdylive · 4w ago
No, they're not the same; I'd guess the H100 is faster than the RTX 6000 in most respects.
Acrack (OP) · 4w ago
OK, running again on the RTX 6000:

498/1427 ━━━━━━━━━━━━━━━━━━━━ 2:58 192ms/step - binary_accuracy: 0.8825 - loss: 0.2695

I'll make another try on the H100 today and post the result here.
nerdylive · 4w ago
I'm not sure either what affects that ms/step figure.
Acrack (OP) · 4w ago
I'd guess the dataset and the number of parameters the model handles, but in both runs everything is the same.
nerdylive · 4w ago
Oh yeah, those strongly affect it: the training batch size and the model params.
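(Aside: to make the batch-size effect concrete, here is a minimal sketch that times ms/step at a few batch sizes on whatever GPU is visible. The two-layer model and the random data are hypothetical stand-ins, not the 2.6M-parameter model or the 9 GB dataset from this thread:)

```python
# Minimal sketch: measure rough ms/step at a few batch sizes.
import time
import numpy as np
import tensorflow as tf

# Hypothetical stand-in model and synthetic data.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(128,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

x = np.random.rand(16384, 128).astype("float32")
y = np.random.randint(0, 2, size=(16384, 1)).astype("float32")

for batch_size in (32, 128, 512):
    start = time.perf_counter()
    model.fit(x, y, batch_size=batch_size, epochs=1, verbose=0)
    steps = len(x) // batch_size
    ms = (time.perf_counter() - start) / steps * 1000
    # Note: each fit() includes one-time graph tracing, so treat these
    # as rough numbers, not benchmarks.
    print(f"batch_size={batch_size}: ~{ms:.1f} ms/step over {steps} steps")
```

Larger batches usually raise the time per step but lower the number of steps per epoch, so comparing raw ms/step across runs only makes sense when the batch size is identical.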
Acrack (OP) · 3w ago
Yep, I don't understand those numbers yet... This is on an H100; maybe my script isn't optimized:

2005/2005 ━━━━━━━━━━━━━━━━━━━━ 688s 341ms/step - binary_accuracy: 0.8848 - loss: 0.2727 - val_binary_accuracy: 0.8486 - val_loss: 0.4044 - learning_rate: 0.0010
Epoch 3/15
1391/2005 ━━━━━━━━━━━━━━━━━━━━ 2:57 290ms/step - binary_accuracy: 0.9031 - loss: 0.2306
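(Aside: if the H100 really is slower per step than the RTX 6000, two common causes are an input pipeline that leaves the GPU idle between steps and running everything in plain float32, which skips the H100's tensor cores. Neither is confirmed from this thread; below is a minimal sketch of both mitigations with synthetic stand-in data, assuming TensorFlow/Keras:)

```python
import numpy as np
import tensorflow as tf

# Mixed precision engages the H100's tensor cores; plain float32 does not.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# Synthetic stand-in data; the real script would load its 9 GB dataset here.
x = np.random.rand(8192, 128).astype("float32")
y = np.random.randint(0, 2, size=(8192, 1)).astype("float32")

# tf.data with prefetch overlaps host-side batching with GPU compute,
# so a fast GPU is less likely to sit idle between steps.
dataset = (
    tf.data.Dataset.from_tensor_slices((x, y))
    .batch(256)
    .prefetch(tf.data.AUTOTUNE)
)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(128,)),
    tf.keras.layers.Dense(256, activation="relu"),
    # Keep the output in float32 for numeric stability under mixed precision.
    tf.keras.layers.Dense(1, activation="sigmoid", dtype="float32"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(dataset, epochs=1)
```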