Interesting! I will experiment with AdamW today.
2 Replies
Renzo (OP), 3mo ago
This didn't go as expected. I experimented with Adafactor and AdamW, and Adafactor produced much better results at 150 epochs. So AdamW didn't converge to good results more quickly, as I had hoped. It could be that I didn't dial in some hyperparameters correctly.
Furkan Gözükara SECourses
Yes, it is sensitive to hyperparameters.
