Interesting! I will experiment with AdamW today.
This didn't go as expected. I experimented with both Adafactor and AdamW, and Adafactor produced much better results at 150 epochs. So AdamW didn't converge to good results any quicker, as I had hoped. It could be that I didn't tune some hyperparameters correctly.
Yes, it is sensitive to hyperparameters.
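To show which knobs are involved, here is a minimal sketch of a single AdamW update in plain Python. It assumes the standard decoupled weight-decay formulation. The function name `adamw_step` is hypothetical, and the default values are illustrative, roughly matching PyTorch's `torch.optim.AdamW` defaults; they are not settings anyone in this thread reported using. The learning rate `lr`, the moment decay rates `beta1`/`beta2`, `eps`, and `weight_decay` all interact, which is one reason results can shift a lot between runs.

```python
import math


def adamw_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update on plain Python lists (illustrative sketch).

    `m` and `v` hold the first/second moment estimates and are updated
    in place; `t` is the 1-based step count used for bias correction.
    Returns the list of updated parameters.
    """
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # Exponential moving averages of the gradient and squared gradient
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        # Bias correction for the zero-initialized moment estimates
        m_hat = m[i] / (1 - beta1 ** t)
        v_hat = v[i] / (1 - beta2 ** t)
        # Decoupled weight decay: shrink the parameter directly instead of
        # adding an L2 term to the gradient (as plain Adam + L2 would)
        p = p - lr * weight_decay * p
        # Adaptive step: roughly lr * sign(gradient) early in training
        p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
        new_params.append(p)
    return new_params
```

For example, starting from `p = 5.0` with gradient `10.0`, `lr=0.1`, and `weight_decay=0.0`, the first step moves the parameter to about `4.9`: after bias correction, the first update is close to `lr` times the sign of the gradient, whatever the gradient's magnitude. This helps explain why the choice of `lr` tends to matter so much with this optimizer.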