Learning Rate Schedulers
Learning Rate Schedulers update the learning rate over the course of training. Learning rates can be updated after each update via step_update() or at epoch boundaries via step().
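As a rough, self-contained sketch of where these two hooks fit into a training loop (the toy scheduler, model, and constants below are illustrative assumptions, not fairseq's implementation; fairseq's schedulers are constructed from a config object and a fairseq optimizer):

    import torch

    # Toy scheduler exposing the same two hooks (illustrative only).
    class ToyScheduler:
        def __init__(self, optimizer, base_lr=1e-3):
            self.optimizer, self.base_lr = optimizer, base_lr

        def set_lr(self, lr):
            for group in self.optimizer.param_groups:
                group["lr"] = lr

        def step_update(self, num_updates):
            # called after every optimizer update
            lr = self.base_lr / max(1, num_updates) ** 0.5
            self.set_lr(lr)
            return lr

        def step(self, epoch, val_loss=None):
            # called at epoch boundaries; may react to validation loss
            return self.optimizer.param_groups[0]["lr"]

    model = torch.nn.Linear(4, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    scheduler = ToyScheduler(optimizer)

    num_updates = 0
    for epoch in range(1, 3):
        for _ in range(5):  # pretend mini-batches
            loss = model(torch.randn(8, 4)).pow(2).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            num_updates += 1
            scheduler.step_update(num_updates)  # per-update hook
        scheduler.step(epoch)  # per-epoch hook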
class fairseq.optim.lr_scheduler.inverse_square_root_schedule.InverseSquareRootSchedule(cfg: fairseq.optim.lr_scheduler.inverse_square_root_schedule.InverseSquareRootLRScheduleConfig, optimizer)[source]

Decay the LR based on the inverse square root of the update number.
We also support a warmup phase where we linearly increase the learning rate from some initial learning rate (--warmup-init-lr) until the configured learning rate (--lr). Thereafter we decay proportional to the number of updates, with a decay factor set to align with the configured learning rate.

During warmup:

    lrs = torch.linspace(cfg.warmup_init_lr, cfg.lr, cfg.warmup_updates)
    lr = lrs[update_num]

After warmup:

    decay_factor = cfg.lr * sqrt(cfg.warmup_updates)
    lr = decay_factor / sqrt(update_num)
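Putting the two phases together, the schedule can be written as a small stand-alone function. This is only a sketch that mirrors the pseudocode above; the default values here are illustrative, while the real ones come from InverseSquareRootLRScheduleConfig (i.e. --lr, --warmup-init-lr, --warmup-updates):

    import math

    # Sketch of the schedule described above (illustrative defaults).
    def inverse_sqrt_lr(update_num, lr=5e-4, warmup_init_lr=1e-7, warmup_updates=4000):
        if update_num < warmup_updates:
            # linear warmup: one step along linspace(warmup_init_lr, lr, warmup_updates)
            lr_step = (lr - warmup_init_lr) / warmup_updates
            return warmup_init_lr + update_num * lr_step
        # decay factor chosen so both phases meet at update_num == warmup_updates
        decay_factor = lr * math.sqrt(warmup_updates)
        return decay_factor / math.sqrt(update_num)

    print(inverse_sqrt_lr(4000))   # ~5e-4, the configured LR at the end of warmup
    print(inverse_sqrt_lr(16000))  # ~2.5e-4, halved after 4x as many updates

Because the decay factor includes sqrt(cfg.warmup_updates), the two phases meet at update_num == warmup_updates, so the learning rate is continuous at the warmup boundary.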