Learning Rate Schedulers

Learning rate schedulers adjust the learning rate over the course of training. The learning rate can be changed after each optimizer update via step_update() or at epoch boundaries via step().
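The two hooks differ only in when the training loop calls them. The sketch below illustrates that call pattern with a toy stand-in; `ToyScheduler` and `train` are hypothetical names made up for this example and are not part of fairseq, and the halving rule in step() is an arbitrary choice.

```python
class ToyScheduler:
    """Hypothetical stand-in that mimics the step()/step_update() call pattern."""

    def __init__(self, lr):
        self.lr = lr

    def step_update(self, num_updates):
        # Called after every optimizer update. This toy leaves the LR unchanged.
        return self.lr

    def step(self, epoch, val_loss=None):
        # Called at the end of each epoch. This toy halves the LR.
        self.lr *= 0.5
        return self.lr


def train(scheduler, num_epochs, updates_per_epoch):
    """Record the LR seen after each update of a dummy training loop."""
    history = []
    num_updates = 0
    for epoch in range(1, num_epochs + 1):
        for _ in range(updates_per_epoch):
            # ... forward pass, backward pass, optimizer update would go here ...
            num_updates += 1
            history.append(scheduler.step_update(num_updates))
        # Epoch boundary: give the scheduler a chance to react.
        scheduler.step(epoch)
    return history


lr_history = train(ToyScheduler(lr=1.0), num_epochs=2, updates_per_epoch=2)
```

A schedule that depends on the update count would put its logic in step_update(), while one that reacts to validation loss would put it in step().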


class fairseq.optim.lr_scheduler.FairseqLRScheduler(cfg, optimizer)[source]
classmethod add_args(parser)[source]

Add arguments to the parser for this LR scheduler.


load_state_dict(state_dict)[source]

Load an LR scheduler state dict.

state_dict()[source]

Return the LR scheduler state dict.

step(epoch, val_loss=None)[source]

Update the learning rate at the end of the given epoch.


step_begin_epoch(epoch)[source]

Update the learning rate at the beginning of the given epoch.

step_update(num_updates)[source]

Update the learning rate after each update.

class fairseq.optim.lr_scheduler.inverse_square_root_schedule.InverseSquareRootSchedule(cfg: fairseq.optim.lr_scheduler.inverse_square_root_schedule.InverseSquareRootLRScheduleConfig, optimizer)[source]

Decay the LR based on the inverse square root of the update number.

We also support a warmup phase where we linearly increase the learning rate from some initial learning rate (--warmup-init-lr) until the configured learning rate (--lr). Thereafter we decay proportional to the inverse square root of the number of updates, with a decay factor chosen so that the schedule equals the configured learning rate at the end of warmup.

During warmup:

lrs = torch.linspace(cfg.warmup_init_lr, cfg.lr, cfg.warmup_updates)
lr = lrs[update_num]

After warmup:

decay_factor = cfg.lr * sqrt(cfg.warmup_updates)
lr = decay_factor / sqrt(update_num)
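The two phases above can be combined into a single function. This is a minimal plain-Python sketch, not fairseq's implementation: `inverse_sqrt_lr` is a hypothetical helper, its arguments stand in for cfg.lr, cfg.warmup_updates and cfg.warmup_init_lr, and it assumes warmup_updates >= 1. It also uses a warmup step of (lr - warmup_init_lr) / warmup_updates, so the configured learning rate is reached exactly at update warmup_updates; this spacing differs slightly from the torch.linspace pseudocode above.

```python
import math


def inverse_sqrt_lr(update_num, lr, warmup_updates, warmup_init_lr=0.0):
    """Hypothetical helper: LR at a given update under the schedule above."""
    if warmup_updates < 1:
        raise ValueError("this sketch assumes warmup_updates >= 1")
    if update_num < warmup_updates:
        # Warmup: linear ramp from warmup_init_lr toward lr.
        lr_step = (lr - warmup_init_lr) / warmup_updates
        return warmup_init_lr + update_num * lr_step
    # After warmup: decay with the inverse square root of the update number.
    decay_factor = lr * math.sqrt(warmup_updates)
    return decay_factor / math.sqrt(update_num)


# Example with made-up settings: peak LR 1.0 after 4 warmup updates.
schedule = [inverse_sqrt_lr(n, lr=1.0, warmup_updates=4) for n in (0, 2, 4, 16)]
```

Because decay_factor is lr * sqrt(warmup_updates), the decay branch returns exactly lr at update_num == warmup_updates, so the two phases join without a jump.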
step(epoch, val_loss=None)[source]

Update the learning rate at the end of the given epoch.

step_update(num_updates)[source]

Update the learning rate after each update.