Learning Rate Schedulers

Learning rate schedulers adjust the learning rate over the course of training. The learning rate can be changed after every update via step_update() or at epoch boundaries via step().


class fairseq.optim.lr_scheduler.FairseqLRScheduler(cfg, optimizer)[source]
classmethod add_args(parser)[source]

Add arguments to the parser for this LR scheduler.

load_state_dict(state_dict)[source]

Load an LR scheduler state dict.

state_dict()[source]

Return the LR scheduler state dict.

step(epoch, val_loss=None)[source]

Update the learning rate at the end of the given epoch.

step_begin_epoch(epoch)[source]

Update the learning rate at the beginning of the given epoch.

step_update(num_updates)[source]

Update the learning rate after each update.
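
The sketch below illustrates one plausible order in which a training loop might call the methods listed above. ToyScheduler and train are hypothetical stand-ins written for this example, not fairseq classes; the halving rule and the returned values are assumptions made purely for illustration.

```python
class ToyScheduler:
    """Hypothetical stand-in that mirrors the method names listed above.

    Not fairseq code: it halves the LR at each epoch boundary and leaves
    it unchanged on per-update calls.
    """

    def __init__(self, lr):
        self.lr = lr
        self.calls = []  # records the order in which the hooks fire

    def step_begin_epoch(self, epoch):
        self.calls.append(("step_begin_epoch", epoch))
        return self.lr

    def step_update(self, num_updates):
        self.calls.append(("step_update", num_updates))
        return self.lr

    def step(self, epoch, val_loss=None):
        self.calls.append(("step", epoch))
        self.lr *= 0.5  # assumed decay rule, for illustration only
        return self.lr

    def state_dict(self):
        return {"lr": self.lr}

    def load_state_dict(self, state_dict):
        self.lr = state_dict["lr"]


def train(scheduler, num_epochs, updates_per_epoch):
    """Drive the scheduler the way an epoch-based loop might."""
    num_updates = 0
    for epoch in range(1, num_epochs + 1):
        scheduler.step_begin_epoch(epoch)
        for _ in range(updates_per_epoch):
            # ... forward pass, backward pass, optimizer step ...
            num_updates += 1
            scheduler.step_update(num_updates)
        # ... validation would run here; val_loss is optional ...
        scheduler.step(epoch, val_loss=None)
    return scheduler.lr


scheduler = ToyScheduler(lr=1.0)
final_lr = train(scheduler, num_epochs=2, updates_per_epoch=3)

# Checkpoint round trip: restore the LR into a fresh scheduler.
restored = ToyScheduler(lr=1.0)
restored.load_state_dict(scheduler.state_dict())
```

The per-update hook receives the running update count rather than the epoch number, which is what lets update-based schedules such as warmup operate independently of epoch length.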

class fairseq.optim.lr_scheduler.inverse_square_root_schedule.InverseSquareRootSchedule(cfg: fairseq.optim.lr_scheduler.inverse_square_root_schedule.InverseSquareRootLRScheduleConfig, optimizer)[source]

Decay the LR based on the inverse square root of the update number.

We also support a warmup phase, in which the learning rate increases linearly from an initial learning rate (--warmup-init-lr) to the configured learning rate (--lr). Thereafter the learning rate decays in proportion to the inverse square root of the update number, with a decay factor chosen so that the decay curve equals the configured learning rate at the end of warmup.

During warmup:

lrs = torch.linspace(cfg.warmup_init_lr, cfg.lr, cfg.warmup_updates)
lr = lrs[update_num]

After warmup:

decay_factor = cfg.lr * sqrt(cfg.warmup_updates)
lr = decay_factor / sqrt(update_num)
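
The two phases above can be combined into a single function. This is a minimal pure-Python sketch, not the library source: the function name inverse_sqrt_lr and its argument names are hypothetical, and the warmup ramp is assumed to advance by an equal increment per update so that it reaches the configured rate exactly at update warmup_updates (the linspace form above places the endpoint one update earlier).

```python
import math


def inverse_sqrt_lr(update_num, lr, warmup_init_lr, warmup_updates):
    """Return the learning rate for a given update number (illustrative)."""
    if warmup_updates < 1:
        raise ValueError("this sketch assumes warmup_updates >= 1")
    if update_num < warmup_updates:
        # Warmup: linear ramp from warmup_init_lr towards lr.
        lr_step = (lr - warmup_init_lr) / warmup_updates
        return warmup_init_lr + update_num * lr_step
    # After warmup: decay_factor / sqrt(update_num), where decay_factor is
    # chosen so the curve passes through lr at update_num == warmup_updates.
    decay_factor = lr * math.sqrt(warmup_updates)
    return decay_factor / math.sqrt(update_num)


# Example values are arbitrary and chosen only to keep the arithmetic simple.
halfway = inverse_sqrt_lr(2, lr=1.0, warmup_init_lr=0.0, warmup_updates=4)
peak = inverse_sqrt_lr(4, lr=1.0, warmup_init_lr=0.0, warmup_updates=4)
later = inverse_sqrt_lr(16, lr=1.0, warmup_init_lr=0.0, warmup_updates=4)
```

With these values the rate is half the configured rate midway through warmup, equals the configured rate at the end of warmup, and falls back to half of it once the update number is four times warmup_updates, since quadrupling the update number halves the inverse square root.
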
step(epoch, val_loss=None)[source]

Update the learning rate at the end of the given epoch.

step_update(num_updates)[source]

Update the learning rate after each update.