Learning Rate Schedulers¶

Learning Rate Schedulers update the learning rate over the course of training. Learning rates can be updated after each update via step_update() or at epoch boundaries via step().

isort:skip_file

class fairseq.optim.lr_scheduler.FairseqLRScheduler(cfg, optimizer)[source]¶

classmethod add_args(parser)[source]¶: Add arguments to the parser for this LR scheduler.

load_state_dict(state_dict)[source]¶: Load an LR scheduler state dict.

state_dict()[source]¶: Return the LR scheduler state dict.

step(epoch, val_loss=None)[source]¶: Update the learning rate at the end of the given epoch.

step_begin_epoch(epoch)[source]¶: Update the learning rate at the beginning of the given epoch.

step_update(num_updates)[source]¶: Update the learning rate after each update.

class fairseq.optim.lr_scheduler.inverse_square_root_schedule.InverseSquareRootSchedule(cfg: fairseq.optim.lr_scheduler.inverse_square_root_schedule.InverseSquareRootLRScheduleConfig, optimizer)[source]¶

Decay the LR based on the inverse square root of the update number.

We also support a warmup phase where we linearly increase the learning rate from some initial learning rate (--warmup-init-lr) until the configured learning rate (--lr). Thereafter we decay proportional to the number of updates, with a decay factor set to align with the configured learning rate.

During warmup:

lrs = torch.linspace(cfg.warmup_init_lr, cfg.lr, cfg.warmup_updates)
lr = lrs[update_num]

After warmup:

decay_factor = cfg.lr * sqrt(cfg.warmup_updates)
lr = decay_factor / sqrt(update_num)

step(epoch, val_loss=None)[source]¶: Update the learning rate at the end of the given epoch.

step_update(num_updates)[source]¶: Update the learning rate after each update.