Learning Rate Schedulers

Learning Rate Schedulers update the learning rate over the course of training. The learning rate can be adjusted after every optimizer update via step_update() or at epoch boundaries via step().
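
The calling convention can be sketched with a hypothetical training loop (train_step, validate, batches, and num_epochs are placeholders, not fairseq APIs; lr_scheduler stands for an instance of one of the classes below):

num_updates = 0
for epoch in range(num_epochs):
    for batch in batches:
        train_step(batch)                       # forward/backward + optimizer step
        num_updates += 1
        lr_scheduler.step_update(num_updates)   # per-update schedules adjust here
    val_loss = validate()
    lr_scheduler.step(epoch, val_loss)          # per-epoch schedules adjust here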

class fairseq.optim.lr_scheduler.FairseqLRScheduler(args, optimizer)[source]

static add_args(parser)[source]

Add arguments to the parser for this LR scheduler.

load_state_dict(state_dict)[source]

Load an LR scheduler state dict.

state_dict()[source]

Return the LR scheduler state dict.

step(epoch, val_loss=None)[source]

Update the learning rate at the end of the given epoch.

step_update(num_updates)[source]

Update the learning rate after each update.

class fairseq.optim.lr_scheduler.cosine_lr_scheduler.CosineSchedule(args, optimizer)[source]

Assign LR based on a cyclical schedule that follows the cosine function.

See https://arxiv.org/pdf/1608.03983.pdf for details.

We also support a warmup phase where we linearly increase the learning rate from some initial learning rate (--warmup-init-lr) until the configured max learning rate (--max-lr).

During warmup:

lrs = torch.linspace(args.warmup_init_lr, args.max_lr, args.warmup_updates)
lr = lrs[update_num]

After warmup:

lr = lr_min + 0.5*(lr_max - lr_min)*(1 + cos(pi * t_curr / t_i))

where t_curr is the number of updates since the start of the current period and t_i is the length of the current period, which is multiplied by t_mult after every restart.
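
Putting the two phases together, the schedule can be approximated as a standalone function (a sketch only; hyperparameter names and defaults are illustrative, and it assumes t_mult >= 1):

import math

def cosine_lr(update_num, warmup_init_lr=0.0, max_lr=1.0, min_lr=1e-5,
              warmup_updates=4000, period=50000, t_mult=1.0):
    if update_num < warmup_updates:
        # linear warmup from warmup_init_lr to max_lr
        return warmup_init_lr + (max_lr - warmup_init_lr) * update_num / warmup_updates
    # locate the current period; each period is t_mult times longer than the last
    t_curr = update_num - warmup_updates
    t_i = period
    while t_curr >= t_i:
        t_curr -= t_i
        t_i *= t_mult
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * t_curr / t_i))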

static add_args(parser)[source]

Add arguments to the parser for this LR scheduler.

step(epoch, val_loss=None)[source]

Update the learning rate at the end of the given epoch.

step_update(num_updates)[source]

Update the learning rate after each update.

class fairseq.optim.lr_scheduler.fixed_schedule.FixedSchedule(args, optimizer)[source]

Decay the LR on a fixed schedule.
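
As a rough sketch of the idea (behavior inferred from the command-line options --lr, --force-anneal, and --lr-shrink; not the exact implementation), the LR for an epoch could be computed as:

def fixed_lr(epoch, lrs=(0.25,), force_anneal=None, lr_shrink=0.1):
    if force_anneal is None or epoch < force_anneal:
        # read this epoch's LR from the configured list, repeating the last entry
        return lrs[min(epoch, len(lrs) - 1)]
    # past the annealing point, shrink the last configured LR every epoch
    return lrs[-1] * lr_shrink ** (epoch + 1 - force_anneal)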

static add_args(parser)[source]

Add arguments to the parser for this LR scheduler.

get_next_lr(epoch)[source]

step(epoch, val_loss=None)[source]

Update the learning rate at the end of the given epoch.

step_update(num_updates)[source]

Update the learning rate after each update.

class fairseq.optim.lr_scheduler.inverse_square_root_schedule.InverseSquareRootSchedule(args, optimizer)[source]

Decay the LR based on the inverse square root of the update number.

We also support a warmup phase where we linearly increase the learning rate from some initial learning rate (--warmup-init-lr) until the configured learning rate (--lr). Thereafter the learning rate decays proportionally to the inverse square root of the update number, with the decay factor chosen so that the schedule equals the configured learning rate at the end of warmup.

During warmup:

lrs = torch.linspace(args.warmup_init_lr, args.lr, args.warmup_updates)
lr = lrs[update_num]

After warmup:

decay_factor = args.lr * sqrt(args.warmup_updates)
lr = decay_factor / sqrt(update_num)
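
Equivalently, as a standalone sketch following the formulas above (default values are illustrative only):

import math

def inverse_sqrt_lr(update_num, warmup_init_lr=1e-7, lr=5e-4, warmup_updates=4000):
    if update_num < warmup_updates:
        # linear warmup from warmup_init_lr to lr
        return warmup_init_lr + (lr - warmup_init_lr) * update_num / warmup_updates
    # the decay factor makes the two phases meet exactly at the end of warmup
    decay_factor = lr * math.sqrt(warmup_updates)
    return decay_factor / math.sqrt(update_num)
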
static add_args(parser)[source]

Add arguments to the parser for this LR scheduler.

step(epoch, val_loss=None)[source]

Update the learning rate at the end of the given epoch.

step_update(num_updates)[source]

Update the learning rate after each update.

class fairseq.optim.lr_scheduler.reduce_lr_on_plateau.ReduceLROnPlateau(args, optimizer)[source]

Decay the LR by a factor every time the validation loss plateaus.
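
The behavior can be previewed with PyTorch's built-in scheduler of the same name, which works analogously (parameter values illustrative):

import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# shrink the LR by `factor` as soon as the tracked metric stops improving
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.1, patience=0)

for epoch in range(5):
    val_loss = 1.0  # stand-in for a real validation loss; constant, so it plateaus
    scheduler.step(val_loss)
    print(epoch, optimizer.param_groups[0]['lr'])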

static add_args(parser)[source]

Add arguments to the parser for this LR scheduler.

load_state_dict(state_dict)[source]

Load an LR scheduler state dict.

state_dict()[source]

Return the LR scheduler state dict.

step(epoch, val_loss=None)[source]

Update the learning rate at the end of the given epoch.

class fairseq.optim.lr_scheduler.triangular_lr_scheduler.TriangularSchedule(args, optimizer)[source]

Assign LR based on a triangular cyclical schedule.

See https://arxiv.org/pdf/1506.01186.pdf for details.
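
A minimal sketch of a triangular cycle between a low and a high LR (parameter names and defaults are illustrative; fairseq's version is configured via its own command-line options):

def triangular_lr(update_num, min_lr=1e-5, max_lr=1e-3, period=5000):
    # position within the current cycle, normalized to [0, 1)
    cycle_pos = (update_num % period) / period
    # rise linearly over the first half of the cycle, fall over the second
    frac = 2 * cycle_pos if cycle_pos < 0.5 else 2 * (1 - cycle_pos)
    return min_lr + (max_lr - min_lr) * frac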

static add_args(parser)[source]

Add arguments to the parser for this LR scheduler.

step(epoch, val_loss=None)[source]

Update the learning rate at the end of the given epoch.

step_update(num_updates)[source]

Update the learning rate after each update.