Learning Rate Schedulers¶
Learning Rate Schedulers update the learning rate over the course of training.
Learning rates can be updated after each update via step_update() or at epoch boundaries via step().
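The two hooks can be illustrated with a minimal stand-in scheduler. The hook names step_update() and step() come from the docs above; the class itself and its internals are purely illustrative, not fairseq's actual base class:

```python
class ToySchedule:
    """Illustrative scheduler exposing the two fairseq-style hooks."""

    def __init__(self, lr):
        self.lr = lr

    def step_update(self, num_updates):
        # Called after every optimizer update; returns the LR to use next.
        return self.lr

    def step(self, epoch, val_loss=None):
        # Called at epoch boundaries, e.g. to react to validation loss.
        return self.lr


sched = ToySchedule(lr=1e-3)
for update_num in range(1, 4):
    lr = sched.step_update(update_num)   # per-update adjustment
lr = sched.step(epoch=1)                 # per-epoch adjustment
```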
- class fairseq.optim.lr_scheduler.cosine_lr_scheduler.CosineSchedule(args, optimizer)[source]¶

  Assign LR based on a cyclical schedule that follows the cosine function. See https://arxiv.org/pdf/1608.03983.pdf for details.

  We also support a warmup phase where we linearly increase the learning rate from some initial learning rate (--warmup-init-lr) until the configured max learning rate (--max-lr).

  During warmup:

      lrs = torch.linspace(args.warmup_init_lr, args.lr, args.warmup_updates)
      lr = lrs[update_num]
  After warmup:

      lr = lr_min + 0.5*(lr_max - lr_min)*(1 + cos(pi * t_curr / t_i))

  where t_curr is the current fraction of updates within the current period and t_i is the current period range, which is scaled by t_mul after every iteration.
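As a rough sketch, the warmup-plus-cosine rule above can be written as a plain function following the SGDR formula. Parameter names and default values here are assumptions for illustration, not the scheduler's actual arguments, and the period is held fixed (no t_mul scaling):

```python
import math


def cosine_lr(update_num, *, warmup_init_lr=1e-7, max_lr=1e-3, min_lr=1e-4,
              warmup_updates=100, period=1000):
    """Illustrative cosine schedule with linear warmup (fixed period)."""
    if update_num < warmup_updates:
        # Linear warmup from warmup_init_lr up to max_lr.
        step = (max_lr - warmup_init_lr) / warmup_updates
        return warmup_init_lr + step * update_num
    # Position within the current (fixed-length) period.
    t_curr = (update_num - warmup_updates) % period
    t_i = period
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * t_curr / t_i))
```

At the start of each period the LR sits at max_lr, then anneals toward min_lr as the cosine term falls from 1 to -1.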
- class fairseq.optim.lr_scheduler.fixed_schedule.FixedSchedule(args, optimizer)[source]¶

  Decay the LR on a fixed schedule.
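One way a fixed schedule can work is to follow a user-given list of per-epoch learning rates and then shrink the last one geometrically. This is only a sketch of the idea; the list, lr_shrink name, and defaults are assumptions, not the class's actual behavior:

```python
def fixed_lr(epoch, *, lrs=(1e-3, 5e-4, 1e-4), lr_shrink=0.5):
    """Illustrative fixed schedule: explicit per-epoch LRs, then geometric decay."""
    if epoch < len(lrs):
        # Use the explicitly configured LR for this epoch.
        return lrs[epoch]
    # Past the list: shrink the final LR once per extra epoch.
    return lrs[-1] * lr_shrink ** (epoch - len(lrs) + 1)
```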
- class fairseq.optim.lr_scheduler.inverse_square_root_schedule.InverseSquareRootSchedule(args, optimizer)[source]¶

  Decay the LR based on the inverse square root of the update number.

  We also support a warmup phase where we linearly increase the learning rate from some initial learning rate (--warmup-init-lr) until the configured learning rate (--lr). Thereafter we decay proportional to the number of updates, with a decay factor set to align with the configured learning rate.

  During warmup:

      lrs = torch.linspace(args.warmup_init_lr, args.lr, args.warmup_updates)
      lr = lrs[update_num]

  After warmup:

      decay_factor = args.lr * sqrt(args.warmup_updates)
      lr = decay_factor / sqrt(update_num)
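The two formulas above translate directly into a plain function. Parameter names and defaults are assumptions for illustration; note how the decay factor is chosen so the schedule is continuous at the end of warmup:

```python
import math


def inverse_sqrt_lr(update_num, *, lr=5e-4, warmup_init_lr=1e-7,
                    warmup_updates=4000):
    """Illustrative inverse-square-root schedule with linear warmup."""
    if update_num < warmup_updates:
        # Linear warmup from warmup_init_lr up to lr.
        step = (lr - warmup_init_lr) / warmup_updates
        return warmup_init_lr + step * update_num
    # decay_factor is picked so that the LR equals `lr` exactly
    # at update_num == warmup_updates, then decays as 1/sqrt(t).
    decay_factor = lr * math.sqrt(warmup_updates)
    return decay_factor / math.sqrt(update_num)
```

Quadrupling the update count halves the learning rate, e.g. the LR at update 16000 is half the LR at update 4000 with the defaults above.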
- class fairseq.optim.lr_scheduler.reduce_lr_on_plateau.ReduceLROnPlateau(args, optimizer)[source]¶

  Decay the LR by a factor every time the validation loss plateaus. Also comes with an optional warmup phase, where we linearly increase the learning rate from some initial learning rate (--warmup-init-lr) until the configured learning rate (--lr). Thereafter the LR is adjusted according to the original reduce_on_plateau scheme.

  During warmup:

      lrs = torch.linspace(args.warmup_init_lr, args.lr, args.warmup_updates)
      lr = lrs[update_num]
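The plateau logic after warmup can be sketched as a function called once per validation pass. The names (patience, lr_shrink, threshold) and defaults are assumptions chosen to mirror the common reduce-on-plateau pattern, not the class's actual arguments:

```python
def plateau_step(lr, val_loss, state, *, patience=1, lr_shrink=0.1,
                 threshold=1e-4):
    """Illustrative reduce-on-plateau step: shrink the LR when the
    validation loss stops improving for more than `patience` checks."""
    best = state.get("best")
    if best is None or val_loss < best - threshold:
        # Loss improved meaningfully: record it and reset the counter.
        state["best"] = val_loss
        state["bad_epochs"] = 0
    else:
        state["bad_epochs"] = state.get("bad_epochs", 0) + 1
        if state["bad_epochs"] > patience:
            # Plateau detected: decay the LR and start counting again.
            lr *= lr_shrink
            state["bad_epochs"] = 0
    return lr
```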
- class fairseq.optim.lr_scheduler.triangular_lr_scheduler.TriangularSchedule(args, optimizer)[source]¶

  Assign LR based on a triangular cyclical schedule. See https://arxiv.org/pdf/1506.01186.pdf for details.
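A triangular cyclical schedule, in the sense of the cited paper, rises linearly from a minimum to a maximum LR over half a period and falls back over the other half. The following is a sketch under assumed parameter names and defaults, not the scheduler's actual interface:

```python
def triangular_lr(update_num, *, min_lr=1e-5, max_lr=1e-3, period=1000):
    """Illustrative triangular cyclical schedule: linear up, linear down."""
    half = period / 2
    cycle_pos = update_num % period
    # 0 at the cycle boundaries, 1 at the midpoint of the cycle.
    frac = 1 - abs(cycle_pos / half - 1)
    return min_lr + (max_lr - min_lr) * frac
```

The LR starts at min_lr, peaks at max_lr halfway through each period, and returns to min_lr at the period boundary.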