Optimizers

Optimizers update the Model parameters based on the gradients.

fairseq.optim.register_optimizer(name)[source]

Decorator to register a new optimizer.
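
For example, a new optimizer might be registered as follows. This is a minimal sketch assuming the usual pattern of wrapping a torch.optim optimizer and storing it as self._optimizer; the name 'my_sgd', the --momentum flag, and the choice of torch.optim.SGD are illustrative, not part of the API described here:

    import torch.optim

    from fairseq.optim import FairseqOptimizer, register_optimizer


    @register_optimizer('my_sgd')  # selectable on the command line via --optimizer my_sgd
    class MySGD(FairseqOptimizer):

        def __init__(self, args, params):
            super().__init__(args, params)
            # Wrap a plain torch.optim optimizer; FairseqOptimizer delegates to it.
            self._optimizer = torch.optim.SGD(params, **self.optimizer_config)

        @staticmethod
        def add_args(parser):
            # Optimizer-specific command-line flags (names are illustrative).
            parser.add_argument('--momentum', default=0.0, type=float, metavar='M',
                                help='momentum factor')

        @property
        def optimizer_config(self):
            # Keyword arguments for the wrapped torch.optim.SGD constructor.
            # args.lr is assumed to be a list of learning rates (as produced by --lr).
            return {
                'lr': self.args.lr[0],
                'momentum': self.args.momentum,
            }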

class fairseq.optim.FairseqOptimizer(args, params)[source]
static add_args(parser)[source]

Add optimizer-specific arguments to the parser.

backward(loss)[source]

Computes the sum of gradients of the given tensor w.r.t. graph leaves.

clip_grad_norm(max_norm)[source]

Clips gradient norm.

get_lr()[source]

Return the current learning rate.

load_state_dict(state_dict, optimizer_overrides=None)[source]

Load an optimizer state dict.

In general we should prefer the configuration of the existing optimizer instance (e.g., learning rate) over that found in the state_dict. This allows us to resume training from a checkpoint using a new set of optimizer args.
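
As a hedged illustration, resuming with a new learning rate might look like the sketch below; optimizer is assumed to be an already-built FairseqOptimizer, and the checkpoint path and the 'last_optimizer_state' key are assumptions about how the state was saved:

    import torch

    # Hypothetical resume: reuse the saved optimizer state, but force a new
    # learning rate instead of the one stored in the checkpoint.
    state = torch.load('checkpoints/checkpoint_last.pt')   # illustrative path
    optimizer.load_state_dict(
        state['last_optimizer_state'],                      # assumed key name
        optimizer_overrides={'lr': 0.0005},                 # wins over the checkpoint value
    )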

multiply_grads(c)[source]

Multiplies grads by a constant c.

optimizer

Return a torch.optim.optimizer.Optimizer instance.

optimizer_config

Return a kwarg dictionary that will be used to override optimizer args stored in checkpoints. This allows us to load a checkpoint and resume training using a different set of optimizer args, e.g., with a different learning rate.

set_lr(lr)[source]

Set the learning rate.

state_dict()[source]

Return the optimizer’s state dict.
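
Mirroring the resume sketch above, the returned dict might be saved alongside the model; the file path, the 'last_optimizer_state' key, and the model variable are illustrative, not mandated by this API:

    import torch

    # Illustrative checkpointing: persist model and optimizer state together.
    torch.save({
        'model': model.state_dict(),
        'last_optimizer_state': optimizer.state_dict(),
    }, 'checkpoints/checkpoint_last.pt')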

step(closure=None)[source]

Performs a single optimization step.

zero_grad()[source]

Clears the gradients of all optimized parameters.
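
Taken together, these methods might be driven by a training step roughly like the sketch below; compute_loss, model, sample, sample_size, args.clip_norm, and optimizer are placeholders from the surrounding training code, not part of this API:

    # Hypothetical single training step using the FairseqOptimizer interface.
    optimizer.zero_grad()
    loss = compute_loss(model, sample)                    # placeholder: any scalar training loss
    optimizer.backward(loss)                              # accumulate gradients
    optimizer.multiply_grads(1.0 / max(sample_size, 1))   # e.g. normalize by the batch size
    grad_norm = optimizer.clip_grad_norm(args.clip_norm)  # assumed to return the total norm
    optimizer.step()                                      # apply the parameter update
    print('lr={:.6f}, gnorm={:.3f}'.format(optimizer.get_lr(), grad_norm))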

class fairseq.optim.adagrad.Adagrad(args, params)[source]
optimizer_config

Return a kwarg dictionary that will be used to override optimizer args stored in checkpoints. This allows us to load a checkpoint and resume training using a different set of optimizer args, e.g., with a different learning rate.

class fairseq.optim.adam.FairseqAdam(args, params)[source]
static add_args(parser)[source]

Add optimizer-specific arguments to the parser.

optimizer_config

Return a kwarg dictionary that will be used to override optimizer args stored in checkpoints. This allows us to load a checkpoint and resume training using a different set of optimizer args, e.g., with a different learning rate.

class fairseq.optim.fp16_optimizer.FP16Optimizer(args, params, fp32_optimizer, fp32_params)[source]

Wrap an optimizer to support FP16 (mixed precision) training.

backward(loss)[source]

Computes the sum of gradients of the given tensor w.r.t. graph leaves.

Compared to fairseq.optim.FairseqOptimizer.backward(), this function additionally scales the loss dynamically to avoid gradient underflow.
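
The underlying idea can be sketched conceptually as follows; this is not fairseq's implementation, and the initial scale of 128 and the params iterable are invented for the example:

    # Conceptual sketch of loss scaling for FP16 training.
    scale = 128.0                      # illustrative; fairseq adjusts its scale dynamically
    (loss * scale).backward()          # scaling keeps small FP16 gradients from underflowing
    for p in params:
        if p.grad is not None:
            p.grad.data.div_(scale)    # unscale before clipping and the parameter update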

classmethod build_optimizer(args, params)[source]

Parameters:
  • args (argparse.Namespace) – fairseq args
  • params (iterable) – iterable of parameters to optimize
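
A hedged usage sketch: collect the trainable parameters of an existing model and build the wrapped optimizer; model, the args.fp16 flag, and the fairseq.optim.build_optimizer fallback are assumptions about the surrounding training setup:

    import fairseq.optim
    from fairseq.optim.fp16_optimizer import FP16Optimizer

    params = [p for p in model.parameters() if p.requires_grad]

    if getattr(args, 'fp16', False):
        # Wraps an underlying FP32 optimizer and maintains an FP32 copy of the params.
        optimizer = FP16Optimizer.build_optimizer(args, params)
    else:
        optimizer = fairseq.optim.build_optimizer(args, params)
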
clip_grad_norm(max_norm)[source]

Clips gradient norm and updates dynamic loss scaler.

get_lr()[source]

Return the current learning rate.

load_state_dict(state_dict, optimizer_overrides=None)[source]

Load an optimizer state dict.

In general we should prefer the configuration of the existing optimizer instance (e.g., learning rate) over that found in the state_dict. This allows us to resume training from a checkpoint using a new set of optimizer args.

multiply_grads(c)[source]

Multiplies grads by a constant c.

optimizer

Return a torch.optim.optimizer.Optimizer instance.

optimizer_config

Return a kwarg dictionary that will be used to override optimizer args stored in checkpoints. This allows us to load a checkpoint and resume training using a different set of optimizer args, e.g., with a different learning rate.

set_lr(lr)[source]

Set the learning rate.

state_dict()[source]

Return the optimizer’s state dict.

step(closure=None)[source]

Performs a single optimization step.

zero_grad()[source]

Clears the gradients of all optimized parameters.

class fairseq.optim.nag.FairseqNAG(args, params)[source]
optimizer_config

Return a kwarg dictionary that will be used to override optimizer args stored in checkpoints. This allows us to load a checkpoint and resume training using a different set of optimizer args, e.g., with a different learning rate.

class fairseq.optim.sgd.SGD(args, params)[source]
optimizer_config

Return a kwarg dictionary that will be used to override optimizer args stored in checkpoints. This allows us to load a checkpoint and resume training using a different set of optimizer args, e.g., with a different learning rate.