Criterions

Criterions compute the loss function given the model and batch, roughly:

loss = criterion(model, batch)
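To make the contract concrete, here is a minimal standalone sketch of how a training loop consumes a criterion's output. `toy_criterion`, `model`, and the `sample` layout are hypothetical stand-ins, not fairseq classes; they only mimic the `(loss, sample_size, logging_output)` return described below.

```python
# Hypothetical stand-in mimicking the criterion contract:
# returns (loss, sample_size, logging_output).
def toy_criterion(model, sample):
    loss = sum(model(x) for x in sample["inputs"])   # scalar loss
    sample_size = len(sample["inputs"])              # gradient denominator
    logging_output = {"loss": loss, "sample_size": sample_size}
    return loss, sample_size, logging_output

model = lambda x: (x - 1.0) ** 2                     # toy "model"
sample = {"inputs": [0.0, 2.0, 3.0]}

loss, sample_size, log_out = toy_criterion(model, sample)
per_sample_loss = loss / sample_size                 # normalized loss
```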


class fairseq.criterions.FairseqCriterion(task)[source]
classmethod add_args(parser)[source]

Add criterion-specific arguments to the parser.

static aggregate_logging_outputs(logging_outputs: List[Dict[str, Any]]) → Dict[str, Any][source]

Aggregate logging outputs from data parallel training.
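Aggregation typically amounts to summing matching keys across the per-worker dicts. The helper below is a hypothetical illustration of that pattern, not fairseq's implementation:

```python
from collections import defaultdict

# Hypothetical aggregation: sum numeric values under matching keys
# across the logging outputs returned by each data-parallel worker.
def aggregate(logging_outputs):
    totals = defaultdict(float)
    for log in logging_outputs:
        for key, value in log.items():
            totals[key] += value
    return dict(totals)

worker_logs = [
    {"loss": 2.0, "ntokens": 10},
    {"loss": 3.0, "ntokens": 14},
]
agg = aggregate(worker_logs)
```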

classmethod build_criterion(cfg: omegaconf.dictconfig.DictConfig, task)[source]

Construct a criterion from command-line args.

forward(model, sample, reduce=True)[source]

Compute the loss for the given sample.

Returns a tuple with three elements: 1) the loss, 2) the sample size, which is used as the denominator for the gradient, and 3) the logging outputs to display while training.

static logging_outputs_can_be_summed() → bool[source]

Whether the logging outputs returned by forward can be summed across workers prior to calling reduce_metrics. Setting this to True will improve distributed training speed.

classmethod reduce_metrics(logging_outputs: List[Dict[str, Any]]) → None[source]

Aggregate logging outputs from data parallel training.
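The interface above can be sketched as a small subclass. To keep the example runnable without fairseq or torch installed, `CriterionBase` below is a plain-Python stand-in for FairseqCriterion, and `L1Criterion` is a hypothetical criterion; only the method names and the return contract follow the documentation above.

```python
# Plain-Python stand-in for FairseqCriterion, for illustration only.
class CriterionBase:
    def __init__(self, task):
        self.task = task

class L1Criterion(CriterionBase):  # hypothetical criterion
    def forward(self, model, sample, reduce=True):
        losses = [abs(model(x) - y) for x, y in sample["pairs"]]
        loss = sum(losses) if reduce else losses
        sample_size = len(sample["pairs"])
        logging_output = {"loss": loss, "sample_size": sample_size}
        return loss, sample_size, logging_output

    @staticmethod
    def logging_outputs_can_be_summed():
        return True  # plain sums are safe to add across workers

crit = L1Criterion(task=None)
model = lambda x: 2 * x
loss, n, log_out = crit.forward(model, {"pairs": [(1.0, 1.0), (2.0, 3.0)]})
```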

class fairseq.criterions.adaptive_loss.AdaptiveLoss(task, sentence_avg)[source]

This is an implementation of the loss function accompanying the adaptive softmax approximation for graphics processing units (GPUs), described in the paper “Efficient softmax approximation for GPUs” (http://arxiv.org/abs/1609.04309).

classmethod build_criterion(cfg: fairseq.criterions.adaptive_loss.AdaptiveLossConfig, task)[source]

Construct a criterion from command-line args.

forward(model, sample, reduce=True)[source]

Compute the loss for the given sample.

Returns a tuple with three elements: 1) the loss, 2) the sample size, which is used as the denominator for the gradient, and 3) the logging outputs to display while training.

static logging_outputs_can_be_summed() → bool[source]

Whether the logging outputs returned by forward can be summed across workers prior to calling reduce_metrics. Setting this to True will improve distributed training speed.

static reduce_metrics(logging_outputs) → None[source]

Aggregate logging outputs from data parallel training.

class fairseq.criterions.composite_loss.CompositeLoss(args, task)[source]

This is a composite loss that, given a list of model outputs and a list of targets, computes the average of the losses over all output-target pairs.
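The averaging idea can be sketched numerically. `squared_error` below is a hypothetical stand-in for the underlying criterion (in fairseq it is built by build_underlying_criterion), and the function operates on plain floats rather than tensors:

```python
# Hypothetical underlying criterion, standing in for the one built by
# build_underlying_criterion.
def squared_error(output, target):
    return (output - target) ** 2

# Composite loss: average the underlying loss over each
# (output, target) pair.
def composite_loss(outputs, targets, underlying=squared_error):
    losses = [underlying(o, t) for o, t in zip(outputs, targets)]
    return sum(losses) / len(losses)

avg = composite_loss([1.0, 3.0], [0.0, 1.0])
```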

static add_args(parser)[source]

Add criterion-specific arguments to the parser.

classmethod build_criterion(args, task)[source]

Construct a criterion from command-line args.

static build_underlying_criterion(args, task)[source]
class fairseq.criterions.cross_entropy.CrossEntropyCriterion(task, sentence_avg)[source]
compute_loss(model, net_output, sample, reduce=True)[source]
forward(model, sample, reduce=True)[source]

Compute the loss for the given sample.

Returns a tuple with three elements: 1) the loss, 2) the sample size, which is used as the denominator for the gradient, and 3) the logging outputs to display while training.

static logging_outputs_can_be_summed() → bool[source]

Whether the logging outputs returned by forward can be summed across workers prior to calling reduce_metrics. Setting this to True will improve distributed training speed.

static reduce_metrics(logging_outputs) → None[source]

Aggregate logging outputs from data parallel training.
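The quantity this criterion computes per token is the negative log-likelihood of the target class under a softmax over the logits. The sketch below uses plain floats for clarity; fairseq's implementation operates on batched tensors.

```python
import math

# Token-level cross entropy: -log softmax(logits)[target].
def cross_entropy(logits, target):
    log_z = math.log(sum(math.exp(l) for l in logits))  # log partition
    return log_z - logits[target]

# Equal logits over 2 classes -> uniform distribution -> loss = log 2.
loss = cross_entropy([2.0, 2.0], 0)
```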

class fairseq.criterions.label_smoothed_cross_entropy.LabelSmoothedCrossEntropyCriterion(task, sentence_avg, label_smoothing, ignore_prefix_size=0, report_accuracy=False)[source]
static add_args(parser)[source]

Add criterion-specific arguments to the parser.

compute_accuracy(model, net_output, sample)[source]
compute_loss(model, net_output, sample, reduce=True)[source]
forward(model, sample, reduce=True)[source]

Compute the loss for the given sample.

Returns a tuple with three elements: 1) the loss, 2) the sample size, which is used as the denominator for the gradient, and 3) the logging outputs to display while training.

get_lprobs_and_target(model, net_output, sample)[source]
static logging_outputs_can_be_summed() → bool[source]

Whether the logging outputs returned by forward can be summed across workers prior to calling reduce_metrics. Setting this to True will improve distributed training speed.

classmethod reduce_metrics(logging_outputs) → None[source]

Aggregate logging outputs from data parallel training.
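One common formulation of label smoothing mixes the negative log-likelihood of the target with the mean negative log-likelihood over all classes, weighted by the smoothing factor. The numeric sketch below illustrates that formulation on plain floats; it is not fairseq's tensor implementation, and variants differ in how the smoothing mass is normalized.

```python
import math

def log_softmax(logits):
    log_z = math.log(sum(math.exp(l) for l in logits))
    return [l - log_z for l in logits]

# Label-smoothed NLL: (1 - eps) * nll(target) + eps * mean nll over
# all classes. One common formulation; normalization details vary.
def label_smoothed_nll(logits, target, eps=0.1):
    lprobs = log_softmax(logits)
    nll = -lprobs[target]
    smooth = -sum(lprobs) / len(lprobs)
    return (1.0 - eps) * nll + eps * smooth

# Equal logits over 3 classes -> both terms equal log 3 -> loss = log 3.
loss = label_smoothed_nll([0.0, 0.0, 0.0], 0, eps=0.1)
```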