Criterions
Criterions compute the loss for a given model and batch, roughly:
loss = criterion(model, batch)
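The calling convention above can be illustrated with a minimal, framework-free sketch. This is not fairseq code. `ToyModel` and `ToyCriterion` are hypothetical, and the sample keys `"net_input"`, `"target"`, and `"ntokens"` are assumptions chosen for illustration. The criterion is called with a model and a batch and returns the loss, the sample size, and a logging dictionary, mirroring the `forward` contract documented below.

```python
import math
from typing import Any, Dict, List, Tuple


class ToyModel:
    """Hypothetical model: one uniform distribution over a 4-word vocabulary per input position."""

    def __call__(self, tokens: List[int]) -> List[List[float]]:
        return [[0.25, 0.25, 0.25, 0.25] for _ in tokens]


class ToyCriterion:
    """Hypothetical criterion following the loss = criterion(model, batch) convention."""

    def forward(self, model, sample: Dict[str, Any], reduce: bool = True) -> Tuple[Any, int, Dict[str, Any]]:
        probs = model(sample["net_input"])
        # Negative log-likelihood of each target token.
        losses = [-math.log(p[t]) for p, t in zip(probs, sample["target"])]
        loss = sum(losses) if reduce else losses
        sample_size = sample["ntokens"]  # assumption: normalize per target token
        logging_output = {"loss": sum(losses), "ntokens": sample["ntokens"], "sample_size": sample_size}
        return loss, sample_size, logging_output

    # Make criterion(model, sample) dispatch to forward().
    __call__ = forward


criterion = ToyCriterion()
batch = {"net_input": [0, 1, 2], "target": [1, 2, 3], "ntokens": 3}
loss, sample_size, logging_output = criterion(ToyModel(), batch)
```

With a uniform model, each of the three target tokens contributes ln 4 nats, so the summed loss is 3 ln 4.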

class fairseq.criterions.FairseqCriterion(task)

static aggregate_logging_outputs(logging_outputs: List[Dict[str, Any]]) → Dict[str, Any]
Aggregate logging outputs from data parallel training.
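`aggregate_logging_outputs` combines the per-worker logging dictionaries produced by `forward`. The sketch below shows one plausible shape for such a reduction, assuming each dictionary carries hypothetical `"loss"`, `"ntokens"`, and `"sample_size"` keys. It sums the counters across workers and reports the loss per unit of sample size in base 2. The key names and the base-2 reporting are illustrative assumptions, not a description of any specific criterion.

```python
import math
from typing import Any, Dict, List


def aggregate_logging_outputs(logging_outputs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical reduction: sum counters across workers, then normalize once."""
    loss_sum = sum(log.get("loss", 0.0) for log in logging_outputs)
    ntokens = sum(log.get("ntokens", 0) for log in logging_outputs)
    sample_size = sum(log.get("sample_size", 0) for log in logging_outputs)
    return {
        # Loss per unit of sample size, converted from nats to bits.
        "loss": loss_sum / sample_size / math.log(2) if sample_size else 0.0,
        "ntokens": ntokens,
        "sample_size": sample_size,
    }
```

For example, two workers reporting 2 and 3 tokens at a uniform loss of ln 4 nats per token aggregate to 2.0 bits per token.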

forward(model, sample, reduce=True)
Compute the loss for the given sample.
Returns a tuple with three elements: (1) the loss, (2) the sample size, which is used as the denominator for the gradient, and (3) logging outputs to display during training.
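A plausible reason `forward` returns the sample size separately, rather than an already-averaged loss, is that under data-parallel training the unnormalized losses can be summed across workers and divided once by the total sample size. The sketch below uses hypothetical per-worker `(summed_loss, sample_size)` pairs to compare that with averaging each worker's mean. It illustrates only the arithmetic and is not fairseq's trainer.

```python
from typing import List, Tuple

# Hypothetical (summed_loss, sample_size) pairs returned by forward() on two workers.
worker_outputs: List[Tuple[float, int]] = [(6.0, 2), (4.0, 8)]


def normalize_by_total(outputs: List[Tuple[float, int]]) -> float:
    """Sum the unnormalized losses, then divide once by the total sample size."""
    total_loss = sum(loss for loss, _ in outputs)
    total_size = sum(size for _, size in outputs)
    return total_loss / total_size


def mean_of_means(outputs: List[Tuple[float, int]]) -> float:
    """Average each worker's per-unit loss; this over-weights small batches."""
    return sum(loss / size for loss, size in outputs) / len(outputs)


print(normalize_by_total(worker_outputs))  # 1.0
print(mean_of_means(worker_outputs))       # 1.75
```

The two results differ because the worker with 2 units counts as much as the worker with 8 in the mean of means.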


class fairseq.criterions.adaptive_loss.AdaptiveLoss(task, sentence_avg)
This is an implementation of the loss function accompanying the adaptive softmax approximation for graphics processing units (GPUs), described in the paper “Efficient softmax approximation for GPUs” (http://arxiv.org/abs/1609.04309).

forward(model, sample, reduce=True)
Compute the loss for the given sample.
Returns a tuple with three elements: (1) the loss, (2) the sample size, which is used as the denominator for the gradient, and (3) logging outputs to display during training.
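The core idea of the adaptive softmax in the cited paper is a factorization. Frequent words are scored by a small "head" softmax that also contains one entry per cluster of rare words, and a rare word's probability is the product of its cluster's head probability and its probability within that cluster. The pure-Python sketch below computes the negative log-likelihood of one target under a two-level version of that factorization. The names `adaptive_nll`, `head_logits`, `tail_logits`, and `cutoffs` are hypothetical. This is not fairseq's `AdaptiveLoss` implementation, which works on batched tensors, and it omits the reduced-dimension projections the paper uses for tail clusters.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a list of scores."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def adaptive_nll(head_logits: Sequence[float], tail_logits: Sequence[Sequence[float]],
                 cutoffs: Sequence[int], target: int) -> float:
    """NLL of `target` under a two-level adaptive-softmax factorization.

    head_logits: scores for shortlist words [0, cutoffs[0]) followed by one
        score per tail cluster.
    tail_logits: tail_logits[i] scores words in [cutoffs[i], cutoffs[i + 1]).
    """
    head_lp = log_softmax(head_logits)
    if target < cutoffs[0]:
        # Frequent word: scored directly by the head.
        return -head_lp[target]
    for i in range(len(cutoffs) - 1):
        lo, hi = cutoffs[i], cutoffs[i + 1]
        if lo <= target < hi:
            # Rare word: log p(cluster) + log p(word | cluster).
            cluster_lp = head_lp[cutoffs[0] + i]
            word_lp = log_softmax(tail_logits[i])[target - lo]
            return -(cluster_lp + word_lp)
    raise ValueError("target outside vocabulary")
```

With `cutoffs=[2, 4]`, all-zero head scores, and all-zero tail scores, words 0 and 1 each get probability 1/3 and words 2 and 3 each get 1/3 × 1/2 = 1/6, so the probabilities still sum to 1.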


class fairseq.criterions.composite_loss.CompositeLoss(task, underlying_criterion)
This is a composite loss that, given a list of model outputs and a list of targets, computes the average of the losses over all output-target pairs.
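The averaging described above can be sketched in plain Python. `composite_loss` is a hypothetical helper, not fairseq's class, and `squared_error` stands in for whatever the `underlying_criterion` computes. The point is only that one loss is computed per output-target pair and the results are averaged.

```python
from typing import Callable, List, Sequence


def composite_loss(outputs: Sequence[float], targets: Sequence[float],
                   underlying_criterion: Callable[[float, float], float]) -> float:
    """Average underlying_criterion(output, target) over each output-target pair."""
    if len(outputs) != len(targets):
        raise ValueError("need exactly one target per model output")
    losses: List[float] = [underlying_criterion(o, t) for o, t in zip(outputs, targets)]
    return sum(losses) / len(losses)


def squared_error(output: float, target: float) -> float:
    """Hypothetical stand-in for the underlying criterion."""
    return (output - target) ** 2


# Per-pair losses are 0, 4 and 9, so the composite loss is 13 / 3.
result = composite_loss([1.0, 2.0, 4.0], [1.0, 0.0, 1.0], squared_error)
```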

class fairseq.criterions.cross_entropy.CrossEntropyCriterion(task, sentence_avg)

forward(model, sample, reduce=True)
Compute the loss for the given sample.
Returns a tuple with three elements: (1) the loss, (2) the sample size, which is used as the denominator for the gradient, and (3) logging outputs to display during training.
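The `sentence_avg` argument suggests a choice of denominator. The sketch below assumes, as an illustration, that it selects whether the sample size is the number of sentences or the number of target tokens, while the loss itself is the summed token-level negative log-likelihood. `cross_entropy_forward` and its nested-list inputs are hypothetical, padding is ignored, and the source linked from the signature is the authority on the actual behavior.

```python
import math
from typing import Dict, List, Tuple


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a list of scores."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def cross_entropy_forward(logits: List[List[List[float]]], targets: List[List[int]],
                          sentence_avg: bool) -> Tuple[float, int, Dict[str, float]]:
    """Summed token-level NLL plus a sample size selected by `sentence_avg`.

    logits[s][t] holds the vocabulary scores for token t of sentence s.
    """
    loss = 0.0
    ntokens = 0
    for sent_logits, sent_targets in zip(logits, targets):
        for scores, tgt in zip(sent_logits, sent_targets):
            loss -= log_softmax(scores)[tgt]
            ntokens += 1
    nsentences = len(targets)
    # Assumption: normalize per sentence or per token depending on sentence_avg.
    sample_size = nsentences if sentence_avg else ntokens
    logging_output = {"loss": loss, "ntokens": ntokens,
                      "nsentences": nsentences, "sample_size": sample_size}
    return loss, sample_size, logging_output
```

For two sentences of lengths 2 and 1 with uniform scores over two words, the summed loss is 3 ln 2 either way; only the sample size changes (3 tokens vs. 2 sentences).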


class fairseq.criterions.label_smoothed_cross_entropy.LabelSmoothedCrossEntropyCriterion(task, sentence_avg, label_smoothing)

forward(model, sample, reduce=True)
Compute the loss for the given sample.
Returns a tuple with three elements: (1) the loss, (2) the sample size, which is used as the denominator for the gradient, and (3) logging outputs to display during training.
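The `label_smoothing` argument is presumably the smoothing weight epsilon. The sketch below uses the standard uniform label-smoothing formulation, which mixes the one-hot target with a uniform distribution over the V vocabulary entries: loss = (1 - epsilon) * NLL + (epsilon / V) * sum_k(-log p_k). This is an assumption for illustration; fairseq's exact constants (for example, how epsilon is spread over the vocabulary) may differ. `label_smoothed_nll` is a hypothetical helper that also returns the plain NLL, which is useful for logging alongside the smoothed loss.

```python
import math
from typing import List, Tuple


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a list of scores."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def label_smoothed_nll(scores: List[float], target: int, epsilon: float) -> Tuple[float, float]:
    """Return (smoothed_loss, nll_loss) for a single token.

    Assumption: uniform smoothing, (1 - eps) * nll + (eps / V) * sum_k -log p_k.
    """
    lprobs = log_softmax(scores)
    nll = -lprobs[target]
    smooth = -sum(lprobs)  # cross-entropy against an unnormalized uniform target
    vocab = len(lprobs)
    loss = (1.0 - epsilon) * nll + (epsilon / vocab) * smooth
    return loss, nll
```

With epsilon = 0 this reduces to ordinary cross-entropy. With epsilon > 0, a very confident correct prediction is penalized relative to its NLL, which is the regularizing effect label smoothing is meant to have.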
