Criterions

Criterions compute the loss function given the model and batch, roughly:

loss = criterion(model, batch)
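
For orientation, a minimal custom criterion might look roughly like the sketch below. The criterion name, the sample keys (net_input, target, ntokens), and the availability of self.padding_idx are illustrative assumptions based on common fairseq usage, not guarantees made by this reference.

import torch.nn.functional as F

from fairseq.criterions import FairseqCriterion, register_criterion


# Minimal sketch of a custom criterion. The name "my_cross_entropy" and the
# sample layout (sample["net_input"], sample["target"], sample["ntokens"])
# are illustrative assumptions.
@register_criterion("my_cross_entropy")
class MyCrossEntropyCriterion(FairseqCriterion):

    def forward(self, model, sample, reduce=True):
        # Run the model on the batch and turn its output into log-probabilities.
        net_output = model(**sample["net_input"])
        lprobs = model.get_normalized_probs(net_output, log_probs=True)
        lprobs = lprobs.view(-1, lprobs.size(-1))
        target = model.get_targets(sample, net_output).view(-1)

        # Negative log-likelihood, summed over tokens (or left unreduced).
        # self.padding_idx is assumed to be set by FairseqCriterion from the
        # task's target dictionary.
        loss = F.nll_loss(
            lprobs,
            target,
            ignore_index=self.padding_idx,
            reduction="sum" if reduce else "none",
        )

        # The sample size is the denominator for the gradient; here, tokens.
        sample_size = sample["ntokens"]
        logging_output = {
            "loss": loss.data,
            "ntokens": sample["ntokens"],
            "sample_size": sample_size,
        }
        return loss, sample_size, logging_output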

class fairseq.criterions.FairseqCriterion(task)[source]
classmethod add_args(parser)[source]

Add criterion-specific arguments to the parser.

static aggregate_logging_outputs(logging_outputs: List[Dict[str, Any]]) → Dict[str, Any][source]

Aggregate logging outputs from data parallel training.

classmethod build_criterion(args, task)[source]

Construct a criterion from command-line args.

forward(model, sample, reduce=True)[source]

Compute the loss for the given sample.

Returns a tuple with three elements: 1) the loss; 2) the sample size, which is used as the denominator for the gradient; 3) logging outputs to display while training.

static logging_outputs_can_be_summed() → bool[source]

Whether the logging outputs returned by forward can be summed across workers prior to calling reduce_metrics. Setting this to True will improve distributed training speed.

classmethod reduce_metrics(logging_outputs: List[Dict[str, Any]]) → None[source]

Aggregate logging outputs from data parallel training.
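
As a rough illustration of how logging outputs from data parallel workers are aggregated, the sketch below sums the per-worker dictionaries and derives an average loss; the key names and the base-2 conversion are assumptions for illustration, not the actual fairseq implementation.

import math


# Sketch of aggregating per-worker logging outputs. Because the entries are
# plain sums, the dictionaries can be summed across workers before this runs,
# which is what logging_outputs_can_be_summed() == True enables.
def reduce_metrics_sketch(logging_outputs):
    loss_sum = sum(log.get("loss", 0) for log in logging_outputs)
    sample_size = sum(log.get("sample_size", 0) for log in logging_outputs)
    # Average loss per sample-size unit, expressed in base 2 (a common
    # convention in training logs); key names are assumptions.
    avg_loss = loss_sum / sample_size / math.log(2) if sample_size > 0 else 0.0
    return {"loss": avg_loss, "sample_size": sample_size}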

class fairseq.criterions.adaptive_loss.AdaptiveLoss(task, sentence_avg)[source]

This is an implementation of the loss function accompanying the adaptive softmax approximation for graphics processing units (GPUs), described in the paper “Efficient softmax approximation for GPUs” (http://arxiv.org/abs/1609.04309).
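
The approximation itself can be illustrated with PyTorch's built-in torch.nn.AdaptiveLogSoftmaxWithLoss; this is only a sketch of the idea from the paper, not the fairseq AdaptiveLoss implementation, and the dimensions and cutoffs below are arbitrary.

import torch
import torch.nn as nn

# Frequent words live in the head of the softmax; rarer words fall into
# progressively cheaper tail clusters defined by the cutoffs, which is what
# makes the computation fast on GPUs.
hidden_dim, vocab_size = 512, 50000
adaptive_softmax = nn.AdaptiveLogSoftmaxWithLoss(
    in_features=hidden_dim,
    n_classes=vocab_size,
    cutoffs=[2000, 10000],  # cluster boundaries; arbitrary example values
)

hidden = torch.randn(32, hidden_dim)            # decoder output features
target = torch.randint(0, vocab_size, (32,))    # gold next-word indices
_, loss = adaptive_softmax(hidden, target)      # loss is the mean NLL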

classmethod build_criterion(args, task)[source]

Construct a criterion from command-line args.

forward(model, sample, reduce=True)[source]

Compute the loss for the given sample.

Returns a tuple with three elements: 1) the loss; 2) the sample size, which is used as the denominator for the gradient; 3) logging outputs to display while training.

static logging_outputs_can_be_summed() → bool[source]

Whether the logging outputs returned by forward can be summed across workers prior to calling reduce_metrics. Setting this to True will improve distributed training speed.

static reduce_metrics(logging_outputs) → None[source]

Aggregate logging outputs from data parallel training.

class fairseq.criterions.composite_loss.CompositeLoss(args, task)[source]

This is a composite loss that, given a list of model outputs and a list of targets, computes an average of the losses for each output-target pair.
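
In sketch form, that averaging might look like the snippet below; the function and argument names are illustrative assumptions rather than the actual CompositeLoss code.

# Average an underlying per-pair loss over matched outputs and targets.
def composite_loss_sketch(pair_loss, net_outputs, targets):
    losses = [
        pair_loss(output, target)
        for output, target in zip(net_outputs, targets)
    ]
    return sum(losses) / len(losses)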

static add_args(parser)[source]

Add criterion-specific arguments to the parser.

classmethod build_criterion(args, task)[source]

Construct a criterion from command-line args.

static build_underlying_criterion(args, task)[source]

class fairseq.criterions.cross_entropy.CrossEntropyCriterion(task, sentence_avg)[source]
compute_loss(model, net_output, sample, reduce=True)[source]
forward(model, sample, reduce=True)[source]

Compute the loss for the given sample.

Returns a tuple with three elements: 1) the loss; 2) the sample size, which is used as the denominator for the gradient; 3) logging outputs to display while training.

static logging_outputs_can_be_summed() → bool[source]

Whether the logging outputs returned by forward can be summed across workers prior to calling reduce_metrics. Setting this to True will improve distributed training speed.

static reduce_metrics(logging_outputs) → None[source]

Aggregate logging outputs from data parallel training.
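
The sentence_avg flag in the CrossEntropyCriterion constructor above determines the denominator used for the gradient; a minimal sketch, assuming the usual fairseq batch keys:

# Sketch only: with sentence_avg the loss is normalized per sentence,
# otherwise per token. The sample keys are assumptions.
sample_size = sample["target"].size(0) if sentence_avg else sample["ntokens"]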

class fairseq.criterions.label_smoothed_cross_entropy.LabelSmoothedCrossEntropyCriterion(task, sentence_avg, label_smoothing, ignore_prefix_size=0, report_accuracy=False)[source]
static add_args(parser)[source]

Add criterion-specific arguments to the parser.

compute_accuracy(model, net_output, sample)[source]
compute_loss(model, net_output, sample, reduce=True)[source]
forward(model, sample, reduce=True)[source]

Compute the loss for the given sample.

Returns a tuple with three elements: 1) the loss; 2) the sample size, which is used as the denominator for the gradient; 3) logging outputs to display while training.

get_lprobs_and_target(model, net_output, sample)[source]
static logging_outputs_can_be_summed() → bool[source]

Whether the logging outputs returned by forward can be summed across workers prior to calling reduce_metrics. Setting this to True will improve distributed training speed.

classmethod reduce_metrics(logging_outputs) → None[source]

Aggregate logging outputs from data parallel training.
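
For reference, the label-smoothed loss itself can be sketched as below; this follows the standard formulation (a mixture of the gold-token NLL and a uniform smoothing term over the vocabulary) and is an illustration, not the fairseq code.

import torch


def label_smoothed_nll_loss_sketch(lprobs, target, epsilon):
    # lprobs: (N, V) log-probabilities; target: (N,) gold token indices.
    nll_loss = -lprobs.gather(dim=-1, index=target.unsqueeze(-1)).squeeze(-1)
    smooth_loss = -lprobs.sum(dim=-1)  # uniform term over the vocabulary
    eps_i = epsilon / lprobs.size(-1)
    loss = (1.0 - epsilon) * nll_loss + eps_i * smooth_loss
    return loss.sum(), nll_loss.sum()


# Example: 8 positions over a 100-word vocabulary with 10% smoothing.
lprobs = torch.log_softmax(torch.randn(8, 100), dim=-1)
target = torch.randint(0, 100, (8,))
loss, nll = label_smoothed_nll_loss_sketch(lprobs, target, epsilon=0.1)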