sign_language_translator.text.metrics module

Text evaluation metrics.

class sign_language_translator.text.metrics.Perplexity(all_tokens: Set[Any], regularizing_constant=1.0)

Bases: object

A class for calculating the perplexity of sequences based on token frequencies in a corpus.

Perplexity measures how well a language model predicts a sequence. In the standard definition, it is the exponential of the average negative log-probability that the model assigns to the tokens of the sequence. A lower perplexity means the model assigns higher probabilities to those tokens, so the sequence is more likely to be generated by the model. A higher perplexity means the model assigns lower probabilities, so the sequence is less likely to be generated by the model.
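For reference, the standard definition of perplexity for a sequence of N tokens w_1 to w_N is given below. Because this class works from token frequencies, the right-hand formula assumes (the source does not state it) a unigram estimate in which each token's probability is its frequency f(w) divided by the total frequency over the vocabulary V given by all_tokens.

```latex
\mathrm{PPL}(w_1, \dots, w_N)
  = \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \log p(w_i) \right)
  = \left( \prod_{i=1}^{N} p(w_i) \right)^{-1/N},
\qquad
p(w) = \frac{f(w)}{\sum_{v \in V} f(v)}
```

For example, if every token has probability 1/|V|, the perplexity equals |V|, the size of the vocabulary.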

Parameters:
  • all_tokens (set) – A set of all tokens in the corpus.

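  • regularizing_constant (float, optional) – Initial non-zero frequency assigned to every token, so that tokens absent from the corpus still receive a non-zero probability. Defaults to 1.0.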
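Assuming regularizing_constant acts as an additive-smoothing pseudo-count (an interpretation of "initial non-zero frequency", not confirmed by this page), its effect on token probabilities can be sketched as below. The helper additive_smoothing is hypothetical and not part of the library.

```python
from typing import Any, Dict, Mapping, Set


def additive_smoothing(
    all_tokens: Set[Any],
    counts: Mapping[Any, int],
    regularizing_constant: float = 1.0,
) -> Dict[Any, float]:
    """Hypothetical helper: turn raw counts into smoothed unigram probabilities."""
    # Every token starts at the (positive) constant, then adds its observed count,
    # so a token that never occurs still ends up with a non-zero frequency.
    frequencies = {t: regularizing_constant + counts.get(t, 0) for t in all_tokens}
    total = sum(frequencies.values())
    # Normalise so the probabilities over the vocabulary sum to 1.
    return {t: f / total for t, f in frequencies.items()}


probs = additive_smoothing({"a", "b", "c"}, {"a": 2, "b": 1})
# Smoothed frequencies are a=3, b=2, c=1 (total 6), so "c" gets 1/6 despite never occurring.
print(probs["c"])
```

A larger constant pulls the distribution closer to uniform; a smaller one lets the observed counts dominate.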

evaluate(sequence: Iterable[Any]) → float

Calculate the perplexity of a given sequence.

Parameters:

sequence (iterable) – The sequence of tokens for which perplexity needs to be calculated.

Returns:

The perplexity value for the given sequence.

Return type:

float
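The computation behind evaluate can be sketched as follows, assuming (not confirmed by this page) that the class uses a unigram model with probabilities proportional to the stored token frequencies. The function unigram_perplexity and its frequencies argument are hypothetical and not part of the library.

```python
import math
from typing import Any, Iterable, Mapping


def unigram_perplexity(sequence: Iterable[Any], frequencies: Mapping[Any, float]) -> float:
    """Hypothetical helper: perplexity of `sequence` under a unigram model
    whose token probabilities are proportional to `frequencies`."""
    tokens = list(sequence)
    if not tokens:
        raise ValueError("perplexity is undefined for an empty sequence")
    total = sum(frequencies.values())
    # Average negative log-probability per token (natural log).
    # A token missing from `frequencies` raises KeyError in this sketch.
    avg_nll = -sum(math.log(frequencies[t] / total) for t in tokens) / len(tokens)
    # Exponentiating the average gives the perplexity.
    return math.exp(avg_nll)


frequencies = {"a": 3.0, "b": 2.0, "c": 1.0}  # probabilities 1/2, 1/3, 1/6
print(unigram_perplexity(["a", "a"], frequencies))  # ≈ 2.0: frequent tokens, low perplexity
print(unigram_perplexity(["c"], frequencies))  # ≈ 6.0: rare token, high perplexity
```

The choice of logarithm base does not matter as long as the matching exponential is used; base 2 with 2 ** x gives the same perplexity.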

update_frequencies(corpus: Iterable[Iterable[Any]])

Update the token frequencies based on the given corpus.

Parameters:

corpus (Iterable[Iterable[Any]]) – An iterable containing sequences of tokens.
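One plausible reading of update_frequencies is that it counts every token occurrence in the corpus and adds those counts to the stored frequencies. The sketch below shows that behaviour with a hypothetical FrequencyTable class; whether the real method accumulates across calls, and how it treats tokens outside all_tokens, are assumptions not stated on this page.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Set


class FrequencyTable:
    """Hypothetical helper sketching what update_frequencies might do."""

    def __init__(self, all_tokens: Set[Any], regularizing_constant: float = 1.0):
        # Every vocabulary token starts with the same non-zero frequency.
        self.frequencies: Dict[Any, float] = {
            token: float(regularizing_constant) for token in all_tokens
        }

    def update_frequencies(self, corpus: Iterable[Iterable[Any]]) -> None:
        # Count every token occurrence across all sequences in the corpus.
        counts = Counter(token for sequence in corpus for token in sequence)
        for token, count in counts.items():
            # Assumption: tokens missing from all_tokens are added with no pseudo-count.
            self.frequencies[token] = self.frequencies.get(token, 0.0) + count


table = FrequencyTable({"a", "b", "c"})
table.update_frequencies([["a", "a"], ["b"]])
table.update_frequencies([["c", "a"]])  # counts accumulate across calls in this sketch
print(table.frequencies["a"], table.frequencies["b"], table.frequencies["c"])
# 4.0 2.0 2.0
```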