sign_language_translator.models package

Subpackages

sign_language_translator.models.language_models package
sign_language_translator.models.sign_to_text package
- Module contents
sign_language_translator.models.text_embedding package
- Submodules
  - sign_language_translator.models.text_embedding.text_embedding_model module
    - TextEmbeddingModel
  - sign_language_translator.models.text_embedding.vector_lookup_model module
    - VectorLookupModel
- Module contents
  - TextEmbeddingModel
    - TextEmbeddingModel.embed()
  - VectorLookupModel
sign_language_translator.models.text_to_sign package
- Submodules
  - sign_language_translator.models.text_to_sign.concatenative_synthesis module
    - ConcatenativeSynthesis
  - sign_language_translator.models.text_to_sign.t2s_model module
    - TextToSignModel
- Module contents
  - ConcatenativeSynthesis
  - TextToSignModel
sign_language_translator.models.video_embedding package
- Submodules
  - sign_language_translator.models.video_embedding.mediapipe_landmarks_model module
    - MediaPipeLandmarksModel
  - sign_language_translator.models.video_embedding.video_embedding_model module
    - VideoEmbeddingModel
- Module contents

Submodules

sign_language_translator.models.utils module

Module contents

sign_language_translator.models

This module contains the various models in the sign language translator system and their associated components.

Language Models:

NgramLanguageModel: A language model based on n-grams.
TransformerLanguageModel: A transformer-based language model.
MixerLM: A language model that combines multiple language models using mixing weights.
LanguageModel: An abstract base class for all language models in this package.
BeamSampling: A utility class that performs beam search during text generation.

Text to Sign Translation:

TextToSignModel: An abstract base class for all models in this package that translate text into sign language gestures.
ConcatenativeSynthesis: A rule-based model for synthesizing sign language gestures from text.

Video Embedding:

VideoEmbeddingModel: An abstract model that embeds video frames into a vector space.
MediaPipeLandmarksModel: A video embedding model that utilizes MediaPipe for pose and hand landmark extraction.

Text Embedding:

TextEmbeddingModel: An abstract model that embeds text into a vector space.
VectorLookupModel: A text embedding model that looks up vectors from a pre-trained embedding matrix.

Utilities:

get_model: A utility function to get any model by string name.
utils: Miscellaneous utility functions for the sign language translator system.

class sign_language_translator.models.BeamSampling(model: ~sign_language_translator.models.language_models.abstract_language_model.LanguageModel, beam_width: int = 3, start_of_sequence_token='[', end_of_sequence_token=']', max_length: int = 33, scoring_function: ~typing.Callable[[~typing.Iterable, float], float] = <function BeamSampling.<lambda>>, return_log_of_probability: bool = True)[source]

Bases: object

BeamSampling class for generating completions using beam search sampling.

Parameters:

model (LanguageModel) – The language model used for generating completions.
beam_width (int, optional) – The beam width for beam search. Defaults to 3.
start_of_sequence_token (str, optional) – The start of sequence token. Defaults to “[“.
end_of_sequence_token (str, optional) – The end of sequence token. Defaults to “]”.
max_length (int, optional) – The maximum length of the generated completions. Defaults to 33.
scoring_function (Callable[[Iterable, float], float], optional) – The scoring function used to score the completions. It should accept the generated sequence and its overall log probability as arguments. Defaults to a linear function.
return_log_of_probability (bool, optional) – Whether to return the log2 of each completion's probability (True) or the probability itself (False). Defaults to True.

complete(initial_context: ~typing.Iterable | None = None, append_func: ~typing.Callable[[~typing.Any, ~typing.Any], ~typing.Any] = <function BeamSampling.<lambda>>) → Tuple[Iterable, float][source]

Generate completions based on the given initial context.

Parameters:

initial_context (Iterable | None, optional) – The initial context for completion generation. Defaults to None.
append_func (Callable[[Any, Any], Any], optional) – A function that appends the generated next token to the provided context. Defaults to a lambda function that can append to a list, tuple or str.

Returns:

One generated completion and its score.

Return type:

Tuple[Iterable, float]
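The beam search that complete() performs can be illustrated with a stand-alone sketch. It is not the library's implementation: `beam_complete` and `toy_next_all` are hypothetical names, the model is represented only by a `next_all(context)` callable returning tokens and probabilities (mirroring LanguageModel.next_all), and sequences are assumed to be strings so that appending is concatenation. Scores are cumulative log2 probabilities, and a beam stops when it ends with the end-of-sequence token or reaches `max_length`.

```python
import math
from typing import Callable, List, Sequence, Tuple


def beam_complete(
    next_all: Callable[[str], Tuple[Sequence[str], Sequence[float]]],
    context: str = "[",
    beam_width: int = 3,
    end_token: str = "]",
    max_length: int = 33,
    scoring: Callable[[str, float], float] = lambda seq, log_p: log_p,
) -> Tuple[str, float]:
    """Hypothetical sketch: return the best completion and its log2 probability."""
    beams: List[Tuple[str, float]] = [(context, 0.0)]  # (sequence, cumulative log2 prob)
    finished: List[Tuple[str, float]] = []
    while beams:
        candidates = []
        for seq, log_p in beams:
            tokens, probs = next_all(seq)
            for tok, p in zip(tokens, probs):
                if p > 0:
                    candidates.append((seq + tok, log_p + math.log2(p)))
        # keep only the `beam_width` best candidates according to the scoring function
        candidates.sort(key=lambda c: scoring(c[0], c[1]), reverse=True)
        beams = []
        for seq, log_p in candidates[:beam_width]:
            if seq.endswith(end_token) or len(seq) >= max_length:
                finished.append((seq, log_p))
            else:
                beams.append((seq, log_p))
    if not finished:
        raise ValueError("no completion was generated")
    return max(finished, key=lambda c: scoring(c[0], c[1]))


def toy_next_all(context: str) -> Tuple[List[str], List[float]]:
    """Hypothetical character-level model: forces the end token after 3 characters."""
    if len(context) >= 3:
        return ["]"], [1.0]
    return ["a", "b", "]"], [0.6, 0.3, 0.1]


best, score = beam_complete(toy_next_all, "[", beam_width=2)
```

Here finished beams keep their slot in the beam at the step where they finish, which is a simplification; BeamSampling may handle this differently.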

class sign_language_translator.models.ConcatenativeSynthesis

Bases: TextToSignModel

A rule-based model that translates text to sign language by concatenating sign language videos.

property sign_embedding_model: str | None: The name of the model which was used for extracting features from the signs. This name is used in the filenames of the preprocessed signs dataset.

property sign_format: Type[Sign]

The format of the sign language (e.g. slt.Vision.sign.sign.Sign or a subclass).

The class that wraps the sign language features, e.g. raw videos or landmarks (such as slt.Video or slt.Landmarks). This class can load signs from the available datasets and concatenate its instances.

property sign_language: SignLanguage: An instance of slt.languages.sign.SignLanguage or a subclass that defines the mapping rules and grammar of a sign language.

property text_language: TextLanguage: An instance of slt.languages.text.TextLanguage or a subclass that defines preprocessing, tokenization and other NLP functions.

translate(text: str, *args, **kwargs) → Sign[source]

Translate text to sign language.

Parameters:: text – The input text to be translated.
Returns:: The translated sign language sentence.
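The rule-based idea behind ConcatenativeSynthesis can be sketched with plain lists: tokenize the text, map each token to a sign label (the role played by the sign language's mapping rules), and concatenate the frames of the corresponding clips. Everything here is hypothetical: `WORD_TO_SIGN`, `CLIPS` and `synthesize` are illustrative names, strings stand in for video frames or landmark rows, and the real class loads slt.Video or slt.Landmarks objects and concatenates those instead. This sketch raises on unmapped tokens; the real model may handle them differently.

```python
from typing import Dict, List

# Hypothetical mapping rules: text token -> sign label
WORD_TO_SIGN: Dict[str, str] = {"hello": "sign-hello", "world": "sign-world"}

# Hypothetical dataset: sign label -> frames (strings stand in for real frames)
CLIPS: Dict[str, List[str]] = {
    "sign-hello": ["h1", "h2"],
    "sign-world": ["w1", "w2", "w3"],
}


def synthesize(text: str) -> List[str]:
    """Translate text by looking up one clip per token and concatenating the frames."""
    frames: List[str] = []
    for token in text.lower().split():
        label = WORD_TO_SIGN.get(token)
        if label is None:
            raise ValueError(f"No sign mapped for token {token!r}")
        frames.extend(CLIPS[label])  # append this sign's frames after the previous ones
    return frames


video = synthesize("Hello world")
```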

class sign_language_translator.models.LanguageModel(unknown_token='<unk>', name=None)[source]

Bases: ABC

Abstract Base Class for Language Models

LanguageModel is an abstract base class that defines the common interface and methods for language models. It provides functionality for sampling the next token based on the given context.

Attributes:
- unknown_token (str): The token representation used for unknown or out-of-vocabulary tokens.
- name (str): The name of the language model (optional).

Methods:
- next(self, context: Iterable) -> Tuple[Any, float]: Abstract method that should be implemented by subclasses to generate the next token and provide its probability.
- next_all(self, context: Iterable) -> Tuple[Iterable[Any], Iterable[float]]: Abstract method that should be implemented by subclasses to return all next tokens and their probabilities.

abstract next(context: Iterable) → Tuple[Any, float][source]

Generates the next token based on the given context and also returns its probability.

Parameters:

context (Iterable) – A piece of sequence used as the context for generating the next token.

Returns:

The next token and its associated probability. The token has the same type as the items in the context iterable.

Return type:

Tuple[Any, float]

abstract next_all(context: Iterable) → Tuple[Iterable[Any], Iterable[float]][source]

Computes probabilities for all next tokens based on the given context and returns them both.

Parameters:

context (Iterable) – A piece of sequence used as the context for generating the next tokens.

Returns:

All next tokens and their probabilities. The tokens have the same type as the items in the context iterable.

Return type:

Tuple[Iterable[Any], Iterable[float]]
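A concrete language model only has to implement next() and next_all(). The sketch below shows a toy unigram model that ignores the context. To stay runnable without the package, `LanguageModelInterface` is a hypothetical local stand-in that mirrors the documented abstract methods; in real code you would subclass sign_language_translator.models.LanguageModel instead. `UnigramLM` is likewise an illustrative name.

```python
import random
from abc import ABC, abstractmethod
from collections import Counter
from typing import Any, Iterable, List, Tuple


class LanguageModelInterface(ABC):
    """Hypothetical stand-in mirroring the documented LanguageModel interface."""

    def __init__(self, unknown_token="<unk>", name=None):
        self.unknown_token = unknown_token
        self.name = name

    @abstractmethod
    def next(self, context: Iterable) -> Tuple[Any, float]: ...

    @abstractmethod
    def next_all(self, context: Iterable) -> Tuple[Iterable[Any], Iterable[float]]: ...


class UnigramLM(LanguageModelInterface):
    """Toy model: ignores the context and predicts tokens by corpus frequency."""

    def __init__(self, corpus: Iterable[Iterable[Any]], **kwargs):
        super().__init__(**kwargs)
        counts = Counter(tok for seq in corpus for tok in seq)
        total = sum(counts.values())
        self.tokens: List[Any] = list(counts)
        self.probs: List[float] = [counts[t] / total for t in self.tokens]

    def next_all(self, context):
        # every token is a candidate, regardless of the context
        return list(self.tokens), list(self.probs)

    def next(self, context):
        # sample one token from next_all() and also return its probability
        tokens, probs = self.next_all(context)
        token = random.choices(tokens, weights=probs, k=1)[0]
        return token, probs[tokens.index(token)]


lm = UnigramLM(["abca", "ab"])
tokens, probs = lm.next_all("a")
```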

class sign_language_translator.models.MediaPipeLandmarksModel(pose_model_name='pose_landmarker_heavy.task', hand_model_name='hand_landmarker.task', number_of_persons: int = 1)[source]

Bases: VideoEmbeddingModel

A video embedding model using MediaPipe to extract pose and hand landmarks from video frames.

Parameters:

pose_model_name (str) – The name of the pose estimation model.
hand_model_name (str) – The name of the hand estimation model.
number_of_persons (int) – The maximum number of persons to detect in each frame.

n_persons

The maximum number of persons to detect in each frame.

Type:: int

embed()[source]: Embeds a sequence of frames using pose and hand landmarks.

embed(frame_sequence: Iterable[Tensor | ndarray[Any, dtype[uint8]]], landmark_type: str = 'world', progress_callback: ProgressStatusCallback | None = None, total_frames: int | None = None, **kwargs) → Tensor[source]

Embed a sequence of frames (video) into a sequence of pose & hand landmarks.

Parameters:

frame_sequence (Iterable[torch.Tensor | NDArray[np.uint8]]) – A sequence of video frames as 3D arrays (W, H, c).
landmark_type (str) – The type of landmarks to include in the embedding (“world”, “image”, “all”).

Returns:

A tensor containing the frame embeddings.

Return type:

torch.Tensor

class sign_language_translator.models.MixerLM(models: List[LanguageModel], selection_probabilities: List[float] | None = None, unknown_token='<unk>', name=None, model_selection_strategy='choose')[source]

Bases: LanguageModel

The MixerLM class is a language model that combines multiple language models using a mixing strategy. It extends the abstract base class LanguageModel.

Attributes:
- models (List[LanguageModel]): List of language models to be combined.
- selection_probabilities (List[float] | None): The selection probabilities for each language model. If not provided, equal probabilities are assigned.
- unknown_token (str): The token representation used for unknown or out-of-vocabulary tokens.
- model_selection_strategy (str): The strategy for selecting the next token from the language models. Possible values: “choose” (selects one model and infers through it) or “merge” (infers through all models and combines their output probabilities).
- name (str): The name of the mixer language model object (optional).

Methods:
- next(self, context: Iterable) -> Tuple[Any, float]: Generates the next token based on the given context.
- next_all(self, context: Iterable) -> Tuple[List[Any], List[float]]: Generates all next tokens and their associated probabilities based on the given context.
- save(self, model_path: str) -> None: Saves the mixer model as a pickle file.
- load(model_path: str) -> MixerLM: Loads the mixer model from a pickle file.
- __str__(self) -> str: Returns a string representation of the MixerLM instance.

static load(model_path: str) → MixerLM[source]

Loads a MixerLM model from the given model path.

Parameters:: model_path (str) – The path to the model file.
Returns:: The loaded MixerLM model.
Return type:: MixerLM

next(context: Iterable) → Tuple[Any, float][source]

Generates the next token based on the given context and also returns its probability.

Parameters:

context (Iterable) – A piece of sequence used as the context for generating the next token.

Returns:

The next token and its associated probability. The token has the same type as the items in the context iterable.

Return type:

Tuple[Any, float]

next_all(context: Iterable) → Tuple[List[Any], List[float]][source]

Computes probabilities for all next tokens based on the given context and returns them both.

If model_selection_strategy is “choose”, it selects one model and infers through it. If model_selection_strategy is “merge”, it generates all next tokens and probabilities from each language model, then combines them into a list of unique next tokens and their corresponding weighted probabilities.

Parameters:: context (Iterable) – A piece of sequence like the training examples.
Returns:: A tuple containing a list of unique next tokens and their corresponding probabilities.
Return type:: Tuple[List[Any], List[float]]
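The two selection strategies described above can be sketched as follows. This is an illustration, not MixerLM's code: each model is represented by a hypothetical `next_all(context)` callable, and "merge" is assumed to weight each model's probabilities by its selection probability and sum them per token. If the selection probabilities sum to 1 and each model's probabilities sum to 1, the merged probabilities also sum to 1.

```python
import random
from typing import Any, Callable, Dict, List, Sequence, Tuple

NextAll = Callable[[Any], Tuple[Sequence[Any], Sequence[float]]]


def mixer_next_all(models: List[NextAll], weights: List[float], context, strategy: str = "merge"):
    """Hypothetical sketch of the "choose" and "merge" strategies."""
    if strategy == "choose":
        # pick one model according to the selection probabilities and defer to it
        chosen = random.choices(models, weights=weights, k=1)[0]
        tokens, probs = chosen(context)
        return list(tokens), list(probs)
    if strategy == "merge":
        # sum every model's probabilities per unique token, weighted by selection probability
        combined: Dict[Any, float] = {}
        for model, w in zip(models, weights):
            tokens, probs = model(context)
            for tok, p in zip(tokens, probs):
                combined[tok] = combined.get(tok, 0.0) + w * p
        return list(combined), list(combined.values())
    raise ValueError(f"Unknown strategy: {strategy}")


def m1(context):  # hypothetical model 1
    return ["a", "b"], [0.5, 0.5]


def m2(context):  # hypothetical model 2
    return ["b", "c"], [0.2, 0.8]


tokens, probs = mixer_next_all([m1, m2], [0.5, 0.5], context="x")
```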

save(model_path: str, overwrite=False) → None[source]

Save the model to a file.

Parameters:

model_path (str) – The path to save the model.
overwrite (bool, optional) – Whether to overwrite an existing file. Defaults to False.

Raises:

FileExistsError – If a file already exists at model_path and overwrite is False.

class sign_language_translator.models.NgramLanguageModel(window_size=1, unknown_token='<unk>', sampling_temperature=1.0, name=None)[source]

Bases: LanguageModel

NgramLanguageModel is a statistical language model based on n-grams. It provides functionality for training the model on a given training corpus, generating the next token based on a context, and saving/loading the model.

Attributes:
- window_size (int): The size of the context window for predicting the next token.
- unknown_token (str): The token representation used for unknown or out-of-vocabulary tokens.
- sampling_temperature (float): A temperature parameter controlling the sampling probabilities during token generation.
- name (str): The name of the language model object (optional).

Methods:
- train(self, training_corpus): Alias for the fit() method. Trains the language model on the given training corpus.
- fit(self, training_corpus): Trains the language model on the given training corpus.
- finetune(self, training_corpus, weightage: float): Fine-tunes the language model on an additional training corpus with a specified weightage.
- next(self, context: Iterable) -> Tuple[Any, float]: Samples the next token from the learned distribution based on the given context.
- next_all(self, context: Iterable) -> Tuple[List[Any], List[float]]: Returns a list of possible next tokens and their associated probabilities based on the given context.
- load(model_path: str) -> NgramLanguageModel: Deserializes the model from a JSON file.
- save(self, model_path: str, indent=None, ensure_ascii=False, overwrite=False): Serializes the model to a JSON file.
- __str__(self) -> str: Returns a string representation of the NgramLanguageModel instance.

Private Methods:
- _to_key_datatype(self, item: Iterable) -> Tuple: Converts an iterable item to the appropriate datatype for use as a key in the model dictionary.
- _count_ngrams(self, training_corpus: List[Iterable], n: int) -> Dict[Tuple, int]: Counts the occurrences of n-grams in the training corpus.
- _group_by_context(self, counts: Dict[Tuple, int]): Groups the n-grams by context and calculates the weights for each next token.
- _count_parameters(self): Counts the total number of weights/probabilities in the model.
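The counting and grouping steps named above can be sketched in plain Python: count every (window_size + 1)-gram, group the counts by their first window_size items (the context), and normalize each group into next-token probabilities. The function names mirror the private methods but are hypothetical re-implementations, and the temperature formula (raise probabilities to the power 1/T and renormalize) is an assumed formulation, not necessarily the one NgramLanguageModel uses.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def count_ngrams(corpus: Iterable[Iterable[str]], n: int) -> Counter:
    """Count every contiguous n-gram (as a tuple) in each training sequence."""
    counts: Counter = Counter()
    for seq in corpus:
        seq = tuple(seq)
        for i in range(len(seq) - n + 1):
            counts[seq[i : i + n]] += 1
    return counts


def group_by_context(counts: Counter) -> Dict[Tuple[str, ...], Tuple[List[str], List[float]]]:
    """Map each context (all but the last item) to its next tokens and probabilities."""
    grouped: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
    for ngram, c in counts.items():
        grouped[ngram[:-1]][ngram[-1]] += c
    model = {}
    for context, nexts in grouped.items():
        total = sum(nexts.values())
        model[context] = (list(nexts), [c / total for c in nexts.values()])
    return model


def apply_temperature(probs: List[float], temperature: float) -> List[float]:
    """Sharpen (T < 1) or flatten (T > 1) a distribution; an assumed formulation."""
    scaled = [p ** (1.0 / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]


window_size = 1
model = group_by_context(count_ngrams(["abab", "abc"], n=window_size + 1))
```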

finetune(training_corpus, weightage: float) → None[source]

Fine-tunes the language model on an additional training corpus with a specified weightage.

Parameters:

training_corpus (Iterable[Iterable]) – The additional training corpus, an iterable of sequences representing the text data.
weightage (float) – The weightage for the additional training corpus, a value between 0.0 and 1.0 (inclusive). A weightage of 0.0 means no impact from the additional corpus, while a weightage of 1.0 means the model is completely updated based on the additional corpus.

Returns:

None

Raises:

AssertionError – If the weightage is outside the valid range [0.0, 1.0].
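One natural reading of weightage is linear interpolation of each context's next-token distribution: p_new = (1 - weightage) * p_old + weightage * p_additional. This matches the documented endpoints (0.0 leaves the model unchanged, 1.0 replaces it), but it is an assumption about finetune(), not its confirmed implementation. The hypothetical `interpolate` helper below handles a single context; how contexts seen in only one corpus are treated is left out.

```python
from typing import Dict, Hashable


def interpolate(old: Dict[Hashable, float], new: Dict[Hashable, float], weightage: float) -> Dict[Hashable, float]:
    """Blend two next-token distributions for the same context (hypothetical helper)."""
    assert 0.0 <= weightage <= 1.0, "weightage must be in [0.0, 1.0]"
    # union of tokens, keeping the old model's order first
    tokens = list(old) + [t for t in new if t not in old]
    return {t: (1 - weightage) * old.get(t, 0.0) + weightage * new.get(t, 0.0) for t in tokens}


mixed = interpolate({"a": 0.8, "b": 0.2}, {"b": 0.5, "c": 0.5}, weightage=0.25)
```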

fit(training_corpus) → None[source]

Trains the language model on the given training corpus.

Parameters:: training_corpus (Iterable[Iterable]) – The training corpus, an iterable of sequences representing the text data.
Returns:: None

static load(model_path: str) → NgramLanguageModel[source]

Deserializes the model (from JSON).

Parameters:: model_path (str) – The source file path.
Returns:: The deserialized NgramLanguageModel instance.
Return type:: NgramLanguageModel

next(context: Iterable) → Tuple[Any, float][source]

Generates the next token based on the given context and also returns its probability.

Parameters:

context (Iterable) – A piece of sequence used as the context for generating the next token.

Returns:

The next token and its associated probability. The token has the same type as the items in the context iterable.

Return type:

Tuple[Any, float]

next_all(context: Iterable) → Tuple[List[Any], List[float]][source]

Computes probabilities for all next tokens based on the given context and returns them both.

Parameters:

context (Iterable) – A piece of sequence used as the context for generating the next tokens.

Returns:

All next tokens and their probabilities. The tokens have the same type as the items in the context iterable.

Return type:

Tuple[List[Any], List[float]]

save(model_path: str, indent=None, ensure_ascii=False, overwrite=False) → None[source]

Serializes the model (as JSON).

Parameters:

model_path (str) – The target file path.
indent (Optional[int]) – The indentation level for formatting the JSON data (optional).
ensure_ascii (bool) – Controls whether non-ASCII characters are escaped (optional).
overwrite (bool) – If False, raises FileExistsError if the model already exists. Defaults to False.

train(training_corpus)[source]

Alias for fit(). Trains the language model on the given training corpus.

Parameters:: training_corpus (Iterable[Iterable]) – The training corpus, an iterable of sequences representing the text data.
Returns:: None

class sign_language_translator.models.TextEmbeddingModel[source]

Bases: ABC

Abstract class for text embedding models.

embed(text: str) -> torch.Tensor: Embeds text into a vector.

abstract embed(text: str) → Tensor[source]

Embeds text into a vector.

Parameters:: text (str) – Text to embed.
Returns:: A vector representation of a text.
Return type:: torch.Tensor

class sign_language_translator.models.TextToSignModel[source]

Bases: ABC

abstract property sign_format: The format of the sign language (e.g. slt.Vision.sign.sign.Sign).

abstract property sign_language: The target sign language of the model.

abstract property text_language: The source text language of the model.

abstract translate(text: str | Iterable[str], *args, **kwargs) → Sign[source]: Translate the text to sign language.

class sign_language_translator.models.TransformerLanguageModel(token_to_id: Dict[str, int], vocab_size: int, unknown_token='<unk>', padding_token='<pad>', start_of_sequence_token='<sos>', window_size: int = 64, embed_size: int = 768, hidden_size: int = 3072, n_heads: int = 6, n_blocks: int = 6, dropout: float = 0.25, activation='gelu', device='cpu', sampling_temperature: float = 1.0, top_k: int | None = None, top_p: float | None = 0.9, name: str | None = None, pretrained_token_embeddings: Tensor | None = None, randomly_shift_position_embedding_during_training: bool = False)[source]

Bases: LanguageModel, Module

Transformer-based language model for text generation.

This class implements a Transformer-based language model for text generation tasks. It takes in a sequence of token IDs and generates the next token in the sequence. The model consists of two embedding layers, multiple decoder blocks, and a language modeling head.

- token_embedding

The embedding layer for token IDs.

Type:: torch.nn.Embedding

- position_embedding

The embedding layer for positional IDs.

Type:: torch.nn.Embedding

- decoder_blocks

The sequence of decoder blocks.

Type:: torch.nn.Sequential

- final_layer_norm

The layer normalization for the final output.

Type:: torch.nn.LayerNorm

- language_modeling_head

The linear layer for language modeling.

Type:: torch.nn.Linear

- n_parameters

The total number of parameters in the model.

Type:: int

- device

The device to run the model on.

Type:: str

- training_history

The training history of the model such as loss and other metrics.

Type:: Dict[str, Any]

- forward(token_ids: torch.Tensor) -> torch.Tensor: Performs a forward pass through the model.

- next(self, context: Iterable) -> Tuple[Any, float]: generates the next token and its probability.

- next_all(self, context: Iterable) -> Tuple[List[Any], List[float]]: returns all next tokens and their probabilities.

- load(model_path: str, device='cpu') -> TransformerLanguageModel: (static method) Deserializes the model from a .pt file.

- save(self, model_path: str, overwrite: bool = False): Serializes the model to a .pt file.

- get_model_state() -> Dict[str, Any]: Returns the model state, consisting of the constructor arguments and the PyTorch state_dict.

- tokens_to_ids(tokens: Iterable[str]) -> List[int]: Converts tokens to IDs.

- ids_to_tokens(ids: Iterable[int] | torch.Tensor) -> List[str]: Converts IDs to tokens.

forward(token_ids: Tensor) → Tensor[source]

Forward pass of the model.

This method embeds the token_ids into vectors and also embeds their positions into vectors. Depending on the training and randomly_shift_position_embedding_during_training flags, it may shift the sequences' positions by a random amount. The two embeddings are added together and passed through the transformer decoder blocks, which contain causal multi-head self-attention. The output is passed through a LayerNorm and finally to a language-modeling head, which converts the vectors into logits for each token.

Parameters:: token_ids (torch.Tensor) – Tensor containing the token IDs. Shape is ([batch,] time).
Returns:: Tensor containing the logits. Shape is ([batch,] time, vocab_size).
Return type:: torch.Tensor
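Two details of the forward pass, the position IDs and the causal attention pattern, can be illustrated without PyTorch. In this sketch, `position_ids` and `causal_mask` are hypothetical helpers, and the random shift is assumed to be a single offset drawn so that every shifted position stays below window_size (the size of the position embedding table); the library's exact sampling of the offset is not documented here.

```python
import random
from typing import List


def position_ids(seq_len: int, window_size: int, training: bool, random_shift: bool) -> List[int]:
    """Consecutive positions, optionally shifted by a random offset during training."""
    offset = 0
    if training and random_shift and seq_len < window_size:
        # largest offset keeps the last position at window_size - 1
        offset = random.randint(0, window_size - seq_len)
    return list(range(offset, offset + seq_len))


def causal_mask(seq_len: int) -> List[List[bool]]:
    """mask[i][j] is True where position i may attend to position j (only j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]
```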

get_model_state() → Dict[str, Any][source]

Returns the current state of the model as a dictionary.

Returns:

A dictionary mapping strings to the class arguments, the PyTorch model's state_dict and other attributes.

Return type:

Dict[str, Any]

ids_to_tokens(ids: Iterable[int] | Tensor)[source]

Convert a sequence of token IDs to tokens.

Parameters:: ids (Iterable[int] | torch.Tensor) – An iterable of token IDs.
Returns:: A list of tokens corresponding to the input IDs.
Return type:: List[str]

static load(model_path, device='cpu') → TransformerLanguageModel[source]

Loads a TransformerLanguageModel from a given model path.

Parameters:

model_path (str) – The path to the saved model file.
device (str, optional) – The device to load the model on. Defaults to “cpu”.

Returns:

The loaded TransformerLanguageModel object.

Return type:

TransformerLanguageModel

next(context: Iterable) → Tuple[Any, float][source]

Generates the next token based on the given context and also returns its probability.

Parameters:

context (Iterable) – A piece of sequence used as the context for generating the next token.

Returns:

The next token and its associated probability. The token has the same type as the items in the context iterable.

Return type:

Tuple[Any, float]
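The constructor exposes sampling_temperature, top_k and top_p, but how next() applies them is not shown in this reference. The sketch below uses the common approach as an assumption: divide the logits by the temperature, apply a softmax, keep the top_k most probable tokens, then keep the smallest prefix whose cumulative probability reaches top_p (nucleus sampling), renormalize and sample. `filter_distribution` and `sample_next` are hypothetical names, and token IDs stand in for tokens.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def filter_distribution(logits: Sequence[float], temperature: float = 1.0,
                        top_k: Optional[int] = None,
                        top_p: Optional[float] = 0.9) -> List[Tuple[int, float]]:
    """Return (token_id, probability) pairs that survive temperature, top-k and top-p."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    ranked = sorted(((i, e / total) for i, e in enumerate(exps)), key=lambda x: x[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
        norm = sum(p for _, p in ranked)
        ranked = [(i, p / norm) for i, p in ranked]
    if top_p is not None:
        kept, cumulative = [], 0.0
        for i, p in ranked:
            kept.append((i, p))
            cumulative += p
            if cumulative >= top_p:  # smallest prefix reaching top_p
                break
        ranked = kept
    norm = sum(p for _, p in ranked)
    return [(i, p / norm) for i, p in ranked]


def sample_next(logits: Sequence[float], **kwargs) -> Tuple[int, float]:
    """Sample one token ID from the filtered distribution and return its probability."""
    candidates = filter_distribution(logits, **kwargs)
    ids = [i for i, _ in candidates]
    probs = [p for _, p in candidates]
    token_id = random.choices(ids, weights=probs, k=1)[0]
    return token_id, probs[ids.index(token_id)]
```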

next_all(context) → Tuple[List[Any], List[float]][source]

Computes probabilities for all next tokens based on the given context and returns them both.

Parameters:

context (Iterable) – A piece of sequence used as the context for generating the next tokens.

Returns:

All next tokens and their probabilities. The tokens have the same type as the items in the context iterable.

Return type:

Tuple[List[Any], List[float]]

save(model_path: str, overwrite: bool = False) → None[source]

Save the model to a file.

Parameters:

model_path (str) – The path to save the model.
overwrite (bool, optional) – Whether to overwrite an existing file. Defaults to False.

Raises:

FileExistsError – If there is already a file at the specified path and overwrite is set to False.

to(device, *args, **kwargs)[source]

Move and/or cast the parameters and buffers.

This can be called as

to(device=None, dtype=None, non_blocking=False)

to(dtype, non_blocking=False)

to(tensor, non_blocking=False)

to(memory_format=torch.channels_last)

Its signature is similar to torch.Tensor.to(), but only accepts floating point or complex dtypes. In addition, this method will only cast the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.

See below for examples.

Note

This method modifies the module in-place.

Parameters:

device (torch.device) – the desired device of the parameters and buffers in this module
dtype (torch.dtype) – the desired floating point or complex dtype of the parameters and buffers in this module
tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
memory_format (torch.memory_format) – the desired memory format for 4D parameters and buffers in this module (keyword only argument)

Returns:

self

Return type:

Module

Examples:

>>> # xdoctest: +IGNORE_WANT("non-deterministic")
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> # xdoctest: +REQUIRES(env:TORCH_DOCTEST_CUDA1)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)

>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)

tokens_to_ids(tokens: Iterable[str]) → List[int][source]

Convert a list of tokens into a list of corresponding token IDs.

Parameters:: tokens (Iterable[str]) – A list of tokens.
Returns:: A list of token IDs. If a token is not found in the token_to_id dictionary, the unknown_token_id is used instead.
Return type:: List[int]
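The token/ID conversion with an unknown-token fallback can be sketched with plain dictionaries. These are hypothetical free functions, not the model's methods; they assume the unknown token is itself present in token_to_id, and `int(i)` lets the same code accept the elements of a 1-D torch.Tensor.

```python
from typing import Dict, Iterable, List


def tokens_to_ids(tokens: Iterable[str], token_to_id: Dict[str, int], unknown_token: str = "<unk>") -> List[int]:
    """Map tokens to IDs, falling back to the unknown token's ID."""
    unknown_id = token_to_id[unknown_token]
    return [token_to_id.get(tok, unknown_id) for tok in tokens]


def ids_to_tokens(ids: Iterable[int], token_to_id: Dict[str, int]) -> List[str]:
    """Invert the vocabulary and map IDs back to tokens."""
    id_to_token = {i: t for t, i in token_to_id.items()}
    return [id_to_token[int(i)] for i in ids]


vocab = {"<pad>": 0, "<unk>": 1, "<sos>": 2, "a": 3, "b": 4}
ids = tokens_to_ids(["a", "z", "b"], vocab)
```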

class sign_language_translator.models.VectorLookupModel(tokens: List[str], vectors: Tensor, alignment_matrix: Tensor | None = None, description: str = '')[source]

Bases: TextEmbeddingModel

VectorLookupModel class extends TextEmbeddingModel to provide text embedding based on pre-defined token vectors.

- index_to_token

A list containing tokens in the same order as the vectors.

Type:: List[str]

- known_tokens

A frozenset containing unique known tokens.

Type:: frozenset

- token_to_index

A dictionary mapping tokens to their corresponding indices.

Type:: Dict[str, int]

- vectors

A 2D tensor representing the token vectors.

Type:: torch.Tensor

- update(self, tokens: List[str], vectors: torch.Tensor) -> None: Updates existing tokens & hash-table with new vectors.

- embed(self, text: str, pre_normalize=False, post_normalize=False, tokenizer: Callable[[str], Iterable[str]] = lambda x: x.split()) -> torch.Tensor: Returns the pretrained embedding vector for a token, or the average embedding of its sub-tokens.

- __getitem__(self, token: str) -> torch.Tensor: Returns the vector for a specific token.

- save(self, path: str): Saves the model state (tokens & vectors) to a file.

- load(cls, path: str): Loads a saved model state (tokens & vectors) from a file.

Example:

.. code-block:: python

    from sign_language_translator.models import VectorLookupModel
    import torch

    tokens = ["example", "text"]
    # floating-point vectors keep the averaged embedding in float
    vectors = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    model = VectorLookupModel(tokens, vectors)

    embedding = model.embed("example text")  # [2.5, 3.5, 4.5]

    model.update(["hello"], torch.tensor([[7.0, 8.0, 9.0]]))

    model.save("model.pt")
    loaded_model = VectorLookupModel.load("model.pt")

embed(text: str, pre_normalize=False, post_normalize=False, align=False, tokenizer: ~typing.Callable[[str], ~typing.Iterable[str]] = <function VectorLookupModel.<lambda>>) → Tensor[source]

Embeds the given text into a vector representation by looking up or averaging pre-computed embeddings.

Parameters:

text (str) – The input text to embed. It can be a single token from the model vocabulary or a string of tokens from the vocabulary. If the text is unknown, a zero vector is returned.
pre_normalize (bool, optional) – Whether to normalize the vectors of tokens in the text before averaging. Defaults to False.
post_normalize (bool, optional) – Whether to normalize the vector after embedding. Defaults to False.
align (bool, optional) – Whether to transform the final vector using the alignment matrix. Defaults to False.
tokenizer (Callable[[str], Iterable[str]], optional) – A callable function to tokenize the text. Only used if the text is not present in the model vocabulary. Defaults to splitting on whitespace.

Returns:

The embedded vector representation of the input text.

Return type:

torch.Tensor
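The lookup-then-average behaviour described above can be sketched with plain lists in place of tensors. The sketch assumes that a whole-text match is tried first, that unknown sub-tokens are skipped, and that normalization means scaling to unit L2 norm; these are assumptions, and the `align` step is omitted. `embed` and `normalize` here are hypothetical free functions, not the class's methods.

```python
import math
from typing import Callable, Dict, Iterable, List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def embed(text: str, table: Dict[str, Vector], pre_normalize: bool = False,
          post_normalize: bool = False,
          tokenizer: Callable[[str], Iterable[str]] = lambda x: x.split()) -> Vector:
    dim = len(next(iter(table.values())))
    if text in table:
        vectors = [table[text]]  # whole text is a known token: direct lookup
    else:
        vectors = [table[t] for t in tokenizer(text) if t in table]  # skip unknown sub-tokens
    if not vectors:
        return [0.0] * dim  # unknown text -> zero vector
    if pre_normalize:
        vectors = [normalize(v) for v in vectors]
    mean = [sum(col) / len(vectors) for col in zip(*vectors)]
    return normalize(mean) if post_normalize else mean


table = {"example": [1.0, 2.0, 3.0], "text": [4.0, 5.0, 6.0]}
```

With this table, embed("example text", table) averages the two vectors, matching the [2.5, 3.5, 4.5] shown in the class example.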

classmethod load(path: str)[source]

Load a VectorLookupModel from a saved checkpoint. If the path ends with ‘.zip’ the file will be decompressed.

Parameters:: path (str) – The path to the saved checkpoint.
Returns:: The loaded VectorLookupModel instance.
Return type:: VectorLookupModel

property normalized_vectors

save(path: str)[source]

Serialize the tokens list and corresponding vectors to a file. If the path ends with ‘.zip’ the file will be compressed.

Parameters:: path (str) – The path to save the model file.

similar(vector: Tensor, k: int = 1) → Tuple[List[str], List[float]][source]

Find the k most similar tokens to the given vector.

Parameters:

vector (torch.Tensor) – The 1D vector for which to find similar tokens.
k (int, optional) – The number of similar tokens to return. Defaults to 1.

Returns:

A tuple containing the k most similar tokens and their corresponding cosine similarities.

Return type:

Tuple[List[str], List[float]]
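Finding the k most similar tokens by cosine similarity can be sketched in plain Python. `cosine` and `similar` below are hypothetical helpers operating on a dict of lists; the real method works on the model's tensor of vectors, likely in a vectorized way.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def similar(vector: Sequence[float], table: Dict[str, Sequence[float]], k: int = 1) -> Tuple[List[str], List[float]]:
    """Return the k tokens whose vectors are most similar to `vector`, with their scores."""
    scored = [(cosine(vector, v), tok) for tok, v in table.items()]
    top = heapq.nlargest(k, scored)  # highest similarity first
    return [tok for _, tok in top], [score for score, _ in top]


table = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}
toks, scores = similar([1.0, 0.0], table, k=2)
```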

property tokens_array

update(tokens: List[str], vectors: Tensor) → None[source]

Update the vector lookup model with new tokens and their corresponding vectors.

Parameters:

tokens (List[str]) – The list of new tokens to be added or updated.
vectors (torch.Tensor) – The tensor of corresponding vectors for the new tokens.

Note

The following are arguments of the VectorLookupModel constructor, not of update():

alignment_matrix (Optional[torch.Tensor], optional) – A 2D Tensor to transform the final vectors (e.g. an orthogonal matrix can be used to align the word vectors to an embedding for another language or model). Defaults to None.
description (str, optional) – A description of the model. Defaults to “”.

Raises:

ValueError – If the dimensions of the new vectors do not match the dimensions of the existing vectors.

class sign_language_translator.models.VideoEmbeddingModel[source]

Bases: ABC

Abstract base class for video embedding models.

This class defines the interface for video embedding models, which transform a sequence of video frames into an embedding tensor.


embed(frame_sequence, **kwargs)[source]: Abstract method to embed a sequence of video frames.

abstract embed(frame_sequence: Iterable[Tensor | ndarray[Any, dtype[uint8]]], **kwargs) → Tensor[source]

Embed a sequence of video frames into an embedding tensor.

Parameters:

frame_sequence (Iterable[Union[Tensor, NDArray[uint8]]]) – A sequence of video frames, where each frame can be either a Tensor or a numpy array of uint8 values of shape (W, H, C).
**kwargs – Additional keyword arguments specific to the embedding model.

Returns:

An embedding tensor representing the sequence of video frames.

Return type:

Tensor

sign_language_translator.models.get_model(model_code: str | Enum, *args, **kwargs)[source]

Get the model based on the provided model code and optional parameters. See sign_language_translator.config.enums.ModelCodes (or slt.ModelCodes) for a list of supported model codes.

Parameters:: model_code (str | Enum) – The code representing the desired model.
Returns:: The instantiated model object if successful, or None if no model found.
Return type:: Any
Raises:: ValueError – If inappropriate argument values are provided for text_language, sign_language, or video_feature_model.