sign_language_translator.models.language_models.transformer_language_model.model module
This module defines a Transformer-based Language Model that can be used for text generation and language modeling.
- Class:
TransformerLanguageModel: A class that implements a Transformer-based Language Model.
- class sign_language_translator.models.language_models.transformer_language_model.model.TransformerLanguageModel(token_to_id: Dict[str, int], vocab_size: int, unknown_token='<unk>', padding_token='<pad>', start_of_sequence_token='<sos>', window_size: int = 64, embed_size: int = 768, hidden_size: int = 3072, n_heads: int = 6, n_blocks: int = 6, dropout: float = 0.25, activation='gelu', device='cpu', sampling_temperature: float = 1.0, top_k: int | None = None, top_p: float | None = 0.9, name: str | None = None, pretrained_token_embeddings: Tensor | None = None, randomly_shift_position_embedding_during_training: bool = False)
Bases:
LanguageModel, Module
Transformer-based language model for text generation.
This class implements a Transformer-based language model for text generation tasks. It takes in a sequence of token IDs and generates the next token in the sequence. The model consists of two embedding layers, multiple decoder blocks, and a language modeling head.
- - token_embedding
The embedding layer for token IDs.
- Type:
torch.nn.Embedding
- - position_embedding
The embedding layer for positional IDs.
- Type:
torch.nn.Embedding
- - decoder_blocks
The sequence of decoder blocks.
- Type:
torch.nn.Sequential
- - final_layer_norm
The layer normalization for the final output.
- Type:
torch.nn.LayerNorm
- - language_modeling_head
The linear layer for language modeling.
- Type:
torch.nn.Linear
- - n_parameters
The total number of parameters in the model.
- Type:
int
- - device
The device to run the model on.
- Type:
str
- - training_history
The training history of the model such as loss and other metrics.
- Type:
Dict[str, Any]
- - forward(token_ids: torch.Tensor) -> torch.Tensor
Performs a forward pass through the model.
- - next(context: Iterable) -> Tuple[Any, float]
Generates the next token and its probability.
- - next_all(context: Iterable) -> Tuple[List[Any], List[float]]
Returns all next tokens and their probabilities.
- - load(model_path: str, device='cpu') -> TransformerLanguageModel
(static method) Deserializes the model from a pt file.
- - save(model_path: str, overwrite: bool = False) -> None
Serializes the model to a pt file.
- - get_model_state() -> Dict[str, Any]
Returns the model state consisting of constructor arguments and the PyTorch state_dict.
- - tokens_to_ids(tokens: Iterable[str]) -> List[int]
Converts tokens to IDs.
- - ids_to_tokens(ids: Iterable[int] | torch.Tensor) -> List[str]
Converts IDs to tokens.
- forward(token_ids: Tensor) -> Tensor
Forward pass of the model.
This method embeds the token IDs and their positions into vectors. If the module is in training mode and randomly_shift_position_embedding_during_training is set, the positions of a sequence may be shifted by a random amount. The token and position embeddings are added together and passed through the transformer decoder blocks, which use causal multi-head self-attention. The output is passed through LayerNorm and finally through the language modeling head, which converts the vectors into logits over the vocabulary at each time step.
- Parameters:
token_ids (torch.Tensor) – Tensor containing the token IDs. Shape is ([batch,] time).
- Returns:
Tensor containing the logits. Shape is ([batch,] time, vocab_size).
- Return type:
torch.Tensor
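The forward pass described above can be sketched with stock PyTorch layers. This is a minimal illustration under stated assumptions, not the library's implementation: the class name TinyDecoderLM, the small default sizes, and the use of nn.TransformerEncoderLayer with a causal mask as a stand-in for the decoder blocks are all hypothetical, and the random position shift is omitted.

```python
import torch
from torch import nn


class TinyDecoderLM(nn.Module):
    """Hypothetical stand-in that mirrors the documented attribute names."""

    def __init__(self, vocab_size=100, window_size=16, embed_size=32,
                 hidden_size=64, n_heads=4, n_blocks=2, dropout=0.0):
        super().__init__()
        self.token_embedding = nn.Embedding(vocab_size, embed_size)
        self.position_embedding = nn.Embedding(window_size, embed_size)
        # Assumption: an encoder layer plus a causal mask acts as a decoder block.
        self.decoder_blocks = nn.ModuleList([
            nn.TransformerEncoderLayer(
                d_model=embed_size, nhead=n_heads, dim_feedforward=hidden_size,
                dropout=dropout, activation="gelu",
                batch_first=True, norm_first=True,
            )
            for _ in range(n_blocks)
        ])
        self.final_layer_norm = nn.LayerNorm(embed_size)
        self.language_modeling_head = nn.Linear(embed_size, vocab_size)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Accept both (time,) and (batch, time) inputs.
        unbatched = token_ids.dim() == 1
        if unbatched:
            token_ids = token_ids.unsqueeze(0)
        time = token_ids.shape[-1]
        positions = torch.arange(time, device=token_ids.device)

        # Token and position embeddings are summed.
        x = self.token_embedding(token_ids) + self.position_embedding(positions)

        # True above the diagonal: a position may not attend to later positions.
        causal_mask = torch.triu(
            torch.ones(time, time, dtype=torch.bool, device=token_ids.device),
            diagonal=1,
        )
        for block in self.decoder_blocks:
            x = block(x, src_mask=causal_mask)

        logits = self.language_modeling_head(self.final_layer_norm(x))
        return logits.squeeze(0) if unbatched else logits
```

Calling the sketch on a tensor of shape (batch, time) yields logits of shape (batch, time, vocab_size), and a 1-D input of shape (time,) yields (time, vocab_size), matching the shapes listed above.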
- get_model_state() -> Dict[str, Any]
Returns the current state of the model as a dictionary.
- Returns:
A dictionary mapping strings to the constructor arguments, the PyTorch model's state_dict, and other attributes.
- Return type:
Dict[str, Any]
- ids_to_tokens(ids: Iterable[int] | Tensor) -> List[str]
Convert a sequence of token IDs to tokens.
- Parameters:
ids (Iterable[int] | torch.Tensor) – An iterable of token IDs.
- Returns:
A list of tokens corresponding to the input IDs.
- Return type:
List[str]
- static load(model_path, device='cpu') -> TransformerLanguageModel
Loads a TransformerLanguageModel from a given model path.
- Parameters:
model_path (str) – The path to the saved model file.
device (str, optional) – The device to load the model on. Defaults to "cpu".
- Returns:
The loaded TransformerLanguageModel object.
- Return type:
TransformerLanguageModel
- next(context: Iterable) -> Tuple[Any, float]
Generates the next token based on the given context and also returns its probability.
- Parameters:
context (Iterable) – A piece of sequence used as the context for generating the next token.
- Returns:
The next token and its associated probability. The token has the same type as the items in the context iterable.
- Return type:
Tuple[Any, float]
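The constructor exposes sampling_temperature, top_k and top_p, which suggests that next() samples from a filtered distribution. How the library applies them is not stated here, so the following is only a plain-Python sketch of the common recipe (temperature-scaled softmax, then top-k, then nucleus filtering). The function names softmax, filter_top_k_top_p and sample_next are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; lower temperature sharpens the distribution."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def filter_top_k_top_p(probs: Sequence[float],
                       top_k: Optional[int] = None,
                       top_p: Optional[float] = None) -> List[float]:
    """Zero out unlikely tokens and renormalize the rest."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]  # keep the k most probable tokens
    if top_p is not None:
        kept, cumulative = [], 0.0
        for index in order:  # keep the smallest prefix whose mass reaches top_p
            kept.append(index)
            cumulative += probs[index]
            if cumulative >= top_p:
                break
        order = kept
    total = sum(probs[i] for i in order)
    filtered = [0.0] * len(probs)
    for index in order:
        filtered[index] = probs[index] / total
    return filtered


def sample_next(tokens: Sequence[str], logits: Sequence[float],
                temperature: float = 1.0, top_k: Optional[int] = None,
                top_p: Optional[float] = 0.9,
                rng: Optional[random.Random] = None) -> Tuple[str, float]:
    """Return one sampled token and its probability after filtering."""
    rng = rng or random.Random()
    probs = filter_top_k_top_p(softmax(logits, temperature), top_k, top_p)
    index = rng.choices(range(len(tokens)), weights=probs, k=1)[0]
    return tokens[index], probs[index]
```

With top_k=1 the sketch reduces to greedy decoding, since only the most probable token keeps a non-zero weight.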
- next_all(context) -> Tuple[List[Any], List[float]]
Computes the probabilities of all possible next tokens given the context and returns the tokens together with their probabilities.
- Parameters:
context (Iterable) – A piece of sequence used as the context for generating the next tokens.
- Returns:
All next tokens and their probabilities. The tokens have the same type as the items in the context iterable.
- Return type:
Tuple[List[Any], List[float]]
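One way to use a next_all-style method is greedy generation: repeatedly pick the most probable token and append it to the context. The sketch below assumes only the documented return shape (a list of tokens and a parallel list of probabilities). The toy_next_all lookup table and the "<eos>" stop token are hypothetical stand-ins for a trained model.

```python
from typing import Callable, List, Sequence, Tuple

NextAll = Callable[[Sequence[str]], Tuple[List[str], List[float]]]


def greedy_generate(next_all: NextAll, context: Sequence[str],
                    max_new_tokens: int, stop_token: str = "<eos>") -> List[str]:
    """Extend the context by always choosing the most probable next token."""
    generated = list(context)
    for _ in range(max_new_tokens):
        tokens, probabilities = next_all(generated)
        best = max(range(len(tokens)), key=probabilities.__getitem__)
        if tokens[best] == stop_token:
            break
        generated.append(tokens[best])
    return generated


def toy_next_all(context: Sequence[str]) -> Tuple[List[str], List[float]]:
    """Hypothetical model: the distribution depends only on the last token."""
    table = {
        "<sos>": (["hello", "world"], [0.9, 0.1]),
        "hello": (["world", "<eos>"], [0.8, 0.2]),
        "world": (["<eos>", "hello"], [0.7, 0.3]),
    }
    return table[context[-1]]


print(greedy_generate(toy_next_all, ["<sos>"], max_new_tokens=5))
# prints ['<sos>', 'hello', 'world']
```

Passing a bound method such as model.next_all in place of toy_next_all should work the same way, provided the model's vocabulary contains a suitable stop token.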
- save(model_path: str, overwrite: bool = False) -> None
Save the model to a file.
- Parameters:
model_path (str) – The path to save the model.
overwrite (bool, optional) – Whether to overwrite an existing file. Defaults to False.
- Raises:
FileExistsError – If there is already a file at the specified path and overwrite is set to False.
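The overwrite guard described above can be illustrated without PyTorch. In this sketch, save_state and load_state are hypothetical helpers, and pickle stands in for whatever serializer the library actually uses to write the pt file.

```python
import os
import pickle
import tempfile
from typing import Any, Dict


def save_state(state: Dict[str, Any], model_path: str, overwrite: bool = False) -> None:
    """Write the state to disk, refusing to replace an existing file by default."""
    if os.path.exists(model_path) and not overwrite:
        raise FileExistsError(
            f"'{model_path}' already exists; pass overwrite=True to replace it."
        )
    with open(model_path, "wb") as file:
        pickle.dump(state, file)  # assumption: stand-in for the real serializer


def load_state(model_path: str) -> Dict[str, Any]:
    """Read back a state written by save_state."""
    with open(model_path, "rb") as file:
        return pickle.load(file)


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "model.pt")
    save_state({"vocab_size": 5, "window_size": 8}, path)
    try:
        save_state({"vocab_size": 5, "window_size": 8}, path)  # same path, no overwrite
    except FileExistsError:
        second_save_blocked = True
    else:
        second_save_blocked = False
    save_state({"vocab_size": 5, "window_size": 64}, path, overwrite=True)
    restored = load_state(path)
```

Defaulting overwrite to False makes accidental loss of a trained checkpoint an explicit error rather than a silent replacement.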
- to(device, *args, **kwargs)
Move and/or cast the parameters and buffers.
This can be called as
- to(device=None, dtype=None, non_blocking=False)
- to(dtype, non_blocking=False)
- to(tensor, non_blocking=False)
- to(memory_format=torch.channels_last)
Its signature is similar to torch.Tensor.to(), but only accepts floating point or complex dtypes. In addition, this method will only cast the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.
See below for examples.
Note
This method modifies the module in-place.
- Parameters:
device (torch.device) – the desired device of the parameters and buffers in this module
dtype (torch.dtype) – the desired floating point or complex dtype of the parameters and buffers in this module
tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
memory_format (torch.memory_format) – the desired memory format for 4D parameters and buffers in this module (keyword only argument)
- Returns:
self
- Return type:
Module
Examples:
>>> # xdoctest: +IGNORE_WANT("non-deterministic")
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> # xdoctest: +REQUIRES(env:TORCH_DOCTEST_CUDA1)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)
>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)
- tokens_to_ids(tokens: Iterable[str]) -> List[int]
Convert a list of tokens into a list of corresponding token IDs.
- Parameters:
tokens (Iterable[str]) – A list of tokens.
- Returns:
A list of token IDs. If a token is not found in the token_to_id dictionary, the unknown_token_id is used instead.
- Return type:
List[int]
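The unknown-token fallback described for tokens_to_ids, together with the reverse mapping done by ids_to_tokens, can be sketched as two standalone functions. These are hypothetical free functions written for illustration; the real methods read token_to_id and the unknown token from the model instance, and how ids_to_tokens treats an ID outside the vocabulary is an assumption here.

```python
from typing import Dict, Iterable, List


def tokens_to_ids(tokens: Iterable[str], token_to_id: Dict[str, int],
                  unknown_token: str = "<unk>") -> List[int]:
    """Map tokens to IDs, using the unknown token's ID for unseen tokens."""
    unknown_id = token_to_id[unknown_token]
    return [token_to_id.get(token, unknown_id) for token in tokens]


def ids_to_tokens(ids: Iterable[int], token_to_id: Dict[str, int],
                  unknown_token: str = "<unk>") -> List[str]:
    """Map IDs back to tokens by inverting the vocabulary."""
    id_to_token = {index: token for token, index in token_to_id.items()}
    # Assumption: IDs outside the vocabulary map to the unknown token.
    return [id_to_token.get(int(index), unknown_token) for index in ids]


vocabulary = {"<unk>": 0, "<pad>": 1, "<sos>": 2, "a": 3, "b": 4}
ids = tokens_to_ids(["<sos>", "a", "z"], vocabulary)
print(ids)                             # prints [2, 3, 0]
print(ids_to_tokens(ids, vocabulary))  # prints ['<sos>', 'a', '<unk>']
```

The round trip is lossy for out-of-vocabulary tokens: "z" becomes ID 0 and comes back as "<unk>".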