sign_language_translator.utils.arrays module

class sign_language_translator.utils.arrays.ArrayOps

Bases: object

static abs(x: ndarray[Any, dtype[_ScalarType_co]]) → ndarray[Any, dtype[_ScalarType_co]]

static abs(x: Tensor) → Tensor

Compute the absolute value of a given array or tensor.

Parameters:: x (Union[NDArray, Tensor]) – The input array or tensor.
Returns:: The absolute value of the input array or tensor.
Return type:: Union[NDArray, Tensor]
Raises:: TypeError – If the input type is not supported.

Typecast multidimensional data to a NumPy array or a torch Tensor.

Parameters:

x (Union[NDArray, Tensor, Sequence[Union[float, int]]]) – The input data: an array, a tensor, or a sequence of numbers.
data_type (Type[Union[np.ndarray, Tensor]]) – The target container type, either np.ndarray or Tensor.
_dtype (Optional[Union[Type[torch.dtype], Type[np.dtype], Type]], optional) – The new data type of the values inside the array. If None, the original dtype is kept. Defaults to None.

Raises:

ValueError – If the data_type is not np.ndarray or Tensor.

Returns:

The casted array or tensor.

Return type:

Union[NDArray, Tensor]

static ceil(array: ndarray[Any, dtype[_ScalarType_co]] | Tensor | Sequence[float | int] | float | int) → ndarray[Any, dtype[_ScalarType_co]] | Tensor

static concatenate(arrays: Sequence[ndarray[Any, dtype[_ScalarType_co]]], dim: int = 0) → ndarray[Any, dtype[_ScalarType_co]]

static concatenate(arrays: Sequence[Tensor], dim: int = 0) → Tensor

Concatenate a sequence of arrays or tensors along a specified dimension.

Parameters:

arrays (Union[Sequence[NDArray], Sequence[Tensor]]) – The sequence of arrays or tensors to concatenate.
dim (int, optional) – The dimension along which to concatenate the arrays or tensors. Default is 0.

Returns:

The concatenated array or tensor.

Return type:

Union[NDArray, Tensor]

Raises:

TypeError – If the input type is not supported.

static copy(x: ndarray[Any, dtype[_ScalarType_co]]) → ndarray[Any, dtype[_ScalarType_co]]

static copy(x: Tensor) → Tensor

Create a copy of a given array or tensor.

Parameters:: x (Union[NDArray, Tensor]) – The input array or tensor.
Returns:: A deep copy of the input array or tensor.
Return type:: Union[NDArray, Tensor]
Raises:: TypeError – If the input type is not supported.

static floor(array: ndarray[Any, dtype[_ScalarType_co]] | Tensor | Sequence[float | int] | float | int) → ndarray[Any, dtype[_ScalarType_co]] | Tensor

static linspace(start: float | int, end: float | int, n_steps: int, data_type: Type[ndarray | Tensor] = numpy.ndarray, endpoint=True) → ndarray[Any, dtype[_ScalarType_co]] | Tensor

Generate an array or tensor with equally spaced values between start and end.

Parameters:

start (Union[float, int]) – The starting value of the sequence. The value is inclusive.
end (Union[float, int]) – The end value of the sequence. The value is inclusive if endpoint is True.
n_steps (int) – The number of samples to generate. Must be non-negative.
data_type (Type[Union[np.ndarray, Tensor]], optional) – The data type of the output array. Defaults to np.ndarray.
endpoint (bool, optional) – Whether to include the end value in the sequence. Defaults to True.

Raises:

ValueError – If data_type is not np.ndarray or Tensor.

Returns:

The generated array or tensor.

Return type:

Union[NDArray, Tensor]

Compute the norm of a given array or tensor along a specified dimension.

Parameters:

x (Union[NDArray, Tensor, Sequence[Union[float, int]]]) – The input array or tensor.
dim (Optional[int]) – The dimension along which to compute the norm. If None, the norm is computed over the entire array or tensor. Default is None.
keepdim (bool) – Whether to retain the reduced dimension (with size 1) in the output, so that the result broadcasts against the input. Default is False.

Returns:

The norm of the input array or tensor.

Return type:

Union[NDArray, Tensor]

Raises:

TypeError – If the input type is not supported.
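The effect of dim and keepdim can be illustrated with NumPy's np.linalg.norm, whose axis and keepdims arguments play the same roles. This sketch assumes the norm is the Euclidean (L2) norm, which is np.linalg.norm's default; it shows the semantics only, not the library's implementation.

```python
import numpy as np

x = np.array([[3.0, 4.0], [6.0, 8.0]])

# dim=None: a single norm over every element, sqrt(9 + 16 + 36 + 64)
total = np.linalg.norm(x)

# dim=1 without keepdim: the reduced axis disappears, shape (2,)
row_norms = np.linalg.norm(x, axis=1)

# dim=1 with keepdim: the reduced axis is kept with size 1, shape (2, 1),
# so the result broadcasts back against x, e.g. to scale each row to unit length
row_norms_kept = np.linalg.norm(x, axis=1, keepdims=True)
unit_rows = x / row_norms_kept
```

Keeping the dimension is what makes the final division work row by row without any manual reshaping.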

Generate an array or tensor of the specified shape filled with random values from a normal (Gaussian) distribution. Optionally truncate the distribution to the range [start, end].

Parameters:

size (Sequence[int]) – The shape of the output array or tensor.
loc (Union[float, int], optional) – The mean (“centre”) of the distribution. Defaults to 0.
scale (Union[float, int], optional) – The standard deviation (spread or “width”) of the distribution. Must be non-negative. Defaults to 1.
start (Union[float, int], optional) – The lower bound of the distribution. Defaults to float(“-inf”).
end (Union[float, int], optional) – The upper bound of the distribution. Defaults to float(“inf”).
data_type (Type[Union[np.ndarray, Tensor]], optional) – The data type of the output array. Defaults to np.ndarray.

Raises:

ValueError – If data_type is not np.ndarray or torch.Tensor.

Returns:

The array or tensor filled with random values.

Return type:

Union[NDArray, Tensor]

Note

Uses torch’s random number generator to generate random values even for NumPy arrays.
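One common way to truncate a normal distribution to [start, end] is rejection sampling: draw normal samples and keep only those that fall inside the bounds. The sketch below uses NumPy's Generator so that it is self-contained, whereas the note above says the library draws from torch's generator. The helper name truncated_normal_sketch is hypothetical, and the library may truncate differently.

```python
import numpy as np

def truncated_normal_sketch(size, loc=0.0, scale=1.0, start=float("-inf"), end=float("inf"), rng=None):
    """Hypothetical helper: normal samples restricted to [start, end] by rejection sampling."""
    shape = tuple(size)
    if scale < 0:
        raise ValueError("scale must be non-negative")
    if scale == 0:
        # A zero-width distribution always yields loc.
        if not start <= loc <= end:
            raise ValueError("loc lies outside [start, end] and scale is 0")
        return np.full(shape, float(loc))
    if start >= end:
        raise ValueError("start must be smaller than end")
    rng = np.random.default_rng() if rng is None else rng
    n = int(np.prod(shape))
    kept = np.empty(0)
    # Draw batches until enough samples land inside the bounds.
    # Bounds far out in the tails make this loop slow.
    while kept.size < n:
        draws = rng.normal(loc, scale, size=n)
        kept = np.concatenate([kept, draws[(draws >= start) & (draws <= end)]])
    return kept[:n].reshape(shape)
```

Rejection sampling keeps the shape of the normal density inside the bounds unchanged, at the cost of extra draws when the kept interval carries little probability mass.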

static random_uniform(size: Sequence[int], start: float | int = 0, end: float | int = 1, data_type: Type[ndarray | Tensor] = numpy.ndarray) → ndarray[Any, dtype[_ScalarType_co]] | Tensor

Generate a random array or tensor of the specified shape with values uniformly distributed in [start, end).

Parameters:

size (Sequence[int]) – The shape of the output array or tensor.
start (Union[float, int], optional) – The lower bound of the uniform distribution. The value is inclusive. Defaults to 0.
end (Union[float, int], optional) – The upper bound of the uniform distribution. The value is exclusive. Defaults to 1.
data_type (Type[Union[np.ndarray, Tensor]], optional) – The data type of the output array. Defaults to np.ndarray.

Raises:

ValueError – If data_type is not np.ndarray or Tensor.

Returns:

The array or tensor filled with random values.

Return type:

Union[NDArray, Tensor]

Note

Uses torch’s random number generator to generate random values even for NumPy arrays.

static steps(n_steps: int, anchors: Tensor = torch.Tensor([0, -1, 2]), random_uniform_frac: float = 0.2, random_normal_frac: float = 0.3, n_clusters: int = 1, cluster_std: float | None = None, anchor_spacing_blend: float = 0.5) → Tensor

static steps(n_steps: int, anchors: ndarray[Any, dtype[_ScalarType_co]] | Sequence[float | int] = (0, 1), random_uniform_frac: float = 0.2, random_normal_frac: float = 0.3, n_clusters: int = 1, cluster_std: float | None = None, anchor_spacing_blend: float = 0.5) → ndarray[Any, dtype[_ScalarType_co]]

Generate a sequence of steps by combining linear interpolation through the anchors with steps drawn from a random uniform distribution and from a random normal distribution.

Parameters:

n_steps (int) – The total number of steps to generate.
anchors (Union[NDArray, Tensor, Sequence[Union[float, int]]], optional) – The points between and through which the steps are interpolated. Defaults to (0, 1).
random_uniform_frac (float, optional) – The fraction of steps generated using a random uniform distribution. Must be between 0 and 1. Defaults to 0.2.
random_normal_frac (float, optional) – The fraction of steps generated using a random normal distribution. Must be between 0 and 1. Defaults to 0.3.
n_clusters (int, optional) – The number of concentrated spots formed by the random normal (Gaussian) steps, each around a cluster centroid drawn from a uniform distribution. Defaults to 1.
cluster_std (Optional[float], optional) – The standard deviation (spread) of the normal distribution generating the concentrated spots. If None, it is calculated based on the anchor gap and number of clusters (std(gaps)/10/n_clusters). Defaults to None.
anchor_spacing_blend (float, optional) – A blend factor between equal anchor spacing (1) and spacing proportional to the distances between consecutive anchor values (0); intermediate values mix the two. Defaults to 0.5.

Raises:

ValueError – If the sum of random_uniform_frac and random_normal_frac exceeds 1, or if either is negative.

Returns:

The generated sequence of steps.

Return type:

Union[NDArray, Tensor]

Examples:

import torch
from sign_language_translator.utils import ArrayOps

# plotting the following arrays on a graph makes their differences easier to see
anchors = [0, 1, -2, 0, 5, 2]

# Basic linear interpolation with no randomness and distance-based anchor spacing
steps = ArrayOps.steps(9, anchors, 0, 0, 0, 0, anchor_spacing_blend=0)
# array([ 0.  ,  0.25, -1.5 , -0.75,  1.  ,  2.75,  4.5 ,  3.75,  2.  ])

# Linear interpolation with no randomness and equal anchor spacing
steps = ArrayOps.steps(9, anchors, 0, 0, 0, 0, anchor_spacing_blend=1)
# array([ 0.   ,  0.625,  0.25 , -1.625, -1.   ,  0.625,  3.75 ,  3.875,  2.   ])

# A blend of equal and distance-based anchor spacing with no randomness
steps = ArrayOps.steps(9, torch.Tensor(anchors), 0, 0, 0, 0, anchor_spacing_blend=0.5)
# tensor([ 0.   ,  0.921, -0.655, -1.625, -0.167,  1.987,  4.231,  3.81 ,  2.   ])

# Adding uniform randomness to the steps (output is random and varies between runs)
steps = ArrayOps.steps(9, anchors, 0.5, 0, 0, 0, anchor_spacing_blend=1)
# array([ 0.   ,  0.25 , -1.   , -0.895,  0.214,  1.346,  3.75 ,  3.777,  2.   ])

# Adding 2 concentration spots using Gaussian randomness (output varies between runs)
steps = ArrayOps.steps(9, anchors, 0, 0.5, 2, 0.1, anchor_spacing_blend=0)
# array([ 0.   ,  0.99 ,  0.924, -1.5  ,  1.   ,  4.5  ,  4.872,  4.025,  2.   ])

# Combining uniform and normal randomness (output varies between runs)
steps = ArrayOps.steps(9, anchors, 0.2, 0.3, 2, 0.1, anchor_spacing_blend=0.5)
# array([ 0.   ,  0.069, -1.333,  0.468,  1.538,  3.835,  4.897,  4.267,  2.   ])
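The three deterministic examples above (no uniform or normal steps) can be reproduced by a short NumPy reconstruction: place the anchors at positions that blend equal spacing with distance-based spacing, then linearly interpolate at evenly spaced sample points. This is an inference from the printed outputs, not the library's code; the helper name deterministic_steps_sketch is hypothetical, and the random uniform and normal steps are not modelled.

```python
import numpy as np

def deterministic_steps_sketch(n_steps, anchors, anchor_spacing_blend=0.5):
    """Hypothetical helper: randomness-free steps, inferred from the examples above."""
    anchors = np.asarray(anchors, dtype=float)  # assumes at least 2 anchors
    # Equal spacing: anchors at evenly spaced positions in [0, 1].
    equal = np.linspace(0.0, 1.0, len(anchors))
    # Distance-based spacing: positions proportional to the cumulative absolute
    # differences between consecutive anchor values (falls back to equal spacing
    # if all anchors are identical). Repeated consecutive anchors with a blend
    # of 0 give repeated positions, which np.interp does not handle cleanly.
    cumulative = np.concatenate([[0.0], np.cumsum(np.abs(np.diff(anchors)))])
    distance = cumulative / cumulative[-1] if cumulative[-1] > 0 else equal
    # Linear blend: 1 -> equal spacing, 0 -> distance-based spacing.
    positions = anchor_spacing_blend * equal + (1.0 - anchor_spacing_blend) * distance
    # Sample n_steps evenly spaced positions and read the piecewise-linear curve there.
    return np.interp(np.linspace(0.0, 1.0, n_steps), positions, anchors)
```

With anchors [0, 1, -2, 0, 5, 2] and nine steps, this yields the outputs listed for anchor_spacing_blend = 0, 1 and 0.5 (the last to three decimals).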

Compute the singular value decomposition of a given array or tensor.

Parameters:: x (Union[NDArray, Tensor, Sequence[Sequence[Union[float, int]]]]) – The input array or tensor.
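Returns:: The three SVD factors of the input: an orthogonal matrix (a rotation, possibly combined with a reflection), the singular values (coordinate scaling), and a second orthogonal matrix (again a rotation, possibly combined with a reflection).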
Return type:: Tuple[Union[NDArray, Tensor], Union[NDArray, Tensor], Union[NDArray, Tensor]]
Raises:: TypeError – If the input type is not supported.
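For reference, NumPy's np.linalg.svd returns the same kind of triple: two orthogonal matrices and the singular values that scale coordinates between them. The sketch checks those properties on a small matrix; whether ArrayOps.svd also returns the singular values as a 1D vector, as NumPy does, is an assumption.

```python
import numpy as np

x = np.array([[3.0, 0.0], [4.0, 5.0]])
u, s, vh = np.linalg.svd(x)

# u and vh are orthogonal: each transpose is the matrix inverse.
u_is_orthogonal = np.allclose(u.T @ u, np.eye(2))
vh_is_orthogonal = np.allclose(vh @ vh.T, np.eye(2))

# s is a 1D vector of singular values in descending order: the square roots of
# the eigenvalues of x.T @ x = [[25, 20], [20, 25]], i.e. sqrt(45) and sqrt(5).
# Multiplying the three factors back together recovers x.
reconstructed = u @ np.diag(s) @ vh
```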

static take(array: ndarray[Any, dtype[_ScalarType_co]] | Tensor | List, index: ndarray[Any, dtype[_ScalarType_co]] | Tensor | List | int, dim: int = 0) → ndarray[Any, dtype[_ScalarType_co]] | Tensor

static top_k(x: ndarray[Any, dtype[_ScalarType_co]] | Tensor | Sequence[float | int], k: int, dim: int = -1, largest=True) → Tuple[ndarray[Any, dtype[_ScalarType_co]], ndarray[Any, dtype[_ScalarType_co]]] | Tuple[Tensor, Tensor]

Compute the top k values and their indices along a specified dimension of a given array or tensor.

Parameters:

x (Union[NDArray, Tensor, Sequence[Union[float, int]]]) – The input array or tensor.
k (int) – The number of top values to return.
dim (int, optional) – The dimension along which to compute the top k values. Default is -1.
largest (bool, optional) – If True, return the k largest values; otherwise return the k smallest. Default is True.

Returns:

The top k values and their indices along the specified dimension.

Return type:

Tuple[Union[NDArray, Tensor], Union[NDArray, Tensor]]

Raises:

TypeError – If the input type is not supported.
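A minimal NumPy sketch of a top-k lookup along one dimension, for readers who want to see what the returned (values, indices) pair contains. The helper name top_k_sketch is hypothetical; the ordering of tied values may differ from the library's result.

```python
import numpy as np

def top_k_sketch(x, k, dim=-1, largest=True):
    """Hypothetical helper: k largest (or smallest) values and their indices along dim."""
    x = np.asarray(x)
    # Indices that sort x along dim in ascending order.
    order = np.argsort(x, axis=dim)
    if largest:
        # Reverse so the largest values come first.
        order = np.flip(order, axis=dim)
    # Keep the first k indices along dim, then gather the matching values.
    indices = np.take(order, np.arange(k), axis=dim)
    values = np.take_along_axis(x, indices, axis=dim)
    return values, indices
```

For example, top_k_sketch([1, 5, 3, 4], 2) returns the values [5, 4] at indices [1, 3].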

sign_language_translator.utils.arrays.adjust_vector_angle(vector_1: ndarray[Any, dtype[_ScalarType_co]] | Sequence[float], vector_2: ndarray[Any, dtype[_ScalarType_co]] | Sequence[float], scaling_factor: float, post_normalize: bool = False) → Tuple[ndarray[Any, dtype[_ScalarType_co]], ndarray[Any, dtype[_ScalarType_co]]]

sign_language_translator.utils.arrays.adjust_vector_angle(vector_1: Tensor, vector_2: Tensor, scaling_factor: float, post_normalize: bool = False) → Tuple[Tensor, Tensor]

Move a pair of vectors away from or towards each other within the same plane.

Converge or diverge a pair of vectors by decreasing or increasing the distance between them. The norm (length) of each vector is preserved.

Parameters:

vector_1 (NDArray | Tensor) – A 1D array of size n representing a word in an n-dimensional vector space.
vector_2 (NDArray | Tensor) – A 1D array of size n representing another word in an n-dimensional vector space.
scaling_factor (float) – The factor by which the difference between the vectors is enhanced or diminished, i.e. the fraction of the distance between the vectors, measured from the other vector, at which each new vector lands. sf > 1 diverges the two vectors; sf = 1 leaves them unchanged; 0.5 < sf < 1 converges them; sf = 0.5 makes both equal to their mean; sf = 0 swaps them; sf < 0.5 moves each vector past the mean to the other vector's side, farther from the mean as sf decreases.
post_normalize (bool, optional) – Make the magnitude of both output vectors equal to 1 after they have been rotated. Defaults to False.

Returns:

The two moved vectors.

Return type:

Tuple[NDArray | Tensor, NDArray | Tensor]

Notes:

# sf > 1 diverges the two vectors
# new_v1 = v2 + 2.00 * (v1 - v2) = 2 * v1 - v2     # more v1, less v2.
# new_v2 = v1 - 2.00 * (v1 - v2) = 2 * v2 - v1     # more v2, less v1.

# sf = 1 leaves the two vectors unchanged
# new_v1 = v2 + 1.00 * (v1 - v2) = v1
# new_v2 = v1 - 1.00 * (v1 - v2) = v2

# 0.5 < sf < 1 converges the two vectors
# new_v1 = v2 + 0.75 * (v1 - v2) = 0.75 * v1 + 0.25 * v2    # weighted average
# new_v2 = v1 - 0.75 * (v1 - v2) = 0.75 * v2 + 0.25 * v1    # weighted average

# sf = 0.5 makes both vectors equal to their mean
# new_v1 = v2 + 0.50 * (v1 - v2) = 0.5 * v1 + 0.5 * v2   # mean
# new_v2 = v1 - 0.50 * (v1 - v2) = 0.5 * v2 + 0.5 * v1   # mean

# sf = 0 swaps the two vectors
# new_v1 = v2 + 0.00 * (v1 - v2) = v2
# new_v2 = v1 - 0.00 * (v1 - v2) = v1

# sf < 0.5 moves each vector to the other side of their mean (example with sf = -1)
# new_v1 = v2 + (-1) * (v1 - v2) = 2 * v2 - v1    # more v2, less v1.
# new_v2 = v1 - (-1) * (v1 - v2) = 2 * v1 - v2    # more v1, less v2.
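All of the cases in the Notes reduce to new_v1 = v2 + sf * (v1 - v2) and new_v2 = v1 - sf * (v1 - v2). The minimal NumPy sketch below implements only that linear update; the helper name move_pair_sketch is hypothetical, and it deliberately omits the norm preservation described above and the post_normalize option, so it is not a drop-in replacement for adjust_vector_angle.

```python
import numpy as np

def move_pair_sketch(vector_1, vector_2, scaling_factor):
    """Hypothetical helper: the linear update from the Notes, without norm preservation."""
    v1 = np.asarray(vector_1, dtype=float)
    v2 = np.asarray(vector_2, dtype=float)
    difference = v1 - v2
    # Each new vector lands at fraction sf of the way from the other vector.
    new_v1 = v2 + scaling_factor * difference
    new_v2 = v1 - scaling_factor * difference
    return new_v1, new_v2
```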

sign_language_translator.utils.arrays.align_vectors(source_matrix: ndarray[Any, dtype[_ScalarType_co]], target_matrix: ndarray[Any, dtype[_ScalarType_co]], pre_normalize: bool = True) → ndarray[Any, dtype[_ScalarType_co]]

sign_language_translator.utils.arrays.align_vectors(source_matrix: Tensor, target_matrix: Tensor, pre_normalize: bool = True) → Tensor

Compute the orthogonal transformation that aligns the source matrix with the target matrix.

Parameters:

source_matrix (NDArray | Tensor) – A 2D array of shape (dictionary_length, embedding_dimension) containing word vectors from source model (or language).
target_matrix (NDArray | Tensor) – A 2D array of shape (dictionary_length, embedding_dimension) containing word vectors from target model (or language).
pre_normalize (bool, optional) – Whether to normalize the training vectors before SVD. Defaults to True.

Returns:

An orthogonal transformation which aligns the source language to the target language.

Return type:

NDArray | Tensor

Note

This function supports both NumPy arrays and PyTorch tensors as input. (Based on: https://github.com/babylonhealth/fastText_multilingual)
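The fastText_multilingual repository cited above learns its alignment with the orthogonal Procrustes solution: take the SVD U S V^T of source_matrix^T @ target_matrix and use W = U @ V^T, so that source_matrix @ W approximates target_matrix. The NumPy sketch below shows that technique; it is not the library's code, the helper name procrustes_sketch is hypothetical, and whether align_vectors returns W in exactly this orientation is an assumption.

```python
import numpy as np

def procrustes_sketch(source_matrix, target_matrix, pre_normalize=True):
    """Hypothetical helper: orthogonal W minimising ||source @ W - target||."""
    source = np.asarray(source_matrix, dtype=float)
    target = np.asarray(target_matrix, dtype=float)
    if pre_normalize:
        # Scale every word vector (row) to unit length.
        source = source / np.linalg.norm(source, axis=1, keepdims=True)
        target = target / np.linalg.norm(target, axis=1, keepdims=True)
    # SVD of the cross-covariance matrix; U @ Vt is the closest orthogonal map.
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt
```

A quick sanity check: if the target vectors are an exact rotation of the source vectors, target = source @ Q for some orthogonal Q, the sketch recovers Q, because an orthogonal Q leaves row norms (and therefore the normalization) unchanged.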

Perform linear interpolation on a multidimensional array or tensor along a dimension.

This function connects all consecutive values in a multidimensional array with straight lines along a specified dimension, so that intermediate values can be calculated. It takes the input array, a set of new indexes (or, alternatively, new and old coordinate values), and the dimension along which to interpolate.

Parameters:

array (NDArray[np.number] | Tensor | List) – The input array or tensor to interpolate.
new_x (Sequence[int | float] | NDArray[np.number] | Tensor) – The new index values or coordinate values at which to compute intermediate values from array. Must be 1D; the order of values does not matter. If old_x is not provided, these values are relative to the index of the data in array, i.e. [0, 1, 2, …], and negative indexes are allowed. If old_x is provided, all new_x values must lie within its bounds.
old_x (Sequence[int | float] | NDArray[np.number] | Tensor | None, optional) – The old coordinate values corresponding to the data in array along the dim. Must be 1D and strictly sorted ascending. Can contain negative numbers. If None, method assumes it to be a linear sequence starting at 0 and growing with step +1 i.e. [0, 1, 2, …] like the index of array.
dim (int, optional) – The dimension along which to perform interpolation. Default is 0.

Returns:

The result of linear interpolation along the specified dimension.

Return type:

NDArray | Tensor

Raises:

ValueError – If new_x or old_x is not 1 dimensional.

Examples

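import numpy as np
import torch
from sign_language_translator.utils.arrays import linear_interpolation

data = np.array([1, 2, 3, 5])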
new_indexes = np.array([1.5, 0.5, 2.5])
interpolated_data = linear_interpolation(data, new_indexes)
print(repr(interpolated_data))
# array([2.5, 1.5, 4. ])

old_x = [0, 4, 4.5, 5]
new_x = [0, 1, 2, 2.5, 3, 4, 5]
interpolated_data = linear_interpolation(data, new_x, old_x=old_x)
print(repr(interpolated_data))
# array([1.   , 1.25 , 1.5  , 1.625, 1.75 , 2.   , 5.   ])

positional_embedding_table = torch.randn(100, 768)  # (max_seq_len, embedding_dim)
intermediate_positions = torch.linspace(0, 99, 500)
new_embedding_table = linear_interpolation(positional_embedding_table, intermediate_positions, dim=0)
# new_embedding_table.shape -> (500, 768) # (new_max_seq_len, embedding_dim)

Note

This function supports both NumPy arrays and PyTorch tensors as input and preserves gradient.
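The core computation can be sketched in NumPy: locate each new coordinate's left neighbour in old_x with np.searchsorted, compute its fractional position between the two neighbours, and blend the neighbouring slices along dim. This is an illustrative reconstruction, not the library's code; the helper name linear_interpolation_sketch is hypothetical, it needs at least two samples along dim, and it does not implement the negative-index handling the documented function allows.

```python
import numpy as np

def linear_interpolation_sketch(array, new_x, old_x=None, dim=0):
    """Hypothetical helper: linear interpolation of `array` along `dim`."""
    array = np.asarray(array, dtype=float)
    n = array.shape[dim]
    new_x = np.asarray(new_x, dtype=float)
    # Without old_x, the coordinates are the plain indexes 0, 1, 2, ...
    old_x = np.arange(n, dtype=float) if old_x is None else np.asarray(old_x, dtype=float)
    # Left neighbour of each new_x, clipped so that left + 1 is always valid.
    left = np.clip(np.searchsorted(old_x, new_x, side="right") - 1, 0, n - 2)
    # Fractional distance from the left neighbour (0 -> left, 1 -> right).
    weight = (new_x - old_x[left]) / (old_x[left + 1] - old_x[left])
    lower = np.take(array, left, axis=dim)
    upper = np.take(array, left + 1, axis=dim)
    # Reshape the weights so they broadcast along `dim` only.
    shape = [1] * array.ndim
    shape[dim] = -1
    return lower + weight.reshape(shape) * (upper - lower)
```

On the first two examples above, this sketch gives the same values: [2.5, 1.5, 4.0] and [1.0, 1.25, 1.5, 1.625, 1.75, 2.0, 5.0].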