sign_language_translator.vision.landmarks.landmarks module

This module defines the Landmarks wrapper class, which inherits from the Sign class. It represents and manipulates landmarks data extracted from a video featuring one person. It provides methods for saving and loading landmarks data in several forms (CSV, NPY, PT, and PTH files, NumPy arrays, PyTorch tensors, and nested lists). It supports data augmentation through the .transform() method and smart concatenation along the time dimension. It can also visualize the sequence of 3D landmarks using the .show() method.

Classes:: Landmarks: A class to represent and manipulate landmarks data.

Example

from sign_language_translator.vision.landmarks.landmarks import Landmarks

landmarks = Landmarks([[[0,1,2], [1,2,3]]])  # 1 frame, 2 landmarks, 3 coordinates
landmarks.show()
landmarks.save('landmarks_file.csv')
# landmarks = Landmarks.load('landmarks_file.csv')

landmarks = Landmarks.load_asset('landmarks/pk-hfad-1_car.landmarks-mediapipe-world.csv')
print(landmarks.shape)
# (60, 75, 5)

Bases: Sign

A class to represent and manipulate landmarks data. Inherits from the Sign class.

Parameters:: sign (NDArray | Tensor | str) – It can be provided as a path to a file (csv, npy, pt, pth), a NumPy array, a PyTorch tensor, or a sequence of arrays or tensors or numbers (3D: n_frames, n_landmarks, n_features).

load(path: str, **kwargs): Class method to load landmarks data from a file and return a new Landmarks object.

save(path: str, overwrite=False, precision=4, **kwargs): Saves the landmarks data to a file.

name(): Static method which returns the string code of the sign format.

numpy(*args, **kwargs): Returns the landmarks data as a NumPy array.

torch(dtype=None, device=None): Returns the landmarks data as a PyTorch tensor.

tolist(): Returns the landmarks data as a nested list.

concatenate(objects: Iterable[Landmarks]): Concatenates a sequence of Landmarks objects along the first dimension (time) and returns a new Landmarks object.

transform(transformation: Callable): Applies a transformation function to the landmarks data.

show(**kwargs): Displays the landmarks data.

__getitem__(indices): Returns a new Landmarks object with the specified indices.

__iter__(): Initializes the iteration over the frames of the landmarks data.

__next__(): Returns the next frame of the landmarks data.

data: The landmarks data as a NumPy array or PyTorch tensor, depending on what it was initialized with.

n_frames: The number of frames or time-steps in the data.

n_landmarks: The number of landmarks in each frame of the data.

n_features: The number of features per landmark (same as n_coordinates).

shape: The shape of the landmarks data array as a tuple of integers.

ndim: The number of dimensions of the landmarks data array (should be 3).

property animation: FuncAnimation: Visualization of the landmarks on a 3D graph.

Note

For an interactive display in a Jupyter notebook, run the %matplotlib widget magic command, then run a cell with landmarks_obj.animation on its last line.

static concatenate(objects: Iterable[Landmarks]) → Landmarks

Concatenates a sequence of Landmarks objects along the time dimension (dim=0) and returns a new Landmarks object.

Parameters:: objects (Iterable[Landmarks]) – An iterable of Landmarks objects to concatenate.
Returns:: A new Landmarks object containing the data concatenated in time dimension.
Return type:: Landmarks
Raises:: ValueError – If the connections of the Landmarks objects are not the same.
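
The behaviour described above can be sketched with plain nested lists. This is only an illustration: ToyLandmarks is a hypothetical stand-in for the Landmarks class, its connections field is a plain string rather than a BaseConnections object, and any extra logic the library applies during concatenation is not modelled here.

```python
from dataclasses import dataclass


@dataclass
class ToyLandmarks:
    """Hypothetical stand-in for Landmarks that holds nested lists."""

    data: list        # shape: (n_frames, n_landmarks, n_features)
    connections: str  # stand-in for a BaseConnections object


def concatenate(objects):
    """Join the frames of all objects along the time dimension (dim=0)."""
    objects = list(objects)
    # Refuse to mix landmarks that come from different connection schemes.
    if len({obj.connections for obj in objects}) > 1:
        raise ValueError("All objects must share the same connections.")
    frames = [frame for obj in objects for frame in obj.data]
    return ToyLandmarks(frames, objects[0].connections)


a = ToyLandmarks([[[0, 1, 2]]], "mediapipe")
b = ToyLandmarks([[[3, 4, 5]], [[6, 7, 8]]], "mediapipe")
merged = concatenate([a, b])
print(len(merged.data))  # number of frames after concatenation
```

Only the number of frames grows; the landmark and feature dimensions of every input are left untouched.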

property connections: BaseConnections

Object defining the order in which landmarks are connected during display and other properties depending on the model used to extract the landmarks.

Raises:: ValueError – If this property is accessed before landmarks connections have been defined.

copy() → Landmarks

Creates a deep copy of the Landmarks object.

Returns:: A new Landmarks object with the same data and connections.
Return type:: Landmarks

property data: ndarray[Any, dtype[_ScalarType_co]] | Tensor: The landmarks data, which is a 3D array or tensor of shape (n_frames, n_landmarks, n_features).

classmethod load(path: str, **kwargs) → Landmarks

Class method to load landmarks data from a file and return a new Landmarks object. The supported file extensions are .npy and .pt, which must contain 3D arrays (n_frames, n_landmarks, n_features), and .csv, which must have n_frames rows and n_landmarks * n_features columns.

The header row in .csv is optional if the filename contains the name of a supported embedding model (see load_asset function for example models). The columns in the .csv are expected to be in the format: [<axis-letter><landmark-number>,…] (e.g. x0, y0, z0, x1, y1, z1, …, xn, yn, zn). Possible axis-letters: x, y, z, a-w, aa-zz, … (only the first 3 are required to be in that order).

Parameters:: path (str) – The file path to load the data from.
Returns:: A new Landmarks object containing the loaded data.
Return type:: Landmarks
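
To make the CSV layout concrete, here is a minimal sketch that regroups such a file into a (n_frames, n_landmarks, n_features) nested list. It is not the library's loader: csv_to_frames is a hypothetical helper, it assumes a header row is present, and it assumes the columns are already ordered landmark by landmark.

```python
import csv
import io
import re


def csv_to_frames(csv_text):
    """Parse CSV text with an '<axis-letter><landmark-number>' header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]

    # Split each column name such as 'x12' into axis letters and landmark number.
    columns = [re.fullmatch(r"([a-z]+)(\d+)", name).groups() for name in header]
    n_landmarks = max(int(number) for _, number in columns) + 1
    n_features = len(header) // n_landmarks

    frames = []
    for row in body:
        values = [float(cell) for cell in row]
        # Regroup the flat row into n_landmarks chunks of n_features values.
        frames.append(
            [values[i * n_features:(i + 1) * n_features] for i in range(n_landmarks)]
        )
    return frames


frames = csv_to_frames("x0,y0,z0,x1,y1,z1\n0,1,2,1,2,3\n")
print(len(frames), len(frames[0]), len(frames[0][0]))  # n_frames, n_landmarks, n_features
```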

classmethod load_asset(label: str, archive_name: str | None = None, overwrite=False, progress_bar=True, leave=True, **kwargs) → Landmarks

Class method to load a landmarks file from a dataset archive, which is downloaded automatically the first time it is needed, and return a new Landmarks object.

Parameters:

label (str) – The filename of the landmarks asset to load. ‘landmarks/’ is prepended to the label if it does not start with it. An example is ‘landmarks/pk-hfad-1_airport.landmarks-mediapipe.csv’ for the embedding of a dictionary video. The general syntax is landmarks/country-organization-number_text[_person_camera].landmarks-model.extension.
archive_name (Optional[str], optional) – The name of the archive which contains the landmarks asset. If None, the archive name is inferred from the label. An example is datasets/pk-hfad-1.landmarks-mediapipe-csv.zip. General syntax is datasets/country-organization-number[_person_camera].landmarks-model-extension.zip. Defaults to None.
overwrite (bool, optional) – Whether to overwrite the landmarks asset if it is already extracted. Defaults to False.
progress_bar (bool, optional) – Whether to display a progress bar while downloading the archive or extracting the asset. Defaults to True.
leave (bool, optional) – Whether to leave the progress bar after the operation is complete. Defaults to True.
**kwargs – Additional keyword arguments to be passed to the Landmarks constructor.

Raises:

FileNotFoundError – If no landmarks assets are found for the given label.

Warns:

UserWarning – If multiple landmarks assets match the given label, in which case only the first asset is used.

Returns:

An instance of the Landmarks class representing the dataset video embedding that matched the label.

Return type:

Landmarks

Example

import sign_language_translator as slt

# Load a dictionary video's landmark embedding asset
landmarks = slt.Landmarks.load_asset("pk-hfad-1_airplane.landmarks-mediapipe.csv")

# Load a replication video's landmarks from the built-in datasets
landmarks = slt.Landmarks.load_asset("landmarks/pk-hfad-1_airplane_dm0001_front.landmarks-mediapipe.csv", archive_name="datasets/pk-hfad-1_dm0001_front.landmarks-mediapipe-csv.zip")
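
The label and archive naming syntax documented above can be sketched as a small parser. This is an illustration of the naming convention only, not the library's lookup code: normalize_label and infer_archive_name are hypothetical helpers, and the regular expression assumes that the person part looks like letters followed by digits (as in dm0001) and the camera part is a single word.

```python
import re

LABEL_PATTERN = re.compile(
    r"^landmarks/(?P<source>[a-z]+-[a-z]+-\d+)"  # country-organization-number
    r"_(?P<text>.+?)"                             # text label of the sign
    r"(?P<replica>_[a-z]+\d+_[a-z]+)?"            # optional _person_camera
    r"\.landmarks-(?P<model>[\w-]+)\.(?P<ext>\w+)$"
)


def normalize_label(label):
    """Prepend 'landmarks/' when the label does not already start with it."""
    return label if label.startswith("landmarks/") else "landmarks/" + label


def infer_archive_name(label):
    """Build the archive name that follows the documented general syntax."""
    match = LABEL_PATTERN.match(normalize_label(label))
    if match is None:
        raise ValueError(f"Unrecognized label: {label!r}")
    replica = match["replica"] or ""
    return (
        f"datasets/{match['source']}{replica}"
        f".landmarks-{match['model']}-{match['ext']}.zip"
    )


print(infer_archive_name("pk-hfad-1_airport.landmarks-mediapipe.csv"))
print(infer_archive_name("landmarks/pk-hfad-1_airplane_dm0001_front.landmarks-mediapipe.csv"))
```

Both calls reproduce the archive names quoted in the parameter descriptions and the example above; labels that deviate from the assumed person and camera format would need a different pattern.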

property n_coordinates: int: The number of axes/coordinates (features) for each landmark.

property n_features: int: The number of features (coordinates) for each landmark.

property n_frames: int: The number of frames or time-steps in the landmarks data object.

property n_landmarks: int: The number of landmarks in each frame.

static name() → str: Returns the string code of the sign format.

property ndim: int: The number of dimensions of the landmarks data array (should be 3).

new_animation(title: str | None = '{frame_number}', style: Literal['dark_background', 'default'] = 'default', azimuth: float = 20, elevation: float = 15, roll: float = 0, azimuth_delta: float = 0, elevation_delta: float = 0, roll_delta: float = 0, scatter_size: float = 2, figure_scale: float | None = 5, interval: float | int = 37, repeat_delay: float | int = 200, blit: bool = True) → FuncAnimation

Creates a new 3D animation object of the landmarks.

Parameters:

title (Optional[str]) – The title of the animation. Can include the placeholder “{frame_number}” to display the frame number. Defaults to “{frame_number}”.
style (Literal["dark_background", "default"]) – The color theme of the animation. Defaults to “default”.
azimuth (float) – The azimuth angle (rotation around the vertical axis) of the camera view point. Defaults to 20.
elevation (float) – The elevation angle (amount of rise from the horizontal plane) of the camera view point. Defaults to 15.
roll (float) – The roll angle (rotation around the line of sight) of the camera view point. Defaults to 0.
azimuth_delta (float) – The change in azimuth angle per frame. Defaults to 0.
elevation_delta (float) – The change in elevation angle per frame. Defaults to 0.
roll_delta (float) – The change in roll angle per frame. Defaults to 0.
scatter_size (float) – The size of the scatter points. Defaults to 2.
figure_scale (Optional[float]) – The size of the figure. Defaults to 5.
interval (Union[float, int]) – The interval between frames in milliseconds. Defaults to 37.
repeat_delay (Union[float, int]) – The delay between animation replays in milliseconds. Defaults to 200.
blit (bool) – Whether to use blitting for faster updates (non-changing graphic elements are rendered once into a background image). Defaults to True.

Returns:

The created animation.

Return type:

FuncAnimation
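
As a rough illustration of how the starting angles and per-frame deltas combine, the sketch below computes a camera orientation for each frame. It assumes the deltas accumulate linearly from the initial angles, which is a reading of the parameter descriptions rather than a statement about the implementation; camera_angles is a hypothetical helper.

```python
def camera_angles(n_frames, azimuth=20.0, elevation=15.0, roll=0.0,
                  azimuth_delta=0.0, elevation_delta=0.0, roll_delta=0.0):
    """Return one (azimuth, elevation, roll) tuple per frame."""
    return [
        (
            azimuth + i * azimuth_delta,
            elevation + i * elevation_delta,
            roll + i * roll_delta,
        )
        for i in range(n_frames)
    ]


# Rotate the view by one degree around the vertical axis on every frame.
angles = camera_angles(60, azimuth_delta=1.0)
print(angles[0], angles[-1])

# The default interval of 37 ms between frames is roughly 27 frames per second.
frames_per_second = 1000 / 37
print(round(frames_per_second))
```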

numpy(*args, **kwargs) → ndarray[Any, dtype[_ScalarType_co]]

Returns the landmarks data as a numpy array. Additional arguments are passed to the numpy.array constructor.

Returns:: The sign data as a NumPy array.
Return type:: NDArray

Example:

import sign_language_translator as slt

landmarks = slt.Landmarks([[[0,1,2], [1,2,3]]])
landmarks.numpy()
# array([[[0, 1, 2], [1, 2, 3]]])

save(path: str, overwrite=False, precision=4, **kwargs) → None

Saves the current object’s data to a file. Supported formats include .npy, .pt/.pth (which contain 3D data) and .csv which flattens each frame and puts it into a separate row. CSV files also contain a header with letters representing the coordinate axes and numbers identifying the landmark.

Parameters:

path (str) – The file path to save the data to.
overwrite (bool, optional) – Whether to overwrite the file if it already exists. Defaults to False.
precision (int, optional) – The number of decimal places for saving floating-point values in CSV. Defaults to 4.

Raises:

FileExistsError – If the file already exists and overwrite is False.
ValueError – If the file format is not supported.
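
The CSV layout described above (one row per frame, a header of axis letters and landmark numbers, values rounded to the requested precision) can be sketched as follows. This is not the library's writer: frames_to_csv and axis_letters are hypothetical helpers, and the letter order beyond x, y, z follows the naming given for the load method (a-w, then aa-zz) as an assumption.

```python
import csv
import io
from itertools import product
from string import ascii_lowercase


def axis_letters(n_features):
    """Return n_features axis names: x, y, z, then a-w, then aa-zz."""
    singles = ["x", "y", "z"] + list(ascii_lowercase[:23])  # a..w
    doubles = ["".join(pair) for pair in product(ascii_lowercase, repeat=2)]
    return (singles + doubles)[:n_features]


def frames_to_csv(frames, precision=4):
    """Flatten (n_frames, n_landmarks, n_features) nested lists into CSV text."""
    n_landmarks = len(frames[0])
    n_features = len(frames[0][0])
    letters = axis_letters(n_features)
    header = [f"{axis}{i}" for i in range(n_landmarks) for axis in letters]

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for frame in frames:
        # One row per frame: all coordinates of landmark 0, then landmark 1, ...
        writer.writerow(
            [round(value, precision) for landmark in frame for value in landmark]
        )
    return buffer.getvalue()


csv_text = frames_to_csv([[[0, 1, 2], [1, 2, 3]], [[0.123456, 1, 2], [1, 2, 3]]])
print(csv_text)
```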

save_animation(path, overwrite=True, writer: str | None = None, **kwargs) → None

Save the video animation of the landmarks data to a file.

Parameters:

path (str) – The path to save the animation file.
overwrite (bool, optional) – Whether to overwrite the file if it already exists. Defaults to True.
writer (Optional[str], optional) – The name of the matplotlib writer to use for saving the animation. Defaults to None.
**kwargs – Additional keyword arguments to be passed to the new_animation method.

save_frames_grid(path: str, rows: int = 3, columns: int = 5, overwrite=True, **kwargs) → None

Save an image file of a grid of 3D visualizations of the landmarks data.

Parameters:

path (str) – The path to save the image.
rows (int, optional) – The number of rows in the grid. Defaults to 3.
columns (int, optional) – The number of columns in the grid. Defaults to 5.
overwrite (bool, optional) – Whether to overwrite the file if it already exists. Defaults to True.
**kwargs – Additional keyword arguments to customize the grid passed to the slt.vision.landmarks.MatPlot3D.frames_grid function.

property shape: Tuple[int, ...]: The number of elements in each of the data array’s dimensions, e.g. (n_frames, n_landmarks, n_features).

show(player: Literal['jshtml', 'html5'] = 'jshtml', **kwargs) → None

Displays the landmarks data as a 3D animation in a Jupyter notebook or as a video in a separate window if run from the terminal.

Parameters:

player (Literal['jshtml', 'html5'], optional) – The visualization tool to use for displaying the animation. Defaults to “jshtml”.
**kwargs – Additional keyword arguments to pass to the new_animation method. See its docstring for details.

show_frames_grid(rows=3, columns=5, **kwargs)

Displays a grid of frames equally spaced in time drawn as 3D scatter plots & lines connecting the points.

Parameters:

rows (int) – The number of rows in the grid. Default is 3.
columns (int) – The number of columns in the grid. Default is 5.
**kwargs – Additional keyword arguments to be passed to the slt.vision.landmarks.MatPlot3D.frames_grid function.
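
One way to pick frames that are equally spaced in time for a rows × columns grid is sketched below. The exact selection rule used by the library is not documented here, so grid_frame_indices is a hypothetical helper that simply spreads the grid cells from the first frame to the last using integer arithmetic.

```python
def grid_frame_indices(n_frames, rows=3, columns=5):
    """Return up to rows * columns frame indices spread evenly over the clip."""
    n_cells = min(rows * columns, n_frames)
    if n_cells <= 1:
        return [0] * n_cells
    # Map cell i to a frame so that the first and last frames are both included.
    return [(i * (n_frames - 1)) // (n_cells - 1) for i in range(n_cells)]


indices = grid_frame_indices(60)  # a 60-frame clip on the default 3 x 5 grid
print(indices)
```

When the clip has fewer frames than grid cells, this sketch returns every frame once instead of repeating frames.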

tolist() → List[List[List[float | int]]]

Returns the landmarks data as a 3D nested list of numbers.

Returns:: The sign data as a nested list.
Return type:: List[List[List[Union[float, int]]]]

torch(dtype: dtype | None = None, device: device | str | None = None) → Tensor

Returns the landmarks data as a PyTorch tensor.

Parameters:

dtype (torch.dtype, optional) – The desired data type of the tensor. Defaults to None.
device (Union[torch.device, str], optional) – The desired device for the tensor. Defaults to None.

Returns:

The sign data as a PyTorch tensor.

Return type:

torch.Tensor

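transform(transformation: Callable[[ndarray[Any, dtype[_ScalarType_co]]], ndarray[Any, dtype[_ScalarType_co]]] | Callable[[Tensor], Tensor], inplace=False) → Landmarks: Applies a transformation function to the landmarks data to change the appearance of the sign (e.g. for data augmentation). The callable receives the data as a NumPy array or a PyTorch tensor and returns data of the same type.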