sign_language_translator.vision.landmarks package

Module contents

class sign_language_translator.vision.landmarks.BaseConnections

Bases: ABC

A class containing information about the connections between landmarks generated from various models.

abstract property connections: List[Connection]: indexes of landmarks that are connected

property line_colors: List[Tuple[int, int, int]]: list of colors for each connection

property line_indexes: List[Sequence[int]]: list of index sequences, each of which is connected with a single line

property line_labels: List[str]: list of labels for each connection

property matplot3d_config: Dict[str, Any]: Configuration arguments for 3D matplotlib plot

abstract property n_features: int: Total number of features per landmark

abstract property n_landmarks: int: Total number of landmarks

abstract static name() → str: The name of the connection format

class sign_language_translator.vision.landmarks.Landmarks

Bases: Sign

A class to represent and manipulate landmarks data. Inherits from the Sign class.

Parameters: sign (NDArray | Tensor | str) – It can be provided as a path to a file (csv, npy, pt, pth), a NumPy array, a PyTorch tensor, or a sequence of arrays, tensors, or numbers (3D: n_frames, n_landmarks, n_features).

load(path: str, **kwargs): Class method to load landmarks data from a file and return a new Landmarks object.

save(path: str, overwrite=False, precision=4, **kwargs): Saves the landmarks data to a file.

name(): Static method which returns the string code of the sign format.

numpy(*args, **kwargs): Returns the landmarks data as a NumPy array.

torch(dtype=None, device=None): Returns the landmarks data as a PyTorch tensor.

tolist(): Returns the landmarks data as a nested list.

concatenate(objects: Iterable[Landmarks]): Concatenates a sequence of Landmarks objects along the first dimension (time) and returns a new Landmarks object.

transform(transformation: Callable): Applies a transformation function to the landmarks data.

show(**kwargs): Displays the landmarks data.

__getitem__(indices): Returns a new Landmarks object with the specified indices.

__iter__(): Initializes the iteration over the frames of the landmarks data.

__next__(): Returns the next frame of the landmarks data.

data: The landmarks data as a NumPy array or PyTorch tensor depending upon what it was initialized with.

n_frames: The number of frames or time-steps in the data.

n_landmarks: The number of landmarks in each frame of the data.

n_features: The number of features per landmark (same as n_coordinates).

shape: The shape of the landmarks data array as a tuple of integers.

ndim: The number of dimensions of the landmarks data array (should be 3).

property animation: FuncAnimation: Visualization of the landmarks on a 3D graph.

Note

For interactive display in a Jupyter notebook, use the %matplotlib widget magic command and then run a cell with landmarks_obj.animation on the last line.

static concatenate(objects: Iterable[Landmarks]) → Landmarks

Concatenates a sequence of Landmarks objects along the time dimension (dim=0) and returns a new Landmarks object.

Parameters: objects (Iterable[Landmarks]) – An iterable of Landmarks objects to concatenate.
Returns: A new Landmarks object containing the data concatenated in time dimension.
Return type: Landmarks
Raises: ValueError – If the connections of the Landmarks objects are not the same.
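
The time-axis concatenation described above can be sketched with plain NumPy arrays. This is only an illustration under stated assumptions: `concatenate_in_time` is a hypothetical helper, not part of the package, and the shape comparison merely stands in for the connections check that raises ValueError.

```python
import numpy as np


def concatenate_in_time(clips):
    """Join arrays of shape (n_frames, n_landmarks, n_features) along axis 0."""
    clips = [np.asarray(clip) for clip in clips]
    # Every clip must share the same (n_landmarks, n_features) layout.
    layouts = {clip.shape[1:] for clip in clips}
    if len(layouts) != 1:
        # Stand-in for the "connections are not the same" error.
        raise ValueError(f"Incompatible landmark layouts: {sorted(layouts)}")
    return np.concatenate(clips, axis=0)


first = np.zeros((4, 3, 2))   # 4 frames, 3 landmarks, 2 features
second = np.ones((6, 3, 2))   # 6 frames, same layout
joined = concatenate_in_time([first, second])
print(joined.shape)  # (10, 3, 2)
```

Only the frame count changes; the landmark and feature dimensions are preserved.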

property connections: BaseConnections

Object defining the order in which landmarks are connected during display and other properties depending on the model used to extract the landmarks.

Raises: ValueError – If this property is accessed before landmarks connections have been defined.

copy() → Landmarks

Creates a deep copy of the Landmarks object.

Returns: A new Landmarks object with the same data and connections.
Return type: Landmarks

property data: ndarray[Any, dtype[_ScalarType_co]] | Tensor: The landmarks data which is a 3D array or tensor of shape (n_frames, n_landmarks, n_features).

classmethod load(path: str, **kwargs) → Landmarks

Class method to load landmarks data from a file and return a new Landmarks object. The supported file extensions are .npy and .pt, which must contain 3D arrays (n_frames, n_landmarks, n_features), and .csv, which must have n_frames rows and n_landmarks * n_features columns.

The header row in .csv is optional if the filename contains the name of a supported embedding model (see load_asset function for example models). The columns in the .csv are expected to be in the format: [<axis-letter><landmark-number>,…] (e.g. x0, y0, z0, x1, y1, z1, …, xn, yn, zn). Possible axis-letters: x, y, z, a-w, aa-zz, … (only the first 3 are required to be in that order).

Parameters: path (str) – The file path to load the data from.
Returns: A new Landmarks object containing the loaded data.
Return type: Landmarks
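
The CSV column convention described above can be sketched with the standard library alone. The helper names `make_header` and `rows_to_frames` are hypothetical and not part of the package; the sketch assumes a header row is present and covers only the x, y, z, a-w axis letters.

```python
import csv
import io
from string import ascii_lowercase

# x, y, z come first, followed by a..w for any extra features.
AXIS_LETTERS = ["x", "y", "z"] + list(ascii_lowercase[:23])


def make_header(n_landmarks, n_features):
    """Column names in the order x0, y0, z0, x1, y1, z1, ..."""
    axes = AXIS_LETTERS[:n_features]
    return [f"{axis}{i}" for i in range(n_landmarks) for axis in axes]


def rows_to_frames(rows, n_features):
    """Split each flat row into landmarks of n_features values."""
    return [
        [row[i:i + n_features] for i in range(0, len(row), n_features)]
        for row in rows
    ]


csv_text = "x0,y0,z0,x1,y1,z1\n0.1,0.2,0.3,0.4,0.5,0.6\n1.1,1.2,1.3,1.4,1.5,1.6\n"
reader = csv.reader(io.StringIO(csv_text))
header = next(reader)
rows = [[float(value) for value in row] for row in reader]

# Count distinct landmark numbers by stripping the axis letters.
n_landmarks = len({name.lstrip(ascii_lowercase) for name in header})
n_features = len(header) // n_landmarks
frames = rows_to_frames(rows, n_features)
print(len(frames), n_landmarks, n_features)  # 2 2 3
```

Each CSV row becomes one frame, so the nested result has the (n_frames, n_landmarks, n_features) layout expected by the 3D formats.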

classmethod load_asset(label: str, archive_name: str | None = None, overwrite=False, progress_bar=True, leave=True, **kwargs) → Landmarks

Class method to load a landmarks file from a one-time-auto-downloaded dataset archive and return a new Landmarks object.

Parameters:

label (str) – The filename of the landmarks asset to load. ‘landmarks/’ is prepended to the label if it does not start with it. An example is ‘landmarks/pk-hfad-1_airport.landmarks-mediapipe.csv’ for the embedding of a dictionary video. General syntax is landmarks/country-organization-number_text[_person_camera].landmarks-model.extension.
archive_name (Optional[str], optional) – The name of the archive which contains the landmarks asset. If None, the archive name is inferred from the label. An example is datasets/pk-hfad-1.landmarks-mediapipe-csv.zip. General syntax is datasets/country-organization-number[_person_camera].landmarks-model-extension.zip. Defaults to None.
overwrite (bool, optional) – Whether to overwrite the landmarks asset if it is already extracted. Defaults to False.
progress_bar (bool, optional) – Whether to display a progress bar while downloading the archive or extracting the asset. Defaults to True.
leave (bool, optional) – Whether to leave the progress bar after the operation is complete. Defaults to True.
**kwargs – Additional keyword arguments to be passed to the Landmarks constructor.

Raises:

FileNotFoundError – If no landmarks assets are found for the given label.

Warns:

UserWarning – If multiple landmarks assets match the given label and only the first asset is used.

Returns:

An instance of the Landmarks class representing the dataset video embedding that matched the label.

Return type:

Landmarks

Example

import sign_language_translator as slt

# Load a dictionary video's landmark embedding asset
landmarks = slt.Landmarks.load_asset("pk-hfad-1_airplane.landmarks-mediapipe.csv")

# Load a replication video's landmarks from the built-in datasets
landmarks = slt.Landmarks.load_asset("landmarks/pk-hfad-1_airplane_dm0001_front.landmarks-mediapipe.csv", archive_name="datasets/pk-hfad-1_dm0001_front.landmarks-mediapipe-csv.zip")
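
The label and archive naming syntax documented above can be illustrated with a small parser. This is a sketch of how an archive name could be derived from a label, not the package's actual inference logic: `LABEL_PATTERN`, `normalize_label` and `guess_archive_name` are hypothetical names, and the pattern assumes that the text, person and camera parts contain no underscores or dots.

```python
import re

LABEL_PATTERN = re.compile(
    r"^(?:landmarks/)?"
    r"(?P<dataset>[a-z]+-[a-z0-9]+-\d+)"             # country-organization-number
    r"_(?P<text>[^_.]+)"                             # text
    r"(?:_(?P<person>[^_.]+)_(?P<camera>[^_.]+))?"   # optional _person_camera
    r"\.landmarks-(?P<model>[^.]+)"                  # .landmarks-model
    r"\.(?P<extension>\w+)$"                         # .extension
)


def normalize_label(label):
    """Prepend 'landmarks/' when the label does not start with it."""
    return label if label.startswith("landmarks/") else "landmarks/" + label


def guess_archive_name(label):
    """Build datasets/<dataset>[_person_camera].landmarks-<model>-<ext>.zip."""
    match = LABEL_PATTERN.match(label)
    if match is None:
        raise ValueError(f"Label does not follow the documented syntax: {label!r}")
    parts = match.groupdict()
    signer = f"_{parts['person']}_{parts['camera']}" if parts["person"] else ""
    return (
        f"datasets/{parts['dataset']}{signer}"
        f".landmarks-{parts['model']}-{parts['extension']}.zip"
    )


print(normalize_label("pk-hfad-1_airplane.landmarks-mediapipe.csv"))
# landmarks/pk-hfad-1_airplane.landmarks-mediapipe.csv
print(guess_archive_name("pk-hfad-1_airplane.landmarks-mediapipe.csv"))
# datasets/pk-hfad-1.landmarks-mediapipe-csv.zip
```

Applied to the replication label from the example above, the same function yields the archive name that the example passes explicitly.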

property n_coordinates: int: The number of axes/coordinates (features) for each landmark.

property n_features: int: The number of features (coordinates) for each landmark.

property n_frames: int: The number of frames or time-steps in the landmarks data object.

property n_landmarks: int: The number of landmarks in each frame.

static name() → str: the string code of the sign format

property ndim: int: The number of dimensions of the landmarks data array (should be 3).

new_animation(title: str | None = '{frame_number}', style: Literal['dark_background', 'default'] = 'default', azimuth: float = 20, elevation: float = 15, roll: float = 0, azimuth_delta: float = 0, elevation_delta: float = 0, roll_delta: float = 0, scatter_size: float = 2, figure_scale: float | None = 5, interval: float | int = 37, repeat_delay: float | int = 200, blit: bool = True) → FuncAnimation

Creates a new 3D animation object of the landmarks.

Parameters:

title (Optional[str]) – The title of the animation. Can include the placeholder “{frame_number}” to display the frame number. Defaults to “{frame_number}”.
style (Literal["dark_background", "default"]) – The color theme of the animation. Defaults to “default”.
azimuth (float) – The azimuth angle (rotation around the vertical axis) of the camera view point. Defaults to 20.
elevation (float) – The elevation angle (amount of rise from the horizontal plane) of the camera view point. Defaults to 15.
roll (float) – The roll angle (rotation around the line of sight) of the camera view point. Defaults to 0.
azimuth_delta (float) – The change in azimuth angle per frame. Defaults to 0.
elevation_delta (float) – The change in elevation angle per frame. Defaults to 0.
roll_delta (float) – The change in roll angle per frame. Defaults to 0.
scatter_size (float) – The size of the scatter points. Defaults to 2.
figure_scale (Optional[float]) – The size of the figure. Defaults to 5.
interval (Union[float, int]) – The interval between frames in milliseconds. Defaults to 37.
repeat_delay (Union[float, int]) – The delay between animation replays in milliseconds. Defaults to 200.
blit (bool) – Whether to use blitting for faster updates (non-changing graphic elements are rendered once into a background image). Defaults to True.

Returns:

The created animation.

Return type:

FuncAnimation
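
To make the title placeholder and the per-frame angle deltas concrete, the sketch below computes the title and camera angles for a few frames. It assumes the deltas accumulate linearly from the initial angles, which is one plausible reading of "change per frame"; `frame_view` is a hypothetical helper and not part of the package.

```python
def frame_view(frame_number, title="{frame_number}", azimuth=20.0, elevation=15.0,
               roll=0.0, azimuth_delta=0.0, elevation_delta=0.0, roll_delta=0.0):
    """Title and camera angles (degrees) for one frame, assuming linear drift."""
    return {
        # The placeholder is filled in with the current frame number.
        "title": None if title is None else title.format(frame_number=frame_number),
        "azimuth": azimuth + frame_number * azimuth_delta,
        "elevation": elevation + frame_number * elevation_delta,
        "roll": roll + frame_number * roll_delta,
    }


views = [frame_view(i, title="frame {frame_number}", azimuth_delta=2.0) for i in range(3)]
print(views[2])
# {'title': 'frame 2', 'azimuth': 24.0, 'elevation': 15.0, 'roll': 0.0}
```

With the default interval of 37 milliseconds, playback runs at roughly 1000 / 37 ≈ 27 frames per second.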

numpy(*args, **kwargs) → ndarray[Any, dtype[_ScalarType_co]]

Returns the landmarks data as a numpy array. Additional arguments are passed to the numpy.array constructor.

Returns: The sign data as a NumPy array.
Return type: NDArray

Example:

import sign_language_translator as slt

landmarks = slt.Landmarks([[[0,1,2], [1,2,3]]])
landmarks.numpy()
# array([[[0, 1, 2], [1, 2, 3]]])

save(path: str, overwrite=False, precision=4, **kwargs) → None

Saves the current object’s data to a file. Supported formats include .npy, .pt/.pth (which contain 3D data) and .csv which flattens each frame and puts it into a separate row. CSV files also contain a header with letters representing the coordinate axes and numbers identifying the landmark.

Parameters:

path (str) – The file path to save the data to.
overwrite (bool, optional) – Whether to overwrite the file if it already exists. Defaults to False.
precision (int, optional) – The number of decimal places for saving floating-point values in CSV. Defaults to 4.

Raises:

FileExistsError – If the file already exists and overwrite is False.
ValueError – If the file format is not supported.
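
The CSV layout described above (one flattened frame per row, a header of axis letters and landmark numbers, and limited decimal precision) can be sketched as follows. `frames_to_csv` is a hypothetical helper that writes to a string rather than a file; it handles at most three features, and using `round` for the precision is an assumption about how the decimal places are limited.

```python
import csv
import io


def frames_to_csv(frames, precision=4):
    """Serialize nested frames (n_frames, n_landmarks, n_features) to CSV text."""
    n_landmarks = len(frames[0])
    n_features = len(frames[0][0])
    if n_features > 3:
        raise ValueError("This sketch only names the x, y and z axes.")
    axes = "xyz"[:n_features]
    header = [f"{axis}{i}" for i in range(n_landmarks) for axis in axes]

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for frame in frames:
        # Flatten one frame into a single row of rounded values.
        writer.writerow([round(value, precision) for landmark in frame for value in landmark])
    return buffer.getvalue()


text = frames_to_csv([[[0.123456, 1.0, 2.0], [3.0, 4.0, 5.5]]])
print(text)
# x0,y0,z0,x1,y1,z1
# 0.1235,1.0,2.0,3.0,4.0,5.5
```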

save_animation(path, overwrite=True, writer: str | None = None, **kwargs) → None

Save the video animation of the landmarks data to a file.

Parameters:

path (str) – The path to save the animation file.
overwrite (bool, optional) – Whether to overwrite the file if it already exists. Defaults to True.
writer (Optional[str], optional) – The name of the matplotlib writer to use for saving the animation. Defaults to None.
**kwargs – Additional keyword arguments to be passed to the new_animation method.

save_frames_grid(path: str, rows: int = 3, columns: int = 5, overwrite=True, **kwargs) → None

Save an image file of a grid of 3D visualizations of the landmarks data.

Parameters:

path (str) – The path to save the image.
rows (int, optional) – The number of rows in the grid. Defaults to 3.
columns (int, optional) – The number of columns in the grid. Defaults to 5.
overwrite (bool, optional) – Whether to overwrite the file if it already exists. Defaults to True.
**kwargs – Additional keyword arguments to customize the grid passed to the slt.vision.landmarks.MatPlot3D.frames_grid function.

property shape: Tuple[int, ...]: The number of elements in each of the data array’s dimensions, e.g. (n_frames, n_landmarks, n_features).

show(player: Literal['jshtml', 'html5'] = 'jshtml', **kwargs) → None

Displays the landmarks data as a 3D animation in a Jupyter notebook or as a video in a separate window if run from the terminal.

Parameters:

player (Literal['jshtml', 'html5'], optional) – The visualization tool to use for displaying the animation. Defaults to “jshtml”.
**kwargs – Additional keyword arguments to pass to the new_animation method. See its docstring for details.

show_frames_grid(rows=3, columns=5, **kwargs)

Displays a grid of frames equally spaced in time drawn as 3D scatter plots & lines connecting the points.

Parameters:

rows (int) – The number of rows in the grid. Default is 3.
columns (int) – The number of columns in the grid. Default is 5.
**kwargs – Additional keyword arguments to be passed to the slt.vision.landmarks.MatPlot3D.frames_grid function.
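
One way to pick frames "equally spaced in time" for a rows × columns grid is shown below. The package's exact selection rule is not documented here, so this is only an illustrative sketch; `grid_frame_indexes` is a hypothetical helper.

```python
def grid_frame_indexes(n_frames, rows=3, columns=5):
    """Indexes of rows * columns frames spread evenly from first to last."""
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    n_cells = rows * columns
    if n_cells == 1:
        return [0]
    step = (n_frames - 1) / (n_cells - 1)
    return [round(i * step) for i in range(n_cells)]


print(grid_frame_indexes(29))
# [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
```

When the clip has fewer frames than grid cells, this rule repeats some indexes, so a smaller grid is the better choice for short clips.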

tolist() → List[List[List[float | int]]]

Returns the landmarks data as a 3D nested list of numbers.

Returns: The sign data as a nested list.
Return type: List[List[List[Union[float, int]]]]

torch(dtype: dtype | None = None, device: device | str | None = None) → Tensor

Returns the landmarks data as a PyTorch tensor.

Parameters:

dtype (torch.dtype, optional) – The desired data type of the tensor. Defaults to None.
device (Union[torch.device, str], optional) – The desired device for the tensor. Defaults to None.

Returns:

The sign data as a PyTorch tensor.

Return type:

torch.Tensor

transform(transformation: Callable[[ndarray[Any, dtype[_ScalarType_co]]], ndarray[Any, dtype[_ScalarType_co]]] | Callable[[Tensor], Tensor], inplace=False) → Landmarks: Applies a transformation function to the landmarks data to change the appearance of the sign.
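
A transformation is any callable that maps an array (or tensor) to an array of the same kind. The sketch below defines one such callable on a plain NumPy array; `mirror_x` is a hypothetical example, and treating the first feature as the x coordinate is an assumption.

```python
import numpy as np


def mirror_x(data):
    """Flip the sign of the first feature (assumed to be x) of every landmark."""
    mirrored = data.copy()  # leave the caller's array untouched
    mirrored[..., 0] *= -1
    return mirrored


# One frame with two landmarks and three features each.
frames = np.array([[[1.0, 2.0, 3.0], [-4.0, 5.0, 6.0]]])
print(mirror_x(frames)[0, :, 0].tolist())  # [-1.0, 4.0]
```

Going by the signature above, such a function would be passed as `landmarks.transform(mirror_x)`.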

class sign_language_translator.vision.landmarks.MatPlot3D

Bases: object

classmethod animate(frames: Sequence[Sequence[Tuple[float, float, float]]] | ndarray[Any, dtype[_ScalarType_co]], line_indexes: Sequence[Sequence[int]] | None = None, line_colors: Sequence[Tuple[float, float, float] | None] = (), line_labels: Sequence[str | None] = (), scatter_color: Tuple[float, float, float] = (0, 0, 0), scatter_size: float = 2, title: str | None = '{frame_number}', vertical_axis: Literal['x', 'y', 'z'] = 'z', ticks_scale: float | None = None, azimuth: float = 20, elevation: float = 15, roll: float = 0, azimuth_delta: float = 0, elevation_delta: float = 0, roll_delta: float = 0, invert_x: bool = False, invert_y: bool = False, invert_z: bool = False, show_grid: bool = True, show_axis: bool = True, figure_scale: float | None = None, style: Literal['dark_background', 'default'] = 'default', layout: Literal['constrained', 'compressed', 'tight', 'none'] = 'none', interval: float | int = 37, repeat_delay: float | int = 100, blit: bool = True) → FuncAnimation

Animates the given frames representing 3D coordinates with 3D scatter plot and lines connecting those points.

Parameters:

frames (Union[Sequence[Sequence[Tuple[float, float, float]]], NDArray]) – The frames to animate, represented as a sequence of collection of 3D coordinates.
line_indexes (Optional[Sequence[Sequence[int]]]) – The indexes of the points in a frame to connect in lines. If not provided, connects the points in a cycle [0, 1, 2, …, n-1, 0].
line_colors (Sequence[Union[Tuple[float, float, float], None]]) – The colors of the lines in RGB format normalized to [0.0, 1.0] range. If not provided, default to a gradient of blue to pink to blue.
line_labels (Sequence[Union[str, None]]) – The labels of the lines.
scatter_color (Tuple[float, float, float]) – The color of the scatter points in RGB format normalized to [0.0, 1.0] range. Default is black.
title (Optional[str]) – The title of the animation. Can include the placeholder “{frame_number}” to display the frame number. Defaults to “{frame_number}”.
vertical_axis (Literal["x", "y", "z"]) – The vertical axis in the plot. Default is “z”.
ticks_scale (Optional[float]) – The scale of the ticks. Defaults to the nearest power of 10 under the range in data.
layout (Literal["constrained", "compressed", "tight", "none"]) – The layout of the plot. Default is “none”.
interval (Union[float, int]) – The interval between frames in milliseconds. Default is 37.
repeat_delay (Union[float, int]) – The delay between replays in milliseconds. Default is 100.
blit (bool) – Whether to use blitting for faster updates. Default is True.

Returns:

The animation object.

Return type:

FuncAnimation
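
Two of the defaults described in the parameter list above can be sketched in a few lines: the closed cycle used when line_indexes is not provided, and a tick scale derived from the data range. Both helpers are hypothetical, and taking the largest power of 10 that does not exceed the range is only one reading of "nearest power of 10 under the range in data".

```python
import math


def default_line_indexes(n_points):
    """A single closed line through every point: [0, 1, ..., n-1, 0]."""
    return [list(range(n_points)) + [0]]


def default_ticks_scale(values):
    """Largest power of 10 that does not exceed the range of the values."""
    data_range = max(values) - min(values)
    if data_range <= 0:
        return 1.0  # arbitrary fallback for constant data
    return 10.0 ** math.floor(math.log10(data_range))


print(default_line_indexes(4))          # [[0, 1, 2, 3, 0]]
print(default_ticks_scale([0.2, 3.7]))  # 1.0
print(default_ticks_scale([0, 250]))    # 100.0
```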

classmethod frames_grid(frames: Sequence[Sequence[Tuple[float, float, float]]] | ndarray[Any, dtype[_ScalarType_co]], subplots: Tuple[int, int], line_indexes: Sequence[Sequence[int]] | None = None, line_colors: Sequence[Tuple[float, float, float] | None] = (), line_labels: Sequence[str | None] = (), scatter_color: Tuple[float, float, float] = (0, 0, 0), scatter_size: float = 2, title: str | None = '{frame_number}', figure_title: str | None = None, figure_title_font_size: float = 20, vertical_axis: Literal['x', 'y', 'z'] = 'z', ticks_scale: float | None = None, azimuth: float = 20, elevation: float = 15, roll: float = 0, azimuth_delta: float = 0, elevation_delta: float = 0, roll_delta: float = 0, invert_x: bool = False, invert_y: bool = False, invert_z: bool = False, show_grid: bool = True, show_axis: bool = True, figure_scale: float | None = 4, style: Literal['dark_background', 'default'] = 'default', layout: Literal['constrained', 'compressed', 'tight', 'none'] = 'tight') → Figure

Generates a grid of frames with 3D scatter plots and lines connecting the points.

Parameters:

frames (Union[Sequence[Sequence[Tuple[float, float, float]]], NDArray]) – The frames containing the 3D coordinates of the points.
subplots (Tuple[int, int]) – The number of rows and columns in the figure. Each cell is a 3D plot containing one frame.
line_indexes (Optional[Sequence[Sequence[int]]]) – The indexes of points to be connected with lines.
line_colors (Sequence[Union[Tuple[float, float, float], None]]) – The colors of the lines connecting the points. Color should be in RGB format and in range [0.0, 1.0].
line_labels (Sequence[Union[str, None]]) – The labels for the lines connecting the points.
scatter_color (Tuple[float, float, float]) – The color of the scatter points. Color should be in RGB format and in range [0.0, 1.0].
scatter_size (float) – The size of the scatter points.
title (Optional[str]) – The title of each subplot. Can include the placeholder “{frame_number}” to display the frame number.
figure_title (Optional[str]) – The title of the entire figure.
figure_title_font_size (float) – The font size of the figure title.
vertical_axis (Literal["x", "y", "z"]) – The vertical axis in the 3D plots.
azimuth (float) – The azimuth angle (rotation around the vertical axis) of the initial view in the plot. Value must be in degrees.
elevation (float) – The elevation angle (amount of rise from the horizontal plane) of the initial view in the plot. Value must be in degrees.
roll (float) – The roll angle (rotation around the line of sight) of the initial view in the plot. Value must be in degrees.
azimuth_delta (float) – The change in azimuth angle for each subplot. Value must be in degrees.
elevation_delta (float) – The change in elevation angle for each subplot. Value must be in degrees.
roll_delta (float) – The change in roll angle for each subplot. Value must be in degrees.
invert_x (bool) – Whether to invert the x-axis.
invert_y (bool) – Whether to invert the y-axis.
invert_z (bool) – Whether to invert the z-axis.
show_grid (bool) – Whether to show the grid lines on the axes.
show_axis (bool) – Whether to show the axis lines.
figure_scale (Optional[float]) – The size of the entire figure.
style (Literal["dark_background", "default"]) – The color theme of the plot.
layout (Literal["constrained", "compressed", "tight", "none"]) – The spacing between the subplots.

Returns:

The generated matplotlib figure.

Return type:

Figure

static initialize_Axes3D(ax: Axes, x_limits: Tuple[float, float], y_limits: Tuple[float, float], z_limits: Tuple[float, float], ticks_scale: float = 1.0, azimuth: float = 20, elevation: float = 15, roll: float = 0, vertical_axis: str = 'y', invert_x: bool = False, invert_y: bool = False, invert_z: bool = False, show_grid: bool = True, show_axis: bool = True) → None

Initializes a 3D Axes object with specified limits, ticks, and settings.

Parameters:

ax (Axes) – The 3D Axes object to be initialized.
x_limits (Tuple[float, float]) – The range of the x-axis from minimum to maximum value.
y_limits (Tuple[float, float]) – The range of the y-axis from minimum to maximum value.
z_limits (Tuple[float, float]) – The range of the z-axis from minimum to maximum value.

static new_figure(x_limits: Tuple[float, float], y_limits: Tuple[float, float], z_limits: Tuple[float, float], vertical_axis: Literal['x', 'y', 'z'] = 'z', figure_scale: float | None = 5, style: Literal['dark_background', 'default'] = 'default', layout: Literal['constrained', 'compressed', 'tight', 'none'] = 'compressed', subplots: Tuple[int, int] = (1, 1)) → Tuple[Figure, List[Axes]]: Creates a new 3D figure with the specified subplots and settings.

static placeholder_scatter_and_lines(ax: Axes, n_lines: int, line_colors: Sequence[Tuple[float, float, float] | None] = (), line_labels: Sequence[str | None] = (), scatter_color: Tuple[float, float, float] = (0, 0, 0), scatter_size: float = 2) → Tuple[Path3DCollection, List[Line3D]]

Adds an empty Path3DCollection (scatter) and empty Line3D objects to a 3D plot as placeholders.

Parameters:

ax (Axes) – The 3D axes object to plot on.
n_lines (int) – The number of placeholder lines to create.
line_colors (Sequence[Union[Tuple[float, float, float], None]], optional) – The colors of the lines. If not provided, a gradient of colors will be used.
line_labels (Sequence[Union[str, None]], optional) – The labels for the lines.
scatter_color (Tuple[float, float, float], optional) – The RGB color of the scatter points normalized to [0.0, 1.0] range. Defaults to black.
scatter_size (float, optional) – The size of the scatter points. Defaults to 2.

Returns:

A tuple containing the scatter plot and the list of lines.

Return type:

Tuple[Path3DCollection, List[Line3D]]

static set_frame_data(points: Sequence[Tuple[float, float, float]] | ndarray[Any, dtype[_ScalarType_co]], scatter: Path3DCollection, lines: Sequence[Line3D], line_indexes: Sequence[Sequence[int]] = (), ax: Axes | None = None, azimuth_delta: float = 0, elevation_delta: float = 0, roll_delta: float = 0) → List[Path3DCollection | Line3D]

Sets the frame data for visualization.

Parameters:

points (Union[Sequence[Tuple[float, float, float]], NDArray]) – A collection of tuples or a 2D NDArray representing the (x, y, z) points.
scatter (Path3DCollection) – Object representing the scatter plot.
lines (Sequence[Line3D]) – A sequence of Line3D objects representing the lines to be plotted.
line_indexes (Sequence[Sequence[int]], optional) – indexes of points to connect with lines. Defaults to ().
ax (Optional[Axes], optional) – An optional Axes object to update the view. Defaults to None.

Returns:

A list containing the updated scatter plot and lines objects.

Return type:

List[Union[Path3DCollection, Line3D]]

class sign_language_translator.vision.landmarks.MediapipeConnections

Bases: BaseConnections

Represents the connections for the Mediapipe landmark model.

property connections: List[Connection]: indexes of landmarks that are connected

property n_features: int: Total number of features per landmark

property n_landmarks: int: Total number of landmarks

static name() → str: The name of the connection format

sign_language_translator.vision.landmarks.get_connections(connections: str) → BaseConnections

Create a connections object based on the given string.

Parameters: connections (str) – The name of the connections format to use.
Returns: The connections object.
Return type: BaseConnections
Raises: ValueError – If the connections format is not recognized.
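
The lookup by name can be illustrated with a small registry built on an abstract base class that exposes a static name(), mirroring the interface documented above. Every class and function in this sketch is hypothetical (including the "triangle" format), and the package may resolve names differently.

```python
from abc import ABC, abstractmethod


class ConnectionsSketch(ABC):
    """Minimal stand-in for an abstract connections class."""

    @staticmethod
    @abstractmethod
    def name() -> str:
        """The name of the connection format."""

    @property
    @abstractmethod
    def n_landmarks(self) -> int:
        """Total number of landmarks."""


class TriangleConnections(ConnectionsSketch):
    """Made-up format with three landmarks, used only for illustration."""

    @staticmethod
    def name() -> str:
        return "triangle"

    @property
    def n_landmarks(self) -> int:
        return 3


# Map each format name to its class.
REGISTRY = {cls.name(): cls for cls in (TriangleConnections,)}


def get_connections_sketch(connections: str) -> ConnectionsSketch:
    """Instantiate the connections class registered under the given name."""
    try:
        return REGISTRY[connections]()
    except KeyError:
        raise ValueError(f"Unknown connections format: {connections!r}") from None


print(get_connections_sketch("triangle").n_landmarks)  # 3
```

Keying the registry on name() keeps the string accepted by the lookup and the string reported by each class in sync.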