sign_language_translator.vision.video package

Submodules

Module contents

Bases: Sign, VideoFrames

A class to represent and manipulate videos or sequences of images. Inherits from the slt.vision.Sign class.

Parameters:

sign (Union[str, Sequence[Union[NDArray[np.uint8], torch.Tensor]], NDArray[np.uint8], torch.Tensor, Generator[NDArray[np.uint8], None, None]]) – The video source. Can be a path to a video or image file, a sequence of frames, a single frame, or a generator of frames.

load()[source]: Load a video from the specified path.

load_asset()[source]: Load a video asset by its label from the built-in datasets.

name()[source]: Returns the name of the sign format.

numpy()[source]: Convert the video frames to a NumPy array.

torch()[source]: Convert the video frames to a PyTorch tensor.

trim()[source]: Cut the video to a specific time range or index range.

stack()[source]: Stack a list of Video objects along a specified dimension.

concatenate()[source]: Concatenate a sequence of Video objects into a single Video.

append()[source]: Append another Video object to the end of the current Video.

transform()[source]: Apply a transformation function to the individual frames of this object (lazy & in-place).

iter_frames()[source]: Iterate over the frames in the video from a start point to an end point with a certain step size.

__iter__()[source]: Makes the Video object an iterable.

__next__()[source]: Get the next frame in the video.

get_frame()[source]: Get a frame from the video at a specific timestamp or index.

__getitem__()[source]: Get a frame or a sub-clip or a cropped clip from the video.

frames_grid()[source]: Create a grid of frames from the video as a single stacked image.

show()[source]: Display the video.

show_frame()[source]: Display a specific frame from the video.

show_frames_grid()[source]: Display a grid of frames from the video as a single stacked image.

save()[source]: Save the video frames to a file.

save_frame()[source]: Save a single frame from the video object to the specified path.

save_frames_grid()[source]: Write a grid of frames from the video to a single image file.

close()[source]: Release the resources occupied by the object and reset the properties.

Properties:

shape: Tuple of array dimensions (n_frames, height, width, n_channels).
n_frames: The number of frames in the video.
height: Number of vertical pixels in a video frame.
width: Number of horizontal pixels in a video frame.
n_channels: Number of color channels in a video frame.
duration: Total time that the frames would take to play in a sequence.
codec: The video codec used to encode the video.

Example

import sign_language_translator as slt

# Load a video from a file
# video = slt.Video("path/to/video.mp4")
# video = slt.Video("path/to/image.jpg")

# load from a numpy array
import numpy as np
noise = slt.Video(np.random.randint(0, 255, (20,100,160,3), dtype=np.uint8), fps=5)

# load a dataset file (auto-download)
video = slt.Video.load_asset("videos/pk-hfad-1_airplane.mp4")
print(video.duration, video.n_frames)  # 1.72 43

# trim and concatenate
video = video + video.trim(start_time=0.5, end_time=1.0)

# crop
video = video[:, 100:-100, 50:, :]

# apply a transformation (flip horizontally)
video.transform(lambda frame: frame[..., ::-1, :])

# save & display
video.save("new_video.mp4", overwrite=True)
video.save_frames_grid("frames_grid.jpg")
video.show()

append(other: Video)[source]

Append another Video object to the end of the current Video.

Parameters:

other (Video) – The Video object to append.

Returns:

None

close()[source]: Release the resources occupied by the object and reset the properties.

property codec: str: The video codec used to encode the video. (e.g. “mp4v”, “h264”, “xvid”, “avc1”, “hvc1”)

static concatenate(objects: Iterable[Video]) → Video[source]

Concatenate a sequence of Video objects in time dimension into a single Video.

Parameters:

objects (Iterable[Video]) – A sequence of Video objects to concatenate.

Returns:

A new Video object that is a linked list of all the input videos.

Return type:

Video

Raises:

ValueError – If the input sequence of videos is empty.
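
The "linked list" behaviour described above can be pictured as chaining frame sources one after another without copying them. The following is a minimal sketch of that idea under stated assumptions, not the library's implementation: the helper name concatenate_frames is hypothetical, and plain strings stand in for frames.

```python
from itertools import chain
from typing import Iterable, Iterator


def concatenate_frames(sources: Iterable[Iterable[str]]) -> Iterator[str]:
    """Hypothetical helper: lazily yield the items of every source, in order."""
    sources = list(sources)
    if not sources:
        # Mirrors the documented ValueError for an empty input sequence.
        raise ValueError("need at least one frame source to concatenate")
    return chain.from_iterable(sources)


clip_a = ["a0", "a1"]  # stand-ins for the frames of a first video
clip_b = ["b0"]        # stand-ins for the frames of a second video
print(list(concatenate_frames([clip_a, clip_b])))  # ['a0', 'a1', 'b0']
```

Chaining keeps the inputs untouched and defers reading until the result is iterated.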

property duration: float: Total time that the frames would take to play in sequence. Depends on fps.
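
As a rough illustration of the dependence on fps, the sketch below assumes that duration is simply the frame count divided by the frame rate. That formula is an assumption, as is the value of 25 fps, which is only inferred from the example output earlier on this page (43 frames, 1.72 seconds). The helper name duration_seconds is hypothetical.

```python
def duration_seconds(n_frames: int, fps: float) -> float:
    """Hypothetical helper: playback time of n_frames shown at a constant fps."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return n_frames / fps


# 43 frames at an assumed 25 fps
print(round(duration_seconds(43, 25.0), 2))  # 1.72
```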

frames_grid(rows=2, columns=3, width: int | None = None, height: int | None = None) → ndarray[Any, dtype[uint8]][source]

Create a grid of frames from the video as a single stacked image. Equally spaced timestamps are chosen across the video and arranged into a rows x columns grid.

The first frame is placed in the top-left cell. Each subsequent frame is placed in the next column to the right in the same row until the columns in that row run out, after which filling continues in the row below.

Parameters:

rows (int, optional) – The number of rows in the grid. Defaults to 2.
columns (int, optional) – The number of columns in the grid. Defaults to 3.
width (Optional[int], optional) – The width of the grid. If only height is given, the resized width is calculated by maintaining the aspect ratio of the grid cell. If both are None, the grid is not resized. Defaults to None.
height (Optional[int], optional) – The height of the grid. If only width is given, the resized height is calculated by maintaining the aspect ratio of the grid cell. If both are None, the grid is not resized. Defaults to None.

Returns:

An RGB 3D NumPy array containing the stacked frames. Shape: (height, width, color_channels).

Return type:

NDArray[np.uint8]
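
The cell ordering can be sketched in a few lines. This is an illustration under stated assumptions rather than the library's code: it picks frame indexes (not timestamps) spread evenly from the first to the last frame, and the helper name grid_layout is hypothetical.

```python
from typing import List, Tuple


def grid_layout(n_frames: int, rows: int = 2, columns: int = 3) -> List[Tuple[int, int, int]]:
    """Hypothetical helper: (frame_index, row, column) per cell, filled left to right, then top to bottom."""
    n_cells = rows * columns
    if n_frames < 1 or n_cells < 1:
        raise ValueError("need at least one frame and one grid cell")
    layout = []
    for cell in range(n_cells):
        # Assumption: cells are spread evenly between the first and last frame.
        frame_index = 0 if n_cells == 1 else round(cell * (n_frames - 1) / (n_cells - 1))
        row, column = divmod(cell, columns)  # row-major order
        layout.append((frame_index, row, column))
    return layout


for frame_index, row, column in grid_layout(43):
    print(f"frame {frame_index:2d} -> row {row}, column {column}")
```

With 43 frames and the default 2 x 3 grid, the first frame lands in row 0, column 0 and the last frame in row 1, column 2.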

get_frame(timestamp: float | None = None, index: int | None = None) → ndarray[Any, dtype[uint8]][source]

Get a frame from the video at a specific timestamp or index.

Parameters:

timestamp (Optional[float], optional) – The timestamp of the frame to get. If not provided, index will be used. If negative, it is measured from the end of the video. Defaults to None.
index (Optional[int], optional) – The index of the frame to get. If not provided, timestamp will be used. If negative, it is counted from the end of the video (-1 is the last frame). Defaults to None.

Raises:

IndexError – If the specified timestamp or index is out of range.

Returns:

The 3D RGB frame at the specified timestamp or index from the video.

Return type:

NDArray[np.uint8]
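
The handling of negative values can be sketched as follows. This is a simplified illustration, not the library's logic: converting a timestamp to an index by truncating timestamp * fps is an assumption, as is raising ValueError when both or neither argument is given. The helper name resolve_frame_index is hypothetical.

```python
from typing import Optional


def resolve_frame_index(
    n_frames: int,
    fps: float,
    timestamp: Optional[float] = None,
    index: Optional[int] = None,
) -> int:
    """Hypothetical helper: map a timestamp or index (possibly negative) to a valid frame index."""
    if (timestamp is None) == (index is None):
        # Assumption: exactly one of the two selectors must be given.
        raise ValueError("provide exactly one of timestamp or index")
    if timestamp is not None:
        if timestamp < 0:
            timestamp += n_frames / fps  # measure from the end of the video
        index = int(timestamp * fps)     # assumption: truncate to a frame index
    elif index < 0:
        index += n_frames                # -1 becomes the last frame
    if not 0 <= index < n_frames:
        raise IndexError(f"frame index {index} is out of range for {n_frames} frames")
    return index


print(resolve_frame_index(43, 25.0, index=-1))       # 42
print(resolve_frame_index(43, 25.0, timestamp=1.0))  # 25
```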

property height: int: Number of vertical pixels in a video frame (dimension=1).

iter_frames(start: int = 0, end: int | None = None, step: int | None = None) → Generator[ndarray[Any, dtype[uint8]], None, None][source]

Iterate over the frames in the video from start index to end index with a certain step size.

Parameters:

start (int, optional) – The index of the start frame. Defaults to 0.
end (Optional[int], optional) – The index of the end frame. None will iterate till the end of the video. Defaults to None.
step (Optional[int], optional) – The step size for iteration. If None, uses the default step size of 1. Defaults to None.

Yields:

NDArray[np.uint8] – 3D array representing frames from the video with shape: (height, width, color_channels).

classmethod load(path: str, **kwargs) → Video[source]

Load a video from the specified path.

Parameters:

path (str) – The path to the video file.
**kwargs – Additional keyword arguments to be passed to the Video constructor.

Returns:

The loaded video object.

Return type:

Video

classmethod load_asset(label: str, archive_name: str | None = None, overwrite=False, progress_bar=True, leave=True, **kwargs) → Video[source]

Class method to load a video asset identified by the given label from the built-in datasets and return it as a new Video object.

This method downloads dictionary videos from direct URLs if the corresponding archive is not already downloaded. Otherwise the video asset will be extracted from an archive which will be auto-downloaded once.

Notes:

- To view valid asset IDs, run slt.Assets.get_ids(r"^videos/.*mp4$") for dictionary videos or slt.Assets.get_ids(r"^datasets/.*videos.*zip$") for archived videos.
- See slt.Assets.ROOT_DIR for the download directory.

Parameters:

label (str) – The filename of the video asset to load. ‘videos/’ is prepended to the label if it does not start with it and ‘.mp4’ is appended to the label if it does not end with it. An example is videos/pk-hfad-1_airplane.mp4 for a dictionary video. General syntax is videos/country-organization-number_text[_person_camera].mp4.
archive_name (str | None, optional) – The name of the archive which contains the video asset. If None, the archive name is inferred from the label. Not necessary for dictionary videos. An example is datasets/pk-hfad-1.videos-mp4.zip. General syntax is datasets/country-organization-number[_person_camera].videos-mp4.zip. Defaults to None.
overwrite (bool, optional) – Whether to overwrite the video asset if it is already downloaded or extracted. Defaults to False.
progress_bar (bool, optional) – Whether to display a progress bar while downloading or extracting the asset. Defaults to True.
leave (bool, optional) – Whether to leave the progress bar after the operation is complete. Defaults to True.
**kwargs – Additional keyword arguments to be passed to the Video constructor.

Raises:

FileNotFoundError – If no video assets are found for the given label.

Warns:

UserWarning – If multiple video assets match the given label and only the first asset is used.

Returns:

An instance of the Video class representing the video that matched the label.

Return type:

Video

Examples

import sign_language_translator as slt

# Load a dictionary video asset
video = slt.Video.load_asset("pk-hfad-1_airplane")

# Load a replication video from the built-in datasets
video = slt.Video.load_asset("videos/pk-hfad-1_airplane_dm0001_front.mp4", archive_name="datasets/pk-hfad-1_dm0001_front.videos-mp4.zip")
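
The label normalization described for the label parameter, and the regular-expression filter from the notes, can be reproduced with the standard library alone. The sketch below is illustrative only: normalize_video_label is a hypothetical helper, and the asset_ids list is made up for the example rather than taken from the real asset index.

```python
import re


def normalize_video_label(label: str) -> str:
    """Hypothetical helper: prepend 'videos/' and append '.mp4' when they are missing."""
    if not label.startswith("videos/"):
        label = "videos/" + label
    if not label.endswith(".mp4"):
        label += ".mp4"
    return label


print(normalize_video_label("pk-hfad-1_airplane"))  # videos/pk-hfad-1_airplane.mp4

# Made-up asset IDs, used only to show how the regex from the notes filters them.
asset_ids = [
    "videos/pk-hfad-1_airplane.mp4",
    "datasets/pk-hfad-1.videos-mp4.zip",
    "text/example.json",
]
dictionary_videos = re.compile(r"^videos/.*mp4$")
print([asset_id for asset_id in asset_ids if dictionary_videos.match(asset_id)])
# ['videos/pk-hfad-1_airplane.mp4']
```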

property n_channels: int: Number of color channels in a video frame (e.g. RGB) (dimension=3).

property n_frames: int: The number of frames in the video.

static name() → str[source]: The string code of the sign format.

numpy(*args, **kwargs)[source]

Convert the video frames to a (4D) NumPy array.

Parameters:

*args – Positional arguments to be passed to np.array().
**kwargs – Keyword arguments to be passed to np.array().

Returns:

A NumPy array representing the video frames.

Return type:

np.ndarray

save(path: str, overwrite=False, fps: float | None = None, codec: str | None = None, progress_bar=True, leave=False, **kwargs) → None[source]

Save the video frames to a file.

Parameters:

path (str) – The path to the output file.
fps (float | None, optional) – The frames per second of the output video. If None, it will use the fps of the video source or 30 if that is not set either. Defaults to None.
codec (str | None, optional) – The codec used for the output video, e.g. [“h264”, “mp4v”, “xvid”, “avc1”, “hvc1”]. Make sure the codec is already installed on your system (some do not ship with OpenCV because of license mismatch). If None, it will use the source video codec or “mp4v” if that is not set. Defaults to None.
overwrite (bool, optional) – Whether to overwrite the output file if it already exists. Defaults to False.
progress_bar (bool, optional) – Whether to display a progress bar while saving the video. Defaults to True.
leave (bool, optional) – Whether to leave the progress bar after the operation is complete. Defaults to False.
**kwargs – Additional keyword arguments. (not used yet.)

static save_(frames_iterable: Iterable[ndarray[Any, dtype[uint8]]], path: str, overwrite=False, height: int | None = None, width: int | None = None, fps: float = 30.0, codec: str = 'mp4v', progress_bar=True, leave=False, total_frames: int | None = None, **kwargs) → None[source]

Static method to save a sequence of frames into a video file. Each frame is a 3D NumPy uint8 array of shape (height, width, channels).

Parameters:

frames_iterable (Iterable[NDArray[np.uint8]]) – The frames to write to the video file.
path (str) – The path to the output file.
overwrite (bool, optional) – Whether to overwrite the output file if it already exists. Defaults to False.
height (Optional[int], optional) – The number of vertical pixels in the output video file. If None, the shape (index=0) of the first frame is used. Defaults to None.
width (Optional[int], optional) – The number of horizontal pixels in the output video file. If None, the shape (index=1) of the first frame is used. Defaults to None.
fps (float, optional) – The frames per second of the output video. Defaults to 30.0.
codec (str, optional) – The codec used for the output video, e.g. [“h264”, “mp4v”, “xvid”, “avc1”, “hvc1”]. Make sure the codec is already installed on your system (some do not ship with OpenCV because of license mismatch). Defaults to “mp4v”.
progress_bar (bool, optional) – Whether to display a progress bar while saving the video. Defaults to True.
leave (bool, optional) – Whether to leave the progress bar after the operation is complete. Defaults to False.
total_frames (Optional[int], optional) – total number of frames in the frames iterable. Used by the progress bar. Defaults to None.

Raises:

FileExistsError – If a file already exists at the output path and overwrite is False.
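
The overwrite rule shared by the save methods can be expressed as a small guard. This is a sketch of the documented behaviour, not the library's code, and check_output_path is a hypothetical helper.

```python
import os
import tempfile


def check_output_path(path: str, overwrite: bool = False) -> str:
    """Hypothetical helper: refuse to reuse an existing path unless overwrite is True."""
    if os.path.exists(path) and not overwrite:
        raise FileExistsError(f"'{path}' already exists; pass overwrite=True to replace it")
    return path


with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "new_video.mp4")
    open(target, "wb").close()  # create an empty placeholder file
    try:
        check_output_path(target)
    except FileExistsError as error:
        print("refused:", type(error).__name__)
    print(check_output_path(target, overwrite=True) == target)  # True
```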

save_frame(path: str, timestamp: float | None = None, index: int | None = None, overwrite=False) → None[source]

Saves a single frame from the video object to the specified path.

Parameters:

path (str) – The path where the frame will be saved.
timestamp (float | None, optional) – The timestamp of the frame to be saved. If None, index will be used. Defaults to None.
index (int | None, optional) – The index of the frame to be saved. If None, timestamp will be used. Defaults to None.
overwrite (bool, optional) – Whether to overwrite the image file if it already exists. Defaults to False.

Raises:

ValueError – If both or neither of timestamp and index are provided, or if the specified timestamp or index is out of range.
FileExistsError – If a file already exists at the output path and overwrite is False.

save_frames_grid(path: str, rows: int = 2, columns: int = 3, width: int | None = 1024, height: int | None = None, overwrite=False) → None[source]

Write a grid of frames from the video to a single image file. The grid is created by arranging frames from the video taken at equally spaced timestamps in a rows x columns grid.

Parameters:

path (str) – The path to the output image file.
rows (int, optional) – The number of rows in the grid. Defaults to 2.
columns (int, optional) – The number of columns in the grid. Defaults to 3.
width (Optional[int], optional) – The width of the grid. If only height is given, the resized width is calculated by maintaining the aspect ratio of the grid cell. If both are None, the grid is not resized. Defaults to 1024.
height (Optional[int], optional) – The height of the grid. If only width is given, the resized height is calculated by maintaining the aspect ratio of the grid cell. If both are None, the grid is not resized. Defaults to None.
overwrite (bool, optional) – Whether to overwrite the image file if it already exists. Defaults to False.

Raises:

FileExistsError – If a file already exists at the output path and overwrite is False.

property shape: Tuple[int, int, int, int]: Tuple of array dimensions (n_frames, height, width, n_channels).

show(inline_player: Literal['jshtml', 'html5'] = 'html5', codec='avc1', **kwargs) → None[source]

Display the video. If the function is called from a Jupyter Notebook, the output is displayed inline using the specified player. If the video has a single frame, it is displayed as an image plot. If the function is called from command line, a matplotlib animation window shows the video.

Note

Clear the previous video output in Jupyter before displaying the video again to avoid issues.

Parameters:

inline_player (str, optional) – The type of matplotlib inline player to use. Defaults to “html5”.
codec (str, optional) – The codec to use for the video (should be installed on the system). Possible values are [“avc1”, “h264”, “mp4v”, …]. Defaults to “avc1”.

show_frame(timestamp: float | None = None, index: int | None = None) → None[source]

Display a specific frame from the video.

Parameters:

timestamp (Optional[float], optional) – The timestamp of the frame to display. If not provided, index will be used. Defaults to None.
index (Optional[int], optional) – The index of the frame to display. If not provided, timestamp will be used. Defaults to None.

show_frames_grid(rows=2, columns=3, width: int | None = 800, height: int | None = None)[source]

Display a grid of frames from the video as a single stacked image. The top-left cell contains the first frame. The sequence grows towards the right and then down.

Parameters:

rows (int, optional) – The number of rows in the grid. Defaults to 2.
columns (int, optional) – The number of columns in the grid. Defaults to 3.
width (Optional[int], optional) – The width of the grid. If only height is given, the resized width is calculated by maintaining the aspect ratio of the grid cell. If both are None, the grid is not resized. Defaults to 800.
height (Optional[int], optional) – The height of the grid. If only width is given, the resized height is calculated by maintaining the aspect ratio of the grid cell. If both are None, the grid is not resized. Defaults to None.

property source: VideoFrames: A VideoFrames object which wraps around a frame sequence.

classmethod stack(videos: Sequence[Video], dim=1, fps: float | Literal['first', 'max', 'average', 'weighted', 'min'] = 'max')[source]

Stack a list of Video objects along a specified dimension (0=time, 1=height, 2=width, 3=channels). All videos will be aligned at the start and padded with black frames at the end if necessary.

Parameters:

videos (Sequence[Video]) – A list of Video objects to stack.
dim (int, optional) – The dimension along which to stack the videos (0=time, 1=height, 2=width, 3=channels). Negative values represent dimensions relative to the end (e.g., -1 is the channel dimension). Defaults to 1.
fps (Union[float, Literal["first", "max", "average", "weighted", "min"]], optional) – The frames per second of the output video. If “first”, it will use the fps of the first video. If “max”, it will use the maximum fps of all videos. If “average”, it will use the average fps of all videos. If “weighted”, it will use the weighted average fps of all videos (weights=duration). If “min”, it will use the minimum fps of all videos. Not used for time dimension. Defaults to “max”.

Returns:

A new Video object resulting from stacking the input videos.

Return type:

Video
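
The fps modes listed above amount to a small selection rule. The sketch below follows the parameter description but is not the library's implementation: choose_fps is a hypothetical helper, and the input values are made up.

```python
from typing import Sequence, Union


def choose_fps(
    fps_values: Sequence[float],
    durations: Sequence[float],
    mode: Union[float, str] = "max",
) -> float:
    """Hypothetical helper: pick the output fps for a stack of videos."""
    if isinstance(mode, (int, float)):
        return float(mode)  # an explicit number is used as-is
    if mode == "first":
        return fps_values[0]
    if mode == "max":
        return max(fps_values)
    if mode == "min":
        return min(fps_values)
    if mode == "average":
        return sum(fps_values) / len(fps_values)
    if mode == "weighted":
        # Average of the fps values weighted by each video's duration.
        return sum(f * d for f, d in zip(fps_values, durations)) / sum(durations)
    raise ValueError(f"unknown fps mode: {mode!r}")


fps_values = [30.0, 24.0]  # made-up frame rates of two videos
durations = [1.0, 3.0]     # made-up durations in seconds
print(choose_fps(fps_values, durations, "max"))       # 30.0
print(choose_fps(fps_values, durations, "average"))   # 27.0
print(choose_fps(fps_values, durations, "weighted"))  # 25.5
```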

torch(*args, **kwargs)[source]

Convert the video frames to a (4D) PyTorch tensor.

Parameters:

*args – Positional arguments to be passed to torch.tensor().
**kwargs – Keyword arguments to be passed to torch.tensor().

Returns:

A PyTorch tensor representing the video frames.

Return type:

torch.Tensor

transform(transformation: Callable[[ndarray[Any, dtype[uint8]]], ndarray[Any, dtype[uint8]]])[source]

Apply a transformation function to the individual frames of this object (in place).

Parameters:

transformation (Callable[[NDArray[np.uint8]], NDArray[np.uint8]]) – The transformation function to be applied. It must map one NumPy array to another NumPy array.

Raises:

ValueError – If the transformation is not callable.

Returns:

None
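
The method summary on this page describes transform() as lazy and in-place. One way to picture that combination is a wrapper that records the function and only applies it when frames are read. The sketch below is a toy model under that assumption: LazyFrames is a hypothetical class, and nested lists stand in for NumPy frames.

```python
from typing import Callable, Iterator, List

Frame = List[List[int]]  # toy grayscale frame: rows of pixel values


class LazyFrames:
    """Hypothetical wrapper: stores transformations and applies them on iteration."""

    def __init__(self, frames: List[Frame]) -> None:
        self._frames = frames
        self._transformations: List[Callable[[Frame], Frame]] = []

    def transform(self, transformation: Callable[[Frame], Frame]) -> None:
        if not callable(transformation):
            raise ValueError("transformation must be callable")
        # In-place: the object itself is modified. Lazy: no frame is touched yet.
        self._transformations.append(transformation)

    def __iter__(self) -> Iterator[Frame]:
        for frame in self._frames:
            for transformation in self._transformations:
                frame = transformation(frame)
            yield frame


frames = LazyFrames([[[1, 2, 3], [4, 5, 6]]])
frames.transform(lambda frame: [row[::-1] for row in frame])  # flip horizontally
print(list(frames))  # [[[3, 2, 1], [6, 5, 4]]]
```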

trim(start_time: float | None = None, end_time: float | None = None, start_index: int | None = None, end_index: int | None = None) → Video[source]

Cut the video to a specific time range or index range. Negative indexes and negative timestamps are allowed and are counted from the end of the video. For example, trimming a minute-long video with a start of 5 sec and an end of -15 sec yields a 40 sec video.

Parameters:

start_time (Optional[float], optional) – The start time in seconds of the trimmed video. If only end_time is provided, trim will start from 0. Defaults to None.
end_time (Optional[float], optional) – The end time in seconds of the trimmed video. If only start_time is provided, trim will end at video duration. Defaults to None.
start_index (Optional[int], optional) – The start index of the trimmed video. If only end_index is provided, trim will start from 0. Defaults to None.
end_index (Optional[int], optional) – The end index of the trimmed video. If only start_index is provided, trim will end at last index (inclusive). Defaults to None.

Raises:

ValueError – If the start is not smaller than the end.
IndexError – If the start index or end index is out of range.

Returns:

The trimmed video.

Return type:

Video
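
The arithmetic behind the 5 sec / -15 sec example can be checked with a short sketch. It is an illustration under stated assumptions, not the library's code: resolve_time_range is a hypothetical helper, and it only models the time-based arguments.

```python
from typing import Optional, Tuple


def resolve_time_range(
    duration: float,
    start_time: Optional[float] = None,
    end_time: Optional[float] = None,
) -> Tuple[float, float]:
    """Hypothetical helper: turn possibly negative or missing times into absolute seconds."""
    start = 0.0 if start_time is None else start_time
    end = duration if end_time is None else end_time
    if start < 0:
        start += duration  # negative values count back from the end
    if end < 0:
        end += duration
    if start >= end:
        raise ValueError("the start must be smaller than the end")
    return start, end


start, end = resolve_time_range(60.0, start_time=5.0, end_time=-15.0)
print(start, end, end - start)  # 5.0 45.0 40.0
```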

property width: int: Number of horizontal pixels in a video frame (dimension=2).