sign_language_translator.models.video_embedding.mediapipe_landmarks_model module

This module defines the MediaPipeLandmarksModel class, a deep-learning-based video embedding model that uses the MediaPipe framework to extract pose and hand landmarks from video frames.

Classes:

MediaPipeLandmarksModel: A video embedding model that utilizes MediaPipe for pose and hand landmark extraction.

Example:

from sign_language_translator.models import MediaPipeLandmarksModel
from sign_language_translator.vision.utils import iter_frames_with_opencv

mediapipe_model = MediaPipeLandmarksModel(number_of_persons=1)

frame_sequence = iter_frames_with_opencv("video.mp4")
embedding = mediapipe_model.embed(frame_sequence, landmark_type="world")
print(embedding.shape)

class sign_language_translator.models.video_embedding.mediapipe_landmarks_model.MediaPipeLandmarksModel(pose_model_name='pose_landmarker_heavy.task', hand_model_name='hand_landmarker.task', number_of_persons: int = 1)[source]

Bases: VideoEmbeddingModel

A video embedding model using MediaPipe to extract pose and hand landmarks from video frames.

Parameters:
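  • pose_model_name (str) – The file name of the MediaPipe pose landmarker model. Defaults to 'pose_landmarker_heavy.task'.

  • hand_model_name (str) – The file name of the MediaPipe hand landmarker model. Defaults to 'hand_landmarker.task'.

  • number_of_persons (int) – The maximum number of persons to detect in each frame. Defaults to 1.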

n_persons

The maximum number of persons to detect in each frame.

Type:

int

embed()[source]

Embeds a sequence of frames using pose and hand landmarks.

embed(frame_sequence: Iterable[Tensor | ndarray[Any, dtype[uint8]]], landmark_type: str = 'world', progress_callback: ProgressStatusCallback | None = None, total_frames: int | None = None, **kwargs) → Tensor[source]

Embed a sequence of frames (video) into a sequence of pose & hand landmarks.

Parameters:
  • frame_sequence (Iterable[torch.Tensor | NDArray[np.uint8]]) – A sequence of video frames, each a 3D array of shape (H, W, C).

  • landmark_type (str) – The type of landmarks to include in the embedding: one of “world”, “image”, or “all”. Defaults to “world”.

Returns:

A tensor containing the frame embeddings.

Return type:

torch.Tensor
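
Because frame_sequence is typed as an Iterable, frames can be streamed lazily instead of being loaded into memory all at once. The following is a minimal sketch of preparing the arguments before calling embed(). The helpers every_nth_frame and check_landmark_type are hypothetical names introduced here for illustration and are not part of the library; the allowed landmark types are taken from the parameter description above.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

Frame = TypeVar("Frame")

# Options listed in the embed() parameter description.
LANDMARK_TYPES = ("world", "image", "all")


def every_nth_frame(frames: Iterable[Frame], step: int = 2) -> Iterator[Frame]:
    """Lazily yield every `step`-th frame, starting with the first one."""
    if step < 1:
        raise ValueError("step must be >= 1")
    return islice(frames, 0, None, step)


def check_landmark_type(landmark_type: str) -> str:
    """Return landmark_type unchanged if it is one of the documented options."""
    if landmark_type not in LANDMARK_TYPES:
        raise ValueError(
            f"landmark_type must be one of {LANDMARK_TYPES}, got {landmark_type!r}"
        )
    return landmark_type


# Hypothetical usage (requires the library and its model files):
# frames = every_nth_frame(iter_frames_with_opencv("video.mp4"), step=2)
# embedding = mediapipe_model.embed(
#     frames, landmark_type=check_landmark_type("world")
# )
```

Subsampling frames is only one way to reduce the work done per video; whether it suits a given task depends on how much temporal detail the downstream model needs.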