sign_language_translator.models.video_embedding.mediapipe_landmarks_model module
This module provides the MediaPipeLandmarksModel class, a deep-learning-based video embedding model that uses the MediaPipe framework to extract pose and hand landmarks from video frames.
- Classes:
MediaPipeLandmarksModel: A video embedding model that utilizes MediaPipe for pose and hand landmark extraction.
Example:
from sign_language_translator.models import MediaPipeLandmarksModel
from sign_language_translator.vision.utils import iter_frames_with_opencv
mediapipe_model = MediaPipeLandmarksModel(number_of_persons=1)
frame_sequence = iter_frames_with_opencv("video.mp4")
embedding = mediapipe_model.embed(frame_sequence, landmark_type="world")
print(embedding.shape)
- class sign_language_translator.models.video_embedding.mediapipe_landmarks_model.MediaPipeLandmarksModel(pose_model_name='pose_landmarker_heavy.task', hand_model_name='hand_landmarker.task', number_of_persons: int = 1)[source]
Bases:
VideoEmbeddingModel
A video embedding model using MediaPipe to extract pose and hand landmarks from video frames.
- Parameters:
pose_model_name (str) – The name of the pose estimation model. Defaults to 'pose_landmarker_heavy.task'.
hand_model_name (str) – The name of the hand estimation model. Defaults to 'hand_landmarker.task'.
number_of_persons (int) – The maximum number of persons to detect in each frame. Defaults to 1.
- n_persons
The maximum number of persons to detect in each frame.
- Type:
int
- embed(frame_sequence: Iterable[Tensor | ndarray[Any, dtype[uint8]]], landmark_type: str = 'world', progress_callback: ProgressStatusCallback | None = None, total_frames: int | None = None, **kwargs) Tensor[source]
Embed a sequence of frames (video) into a sequence of pose & hand landmarks.
- Parameters:
frame_sequence (Iterable[torch.Tensor | NDArray[np.uint8]]) – A sequence of video frames as 3D uint8 arrays (H, W, C), the layout produced by OpenCV.
landmark_type (str) – The type of landmarks to include in the embedding: one of “world”, “image”, or “all”. Defaults to “world”.
- Returns:
A tensor containing the frame embeddings.
- Return type:
torch.Tensor
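Because embed() accepts any iterable of frames, the frames can be filtered lazily before they reach the model. The following is a minimal sketch, not part of the library: subsample_frames and embed_subsampled are hypothetical helper names introduced here for illustration, and the sketch assumes that skipping frames is acceptable for the use case. The library calls inside embed_subsampled are the same ones used in the module example above.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional, TypeVar

Frame = TypeVar("Frame")


def subsample_frames(
    frames: Iterable[Frame], step: int = 2, limit: Optional[int] = None
) -> Iterator[Frame]:
    """Lazily yield every `step`-th frame, stopping after `limit` frames if given.

    Hypothetical helper (not provided by sign_language_translator).
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    # Inner islice keeps frames 0, step, 2*step, ...; outer islice caps the count.
    return islice(islice(frames, 0, None, step), limit)


def embed_subsampled(video_path: str, step: int = 2):
    """Embed every `step`-th frame of a video (hypothetical wrapper)."""
    # Imports are deferred so that subsample_frames can be used on its own.
    from sign_language_translator.models import MediaPipeLandmarksModel
    from sign_language_translator.vision.utils import iter_frames_with_opencv

    model = MediaPipeLandmarksModel(number_of_persons=1)
    frames = subsample_frames(iter_frames_with_opencv(video_path), step=step)
    return model.embed(frames, landmark_type="world")
```

Subsampling shortens the returned sequence accordingly, so any downstream code that assumes one embedding per original video frame would need to account for the chosen step.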