sign_language_translator.languages package

Module contents

class sign_language_translator.languages.SignLanguage

Bases: ABC

This abstract class defines the structure and methods required for mapping spoken language text to signs in sign languages using rule-based approaches.

Keys

Enumerates all keys that are used in a sign dict.

Type:: enum.Enum

name()[source][source]: Returns the name of the sign language.

tokens_to_sign_dicts()[source][source]: Converts tokens to signs based on rules and returns a list of sign dictionaries.

restructure_sentence()[source][source]: Restructures a sentence by adjusting grammar, dropping meaningless words, and normalizing synonyms.

_make_equal_weight_sign_dict()[source][source]: Creates a sign dictionary with equal weights for the provided signs.
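To make the idea of an "equal-weight" sign dict concrete, here is a minimal standalone sketch. The helper name `make_equal_weight_sign_dict` is hypothetical (it is not the library's private `_make_equal_weight_sign_dict`), and normalizing the weights to 1/n each is an assumption; the library may use a different constant.

```python
from typing import Dict, List, Union

SignDict = Dict[str, Union[List[List[str]], List[float]]]


def make_equal_weight_sign_dict(signs: List[List[str]]) -> SignDict:
    """Hypothetical helper: give every alternative sign sequence the same weight.

    Assumption: weights are normalized so that they sum to 1 (1/n each).
    """
    if not signs:
        raise ValueError("at least one sign sequence is required")
    weight = 1.0 / len(signs)
    return {"signs": signs, "weights": [weight] * len(signs)}


# Two alternative ways (hypothetical video names) to sign the same word.
sign_dict = make_equal_weight_sign_dict([["hello_1"], ["wave", "smile"]])
print(sign_dict)  # {'signs': [['hello_1'], ['wave', 'smile']], 'weights': [0.5, 0.5]}
```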

class SignDictKeys(value)

Bases: Enum

Enumerates all keys that are used in a sign dict.

SIGNS

Key for the ‘signs’ field in the sign dict, mapping to a list of sequences of video names.

Type:: str

WEIGHTS

Key for the ‘weights’ field in the sign dict, mapping to the usage frequency of each video sequence.

Type:: str

SIGNS = 'signs'

WEIGHTS = 'weights'
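Because the members' values are the literal strings used as dictionary keys, code that builds sign dicts can use `.value` instead of hard-coding `'signs'` and `'weights'`. The sketch below defines a local stand-in enum with the documented members and values so that it runs on its own; in the library the enum is presumably reachable as `SignLanguage.SignDictKeys`.

```python
from enum import Enum


class SignDictKeys(Enum):
    """Local stand-in mirroring the documented members and values."""

    SIGNS = "signs"
    WEIGHTS = "weights"


# Use .value so that the dict keys are plain strings, as in the documented format.
sign_dict = {
    SignDictKeys.SIGNS.value: [["sign_1", "sign_2"], ["alternate_1"]],
    SignDictKeys.WEIGHTS.value: [10, 5],
}
print(sorted(sign_dict))  # ['signs', 'weights']
```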

abstract static name() → str[source][source]: Returns the name of the sign language.

abstract restructure_sentence(sentence: Iterable[str], tags: Iterable[Any] | None = None, contexts: Iterable[Any] | None = None) → Tuple[Iterable[str], Iterable[Any], Iterable[Any]]

Restructures a sentence by changing the grammar, removing stopwords, spaces and punctuation, and modifying token contents.

Parameters:

sentence (Iterable[str]) – Input sentence to be restructured.
tags (Iterable[Any], optional) – Additional tags associated with the sentence. Defaults to None.
contexts (Iterable[Any], optional) – Additional contexts associated with the sentence. Defaults to None.

Returns:

The restructured sentence, associated tags, and contexts.

Return type:

Tuple[Iterable[str], Iterable[Any], Iterable[Any]]
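A minimal sketch of what a rule-based `restructure_sentence` could look like, assuming a toy stopword list. To stay self-contained it subclasses a hypothetical stand-in ABC (`RestructuringRules`) instead of the real `SignLanguage`, and it shows only stopword, space and punctuation removal. Grammar reordering and token rewriting are omitted. Tags and contexts are filtered alongside the tokens so that the three returned sequences stay aligned.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterable, Optional, Tuple


class RestructuringRules(ABC):
    """Hypothetical stand-in for the restructure_sentence part of SignLanguage."""

    @abstractmethod
    def restructure_sentence(
        self,
        sentence: Iterable[str],
        tags: Optional[Iterable[Any]] = None,
        contexts: Optional[Iterable[Any]] = None,
    ) -> Tuple[Iterable[str], Iterable[Any], Iterable[Any]]: ...


class ToySignLanguage(RestructuringRules):
    STOPWORDS = {"is", "a", "the"}  # hypothetical stopword list

    def restructure_sentence(self, sentence, tags=None, contexts=None):
        tokens = list(sentence)
        # Pad missing tags/contexts with None so the three sequences stay aligned.
        tags = list(tags) if tags is not None else [None] * len(tokens)
        contexts = list(contexts) if contexts is not None else [None] * len(tokens)
        kept = [
            (token, tag, context)
            for token, tag, context in zip(tokens, tags, contexts)
            # Drop stopwords, plus whitespace and punctuation (non-alphanumeric tokens).
            if token.isalnum() and token.lower() not in self.STOPWORDS
        ]
        if not kept:
            return [], [], []
        new_tokens, new_tags, new_contexts = map(list, zip(*kept))
        return new_tokens, new_tags, new_contexts


lang = ToySignLanguage()
tokens, tags, _ = lang.restructure_sentence(
    ["The", " ", "cat", "is", "happy", "."],
    tags=["DET", "SPACE", "NOUN", "VERB", "ADJ", "PUNCT"],
)
print(tokens, tags)  # ['cat', 'happy'] ['NOUN', 'ADJ']
```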

abstract tokens_to_sign_dicts(tokens: Iterable[str], tags: Iterable[Any] | None = None, contexts: Iterable[Any] | None = None) → List[Dict[str, List[List[str]] | List[float]]]

Converts tokens to signs based on rules and returns a list of sign dictionaries.

Parameters:

tokens (Iterable[str]) – Input tokens to be converted to signs.
tags (Iterable[Any], optional) – Additional tags associated with the tokens. Defaults to None.
contexts (Iterable[Any], optional) – Additional contexts associated with the tokens. Defaults to None.

Returns:

A list of sign dictionaries. Each dictionary contains a ‘signs’ field mapping to a list of sign sequences and a ‘weights’ field mapping to the usage frequency of each sign sequence. For example: "word" -> [{"signs": [[sign_1, sign_2], [alternate_1]], "weights": [10, 5]}, …]

Return type:

List[Dict[str, List[List[str]] | List[float]]]
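One plausible way for a consumer to turn the returned sign dicts into a flat list of clip names is to pick one sequence per dict with probability proportional to its weight. This is a sketch under that assumption: the documentation only says that weights are usage frequencies, not how the translator consumes them. `pick_sequences` is a hypothetical helper and the video names are placeholders.

```python
import random
from typing import Dict, List, Union

SignDict = Dict[str, Union[List[List[str]], List[float]]]

# Shape taken from the documented example; the video names are placeholders.
sign_dicts: List[SignDict] = [
    {"signs": [["sign_1", "sign_2"], ["alternate_1"]], "weights": [10, 5]},
]


def pick_sequences(sign_dicts: List[SignDict], rng: random.Random) -> List[str]:
    """Pick one sign sequence per dict, proportionally to its weight, and flatten."""
    clip_names: List[str] = []
    for sign_dict in sign_dicts:
        (sequence,) = rng.choices(sign_dict["signs"], weights=sign_dict["weights"], k=1)
        clip_names.extend(sequence)
    return clip_names


print(pick_sequences(sign_dicts, random.Random(0)))
```

Passing an explicit `random.Random` instance keeps the choice reproducible for a given seed.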

class sign_language_translator.languages.TextLanguage

Bases: ABC

Base NLP class for a language.

Subclass it and provide the functionality to tokenize text and to classify and disambiguate tokens. Each token should correspond to a sign language clip.

abstract classmethod allowed_characters() → Set[str]: Returns a set of all allowed characters in the language.

abstract detokenize(tokens: Iterable[str]) → str: Joins tokens back into text.

abstract get_tags(tokens: str | Iterable[str]) → List[Any]: Gets the classifications of all tokens as a sequence of tags.

abstract get_word_senses(tokens: str | Iterable[str]) → List[List[str]]: Gets all known meanings of the ambiguous words.

abstract static name() → str: Returns the name of the language, as used throughout the datasets.

abstract preprocess(text: str) → str: Preprocesses text before tokenization. Ensures that the same word is never written with different Unicode characters, and removes unnecessary symbols, spaces, etc.
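A minimal sketch of language-agnostic preprocessing, assuming Unicode NFKC normalization plus whitespace cleanup is an acceptable baseline. The function is a hypothetical standalone helper, not the library's implementation; a real `preprocess` for a given language will likely also need language-specific character mappings that no normalization form unifies.

```python
import re
import unicodedata


def preprocess(text: str) -> str:
    """Hypothetical preprocessing: canonical Unicode form plus whitespace cleanup."""
    # NFKC composes accented letters and folds compatibility variants
    # (full-width letters, ligatures) so that a word has a single spelling.
    text = unicodedata.normalize("NFKC", text)
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


print(preprocess("ｈｅｌｌｏ   ﬁne"))  # hello fine
```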

static romanize(text: str, *args, add_diacritics=True, character_translation_table: Dict[int, str] | None = None, n_gram_map: Dict[str, str] | None = None, **kwargs) → str

Maps characters to phonetically similar characters of the English alphabet. Transliteration is useful for readability and simple text-to-speech. Multi-character n-grams (n > 1) are mapped first, then unigrams.

ALA-LC Standardized Romanization Tables (70 languages): https://www.loc.gov/catdir/cpso/roman.html

Parameters:

text (str) – Non-English text to be mapped to Latin script.
add_diacritics (bool, optional) – Whether to add diacritics to the English letters to aid pronunciation. Rules: (1) an under-dot ‘ ̣’ marks the alternate soft/hard pronunciation of a letter; (2) an over-bar/macron ‘ ̄’ marks a long pronunciation. Defaults to True.
character_translation_table (Optional[Dict[int, str]], optional) – A dictionary mapping the Unicode code points of single characters to their Latin equivalents. Defaults to None.
n_gram_map (Optional[Dict[str, str]], optional) – A dictionary mapping bigrams, trigrams, or longer n-grams to their Latin equivalents. Keys are expected to be regular expressions. Defaults to None.
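The two-pass order described above (n-grams first, then single characters) can be sketched as follows. `romanize_sketch` is a hypothetical standalone function that ignores `add_diacritics`, and the Greek-letter mapping is a toy table, not real ALA-LC data. Note that `Dict[int, str]` keyed by code points is the same table format that Python's `str.translate` accepts.

```python
import re
from typing import Dict, Optional


def romanize_sketch(
    text: str,
    character_translation_table: Optional[Dict[int, str]] = None,
    n_gram_map: Optional[Dict[str, str]] = None,
) -> str:
    """Hypothetical two-pass transliteration: regex n-grams first, then single characters."""
    # Pass 1: multi-character sequences; keys are treated as regular expressions.
    for pattern, latin in (n_gram_map or {}).items():
        text = re.sub(pattern, latin, text)
    # Pass 2: remaining single characters, keyed by Unicode code point.
    return text.translate(character_translation_table or {})


# Toy mapping, purely illustrative.
chars = {ord("λ"): "l", ord("ο"): "o", ord("υ"): "y", ord("κ"): "k", ord("α"): "a"}
ngrams = {"ου": "u"}
print(romanize_sketch("λουκα", chars, ngrams))  # luka
print(romanize_sketch("λουκα", chars))          # loyka
```

Comparing the two calls shows why the order matters: if unigrams were mapped first, the two-letter sequence would already have been split into "oy" before the n-gram rule could see it.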

abstract sentence_tokenize(text: str) → List[str]: Breaks text into sentences.

abstract tag(tokens: str | Iterable[str]) → List[Tuple[str, Any]]: Classifies the tokens and marks them with appropriate tags.

abstract classmethod token_regex() → str: Returns a regular expression that matches words in this language.

abstract tokenize(text: str) → List[str]: Breaks text apart into words or phrases.
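A minimal sketch of how several of these methods can fit together for a toy language of lowercase ASCII words and basic punctuation. `ToyTextLanguage` is hypothetical and deliberately does not subclass `TextLanguage`: a real subclass must implement every abstract method listed above (including `preprocess`, `tag`, `get_tags` and `get_word_senses`), which this sketch omits. Here `tokenize` simply reuses `token_regex`, which is one reasonable design but an assumption about how the two relate.

```python
import re
from typing import List, Set


class ToyTextLanguage:
    """Hypothetical minimal language: lowercase ASCII words and basic punctuation."""

    @staticmethod
    def name() -> str:
        return "toy"

    @classmethod
    def allowed_characters(cls) -> Set[str]:
        return set("abcdefghijklmnopqrstuvwxyz .,!?")

    @classmethod
    def token_regex(cls) -> str:
        # A word is a run of letters; each punctuation mark is its own token.
        return r"[a-z]+|[.,!?]"

    def sentence_tokenize(self, text: str) -> List[str]:
        # Split on whitespace that follows sentence-ending punctuation.
        return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    def tokenize(self, text: str) -> List[str]:
        return re.findall(self.token_regex(), text)

    def detokenize(self, tokens: List[str]) -> str:
        text = " ".join(tokens)
        # Re-attach punctuation to the preceding word.
        return re.sub(r" ([.,!?])", r"\1", text)


lang = ToyTextLanguage()
tokens = lang.tokenize("hello, world.")
print(tokens)                   # ['hello', ',', 'world', '.']
print(lang.detokenize(tokens))  # hello, world.
print(lang.sentence_tokenize("hi there. how are you?"))  # ['hi there.', 'how are you?']
```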

class sign_language_translator.languages.Vocab(language: str = '.^', country: str = '.^', organization: str = '.^', part_number: str = '.^', data_root_dir: str = '/home/docs/checkouts/readthedocs.org/user_builds/sign-language-translator/checkouts/latest/sign_language_translator/assets', arg_is_regex: bool = True, word_sense_regex: str = '\\([^\\(\\)]*\\)')

Bases: object

Loads text datasets for a specific language, country and organization.

Note

The mapping datasets are only downloaded automatically if the data_root_dir argument is the same as Assets.ROOT_DIR.
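In the signature, language, country, organization and part_number default to '.^', and arg_is_regex defaults to True, so these arguments are presumably interpreted as regular expressions. The pattern '.^' needs one character followed by the start of the string, so it can never match. The sketch assumes full-string matching against hypothetical dataset labels through a hypothetical `select` helper. How Vocab actually applies these patterns is not documented here.

```python
import re
from typing import List

NEVER_MATCH = ".^"  # the documented default for language, country, organization, part_number


def select(labels: List[str], pattern: str) -> List[str]:
    """Hypothetical filter: keep the labels that the regex matches in full."""
    return [label for label in labels if re.fullmatch(pattern, label)]


labels = ["urdu", "english", "hindi"]  # hypothetical dataset labels
print(select(labels, NEVER_MATCH))     # []
print(select(labels, "urdu|hindi"))    # ['urdu', 'hindi']
```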

remove_word_sense(text: str) → str

Removes the word-sense (disambiguation) information from the given text.

Parameters:: text (str) – The text from which the word sense needs to be removed.
Returns:: The text without the word-sense or disambiguation information.
Return type:: str

Example:

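from sign_language_translator.languages import Vocab

vocab = Vocab()  # every constructor argument has a default
text = "this is a spring(metal-coil). those are glasses(water-containers)."
without_word_sense = vocab.remove_word_sense(text)
print(without_word_sense)  # Output: "this is a spring. those are glasses."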
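Assuming that remove_word_sense simply deletes every match of word_sense_regex, the behaviour can be reproduced with the standard library alone. `strip_word_sense` is a hypothetical standalone helper, and the pattern is the documented default `word_sense_regex` written as a Python raw string.

```python
import re

# Default word_sense_regex from the Vocab signature: a parenthesized group
# that contains no parentheses inside it.
WORD_SENSE_REGEX = r"\([^\(\)]*\)"


def strip_word_sense(text: str, pattern: str = WORD_SENSE_REGEX) -> str:
    """Hypothetical helper: delete every match of the word-sense pattern."""
    return re.sub(pattern, "", text)


text = "this is a spring(metal-coil). those are glasses(water-containers)."
print(strip_word_sense(text))  # this is a spring. those are glasses.
```

Because the pattern forbids parentheses inside the group, a single pass removes only the innermost group of nested parentheses: "a(b(c))" becomes "a(b)".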

sign_language_translator.languages.get_sign_language(language_name: str | Enum) → SignLanguage

Retrieves a SignLanguage object based on the provided language name.

Parameters:: language_name (str | Enum) – The name of the language.
Returns:: An instance of SignLanguage class corresponding to the provided language name.
Return type:: SignLanguage
Raises:: ValueError – If no SignLanguage class is known for the provided language name.
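Since language_name may be either a string or an Enum member, a lookup like this is commonly built on a name-to-class registry. The sketch below is a hypothetical illustration of that pattern: the registry, `LanguageCodes` enum, class names and `get_sign_language_sketch` function are all invented for the example. Whether the library normalizes names (case, aliases) is not documented here.

```python
from enum import Enum
from typing import Dict, Type, Union


class SignLanguageBase:
    """Stand-in for the SignLanguage base class."""


class ToySignLanguage(SignLanguageBase):
    """Hypothetical concrete sign language."""


class LanguageCodes(Enum):  # hypothetical enum of language names
    TOY = "toy-sl"


# Hypothetical registry: language name -> class.
_REGISTRY: Dict[str, Type[SignLanguageBase]] = {"toy-sl": ToySignLanguage}


def get_sign_language_sketch(language_name: Union[str, Enum]) -> SignLanguageBase:
    # Accept enum members by using their value as the lookup name.
    name = language_name.value if isinstance(language_name, Enum) else language_name
    cls = _REGISTRY.get(name)
    if cls is None:
        raise ValueError(f"No SignLanguage class is known for {language_name!r}")
    return cls()


print(type(get_sign_language_sketch(LanguageCodes.TOY)).__name__)  # ToySignLanguage
```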

sign_language_translator.languages.get_text_language(language_name: str | Enum) → TextLanguage

Retrieves a TextLanguage object based on the provided language name.

Parameters:: language_name (str | Enum) – The name of the language.
Returns:: An instance of the TextLanguage class corresponding to the provided language name.
Return type:: TextLanguage
Raises:: ValueError – If no TextLanguage class is known for the provided language name.