sign_language_translator.languages package

Module contents

class sign_language_translator.languages.SignLanguage

Bases: ABC

This abstract class defines the structure and methods required for mapping spoken language text to signs in sign languages using rule-based approaches.

Keys

Enumerates all keys that are used in a sign dict.

Type:

enum.Enum

name()

Returns the name of the sign language.

tokens_to_sign_dicts()

Converts tokens to signs based on rules and returns a list of sign dictionaries.

restructure_sentence()

Restructures a sentence by adjusting grammar, dropping meaningless words, and normalizing synonyms.

_make_equal_weight_sign_dict()

Creates a sign dictionary with equal weights for the provided signs.

class SignDictKeys(value)

Bases: Enum

Enumerates all keys that are used in a sign dict.

SIGNS

Key for the ‘signs’ field of a sign dict, which maps to a list of sign sequences (each sequence is a list of video names).

Type:

str

WEIGHTS

Key for the ‘weights’ field of a sign dict, which maps to the usage frequency of each video sequence.

Type:

str

SIGNS = 'signs'
WEIGHTS = 'weights'
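
The two keys and the equal-weight helper mentioned above can be pictured with a small standalone sketch. It does not import the library: the enum below is a local stand-in that mirrors the documented values, and `make_equal_weight_sign_dict` is a hypothetical re-creation of what `_make_equal_weight_sign_dict()` is described as doing. Normalizing the weights so that they sum to 1 is an assumption, not something the documentation states.

```python
from enum import Enum
from typing import Dict, List, Union

SignDict = Dict[str, Union[List[List[str]], List[float]]]


class SignDictKeys(Enum):
    """Local stand-in mirroring the documented enum values."""

    SIGNS = "signs"
    WEIGHTS = "weights"


def make_equal_weight_sign_dict(signs: List[List[str]]) -> SignDict:
    """Give every alternative sign sequence the same weight.

    Assumption: weights are normalized to sum to 1.
    """
    if not signs:
        raise ValueError("at least one sign sequence is required")
    weight = 1.0 / len(signs)
    return {
        SignDictKeys.SIGNS.value: signs,
        SignDictKeys.WEIGHTS.value: [weight] * len(signs),
    }


# Two alternative ways of signing the same token (video names are made up).
sign_dict = make_equal_weight_sign_dict([["video_a", "video_b"], ["video_c"]])
print(sign_dict)
```
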
abstract static name() → str

Returns the name of the sign language.

abstract restructure_sentence(sentence: Iterable[str], tags: Iterable[Any] | None = None, contexts: Iterable[Any] | None = None) → Tuple[Iterable[str], Iterable[Any], Iterable[Any]]

Restructures a sentence by changing the grammar, removing stopwords, spaces & punctuation, and modifying token contents.

Parameters:
  • sentence (Iterable[str]) – Input sentence to be restructured.

  • tags (Iterable[Any], optional) – Additional tags associated with the sentence. Defaults to None.

  • contexts (Iterable[Any], optional) – Additional contexts associated with the sentence. Defaults to None.

Returns:

The restructured sentence, associated tags, and contexts.

Return type:

Tuple[Iterable[str], Iterable[Any], Iterable[Any]]
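
As a rough illustration of the contract described above, the sketch below drops stopwords, whitespace and punctuation while keeping tags and contexts aligned with the surviving tokens. It is a standalone toy, not the library's implementation: the stopword set, the lowercasing step and the function name `restructure_sentence_sketch` are all assumptions, and no grammar reordering is attempted.

```python
import string
from typing import Any, Iterable, List, Optional, Tuple

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "is"}


def _is_noise(token: str) -> bool:
    """True for whitespace-only, punctuation-only, or stopword tokens."""
    stripped = token.strip()
    if not stripped:
        return True
    if all(ch in string.punctuation for ch in stripped):
        return True
    return stripped.lower() in STOPWORDS


def restructure_sentence_sketch(
    sentence: Iterable[str],
    tags: Optional[Iterable[Any]] = None,
    contexts: Optional[Iterable[Any]] = None,
) -> Tuple[List[str], List[Any], List[Any]]:
    tokens = list(sentence)
    # Missing tags/contexts are padded with None so the three outputs stay aligned.
    tag_list = list(tags) if tags is not None else [None] * len(tokens)
    context_list = list(contexts) if contexts is not None else [None] * len(tokens)

    kept = [
        (tok.strip().lower(), tag, ctx)
        for tok, tag, ctx in zip(tokens, tag_list, context_list)
        if not _is_noise(tok)
    ]
    return (
        [tok for tok, _, _ in kept],
        [tag for _, tag, _ in kept],
        [ctx for _, _, ctx in kept],
    )


words, word_tags, word_contexts = restructure_sentence_sketch(
    ["The", " ", "cat", "is", "here", "."],
    tags=["DET", "SPACE", "NOUN", "VERB", "ADV", "PUNCT"],
)
print(words, word_tags, word_contexts)
```
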

abstract tokens_to_sign_dicts(tokens: Iterable[str], tags: Iterable[Any] | None = None, contexts: Iterable[Any] | None = None) → List[Dict[str, List[List[str]] | List[float]]]

Converts tokens to signs based on rules and returns a list of sign dictionaries.

Parameters:
  • tokens (Iterable[str]) – Input tokens to be converted to signs.

  • tags (Iterable[Any], optional) – Additional tags associated with the tokens. Defaults to None.

  • contexts (Iterable[Any], optional) – Additional contexts associated with the tokens. Defaults to None.

Returns:

A list of sign dictionaries, where each dictionary contains the ‘signs’ field mapping to a list of sign sequences and the ‘weights’ field mapping to the usage frequency of each sign sequence, e.g. “word” -> [{“signs”: [[sign_1, sign_2], [alternate_1]], “weights”: [10, 5]}, …]

Return type:

List[Dict[str, List[List[str]] | List[float]]]
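
To make the return shape concrete, here is a small standalone sketch of a list of sign dicts and one way a caller might consume it. Sampling one sequence per dict in proportion to its weight is an assumption about downstream use; the documentation only says that the weights are usage frequencies. The function name `pick_sign_sequences` and the sign names are hypothetical.

```python
import random
from typing import Dict, List, Optional, Union

SignDict = Dict[str, Union[List[List[str]], List[float]]]

# Shape taken from the documented example; the second entry is made up.
sign_dicts: List[SignDict] = [
    {"signs": [["sign_1", "sign_2"], ["alternate_1"]], "weights": [10, 5]},
    {"signs": [["sign_3"]], "weights": [1]},
]


def pick_sign_sequences(
    dicts: List[SignDict], seed: Optional[int] = None
) -> List[List[str]]:
    """Pick one sign sequence per sign dict, weighted by usage frequency."""
    rng = random.Random(seed)
    return [
        rng.choices(d["signs"], weights=d["weights"], k=1)[0]
        for d in dicts
    ]


chosen = pick_sign_sequences(sign_dicts, seed=0)
# Flatten the per-dict choices into a single sequence of video names.
video_names = [name for sequence in chosen for name in sequence]
print(video_names)
```
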

class sign_language_translator.languages.TextLanguage

Bases: ABC

Base NLP class for a language.

Subclass it and provide the functionality to tokenize text and classify & disambiguate tokens. Each token should correspond to a sign language clip.

abstract classmethod allowed_characters() → Set[str]

Returns a set of all allowed characters in the language.

abstract detokenize(tokens: Iterable[str]) → str

Joins tokens back into text.

abstract get_tags(tokens: str | Iterable[str]) → List[Any]

Gets the classifications of all tokens as a sequence of tags.

abstract get_word_senses(tokens: str | Iterable[str]) → List[List[str]]

Get all known meanings of the ambiguous words.

abstract static name() → str

Returns the name of the language as it is used elsewhere in the datasets.

abstract preprocess(text: str) → str

Preprocesses text before tokenization. Implementations should ensure that the same word is always written with the same Unicode characters, and should remove unnecessary symbols, spaces, etc.
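
One common way to avoid several Unicode spellings of the same word is Unicode normalization. The sketch below is a minimal standalone illustration using only the standard library. Choosing the NFKC form and collapsing whitespace are assumptions, and a real subclass would likely add language-specific replacements. The name `preprocess_sketch` is hypothetical.

```python
import re
import unicodedata


def preprocess_sketch(text: str) -> str:
    """Normalize Unicode and tidy whitespace before tokenization."""
    # NFKC folds compatibility characters (e.g. ligatures) and composes
    # base letter + combining mark pairs into single code points.
    text = unicodedata.normalize("NFKC", text)
    # Collapse any run of whitespace (spaces, tabs, newlines) into one space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


# "e" followed by a combining acute accent becomes the single character "é".
print(preprocess_sketch("cafe\u0301   au  lait\n"))
```
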

static romanize(text: str, *args, add_diacritics=True, character_translation_table: Dict[int, str] | None = None, n_gram_map: Dict[str, str] | None = None, **kwargs) → str

Map characters to phonetically similar characters of the English language. Transliteration is useful for readability & simple text-to-speech. First maps (n>1)-grams, then unigrams.

ALA-LC Standardized Romanization Tables (70 languages): https://www.loc.gov/catdir/cpso/roman.html

Parameters:
  • text (str) – Non-English text to be mapped to Latin script.

  • add_diacritics (bool, optional) – Whether to use diacritics over English characters to help pronunciation. (Rules: 1. The under-dot ‘ ̣’ indicates alternate soft/hard pronunciation of the letter. 2. The over-bar/macron ‘ ̄’ means long pronunciation). Defaults to True.

  • character_translation_table (Optional[Dict[int, str]], optional) – A dictionary mapping the Unicode code points of single characters to their Latin equivalents. Defaults to None.

  • n_gram_map (Optional[Dict[str, str]], optional) – A dictionary mapping bigrams, trigrams or longer sequences to their Latin equivalents. Keys are expected to be regular expressions. Defaults to None.
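
The two-stage mapping described above (n-grams first, then single characters) can be sketched with `re.sub` and `str.translate`. This is a standalone approximation, not the library's code: the function name `romanize_sketch`, the tiny Greek mapping tables, and the way diacritics are stripped when `add_diacritics` is False are all assumptions made for illustration, and the tables are not a real romanization standard.

```python
import re
import unicodedata
from typing import Dict, Optional


def romanize_sketch(
    text: str,
    add_diacritics: bool = True,
    character_translation_table: Optional[Dict[int, str]] = None,
    n_gram_map: Optional[Dict[str, str]] = None,
) -> str:
    # 1. Multi-character patterns first; keys are treated as regular expressions.
    for pattern, replacement in (n_gram_map or {}).items():
        text = re.sub(pattern, replacement, text)
    # 2. Remaining single characters, keyed by Unicode code point.
    if character_translation_table:
        text = text.translate(character_translation_table)
    # 3. Assumed behaviour: drop combining marks when diacritics are not wanted.
    if not add_diacritics:
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return text


# Illustrative tables only.
N_GRAMS = {"μπ": "b"}
CHARS = {ord("α"): "a", ord("λ"): "l", ord("ω"): "\u014d"}  # \u014d is o with macron

with_marks = romanize_sketch("μπαλω", character_translation_table=CHARS, n_gram_map=N_GRAMS)
without_marks = romanize_sketch(
    "μπαλω", add_diacritics=False, character_translation_table=CHARS, n_gram_map=N_GRAMS
)
print(with_marks, without_marks)
```

Applying the n-gram map first matters: if the unigram table ran first, the characters of a digraph would already be replaced one by one and the multi-character pattern could no longer match.
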

abstract sentence_tokenize(text: str) → List[str]

Break text into sentences.

abstract tag(tokens: str | Iterable[str]) → List[Tuple[str, Any]]

Classify the tokens and mark them with appropriate tags.

abstract classmethod token_regex() → str

Returns a regular expression that matches words in this language.

abstract tokenize(text: str) → List[str]

Breaks text apart into words or phrases.
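
How `token_regex()`, `tokenize()`, `sentence_tokenize()` and `detokenize()` might fit together is sketched below with a toy class. `ToyEnglish` is hypothetical and deliberately does not inherit from the library's `TextLanguage`, so the block runs on its own. The regular expressions are simple assumptions, not the patterns any shipped language class uses.

```python
import re
from typing import Iterable, List


class ToyEnglish:
    """Hypothetical stand-in for a TextLanguage subclass."""

    @classmethod
    def token_regex(cls) -> str:
        # A run of word characters, or a single punctuation mark.
        return r"\w+|[^\w\s]"

    def sentence_tokenize(self, text: str) -> List[str]:
        # Split on whitespace that follows sentence-ending punctuation.
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    def tokenize(self, text: str) -> List[str]:
        return re.findall(self.token_regex(), text)

    def detokenize(self, tokens: Iterable[str]) -> str:
        text = " ".join(tokens)
        # Re-attach punctuation to the preceding word.
        return re.sub(r"\s+([^\w\s])", r"\1", text)


lang = ToyEnglish()
sentences = lang.sentence_tokenize("Hello there. How are you?")
tokens = lang.tokenize(sentences[0])
print(sentences, tokens, lang.detokenize(tokens))
```
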

class sign_language_translator.languages.Vocab(language: str = '.^', country: str = '.^', organization: str = '.^', part_number: str = '.^', data_root_dir: str = '/home/docs/checkouts/readthedocs.org/user_builds/sign-language-translator/checkouts/latest/sign_language_translator/assets', arg_is_regex: bool = True, word_sense_regex: str = '\\([^\\(\\)]*\\)')

Bases: object

Loads text datasets for a specific language, country and organization.

Note

Our mapping datasets will only be downloaded automatically if the data_root_dir arg is the same as Assets.ROOT_DIR.

remove_word_sense(text: str) → str

Remove the word sense or disambiguation information from given text.

Parameters:

text (str) – The text from which the word sense needs to be removed.

Returns:

The word without the word sense or disambiguation information.

Return type:

str

Example:

# Assumes `vocab` is an existing Vocab instance.
word = "this is a spring(metal-coil). those are glasses(water-containers)."
without_word_sense = vocab.remove_word_sense(word)
print(without_word_sense)  # Output: "this is a spring. those are glasses."
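
The effect shown in the example can be reproduced without the library by applying the documented default `word_sense_regex` with `re.sub`. This is only a sketch of the idea: that `Vocab.remove_word_sense` works exactly this way is an assumption, and the name `remove_word_sense_sketch` is hypothetical.

```python
import re

# Default value of word_sense_regex from the Vocab signature:
# an opening parenthesis, any characters except parentheses, a closing parenthesis.
WORD_SENSE_REGEX = r"\([^\(\)]*\)"


def remove_word_sense_sketch(text: str, word_sense_regex: str = WORD_SENSE_REGEX) -> str:
    """Delete every parenthesized word-sense annotation from the text."""
    return re.sub(word_sense_regex, "", text)


cleaned = remove_word_sense_sketch(
    "this is a spring(metal-coil). those are glasses(water-containers)."
)
print(cleaned)
```
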
sign_language_translator.languages.get_sign_language(language_name: str | Enum) → SignLanguage

Retrieves a SignLanguage object based on the provided language name.

Parameters:

language_name (str | Enum) – The name of the language.

Returns:

An instance of SignLanguage class corresponding to the provided language name.

Return type:

SignLanguage

Raises:

ValueError – If no SignLanguage class is known for the provided language name.

sign_language_translator.languages.get_text_language(language_name: str | Enum) → TextLanguage

Retrieves a TextLanguage object based on the provided language name.

Parameters:

language_name (str | Enum) – The name of the language.

Returns:

An instance of the TextLanguage class corresponding to the provided language name.

Return type:

TextLanguage

Raises:

ValueError – If no TextLanguage class is known for the provided language name.
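
Both factory functions follow the same documented pattern: accept a name as a string or an Enum member, return an instance, and raise ValueError for unknown names. A standalone sketch of that pattern is shown below. The registry, the `ToyEnglish` class, the `ToyLanguageCodes` enum and the function name are all hypothetical, and how the library actually resolves names is not stated in this page.

```python
from enum import Enum
from typing import Dict, Type, Union


class ToyTextLanguage:
    """Hypothetical base class standing in for TextLanguage."""


class ToyEnglish(ToyTextLanguage):
    @staticmethod
    def name() -> str:
        return "english"


class ToyLanguageCodes(Enum):
    """Hypothetical enum of language names."""

    ENGLISH = "english"


# Hypothetical registry mapping language names to classes.
_REGISTRY: Dict[str, Type[ToyTextLanguage]] = {ToyEnglish.name(): ToyEnglish}


def get_text_language_sketch(language_name: Union[str, Enum]) -> ToyTextLanguage:
    # Accept either a plain string or an Enum member whose value is the name.
    key = language_name.value if isinstance(language_name, Enum) else language_name
    try:
        return _REGISTRY[key]()
    except KeyError:
        raise ValueError(f"no text language class known for {key!r}") from None


by_string = get_text_language_sketch("english")
by_enum = get_text_language_sketch(ToyLanguageCodes.ENGLISH)
print(type(by_string).__name__, type(by_enum).__name__)
```
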