sign_language_translator.languages package
Subpackages
- sign_language_translator.languages.sign package
- Submodules
- Module contents
- sign_language_translator.languages.text package
- Submodules
- Module contents
- English: English.ALLOWED_CHARACTERS, English.ALPHABET, English.BRACKETS, English.CHARACTER_MAP, English.CHARACTER_TRANSLATOR, English.END_OF_SENTENCE_MARKS, English.FULL_STOPS, English.NUMBER_REGEX, English.PUNCTUATION, English.QUESTION_MARKS, English.QUOTES, English.SYMBOLS, English.UNALLOWED_CHARACTERS_REGEX, English.UNICODE_RANGE, English.WORD_REGEX, English.allowed_characters(), English.delete_unallowed_characters(), English.detokenize(), English.get_tags(), English.get_word_senses(), English.name(), English.preprocess(), English.romanize(), English.sentence_tokenize(), English.tag(), English.token_regex(), English.tokenize()
- Hindi: Hindi.ACRONYM_PERIODS, Hindi.ALLOWED_CHARACTERS, Hindi.BRACKETS, Hindi.CHARACTERS, Hindi.CHARACTER_TO_DECOMPOSED, Hindi.CHARACTER_TRANSLATOR, Hindi.DIACRITICS, Hindi.END_OF_SENTENCE_MARKS, Hindi.FULL_STOPS, Hindi.NGRAM_ROMANIZATION_MAP, Hindi.NUMBER_REGEX, Hindi.PUNCTUATION, Hindi.QUESTION_MARKS, Hindi.ROMANIZATION_CHARACTER_TRANSLATOR, Hindi.ROMANIZATION_MAP, Hindi.ROMANIZATION_MAP_CONSONANTS_ASPIRATE, Hindi.ROMANIZATION_MAP_CONSONANTS_CEREBRALS, Hindi.ROMANIZATION_MAP_CONSONANTS_DENTALS, Hindi.ROMANIZATION_MAP_CONSONANTS_GUTTURALS, Hindi.ROMANIZATION_MAP_CONSONANTS_LABIALS, Hindi.ROMANIZATION_MAP_CONSONANTS_PALATAS, Hindi.ROMANIZATION_MAP_CONSONANTS_SEMIVOWELS, Hindi.ROMANIZATION_MAP_CONSONANTS_SIBILANTS, Hindi.ROMANIZATION_MAP_VOWELS_AND_DIPHTHONGS, Hindi.SYMBOLS, Hindi.UNALLOWED_CHARACTERS_REGEX, Hindi.UNICODE_RANGE, Hindi.WORD_REGEX, Hindi.allowed_characters(), Hindi.delete_unallowed_characters(), Hindi.detokenize(), Hindi.get_tags(), Hindi.get_word_senses(), Hindi.name(), Hindi.normalize_characters(), Hindi.preprocess(), Hindi.romanize(), Hindi.sentence_tokenize(), Hindi.tag(), Hindi.token_regex(), Hindi.tokenize()
- Tags
- TextLanguage
- Urdu: Urdu.ALLOWED_CHARACTERS, Urdu.BRACKETS, Urdu.CHARACTER_TO_WORD, Urdu.CHARACTER_TRANSLATOR, Urdu.COMBINE_CHARACTERS_REGEX, Urdu.CORRECT_URDU_CHARACTERS_TO_INCORRECT, Urdu.DIACRITICS, Urdu.DIACRITICS_REGEX, Urdu.END_OF_SENTENCE_MARKS, Urdu.FULL_STOPS, Urdu.HONORIFICS, Urdu.NGRAM_ROMANIZATION_MAP, Urdu.NUMBER_REGEX, Urdu.PUNCTUATION, Urdu.PUNCTUATION_REGEX, Urdu.QUESTION_MARKS, Urdu.QUOTATION_MARKS, Urdu.ROMANIZATION_CHARACTER_TRANSLATOR, Urdu.ROMANIZATION_MAP, Urdu.SPLIT_TO_COMBINED_CHARACTERS, Urdu.SYMBOLS, Urdu.UNALLOWED_CHARACTERS_REGEX, Urdu.UNICODE_RANGE, Urdu.WORD_REGEX, Urdu.allowed_characters(), Urdu.character_normalize(), Urdu.delete_unallowed_characters(), Urdu.detokenize(), Urdu.get_tags(), Urdu.get_word_senses(), Urdu.name(), Urdu.passage_preprocessor(), Urdu.poetry_preprocessor(), Urdu.preprocess(), Urdu.remove_diacritics(), Urdu.romanize(), Urdu.sentence_tokenize(), Urdu.tag(), Urdu.token_regex(), Urdu.tokenize(), Urdu.wikipedia_preprocessor()
Submodules
Module contents
- class sign_language_translator.languages.SignLanguage
Bases: ABC
This abstract class defines the structure and methods required for mapping spoken language text to signs in sign languages using rule-based approaches.
- tokens_to_sign_dicts()
Converts tokens to signs based on rules and returns a list of sign dictionaries.
- restructure_sentence()
Restructures a sentence by adjusting grammar, dropping meaningless words, and normalizing synonyms.
- _make_equal_weight_sign_dict()
Creates a sign dictionary with equal weights for the provided signs.
- class SignDictKeys(value)
Bases: Enum
Enumerates all keys that are used in a sign dict.
- SIGNS
Key for the ‘signs’ field of a sign dict, which maps to a list of sign sequences; each sequence is a list of video names.
- Type:
str
- abstract restructure_sentence(sentence: Iterable[str], tags: Iterable[Any] | None = None, contexts: Iterable[Any] | None = None) -> Tuple[Iterable[str], Iterable[Any], Iterable[Any]]
Restructures a sentence by changing the grammar, removing stopwords, spaces & punctuation, and modifying token contents.
- Parameters:
sentence (Iterable[str]) – Input sentence to be restructured.
tags (Iterable[Any], optional) – Additional tags associated with the sentence. Defaults to None.
contexts (Iterable[Any], optional) – Additional contexts associated with the sentence. Defaults to None.
- Returns:
The restructured sentence, associated tags, and contexts.
- Return type:
Tuple[Iterable[str], Iterable[Any], Iterable[Any]]
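The contract above can be illustrated with a small standalone sketch. It is not the library's implementation: the stopword and punctuation sets are hypothetical, and a real subclass would apply language-specific grammar rules. It only shows the expected shape of the inputs and of the returned three-tuple.

```python
from typing import Any, Iterable, List, Optional, Tuple

# Hypothetical stopword and punctuation sets, chosen only for illustration.
STOPWORDS = {"a", "an", "the", "is", "are"}
PUNCTUATION = {".", ",", "?", "!"}


def restructure_sentence(
    sentence: Iterable[str],
    tags: Optional[Iterable[Any]] = None,
    contexts: Optional[Iterable[Any]] = None,
) -> Tuple[List[str], List[Any], List[Any]]:
    """Drop stopwords, spaces and punctuation; lowercase the remaining tokens."""
    tokens = list(sentence)
    # Pad missing tags/contexts with None so the three sequences stay aligned.
    tag_list = list(tags) if tags is not None else [None] * len(tokens)
    context_list = list(contexts) if contexts is not None else [None] * len(tokens)

    kept = [
        (token.lower(), tag, context)
        for token, tag, context in zip(tokens, tag_list, context_list)
        if token.strip()  # removes empty and whitespace-only tokens
        and token.lower() not in STOPWORDS
        and token not in PUNCTUATION
    ]
    new_tokens = [item[0] for item in kept]
    new_tags = [item[1] for item in kept]
    new_contexts = [item[2] for item in kept]
    return new_tokens, new_tags, new_contexts


result = restructure_sentence(["The", " ", "sky", " ", "is", " ", "blue", "."])
```

Filtering the three sequences together keeps each surviving token paired with its own tag and context.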
- abstract tokens_to_sign_dicts(tokens: Iterable[str], tags: Iterable[Any] | None = None, contexts: Iterable[Any] | None = None) -> List[Dict[str, List[List[str]] | List[float]]]
Converts tokens to signs based on rules and returns a list of sign dictionaries.
- Parameters:
tokens (Iterable[str]) – Input tokens to be converted to signs.
tags (Iterable[Any], optional) – Additional tags associated with the tokens. Defaults to None.
contexts (Iterable[Any], optional) – Additional contexts associated with the tokens. Defaults to None.
- Returns:
A list of sign dictionaries, where each dictionary contains the ‘signs’ field mapping to a list of sign sequences and the ‘weights’ field mapping to the usage frequency of each sign sequence. e.g. “word” -> [{“signs”: [[sign_1, sign_2], [alternate_1]], “weights”: [10, 5]}, …]
- Return type:
List[Dict[str, List[List[str]] | List[float]]]
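To make the sign dict structure concrete, here is a minimal sketch that builds one and samples a sign sequence in proportion to its weight. The helper names are hypothetical, and the equal-weight value of 1/n is an assumption about what `_make_equal_weight_sign_dict()` might do, not a statement of the library's behavior.

```python
import random
from typing import Dict, List, Union

SignDict = Dict[str, Union[List[List[str]], List[float]]]


def make_equal_weight_sign_dict(sign_sequences: List[List[str]]) -> SignDict:
    """Assumed behavior: give every alternative sign sequence the same weight."""
    count = len(sign_sequences)
    weights = [1.0 / count] * count if count else []
    return {"signs": sign_sequences, "weights": weights}


def pick_sign_sequence(sign_dict: SignDict, rng: random.Random) -> List[str]:
    """Hypothetical helper: sample one sequence proportionally to its weight."""
    return rng.choices(sign_dict["signs"], weights=sign_dict["weights"], k=1)[0]


# A dict shaped like the example in the docstring, with usage frequencies as weights.
word_signs: SignDict = {
    "signs": [["sign_1", "sign_2"], ["alternate_1"]],
    "weights": [10, 5],
}
equal = make_equal_weight_sign_dict([["sign_1", "sign_2"], ["alternate_1"]])
chosen = pick_sign_sequence(word_signs, random.Random(0))
```

Because `random.choices` accepts unnormalized weights, raw usage counts such as `[10, 5]` work without rescaling.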
- class sign_language_translator.languages.TextLanguage
Bases: ABC
Base NLP class for a language.
Subclass it and provide the functionality to tokenize text and classify & disambiguate tokens. Each token should correspond to a sign language clip.
- abstract classmethod allowed_characters() -> Set[str]
Returns a set of all allowed characters in the language.
- abstract get_tags(tokens: str | Iterable[str]) -> List[Any]
Gets the classifications of all tokens as a sequence of tags.
- abstract get_word_senses(tokens: str | Iterable[str]) -> List[List[str]]
Gets all known meanings of the ambiguous words.
- abstract static name() -> str
Returns the name of the language used everywhere else in datasets.
- abstract preprocess(text: str) -> str
Preprocesses text before tokenization. Ensures that the same word is never written with different Unicode characters, and removes unnecessary symbols, spaces, etc.
- static romanize(text: str, *args, add_diacritics=True, character_translation_table: Dict[int, str] | None = None, n_gram_map: Dict[str, str] | None = None, **kwargs) -> str
Map characters to phonetically similar characters of the English language. Transliteration is useful for readability & simple text-to-speech. First maps (n>1)-grams, then unigrams.
ALA-LC Standardized Romanization Tables (70 languages): https://www.loc.gov/catdir/cpso/roman.html
- Parameters:
text (str) – Non-English text to be mapped to Latin script.
add_diacritics (bool, optional) – Whether to use diacritics over English characters to help pronunciation. (Rules: 1. The under-dot ‘ ̣’ indicates alternate soft/hard pronunciation of the letter. 2. The over-bar/macron ‘ ̄’ means long pronunciation). Defaults to True.
character_translation_table (Optional[Dict[int, str]], optional) – A dictionary mapping unicode of single characters to their latin equivalent. Defaults to None.
n_gram_map (Optional[Dict[str, str]], optional) – A dictionary mapping bigrams, trigrams or more to their latin equivalent. Keys are expected to be regular expressions. Defaults to None.
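The two-stage mapping described above (n-grams first, then single characters) can be sketched with the standard library alone. This is an illustrative stand-in, not the library's code: `romanize_sketch` and the toy Greek tables are hypothetical, they do not follow the ALA-LC tables, and the `add_diacritics` option is not modeled.

```python
import re
from typing import Dict, Optional


def romanize_sketch(
    text: str,
    character_translation_table: Optional[Dict[int, str]] = None,
    n_gram_map: Optional[Dict[str, str]] = None,
) -> str:
    """Replace multi-character patterns first, then translate single characters."""
    # Stage 1: keys of n_gram_map are treated as regular expressions.
    for pattern, replacement in (n_gram_map or {}).items():
        text = re.sub(pattern, replacement, text)
    # Stage 2: str.translate maps Unicode code points to replacement strings.
    return text.translate(character_translation_table or {})


# Toy tables for illustration only.
toy_table = {ord("ο"): "o", ord("υ"): "y", ord("ρ"): "r", ord("α"): "a"}
toy_ngrams = {"ου": "u"}

with_ngrams = romanize_sketch("ουρα", toy_table, toy_ngrams)
without_ngrams = romanize_sketch("ουρα", toy_table)
```

Running the n-gram pass first matters: once single characters have been translated, the bigram pattern can no longer match the original script.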
- abstract tag(tokens: str | Iterable[str]) -> List[Tuple[str, Any]]
Classify the tokens and mark them with appropriate tags.
- class sign_language_translator.languages.Vocab(language: str = '.^', country: str = '.^', organization: str = '.^', part_number: str = '.^', data_root_dir: str = '/home/docs/checkouts/readthedocs.org/user_builds/sign-language-translator/checkouts/latest/sign_language_translator/assets', arg_is_regex: bool = True, word_sense_regex: str = '\\([^\\(\\)]*\\)')
Bases: object
Loads text datasets for a specific language, country and organization.
Note
Our mapping datasets will only be downloaded automatically if the data_root_dir arg is the same as Assets.ROOT_DIR.
- remove_word_sense(text: str) -> str
Remove the word sense or disambiguation information from given text.
- Parameters:
text (str) – The text from which the word sense needs to be removed.
- Returns:
The word without the word sense or disambiguation information.
- Return type:
str
Example:
    word = "this is a spring(metal-coil). those are glasses(water-containers)."
    without_word_sense = remove_word_sense(word)
    print(without_word_sense)
    # Output: "this is a spring. those are glasses."
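The behavior in the example can be reproduced with a plain regular-expression substitution. The sketch below assumes the method simply deletes every match of the `word_sense_regex` default shown in the `Vocab` signature; the function name `remove_word_sense_sketch` is hypothetical and the real method may differ in detail.

```python
import re

# Default pattern taken from the Vocab signature: a parenthesized group
# that contains no nested parentheses.
WORD_SENSE_REGEX = r"\([^\(\)]*\)"


def remove_word_sense_sketch(text: str, word_sense_regex: str = WORD_SENSE_REGEX) -> str:
    """Delete every parenthesized word-sense annotation from the text."""
    return re.sub(word_sense_regex, "", text)


word = "this is a spring(metal-coil). those are glasses(water-containers)."
without_word_sense = remove_word_sense_sketch(word)
print(without_word_sense)
```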
- sign_language_translator.languages.get_sign_language(language_name: str | Enum) -> SignLanguage
Retrieves a SignLanguage object based on the provided language name.
- Parameters:
language_name (str | Enum) – The name of the language.
- Returns:
An instance of SignLanguage class corresponding to the provided language name.
- Return type:
SignLanguage
- Raises:
ValueError – If no SignLanguage class is known for the provided language name.
- sign_language_translator.languages.get_text_language(language_name: str | Enum) -> TextLanguage
Retrieves a TextLanguage object based on the provided language name.
- Parameters:
language_name (str | Enum) – The name of the language.
- Returns:
An instance of the TextLanguage class corresponding to the provided language name.
- Return type:
TextLanguage
- Raises:
ValueError – If no TextLanguage class is known for the provided language name.