sign_language_translator.text.tokenizer module

class sign_language_translator.text.tokenizer.SignTokenizer(word_regex: str = '\\w+', compound_words: Iterable[str] = (), end_of_sentence_tokens: Iterable[str] = ('.', '?', '!'), acronym_periods=('.',), non_sentence_end_words: Iterable[str] = ('A', 'B', 'C'), tokenized_word_sense_pattern: List | None = None)

Bases: object
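The constructor parameters suggest that text is split into words matching `word_regex`, and that entries in `compound_words` can then be merged back into single tokens (the `join_compound_words` flag of `tokenize` appears to toggle this). The following is a minimal, hypothetical sketch of that idea in plain Python. `sketch_tokenize` is an illustrative name, not part of the library, and its rules (punctuation kept as separate tokens, longest compound matched first) are assumptions rather than documented `SignTokenizer` behavior.

```python
import re
from typing import Iterable, List


def sketch_tokenize(
    text: str,
    word_regex: str = r"\w+",
    compound_words: Iterable[str] = (),
    join_compound_words: bool = True,
) -> List[str]:
    """Hypothetical regex tokenizer; NOT the library's implementation."""
    # A token is a match of word_regex or a single non-space, non-word symbol.
    # (Assumes word_regex has no capturing groups, so findall returns whole matches.)
    pattern = re.compile(rf"(?:{word_regex})|[^\w\s]")
    tokens = pattern.findall(text)
    if not join_compound_words:
        return tokens

    # Split each compound word with the same pattern; try the longest ones first.
    compounds = sorted(
        ((tuple(pattern.findall(cw)), cw) for cw in compound_words),
        key=lambda item: len(item[0]),
        reverse=True,
    )

    joined: List[str] = []
    i = 0
    while i < len(tokens):
        for parts, compound in compounds:
            n = len(parts)
            if n > 1 and tuple(tokens[i : i + n]) == parts:
                joined.append(compound)  # merge the matched run into one token
                i += n
                break
        else:  # no compound starts here: keep the token as-is
            joined.append(tokens[i])
            i += 1
    return joined


print(sketch_tokenize("I live in New York.", compound_words=["New York"]))
```

Splitting each compound word with the same pattern used on the input means a single regex governs both the text and the compound-word list, so a compound such as "ice-cream" is recognized even though the pattern breaks it into three sub-tokens.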

detokenize(tokens: Iterable[str]) → str
sentence_tokenize(text: str) → List[str]
tokenize(text: str, join_compound_words: bool = True, join_word_sense: bool = False) → List[str]
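The `sentence_tokenize` and `detokenize` methods, together with the `end_of_sentence_tokens`, `acronym_periods`, and `non_sentence_end_words` constructor parameters, hint at how sentence boundaries might be detected. Below is a minimal sketch under a stated assumption: a sentence ends at any end-of-sentence token, except when that token is an acronym period directly following one of the non-sentence-end words (so the period in "A." is not treated as a boundary). The functions `sketch_sentence_tokenize` and `sketch_detokenize` are hypothetical and do not reproduce the library's implementation.

```python
import re
from typing import Iterable, List

# Hypothetical token pattern: a run of word characters, or one punctuation mark.
_TOKEN = re.compile(r"\w+|[^\w\s]")
_PUNCT = re.compile(r"[^\w\s]")


def sketch_detokenize(tokens: Iterable[str]) -> str:
    """Join tokens with spaces, attaching punctuation to the token before it."""
    out = ""
    for tok in tokens:
        # Insert a space before every token except the first and punctuation marks.
        if out and not _PUNCT.fullmatch(tok):
            out += " "
        out += tok
    return out


def sketch_sentence_tokenize(
    text: str,
    end_of_sentence_tokens: Iterable[str] = (".", "?", "!"),
    acronym_periods: Iterable[str] = (".",),
    non_sentence_end_words: Iterable[str] = ("A", "B", "C"),
) -> List[str]:
    """Hypothetical sentence splitter; NOT the library's implementation."""
    eos = set(end_of_sentence_tokens)
    acronym = set(acronym_periods)
    no_end = set(non_sentence_end_words)

    tokens = _TOKEN.findall(text)
    sentences: List[str] = []
    current: List[str] = []
    for i, tok in enumerate(tokens):
        current.append(tok)
        if tok not in eos:
            continue
        # Assumed rule: an acronym period right after e.g. "A" is not a boundary.
        if tok in acronym and i > 0 and tokens[i - 1] in no_end:
            continue
        sentences.append(sketch_detokenize(current))
        current = []
    if current:  # trailing text without a final end-of-sentence token
        sentences.append(sketch_detokenize(current))
    return sentences


print(sketch_sentence_tokenize("Hello world. How are you? Fine!"))
print(sketch_sentence_tokenize("Grade A. is fine."))
```

Because this sketch re-joins tokens with single spaces, it does not preserve the original spacing exactly; one alternative design is to keep whitespace runs as tokens so that joining them with an empty string reproduces the input.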