sign_language_translator.text.tagger module
- class sign_language_translator.text.tagger.Rule(matcher: Callable[[str], bool], tag: Any, priority: int)
Bases: object
A rule for token classification based on a matching function.
- Parameters:
matcher (Callable[[str], bool]) – A function that takes a token (str) as input and returns a boolean indicating whether the token matches the rule.
tag (Any) – The tag associated with tokens that match the rule.
priority (int) – The priority level of the rule.
- is_match(token: str) -> bool: Checks whether the given token matches the rule.
- from_pattern(pattern: str, tag: str, priority: int) -> Rule: Creates a rule from a regular expression pattern, tag, and priority. The created rule uses the pattern to match tokens.
Note
Rules with higher priority (a smaller priority value) take precedence when classifying tokens.
The matcher function should return True if the token matches the rule, and False otherwise.
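To illustrate the documented behaviour, here is a minimal, self-contained sketch of a rule object written against the signatures above. It is not the library's implementation: the dataclass layout is illustrative, and treating `from_pattern` as a whole-token match via `re.fullmatch` is an assumption (the real method may use a different matching mode).

```python
import re
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Rule:
    """Minimal stand-in for the documented Rule class (not the library's code)."""

    matcher: Callable[[str], bool]
    tag: Any
    priority: int  # smaller value = higher priority

    def is_match(self, token: str) -> bool:
        # Delegate to the user-supplied predicate.
        return bool(self.matcher(token))

    @classmethod
    def from_pattern(cls, pattern: str, tag: Any, priority: int) -> "Rule":
        # Assumption: the whole token must match the regex (re.fullmatch).
        compiled = re.compile(pattern)
        return cls(lambda token: compiled.fullmatch(token) is not None, tag, priority)


# Usage: a predicate-based rule and a regex-based rule.
is_upper = Rule(str.isupper, "ACRONYM", priority=1)
number = Rule.from_pattern(r"\d+(\.\d+)?", "NUMBER", priority=2)

print(is_upper.is_match("PSL"))    # True
print(number.is_match("3.14"))     # True
print(number.is_match("3.14abc"))  # False
```

Any `Callable[[str], bool]` works as a matcher, so built-in predicates such as `str.isupper` can be passed directly.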
- class sign_language_translator.text.tagger.Tagger(rules: List[Rule], default=Tags.DEFAULT)
Bases: object
A tagger that applies a set of rules to classify tokens.
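- Parameters:
rules (List[Rule]) – The rules used to classify tokens.
default – The tag assigned to tokens that match no rule. Defaults to Tags.DEFAULT.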
- tag(tokens: List[str]) -> List[Tuple[str, Any]]: Assigns tags to a list of tokens based on the defined rules. Returns a list of tuples, each containing a token and its corresponding tag.
- get_tags(tokens: List[str]) -> List[Any]: Retrieves the tags for a list of tokens based on the defined rules. Returns a list of tags corresponding to the input tokens.
Note
- The rules are applied in the order they appear in the list, but rules with higher priority (a smaller priority value) override lower-priority ones.
- The default tag is assigned to tokens that do not match any rule.
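The priority resolution described in the note can be sketched as follows. This is a self-contained approximation, not the library's code: `Rule` is re-declared as a hypothetical `NamedTuple`, the empty string stands in for `Tags.DEFAULT` (whose value is `''`), and breaking ties in favour of the earliest rule in the list is an assumption about how equal priorities are handled.

```python
import re
from typing import Any, Callable, List, NamedTuple, Tuple


class Rule(NamedTuple):
    # Hypothetical minimal rule: predicate, tag, priority (smaller = stronger).
    matcher: Callable[[str], bool]
    tag: Any
    priority: int


class Tagger:
    """Sketch of the documented Tagger behaviour, not the library's code."""

    def __init__(self, rules: List[Rule], default: Any = ""):
        self.rules = rules
        self.default = default  # "" mirrors the value of Tags.DEFAULT

    def get_tags(self, tokens: List[str]) -> List[Any]:
        tags = []
        for token in tokens:
            best_tag, best_priority = self.default, float("inf")
            for rule in self.rules:  # scanned in list order
                # A strictly smaller priority value replaces the current winner,
                # so among equal priorities the earliest matching rule wins.
                if rule.priority < best_priority and rule.matcher(token):
                    best_tag, best_priority = rule.tag, rule.priority
            tags.append(best_tag)
        return tags

    def tag(self, tokens: List[str]) -> List[Tuple[str, Any]]:
        # Pair each token with the tag chosen by get_tags.
        return list(zip(tokens, self.get_tags(tokens)))


rules = [
    Rule(lambda t: re.fullmatch(r"\w+", t) is not None, "WORD", 5),
    Rule(str.isdigit, "NUMBER", 1),
    Rule(lambda t: re.fullmatch(r"[.,!?]", t) is not None, "PUNCTUATION", 1),
]
tagger = Tagger(rules)
print(tagger.tag(["I", "have", "2", "cats", "!", "@"]))
# [('I', 'WORD'), ('have', 'WORD'), ('2', 'NUMBER'), ('cats', 'WORD'), ('!', 'PUNCTUATION'), ('@', '')]
```

Here "2" matches both the `WORD` and `NUMBER` rules, and the `NUMBER` rule wins because its priority value is smaller; "@" matches nothing and receives the default tag.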
- class sign_language_translator.text.tagger.Tags(value)
Bases: Enum
Enumeration of token tags used in natural language processing.
- ACRONYM = 'ACRONYM'
- AMBIGUOUS = 'AMBIGUOUS'
- DATE = 'DATE'
- DEFAULT = ''
- END_OF_SEQUENCE = 'EOS'
- NAME = 'NAME'
- NUMBER = 'NUMBER'
- PUNCTUATION = 'PUNCTUATION'
- SPACE = 'SPACE'
- START_OF_SEQUENCE = 'SOS'
- SUPPORTED_WORD = 'SUPPORTED_WORD'
- TIME = 'TIME'
- WORD = 'WORD'
- WORDLESS = 'WORDLESS'
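Because `Tags` is a standard `Enum`, members can be looked up by name or by value. The sketch below re-declares a subset of the members listed above so it runs without the library installed; it shows only standard `enum` behaviour, not any library-specific helper.

```python
from enum import Enum


class Tags(Enum):
    # Subset of the documented members, re-declared to keep the sketch self-contained.
    DEFAULT = ""
    END_OF_SEQUENCE = "EOS"
    NUMBER = "NUMBER"
    PUNCTUATION = "PUNCTUATION"
    START_OF_SEQUENCE = "SOS"
    WORD = "WORD"


# Look a member up from its string value.
print(Tags("EOS") is Tags.END_OF_SEQUENCE)  # True
# For the sequence markers, the member name differs from its value.
print(Tags.START_OF_SEQUENCE.name, Tags.START_OF_SEQUENCE.value)  # START_OF_SEQUENCE SOS
# DEFAULT's value is the empty string, which is falsy.
print(repr(Tags.DEFAULT.value), bool(Tags.DEFAULT.value))  # '' False
```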