sign_language_translator.text.tagger module

class sign_language_translator.text.tagger.Rule(matcher: Callable[[str], bool], tag: Any, priority: int)[source]

Bases: object

A rule for token classification based on a matching function.

Parameters:

matcher (Callable[[str], bool]) – A function that takes a token (str) as input and returns a boolean indicating whether the token matches the rule.
tag (Any) – The tag associated with tokens that match the rule.
priority (int) – The priority level of the rule. A smaller value means a higher priority.

is_match(token: str) -> bool: Checks if the given token matches the rule.

get_tag() -> str: Retrieves the tag associated with the rule.

get_priority() -> int: Retrieves the priority level of the rule.

from_pattern(pattern: str, tag: Any, priority: int) -> Rule: Creates a rule from a regular expression pattern, a tag, and a priority. The created rule uses the pattern to match tokens.

Note

Rules with higher priority (that is, a smaller priority value) take precedence when classifying tokens.
The matcher function should return True if the token matches the rule, and False otherwise.
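
To make the Rule interface concrete, here is a minimal sketch. The class is a local stand-in that mirrors the documented constructor and methods so that the snippet runs without the library installed; it is not the library's source. In particular, requiring the whole token to match the pattern (re.fullmatch) in from_pattern is an assumption, and the names number_rule and space_rule are hypothetical.

```python
import re
from typing import Any, Callable


class Rule:
    """Illustrative stand-in mirroring the documented Rule interface."""

    def __init__(self, matcher: Callable[[str], bool], tag: Any, priority: int):
        self.matcher = matcher
        self.tag = tag
        self.priority = priority

    @staticmethod
    def from_pattern(pattern: str, tag: Any, priority: int) -> "Rule":
        # Assumption: the entire token has to match the pattern.
        compiled = re.compile(pattern)
        return Rule(lambda token: compiled.fullmatch(token) is not None, tag, priority)

    def is_match(self, token: str) -> bool:
        return self.matcher(token)

    def get_tag(self) -> Any:
        return self.tag

    def get_priority(self) -> int:
        return self.priority


# A rule built from a regular expression and one built from a plain callable.
number_rule = Rule.from_pattern(r"\d+", tag="NUMBER", priority=1)
space_rule = Rule(str.isspace, tag="SPACE", priority=2)

print(number_rule.is_match("42"), number_rule.is_match("4a2"))
print(space_rule.is_match(" "), space_rule.get_tag(), space_rule.get_priority())
```

Any callable that accepts a string and returns a boolean can serve as the matcher, so built-in string predicates such as str.isspace work without a wrapper.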

static from_pattern(pattern: str, tag: Any, priority: int)[source]

get_priority()[source]

get_tag()[source]

is_match(token: str)[source]

class sign_language_translator.text.tagger.Tagger(rules: List[Rule], default=Tags.DEFAULT)[source]

Bases: object

A tagger that applies a set of rules to classify tokens.

Parameters:

rules (List[Rule]) – A list of Rule objects representing the classification rules. Rules with a smaller priority value take precedence over the others.
default (Tags, optional) – The default tag to assign when no rule matches a token. Defaults to Tags.DEFAULT.

tag(tokens: Iterable[str]) -> List[Tuple[str, Any]]: Assigns tags to the given tokens based on the defined rules. Returns a list of tuples, each containing a token and its corresponding tag.

get_tags(tokens: Iterable[str]) -> List[Any]: Retrieves the tags for the given tokens based on the defined rules. Returns a list of tags in the same order as the input tokens.

Note

The rules are applied in the order they appear in the list, but rules with higher priority (a smaller priority value) take precedence.
The default tag is assigned to tokens that do not match any rule.
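
The precedence behaviour described in the note can be sketched as follows. This is a self-contained illustration and not the library's implementation: the Rule tuple and the tag_tokens function are hypothetical stand-ins, plain strings replace the Tags members, and resolving ties between equal priorities by list order is an assumption.

```python
from typing import Any, Callable, Iterable, List, NamedTuple, Tuple


class Rule(NamedTuple):
    # Minimal stand-in for the documented Rule (for illustration only).
    matcher: Callable[[str], bool]
    tag: Any
    priority: int


def tag_tokens(
    tokens: Iterable[str], rules: List[Rule], default: Any = ""
) -> List[Tuple[str, Any]]:
    # A stable sort keeps the list order among rules that share a priority value.
    ordered = sorted(rules, key=lambda rule: rule.priority)
    tagged = []
    for token in tokens:
        # The first matching rule in priority order wins; otherwise use the default.
        tag = next((rule.tag for rule in ordered if rule.matcher(token)), default)
        tagged.append((token, tag))
    return tagged


rules = [
    Rule(str.isalnum, "WORD", priority=5),    # also matches digits, but has low priority
    Rule(str.isdigit, "NUMBER", priority=1),  # smallest value, so it takes precedence
    Rule(str.isspace, "SPACE", priority=2),
]
result = tag_tokens(["hello", " ", "42", "!"], rules)
print(result)
# [('hello', 'WORD'), (' ', 'SPACE'), ('42', 'NUMBER'), ('!', '')]
```

The token "42" satisfies both the WORD and the NUMBER matcher, yet it is tagged NUMBER because that rule has the smaller priority value, even though it appears later in the list. The token "!" matches nothing and receives the default tag.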

get_tags(tokens: Iterable[str]) → List[Any][source]

tag(tokens: Iterable[str]) → List[Tuple[str, Any]][source]

class sign_language_translator.text.tagger.Tags(value)[source]

Bases: Enum

Enumeration of token tags used in NLP processing.

ACRONYM = 'ACRONYM'

AMBIGUOUS = 'AMBIGUOUS'

DATE = 'DATE'

DEFAULT = ''

END_OF_SEQUENCE = 'EOS'

NAME = 'NAME'

NUMBER = 'NUMBER'

PUNCTUATION = 'PUNCTUATION'

SPACE = 'SPACE'

START_OF_SEQUENCE = 'SOS'

SUPPORTED_WORD = 'SUPPORTED_WORD'

TIME = 'TIME'

WORD = 'WORD'

WORDLESS = 'WORDLESS'
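
Since Tags is a standard Enum, members can be looked up either by name or by value. The sketch below copies a few of the members listed above into a local enum so that it runs without the library installed; it is not the full definition. Note that the sequence markers have values ('SOS', 'EOS') that differ from their member names, and that DEFAULT is the empty string.

```python
from enum import Enum


class Tags(Enum):
    # Local copy of a few documented members, for illustration only.
    DEFAULT = ""
    NUMBER = "NUMBER"
    START_OF_SEQUENCE = "SOS"
    END_OF_SEQUENCE = "EOS"


eos = Tags("EOS")        # look up a member by its value
number = Tags["NUMBER"]  # look up a member by its name

print(eos.name, repr(eos.value))
print(number is Tags.NUMBER, repr(Tags.DEFAULT.value))
```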