sign_language_translator.text.tagger module

class sign_language_translator.text.tagger.Rule(matcher: Callable[[str], bool], tag: Any, priority: int)[source]

Bases: object

A rule for token classification based on a matching function.

Parameters:
  • matcher (Callable[[str], bool]) – A function that takes a token (str) as input and returns a boolean indicating whether the token matches the rule.

  • tag (Any) – The tag associated with tokens that match the rule.

  • priority (int) – The priority level of the rule.

is_match(token: str) -> bool: Checks if the given token matches the rule.

get_tag() -> str: Retrieves the tag associated with the rule.

get_priority() -> int: Retrieves the priority level of the rule.

from_pattern(pattern: str, tag: Any, priority: int) -> Rule: Creates a rule from a regular expression pattern, a tag, and a priority. The created rule uses the pattern to match tokens.

Note

  • Rules with higher priority (a smaller priority value) are applied first when classifying tokens.

  • The matcher function should return True if the token matches the rule, and False otherwise.

static from_pattern(pattern: str, tag: Any, priority: int)[source]
get_priority()[source]
get_tag()[source]
is_match(token: str)[source]
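
The interface described above can be illustrated with a small stand-in. This is only a sketch: SketchRule is a hypothetical class that mirrors the documented method names, not the library's own code, and the use of re.fullmatch inside from_pattern is an assumption, since the documentation does not say whether the pattern must match the whole token or only part of it.

```python
import re
from typing import Any, Callable


class SketchRule:
    """Hypothetical stand-in mirroring the documented Rule interface."""

    def __init__(self, matcher: Callable[[str], bool], tag: Any, priority: int):
        self.matcher = matcher
        self.tag = tag
        self.priority = priority

    def is_match(self, token: str) -> bool:
        # Delegate to the matcher and coerce its result to a plain bool.
        return bool(self.matcher(token))

    def get_tag(self) -> Any:
        return self.tag

    def get_priority(self) -> int:
        return self.priority

    @staticmethod
    def from_pattern(pattern: str, tag: Any, priority: int) -> "SketchRule":
        # Assumption: the whole token must match the pattern (re.fullmatch).
        compiled = re.compile(pattern)
        return SketchRule(
            lambda token: compiled.fullmatch(token) is not None, tag, priority
        )


# A rule built from a regular expression and a rule built from a callable.
number_rule = SketchRule.from_pattern(r"\d+", "NUMBER", priority=1)
title_rule = SketchRule(str.istitle, "NAME", priority=2)
```

Any callable that takes a string and returns a truthy or falsy value can serve as a matcher here, which is why str.istitle works without a wrapper.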
class sign_language_translator.text.tagger.Tagger(rules: List[Rule], default=Tags.DEFAULT)[source]

Bases: object

A tagger that applies a set of rules to classify tokens.

Parameters:
  • rules (List[Rule]) – A list of Rule objects representing the classification rules. Rules with a smaller priority value take precedence over the others.

  • default (Tags, optional) – The default tag to assign when no rule matches a token. Defaults to Tags.DEFAULT.

tag(tokens: List[str]) -> List[Tuple[str, Any]]: Assigns tags to a list of tokens based on the defined rules. Returns a list of tuples, each containing a token and its corresponding tag.

get_tags(tokens: List[str]) -> List[Any]: Retrieves the tags for a list of tokens based on the defined rules. Returns a list of tags corresponding to the input tokens.

Note

  • Rules are applied in the order they appear in the list, but rules with higher priority (a smaller priority value) take precedence.

  • The default tag is assigned to tokens that do not match any rule.

get_tags(tokens: Iterable[str]) -> List[Any][source]
tag(tokens: Iterable[str]) -> List[Tuple[str, Any]][source]
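
The precedence behaviour described in the note can be sketched as follows. SketchRule, sketch_get_tags and sketch_tag are hypothetical names introduced for illustration, and resolving conflicts with a stable sort on the priority value is an assumption about one way to get the documented result, not a description of the library's internals.

```python
import re
from typing import Any, Callable, Iterable, List, NamedTuple, Tuple


class SketchRule(NamedTuple):
    """Hypothetical stand-in holding the three documented Rule fields."""
    matcher: Callable[[str], bool]
    tag: Any
    priority: int


def sketch_get_tags(
    rules: List[SketchRule], tokens: Iterable[str], default: Any = ""
) -> List[Any]:
    # Stable sort: smaller priority values come first, ties keep list order.
    ordered = sorted(rules, key=lambda rule: rule.priority)
    tags = []
    for token in tokens:
        tag = default  # used when no rule matches
        for rule in ordered:
            if rule.matcher(token):
                tag = rule.tag
                break
        tags.append(tag)
    return tags


def sketch_tag(
    rules: List[SketchRule], tokens: Iterable[str], default: Any = ""
) -> List[Tuple[str, Any]]:
    tokens = list(tokens)  # materialise so the iterable can be read twice
    return list(zip(tokens, sketch_get_tags(rules, tokens, default)))


rules = [
    SketchRule(lambda t: re.fullmatch(r"\w+", t) is not None, "WORD", priority=5),
    SketchRule(lambda t: re.fullmatch(r"\d+", t) is not None, "NUMBER", priority=1),
    SketchRule(lambda t: t.isspace(), "SPACE", priority=1),
]

tagged = sketch_tag(rules, ["call", " ", "911", "!"])
# "911" matches both WORD and NUMBER; NUMBER wins because 1 < 5.
# "!" matches no rule and receives the default tag.
```

In this sketch the token "911" is tagged NUMBER even though the WORD rule appears first in the list, which is the precedence behaviour the note describes.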
class sign_language_translator.text.tagger.Tags(value)[source]

Bases: Enum

Enumeration of token tags used in natural language processing.

ACRONYM = 'ACRONYM'
AMBIGUOUS = 'AMBIGUOUS'
DATE = 'DATE'
DEFAULT = ''
END_OF_SEQUENCE = 'EOS'
NAME = 'NAME'
NUMBER = 'NUMBER'
PUNCTUATION = 'PUNCTUATION'
SPACE = 'SPACE'
START_OF_SEQUENCE = 'SOS'
SUPPORTED_WORD = 'SUPPORTED_WORD'
TIME = 'TIME'
WORD = 'WORD'
WORDLESS = 'WORDLESS'
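
Because Tags derives from Enum, the standard enum lookups apply. The sketch below uses SketchTags, a hypothetical stand-in that copies three of the members listed above so that the example runs without the library installed; it demonstrates general Enum behaviour only.

```python
from enum import Enum


class SketchTags(Enum):
    """Hypothetical stand-in copying three members from the list above."""
    DEFAULT = ""
    END_OF_SEQUENCE = "EOS"
    NUMBER = "NUMBER"


# Look a member up by its value or by its name.
by_value = SketchTags("EOS")
by_name = SketchTags["NUMBER"]

# The default member carries an empty string as its value, but a plain
# Enum member is itself truthy, so compare members rather than truthiness.
default_value = SketchTags.DEFAULT.value
default_is_truthy = bool(SketchTags.DEFAULT)
```

One practical consequence: a check such as `if tag:` does not distinguish the default tag from the others in this sketch, so an explicit comparison like `tag is SketchTags.DEFAULT` is the safer test.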