sign_language_translator.text.synonyms module

This module provides a SynonymFinder class that can find synonyms of a given text by utilizing translation and back-translation or similarity in embedding vectors.

Dependencies:
  • deep_translator

Classes:
  • SynonymFinder: A class for finding synonyms using translation and similarity methods.

class sign_language_translator.text.synonyms.SynonymFinder(language: str = 'en')

Bases: object

This class provides methods for finding synonyms of a given text using two different approaches:
  1. Translation and back-translation through the ‘synonyms_by_translation’ method (requires internet).
  2. Embedding-based similarity search through the ‘synonyms_by_similarity’ method.

language

The target language for translation. Use 2-letter codes (ISO 639-1).

Type:

str

translator

The translator object for language translation.

Type:

GoogleTranslator

intermediate_languages

List of languages supported by the translator, excluding the current language.

Type:

List[str]

embedding_model

The embedding model for similarity-based synonym finding.

Type:

str

synonyms_by_translation()

Finds synonyms by translating text into intermediate languages and then translating it back.

synonyms_by_similarity()

Finds synonyms based on embedding vector similarity.

translate()

Translates text to the specified target language.

Example

# Import the class and instantiate it with the target language
from sign_language_translator.text.synonyms import SynonymFinder

synonym_finder = SynonymFinder("en")

# Find synonyms using translation and back-translation (requires internet)
text = "happy"
synonyms = synonym_finder.synonyms_by_translation(text)
print(f"Synonyms by Translation: {synonyms}")

# Find synonyms using similarity based on embedding vectors
text = "joyful"
synonyms = synonym_finder.synonyms_by_similarity(text)
print(f"Synonyms by Similarity: {synonyms}")

property embedding_model

property intermediate_languages: List[str]

Returns a list of languages supported by the translator, excluding the current language. They are used to find synonyms by translation and back-translation. These are 2-letter codes (ISO 639-1).
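The filtering described above can be pictured with a minimal sketch. The names SAMPLE_SUPPORTED and intermediate_codes are hypothetical, and the hard-coded sample stands in for whatever listing of supported languages the translator actually exposes; this is not the library's implementation.

```python
from typing import Dict, List

# Hypothetical sample of supported languages (name -> ISO 639-1 code).
# The real list comes from the translator and is much longer.
SAMPLE_SUPPORTED: Dict[str, str] = {
    "english": "en",
    "urdu": "ur",
    "hindi": "hi",
    "french": "fr",
}


def intermediate_codes(supported: Dict[str, str], language: str) -> List[str]:
    """Return every supported code except the current language."""
    return [code for code in supported.values() if code != language]


codes = intermediate_codes(SAMPLE_SUPPORTED, "ur")
print(codes)
```

With "ur" as the current language, the remaining codes are the candidates for the forward leg of back-translation.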

property language: str

The target language for translation. Use 2-letter codes (ISO 639-1).

synonyms_by_similarity(text: str, top_k=10, min_similarity=0.5) → List[str]

Looks into a vector database and returns the closest matches to the input text.

Parameters:
  • text (str) – The input text to find synonyms for.

  • top_k (int, optional) – The maximum number of synonyms to return. Defaults to 10.

  • min_similarity (float, optional) – Cutoff value for similarity between embedding vectors. Words whose similarity score exceeds this value are returned as synonyms. Defaults to 0.5.

Returns:

A list of synonyms for the input text.

Return type:

List[str]
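How top_k and min_similarity interact can be illustrated with a small, self-contained sketch. It assumes cosine similarity over a toy in-memory dictionary of 2-dimensional vectors; the similarity measure, the storage, and the names cosine_similarity, closest_words and TOY_VECTORS are all assumptions made for illustration, not the library's vector database or API.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def closest_words(
    query: str,
    vectors: Dict[str, Sequence[float]],
    top_k: int = 10,
    min_similarity: float = 0.5,
) -> List[str]:
    """Rank all words by similarity to the query, apply the cutoff, keep top_k."""
    query_vector = vectors[query]
    scored = [(cosine_similarity(query_vector, v), w) for w, v in vectors.items()]
    # Keep only words scoring strictly above the cutoff.
    kept = [(score, word) for score, word in scored if score > min_similarity]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [word for _, word in kept[:top_k]]


# Hypothetical toy embeddings, chosen only to make the ranking easy to follow.
TOY_VECTORS = {
    "happy": [1.0, 0.0],
    "joyful": [0.9, 0.1],
    "glad": [0.8, 0.6],
    "table": [0.0, 1.0],
}

result = closest_words("happy", TOY_VECTORS, top_k=3, min_similarity=0.5)
print(result)
```

In this sketch the query word itself ranks first because its similarity to itself is 1.0, and "table" is dropped by the cutoff before top_k is applied.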

Example

# Import the class and instantiate it with the target language
from sign_language_translator.text.synonyms import SynonymFinder

synonym_finder = SynonymFinder("ur")

# Find synonyms using similarity based on embedding vectors
text = "تعلیم"
synonyms = synonym_finder.synonyms_by_similarity(text, 3)
print(synonyms)
# ["تعلیم", "تربیت", "تعلیمی"]

synonyms_by_translation(text: str, intermediate_languages: List[str] | None = None, min_frequency: int = 1, time_delay: float = 0.01, timeout: float | None = 10, max_n_threads: int = 132, lower_case: bool = True, progress_bar: bool = True, leave: bool = False, cache: Dict[str, Dict[str, str]] | None = None) → List[str]

Translates the given text into intermediate languages and translates each result back to obtain synonyms. Translation requires internet access; the deep_translator library performs it by web scraping.

Parameters:
  • text (str) – The text to be translated.

  • intermediate_languages (Optional[List[str]]) – List of intermediate languages to translate the text into. Use 2-letter codes (ISO 639-1). If None, all supported languages of the translator will be used. Defaults to None.

  • min_frequency (int) – Minimum number of times a back-translated result must occur to be returned as a synonym (inclusive). Defaults to 1.

  • time_delay (float) – Time delay between translation requests (in seconds). Defaults to 1e-2.

  • timeout (float | None) – The maximum amount of time (in seconds) to wait for a thread to finish. None means wait indefinitely. Defaults to 10.

  • max_n_threads (int) – Maximum number of threads to use for parallel translation. Defaults to 132.

  • lower_case (bool) – Whether to convert the synonyms to lowercase. Defaults to True.

  • progress_bar (bool) – Whether to display a progress bar during translation. Defaults to True.

  • leave (bool) – Whether to leave the progress bar on screen after translation finishes. Defaults to False.

  • cache (Optional[Dict[str, Dict[str, str]]]) – A dictionary to save or retrieve the intermediate translations of the text. Structure is {"text": {"language": "translation", ...}, ...}, where each input text maps to a dict from language code to that text's translation. Defaults to None.

Returns:

A list of synonyms obtained through back-translation from other languages.

Return type:

List[str]
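The roles of min_frequency, lower_case and cache can be sketched without any network access. The example below is a simplified, sequential illustration under stated assumptions: toy_translate and TOY_TRANSLATIONS are a hypothetical offline stand-in for the real translator, back_translate is not the library's function, and threading, time delays and timeouts are omitted.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

# Hypothetical offline stand-in for a translator:
# (text, target_language) -> translation.
TOY_TRANSLATIONS = {
    ("happy", "fr"): "heureux",
    ("happy", "de"): "glücklich",
    ("happy", "es"): "feliz",
    ("heureux", "en"): "Happy",
    ("glücklich", "en"): "lucky",
    ("feliz", "en"): "happy",
}


def toy_translate(text: str, target_language: str) -> str:
    return TOY_TRANSLATIONS[(text, target_language)]


def back_translate(
    text: str,
    language: str,
    intermediate_languages: List[str],
    translate: Callable[[str, str], str],
    min_frequency: int = 1,
    lower_case: bool = True,
    cache: Optional[Dict[str, Dict[str, str]]] = None,
) -> List[str]:
    """Translate out to each intermediate language and back, then count results."""
    if cache is None:
        cache = {}
    forward = cache.setdefault(text, {})  # {"language": "translation", ...}
    counts: Counter = Counter()
    for code in intermediate_languages:
        if code not in forward:  # reuse cached forward translations
            forward[code] = translate(text, code)
        candidate = translate(forward[code], language)
        if lower_case:
            candidate = candidate.lower()
        counts[candidate] += 1
    # Keep candidates seen at least min_frequency times, most frequent first.
    return [word for word, n in counts.most_common() if n >= min_frequency]


cache: Dict[str, Dict[str, str]] = {}
synonyms = back_translate("happy", "en", ["fr", "de", "es"], toy_translate, cache=cache)
frequent = back_translate(
    "happy", "en", ["fr", "de", "es"], toy_translate, min_frequency=2, cache=cache
)
print(synonyms)
print(frequent)
print(cache)
```

Raising min_frequency keeps only results that several intermediate languages agree on, which in this toy data removes "lucky". Whether the actual method also returns the input text itself is not stated in the documentation above; the sketch simply keeps whatever comes back.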

translate(text: str, target_language: str) → str

Translates the given text to the specified target language.

Parameters:
  • text (str) – The text to be translated.

  • target_language (str) – The target language for translation. Use 2-letter codes (ISO 639-1).

Returns:

The translated text.

Return type:

str

property translator

The deep_translator.GoogleTranslator object, with the source language set to "auto" and the target language set to the language passed to __init__ or, if it has been changed since, to the current value of the language property.