sign_language_translator.text.synonyms module
This module provides a SynonymFinder class that can find synonyms of a given text by utilizing translation and back-translation or similarity in embedding vectors.
- Dependencies:
deep_translator
- Classes:
SynonymFinder: A class for finding synonyms using translation and similarity methods.
- class sign_language_translator.text.synonyms.SynonymFinder(language: str = 'en')
Bases: object
This class provides methods for finding synonyms of a given text using two different approaches:
1. Translation and back-translation through the ‘synonyms_by_translation’ method (requires internet).
2. Embedding-based similarity search through the ‘synonyms_by_similarity’ method.
- language
The target language for translation. Use 2-letter codes (ISO 639-1).
- Type:
str
- translator
The translator object for language translation.
- Type:
GoogleTranslator
- intermediate_languages
List of languages supported by the translator, excluding the current language.
- Type:
List[str]
- embedding_model
The embedding model for similarity-based synonym finding.
- Type:
str
- synonyms_by_translation()
Finds synonyms by translating the text into an intermediate language and then translating it back.
Example
from sign_language_translator.text.synonyms import SynonymFinder

# Instantiate SynonymFinder with the target language
synonym_finder = SynonymFinder("en")

# Find synonyms using translation and back-translation
text = "happy"
synonyms = synonym_finder.synonyms_by_translation(text)
print(f"Synonyms by Translation: {synonyms}")

# Find synonyms using similarity based on embedding vectors
text = "joyful"
synonyms = synonym_finder.synonyms_by_similarity(text)
print(f"Synonyms by Similarity: {synonyms}")
- property embedding_model
- property intermediate_languages: List[str]
Returns a list of languages supported by the translator, excluding the current language. They are used to find synonyms by translation and back-translation. These are 2-letter codes (ISO 639-1).
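The filtering described for intermediate_languages can be illustrated with a minimal sketch. The SUPPORTED mapping and the intermediate_languages function below are hypothetical stand-ins, not the library's code; the real property reads the list from the translator object.

```python
from typing import Dict, List

# Hypothetical sample of a translator's supported languages
# (language name -> ISO 639-1 code). The real list is much longer.
SUPPORTED: Dict[str, str] = {
    "english": "en",
    "urdu": "ur",
    "hindi": "hi",
    "german": "de",
    "french": "fr",
}


def intermediate_languages(current: str, supported: Dict[str, str] = SUPPORTED) -> List[str]:
    """Return every supported language code except the current language."""
    return [code for code in supported.values() if code != current]


langs = intermediate_languages("en")
print(langs)
```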
- property language: str
The target language for translation. Use 2-letter codes (ISO 639-1).
- synonyms_by_similarity(text: str, top_k=10, min_similarity=0.5) → List[str]
Looks into a vector database and returns the closest matches to the input text.
- Parameters:
text (str) – The input text to find synonyms for.
top_k (int, optional) – The maximum number of synonyms to return. Defaults to 10.
min_similarity (float, optional) – Cut-off value for the similarity between embedding vectors. Only words whose similarity score is greater than this value are returned as synonyms. Defaults to 0.5.
- Returns:
A list of synonyms for the input text.
- Return type:
List[str]
Example
# Instantiate SynonymFinder with the target language
synonym_finder = SynonymFinder("ur")

# Find synonyms using similarity based on embedding vectors
text = "تعلیم"
synonyms = synonym_finder.synonyms_by_similarity(text, 3)
print(synonyms)
# ["تعلیم", "تربیت", "تعلیمی"]
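How top_k and min_similarity interact can be sketched with a toy lookup. This is an illustration only: it assumes cosine similarity as the score and uses a hand-made two-dimensional vocabulary (TOY_VECTORS), whereas the library queries its own embedding model. The function names here are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def closest_words(
    query_vector: Sequence[float],
    vocabulary: Dict[str, Sequence[float]],
    top_k: int = 10,
    min_similarity: float = 0.5,
) -> List[str]:
    """Return up to top_k words whose similarity to the query exceeds min_similarity."""
    scored = [(cosine_similarity(query_vector, vec), word) for word, vec in vocabulary.items()]
    kept = [pair for pair in scored if pair[0] > min_similarity]  # apply the cut-off
    kept.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
    return [word for _, word in kept[:top_k]]


# Toy embeddings, assumed for illustration only.
TOY_VECTORS = {
    "happy": [1.0, 0.0],
    "joyful": [0.9, 0.1],
    "glad": [0.8, 0.3],
    "sad": [-1.0, 0.0],
}

result = closest_words(TOY_VECTORS["happy"], TOY_VECTORS, top_k=2)
print(result)
```

In this sketch the query word itself scores highest and comes first, which matches the Urdu example above where the input word is the first item in the output.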
- synonyms_by_translation(text: str, intermediate_languages: List[str] | None = None, min_frequency: int = 1, time_delay: float = 0.01, timeout: float | None = 10, max_n_threads: int = 132, lower_case: bool = True, progress_bar: bool = True, leave: bool = False, cache: Dict[str, Dict[str, str]] | None = None) → List[str]
Translates the given text into intermediate languages and performs back-translation to obtain synonyms. Translation is done via the internet using web scraping by the deep_translator library.
- Parameters:
text (str) – The text to be translated.
intermediate_languages (Optional[List[str]]) – List of intermediate languages to translate the text into. Use 2-letter codes (ISO 639-1). If None, all supported languages of the translator will be used. Defaults to None.
min_frequency (int) – Minimum occurrence count for synonyms to get considered. Value is inclusive. Defaults to 1.
time_delay (float) – Time delay between translation requests (in seconds). Defaults to 1e-2.
timeout (float | None) – The maximum amount of time (in seconds) to wait for a thread to finish. None means wait indefinitely. Defaults to 10.
max_n_threads (int) – Maximum number of threads to use for parallel translation. Defaults to 132.
lower_case (bool) – Whether to convert the synonyms to lowercase. Defaults to True.
progress_bar (bool) – Whether to display a progress bar during translation. Defaults to True.
leave (bool) – Whether to leave the progress bar on screen after translation finishes. Defaults to False.
cache (Optional[Dict[str, Dict[str, str]]]) – A dictionary to save or retrieve the intermediate translations of the text. Structure is {"text": {"language": "translation", ...}, ...} where each input text maps to a dict mapping language codes to that text’s translation. Defaults to None.
- Returns:
A list of synonyms obtained through back-translation from other languages.
- Return type:
List[str]
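The back-translation procedure and the roles of min_frequency, lower_case and cache can be sketched as follows. This is a simplified, single-threaded illustration, not the library's implementation: back_translation_synonyms, toy_translate and TABLE are hypothetical names, and the lookup table stands in for the web requests so the sketch runs offline.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional


def back_translation_synonyms(
    text: str,
    source_language: str,
    intermediate_languages: List[str],
    translate: Callable[[str, str], str],
    min_frequency: int = 1,
    lower_case: bool = True,
    cache: Optional[Dict[str, Dict[str, str]]] = None,
) -> List[str]:
    """Translate text into each intermediate language and back, then count the results."""
    cache = {} if cache is None else cache
    forward = cache.setdefault(text, {})  # {"language": "translation"} for this text
    counts: Counter = Counter()
    for lang in intermediate_languages:
        if lang not in forward:  # reuse a cached forward translation if present
            forward[lang] = translate(text, lang)
        candidate = translate(forward[lang], source_language)
        counts[candidate.lower() if lower_case else candidate] += 1
    # Keep candidates that occurred at least min_frequency times (inclusive).
    return [word for word, n in counts.most_common() if n >= min_frequency]


# Hypothetical offline lookup standing in for a real translation service.
TABLE = {
    ("happy", "de"): "glücklich", ("glücklich", "en"): "Happy",
    ("happy", "fr"): "heureux", ("heureux", "en"): "Glad",
    ("happy", "es"): "feliz", ("feliz", "en"): "happy",
}


def toy_translate(text: str, target_language: str) -> str:
    return TABLE[(text, target_language)]


cache: Dict[str, Dict[str, str]] = {}
synonyms = back_translation_synonyms("happy", "en", ["de", "fr", "es"], toy_translate, cache=cache)
print(synonyms)
print(cache)
```

Passing the same cache dictionary to a later call would skip the forward translations already stored in it, which is the saving the cache parameter is described as providing.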
- translate(text: str, target_language: str) → str
Translates the given text to the specified target language.
- Parameters:
text (str) – The text to be translated.
target_language (str) – The target language for translation. Use 2-letter codes (ISO 639-1).
- Returns:
The translated text.
- Return type:
str
- property translator
The deep_translator.GoogleTranslator object, with the source language set to “auto” and the target language taken from the __init__ argument or from the object’s current state.