Word similarity matching in Python
Finding similar words is a common step when cleaning or analysing text.
Suppose your text contains many misspelled proper nouns, such as names or places, and you want to bring all variants of the same name or place into one standard form.
You can use the Soundex algorithm to find words that sound alike, even when they are spelled differently.
When you pass a word, such as a person’s name, to Soundex, it returns a short code, and words that sound (more or less) the same receive the same code.
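The Soundex method is based on six groups of consonant sounds, distinguished by how you place your lips and tongue to make them (the phonetic categories involved include bilabial, labiodental, dental, alveolar, velar, and glottal). Each group is assigned one digit, so letters that sound alike, such as B, F, P and V, get the same digit, while vowels are not coded.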
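To make the grouping concrete, here is a minimal pure-Python sketch of classic (American) Soundex. The function name simple_soundex and its size parameter are my own for illustration, and the result may differ from the fuzzy library in edge cases.

```python
# Letter-to-digit classes used by classic (American) Soundex.
SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def simple_soundex(word, size=4):
    """Return a Soundex-style code of `size` characters (illustrative helper)."""
    letters = [ch for ch in word.upper() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = SOUNDEX_CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            # H and W are skipped and do not separate two equal codes.
            continue
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        # Uncoded letters (vowels, Y) reset prev, so the same code
        # on both sides of a vowel is recorded twice.
        prev = code
    # Keep the first letter, then truncate or pad with zeros to `size`.
    return (first + "".join(digits))[:size].ljust(size, "0")


print(simple_soundex("Robert"), simple_soundex("Rupert"))  # R163 R163
```

Both names end up with the same code because B and P fall into the same group, which is exactly the behaviour that makes Soundex useful for misspelled names.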
Let’s test this in Python. Note: I used fuzzy==1.1.
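import fuzzy
# Build the encoder; a code length of 10 is inferred from the output shown below
soundex = fuzzy.Soundex(10)
# Text to process
word = 'phone'
soundex(word)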
Output: ‘P500000000’
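Here P is the first letter of the word ‘phone’, which Soundex keeps as is. The 5 encodes the ‘n’, and the trailing zeros are padding.
Now suppose someone misspelled ‘phone’ as ‘fone’. Let’s see whether Soundex can still identify it.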
word = 'fone'
soundex(word)
Output: ‘F500000000’
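You can see that, apart from the first character, the phonetic codes are the same for both ‘phone’ and ‘fone’. Because Soundex always keeps the first letter, the full codes differ, but comparing the part after the first character shows that the two words sound alike. So we can use the Soundex algorithm to solve this kind of problem.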
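The comparison described here can be sketched as a small helper. The name same_sound_tail is hypothetical, and the two codes are simply the outputs shown above, so the snippet does not need the fuzzy library to run.

```python
def same_sound_tail(code_a, code_b):
    """Compare two Soundex codes while ignoring the leading letter.

    Illustrative helper, not part of the fuzzy library.
    """
    return code_a[1:] == code_b[1:]


# Codes taken from the two outputs above.
phone_code = "P500000000"
fone_code = "F500000000"

print(phone_code == fone_code)                 # False
print(same_sound_tail(phone_code, fone_code))  # True
```

Ignoring the first letter is a loose test: a word like ‘bone’ would match as well, so in practice you may want to combine it with another check, such as comparing the first letters for known sound-alike pairs.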
If you want to learn how the Soundex algorithm works, along with a Python implementation, read this article.