Difference between stemming and lemmatizing and where to use

Anindya Naskar
2 min read, Aug 23, 2023

Stemming and lemmatization are important, basic techniques for any natural language processing (NLP) project.

You may have noticed that when you type a query into Google Search, it shows relevant results not only for the exact expression you typed but also for other possible forms of the words you used.

For example, if you type “mobiles” in the search bar, you likely also want to see results containing “mobile”.

This is done by finding the root word of a given word. Here, “mobile” is the root word of “mobiles”.

There are two common methods for finding the root word: stemming and lemmatization.
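The idea of matching on root words rather than exact forms can be sketched in a few lines of Python. This is only an illustration: the `ROOTS` table, the `root` and `search` functions, and the sample documents are all hypothetical, and the hand-written table stands in for whichever of the two methods is actually used.

```python
# Hypothetical root-word table; a real system would compute roots
# with a stemmer or a lemmatizer instead of a hand-written dict.
ROOTS = {"mobiles": "mobile", "mobile": "mobile"}

DOCS = ["best mobile deals", "mobiles under 200", "laptop reviews"]


def root(word: str) -> str:
    """Return the root of a word, or the lowercased word if it is unknown."""
    word = word.lower()
    return ROOTS.get(word, word)


def search(query: str, documents: list[str]) -> list[str]:
    """Return documents that share at least one root word with the query."""
    wanted = {root(w) for w in query.split()}
    return [doc for doc in documents
            if wanted & {root(w) for w in doc.split()}]


print(search("mobiles", DOCS))
```

Because both the query and the documents are reduced to roots before comparison, a search for “mobiles” also finds the document that only contains “mobile”.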

What is Stemming?

Stemming means reducing a word to its root form, or stem.

Stemming applies a set of rules that strip the extra parts of a word, the parts that show how the word is used.

For example, extra parts such as the suffixes “es” and “ing”, or a prefix like “pre” (most common stemmers, such as Porter's, strip only suffixes).

Now, if you stem the word “reading”, the result is “read”: the stemmer simply removes “ing”, which is on its list of suffixes.
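As a rough sketch of this rule-based idea, here is a toy suffix stripper in plain Python. It is not a real stemming algorithm such as Porter's; the `SUFFIXES` list, the `toy_stem` name, and the minimum stem length are assumptions made for illustration.

```python
# Toy suffix-stripping stemmer: an illustrative sketch only.
# The suffix list and the length check are assumptions, not a real algorithm.
SUFFIXES = ("ing", "es", "s")  # tried in this order


def toy_stem(word: str, min_stem_len: int = 3) -> str:
    """Strip the first matching suffix, keeping at least min_stem_len letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word  # no rule applied


for w in ("reading", "boxes", "sing"):
    print(w, "->", toy_stem(w))
```

The length check keeps a short word like “sing” from being cut down to “s”. In practice you would more likely use an existing stemmer, for example the `PorterStemmer` class in NLTK, than write the rules by hand.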

What is Lemmatization?

Lemmatization also means changing a word to its root form, but the result (the lemma) is a valid dictionary word.

Lemmatization does this properly, with the help of a vocabulary (word list) and an analysis of the word's structure. It looks at the context and part of speech of a word before removing anything.

For the word “saw”, stemming may return just “s”, whereas lemmatization would try to return either “see” or “saw”, depending on whether the word was used as a verb (an action) or a noun (a thing).
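The “saw” example can be sketched as a lookup keyed by both the word and its part of speech. This is a minimal illustration, not how a real lemmatizer is built: the tiny `LEMMAS` table, the `toy_lemmatize` name, and the `"verb"`/`"noun"` tags are hypothetical.

```python
# Toy lemmatizer: a dictionary lookup keyed by (word, part of speech).
# The table below is hypothetical and covers only a few words.
LEMMAS = {
    ("saw", "verb"): "see",
    ("saw", "noun"): "saw",
    ("mobiles", "noun"): "mobile",
    ("reading", "verb"): "read",
}


def toy_lemmatize(word: str, pos: str) -> str:
    """Return the dictionary form for (word, pos), or the word unchanged."""
    word = word.lower()
    return LEMMAS.get((word, pos), word)


print(toy_lemmatize("saw", "verb"))  # the action: "I saw a bird"
print(toy_lemmatize("saw", "noun"))  # the thing: "a sharp saw"
```

The same input word gives different results depending on the part of speech, which is exactly the information a stemmer does not have.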

Which one is better: lemmatization or stemming?

Stemming and lemmatization are two different ways to normalize words.

The difference is that a stemmer works on a single word without knowing its context, so it cannot distinguish between words that have different meanings depending on how they are used.

Stemming is typically much faster than lemmatization.

Lemmatization is typically more accurate than stemming.

Python implementation with Explanation:

https://thinkinfi.com/difference-between-stemming-and-lemmatizing-and-where-to-use/
