Stemming
Introduction#
Stemming is a sort of normalizing method. Many variations of words carry the same meaning, other than when tense is involved. The reason why we stem is to shorten the lookup, and normalize sentences. Basically, it is finding the root of words after removing verb and tense part from it. One of the most popular stemming algorithms is the Porter stemmer, which has been around since 1979.
Porter stemmer
-
Import
PorterStemmer
and initializefrom nltk.stem import PorterStemmer from nltk.tokenize import word_tokenize ps = PorterStemmer()
-
Stem a list of words
example_words = ["python","pythoner","pythoning","pythoned","pythonly"] for w in example_words: print(ps.stem(w))
Result:
python python python python pythonli
-
Stem a sentence after tokenizing it.
new_text = "It is important to by very pythonly while you are pythoning with python. All pythoners have pythoned poorly at least once." word_tokens = word_tokenize(new_text) for w in word_tokens: print(ps.stem(w)) # Passing word tokens into stem method of Porter Stemmer
Result:
It is import to by veri pythonli while you are python with python . all python have python poorli at least onc .