nltk

Tokenizing

Introduction

Tokenization refers to splitting a body of text into sentence tokens or word tokens. It is an essential part of NLP, as many modules work better (or only) with tokens rather than raw text. For example, pos_tag takes a list of word tokens, not a raw string, when tagging words by part of speech.

Sentence and word tokenization of a user-given paragraph

from nltk.tokenize import sent_tokenize, word_tokenize

# Requires the Punkt tokenizer models; download them once with nltk.download('punkt')
example_text = input("Enter the text: ")

print("Sentence Tokens:")
print(sent_tokenize(example_text))

print("Word Tokens:")
print(word_tokenize(example_text))

This modified text is an extract of the original Stack Overflow Documentation created by the contributors and released under CC BY-SA 3.0 This website is not affiliated with Stack Overflow