nltk

POS Tagging

Introduction

Part-of-speech (POS) tagging pairs each word with a part-of-speech tag, producing a list of (word, tag) tuples. It labels the words in a sentence as nouns, adjectives, verbs, etc., and a tag set can also encode finer distinctions such as verb tense. A tag means whatever it meant in the original training data: you are free to invent your own tags, as long as you use them consistently. Because annotated training data takes a lot of work to create, a pre-existing tagged corpus is typically used; common examples are the Penn Treebank and the Brown Corpus.
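To illustrate that tags are simply whatever labels the training data uses, here is a minimal sketch that trains NLTK's UnigramTagger on a tiny hand-made corpus. The corpus and the tag names (DET, ANIMAL, ACTION) are hypothetical and invented for this example. A unigram tagger just remembers the most frequent tag seen for each word, so it is only as good as its training data.

```python
from nltk.tag import UnigramTagger

# A tiny, hypothetical training corpus: a list of sentences, each a list of
# (word, tag) tuples. The tag names are invented; any consistent labels work.
train_sents = [
    [("the", "DET"), ("dog", "ANIMAL"), ("barked", "ACTION")],
    [("the", "DET"), ("cat", "ANIMAL"), ("slept", "ACTION")],
]

# UnigramTagger assigns each word the tag it saw most often for that word.
tagger = UnigramTagger(train_sents)

tagged = tagger.tag(["the", "cat", "barked"])
print(tagged)  # [('the', 'DET'), ('cat', 'ANIMAL'), ('barked', 'ACTION')]

# A word absent from the training data gets None, since no backoff tagger is set.
print(tagger.tag(["the", "bird"]))  # [('the', 'DET'), ('bird', None)]
```

In practice you would train on a large tagged corpus and chain taggers with the backoff parameter so unknown words still receive a tag.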

Remarks

Important points to note

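  • In the example, the variable word is a list of tokens.
  • Even though each item word[i] is a token, passing a single token (a plain string) to nltk.pos_tag does not tag it as one word: older NLTK versions tag each letter separately, and newer versions raise a TypeError.
  • nltk.tag.pos_tag accepts a list of tokens, i.e. a list of strings, and tags each element separately.
  • You cannot tag a bare word directly; put it in a list instead, e.g. nltk.pos_tag(['dog']).
  • The default tagger uses the Penn Treebank POS tag set; nltk.help.upenn_tagset() can print the tags and their meanings.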
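The single-word pitfall comes from the fact that a tagger iterates over whatever sequence it receives, and iterating over a string yields its characters. The sketch below shows this with a small hand-trained UnigramTagger, a hypothetical stand-in chosen so the example runs without downloading NLTK's tagger models. The fix, wrapping the word in a list, is the same for nltk.pos_tag.

```python
from nltk.tag import UnigramTagger

# Hypothetical stand-in tagger trained on one tagged sentence.
tagger = UnigramTagger([[("we", "PRP"), ("saw", "VBD"), ("dog", "NN")]])

# Wrong: a bare string is treated as a sequence of characters,
# so each letter is tagged separately (here as None, since no letter was trained).
per_letter = tagger.tag("dog")

# Right: wrap the single token in a list so it is tagged as one word.
single_word = tagger.tag(["dog"])

print(per_letter)   # [('d', None), ('o', None), ('g', None)]
print(single_word)  # [('dog', 'NN')]
```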

Basic Example

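import nltk
from nltk.tokenize import word_tokenize

# word_tokenize and pos_tag need NLTK's tokenizer and tagger models;
# if they are missing, install them once with nltk.download().
text = 'We saw the yellow dog'
word = word_tokenize(text)   # list of tokens
tag1 = nltk.pos_tag(word)    # list of (word, tag) tuples
print(tag1)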
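Because the result is an ordinary list of (word, tag) tuples, it can be processed with plain Python. The sketch below uses a hard-coded list that is an assumed, illustrative output for the sentence above (actual tags depend on the tagger model and NLTK version). It picks out nouns by their Penn Treebank NN prefix and counts tags with collections.Counter.

```python
from collections import Counter

# Assumed, illustrative tagger output; real output may differ by model or version.
tagged = [("We", "PRP"), ("saw", "VBD"), ("the", "DT"), ("yellow", "JJ"), ("dog", "NN")]

# Penn Treebank noun tags all start with "NN" (NN, NNS, NNP, NNPS).
nouns = [word for word, tag in tagged if tag.startswith("NN")]

# Count how often each tag occurs.
tag_counts = Counter(tag for _, tag in tagged)

print(nouns)             # ['dog']
print(tag_counts["NN"])  # 1
```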

This modified text is an extract of the original Stack Overflow Documentation created by the contributors and released under CC BY-SA 3.0. This website is not affiliated with Stack Overflow.