Part-of-speech (POS) tagging in Natural Language Processing is a process where we read some text and assign parts of speech to each word or token, such as noun, verb, adjective, etc.

POS tagging becomes especially useful when we want to identify specific entities in a sentence, such as names of people, places, or organizations. These are usually tagged as nouns or proper nouns.

Why is POS tagging needed for chatbots?

Chatbots need POS tagging because it reduces the work of understanding text that the bot's model was not trained on, or can only interpret with low confidence. With POS tagging, we can pick out the relevant parts of the input and run string matching only on those parts. For example, suppose we want to check whether a sentence mentions a location. POS tagging will tag the location word as a noun (spaCy uses PROPN, "proper noun", for place names such as India). We can then take all the nouns from the tagged list and check whether any of them appears in a preset list of locations.
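Here is a minimal sketch of that idea. It assumes the tagger output is already available as (word, POS) pairs. The pairs are copied from the output of Example 2 below, so the snippet runs without loading a spaCy model. In a real bot you would build them from a parsed Doc with [(t.text, t.pos_) for t in doc]. KNOWN_LOCATIONS and find_locations are hypothetical names used only for this illustration. The sketch accepts both NOUN and PROPN because spaCy tags place names as proper nouns.

```python
# Hypothetical preset list of locations the bot knows about (lowercased)
KNOWN_LOCATIONS = {"india", "redmond", "washington"}

# (word, POS) pairs as produced by a POS tagger (taken from Example 2's output)
tagged = [
    ("I", "PRON"), ("am", "AUX"), ("going", "VERB"), ("to", "PART"),
    ("visit", "VERB"), ("India", "PROPN"), ("next", "ADJ"),
    ("week", "NOUN"), (".", "PUNCT"),
]


def find_locations(tagged_tokens, known_locations):
    """Return the nouns/proper nouns that appear in the preset location list."""
    # Only noun-like tokens are candidates, so string matching is skipped for the rest
    candidates = [word for word, pos in tagged_tokens if pos in ("NOUN", "PROPN")]
    return [word for word in candidates if word.lower() in known_locations]


print(find_locations(tagged, KNOWN_LOCATIONS))  # ['India']
```

Only two words ("India" and "week") reach the string-matching step, which is the reduction in work described above.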

Let’s get our hands dirty with some real POS tagging examples.

Example 1:

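import spacy

# Load the small English pipeline (install it with: python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')
doc = nlp('I am learning the basics of natural language processing at Asquero')

# Print each token with its coarse-grained part-of-speech tag
for token in doc:
    print(token.text, token.pos_)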

Output:

I PRON
am AUX
learning VERB
the DET
basics NOUN
of ADP
natural ADJ
language NOUN
processing NOUN
at ADP
Asquero PROPN
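Once a sentence is tagged, a bot can reorganize the output so that, for example, all nouns can be looked up in one step. The sketch below is illustrative and not part of spaCy's API. It groups the words from the Example 1 output by their tag. The (word, POS) pairs are hard-coded from that output so the snippet runs without a model.

```python
from collections import defaultdict

# (word, POS) pairs copied from the Example 1 output above
tagged = [
    ("I", "PRON"), ("am", "AUX"), ("learning", "VERB"), ("the", "DET"),
    ("basics", "NOUN"), ("of", "ADP"), ("natural", "ADJ"),
    ("language", "NOUN"), ("processing", "NOUN"), ("at", "ADP"),
    ("Asquero", "PROPN"),
]

# Group the words under their POS tag, preserving sentence order
by_pos = defaultdict(list)
for word, pos in tagged:
    by_pos[pos].append(word)

print(by_pos["NOUN"])   # ['basics', 'language', 'processing']
print(by_pos["PROPN"])  # ['Asquero']
```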

Example 2:

import spacy

# Load the small English pipeline
nlp = spacy.load('en_core_web_sm')
doc = nlp('I am going to visit India next week.')

# Print each token with its coarse-grained part-of-speech tag
for token in doc:
    print(token.text, token.pos_)

Output:

I PRON
am AUX
going VERB
to PART
visit VERB
India PROPN
next ADJ
week NOUN
. PUNCT
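POS tags can also be combined into simple patterns. The hypothetical heuristic below treats a VERB immediately followed by a PROPN as an (action, target) pair, which on the Example 2 output picks out "visit India". The function name verb_object_pairs is illustrative only. The rule is deliberately simplistic: it misses cases where a determiner or adjective sits between the two words, which is where the dependency labels shown in Example 3 become more reliable.

```python
# (word, POS) pairs copied from the Example 2 output above
tagged = [
    ("I", "PRON"), ("am", "AUX"), ("going", "VERB"), ("to", "PART"),
    ("visit", "VERB"), ("India", "PROPN"), ("next", "ADJ"),
    ("week", "NOUN"), (".", "PUNCT"),
]


def verb_object_pairs(tagged_tokens):
    """Return (verb, proper noun) pairs where a PROPN directly follows a VERB."""
    pairs = []
    # Walk over each token together with the token that follows it
    for (w1, p1), (w2, p2) in zip(tagged_tokens, tagged_tokens[1:]):
        if p1 == "VERB" and p2 == "PROPN":
            pairs.append((w1, w2))
    return pairs


print(verb_object_pairs(tagged))  # [('visit', 'India')]
```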

Example 3:

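import spacy

# Load the small English pipeline
nlp = spacy.load('en_core_web_sm')
doc = nlp('Microsoft Corporation is an American multinational technology company with headquarters in Redmond, Washington.')

# Print a header row, then several linguistic attributes for each token
print('Text', 'Lemma', 'POS', 'Tag', 'Dep', 'Shape', 'Alpha', 'Stop')
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_, token.shape_, token.is_alpha, token.is_stop)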

Output:

Text Lemma POS Tag Dep Shape Alpha Stop
Microsoft Microsoft PROPN NNP compound Xxxxx True False
Corporation Corporation PROPN NNP nsubj Xxxxx True False
is be AUX VBZ ROOT xx True True
an an DET DT det xx True True
American american ADJ JJ amod Xxxxx True False
multinational multinational ADJ JJ amod xxxx True False
technology technology NOUN NN compound xxxx True False
company company NOUN NN attr xxxx True False
with with ADP IN prep xxxx True True
headquarters headquarter NOUN NNS pobj xxxx True False
in in ADP IN prep xx True True
Redmond Redmond PROPN NNP pobj Xxxxx True False
, , PUNCT , punct , False False
Washington Washington PROPN NNP appos Xxxxx True False
. . PUNCT . punct . False False
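Matching one token at a time, as in the location example earlier, misses multi-word names such as "Microsoft Corporation". A simple workaround, sketched below with a hypothetical helper, joins consecutive PROPN tokens into a single name. The pairs come from the Text and POS columns above. For production use, spaCy's named entity recognizer (doc.ents) is the more robust option.

```python
# (word, POS) pairs taken from the Text and POS columns of the output above
tagged = [
    ("Microsoft", "PROPN"), ("Corporation", "PROPN"), ("is", "AUX"),
    ("an", "DET"), ("American", "ADJ"), ("multinational", "ADJ"),
    ("technology", "NOUN"), ("company", "NOUN"), ("with", "ADP"),
    ("headquarters", "NOUN"), ("in", "ADP"), ("Redmond", "PROPN"),
    (",", "PUNCT"), ("Washington", "PROPN"), (".", "PUNCT"),
]


def proper_noun_spans(tagged_tokens):
    """Join runs of consecutive PROPN tokens into multi-word names."""
    spans, current = [], []
    for word, pos in tagged_tokens:
        if pos == "PROPN":
            current.append(word)
        elif current:
            # A non-PROPN token ends the current name
            spans.append(" ".join(current))
            current = []
    if current:  # flush a name that ends the sentence
        spans.append(" ".join(current))
    return spans


print(proper_noun_spans(tagged))  # ['Microsoft Corporation', 'Redmond', 'Washington']
```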

Refer to the table below for the meaning of each attribute printed in the output above.

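Attribute  Description
TEXT       The original text of the token.
LEMMA      The base (dictionary) form of the word, e.g., "be" for "is".
POS        The coarse-grained part of speech from the Universal POS tag set (e.g., NOUN, VERB, PROPN).
TAG        The fine-grained part-of-speech tag (e.g., VBZ), which also carries some morphological information, such as that a verb is third-person singular present tense.
DEP        The syntactic dependency, i.e., the relation between the token and its head token (e.g., nsubj, amod).
SHAPE      The word's orthographic shape: letters become X or x, digits become d, and runs of the same character type are cut off after four, e.g., "Microsoft" becomes "Xxxxx".
ALPHA      Whether the token consists only of alphabetic characters.
STOP       Whether the token is in the stop-word list.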
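The ALPHA and STOP attributes are often combined to keep only the "content" words of a sentence before matching. The sketch below applies that filter to the Alpha and Stop columns of the Example 3 output, hard-coded so it runs without a model. With a live Doc, the equivalent filter is [t.text for t in doc if t.is_alpha and not t.is_stop].

```python
# (text, is_alpha, is_stop) triples copied from the Example 3 output above
rows = [
    ("Microsoft", True, False), ("Corporation", True, False),
    ("is", True, True), ("an", True, True), ("American", True, False),
    ("multinational", True, False), ("technology", True, False),
    ("company", True, False), ("with", True, True),
    ("headquarters", True, False), ("in", True, True),
    ("Redmond", True, False), (",", False, False),
    ("Washington", True, False), (".", False, False),
]

# Keep alphabetic tokens that are not stop words (drops "is", "an", punctuation, ...)
content_words = [text for text, is_alpha, is_stop in rows if is_alpha and not is_stop]
print(content_words)
```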