Part-Of-Speech (POS) Tagging in Natural Language Processing using spaCy
Part-of-speech (POS) tagging in Natural Language Processing is the process of reading a text and assigning a part of speech, such as noun, verb, or adjective, to each word or token.
POS tagging is especially useful when we want to identify an entity, such as a place or an organization, in a given sentence.
Why is POS tagging needed for chatbots?
Chatbots need POS tagging to reduce the complexity of understanding text that the model has not been trained on, or that it handles with low confidence. With POS tagging, we can identify the relevant parts of the input and do string matching only on those parts. For example, to check whether a sentence mentions a location, note that the tagger labels a location word as NOUN or, for a proper name such as India, PROPN. You can then take all the NOUN and PROPN tokens from the tagged list and check whether any of them is one of the locations in your preset list.
Let’s get our hands dirty with some real examples of POS tagging.
    import spacy

    # Load the small English pipeline
    nlp = spacy.load('en_core_web_sm')
    doc = nlp('I am learning the basics of natural language processing at Asquero')

    # Print each token with its part-of-speech tag
    for token in doc:
        print(token.text, token.pos_)
Output:

    I PRON
    am AUX
    learning VERB
    the DET
    basics NOUN
    of ADP
    natural ADJ
    language NOUN
    processing NOUN
    at ADP
    Asquero PROPN
    import spacy

    nlp = spacy.load('en_core_web_sm')
    doc = nlp('I am going to visit India next week.')

    # Print each token with its part-of-speech tag
    for token in doc:
        print(token.text, token.pos_)
Output:

    I PRON
    am AUX
    going VERB
    to PART
    visit VERB
    India PROPN
    next ADJ
    week NOUN
    . PUNCT
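The location lookup described earlier can be sketched with the tagged tokens just printed. This is a minimal illustration, not a full entity recognizer: the (text, POS) pairs are copied by hand from the output above so the snippet runs without a model, and `KNOWN_LOCATIONS` and `find_locations` are hypothetical names introduced only for this example.

```python
# Tagged (text, POS) pairs, copied from the output above.
# With spaCy loaded they would come from: [(token.text, token.pos_) for token in doc]
tagged_tokens = [
    ("I", "PRON"), ("am", "AUX"), ("going", "VERB"), ("to", "PART"),
    ("visit", "VERB"), ("India", "PROPN"), ("next", "ADJ"),
    ("week", "NOUN"), (".", "PUNCT"),
]

# Hypothetical preset list of locations the chatbot knows about (lowercase).
KNOWN_LOCATIONS = {"india", "france", "japan"}


def find_locations(tagged, known_locations):
    """Return tokens tagged NOUN or PROPN whose lowercase text is a known location."""
    # Keep only noun-like tokens, so string matching runs on a small subset.
    candidates = [text for text, pos in tagged if pos in ("NOUN", "PROPN")]
    return [text for text in candidates if text.lower() in known_locations]


print(find_locations(tagged_tokens, KNOWN_LOCATIONS))
```

Filtering on both NOUN and PROPN matters here: "week" is tagged NOUN but is not in the list, while "India" is tagged PROPN and would be missed by a NOUN-only filter.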
    import spacy

    nlp = spacy.load('en_core_web_sm')
    doc = nlp('Microsoft Corporation is an American multinational technology company with headquarters in Redmond, Washington.')

    # Print several linguistic attributes for every token
    for token in doc:
        print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_,
              token.shape_, token.is_alpha, token.is_stop)
The table below explains each attribute printed by the code above.
|Attribute|Description|
|---|---|
|TEXT|The original text of the token being processed|
|LEMMA|Base (dictionary) form of the token|
|POS|Coarse-grained part-of-speech tag of the token|
|TAG|Fine-grained part-of-speech tag, which also carries some morphological information (e.g., that a verb is past tense)|
|DEP|Syntactic dependency (i.e., the relation between tokens)|
|SHAPE|The shape of the word (e.g., its capitalization, punctuation, and digit format)|
|ALPHA|Does the token consist only of alphabetic characters?|
|STOP|Is the token a stop word, i.e., part of a stop list of very common words?|
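To make the SHAPE and ALPHA rows more concrete, here is a rough pure-Python imitation of the idea. It is a sketch, not spaCy's own code: `approximate_shape` is a hypothetical helper. It assumes uppercase letters map to `X`, lowercase letters to `x`, digits to `d`, other characters stay as they are, and runs of the same shape character are cut off after four. The ALPHA column is imitated with Python's built-in `str.isalpha()`.

```python
def approximate_shape(text):
    """Approximate a word-shape string for text (illustrative sketch only)."""
    shape = []
    last = ""  # previous shape character
    run = 0    # length of the current run of identical shape characters
    for char in text:
        if char.isalpha():
            shape_char = "X" if char.isupper() else "x"
        elif char.isdigit():
            shape_char = "d"
        else:
            shape_char = char  # punctuation and symbols are kept as-is
        if shape_char == last:
            run += 1
        else:
            run = 1
            last = shape_char
        # Assumed rule: keep at most four repeats of the same shape character.
        if run <= 4:
            shape.append(shape_char)
    return "".join(shape)


# Words taken from the example sentence above.
for word in ["Microsoft", "is", "Redmond", ","]:
    print(word, approximate_shape(word), word.isalpha())
```

In practice you would read `token.shape_` and `token.is_alpha` directly, as in the code above; the sketch only shows what kind of information those columns carry.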