Named-Entity Recognition in Natural Language Processing using spaCy
Named-entity recognition (NER), also called entity identification or entity extraction, is the task of finding named entities in a given text and classifying them into predefined categories.
NER quality depends heavily on the data used to train the entity extraction model, so a model may miss or mislabel entities in text that differs from what it was trained on.
spaCy comes with a very fast entity recognition model that is capable of identifying entity phrases from a given document.
Entities can be of different types, such as persons, locations, organizations, dates, and numerals. They can be accessed through the .ents property of the Doc object.
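Since each entity carries a text and a label, it is often handy to collect entities by type. The following is a minimal sketch of that idea, not part of spaCy itself: `group_entities` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins that only mimic the `.text` and `.label_` attributes of real entity spans so the snippet runs without a model installed.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_entities(ents):
    """Group entity texts by label.

    `ents` is any iterable of objects exposing `.text` and `.label_`,
    which is the shape of the spans found in spaCy's `doc.ents`.
    """
    grouped = defaultdict(list)
    for ent in ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


# Stand-in objects (NOT real spaCy spans), used only for illustration.
fake_ents = [
    SimpleNamespace(text="Redmond", label_="GPE"),
    SimpleNamespace(text="Microsoft Corporation", label_="ORG"),
    SimpleNamespace(text="Washington", label_="GPE"),
]

print(group_entities(fake_ents))
# {'GPE': ['Redmond', 'Washington'], 'ORG': ['Microsoft Corporation']}
```

With a loaded pipeline, the same helper should accept `doc.ents` directly, for example `group_entities(nlp(text).ents)`.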
Let’s find named entities in a few example sentences using spaCy’s NER tagger.
import spacy

# Load the small English pipeline, which includes an NER component
nlp = spacy.load('en_core_web_sm')
doc = nlp('Microsoft Corporation is an American multinational technology company with headquarters in Redmond, Washington.')

# Print each detected entity with its label
for ent in doc.ents:
    print(ent.text, ent.label_)
Microsoft Corporation ORG
American NORP
Redmond GPE
Washington GPE
import spacy

nlp = spacy.load('en_core_web_sm')
doc = nlp('Apple Inc. is an American multinational technology company headquartered in Cupertino, California, that designs, develops, and sells consumer electronics, computer software, and online services.')

for ent in doc.ents:
    print(ent.text, ent.label_)
Apple Inc. ORG
American NORP
Cupertino GPE
California GPE
import spacy

nlp = spacy.load('en_core_web_sm')
doc = nlp('I usually wake up at 9:00 AM. 90% of my daytime goes in learning new things.')

for ent in doc.ents:
    print(ent.text, ent.label_)
9:00 AM TIME
90% PERCENT
As the output shows, the entity extractor picks up the time expression in the given string. It also does more than spot a number: it recognizes 90% as a percentage and labels it PERCENT.
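To see where such entities sit in the original string, one option is to mark them inline using character offsets. The sketch below assumes each entity is given as a `(start, end, label)` tuple and that spans do not overlap. `annotate` is a hypothetical helper, and the offsets here are computed by hand with `str.index` rather than taken from a model.

```python
def annotate(text, spans):
    """Wrap each (start, end, label) span in the text as [span](LABEL).

    Assumes spans do not overlap; they are processed in order of start offset.
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])              # text before the entity
        pieces.append(f"[{text[start:end]}]({label})")  # the marked entity
        cursor = end
    pieces.append(text[cursor:])                        # trailing text
    return "".join(pieces)


text = "I usually wake up at 9:00 AM. 90% of my daytime goes in learning new things."

# Offsets built by hand for illustration (not produced by a model).
t = text.index("9:00 AM")
p = text.index("90%")
spans = [(t, t + len("9:00 AM"), "TIME"), (p, p + len("90%"), "PERCENT")]

print(annotate(text, spans))
```

With spaCy, the same tuples could be built from `doc.ents` as `[(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]`.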