Named-entity recognition (NER), also known as entity identification or entity extraction, is the task of locating named entities in a text and classifying them into predefined categories.

NER quality depends heavily on the data used to train the entity recognition model, so a model may work well or poorly on a given text depending on how closely that text resembles its training data.

spaCy comes with a fast entity recognition model that can identify entity phrases in a given document.

Entities can be of different types, such as persons, locations, organizations, dates, and numerals. They can be accessed through the .ents property of the Doc object.
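As a minimal sketch of what each item in .ents carries, the helper below collects an entity's text, label, and character offsets. The names entity_rows and demo are illustrative and not part of spaCy; text, label_, start_char, and end_char are attributes of spaCy's entity spans. The demo assumes the en_core_web_sm model has already been downloaded.

```python
def entity_rows(doc):
    """Return (text, label, start_char, end_char) for every entity in doc.ents."""
    return [(ent.text, ent.label_, ent.start_char, ent.end_char) for ent in doc.ents]


def demo():
    # Imported here so entity_rows can be reused without loading spaCy.
    import spacy

    nlp = spacy.load("en_core_web_sm")  # assumes the model is installed
    doc = nlp("Redmond is a city in Washington.")
    for text, label, start, end in entity_rows(doc):
        # start and end are character offsets into the original string
        print(text, label, start, end)
```

Calling demo() prints one line per detected entity. The offsets index into the original string, so doc.text[start:end] gives back the entity text.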

Let’s find the named entities in a few example sentences using spaCy’s NER tagging capability.

Example 1:

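import spacy

# Load the small English pipeline (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')
doc = nlp('Microsoft Corporation is an American multinational technology company with headquarters in Redmond, Washington.')

# Print each detected entity together with its label
for ent in doc.ents:
    print(ent.text, ent.label_)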

Output:

Microsoft Corporation ORG
American NORP
Redmond GPE
Washington GPE
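
The labels in this output are abbreviations from the model's annotation scheme; GPE, for example, covers countries, cities, and states. spaCy provides spacy.explain to look up a short description of a label. The sketch below wraps it in a hypothetical helper, describe_labels, whose explain parameter is an assumption added here so that the lookup can be swapped out.

```python
def describe_labels(doc, explain=None):
    """Map each distinct entity label in doc.ents to a short description."""
    if explain is None:
        # Default to spaCy's built-in glossary lookup.
        import spacy
        explain = spacy.explain
    # Collect the distinct labels, sorted for a stable order.
    labels = sorted({ent.label_ for ent in doc.ents})
    return {label: explain(label) for label in labels}
```

For the document above, describe_labels(doc) would return one description each for GPE, NORP, and ORG. For a label that spacy.explain does not know, the value is None.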

Example 2:

import spacy

nlp = spacy.load('en_core_web_sm')
doc = nlp('Apple Inc. is an American multinational technology company headquartered in Cupertino, California, that designs, develops, and sells consumer electronics, computer software, and online services.')

for ent in doc.ents:
    print(ent.text, ent.label_)

Output:

Apple Inc. ORG
American NORP
Cupertino GPE
California GPE

Example 3:

import spacy

nlp = spacy.load('en_core_web_sm')
doc = nlp('I usually wake up at 9:00 AM. 90% of my daytime goes in learning new things.')

for ent in doc.ents:
    print(ent.text, ent.label_)

Output:

9:00 AM TIME
90% PERCENT

As we can see, the entity extractor easily picks out the time expression in the given string. It also does more than spot the number 90: it labels the full expression "90%" as a PERCENT entity.
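
To pull out only certain kinds of entities, such as the time and percentage values above, the entities can be filtered by label. A minimal sketch, in which entities_with_labels and demo are hypothetical helper names rather than spaCy functions, and which assumes the en_core_web_sm model is installed:

```python
def entities_with_labels(doc, labels):
    """Return the text of every entity in doc.ents whose label is in labels."""
    wanted = set(labels)
    return [ent.text for ent in doc.ents if ent.label_ in wanted]


def demo():
    # Imported here so the helper above can be reused without loading spaCy.
    import spacy

    nlp = spacy.load("en_core_web_sm")  # assumes the model is installed
    doc = nlp("I usually wake up at 9:00 AM. 90% of my daytime goes in learning new things.")
    # Keep only the entities the model tagged as TIME or PERCENT.
    print(entities_with_labels(doc, {"TIME", "PERCENT"}))
```

Because the helper only reads .ents and .label_, the same function works for any label set the loaded model produces.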