Natural language processing for chatbots

Building a Pipeline in Rasa for Training

Posted On Aug. 27, 2020

Prerequisite: Introduction to Building Chatbots using Rasa NLU

Installing Rasa

To install Rasa NLU, run the following pip command (use pip3 for Python 3).

pip3 install rasa-nlu

Rasa NLU has multiple components for classifying intents and recognizing entities.

Different components of Rasa have their own sets of dependencies. When we train our model, Rasa NLU checks that all the required dependencies are installed.

If any required dependency is missing when the model is trained, Rasa NLU immediately throws a "module not found" error.
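You can run a similar check yourself before training. The following is only a minimal sketch and is not how Rasa NLU performs its own check: the helper name find_missing_modules and the list of module names are assumptions made for illustration, and the modules you actually need depend on the pipeline you choose.

```python
from importlib.util import find_spec


def find_missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed.
        if find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical list, for illustration only.
    required = ["spacy", "sklearn", "tensorflow"]
    missing = find_missing_modules(required)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

Running the script inside your virtual environment tells you which of the listed modules still need to be installed.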

Rasa requires TensorFlow version 1.13.1, so make sure that this requirement is satisfied.

It is recommended that you create a virtual environment and build your chatbot inside that environment.

Pipeline in Rasa

A pipeline is a set of algorithms, applied one after another, that we use to train our NLP model.
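To make the idea concrete, here is a toy sketch of a pipeline in plain Python. It does not use Rasa NLU and is not how Rasa implements its components; the functions tokenize, count_tokens, and run_pipeline are hypothetical names chosen for this example. It only shows the general pattern: each step receives the message, adds its result, and hands the message to the next step.

```python
def tokenize(message):
    """Split the raw text into lowercase tokens."""
    message["tokens"] = message["text"].lower().split()
    return message


def count_tokens(message):
    """Turn the tokens into a simple bag-of-words feature dictionary."""
    features = {}
    for token in message["tokens"]:
        features[token] = features.get(token, 0) + 1
    message["features"] = features
    return message


def run_pipeline(text, components):
    """Pass a message through each component in the given order."""
    message = {"text": text}
    for component in components:
        message = component(message)
    return message


if __name__ == "__main__":
    result = run_pipeline("What is the weather in Mumbai", [tokenize, count_tokens])
    print(result["tokens"])
    print(result["features"])
```

The order of the components matters: count_tokens relies on the tokens that tokenize produced in the previous step.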

Rasa NLU has two commonly used pipelines: spacy_sklearn and tensorflow_embedding.

Let us look briefly at each of them.

spacy_sklearn
  • The spacy_sklearn pipeline uses pre-trained word vectors from either the GloVe algorithm or fastText, an algorithm developed by the Facebook AI team.
  • spacy_sklearn works well in situations like the following. Suppose your training data contains the utterance “What is the weather in Mumbai?” and you later ask the model to predict the intent for “What is the weather in Delhi?” Because the pre-trained vectors for “Mumbai” and “Delhi” are similar, the model can tell that both utterances belong to the same intent.
  • This pipeline is very useful with small sets of data.
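The notion of "similar" words can be illustrated with cosine similarity between word vectors. The sketch below uses made-up three-dimensional vectors, not real GloVe or fastText vectors (which have hundreds of dimensions), and the names TOY_VECTORS and cosine_similarity are assumptions for this example only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors, for illustration only; real pre-trained vectors are much larger.
TOY_VECTORS = {
    "mumbai": [0.9, 0.8, 0.1],
    "delhi": [0.8, 0.9, 0.2],
    "weather": [0.1, 0.2, 0.9],
}

if __name__ == "__main__":
    city_pair = cosine_similarity(TOY_VECTORS["mumbai"], TOY_VECTORS["delhi"])
    mixed_pair = cosine_similarity(TOY_VECTORS["mumbai"], TOY_VECTORS["weather"])
    print("mumbai vs delhi:", round(city_pair, 3))
    print("mumbai vs weather:", round(mixed_pair, 3))
```

With these toy values, the two city names score much closer to 1 than the city and "weather" do, which is the property the pipeline relies on when it generalises from “Mumbai” to “Delhi”.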


tensorflow_embedding
  • The tensorflow_embedding pipeline does not use pre-trained word vectors as spacy_sklearn does; instead, it learns word vectors from the dataset we provide.
  • The advantage of the tensorflow_embedding pipeline is that the word vectors are specific to our domain.
  • To see why this matters, consider the word “bat”. In general English it may refer to a cricket bat or to an animal. In a cricket domain, “bat” means the cricket bat that the batsman holds, so the model needs to learn this domain-specific meaning rather than be confused by a pre-trained model.
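The effect of learning vectors from your own data can be sketched with a much simpler technique than the one the pipeline uses. The example below builds co-occurrence count vectors from a few sentences; it is not the neural embedding training that tensorflow_embedding performs, and the sentences and function names are hypothetical. It only shows that vectors derived from domain text reflect how words are used in that domain.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences):
    """Map each word to counts of the other words it shares a sentence with."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for j, other in enumerate(tokens):
                if i != j:
                    vectors[word][other] += 1
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical cricket-domain training sentences, for illustration only.
CRICKET_SENTENCES = [
    "batsman swings bat",
    "batsman holds bat",
    "umpire signals wide",
]

if __name__ == "__main__":
    vectors = cooccurrence_vectors(CRICKET_SENTENCES)
    print("bat vs batsman:", cosine_similarity(vectors["bat"], vectors["batsman"]))
    print("bat vs umpire:", cosine_similarity(vectors["bat"], vectors["umpire"]))
```

Because “bat” only ever appears alongside cricket vocabulary in this data, its vector ends up closer to “batsman” than to unrelated words, with no influence from the animal sense of the word.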