How to build your own NLP for chatbots
Introduction
When building a chatbot, one of the most important parts is the NLP (Natural Language Processing), which allows us to understand what the user wants and match it to an intent (action) of our chatbot.
This really important part usually becomes a “black box” provided by a third party like Google DialogFlow, Microsoft LUIS or IBM Watson, which also means losing our clients’ privacy. There are a lot of unknowns around those “black boxes” and how they work internally, and this causes fear of implementing one in-house.
In this article you’ll first learn the theory, and then you’ll build your own NLP. You’ll also learn how to measure whether it’s performing well or not. Source code is included and runnable in the cloud directly on CodeSandbox, so you can fork every experiment and play with the code.
Techniques
When we talk about NLP, we are talking about a huge field of Artificial Intelligence, with lots of techniques like LSTM, word2vec, fastText or PoS (Part of Speech) tagging for NLU (Natural Language Understanding).
In our case we will implement a multiclass classifier using a neural network. Let’s go step by step.
What is a multiclass classifier
In NLP there are tons of techniques. In our case we want to implement a multiclass classifier. What does that mean?
A classifier, in Artificial Intelligence, is something that, given an input, assigns it to the best class (or label), the class that best matches the input. Those classes must form a discrete set, something that can be enumerated, like the colors of the rainbow, and not continuous like a real number between 0 and 1.
When a classifier has only two classes, it’s called a binary classifier. Example: a classifier that, given an email’s text, returns whether it’s “Spam” or “Not Spam” is a binary classifier.
When a classifier has several classes, it’s called a multiclass classifier.
How a Multiclass Classifier works
There are several ways to implement it, but let’s first understand what a perceptron is:
A perceptron is a unit that, given an input vector, multiplies every input element by a real number called a “weight”, sums all those weighted inputs, and also adds a bias. Finally, the result is passed to an activation function. It is the basis of neural networks, and a process called backpropagation is responsible for choosing the weights and the bias. So for each perceptron you’ll have n+1 variables, where n is the number of elements of the input.
In a multiclass classifier you’ll have the same scenario but at least one perceptron per class, so the minimum number of variables to be calculated is (n + 1) * c, where n is the number of elements of the input and c is the number of different classes.
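As a rough illustration (this is not part of the final NLP code, just a sketch of the idea), a single perceptron could look like this in JavaScript:

```javascript
// A minimal sketch of a single perceptron: n weights plus a bias,
// passed through an activation function (here, a sigmoid).
function perceptron(weights, bias, activation) {
  return (input) => {
    // weighted sum of the inputs plus the bias
    const sum = input.reduce((acc, x, i) => acc + x * weights[i], bias);
    return activation(sum);
  };
}

const sigmoid = (x) => 1 / (1 + Math.exp(-x));

// Example: a perceptron with 3 inputs has 3 + 1 = 4 variables to learn.
const p = perceptron([0.5, -0.2, 0.1], 0.3, sigmoid);
console.log(p([1, 0, 1])); // a value between 0 and 1
```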
What are the classes in an NLP
In an NLP system the input is the sentence from the user, but what are the classes? The classes are the intents, i.e., the actions that the chatbot can perform or has an answer for. Example:
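A tiny training set could look like this (the exact sentences here are just illustrative):

```javascript
// Illustrative training sentences mapped to their intents
const trainingSentences = [
  { utterance: 'hello', intent: 'Greet' },
  { utterance: 'good morning', intent: 'Greet' },
  { utterance: 'I need holidays', intent: 'Travel' },
  { utterance: 'I want to travel to Paris', intent: 'Travel' },
];
```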
There are two classes: Greet and Travel.
How to build the input for an NLP
In the previous example we have different sentences matched with their intents, so now we know that the classes are the intents. But what about the input?
The input of our neural network is composed of “features”, which must be numeric. Let’s say that the possible words of the inputs, i.e., the words seen in the training examples, are the features, and to convert them to numbers let’s assume a feature will be 0 if the word is not in the sentence, and 1 if the word is in the sentence. Example, for the sentence “I need holidays”:
So we have 12 features, one per word. Each blue line represents the weight toward the class “Greet”, and each green line represents the weight toward the class “Travel”.
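In code form, that encoding is essentially a map of every word seen in training to 0 or 1 (the vocabulary below is illustrative, not the exact 12 words of the figure):

```javascript
// Hypothetical vocabulary built from the training sentences;
// each word is a feature: 1 if present in the sentence, 0 otherwise.
const featuresOfINeedHolidays = {
  hello: 0, good: 0, morning: 0,
  i: 1, need: 1, holidays: 1,
  want: 0, to: 0, travel: 0, paris: 0,
  // ...one entry per word seen in the training examples
};
```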
How to measure success
Before developing we need a method to measure success and compare with other NLPs. We will base it on the paper SIGDIAL22, which proposes the use of 3 different corpora: Chatbot (transportation in Munich), Ask Ubuntu and Web Applications. Some examples from each one:
- Chatbot: i want to go marienplatz (FindConnection); when is the next train in muncher freiheit? (DepartureTime)
- AskUbuntu: What software can I use to view epub documents? (Software Recommendation); What does my computer do when I click ‘Shut Down’? (Shutdown computer)
- WebApplication: Alternative to Facebook (Find Alternative); How do I delete my Facebook account? (Delete Account)
To measure it I created the node package evaluate-nlp, which will be used during the exercise and contains the corpora from the paper as well as the metrics already obtained by the other providers.
We will train, as described in the paper, only with the sentences marked for training, and we will test with the sentences that are not for training.
Tokenizer
The first step will be to build a tokenizer: something to split the sentence into words. We will do it in a simple way, by simply splitting on whitespace with a regular expression.
Here you have the example using CodeSandbox, which allows us to see the code and also execute it in the cloud:
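A minimal version of that tokenizer (the function name is illustrative; the sandbox may differ slightly) could be:

```javascript
// Split the sentence into words using a whitespace regular expression
function tokenize(sentence) {
  return sentence.split(/\s+/);
}

console.log(tokenize('when is the next train in muncher freiheit?'));
// [ 'when', 'is', 'the', 'next', 'train', 'in', 'muncher', 'freiheit?' ]
```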
Developing the Neural Network
To develop the neural network we will use brain.js, which allows us to develop classifiers in a simple way and with good enough performance. TensorFlow.js could be used, but the code would be more complex for the same result.
Let’s start with a function that creates the neural network:
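A sketch of that function, assuming the brain.js NeuralNetwork API accepts an empty hiddenLayers array (the function name createNet is illustrative):

```javascript
const brain = require('brain.js');

// Create the network without hidden layers: brain.js adds hidden layers
// by default, but for this problem they perform worse.
function createNet() {
  return new brain.NeuralNetwork({ hiddenLayers: [] });
}
```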
The main change from the default configuration is that brain.js adds 2 hidden layers by default, but in our case the hidden layers perform worse.
Now we will need to train our neural network. The most complex point here is how to transform our inputs into the format that brain.js expects, which is an array of objects in this format:
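For instance, for the sentence “I need holidays” with intent Travel, a training entry would look roughly like this:

```javascript
// brain.js expects an array of { input, output } objects,
// where input maps features to numbers and output maps classes to numbers.
const trainingSet = [
  {
    input: { i: 1, need: 1, holidays: 1 },
    output: { Travel: 1 },
  },
  // ...one entry per training sentence
];
```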
First we will create a function “utteranceToFeatures” that, given a text (the utterance), will return the features object used as the input of the example. The chain method builds a pipeline of functions, and featuresToDict converts an array of features into the object format.
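A possible sketch of those helpers (chain here is a small hand-rolled compose helper; the sandbox may implement it differently):

```javascript
// chain builds a left-to-right pipeline of functions
const chain = (...fns) => (x) => fns.reduce((acc, fn) => fn(acc), x);

// featuresToDict converts ['i', 'need', 'holidays'] into { i: 1, need: 1, holidays: 1 }
const featuresToDict = (features) =>
  features.reduce((dict, feature) => ({ ...dict, [feature]: 1 }), {});

const utteranceToFeatures = chain(tokenize, featuresToDict);

console.log(utteranceToFeatures('I need holidays'));
// { I: 1, need: 1, holidays: 1 }
```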
And now the train function that, given the neural network, the inputs (xs) and the classes (ys), will return the trained network.
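A sketch of that train function under those assumptions:

```javascript
// Train the network: xs are the utterances, ys the intent of each utterance
function train(net, xs, ys) {
  const trainingSet = xs.map((utterance, i) => ({
    input: utteranceToFeatures(utterance),
    output: { [ys[i]]: 1 },
  }));
  net.train(trainingSet);
  return net;
}
```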
The last step is to build the predict function that, given a neural network and an input, returns the prediction: one number for each class, where greater numbers mean a higher probability of being that class.
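A sketch of predict, using brain.js’s run method:

```javascript
// Returns an object with one score per class; the highest score wins
function predict(net, utterance) {
  return net.run(utteranceToFeatures(utterance));
}

// Hypothetical usage:
// predict(net, 'I need holidays') -> { Greet: 0.02, Travel: 0.97 }
```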
So now we have a first version where the NLP is only 13 lines of code. Let’s check how it performs:
The results:
Not bad for a first iteration! It’s still worse than all the providers, because it does very badly on the Web Applications corpus, but it scores better than DialogFlow on the Chatbot corpus and sits in the middle of the table for Ask Ubuntu.
Trim and lowercase
First step to improve: sometimes we get an empty string token, so let’s remove it; also, we are not lowercasing the tokens, so “Developer” is not the same as “developer” for our system. So let’s implement trim and lowercase and add them to the pipeline for the input:
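A sketch of those two extra steps added to the feature pipeline:

```javascript
// Remove empty tokens and normalize everything to lowercase
const trim = (tokens) => tokens.filter((token) => token !== '');
const lower = (tokens) => tokens.map((token) => token.toLowerCase());

const utteranceToFeatures = chain(tokenize, trim, lower, featuresToDict);
```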
New results:
We see an improvement in Web Applications but not in the other corpora. This is normal because Chatbot and Ask Ubuntu already give you the sentences in lowercase. On the other hand, our method surpassed DialogFlow and RASA in the overall score!
Improving the Neural Network
The next step is to improve the neural network (see the sketch after this list):
- By default the activation function is sigmoid, but the best for our problem is tanh or leaky ReLU.
- We will change the error threshold from 0.005 to 0.00005, and the learning rate from 0.3 to 0.1.
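A sketch of how those changes could look with brain.js options (names as documented by brain.js; the helper names are illustrative):

```javascript
// Network creation with a different activation function
function createNet() {
  return new brain.NeuralNetwork({
    hiddenLayers: [],
    activation: 'leaky-relu', // default is 'sigmoid'; 'tanh' also fits this problem well
  });
}

// Training with stricter options
function train(net, xs, ys) {
  const trainingSet = xs.map((utterance, i) => ({
    input: utteranceToFeatures(utterance),
    output: { [ys[i]]: 1 },
  }));
  net.train(trainingSet, {
    errorThresh: 0.00005, // default is 0.005
    learningRate: 0.1,    // default is 0.3
  });
  return net;
}
```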
New Results:
We see that we reached first position in the Chatbot corpus, are still average in Ask Ubuntu, and start to grow in Web Applications. Overall, DialogFlow, RASA and Recast are surpassed, but with one handicap: our method still does not take the language into account!
Stemmer
To take the language into account we usually want to know the lemma of a word, but that usually means having a big dictionary for this calculation. For calculating the stem of a word, however, there are algorithms that are not perfect, but are good enough.
We will use the PorterStemmer from the Natural library, so for each token we will calculate the stem, and the feature will be the stem instead of the token:
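A sketch of that step, adding the stemmer to the pipeline before building the features dictionary:

```javascript
const natural = require('natural');

// Replace every token with its stem, e.g. 'developing' -> 'develop'
const stem = (tokens) => tokens.map((token) => natural.PorterStemmer.stem(token));

const utteranceToFeatures = chain(tokenize, trim, lower, stem, featuresToDict);
```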
New results:
So right now our method is the best in the Chatbot corpus, best in Ask Ubuntu, second in Web Applications, and first overall, using only 23 lines of code.
Entity augmentation
Right now the other providers are using an advantage that our system is still not taking profit of: the other providers are trained taking the entities into account. Let’s see an example:
This is one of the training sentences, and we see the entities section where we have the information that “TV Tropes” is the entity “WebService”. So now, during the training, if a phrase has entities, we will do an augmentation: we will generate a different sentence for the training, replacing the entity values with the entity name.
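Roughly, the idea is the following (the exact corpus fields are in the sandbox; the sentence and field names below are illustrative):

```javascript
// Illustrative training example with an entity annotation
const example = {
  text: 'How can I delete my TV Tropes account?',
  intent: 'Delete Account',
  entities: [{ value: 'TV Tropes', entity: 'WebService' }],
};

// After augmentation we also train with the entity value replaced by its name:
// 'How can I delete my WebService account?'
```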
Now we have to modify the training so it calls “augment” and receives an array of sentences for each input:
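A sketch of how that could look (the augment name comes from the article; its body here is an assumption):

```javascript
// Returns the original sentence plus one variant per entity,
// with the entity value replaced by the entity name
function augment(example) {
  const sentences = [example.text];
  (example.entities || []).forEach(({ value, entity }) => {
    sentences.push(example.text.replace(value, entity));
  });
  return sentences;
}

// The train function now receives an array of sentences per input
function train(net, examples) {
  const trainingSet = [];
  examples.forEach((example) => {
    augment(example).forEach((sentence) => {
      trainingSet.push({
        input: utteranceToFeatures(sentence),
        output: { [example.intent]: 1 },
      });
    });
  });
  net.train(trainingSet, { errorThresh: 0.00005, learningRate: 0.1 });
  return net;
}
```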
The sandbox with the source code:
New results:
So with this change our method is first in every category, with an Overall score of 0.934!
Synonyms
The last improvement is about synonyms. If we analyze the sentences that are not classified properly, we see that some of them use words not seen during training, because at training time other synonyms are used. Example: “remove” instead of “delete”, “junk” instead of “spam” or “transfer” instead of “export”.
If we provide a map of synonyms, and we calculate the stems of each one, then we can use this dictionary to replace stems with their synonym stem when calculating the features.
New pipeline for feature generation:
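A sketch of the final pipeline, assuming a synonyms map whose keys and values are already stemmed (the entries below are illustrative):

```javascript
// Illustrative synonym map; in practice the stems of each pair are precomputed
const synonyms = { remov: 'delet', junk: 'spam', transfer: 'export' };

// Replace a stem with its synonym stem when it exists in the map
const replaceSynonyms = (tokens) => tokens.map((token) => synonyms[token] || token);

const utteranceToFeatures = chain(tokenize, trim, lower, stem, replaceSynonyms, featuresToDict);
```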
Source code sandbox:
Results:
And we see that we broke all the records!
Conclusion
Following a logical process, with neural networks and a small number of lines of code, we can understand natural language very well for chatbot environments. But the big question is: how is it possible that big companies with huge budgets and lots of engineers do not perform as well as something that you can develop in-house in one day? In my humble opinion, this is because:
- Their main objective is not to be the best, but “good enough”
- They put more focus on other features
- When you use your own in-house NLP, you’re your own client, so all your resources, like RAM and CPU, are for you. When you use a platform like Google DialogFlow, IBM Watson or Microsoft LUIS, we are talking about thousands of clients sharing the same platform, so they want to minimize the resources needed for each client in terms of RAM and CPU.