How to build a chatbot – Part 1

We decided to make a beginner’s guide on how to create a chatbot based entirely on Natural Language Processing.

As it turns out, AI is truly starting to become mainstream, and 2017 looks like it’s going to be full of new technologies and platforms. During the past year we’ve seen many new companies dealing with this, including the grand opening of OpenAI (which has already shipped some pretty interesting papers), quite a few startups on the subject, a huge amount of FUD concerning self-driving cars and, of course, the rise of the mighty chatbot.

As with countless technologies that became popular before it, there are always people who buy way too much into the hype, and out come the tinfoil hats. There’s already talk of how AI is making humans obsolete and that we’re like two days away from giant floating heads in the sky demanding we show them what we got.

The reality, however, is somewhat different. While we are seeing some truly impressive technologies and platforms in this space, it’s important to remember that AI is a (relatively) young field and it’s quite easy to expect a day-old JS library to do things that aren’t actually possible, which just leads you on to build crappy products.

This brings me, of course, to Chatbots. These fellas are becoming popular thanks to Slack, Facebook, and Telegram opening up their platforms, as well as companies like Wit.ai and Api.ai that handle the secret NLP (Natural Language Processing) sauce we need to make these things work properly.

Our experience

At Aerolab we’re always experimenting with new stuff and figuring out how technology influences user experience. So, of course, we wanted to figure out how to take advantage of the new paradigm that comes with chatbots: The Conversational UI. 

Whenever we design a product, we always try to understand the users and their mental models: how they think and interpret their world. We then use this knowledge to build an interface and, if we do our job correctly, our users will find our app easy to use.

This is more an art than a science, but the important thing is that during the last 30 years we’ve gotten used to having full control of the UI. On a traditional app we know exactly which buttons go into the product and what they do, because we deliberately put them there. There’s very little unpredictability from the development side of things, and the worst thing that can happen is that the user can’t figure out how to use one of our features.

Conversational UIs completely turn the tables on this idea. We are now working with language, which is not only unpredictable, but also extremely difficult for computers to understand, even when using state-of-the-art AI.

All of this gives us significantly less control and room for error than what we are used to, which means we have to try and work around these limitations from the very first second of our design process.

Working with this new paradigm comes with some very specific challenges. So, after a lot of tests and experiments, we decided to write a series of articles that serve as a beginner’s guide on how to design and create a product based entirely on Natural Language Processing (NLP). We believe NLP is setting us on a path towards user experiences that are even more tightly integrated into our lives.

Designing with Stories

A Conversational UI is a way to interact with a computer primarily by using language, either written or spoken. Typical examples of Conversational UIs are Siri, Google Now and Cortana, which are the platforms that let you ask your phone for simple things using your voice. There’s also a multitude of chatbots in Messenger, Telegram, Slack and other platforms that perform similar tasks, except via text. For the time being, we are going to focus on text-based chatbots, since they are both the most popular and the most accessible for new developers.

The problem with conversations is that they are extremely unpredictable. If you’ve ever worked on an app, you probably know that unpredictability is the sworn enemy of developers and designers alike. So the first thing we need to do here is try to turn these conversations into smaller, bite-sized pieces that we can actually work with, which brings me to the main design concept behind Conversational UIs: Conversational Stories.

Stories are wireframes for conversational design: A story is a small piece of a conversation that covers a basic exchange with a user, in which we try to understand what the user intends to do so that we can provide an appropriate response. We first saw this concept on wit.ai and we believe it’s a beautiful way to think about conversations in a somewhat visual manner.

The idea behind this is that we can’t really script a full conversation with a user and allow for any sort of variation (unless we are doing a carefully rehearsed demo or have some astonishing technology behind our product). However, what we can (and should) do is to split a long, complex conversation into many smaller exchanges, like handling greetings, answering a simple question, saying goodbye, and other small conversational fragments which we can design and tweak very carefully.

By adding many of these self-contained stories, we can help the bot handle long conversations a little bit at a time and, if we do this correctly, we’ll be able to build a product that feels solid and well designed, even during long sessions.

This is a new concept, so it’s better if we start with a few examples. Here’s a simple conversation:

User: Hi!
Chatbot: Hey there! What can I help you with?

The worst possible thing you can do is to start by typing something like this:
if (userSays === 'Hi!') { send('Hey there! What can I help you with?'); }

The reason this is a terrible thing to do is that language is very malleable, which means that our users will come up with a seemingly infinite number of ways to say everything. Trying to process conversations simply by comparing text is, at best, extremely naïve.

However, there’s a way to deal with this. By using Natural Language Processing (NLP) we can ask an AI to figure out what the user intended to say. The question we need to ask isn’t “what did the user type?”, it’s “what did they mean?”. Going back to our previous example, we can go a bit deeper into the meaning of the user’s message and classify their intent as a greeting:

User: Hi! [Intention: Greeting]
Chatbot: Hey there! What can I help you with? [Response to a Greeting]

Our stories aren’t built around «The user says Hi», but around «The user is greeting us». The reason behind this is that there are multiple ways to greet someone (“Hey!”, “Hello”, “Sup’”, “Yo”), but all of these sentences mean the exact same thing: Hi!

So, for instance, we can apply the exact same conversational story to a very different message:

User: Aloha! [Intention: Greeting]
Chatbot: Hey there! What can I help you with? [Response to a Greeting]

Since we are still receiving a greeting, we can safely give the same response, without worrying whether the user said Hello, Hi, Hey, or any other text. By using a solid NLP engine we can make sure that all of these ways to say “Hi!” are treated in the exact same way, which makes our Chatbot a lot smarter and more flexible, even in the face of unpredictable input.

Thinking in terms of intents is a wonderful way to make our lives easier. Instead of comparing strings, we can take these arbitrarily complex strings and feed them to an AI, which will classify the message and extract all of the necessary information from the text, from the main intention of the user to dates and locations.
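To make the shift from strings to intents a bit more concrete, here is a minimal sketch. Keep in mind that classifyIntent is a hypothetical stand-in for an NLP engine: a real service relies on a trained model, not on the hand-written word list used here purely for illustration. The intent labels and the fallback reply are also assumptions of ours.

```javascript
// Hypothetical stand-in for an NLP engine. A real service would use a
// trained model, not a hard-coded word list like this one.
const GREETING_WORDS = ['hi', 'hey', 'hello', 'sup', 'yo', 'aloha'];

// Lowercase the message and strip punctuation so "Hi!" and "hi" look alike.
function normalize(text) {
  return text.toLowerCase().replace(/[^a-z\s]/g, '').trim();
}

// Returns an intent label (illustrative names), or 'unknown'.
function classifyIntent(text) {
  const words = normalize(text).split(/\s+/);
  return words.some((word) => GREETING_WORDS.includes(word))
    ? 'greeting'
    : 'unknown';
}

// One response per intent, not one response per literal string.
const RESPONSES = {
  greeting: 'Hey there! What can I help you with?',
  unknown: "Sorry, I didn't get that.", // assumed fallback, not from a real bot
};

function reply(text) {
  return RESPONSES[classifyIntent(text)];
}

console.log(reply('Hi!'));
console.log(reply('Aloha!'));
```

Both messages end up with the same greeting response, even though the strings have nothing in common. The important part is the shape of the code: the response depends on the intent, so improving the classifier never requires touching the responses.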

Dealing with different tones

But what if we were to try out a more innovative prompt?

User: I hereby give you the warmest regards from the Queen of France
Chatbot: Hey there! What can I help you with?

While I’m not sure France has a Queen (Gabrielle Gatti says French people aren’t really into the whole royalty thing), in this case the answer doesn’t exactly fit the ridiculously formal tone the user is using. Normally we can just classify it as a greeting and be done with it, but if we want to add an easter egg (or just change the tone as needed), we can handle this in a different way than a normal greeting.

This means we have to design a new story, and we start by classifying this kind of phrase as a different intent, a Formal Greeting. This way we can distinguish between both and be able to respond appropriately:

User: I hereby give you the warmest regards from the Queen of France [Intention: Formal Greeting]
Chatbot: I express my deepest desire to help you, my liege [Response to a Formal Greeting]

Thinking about the messages we receive in terms of intents, combined with a solid NLP engine, makes the implementation of our Chatbot a lot easier. Stories are a powerful design tool, and they help us split conversations into smaller components, which are easy to code, understand and test.
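One way to picture stories in code is as a table keyed by intent, where each entry is a small, self-contained handler. The sketch below is only an illustration: the shape of the nlpResult object, the intent names and the fallback reply are assumptions of ours, not the response format of wit.ai, api.ai or any other service.

```javascript
// Each story maps one intent to one response. Intent names are illustrative.
const stories = new Map([
  ['greeting', () => 'Hey there! What can I help you with?'],
  ['formal_greeting', () => 'I express my deepest desire to help you, my liege'],
]);

// `nlpResult` is a hypothetical, simplified shape for what an NLP engine
// might hand back; real services define their own response formats.
function respond(nlpResult) {
  const story = stories.get(nlpResult.intent);
  // Fall back to a generic reply when no story covers the intent.
  return story ? story() : "Sorry, I didn't catch that.";
}

console.log(respond({ intent: 'greeting' }));
console.log(respond({ intent: 'formal_greeting' }));
```

Adding a new story means adding one entry to the table, which keeps each exchange small enough to design, tweak and test on its own.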

While the chatbot ecosystem is quite new, this is one of the most promising design paradigms to come out in the last few years. And thanks to wit.ai, api.ai and other NLP platforms, we can start working on a whole new category of products, one that offers a completely different way to think about what we build.

In the coming articles, we’ll focus on these new elements, going deeper into Machine Learning, Prototyping and the cool new Tools we have at our disposal as product designers.

Stay tuned!