The Conversation Model in Interaction Design

As you can probably tell from the title of this blog, Savant is a program that lets users interact with it in natural language to accomplish a large variety of tasks. However, I am guilty of coming up with a title that is quite misleading. Natural language processing is a very difficult problem in computer science, and I do not, by any means, claim to have solved it. The syntax for Savant is quite flexible and allows users to specify arbitrarily complex queries. The rules of Savant’s grammar are very intuitive and occasionally form grammatically correct English sentences. But it is by no means a complete natural language parser. For example, you could tell Savant, “move doc files to new folder Spring09”. But you couldn’t say, “move doc files to new folder called Spring09”, or “move files with doc extension to a new folder called Spring09”.

One can argue that all this is syntactic sugar and that the grammar can easily be extended to parse these commands. But is that the right approach to solving the problem? Here are a couple of other ways we could tackle this issue. The first method involves filtering out non-essential words. Note that in the two alternative sentences in the above example, the main keywords were already there. Just by filtering out a few words, we can recover the original intent of the command. The second method involves richer interaction with the user as he is typing in the command. As the user types ‘mov’, we can look up the list of commands that start with that prefix. Then, for each of those commands, syntax suggestions can appear to guide the user to enter his command in a way that the program understands. A positive feedback loop can be started by rewarding the user with auto-completion and a detailed syntax breakdown. Conversely, the program can stop suggesting when the user starts going off track, creating negative feedback. This helps keep the core parser simple while still allowing flexibility for the user.
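
To make the two ideas a bit more concrete, here is a minimal Python sketch. It is not Savant’s actual implementation: the noise-word list, the command table (including the extra “mount” and “copy” entries and their syntax hints), and the function names are all assumptions made up for illustration.

```python
# Hypothetical sketch: none of these names or word lists come from Savant itself.

# Words assumed to carry no meaning for the parser (an illustrative list).
NOISE_WORDS = {"a", "an", "the", "called", "named", "with", "extension"}

# Assumed command table mapping a command name to a syntax hint.
COMMAND_SYNTAX = {
    "move": "move <type> files to [new] folder <name>",
    "mount": "mount <device>",
    "copy": "copy <type> files to [new] folder <name>",
}


def filter_noise_words(command, noise_words=NOISE_WORDS):
    """Drop non-essential words, keeping the remaining words in order."""
    kept = [word for word in command.split() if word.lower() not in noise_words]
    return " ".join(kept)


def suggest_commands(prefix, table=COMMAND_SYNTAX):
    """Return sorted (name, syntax hint) pairs for commands starting with prefix.

    An empty result means the input matches no known command, which is
    the point where the interface would stop suggesting.
    """
    prefix = prefix.lower()
    return sorted((name, hint) for name, hint in table.items()
                  if name.startswith(prefix))


if __name__ == "__main__":
    print(filter_noise_words("move doc files to new folder called Spring09"))
    print(suggest_commands("mov"))
    print(suggest_commands("xyz"))
```

With this word list, the first alternative sentence reduces exactly to the command Savant already accepts. The second one reduces to “move files doc to new folder Spring09”: the keywords are all there, but in a different order, so filtering alone would only work if the parser tolerated some reordering. The prefix lookup returns the “move” entry for ‘mov’ and an empty list for input that matches nothing, which is where the suggestions would go quiet.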

If you stop to think for a minute and imagine Savant as a human being, the latter method sounds like a dialogue between two people. The positive and negative feedback is something we constantly provide through facial expressions. Think about it: how many times has your teacher re-explained a topic after seeing a blank face? How many times has the other person completed a word that you just couldn’t think of? Can you see how these real-life scenarios map directly onto the interaction between man and software? In fact, I believe any kind of human-computer interaction can be explained in terms of a conversation between two people who speak different languages.

Here are two scenarios to illustrate my point. Think of the command line from UNIX or DOS. To be able to use the terminal, you need to memorize the names of the commands and the correct sequence of arguments they take. So, how does this scenario look in the Conversation Model? It’s similar to an Englishman learning Japanese to talk to a person from Japan, or vice versa. Now, learning Japanese can be a very hard prospect, especially for people who live far away from Japan. But there is no doubt that the best way to communicate with a Japanese person is in Japanese. That’s the language he is most comfortable with, so you can express very complicated ideas and he’d still be able to understand. This is the case with learning the command line. The learning curve is steep. But those who have mastered it are at the peak of productivity and can express intricate tasks in very concise notation.

The second scenario is that of the point-and-click interfaces that we are so used to these days. To make another analogy, this is similar to sign language. By sign language I don’t mean languages like ASL, but the most basic hand gestures and facial expressions. The beauty of these gestures is that they are universal. Irrespective of what language the other person speaks, you can always point at your wrist to indicate that you want to know the time, or shake your head sideways to say no. The drawback of this system is that it only works for basic things. Try asking somebody for directions to the closest museum using sign language. It is apparent that it is not the most expressive of languages. Some tasks were never meant to be described using sign language, unless you begin a new convention involving careful hand gestures. The problem with GUIs is precisely this. Point-and-click makes sense for a handful of things. For others, it becomes a long, repetitive process of pressing buttons and ticking checkboxes.

This brings us back to our conversation with Savant. The motivation for the project was to not require the user to memorize Savant’s vocabulary. One option would be for Savant to master human language, which would require solving the natural language processing problem. The other option would be for Savant to have a conversation with the user and work with him to arrive at a command that it can carry out as required. Making this conversation efficient does not require a technological breakthrough. However, it does require proactive involvement on Savant’s part to guide the user in typing his command.