

The connectionist approach.

I’m going to go over some thoughts on bots and AI that might offend some people. None of this is meant to diminish or belittle any of the excellent work that has been done on bots or bot systems by any developer so far. These are just observations and, I hope, a call to action toward new paths in the field.

I’ve spent a lot of time over the last several months thinking, coding, and trying different things, trying to gain traction on the problem of depth in chat. AIML and other pattern/trigger-based bots are nothing more than puppets with very clever strings. There’s no way to incorporate real depth into those conversations without adding more clever strings, and at some point the complexity of the system exceeds the capacity of the computer to process or store it.

Combinatorial explosion happens when you try to automate patterned-response bots. This is the sad but true reality we keep butting our heads against while trying to find some holy-grail variation of AIML that will produce a believable, Turing-level bot.

It’s not possible. Here’s why. Let’s take a controlled language, like Basic English, and define some patterns and responses. With 800 words, let’s say our maximum pattern length is around 8 words. Assuming repetition is allowed, there are 800^8 = 167,772,160,000,000,000,000,000, or roughly 1.7 × 10^23, possible 8-word combinations. Obviously natural language isn’t as random as that, but those combinations represent the borders of the first dimension of the problem space.

The second dimension of the problem space represents the response to the 8-word input. Now you have a 2-dimensional plane on which a given point represents an input/response pair.

This is as far as most ELIZA-type bots go. They restrict the problem space to a toy domain and perform the functional equivalent of a lookup based on the interpretation of a point in that problem space. The algorithm itself is essentially a 2-dimensional array lookup.
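
To see how flat this really is, here’s a toy sketch in Python. The patterns and responses are invented, but the whole “mind” genuinely is one lookup table:

```python
# A toy pattern -> response table: the entire "mind" is one flat lookup.
# Patterns and responses are hypothetical examples.
RESPONSES = {
    "how are you": "I am fine, thank you.",
    "what is your name": "My name is Toybot.",
}

def respond(user_input: str) -> str:
    # Normalize the input, then do a one-shot, depthless lookup.
    key = user_input.lower().strip(" ?!.")
    return RESPONSES.get(key, "Tell me more.")

print(respond("How are you?"))  # -> "I am fine, thank you."
```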

This algorithm has no nuance and no concept of higher-order relations. No matter how cleverly you manipulate your search function, you’re still only working with 2 dimensions, and this is what I refer to as the absence of depth.

In order to add depth to the algorithm, you need a third dimension. This means that every possible 8-word input (x) maps to an 8-word output (y), and that pair can in turn be conditioned on any other 8-word exchange (z).

Consider what this means in terms of computing: you have a 3-dimensional problem space restricted to a vocabulary of 800 words in 8-word sets. This brute-force methodology brings you face to face with a monstrous (1.68 × 10^23)^3, or roughly 4.7 × 10^69, space. Unless you’re utilizing every atom of planet Earth’s mass, no computer is going to be able to process this effectively.
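
Python’s arbitrary-precision integers make both figures easy to verify:

```python
# Size of the input space: 800-word vocabulary, 8-word patterns, repetition allowed.
inputs = 800 ** 8
print(f"{inputs:,}")         # 167,772,160,000,000,000,000,000
print(f"{inputs:.2e}")       # ~1.68e+23 possible 8-word inputs

# Adding the third (context) dimension cubes the space.
print(f"{inputs ** 3:.2e}")  # ~4.72e+69 input/response/context triples
```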

There is no physically or mathematically feasible way to implement a search function that approximates general intelligence. Chatbots that incorporate search as the engine of intelligence will only ever work on extremely limited domains.

Applied to real language, where vocabularies can exceed 100,000 words, the numbers become even more ridiculous. ELIZA-style bots are dead ends, except for extremely limited domains.

Human beings don’t have sextillions of neurons, so how do we manage conversation and intelligence? Well, we don’t use a brute-force search function. Our brains do something far more interesting than “if; then.” Human brains use electrical signals, caused by chemical reactions in axons and dendrites, to communicate between neurons. These neurons are arranged mostly at the surface of the brain, in a structure called the neocortex. Most neural activity happens at that surface; the vast majority of the brain’s mass consists of the connections between neurons.

The neuroscientist Vernon Mountcastle posited that there is only one cortical function: neurons in any one part of the brain are doing the same thing as neurons in any other part. Differences in function come about from connections to other organs and their interfaces with the outside world. The basic unit of computation is the cortical column, a stack of neurons arranged in roughly six layers within the neocortex.

The biological discoveries about brain function were accompanied by discoveries in artificial neural networks, which turned out to be universal function approximators. This means that any problem space can be modeled in a neural network, and any function within that space can be approximated by changing the weights of the network. People got really excited in the ’80s when neural networks became more accessible, and many breakthroughs were made in backpropagation training techniques.
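
As a reminder of what that approximation looks like in practice, here’s a minimal NumPy network trained by backpropagation to fit XOR, the classic nonlinear toy problem (the layer sizes, seed, and learning rate are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR: not linearly separable

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)   # 2 inputs -> 4 hidden units
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)   # 4 hidden -> 1 output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(5000):
    # Forward pass through both layers.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: gradient of squared error, chained through the sigmoids.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out; b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;   b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2).ravel())  # approaches [0, 1, 1, 0]
```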

The problem with this, with regard to the design of an intelligent chatbot, is that while a neural network can approximate nonlinear functions, it cannot ignore the dimensionality of the problem space. You would still need to exhaustively model the possible inputs and outputs, and either incorporate a complex feedback system or explicitly add a new set of inputs for every previous percept in a sequence that you want to affect the output. Neural networks are just as susceptible to combinatorial explosion as the simple search function.
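
A quick back-of-the-envelope illustration of that second option, reusing the Basic English numbers from above and assuming a one-hot encoding, shows how fast the input layer balloons:

```python
VOCAB = 800         # Basic English vocabulary size
WORDS_PER_TURN = 8  # words per utterance

def input_width(history_turns: int) -> int:
    # One-hot encoding: each word slot needs VOCAB inputs, and every
    # remembered turn replicates the entire input block.
    return VOCAB * WORDS_PER_TURN * history_turns

for k in (1, 5, 20):
    print(k, "turn(s) ->", input_width(k), "input units")
# 1 turn(s) -> 6400 input units
# 5 turn(s) -> 32000 input units
# 20 turn(s) -> 128000 input units
```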

The crux of the problem is that while generating a naive plausible response is easy, real depth in a conversation requires knowing what has come before. Not a perfect memory of each word, but the spreading activation of potentially relevant connections within a semantic network.

The problem space in which a chatbot operates can be reduced fairly easily by dividing it into features. The most common approach to this is to do some sort of POS tagging. Dimensionality reduction is achieved through many statistical means. None of these are necessarily easy, and none of them so far have translated into deep conversation, because no matter how you reduce the problem space, you still have to produce a function that approximates some sort of intelligence.
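
For example, POS tagging with NLTK, one common toolkit, looks like this (the downloads are one-time setup; the resource names are those used by classic NLTK releases):

```python
import nltk

# One-time downloads of the tokenizer and tagger models.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("The quick brown fox jumps over the lazy dog")
print(nltk.pos_tag(tokens))
# [('The', 'DT'), ('quick', 'JJ'), ('brown', 'JJ'), ('fox', 'NN'), ...]
```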

Plain neural networks fail here either because they have no memory or because it’s extremely hard to model the entire problem space.

Two algorithms that I am aware of, one relatively new and the other relatively unknown, have the potential to produce a functional general intelligence that can serve as a chatbot’s mind.

The first is Numenta’s Cortical Learning Algorithm / Hierarchical Temporal Memory (CLA/HTM), which is slated to be open-sourced sometime soon. The second is the neural model known as Long Short-Term Memory (LSTM), incorporated into a memory-prediction framework.

Both algorithms use the memory-prediction theory to perform learning and processing.

The downfall of CLA/HTM is its complexity and scalability. Thus far, the implementations don’t appear very efficient: they don’t operate on non-binary inputs, and they take a huge amount of memory and processing power. My experiments with it led me to the conclusion that while a deep-conversation bot could be built around HTM, the depth would be severely limited unless the algorithm became much more efficient.

This led me to search for a more efficient algorithm, and I revisited one of my earlier favorites: LSTM. An LSTM cell, instead of simply summing all inputs from the previous layer and applying an activation function, uses weighted inputs from the previous layer to drive logic gates that tell it to memorize, forget, pass input through, or damp input. The model allows memory of activation to be stored indefinitely, making the neuron inherently temporally aware across an arbitrary number of cycles.
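
Here’s a single LSTM step in NumPy to make the gate mechanics concrete. The weights are random placeholders; a real cell would learn them:

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step: gates decide what to forget, store, and emit."""
    z = W @ x + U @ h_prev + b       # all four gate pre-activations at once
    i, f, o, g = np.split(z, 4)
    i = 1 / (1 + np.exp(-i))         # input gate: admit new information?
    f = 1 / (1 + np.exp(-f))         # forget gate: keep the old memory?
    o = 1 / (1 + np.exp(-o))         # output gate: expose the memory?
    c = f * c_prev + i * np.tanh(g)  # the memory cell persists across steps
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.normal(size=(4 * n_hid, n_in))
U = rng.normal(size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)

h = c = np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):  # feed a 5-step input sequence
    h, c = lstm_step(x, h, c, W, U, b)
print(h)  # the state now reflects the whole sequence, not just the last input
```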

This means that contextual information is preserved and used when needed. After determining that LSTM neurons provided the basic features needed for a deep-conversation bot, I set about searching for a way of implementing the memory-prediction theory.

One way would be to simply train a network over time to predict its current inputs from the previous inputs into the system, using backpropagation. This would mean selecting an initial layer structure and arbitrary constraints. It’s easy to see how that would fail.

Another way, which I am currently exploring, is to use autoencoders as a spatial learning technique, and then add a layer that accepts the output of the autoencoder and the content of each neuron’s memory to predict the next input into the network. This algorithm would dynamically add layers when it encounters patterns it’s not capable of predicting, and the activation of a successful layer would trigger the memory cells to pass their input to a prediction layer.
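
As a structural sketch only, with all the names, sizes, and update rules stubbed in by me (no real learning happens here), the shape of the idea is something like this:

```python
import numpy as np

class GrowingPredictor:
    """Sketch: an autoencoder code plus per-neuron memory feeds a layer that
    predicts the next input; a persistently surprising pattern would trigger
    the addition of a new layer. Names and thresholds are illustrative only."""

    def __init__(self, n_in, n_code):
        rng = np.random.default_rng(0)
        self.encode = rng.normal(scale=0.1, size=(n_code, n_in))  # autoencoder half
        self.predict = rng.normal(scale=0.1, size=(n_in, 2 * n_code))
        self.memory = np.zeros(n_code)  # stand-in for LSTM-style memory cells

    def step(self, x):
        code = np.tanh(self.encode @ x)               # spatial (autoencoder) stage
        state = np.concatenate([code, self.memory])   # code plus remembered context
        prediction = self.predict @ state             # guess the *next* input
        self.memory = 0.9 * self.memory + 0.1 * code  # crude memory update
        return prediction

    def grow_if_surprised(self, error, threshold=1.0):
        # Placeholder for the dynamic-growth step: persistent prediction
        # failure would add a new layer here.
        pass
```

The real version would train the encoder and prediction weights with backpropagation rather than leaving them random.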

I’m still working on the details, but I want to use backpropagation as the learning method, because it’s well understood and extensible to other optimization techniques.

The actual architecture of the bot would consist of an input space that reads lines of text as pixels, an output space that behaves as a buffer, and one or more neural networks arranged in a hierarchical manner. The bot should have the capacity to recognize its own output and cogitate accordingly.

If HTM proves to be more efficient, I wouldn’t hesitate to attempt the same sort of architecture to produce a deep-conversation bot, but I have a gut feeling that LSTM will prove a more efficient system altogether, because of its relative simplicity.

One neat thing about either approach is that both are extensible: once you have a foundational network, you can add capacity to the system and it will utilize the new resources intelligently.

  [ # 1 ]

If the course is offered again, you may be interested in Coursera’s course:
Neural Networks for Machine Learning
by Geoffrey Hinton

Hinton is leading some of the most interesting work in neural networks as they relate to machine learning. He and his team have recently been hired by Google to work on their Knowledge Graph.

The course reviewed the latest neural network techniques (including LSTM). I found Hopfield networks, Restricted Boltzmann Machines, and stacked RBMs to be of particular interest with regard to machine learning and neural network memory.

Recurrent Neural Networks

Jan is a big neural network guy and his architecture is based on them. You might get a jump start by working with him and his tools.

The Computational Neuroscience course may also be of interest.
Topics covered include:

1. Basic Neurobiology
2. Neural Encoding and Decoding Techniques
3. Information Theory and Neural Coding
4. Single Neuron Models (Biophysical and Simplified)
5. Synapse and Network Models (Feedforward and Recurrent)
6. Synaptic Plasticity and Learning

> There are around 167,772,160,000,000,000,000,000, or roughly 1.7 × 10^23, 8-word combinations. Obviously natural language isn’t as random as that, but those combinations represent the borders of the first dimension of the problem space.

> The problem space in which a chatbot operates can be reduced fairly easily by dividing it into features. The most common approach to this is to do some sort of POS tagging.

It is true that the problem space is large and sparse (and even larger than you suggest if you take into account misspellings and Named Entity Recognition). But, as you said, it can easily be reduced (although I would disagree that the most common approach is POS tagging). AIML’s symbolic reduction (SRAI) is just one example. The more recent inclusion of sets/aliases in most of the modern chatbot languages is another example aimed at handling the problem.
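
In code terms, symbolic reduction just collapses many surface forms onto one canonical pattern before lookup. A toy Python illustration (not AIML itself, and the phrases are made up):

```python
# Reductions map surface variants onto one canonical pattern (the SRAI idea).
REDUCTIONS = {
    "hi there": "hello",
    "howdy": "hello",
    "what's up": "hello",
}
RESPONSES = {"hello": "Hello! How can I help?"}

def respond(text: str) -> str:
    key = text.lower().strip(" ?!.")
    key = REDUCTIONS.get(key, key)  # reduce first, then look up
    return RESPONSES.get(key, "I don't understand yet.")

print(respond("Howdy!"))  # -> "Hello! How can I help?"
```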

The problem of depth in chat includes:

NLU - Natural Language Understanding, or working out what was said. The most difficult part is retaining context over the course of the conversation.

Goal-driven responses - How should a response be formed? Conversational and world memories come into play, along with goal-driven behavior and planning.

NLG - Natural Language Generation. Assuming the AI has understood the input and decided on a concept to output, how should the response be worded? Responses also need variation, or the user loses interest in the bot.

Each of these has its own difficulties. I would suggest that solving them all may require a system (of which a neural net may be a component) using a variety of techniques, some of which still need to be invented.
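
On the variation point alone, even the crudest fix is to sample among paraphrases rather than emit one fixed string (the templates are made up for illustration):

```python
import random

# Several surface realizations of the same underlying concept.
GREETINGS = [
    "Hello! What can I do for you?",
    "Hi there, how can I help?",
    "Hey! What brings you here today?",
]

def greet() -> str:
    return random.choice(GREETINGS)

print(greet())  # a different greeting on different runs
```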

  [ # 2 ]

Some of my links are broken:
Recurrent Neural Networks - http://www.idsia.ch/~juergen/rnn.html
Hopfield Network - http://en.wikipedia.org/wiki/Hopfield_network

  [ # 3 ]

I fixed the broken links, Merlin. All better now.

  [ # 4 ]

I include statistical parsing that chunks NL into probabilistic categories under the umbrella of POS tagging, even if it’s only loosely similar. Both end up as essentially 2-dimensional mappings of features.

I’ve actually followed Hinton’s course, and almost all of his lectures on YouTube and elsewhere. Using stacked autoencoders enables some neat higher-order behavior, but neural networks definitely need a temporal component, or they’re missing an entire dimension crucial to human-level experience.

The difficulty of adding temporal components to neural networks becomes apparent as soon as you try to use a vanilla feedforward network for any sort of time-dependent learning. You have to incorporate one of two things: a graph into which you feed positionally dependent inputs based on a preconceived model of the world, or one or more copies of the input state representing previous inputs into the system.

Both of those have crippling limitations. It’s simply not feasible to replicate the input space for each timestep you want to include in your accounting. You can try to cheat and use moving averages to carry forward activity, but you end up losing nuance very quickly. And building a graphable model of the world runs smack into the fact that there’s simply too much to know to build a general model of everything knowable out of little factoids (see Cyc or Wolfram Alpha).
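
The moving-average cheat is easy to demonstrate: with an exponential average, a strong signal from just a few steps back becomes nearly indistinguishable from no signal at all:

```python
# Exponential moving average as a cheap "memory": older inputs fade fast.
def ema_context(inputs, alpha=0.5):
    state = 0.0
    for x in inputs:
        state = alpha * x + (1 - alpha) * state
    return state

# Two very different histories collapse toward the same summary state.
print(ema_context([1.0, 0.0, 0.0, 0.0, 0.0]))  # 0.03125
print(ema_context([0.0, 0.0, 0.0, 0.0, 0.0]))  # 0.0
```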

These limits necessitate a different form of network. It cannot be completely connected in the feedforward fashion; that wastes space on meaningless synapses. An intelligent system has to be linked together sparsely. It must also display attention, forgetting, long-term memory, short-term memory, episodic memory, and creativity.

Memory-prediction theory fits the bill nicely. HTM works well, but the algorithm is in its open-source infancy. There’s not a lot of efficient code out there, and that means there’s not a lot to experiment with, outside of the beta level development code.

Now don’t get me wrong: OpenHTM is blazing the way, and their project is amazing and absolutely fascinating to play with. You get to poke about and see the insides of something that could be considered generally intelligent, to the limits of its resources. It’s just not feasible to run a massive grid to try a self-organized approach, so you have to create a hierarchy based on some sort of preconceived model.

And the inputs are binary. That bugs me… thus the exploration of stacked LSTM neurons as a replacement for the neural model in HTM, with a learning mechanism based on the memory-prediction theory of cognition. I’d still have to come up with some sort of model, but having inputs that can range from -1 to 1 makes implementing sensors, articulators, and other embodiment much more feasible. Right now I’m at the stage of taking the demonstration problems from HTM and attempting to manually build an LSTM network that solves them.

The structure of the network, once it’s been pieced together, should offer insight into how to automate the process. Hinton’s lectures are most definitely a great resource, and Schmidhuber is one of my heroes.
