
Functional Approximation, or, “The Flapping Wright Bros”
 
 

Intelligence is hard to define, but can be summarized, in a brief sense, as a feedback loop that operates on and within internal models and the external environment. In this sense, there’s just a matter of degree between Alice, a squirrel, and a human mind. Or one of those mesmerising water/oil fulcrum wave contraptions, but I digress…

The internal model of a mind is the sum total of the perceptions, the model of the world, the interpretation of that model, and its projections; it pretty much represents the outer boundaries of what could be considered an “intelligence”. The external environment includes the corporeal aspect of the intelligence - the thing itself, not the perceived model - and the actual environment it moves in. Whether this is a virtual environment and the corporeal aspect is a software agent, or whether it’s a human body hosting a brain surgeon, the two facets interact in very similar ways.

Approximating intelligence should be approached the same way the Wright Bros approximated the flight of birds - they tried flapping, they tried gliding, and then they figured out a reasonable, functional approximation of flight that allowed them to stay aloft. Birdlike flapping doesn’t cut it, at least with their level of technology (I’m sure nanomaterials and future technology will eventually make efficient flapping flight possible, but in order to begin flying, they needed to work with what they had).

In the same way, AI researchers need to work with what they have, and provide functional approximations for aspects of intelligence they can observe. This is precisely why current chatbots fall so flat. They’re flapping for all they’re worth, and intelligence just doesn’t work that way. You can’t simply replicate the words that result from intelligent thought processes and drop them in as a replacement for the thought processes themselves. At best, you’ll get a Loebner Prize winner, or a couple of days’ notoriety on a slow news week when a couple of animated bots deliver a loosely scripted newscast.

At worst, you’ll realize the months and years of work needed to make “flapping” come anywhere near reasonably imitating something intelligent. Even when you end up with a reasonable imitation… it’s still not quite the real thing. No depth.

For the last several years, I’ve been thinking about how to get the intelligent thought processes that result in intelligent behaviors. It’s taken me all the way back to SHRDLU, which demonstrated a wonderful fusion of the internal model and awareness of the external environment, to neural networks, Cyc, AIML, CLIPS, and OpenCog. All of these projects demonstrate some excellent features, but none of them quite meet my criteria - self-learning without tedious manual entry, runs on Windows, no extensive mucking with arcane OS dependencies, no gratuitously cryptic jargon.

Cyc and OpenCog, btw, are the two most promising AGI projects currently being developed. Cyc may be 10 or 20 years from reaching the self-propelled stage, but it is measurably improving, and other statistical results suggest that eventually, Cyc will achieve AGI, once someone incorporates a valid feedback model with enough internal awareness to bootstrap it.

Anyway, my conclusions are these: If you want a smart chatbot, it’s got to have the ability to be smart. It needs memory, reasoning capacity, embodiment, an internal model, and externally constrained behaviors (meaning if it has a robot arm, it doesn’t wildly flail it for no reason; it needs a basic “instinct” that damps the use of that arm unless specifically directed).

All of those things need to be held together in an internal model that is capable of predicting an arbitrary number of interactions between internal and external stimuli (projecting the future to guide behaviors.)

What this means in English is simple. Instead of a bot mindlessly accepting a string as input, passing it through an if/then loop, and returning an output, it needs an active monitor that watches for input. When input is given, the internal model updates (recognition) and then produces a behavior (act). It then updates again to account for the fact that something was recognized and something was done; depending on the learned behaviors, other things may follow, or it can go back to waiting for a response.

This simple recognize/act cycle gives the bot a depth that can be expanded on much more easily than a trigger/response cycle that isn’t explicitly aware of time, because it forces the developer to fashion behaviors based on explicitly modeled interactions. Instead of “if input = hello then respond hello” you get “listen for input, respond appropriately”. This means that you need to teach a system general behaviors, and in order for that to work, you have to have a system capable of generalizing.
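For illustration, here is a minimal Python sketch of that recognize/act loop. The Agent class, its model structure, and the behavior it produces are placeholders of my own, not a description of any existing system:

```python
import queue
import time

class Agent:
    """Toy recognize/act loop: the internal model is updated on every
    recognition and again after every action, so the bot stays aware of
    what it perceived and what it did."""

    def __init__(self):
        self.model = {"history": [], "time": 0.0}  # crude internal model
        self.inbox = queue.Queue()                 # external stimuli arrive here

    def recognize(self, stimulus):
        # update the internal model to reflect the new percept
        self.model["history"].append(("perceived", stimulus))

    def act(self, stimulus):
        # produce a behavior from the current model, not from the raw string alone
        behavior = "responding to " + repr(stimulus)
        self.model["history"].append(("did", behavior))
        return behavior

    def run_once(self, timeout=1.0):
        self.model["time"] = time.time()           # the loop is explicitly aware of time
        try:
            stimulus = self.inbox.get(timeout=timeout)
        except queue.Empty:
            return None                            # nothing recognized; go back to waiting
        self.recognize(stimulus)                   # recognition: model updates
        behavior = self.act(stimulus)              # act: behavior produced
        self.recognize(behavior)                   # update again: "something was done"
        return behavior

agent = Agent()
agent.inbox.put("hello")
print(agent.run_once())
```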

This is where neural networks are great. When properly used, they can take noisy inputs and produce useful outputs, and can be updated and trained continuously. They can also provide rigid and arbitrary behaviors through deliberate overtraining, and otherwise act as general “buckets” for processing.

My current angle in developing a chatbot is the use of arbitrarily built, ordered binary graphs representing sets of well-defined fixed or dynamic variables as inputs into a neural network. What this means is that I take some number of concepts that have a logical order - something like this:

000 - Socrates
001 - Zeus
010 - Man
011 - Mortal
100 - AllOf
101 - queryIsAB(A,B)
110 - Yes
111 - No

The network is built on the fly based on the inputs it receives, so in this case if I wanted to train the network to respond appropriately, I would teach it 101 000 010 as inputs with output 110. You could also teach it that Socrates is not Zeus, that Zeus is not Man, that Man is Mortal, and so forth.
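For illustration, a minimal NumPy sketch of that training step, using the three-bit codes from the table above; the network size, learning rate, and training loop are my own arbitrary choices, not part of the actual system:

```python
import numpy as np

# 3-bit concept codes from the table above
C = {"Socrates": "000", "Zeus": "001", "Man": "010", "Mortal": "011",
     "AllOf": "100", "queryIsAB": "101", "Yes": "110", "No": "111"}

def encode(*concepts):
    # concatenate the bit codes into one flat input vector
    return np.array([int(b) for c in concepts for b in C[c]], dtype=float)

# the training pair from the text: queryIsAB(Socrates, Man) -> Yes, i.e. 101 000 010 -> 110
X = encode("queryIsAB", "Socrates", "Man").reshape(1, -1)   # shape (1, 9)
Y = np.array([[1.0, 1.0, 0.0]])                             # "Yes"

rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.5, (9, 16))    # arbitrary small hidden layer
W2 = rng.normal(0, 0.5, (16, 3))
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(2000):               # plain backprop on squared error
    H = sig(X @ W1)
    O = sig(H @ W2)
    dO = (O - Y) * O * (1 - O)
    dH = (dO @ W2.T) * H * (1 - H)
    W2 -= 0.5 * H.T @ dO
    W1 -= 0.5 * X.T @ dH

print(np.round(sig(sig(X @ W1) @ W2)))   # -> [[1. 1. 0.]], i.e. "Yes"
```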

A small network like that would require explicitly training every logical rule, without a lot of room for generalization. The network couldn’t generalize answers about a graph very well (or really at all) - it wouldn’t pick up that Socrates is Mortal unless you specifically teach it that bit of logic: if A is in B, and all B is in C, then A is in C.

Expand the size of the network, however, and give it a whole lot of examples of first order logic, with a set of fixed variables, and then you end up with a neural network model capable of some remarkable things. For example, if I incorporated the entire Cyc specification of relationships, and then utilized the remaining variables for a particular domain with specific knowledge and query variables, then I could have the ability to load knowledge in and out of the bot’s “brain” simply by changing the values of the references and loading the correct neural network with all available contextual information. Memory becomes a matter of storing a sequence of database references - concept lookup points. You don’t need the whole neural network running when you can activate it based on the context provided by the indexed concepts.
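As a rough sketch of what “memory as a sequence of database references” might look like, here is a toy concept-to-network index in Python; the storage scheme and all names are my own placeholders:

```python
# Toy "memory as concept references" index: each stored network is keyed by the
# concept ids it was trained on, so only the networks matching the current
# context get loaded. All names and the storage scheme are illustrative.

concept_index = {}   # concept id -> set of network ids it participates in
network_store = {}   # network id  -> serialized weights (or a path to them)

def register(network_id, concept_ids, weights):
    network_store[network_id] = weights
    for c in concept_ids:
        concept_index.setdefault(c, set()).add(network_id)

def lookup(context_concepts):
    """Return network ids relevant to this context, ranked by concept overlap."""
    hits = {}
    for c in context_concepts:
        for nid in concept_index.get(c, ()):
            hits[nid] = hits.get(nid, 0) + 1
    return sorted(hits, key=hits.get, reverse=True)

register("syllogism-net", {"Socrates", "Man", "Mortal"}, weights=None)
print(lookup({"Socrates", "Mortal"}))   # -> ['syllogism-net']
```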

The generalization part comes in when I can identify a potentially correct network, fill in the available known information, and because of the nature of neural networks, it’s already got a whole lot of assumptions built into the model, so I don’t have to waste time specifying rules and situations and micromanaging the particulars. I operate on a statistical level, in the sense that the output from the neural network can be interpreted, which gives the system the inherent ability to “not be sure.”

Every concept and neural structure is stored in a simple lookup table (formal database or otherwise) and essentially represents a massive hypergraph. Knowledge is fractal, in the sense that as long as you have a sufficiently expressive initial set of first order logic terms, you have a Turing-complete network, capable of producing output that can be fed back into the same network, or the same series of networks, for nonlinear results.

It also lends itself nicely to recreating particular regions of the brain in functionally and architecturally significant ways. It’s modular - the neural networks are fixed size, but taken together express more computational power than the sum of the parts, without the overhead of a combinatorially explosive monolithic ANN.

The tricky parts I haven’t fully fleshed out yet are the neural network lookup system - given a set of concepts, how do I pull up all “relevant” networks? How deep and how broad should the lookup go? However, I plan on creating a set of test cases, and then building off of them, and then ultimately building fully “cognizant” neural networks that only lack training on specific ontologies, but are otherwise completely trained on whatever implementation of first order logic I end up using. I’m debating whether to incorporate OWL in its entirety (especially since it comes with fully loaded existing test suites and is well documented) or CycL. The processing system that looks up networks will also need to incorporate back and forward chaining inference within the process, but I see these as surmountable challenges.

The bot learns by training networks sequentially on recognized inputs. Each unique ontology gets its own network, so if new inputs are detected, new networks are spawned. Part of the semantic pruning process would be compressing shared attributes of fresh inputs (age of concepts would be a factor in the pruning process) and then updating the network associations. Every network would be online, and trained incrementally after each recognition occurs. This implies that by changing the learning rate, weights, activation functions, and other aspects of the neural network, you end up with the potential to simulate virtual “drugs” and psychoses and other fun stuff.
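A toy Python sketch of that spawn-per-ontology, train-online idea (single-layer stand-in networks; the class and method names are placeholders, and the learning rate is the knob that would play the role of the virtual “drugs”):

```python
import numpy as np

class OntologyNets:
    """Toy version of spawn-per-ontology, train-online: each ontology id owns
    its own (single-layer, stand-in) network, every recognition triggers one
    incremental update, and the learning rate is the tunable knob."""

    def __init__(self, lr=0.1):
        self.nets = {}
        self.lr = lr          # turning this up or down is the "virtual drugs" dial

    def observe(self, ontology_id, x, y):
        if ontology_id not in self.nets:      # new kind of input: spawn a network
            rng = np.random.default_rng()
            self.nets[ontology_id] = rng.normal(0, 0.1, (len(x), len(y)))
        W = self.nets[ontology_id]
        pred = 1.0 / (1.0 + np.exp(-(x @ W)))            # forward pass
        W -= self.lr * np.outer(x, pred - y)             # one online gradient step

nets = OntologyNets(lr=0.1)
nets.observe("first-order-logic",
             np.array([1, 0, 1, 0, 0, 0, 0, 1, 0], dtype=float),
             np.array([1, 1, 0], dtype=float))
```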

In any case, I’m planning on creating a wide set of initial learning behaviors, rather than incorporating existing knowledge, so as to avoid pigeonholing. Theoretically, then, it would be a matter of creating interfaces (embodiment) that did things like parse text. Some sort of metaprocess would also be needed to clean the neural networks, retrain overlapping concepts, and so forth - semantic pruning. That, however, is a long way off. The learning behaviors would impact the development of the internal model, and the recognize/act processes through which the system learns things.

 

 
  [ # 1 ]

Bruce Lee said,
“Use only that which works, and take it from any place you can find it.”

 

 

 

 
  [ # 2 ]
JRowe - Dec 31, 2012:

Intelligence is hard to define, but can be summarized, in a brief sense, as a feedback loop that operates on and within internal models and the external environment.

I’m quite satisfied with my definition of intelligence. I tried to improve it recently, but CR Hunt, a member here, put me through the wringer, whereupon I realized that I had to fall back to nearly the same definition I proposed back in 1990, though with some added words, especially “with respect to a given goal” and “the ability to perform”:

“Intelligence” with respect to a given goal is the ability to perform efficient, adaptive, processing of real-world data, processing that is directed toward achieving that goal.

Usually the implied goal is survival. As far as I can tell, this definition covers everything that is reasonable and relevant, and it can even be converted to a formula that measures intelligence. See this thread:

http://www.chatbots.org/ai_zone/viewthread/589/

JRowe - Dec 31, 2012:

Approximating intelligence should be approached the same way the Wright Bros approximated the flight of birds - they tried flapping, they tried gliding, and then they figured out a reasonable, functional approximation of flight that allowed them to stay aloft.

This is a decades-old analogy that people still often use, and it’s getting old enough that I’m getting tired of hearing it. A much more accurate, useful, and modern approach than appealing to an engineering analogy is David Marr’s three levels of system description, originally used for vision systems, but it’s an approach that can be applied to general AI, I believe. I plan to make that topic one of my next posts, say 1-4 posts from now.

JRowe - Dec 31, 2012:

In the same way, AI researchers need to work with what they have, and provide functional approximations for aspects of intelligence they can observe.

But what if they’re still missing a critical piece of the solution? That’s another thread topic I have yet to post.

JRowe - Dec 31, 2012:

At worst, you’ll realize the months and years of work needed to make “flapping” come anywhere near reasonably imitating something intelligent. Even when you end up with a reasonable imitation… it’s still not quite the real thing. No depth.

Definitely true. Dreyfus gives an analogy of early man trying to get to the moon by climbing ever higher trees, which only asymptotically approaches a limit that will always be seriously deficient. It is necessary to look deeply into the essence of nature to make major progress of that sort.

If the alchemist had stopped poring over his retorts and pentagrams and had spent his time looking for the deeper structure of the problem, as primitive man took his eyes off the moon, came out of the trees, and discovered fire and the wheel, things would have been set moving in a more promising direction. After all, three hundred years after the alchemists we did get gold from lead (and we have landed on the moon), but only after we abandoned work on the alchemic level, and worked to understand the chemical level and the even deeper nuclear level instead.
(“What Computers Still Can’t Do: A Critique of Artificial Reason”, Herbert L. Dreyfus, 1992, page 305)

JRowe - Dec 31, 2012:

It’s taken me all the way back to SHRDLU, which demonstrated a wonderful fusion of the internal model and awareness of the external environment, to neural networks, Cyc, AIML, CLIPS, and OpenCog.

I love SHRDLU! See my comment about my initial reaction to it:

http://www.chatbots.org/ai_zone/viewthread/1154/

In fact, I’m seriously thinking of resurrecting SHRDLU with a new language interface and a new representation of the world, as a demo platform for my new architecture. I plan to post a thread on that topic of updating SHRDLU, too, since my discussion requires some analysis of where SHRDLU went wrong. I’m fairly unfamiliar with Cyc, since it always seemed such an extremely naive approach to me, but maybe I’ll review their approach again.

JRowe - Dec 31, 2012:

What this means in English is simple. Instead of a bot mindlessly accepting a string as input, passing it through an if/then loop, and returning an output, it needs an active monitor that watches for input. When input is given, the internal model updates (recognition) and then produces a behavior (act). It then updates again to account for the fact that something was recognized and something was done; depending on the learned behaviors, other things may follow, or it can go back to waiting for a response.

This is the topic of yet another thread I want to post. (So many threads, so little time.) In summary, language is largely just a method of evoking knowledge of the real world that is common to both entities communicating. Too deep a topic to discuss in a short space. It relates to Searle’s Chinese Room Gedankenexperiment. I think that’s the same as what you’re saying, though.

JRowe - Dec 31, 2012:

This is where neural networks are great. When properly used, they can take noisy inputs and produce useful outputs, and can be updated and trained continuously. They can also provide rigid and arbitrary behaviors through deliberate overtraining, and otherwise act as general “buckets” for processing.

Beware the artificial neural network, for it hath not the ability to perform binding. grin

Yet another thread topic. I think I’ll stop there. Thanks for letting us know about your approach, and welcome to this forum. It’s nice to know what others are up to. Some day soon, after I’ve gotten my first recent article completed and put in the public domain, I’ll start a thread about my own project. I haven’t decided whether that thread should be in the “AI thoughts” forum or the “My Chatbot Project” forum, since it’s more of an AI project than a chatbot project… unless I decide to resurrect SHRDLU early on. Then it will have a heavy language emphasis, as well.

 

 

 
  [ # 3 ]
Mark Atkins - Jan 1, 2013:

I love SHRDLU! See my comment about my initial reaction to it:

http://www.chatbots.org/ai_zone/viewthread/1154/

In fact, I’m seriously thinking of resurrecting SHRDLU with a new language interface and a new representation of the world, as a demo platform for my new architecture. I plan to post a thread on that topic of updating SHRDLU, too, since my discussion requires some analysis of where SHRDLU went wrong.

It’s already been done. You can download all the updated source code and working binaries from the following web page; however, you will be disappointed.

http://www.semaphorecorp.com/misc/shrdlu.html

SHRDLU was implemented as an augmented transition network connected to a planning module. It only worked at all because the domain of discourse was so limited. What was impressive about it was that the authors managed to pack so much into the tiny amount of computing capacity that was available at the time.

The part that I found most interesting (but ultimately not very useful) was the flow charts detailing SHRDLU’s parsing algorithms, available here:

http://dspace.mit.edu/bitstream/handle/1721.1/5791/AIM-282.pdf

 

 

 
  [ # 4 ]

The network is built on the fly based on the inputs it receives, so in this case if I wanted to train the network to respond appropriately, I would teach it 101 000 010 as inputs with output 110.

The neural network that you are describing is basically how my system works (but with a lot more features, like networks inside networks, programmable networks, dynamic output binding, a custom DB system for neural nets, ...). Take a look here:
http://bragisoft.com/

For the source code that defines how the neural network works, check out the project on SourceForge (the subdirectory ‘moduleschatbot’ contains the source that does, more or less, what you describe; it’s in NNL, which compiles into neurons).

 

 
  [ # 5 ]

Thanks, Jan! Your system is actually one of my inspirations! I’ve spent some time with it, but it doesn’t quite match up with my long-term ideas for the experiment.


Over the last couple years I’ve been watching the development of Numenta’s HTM, and learning about LSTM neural networks. http://en.wikipedia.org/wiki/Long_short_term_memory

I have a very good foundation for my LSTM model, written in Lua, which I’m planning on either extending to incorporate some sort of attraction/repulsion feature, or using a variation on HTM’s style of clustering. The idea is to end up with a system that behaves predictably, has the ability to adapt state changes piecewise (as opposed to monolithic single-layer modeling), and incorporates sparse distributed memory (SDM).

SDM, in the context of neural networks, means that layers are not fully interconnected, and don’t need to be to function. HTM capitalized on SDM by using a 2D matrix and Euclidean clustering to develop and reinforce connections between stacks of cells. The end result behaved just like a giant feedforward network trained sideways, with monolithic single-layer modeling. (HTM with LSTM built into the cortical stacks might be a great project.)
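To illustrate the “not fully interconnected” point, here is a minimal NumPy sketch of a sparsely connected layer enforced by a fixed binary mask; the layer sizes and the 10% density are arbitrary assumptions of mine:

```python
import numpy as np

# A sparsely connected layer: only ~10% of the possible connections exist,
# enforced by a fixed binary mask over the weight matrix.
rng = np.random.default_rng(1)
n_in, n_out, density = 2000, 2000, 0.10

mask = rng.random((n_in, n_out)) < density      # which connections exist
W = rng.normal(0, 0.05, (n_in, n_out)) * mask   # absent connections stay at zero

x = rng.random(n_in)
activation = np.tanh(x @ W)                     # forward pass through the sparse layer
# during training, gradient updates would also be multiplied by `mask`,
# so pruned connections never grow back
print(f"{mask.mean():.0%} of possible connections are present")
```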

Anyway, I have several features I need to be exposed within my engine, and LSTM does the trick. What I’d like to have is the ability to generate a sparsely connected “basic” network with a fixed model size (imagine 5000 input neurons, 10 hidden layers of 2000 neurons, and then 1500 outputs). This basic network would be trained on predicate calculus of some sort (OWL/CycL/something), then its memory cleared, and it would be stored away, indexed and associated as a semantic endpoint. The inputs represent a fixed model size - 5000 input neurons means 500 10-bit concepts. This translates into 1024 different variables, a subset of which are the predicate calculus; the remainder are variables - some of them dynamic, fill-in-the-blank variables, others higher order concept placeholders.
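The arithmetic behind that fixed model size, as a small sketch (the slot count and bit width come from the numbers above; the encoding function itself is a hypothetical placeholder of mine):

```python
import numpy as np

# 5000 input neurons read as 500 slots of 10 bits each; 10 bits address
# 2**10 = 1024 distinct concept ids (some reserved for the predicate
# calculus, the rest for dynamic variables and placeholders).
N_SLOTS, BITS = 500, 10
assert N_SLOTS * BITS == 5000 and 2 ** BITS == 1024

def encode_slots(concept_ids):
    """Pack up to 500 concept ids (each 0..1023) into a 5000-wide bit vector."""
    v = np.zeros(N_SLOTS * BITS)
    for slot, cid in enumerate(concept_ids):
        bits = [(cid >> b) & 1 for b in reversed(range(BITS))]
        v[slot * BITS:(slot + 1) * BITS] = bits
    return v

x = encode_slots([101, 0, 2])   # e.g. a query over three concept ids
print(x.shape)                  # (5000,)
```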

Higher order concept placeholders are things like “person1”, “person2”, “place1”, “place2”, and so forth. These concept placeholders do not need to be part of every input into the system, because the network has learned an ontology, and LSTM will have a specific cluster or chain of neurons that remember recurrent input (it’s implicitly temporal). A situation graph will fix a unique concept (e.g. Mark Atkins) to a placeholder (person1), and the system will not need to develop new concepts for repeated scenarios. It can re-use the “conversation” network with n participants simply by flushing the memory and assigning the variables. Every time it encounters “Mark Atkins” it will have a set of associated concepts it can apply to the conversational network - the longer the association, the more likely Mark will accrue a set of memories associated with several unique networks. Memories are developed as individually trained networks, so that when the network is queried, the details of memory are produced. So-and-so did such and such on a particular date, said this or that or the other thing.
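A toy sketch of the placeholder-binding idea; the role names and the ConversationNetwork class are purely illustrative stand-ins for the LSTM network described above:

```python
class ConversationNetwork:
    """Stand-in for the reusable conversation network: concrete concepts are
    bound to role placeholders, and the recurrent memory is flushed between
    situations instead of spawning a new network per person."""

    ROLES = ("person1", "person2", "place1")

    def __init__(self):
        self.bindings = {}
        self.memory = []        # stand-in for the LSTM's recurrent state

    def flush(self):
        self.bindings.clear()
        self.memory.clear()

    def bind(self, **assignments):
        for role, concept in assignments.items():
            if role not in self.ROLES:
                raise KeyError(f"unknown placeholder {role!r}")
            self.bindings[role] = concept

    def step(self, utterance):
        # the network sees roles, not raw names, so learned behavior transfers
        self.memory.append((dict(self.bindings), utterance))

net = ConversationNetwork()
net.flush()
net.bind(person1="Mark Atkins", place1="the forum")
net.step("hello")
```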

Whenever the bot learns something new, it breaks out the basic network, fills in everything it knows, and checks the results. It sets the variable to newly acquired data (in the case of a chatbot, it could be a new word) and then runs the network. The output will result in an executable ordered semantic graph that gets fed back into the network. Each new concept learned is tested against higher order concept placeholders to determine if it behaves similarly (classifying the location in the ontology) and then indexed accordingly. This compresses the total number of possible networks - no combinatorial explosion, easy lateral expansion of knowledge (meeting people becomes a matter of associating them with features and indexing a semantic endpoint.)

One of the important features of LSTM is that you can identify a particular neuron and say “that neuron represents awareness of the Person1 placeholder!” This would allow you to do some really creative things with learning - incorporating subconscious routines that direct attention, emotional feedback, and so on. You could develop a network that uses awareness of particular neurons in other networks as inputs, and so forth. Relatively abstract concepts like significance, importance, attention, distraction, obsession, and other things can be exhibited.

These networks behave like executable hypergraphs. This boils down to a system in which I craft an ontological representation of the bot’s mind, starting with high level concepts, and then breaking each of them into more detailed networks. Semantic granularity is implicit - you get exactly what the model provides for, and if minute detail is needed, then express it in the ontology.

I’m very heavy on theory right now, but I have a good foundation for LSTM networks already written in Lua, and I’m weighing the merits of using ChatScript as the frontend. Lua is trivial to embed, and it seems it would almost be trivial to add a ^lua(scriptName, input) function to ChatScript, with some carefully considered hooks. I’d probably want to create a daemon that did the processing, with some sort of low-level networks constantly running, representing awareness of time and timing and so forth.

 

 
  [ # 6 ]
Andrew Smith - Jan 1, 2013:

It’s already been done. You can download all the updated source code and working binaries from the following web page, however you will be disappointed.

Awesome, thanks. I hadn’t had time to research the details of how SHRDLU was implemented, so those references will more than help. As a result of recently trying to think of impressive demos, I’ve become very interested in microworlds and their pros and cons—yet another thread topic I want to start. The original SHRDLU was criticized for three reasons that I’ve read: (1) limited language understanding; (2) limitation to a microworld; (3) lack of a generalizable knowledge representation method. I claim that all those criticisms are either invalid or reparable. Maybe there exist more criticisms that I haven’t had time to find, though.

SHRDLU will rise again! grin

 

 

 
  [ # 7 ]

Cyc is pretty much SHRDLU writ large. They both use Lisp, even. tongue laugh But yeah, they’re based on pretty much identical principles; Cyc is just the production-level version, where the original SHRDLU was the proof of concept.

 

 