

Chatbots, AI, wildcards, random phrases and Pattern Matching
 
 
Laura Patterson - Dec 12, 2011:

In reality, using random answers and wildcard substitutions is a very poor design approach. I would much rather have a bot with limited responses that are based on a true understanding of the user’s input than tricky random replies.

As I emphasised in previous posts, wildcards are to be used carefully so the original Human Message is retained and the CyberTwin responds accurately.

And as you’ll notice in your CyberTwin account, the random responses only provide different CyberTwin Messages when the same Human Message is passed through multiple times. For example, if I were to say:

Human Message: Is this contest fun?
CyberTwin Response: Yes this contest is very fun
Human Message: Is this contest fun?
CyberTwin Response: Yes, as I said this contest is really fun!

The CyberTwin is still matching to the correct Human Message, there are just ‘variations’ to the CyberTwin Response.
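To make the distinction concrete, here is a minimal Python sketch of that idea: the lookup of the Human Message is exact, and randomness only picks among the stored variations. This is not CyberTwin's actual implementation; the table `RESPONSES` and the helpers `normalise` and `reply` are hypothetical names made up for illustration.

```python
import random

# Hypothetical table: one Human Message mapped to several response variations.
RESPONSES = {
    "is this contest fun?": [
        "Yes this contest is very fun",
        "Yes, as I said this contest is really fun!",
    ],
}

def normalise(message):
    """Lower-case the message and collapse runs of whitespace."""
    return " ".join(message.lower().split())

def reply(message, rng=random):
    """Return one stored variation for a known message, or None if unknown."""
    variations = RESPONSES.get(normalise(message))
    if variations is None:
        return None  # no match: the randomness never invents a reply
    return rng.choice(variations)
```

Asking `reply("Is this contest fun?")` twice may give either variation, but an unknown message always returns `None`, so the random part never changes which Human Message was matched.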

And of course you can include as few or as many Human Messages/CyberTwin Responses as you like :)

Cheers

Shaun

 

 
  [ # 1 ]
Laura Patterson - Dec 12, 2011:

A true Turing-Test should not be based on clever deception. If so, what is the real point?

The real point is to make money and a significant portion of the population have always believed that it’s ok to cheat in order to do that. For truth and/or innovation you will have to look elsewhere.

 

 
  [ # 2 ]

Laura, depending on one’s point of view, even the approach you’re taking can be considered a “deception”, if the objective is only to create an entity whose conversational skills are such that you can’t tell it apart from an actual person. From the standpoint of the person having the conversation and expecting the other participant to be human, when in fact it’s a computer program/chatbot, there is a clear case of deception, regardless of which approach the chatbot uses in creating conversation. The sad fact is, the only “visible” successes in the area of AI with regard to the various Turing Tests (e.g. the Loebner Competition, Chatterbox Challenge, etc.) have been with pattern matching types of bots. That will change at some point, and we’ll see the AIML/ChatScript/other bots that use this approach go by the wayside, but that’s not the case right now.

Now on the other hand, from the standpoint of just having a conversation for the sake of the conversation, rather than whether one or more participants are biological or not, then the only “deception” in play is from any participant of said discussion that uses pattern matching & wildcards to emulate understanding, and try to pass themselves off as intelligent.

I’d also like to point out (and this is just my opinion, so take it or leave it, as you will) that even your approach can be considered, in some aspects, to be a type of pattern matching, though it’s matching patterns of understanding, rather than patterns of words. To me there’s a huge difference there, but there’s also the tiniest thread of similarity, as well. :)

 

 
  [ # 3 ]
Dave Morton - Dec 12, 2011:

Laura, depending on one’s point of view, even the approach you’re taking can be considered a “deception”, if the objective is only to create an entity whose conversational skills are such that you can’t tell it apart from an actual person. From the standpoint of the person having the conversation and expecting the other participant to be human, when in fact it’s a computer program/chatbot, there is a clear case of deception, regardless of which approach the chatbot uses in creating conversation. The sad fact is, the only “visible” successes in the area of AI with regard to the various Turing Tests (e.g. the Loebner Competition, Chatterbox Challenge, etc.) have been with pattern matching types of bots. That will change at some point, and we’ll see the AIML/ChatScript/other bots that use this approach go by the wayside, but that’s not the case right now.

Now on the other hand, from the standpoint of just having a conversation for the sake of the conversation, rather than whether one or more participants are biological or not, then the only “deception” in play is from any participant of said discussion that uses pattern matching & wildcards to emulate understanding, and try to pass themselves off as intelligent.

I’d also like to point out (and this is just my opinion, so take it or leave it, as you will) that even your approach can be considered, in some aspects, to be a type of pattern matching, though it’s matching patterns of understanding, rather than patterns of words. To me there’s a huge difference there, but there’s also the tiniest thread of similarity, as well. :)

Dave that’s rubbish. For one thing, how many mainstream artificial intelligence researchers have bothered to enter their software in the Loebner contest? You don’t find professional astronomers showing up at astrology conventions do you?

For another thing, the difference between what Laura, CR, Victor, Andy, Bruce(?), myself and some others are trying to achieve is as fundamentally and provably different as the difference between pattern matching with regular expressions and parsing with context free grammar.

The level of success that has been achieved in natural language understanding is already astounding but though I’ve pointed this out here before, it’s largely been ignored. I won’t try to speculate about why it’s been ignored because unfortunately any hypothesis that I could put forward would be deemed insulting by too many of the people frequenting this forum.

If you want the basis for robust, capable natural language processing software go here:

http://www.cs.rochester.edu/~james/

No doubt there are other projects at least as advanced as this one, but this is the only one that I know about that has been widely publicised. Also, most other researchers in the public eye are still a bit preoccupied with statistical approaches to NLP, presumably because it’s easier to get grant money for that than for actually doing anything novel.

 

 

 
  [ # 4 ]

Andrew, I love you dearly, but you completely missed my meaning here.

First off, with regard to the LC and other Turing tests, the key word that I mentioned, and that you seem to have missed, is “visible”, for exactly the same reason that you’re calling me out. Serious AI researchers, such as those in universities around the world, and individuals such as yourself, Laura, Victor, CR, and others don’t enter these types of competitions, and thus aren’t generally visible, so my statement holds. No intimation of any sort was made there, and if you read one in my comments, then I can’t help.

Secondly:

Andrew Smith - Dec 12, 2011:

For another thing, the difference between what Laura, CR, Victor, Andy, Bruce(?), myself and some others are trying to achieve is as fundamentally and provably different as the difference between pattern matching with regular expressions and parsing with context free grammar.

I agree completely. What you folks and others in the field of AI research have accomplished, and the strides that have been made are nothing short of phenomenal, and they’re worlds away from AIML, or other “pattern matching” methods.

but!

That doesn’t change the basic, “nuts and bolts”, fundamental concept that, no matter how the data is transformed from input to output, whether it’s AIML, NLP, or even human thought/speech, at the most basic of levels the input is used as a template or pattern to search through a “knowledge base”, to determine the correct output. I’m not just talking about words here. I’m referring to something much more. Here’s an example:

In a given conversation, I say to you that I have a red truck. Your mind (in addition to unimaginably huge amounts of other data, e.g. memories, ideas, etc.) probably has within it hundreds of thousands, maybe even millions of images or concepts that match “red truck”, as opposed to “yellow baby buggy”. Your brain at some level searches all of the data within, collecting things that match “red truck”, and discarding or rejecting anything that doesn’t match. “Red truck” is the pattern, and what comes to mind when I mention my red truck is the match.

Yes, this is a gross oversimplification. But my point is that, whether you like it or not, there is a small amount of “common ground” between text-based pattern matching chatbots and “understanding-based” chatbots. The relationship is somewhat like ours to bacteria. We both have DNA (our “common ground”), but humans are immensely more complex than bacteria. Are we superior to bacteria? In many, many aspects, yes. But we still share that “common ground”.

 

 
  [ # 5 ]

Comprehension of individual words as opposed to pattern matching is far closer to real AI IMO. Yes, this approach does have its challenges, however the results are far more useful.

Understand, I am not putting down the efforts of any members here to achieve their goals. Years ago when I worked for US Internet, I used to challenge the engineers in the same way when 56k was the latest and greatest. Only by pushing the envelope do we progress. Right?

 

 
  [ # 6 ]

Here’s a tip I picked up along the way: it doesn’t matter how you get the data in your system, it’s what you do with it afterwards. And this is where most regular parsing techniques (like LL(1), LR(1), RR(1), ...) fail: they need to parse the entire text or nothing.
Pattern matching on the other hand generally doesn’t need to ‘understand’  the entire input, just a part, so you can get away with gaps.
But, you can still extract the data out of the matched patterns, so you can still do the ‘deep thinking’ before generating an answer.

 

 
  [ # 7 ]

Good point, Jan. ;)
Getting as much data from the user input as possible is the ultimate goal.

 

 
  [ # 8 ]
Jan Bogaerts - Dec 12, 2011:

Here’s a tip I picked up along the way: it doesn’t matter how you get the data in your system, it’s what you do with it afterwards. And this is where most regular parsing techniques (like LL(1), LR(1), RR(1), ...) fail: they need to parse the entire text or nothing.

Jan you don’t know what you are talking about. The input that can be parsed is a function of the rules that you define for the grammar, not the class of grammar for which you are defining the rules. You can define wild card patterns for context free grammars just as easily as you can define wild card patterns for regular expressions. I shouldn’t have to be explaining something so obvious to you.

You can also define many other kinds of patterns in a context free grammar that you cannot define using regular expressions without enumerating them all individually, which at best is inconvenient and at worst is impossible.

So, even if you were only concerned with pattern matching, you would still be far better off using a more powerful parser than the kinds of tools that you are limited to using at the moment.

Now I should make one further point about this. The algorithms that you mention are not able to parse unrestricted context free grammars, but only subsets of them. The compromise was made for reasons of efficiency, back when people needed to write compilers for machines that only had a few hundred kilobytes of memory and a state table of 30kB was unacceptably large. Nowadays we can use unrestricted context free grammar parsing algorithms such as GLR and GLR* which can be used in a wide variety of applications, even though they may need gigabytes of memory to run.

There are two really good solid reasons why you should forget about using regular expressions and should start acquiring the knowledge and tools to use context free grammars.

The first reason is that context free grammars are capable of expressing all the patterns that are to be found in almost all natural languages.

The second reason is that when you are using a context free grammar you can add and remove rules without having to worry about changing the nature of it. (That is, a CFG plus a CFG gives you another CFG. If you are using regular expressions or a restricted subset of CFGs, e.g. LR(1), then as often as not when you add another rule it breaks the parser, i.e. you get a shift/reduce or reduce/reduce conflict which the parser cannot handle.)

Pattern matching on the other hand generally doesn’t need to ‘understand’ the entire input, just a part, so you can get away with gaps. But, you can still extract the data out of the matched patterns, so you can still do the ‘deep thinking’ before generating an answer.

Right, so you are saying that anything that is not understood should be ignored and discarded? That explains a lot. Seriously??? It’s better to process the wrong input than recognise, let alone acknowledge, that you don’t understand it?

 

 

 
  [ # 9 ]

You can define wild card patterns for context free grammars just as easily as you can define wild card patterns for regular expressions. I shouldn’t have to be explaining something so obvious to you.

Yes you can. Now you try and do that for every part in every grammar rule in your definition. Have fun!

You can also define many other kinds of patterns in a context free grammar that you cannot define using regular expressions without enumerating them all individually, which at best is inconvenient and at worst is impossible.

May I suggest going through some of my tutorials. Thesaurus variables are one way of solving this (for abstract knowledge, aka lists of words). Asset variables in the input section (something I haven’t done yet) can be used for concrete knowledge (e.g. which colors an eye can have).

There are two really good solid reasons why you should forget about using regular expressions and should start acquiring the knowledge and tools to use context free grammars.

Who ever said anything about regular expressions? I am talking about pattern matching. It’s about how regular expressions are implemented: use this technique to define your own regular-expression-like language that does have the features you need. This is what I did.

 

 
  [ # 10 ]
Andrew Smith - Dec 12, 2011:

Pattern matching on the other hand generally doesn’t need to ‘understand’ the entire input, just a part, so you can get away with gaps. But, you can still extract the data out of the matched patterns, so you can still do the ‘deep thinking’ before generating an answer.

Right, so you are saying that anything that is not understood should be ignored and discarded? That explains a lot. Seriously??? It’s better to process the wrong input than recognise, let alone acknowledge, that you don’t understand it?

No, I assume what Jan is saying is that if someone were to say something like, “Excuse me robot but I was just wondering if by any chance I could ask you a question and my question is, do you like eggs?”, the only part you need to worry about is “do you like eggs?”. There is no point in processing the waffle at the start. This is the approach I also use.
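As a rough illustration of what I mean, here is a small Python sketch that searches for the question anywhere in the input instead of demanding that the whole sentence parse. Python's `re` module is only a stand-in here, not what my bot or Jan's system actually uses, and the names `QUESTION` and `extract_question` are invented for the example.

```python
import re

# Hypothetical pattern: find "do you like X?" anywhere in the input.
# search() scans the text, so the waffle in front is simply skipped.
QUESTION = re.compile(r"\bdo you like (?P<thing>[\w ]+?)\s*\?", re.IGNORECASE)

def extract_question(text):
    """Return what the user asks about liking, or None if no such question."""
    match = QUESTION.search(text)
    if match is None:
        return None  # nothing recognised: the caller can say so explicitly
    return match.group("thing")
```

Note that a failed match returns `None` rather than a guess, so the bot can still admit it didn't understand when nothing in the input is recognised.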

 

 
  [ # 11 ]
Steve Worswick - Dec 12, 2011:
Andrew Smith - Dec 12, 2011:

Pattern matching on the other hand generally doesn’t need to ‘understand’ the entire input, just a part, so you can get away with gaps. But, you can still extract the data out of the matched patterns, so you can still do the ‘deep thinking’ before generating an answer.

Right, so you are saying that anything that is not understood should be ignored and discarded? That explains a lot. Seriously??? It’s better to process the wrong input than recognise, let alone acknowledge, that you don’t understand it?

No, I assume what Jan is saying is that if someone were to say something like, “Excuse me robot but I was just wondering if by any chance I could ask you a question and my question is, do you like eggs?”, the only part you need to worry about is “do you like eggs?”. There is no point in processing the waffle at the start. This is the approach I also use.

Yep, that’s basically what I meant.

 

 
  [ # 12 ]

This is why it’s important to identify the “Noun Phrase” first. The modifiers are secondary and are used to qualify the actual “Head Noun”. No matter how “flowery” the user’s input is, the method used to parse the incoming data must be able to prioritize, and this can only be accomplished by mapping the user’s input prior to the actual processing.

This approach requires substantial client-side preprocessing which has very little to do with pattern matching.

 

 
  [ # 13 ]

Maybe we should move this to a new thread?

I still fail to see why pattern matching is somehow cheating or deception. If someone says to my bot, “I have a red truck”, it will match a pattern, “I have a *” and set a variable called “has” equal to “a red truck”. So if the user asks, “what do I have?”, “what colour is my truck?”, “what do I have that is red?” and so on, it will query this variable and be able to answer.  Surely, this amounts to understanding?
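For anyone who hasn't seen this style of bot, here is a stripped-down sketch of the idea in Python. My bot doesn't work exactly like this; the class `TinyBot`, its `memory` dictionary and the reply strings are all made up for illustration, and only the “I have *” and “what do I have?” cases are covered.

```python
import re

class TinyBot:
    """Toy wildcard matcher that remembers what the user says they have."""

    def __init__(self):
        self.memory = {}  # hypothetical variable store, e.g. {"has": "a red truck"}

    def respond(self, text):
        # Drop surrounding whitespace and trailing punctuation before matching.
        text = text.strip().rstrip(".!?").strip()
        match = re.fullmatch(r"i have (.+)", text, re.IGNORECASE)
        if match:
            # The wildcard part is stored in the "has" variable.
            self.memory["has"] = match.group(1)
            return "Noted."
        if re.fullmatch(r"what do i have", text, re.IGNORECASE):
            if "has" in self.memory:
                return "You have " + self.memory["has"] + "."
            return "I don't know yet."
        return None  # no pattern matched
```

Answering “what colour is my truck?” would need further rules that pick the stored phrase apart, which this sketch leaves out.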

Am I cheating if I use a match to start a fire instead of chipping flint together? No, I am merely using a more efficient method to achieve the same result.

Andrew says, “...how many mainstream artificial intelligence researchers have bothered to enter their software in the Loebner contest?” - Answer: None because they are afraid of their bots failing in public.

 

 
  [ # 14 ]

I already hinted that maybe we should move this to a new thread, but I guess Shaun is enjoying the attention that the thread is getting, even if it long since stopped being relevant to the original topic.

Pattern matching isn’t cheating or deception and I don’t believe anyone actually said that. The only claim that I’ve made about pattern matching is that it is not enough by itself to accomplish anything more than a superficial illusion of intelligence, and whether or not that is cheating depends on the context. As long as it’s just a game to all concerned then it’s no more cheating than playing a game of “make believe” would be.

If I understood Dave correctly, he is making the point that any kind of processing and regurgitation of information is ultimately pattern matching and if you want to quibble over semantics, on one level that’s true. However for any practical application it is not true. Noam Chomsky proved that more than fifty years ago with what is now known as the Chomsky hierarchy of formal languages.

http://en.wikipedia.org/wiki/Chomsky_hierarchy

In a nutshell, you can divide all languages (i.e. patterns of interest) into four broad categories according to how difficult it is to parse (i.e. recognise) them.

The simplest are called “regular expressions” and these have a precise mathematical definition. These are what I’m referring to when I say “regular expression” but I’ve got no idea what Jan is referring to when he uses the term. I suspect that neither does he. Regular expressions can be parsed with a finite state automaton and require no additional storage.
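To show what “no additional storage” means in practice, here is a minimal Python sketch of a finite state automaton for the regular language a*b* (any number of a's followed by any number of b's). The language and the state names are my own choice for illustration; the only thing the recogniser remembers is which state it is in.

```python
# Transition table for the regular language a*b*.
# Any (state, character) pair not listed here is a rejection.
TRANSITIONS = {
    ("start", "a"): "start",
    ("start", "b"): "seen_b",
    ("seen_b", "b"): "seen_b",
}
ACCEPTING = {"start", "seen_b"}

def accepts(text):
    """Return True if text is in a*b*, using only the current state as memory."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False  # no transition defined for this character
    return state in ACCEPTING
```

An automaton like this cannot check that the number of a's equals the number of b's, because it has nowhere to keep a count.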

Somewhat more complicated than regular expressions are “context free grammars” and again they have a precise mathematical definition. They are sufficient for recognising any pattern that might occur in most natural languages (regular expressions are *not* sufficient for this). They are also a lot more difficult to parse, with most parsers only being able to handle a restricted subset of them. This is why I’ve put so much effort into developing a parser which can handle the full range of context free grammars. Context free grammars can be parsed with a finite state automaton and a stack for storage, but to do so efficiently requires some pretty sophisticated algorithms.
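A standard textbook example of a context free pattern that no regular expression can recognise is arbitrarily nested brackets. The Python sketch below is not my parser, just a hand-written recogniser for that one language, and it shows the role the stack plays; the names `PAIRS` and `balanced` are invented for the example.

```python
# Map each closing bracket to the opening bracket it must pair with.
PAIRS = {")": "(", "]": "["}

def balanced(text):
    """Return True if every bracket in text is properly nested and closed."""
    stack = []
    for ch in text:
        if ch in "([":
            stack.append(ch)  # remember the opener until its closer arrives
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False  # closer with no opener, or the wrong kind
        # any other character is ignored
    return not stack  # leftover openers mean the input was unbalanced
```

The depth of nesting is unbounded, so the stack can grow with the input, which is exactly the storage a finite state automaton does not have.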

Even more complicated than context free grammars are “context sensitive grammars”. These are even more difficult to parse because they may require a combinatorially increasing amount of computation to parse them. (That is, they are NP-hard to parse and it isn’t practical to do in the general case.) They can be parsed with a Turing machine (e.g. a finite state automaton and *two* stacks for storage), a finite amount of storage, and a potentially astronomical amount of computation. The most complex patterns are called “recursively enumerable” and they require nothing short of a Turing machine with an unlimited amount of storage to parse them, so it might not be possible to parse them at all.

As for your bitchy comment about artificial intelligence researchers being afraid of failing in public Steve, you couldn’t be more wrong. Everyone that I know of is out there founding and running billion dollar companies. Here’s just the latest example for your instruction:

http://www.inc.com/lindsay-blakely/can-this-startup-eliminate-social-media-overload.html

 

 
  [ # 15 ]

I guess I did sort of get us off on a tangent. My bad. I’ll see about splitting the thread later today, since it’s half past three, and well past my bedtime. :) I’ll make my comments at that time, too. :P

 
