

A New Challenge, And a New Contest
 
 
  [ # 76 ]

I think this argues for the need for classifications or divisions rather than a single over-all best bot contest.  Most of the bots at this level can’t hold a conversation past a few lines.

True. But how do you determine in which category a bot should start?

 

 
  [ # 77 ]
Jan Bogaerts - Apr 6, 2012:

I think this argues for the need for classifications or divisions rather than a single over-all best bot contest.  Most of the bots at this level can’t hold a conversation past a few lines.

True. But how do you determine in which category a bot should start?

First, the contest organizer(s) would need to come up with definitions for each category and define the parameters.  Then, it would be up to the entrant to pick the segment in which they wish to compete.  Those not sure could ask the organizer(s) for help by describing their bot’s platform and abilities.

Since entrants are apt to select a division they’re sure to win, I’d suggest adopting the rules of NHRA drag racing, where there’s a post-run inspection of vehicles before the results are made official, and cars are weighed, their fuel checked, and a complete engine teardown is done after an event victory.  Obviously, I’m speaking figuratively, but I think some sort of “closer inspection” couldn’t hurt, even if it’s just a review of the transcripts to see if there was fairness on the part of the judges as well as the entrants.

Also, having a rule that allows any other entrant to challenge the results might dissuade super bots from entering lower classes to ensure a win.  Up until now, all dissatisfied contestants could do is complain in the forums, and we’ve been stuck with the results whether we like them or not.  No questions answered, and no recourse for those who feel they’ve been treated unfairly.

 

 
  [ # 78 ]

Probably one very good classification could be based on the platform. That’s a bit like ‘engine classes’.

 

 
  [ # 79 ]

If you have such a bot, and you’re planning on entering the contest, then the rest of us might as well just give up now.

Absolutely

 

 
  [ # 80 ]
Andres Hohendahl - Apr 6, 2012:

¿Am I asking too much?

Unless a bot is capable of at least a good fraction of all the things I’ve mentioned, it will be only for fun!

Given the nature of chatbots these days, I think you ARE asking a bit too much, I’m afraid. :) And one of the major goals for the contest will be to have fun! :P

RE: classifications, I think we’re over-thinking things here, to be honest. I can certainly understand the reasoning behind wanting a “pro versus amateur” competition, but who decides which bot is which? The botmaster? Some sort of “entry exam”? Some other means? Each of these methods has its share of problems, I think. :)

I do think that having separate categories for conversational bots and virtual assistants is something that should be pursued, but even that should probably take place in a later competition. For this first contest, I want to start out relatively small, and grow over time. The simpler the competition, the fewer bugs we’ll have to work out. :)

Thunder Walk - Apr 6, 2012:

Up until now, all dissatisfied contestants could do is complain in the forums, and we’ve been stuck with the results whether we like them or not.  No questions answered, and no recourse for those who feel they’ve been treated unfairly.

I agree here, up to a point. I intend to have some sort of means by which individuals can address issues and/or complaints, and where someone will answer questions, to try to ensure good communications regarding the contest, but second-guessing the decisions of the judges after the results have been released is counterproductive. I’m of the opinion that there should be some discussion between the judges (and possibly the contest management, in cases of deadlock) prior to releasing the results, but after that process is complete, the old phrase “The judges’ decisions are final” will certainly come into play.

 

 
  [ # 81 ]

@Thunder
Of course!
I enumerated the conversational skills I would like a bot to have (those are my goals, most of them unfulfilled). This might be a good start for classifying any bot’s skills.

My proposal is to measure all of these features through a mixed, free conversation with topics, specific goals, and sections. Some features, such as path and units testing, might be assessed by specialized judges; others might be assessed by skilled people who each run only some of the tests. All results would then be compared, and finally we would have to agree on how to score each feature to compose a general score, or maybe several joint rankings like these (not complete):

- best conversational bot
- most normal/human
- best scientific/nerd
- best overall deduction capability
- etc.

Bot Intelligence Dimensions Proposal

EMOTIONS
Emotional Skills: none | aware | gentle | unstable | bad | stupid | not-aware.
Correctness: none | funny | gentle | harsh | bad-humored

MEMORY
Anaphoric Resolution Capability: none | regular | good | excellent
BackReferencing Resolution Capability: none | some | regular | good | excellent
Context Awareness: none | some | good | excellent
Conversational Fact-Memory:  none | regular | good | excellent
Interpreting ungrammatical stuff: good | bad | absent

SCIENTIFIC
International Units Management: none | regular | good | excellent
Units Management Fulfillment (number of managed units): #of-units 0+
Unit Conversion: none | some | full
Numeric Awareness (might be combined): none | numeric format | scientific format | word-spelled format | Roman numerals
Math Resolution Level: none | basic-algebra & numeric | scientific-math | analytic-functions (analysis) | theorem-proving
Logical Resolution: none | basic-boolean | math-combined | advanced
Commenting + Creativity: none | some | regular | awesome

DIALOG
Story-Teller/Tutor: none | some | regular | awesome
Rephrasing capabilities (when not understanding a sentence/question/answer/etc.)
Backchannel handling: none | some | good | excellent
Grounding Detection: none | some | correct | excellent

LANGUAGE
MultiLingual: yes / no
Languages: English / Spanish | French | Portuguese | German | Russian | etc. (might be more than one at a time, for a multilingual bot)
Language Detection/Inference: none | some | regular | awesome
Ungrammatical-Nonsense-Detection: none | some | regular | awesome
Spell-correction/inference: none | some | regular | awesome

DEDUCTIONS
Factoid Question-Answer Resolution: none | some | regular | awesome
General Culture Question-Answer Resolution: none | some | regular | awesome
Search Capabilities (internet/database): none | some | regular | awesome
Dialog (current) Question-Answer Resolution: none | some | regular | awesome
Learning from example: none | some | regular | awesome
Inference of properties/answers when no information available: none | some | regular | awesome
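
As a rough illustration of how ratings on these dimensions could be combined into a general score, here is a minimal Python sketch. It covers only four of the dimensions above, treats each scale as ordered from worst to best, and uses made-up weights; the names `SCALES`, `WEIGHTS`, `normalized` and `general_score` are hypothetical and not part of the proposal, and the actual weighting would still have to be agreed on.

```python
from typing import Dict, List

# Ordered levels (lowest first) for a few of the dimensions listed above.
SCALES: Dict[str, List[str]] = {
    "Anaphoric Resolution Capability": ["none", "regular", "good", "excellent"],
    "Context Awareness": ["none", "some", "good", "excellent"],
    "Unit Conversion": ["none", "some", "full"],
    "Factoid Question-Answer Resolution": ["none", "some", "regular", "awesome"],
}

# Hypothetical weights; in a real contest the judges would have to agree on these.
WEIGHTS: Dict[str, float] = {
    "Anaphoric Resolution Capability": 2.0,
    "Context Awareness": 2.0,
    "Unit Conversion": 1.0,
    "Factoid Question-Answer Resolution": 1.0,
}


def normalized(dimension: str, level: str) -> float:
    """Map a level name to a value in [0, 1] by its position on the scale."""
    scale = SCALES[dimension]
    return scale.index(level) / (len(scale) - 1)


def general_score(ratings: Dict[str, str]) -> float:
    """Weighted average of the normalized ratings over the rated dimensions."""
    if not ratings:
        raise ValueError("at least one dimension must be rated")
    total_weight = sum(WEIGHTS[d] for d in ratings)
    weighted = sum(WEIGHTS[d] * normalized(d, lvl) for d, lvl in ratings.items())
    return weighted / total_weight
```

Separate rankings (most human, best scientific, and so on) could reuse the same function with a different weight table per ranking.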

I may be missing more than one important category.
Am I wrong with this type of classification?

 

 
  [ # 82 ]

I have a lot of respect for developers who have created something new, as well as any botmaster who has the courage to enter a contest with a clone of any kind.  But, I’m reminded of the 2010 Loebner and how one judge was “fooled” into thinking the bot was human.  As one viewer observed:

http://www.i-programmer.info/news/105-artificial-intelligence/1496-meet-suzette-prize-winning-chatbot.html

“So the 2010 Loebner prize was won because of the un-humanlike behavior of a real person rather than like the human-like behavior of a chatbot.”

Reading the transcripts, I was disappointed.  I have to wonder if there had been a review before the announcement, if the final results would have been the same.

 

 
  [ # 83 ]

Yup. I’m very mindful of that particular incident, as well as other examples, from other contests in other fields. I’ve also looked into the judging protocols from other types of competitions, from sporting events to television talent shows, and what I’ve found is that, in a large percentage of cases, there is some sort of review or deliberation process involved prior to any announcements of any decisions that affect the competition, whether it’s awarding penalties to one participant or another, or deciding the winning contestant. To my mind, this deliberation process will help to greatly reduce subjectivity, and is therefore an important part of the process.

 

 
  [ # 84 ]

Whatever you do, please don’t use the Loebner protocol (or some variation) where you send a letter at a time. Every communication system I know of that ever tried this stopped doing it, simply because ‘real’ people hate it. (Google Wave, anyone?) How many times have you typed something, only to delete it and write something completely different?

 

 
  [ # 85 ]
Jan Bogaerts - Apr 9, 2012:

Whatever you do, please don’t use the Loebner protocol (or some variation) where you send a letter at a time. Every communication system I know of that ever tried this stopped doing it, simply because ‘real’ people hate it. (Google Wave, anyone?) How many times have you typed something, only to delete it and write something completely different?

That brings to mind something I’ve been wondering about.  Should mistakes be counted as something positive because it makes a bot seem more human?  How do you know when you’re viewing an actual mistake, and when you’re seeing a device being employed to create the illusion that a judge is chatting with a human rather than a bot?

A few years ago at the AI Nexus Forum, we started keeping track of various errors in AIML including spelling errors and typos.  Someone once made the comment that such mistakes made the bot seem more human, and should therefore be left unchanged.

At some point, I observed in the Loebner contest logs that a botmaster employed Backspace, and after that, others seemed to be using it.  Since backspacing is a common “human-like” activity, and it would create a delineation between bots that did that and bots that never backspaced, should it be considered unfair to permit its usage, or is it something that demonstrates an advanced bot worthy of recognition?

 

 
  [ # 86 ]

There’s no “danger” at all of using a “character-at-a-time” format like the one that the LPP (Loebner Prize Protocol) employs, for two reasons. First off, because we’re NOT going to be trying to “build a better human” here, and as such, mistakes (and their corrections) aren’t going to be considered one way or the other. Secondly, the level of script complexity, from the perspective of both the contest and the botmaster, would be too high, creating a barrier that would cause many botmasters to decide not to enter the contest. Also, if you look at chat rooms and instant messenger apps, the overwhelming majority of them use “line-at-a-time” or “volley-at-a-time” transport methods, pretty much for the same reason: complexity.

To illustrate the difference between these transport methods, Skype, AIM and Yahoo messenger all use “line-at-a-time” transport methods, whereas forum scripts, such as chatbots.org, use “volley-at-a-time” transport methods. If you have ever chatted with Mitsuku, her Flash interface seems to use “character-at-a-time”, but that’s just her Flash GUI transforming the output for aesthetic purposes. Her core script (Pandorabots) is actually a “volley-at-a-time” setup. :)
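
To make the distinction concrete, here is a minimal Python sketch of how the same typed input would be split into messages under each of the three transport methods. The function names are hypothetical and no real protocol (LPP included) is being modeled; the only assumption is that a line ends where the user presses Enter.

```python
from typing import Iterator


def character_at_a_time(keystrokes: str) -> Iterator[str]:
    """Send every keystroke as soon as it is typed."""
    for ch in keystrokes:
        yield ch


def line_at_a_time(keystrokes: str) -> Iterator[str]:
    """Send each non-empty line when the user presses Enter."""
    for line in keystrokes.split("\n"):
        if line:
            yield line


def volley_at_a_time(keystrokes: str) -> Iterator[str]:
    """Send the whole turn as one message, however many lines it has."""
    if keystrokes:
        yield keystrokes
```

For a two-line turn such as `"Hi\nHow are you?"`, the first function produces one message per character, the second produces two messages, and the third produces a single message, which is why the last two are so much simpler for a contest script to handle.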

 

 
  [ # 87 ]

Dave is correct in that I get the full output for Mitsuku from the server and display it a character at a time in most of the bot’s interfaces. However, as part of Mitsuku’s website I have a Turing Test where people can speak to “someone” and decide whether it is a human or a robot:

http://www.square-bear.co.uk/mitsuku/turing

Looking at the comments, I find that most of the people who think it is human are the ones fooled by its backspacing, misspellings, variable typing speeds and one-character-at-a-time typing. If you are trying to build a chatbot to pretend it is human, I would STRONGLY advise keeping this format, but as this is a chatbot contest we are discussing and we are not pretending to be human, a message-at-a-time protocol is better to save unnecessary work.
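
For anyone curious what that kind of “human-like” typing involves, here is a minimal Python sketch. It is not Mitsuku’s actual code: the neighbouring-key table, the delay ranges, and the names `humanized_keystrokes` and `render` are all assumptions made up for illustration.

```python
import random
from typing import List, Optional, Tuple

BACKSPACE = "\b"

# Tiny, hypothetical map of neighbouring keys on a QWERTY keyboard.
NEIGHBOURS = {"a": "s", "e": "r", "i": "o", "o": "p", "t": "y", "n": "m"}


def humanized_keystrokes(
    text: str, typo_rate: float = 0.1, seed: Optional[int] = None
) -> List[Tuple[str, float]]:
    """Return (key, delay_seconds) pairs that type `text` with corrected typos."""
    rng = random.Random(seed)
    events: List[Tuple[str, float]] = []
    for ch in text:
        if ch in NEIGHBOURS and rng.random() < typo_rate:
            # Hit the wrong key, pause a little longer, then backspace over it.
            events.append((NEIGHBOURS[ch], rng.uniform(0.05, 0.25)))
            events.append((BACKSPACE, rng.uniform(0.2, 0.5)))
        # Variable delay before each correct character.
        events.append((ch, rng.uniform(0.05, 0.25)))
    return events


def render(events: List[Tuple[str, float]]) -> str:
    """Replay the keystrokes, applying backspaces, and return the final text."""
    out: List[str] = []
    for key, _delay in events:
        if key == BACKSPACE:
            if out:
                out.pop()
        else:
            out.append(key)
    return "".join(out)
```

Every typo is corrected, so `render` always gives back the original text; a character-at-a-time interface would sleep for each delay and emit each key in turn, while a message-at-a-time contest would need none of this.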

 

 
  [ # 88 ]

Rather like the old Univac

http://www.youtube.com/watch?v=j2fURxbdIZs

 

 
  [ # 89 ]

Dear all
Is there any comment on my Bot Intelligence Dimensions Proposal?
I think the conversation here has gone to… wherever; it just got funny!

I mean that a “human fooler” score might be given to a magician, not to a serious programmer (called a botmaster here).

Mistyping in a logical (statistical) way, using adjacent letters on the keyboard, or making some classic misspellings doesn’t hide the fact that it’s fake! I also did this as part of the humanization of the response, but it got on my clients’ nerves, and they promptly asked me to eliminate this behavior ASAP.

What I mean is to rate a bot as follows:

First, based on the native platform abilities (not added abilities implemented by skilled programmers using categories and backtracking as a resource). I mean real abilities, like NLP and syntactic/semantic recognition (internally hard-coded or an AI-learning parser).

Second, based on the “Botmaster” skills to produce a fluent conversation.

Third, the overall experience, based on the above-mentioned Bot Intelligence Dimensions.

That’s all, thanks.

@Patty, the video on the UNIVAC is very interesting.
Maybe in 100 years they will show a video of us running programs and bots on crumbling old historical PCs, and we might all seem as ridiculous as this Univac looks to us today!

@Steve, I played with your Mitsuku for 5 minutes. She is interesting, and the short-term conversational memory is well implemented. I can see a lot of your work went into this and into understanding patterns, the repetitions and so on. ¡Good work! Congratulations! I wonder what this could be if there were no patterns but some understanding below the surface, at least a very simple one.

@all
Another idea that just popped into my head!

¿Why not implement an AI child-bot?

A bot that resembles a learning child, with no knowledge at all. It should have a limited vocabulary but an immense curiosity and some good learning capabilities (even for vocabulary), and the voting should be on which one is smartest and on the human-equivalent age it achieves!

Any comments?

 

 
  [ # 90 ]

@all

I have written a description (sorry, but it’s rather commercial) of my newly designed platform, translated into English; here is the document link:

Disclaimer
At the time of that writing I had not heard or read about RiveScript, nor about another good platform called ChatScript (by Bruce Wilcox, a member here), who won the 2010 Loebner Prize.

They both have good approaches to AIML-like scripting, powered by some WordNet and NLP processing, which brings them closer to my platform. At this time both lack a multilingual approach and natural language generation skills, but I must say they are fine pieces of programming art. Congratulations to both!

I will soon make a better English document and translate the whole AndyScript manual (over 150 pages and growing) - my promise

 
