
A New Challenge, And a New Contest
 
 
  [ # 61 ]

Some of this goes to how important it is to ask good questions. There is a difference between items that could be considered factoids and those that may need “fuzzy” answers.

“It was the product of many minds.” could also be a generic answer to “who wrote”.

Should a bot that creatively answers a question based on “Who wrote” get a worse score than bots all using the same database where the botmaster did nothing to “enhance” the reply? (Smells like “clone rule” wink ) Should points be taken off for identical/non-unique replies between bots?

One way to resolve this is to rank all the responses rather than giving an individual score.
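
Concretely, the tally could be as simple as this; the bot names and the rank-to-points rule (with N bots, first place earns N points) are just an example of mine, not a settled proposal:

// Hypothetical rank-based tally: each judge orders the responses from
// best to worst, and a bot earns points based on its position.
function scoreFromRankings(rankings) {
  var totals = {};
  for (var j = 0; j < rankings.length; j++) { // one ranking per judge/question
    var ranking = rankings[j];
    var n = ranking.length;
    for (var i = 0; i < n; i++) {
      totals[ranking[i]] = (totals[ranking[i]] || 0) + (n - i);
    }
  }
  return totals;
}

// Example: two judges rank three bots on the same question.
scoreFromRankings([
  ["ALICE", "Elbot", "Skynet-AI"],
  ["Elbot", "ALICE", "Skynet-AI"]
]);
// => { ALICE: 5, Elbot: 5, "Skynet-AI": 2 }

That way identical/non-unique replies simply end up ranked together, rather than each collecting full marks.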

 

 
  [ # 62 ]

Remember that old proverb about being careful what you wish for?  Dave asked for input, and we’re really giving it to him.

Patti Roberts - Apr 3, 2012:

What I enjoyed about the CBC was it wasn’t the strict ‘I am human’ genre.

I agree, Patti.  I think that a decision would have to be made at the outset of any future contest.  What sort of contest should it be, and what counts as a high-scoring answer?  Does a really funny reply that’s evasive equate with one that’s serious but factually correct?  Maybe there should be different categories: entertaining bots, conversational bots, factual bots.  I suppose it boils down to choosing the approach that leaves the fewest developers/botmasters feeling they were judged unfairly.

I think bots can have opinions, but anyone familiar with Elbot knows that most answers not dealing with facts are supposed to be funny.  And, I don’t believe, “If the author hasn’t told you personally, perhaps you’re not supposed to know,” counts as a serious answer.  Is it wrong?  I don’t know.  Does it answer the question with the same credibility as ALICE when it said, “It was the product of many minds”?  I don’t believe that it does.

C R Hunt - Apr 3, 2012:

Had I been a judge, and read the reply from Elbot, “If the author hasn’t told you personally, perhaps you’re not supposed to know,” I’d have thought that was a generic reply to “WHO WROTE *”

I think here we have a classic case of knowing too much. smile

Once again, I agree, and in the past I’ve mentioned that knowing too much is possible, as is the opposite end of the spectrum.  One judge in the CBC, this year and in the past, is described as an “Artificial Intelligence Enthusiast.”  I don’t know what that means.  It could be someone seriously involved in the field, or an eleven-year-old who thinks that chatbots are “cool”.  There was also a move to involve people from outside the field who knew nothing about chatbots, and I don’t think it worked out well.  Where do you draw the line?  Who makes that decision?  What makes a judge over-qualified, or under-qualified?

 

 
  [ # 63 ]

One way to resolve this is to rank all the responses rather than giving an individual score.

That’s an interesting idea.

 

 
  [ # 64 ]
Thunder Walk - Apr 4, 2012:

Remember that old proverb about being careful what you wish for?  Dave asked for input, and we’re really giving it to him.

That’s quite alright, Thunder. I appreciate every bit of these contributions.

I’m going to be responding to a lot of stuff here, and it won’t necessarily be in the original posting order, and I’ll be doing a bit of “mental freewheeling” as I type, so please be patient with me. smile

Merlin - Apr 3, 2012:

@Dave,
Be aware also that I think you are talking about a text message only contest. Some of the value/fancy things that botmasters put in will be lost without the user interface. One example outside of Skynet-AI (which makes a lot of use of secondary windows) is Mitsuku popping up the “Friday” song when asked about what day it is on Friday.
One question might be whether you are trying to create a tool that will automatically test a bot, or a tool that makes it easier to record/score a bot.

This is certainly a concern to me as well, but I’ve come to the realization that there’s no “magic bullet” answer here. Each and every option available will have advantages and disadvantages. The trick will be to come up with something where the “pluses” outweigh the “minuses”, without becoming a monstrosity of complications. If it turns out that it’s best to have judges ask the questions, rather than having the system itself perform that task, then so be it, but I’m not yet convinced that this is the best way to go. A huge consideration for me here, Merlin, is the fact that Skynet-AI is JavaScript based/driven, and trying to interface such a bot in any sort of automated manner will be a BIG logistical nightmare, and possibly not even feasible. It’s certainly something that needs to be researched.

Almost every popular bot has some sort of functionality that extends beyond simple text output. Many bots are capable of doing web searches, opening the results in a new window or tab; some can show images, and others still can do things that wouldn’t translate well into a standard chat log. Most of these things I can work around, simply by making the chat log HTML-based rather than simple text, and by translating certain JavaScript functions into HTML link tags (I’ll sketch what I mean below). I don’t want this contest to be, as Merlin put it, “a text message only contest”, since this extra content is certainly a valuable part of the output.

I also realize that an automated testing format would almost of necessity devolve into a simple “reading off” of a number of questions and a recording of the responses, and where is the “conversation flow” in that? However, I DO want to do whatever can be done to make the contest as objective as possible. That’s a prickly problem, but one that I’m sure can be worked out. smile

One idea I have is to have a set of human “moderators” who converse with the bots and ask the questions, but don’t score them; instead, they take notes on any “extras” the bot employs, and the judges award points from there, based on the transcripts and notes. But that’s more complex than I’d like. Perhaps the best compromise is to have the judge log into a special page that selects the pair of bots from the ladder; the judge then “tests” each bot, using the questions that the system has randomly selected for that match. The judges would be instructed not to ask anything “personal” of the bot that would obviously identify it (e.g. “who are you”, “what is your name”, etc.). I don’t know… Still working this part out.
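
To show the kind of translation I have in mind, here’s a very rough sketch; the pattern and the function name are just placeholders of mine, not something any particular bot actually emits:

// Hypothetical sketch: rewrite javascript: links that open secondary
// windows into ordinary HTML links, so the transcript stays plain HTML
// and the judges can still see (and follow) what the bot tried to open.
function toLoggableHtml(botReplyHtml) {
  return botReplyHtml.replace(
    /href="javascript:window\.open\('([^']+)'[^"]*"/g,
    function (match, url) {
      return 'href="' + url + '" target="_blank"';
    }
  );
}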

Jan Bogaerts - Apr 3, 2012:

What about bots that don’t initially know the answer, but are able to learn and remember it? Perhaps that could be a separate part of the test?

This actually gave me a notion about how each bot might possibly be judged. Rather than judging each individual response, what about awarding points based on the overall conversation? If a bot can’t accurately answer a question at first, but can do so later, after being given the “correct” answer, should that bot’s score reflect that? I think so. If a bot can “correctly” answer most or all of the Q&A inputs, but gives a poor conversation, should that be taken into account? Probably so. There’s no “right answer” here, of course, but some options available are more “doable” than others.
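
Just to make that notion concrete, the scoring might look something like this; the weights and field names here are pure invention on my part:

// Hypothetical conversation-level scoring: points per answer, a smaller
// bonus when the bot answers correctly only after being taught, plus a
// judge-assigned mark for the overall conversation.
function scoreConversation(turns, flowMark) {
  var score = 0;
  for (var i = 0; i < turns.length; i++) {
    if (turns[i].correct) score += 2;                 // right the first time
    else if (turns[i].correctAfterTaught) score += 1; // learned it mid-chat
  }
  return score + flowMark; // flowMark: 0-10 from the judge
}

// Example: two right, one learned, a decent conversation (7/10).
scoreConversation(
  [{ correct: true }, { correct: true },
   { correct: false, correctAfterTaught: true }],
  7
); // => 12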

(As I’ve been composing this, I see that Jan has pointed out that the suggestion had already been made, and I had simply missed it. Silly Dave! smile )

 

 

 
  [ # 65 ]

Contests are a can of worms, and since this effort is intended to be an improvement on the past, this one is going to be doubly so.  But since this is a time for examining ideas, I’d like to toss this one out for discussion.

Most contests have divisions or classifications.  In sports, there are weight classes so that you have a fair chance at being paired against someone of your own stature.  In auto racing, you wouldn’t set a high-performance vehicle against a car just off the street.

Is it really fair to test a moderately performing bot against others that can do magic?  Might it make the creation of a contest easier if there were classifications, rather than trying to construct a “one-size-fits-all” contest that sorts through a lot of bots with a variety of capabilities?

 

 
  [ # 66 ]

Dave,
I wouldn’t worry yet about interfacing Skynet-AI (or any individual bot). I believe you could use a technique where any botmaster who wants to compete would add a simple JavaScript snippet to their web page to provide the interface (versus screen scraping). That would then allow them to control when/how the interfacing is done. The fallback is always cut and paste with judge notes.
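
Something as small as this might do; the function name and callback shape here are only a guess at what a contest harness would expect, and myBotRespond is a stand-in for whatever routine the bot already uses:

// Hypothetical contest hook: the botmaster drops this on their page so
// the harness can submit a question and get back the bot's HTML reply,
// instead of the harness having to screen-scrape the page.
window.contestAsk = function (question, done) {
  var replyHtml = myBotRespond(question); // stand-in for the bot's own reply routine
  done(replyHtml); // hand the reply back, links and all
};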

I agree with Thunder that if you want to encourage young, new bots/botmasters, you would need divisions. Doing well in the rookie division would move you on to the next. It might also allow bots of a certain class (AIML, Verbot, Personality Forge, etc.) to compete in their own technology class.

For the “Bot Olympics” you need different contests:
Factoid - Where is the Eiffel Tower? Who is the President of the United States?
Calculation - What is 1+1?
Common sense - Which is bigger, a dog or the moon? What date is it?
Personality/Sense of self - What is your name? Where do you live? Are you a robot?
Conversational Flow - Hello. What would you like to talk about? ...
Logical - Tom is taller than Jane. Who is taller: Tom or Jane? All men are mortal…
Memory - My name is Joe. What is my name?
Topic - What did you get for Christmas? Who is Santa?
Creativity - Storytelling, use of alternative media, UI, voice/avatar
Productivity - Ability to launch other programs, call up web sites, do stuff for you.

The Decathlon winner would be the bot with the best score totaled from all the events.
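
A quick sketch of how the decathlon tally might work (the event names and the shape of the score table are just placeholders of mine):

// Hypothetical decathlon tally: each bot gets a score per event, and
// the winner is the bot with the highest total across all events.
var events = ["factoid", "calculation", "commonSense", "personality",
              "flow", "logical", "memory", "topic", "creativity",
              "productivity"];

function decathlonWinner(scores) {
  // scores: { botName: { eventName: points, ... }, ... }
  var winner = null, best = -Infinity;
  for (var bot in scores) {
    var total = 0;
    for (var i = 0; i < events.length; i++) {
      total += scores[bot][events[i]] || 0;
    }
    if (total > best) { best = total; winner = bot; }
  }
  return winner;
}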

 

 
  [ # 67 ]

certain class (AIML, Verbot, Personality Forge etc.) to compete in their own technology class.

The only Forge bots that compete are me and Cyber Ty.  No competition means no fun.

One idea I have is to have a set of human “moderators” who converse with the bots and ask the questions, but don’t score them; instead, they take notes on any “extras”

That has been done in past contests, when there were many entries. It’s great as long as they don’t ask the bot the same questions twice.

 

 
  [ # 68 ]

Like those ideas, Merlin.

 

 
  [ # 69 ]

Technical question (sort of): I am making a JavaScript bot. The TTS she uses, Speaks for Itself, has a plug-in that’s only available in Netscape and Internet Explorer. Would I be able to enter her, even though she can only be viewed in Internet Explorer? I’d hate to get rid of Speaks for Itself; I love the voice choices and the accuracy.

 

 
  [ # 70 ]

Patti, I can’t yet give you an answer to that. I’m going to be working with Merlin to see if we can come up with a reliable way to interface with JS-based bots, but we haven’t started the process yet. As you know, I don’t want to exclude any bots from the contest, but in order to fulfill that wish, I’ll need to be able to test each bot in a consistent manner. I’ll let everyone know about the fruits of our efforts in a separate thread as we progress, but it won’t be an “overnight” thing. smile

 

 
  [ # 71 ]

I am confused. Isn’t it possible to use an HTTP API interface for JavaScript-based bots, much like what is done with QA1WATT?
You could store the result as an HTML string; that should preserve any ‘links’ and the like that are included in the result.
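
Something along these lines is what I’m imagining, assuming QA1WATT’s ?q= query-string style (everything else here is just a guess on my part):

// Hypothetical: ask a bot over a simple HTTP GET and keep the raw
// response as an HTML string, so links and markup survive in the log.
var transcript = [];

function askBot(botUrl, question, onReply) {
  var xhr = new XMLHttpRequest();
  xhr.open("GET", botUrl + "?q=" + encodeURIComponent(question));
  xhr.onload = function () {
    onReply(xhr.responseText); // stored as-is: an HTML string
  };
  xhr.send();
}

// Usage: log the reply without stripping any markup.
askBot("http://example.com/mybot", "Who wrote Hamlet?", function (html) {
  transcript.push({ question: "Who wrote Hamlet?", replyHtml: html });
});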

 

 
  [ # 72 ]

For QA1WATT
The question will be asked like this:
< your URL >?q=< url encoded question >

Although that is an easy way to interact with a dedicated bot, there are some complications with it. Each input would be thought of as navigating to a new page. That would cause problems with the UI of many current bots.

I would like to create a standard bot interface. It could be used for contests or other potential applications. I have had this on my todo list for a while and have some prototypes kicking around. Hopefully by putting our heads together we can come up with something easy. It might be best to start a separate topic thread focused just on a bot interface.
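
As a strawman for that separate topic: the standard interface might be as simple as a JSON contract like the one below. None of these field names are settled; the point is that a session id lets the bot keep conversation state between inputs, which avoids treating every input as a new page.

// Hypothetical request/response shapes for a standard bot interface.
var request = {
  sessionId: "match-42",            // stable across one conversation
  input: "What day is it?"
};

var response = {
  sessionId: "match-42",
  replyHtml: "It's <b>Friday</b>!", // markup preserved for the log
  extras: ["opened: http://example.com/friday-song"] // anything beyond text
};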

 

 
  [ # 73 ]

Another category could be: spelling error correction?

 

 
  [ # 74 ]

I agree with @Jan and @Merlin’s ideas too,

I think it’s difficult to rank solely on the answers, especially if they are very creative pattern-answers; the judges will then only evaluate the creativity of the scripter, not the power of the bot itself.

I would say that to rank or score a bot, the bot should have a goal, and this target should be sought/interpreted by the judge, who might evaluate whether the bot followed his line of thinking, or was able to follow or introduce a conversational theme, rather than giving automated and spectacular answers from many pre-loaded patterns (or srai-like transformations).

I guess a good bot might try to understand the human rather than giving spectacular and/or creative answers.
He should also be capable of some empathy: understanding the grounding of the dialogue turns, backchannelling, emoticons, written noise, nonsense, and jokes; doing some limited math; resolving anaphoric back-references; keeping some memory of the earlier conversation, while also forgetting some things; discriminating whether the human is joking or speaking seriously; solving logical and factoid relations; reading between the lines; inferring, and asking about, missing parts of incomplete statements and/or questions; being emotionally capable of becoming sad, angry, or happy with/about the information, or even with the conversation itself; and becoming frustrated if he (the bot) could not successfully answer many of the human’s questions (supposedly he should be aware of this), telling this to the human in a decent way and expressing his concerns about the misunderstandings.

This should be a good bot for me.

Even if this bot doesn’t have access to bibliography or general knowledge, he should be capable of finding the missing links (maybe by asking other agents for them), or even asking back, in another way, what was asked of him, in order to get or infer the answer. That would be a sign of some intelligence, or demonstrate the capability of solving real-world problems by dialoguing with humans, not by annoying them with creative, pre-written answer-to-pattern stuff.

The bot should be capable of building clarifying questions using elements taken from the current conversation. He should be capable of managing time scheduling and positioning, and of doing some math and logic operations; and while this is not a calculator, he might also be able to do some basic symbolic algebra, detecting it and invoking some external library or database (or passing it on to a more specialized agent).

He should even detect units (physical, measures, electronic, electric, etc.) and deduce the relations among them, doing math and conversions, all mixed into questions or wherever needed to understand a query, declaration, or factoid sentence; detect and interpret the most common chemical formulae and medical words; and have some morphological knowledge to infer unknown words, perhaps also asking for the meaning if he has no clue about some word in an otherwise meaningful sentence.

The bot should be able to disambiguate the senses of words, get the context of a conversation, and construct ideas across sentences (like reading between the lines).

Even when misspellings are present, he should be able to tell garbage from mistyped words and do creative inference while correcting the errors.

Just like a moderately intelligent human (a child) would do.

Am I asking too much?

Unless a bot is capable of at least a good fraction of all the things I’ve mentioned, it will be only for fun!

 

 
  [ # 75 ]
Andres Hohendahl - Apr 6, 2012:

I agree with @Jan and @Merlin’s ideas too. I guess a good bot might try to understand the human rather than giving spectacular and/or creative answers. He should also be capable of some empathy: understanding the grounding of the dialogue turns, backchannelling, emoticons, written noise, nonsense, and jokes; doing some limited math; resolving anaphoric back-references; keeping some memory of the earlier conversation, while also forgetting some things; discriminating whether the human is joking or speaking seriously; solving logical and factoid relations; reading between the lines; inferring, and asking about, missing parts of incomplete statements and/or questions; being emotionally capable of becoming sad, angry, or happy with/about the information, or even with the conversation itself; and becoming frustrated if he (the bot) could not successfully answer many of the human’s questions (supposedly he should be aware of this), telling this to the human in a decent way and expressing his concerns about the misunderstandings.

What you’re talking about lies several floors below the Pentagon in Washington, D.C.  If you have such a bot, and you’re planning on entering the contest, then the rest of us might as well just give up now. smile

I think this argues for the need for classifications or divisions rather than a single over-all best bot contest.  Most of the bots at this level can’t hold a conversation past a few lines.

 
