AI Zone Admin Forum


The Chatterbox Challenge - Past Experiences

I’ve started this thread because the discussion about Laura’s “My Marie” has wandered off topic to the point where it’s become a discussion of our past experiences with the CBC. What follows is the portion of the original thread that wandered.

Please feel free to continue the discussion. I just felt that this would be a more suitable place to hold it, is all.


  [ # 1 ]

As a former Loebner Prize Contest judge, I second the advice that Dave just gave you.
Without insinuating anything about My Marie: in general, even scoring above average is an honor. Entering experimental bots in contests is generally how all chatter robot masters started out. And worry not, Wendell is prompt with technical support during the contest. You’ll see.


  [ # 2 ]
8PLA • NET - Mar 7, 2012:

And worry not, Wendell is prompt with technical support during the contest. You’ll see.

I agree. I emailed Wendell with some discrepancies about Mitsuku’s chatlog (no mention was made of the pictures she displayed in context with the conversation or that she opened a website showing nearby gas stations) and he amended her log almost immediately.


  [ # 3 ]

That’s a big issue with Marie too, since she displays most of her responses in the form of search results and maps. This was one reason for wanting to pull her from the competition. I felt that her abilities and responses were not being properly recognized and logged accordingly.

I guess I need to speak to Wendell about that same issue.

Thanks Steve!

A footnote: I sent an email to Wendell expressing my concerns over the issue of what constitutes a correct response. If it’s in the form of a picture or search result, does that make it incomplete or wrong?


  [ # 4 ]

Judging is very subjective. If each of us scored each of the 32 transcripts, we would all probably come up with different scores. The issue has been discussed for years; no two people give the exact same score. All you can hope for is that the judges are fair and consistent and discuss any outliers that show up.

In the email that Wendell sent out, he mentioned that the judges did make notes that are not on the transcripts. But if something was presented that you think might have been missed, you should document it and drop a note to the CBC.


  [ # 5 ]

Merlin is spot on about the subjective voting. I remember last year getting this:

Judge 3:

2) Who will win the 2011 Chatterbox Challenge?
Bot: If I could predict things like that, I would be in Las Vegas instead of wasting time on here.

Awarded 0 points

Not quite sure what they expected her to answer to that. My bot isn’t psychic. Zero points was a little harsh, I feel.


  [ # 6 ]

In the past I have had 3 judges give a score of 3-4 and one give a zero. The same judge who gave the zero then went on to give another bot a 4 for what I thought was, at best, a vague response. That 7-point swing could make a difference in the results. Years ago, only one judge scored your bot. I hated it when I got the Russian judge.

To overcome this, in 2011 each judge scored all the responses, and they discussed why they scored them the way they did. In earlier years the top 10 bots automatically got in based on scores. Now the top 9 are automatic, and the judges discuss which bot should be the tenth.

I know Wendell is trying hard to make it a fair and well-run competition. Scoring human conversation is not easy, and I suggest everyone try it at least once. You will find the results interesting. Scoring the top and bottom bots is easy, but the ranking of those in the middle is often influenced by personal taste.
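The averaging and cut-off process described above can be sketched roughly as follows. The bot names and scores here are made up purely for illustration; this is not CBC data or code:

```python
# Hypothetical sketch: average each bot's scores across all judges,
# then take the top 9 automatically. Names and scores are invented.

judge_scores = {
    "Bot1": [4, 3, 4, 0],  # one outlier zero drags the average down
    "Bot2": [3, 3, 3, 3],
    "Bot3": [2, 4, 4, 3],
}

def average(scores):
    return sum(scores) / len(scores)

# Rank bots by average score, highest first.
ranking = sorted(judge_scores, key=lambda bot: average(judge_scores[bot]), reverse=True)

automatic = ranking[:9]  # top 9 qualify automatically
# The tenth finalist is then chosen by discussion among the judges.
```

With scores averaged like this, a single zero from one judge (as in Bot1's row) visibly shifts the ranking, which is exactly the kind of swing described above.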


  [ # 7 ]

My experiences with the CBC have been very positive. It has established deadlines and surfaced issues which drove some of my development and exposed a broad range of users to my bots. Just participating is a win.


  [ # 8 ]

The thing to remember is the rule in the CBC that states:

“...or the bot simply doesn’t know. Examples include…I have no idea, totally clueless, your guess is as good as mine, etc.”

equals 0. Better a stupid answer than any form of “I don’t know”.


  [ # 9 ]

I think the only issue I had in the first round was the question about the “gas” station. We don’t have gas stations in the UK; we call them petrol stations. This seemed to break rule 3, which states:

3) As this is an international contest the questions will not favor
any particular country. For example asking a question about a
certain country that only the people of that country would know.

However, Mitsuku managed to open a website which answered the question. I wondered what others thought about this or indeed any of the 10 questions in the first round?


  [ # 10 ]

Skynet-AI got caught on some of the questions that do not pertain to a non-human persona. As a virtual entity, what he got for Christmas and how far anything is from him aren’t topics that come up.

It was good to see that many of the bots got “a boost in my IQ” for Christmas, though.


  [ # 11 ]

I liked the questions. Easy and straightforward.


  [ # 12 ]

Since Marie operates from keywords associated by category, if she does not find a topic match she defaults to a Google search. I did not have a holiday topic containing Christmas, so no logical reply could be formed. Marie’s AI engine was designed as a customized help agent that can be configured to respond directly to topic categories, rather than to random questions.
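That kind of keyword-to-category dispatch with a search fallback can be sketched minimally as below. All category names, keywords, and responses here are hypothetical illustrations, not Marie’s actual data or code:

```python
# Hypothetical sketch of keyword-category matching with a search fallback.
# Categories, keywords, and canned responses are illustrative only.

CATEGORIES = {
    "directions": {"map", "directions", "nearest", "station"},
    "weather": {"weather", "forecast", "rain", "temperature"},
}

RESPONSES = {
    "directions": "Here is a map of what you asked about.",
    "weather": "Here is the current forecast.",
}

def reply(user_input: str) -> str:
    words = set(user_input.lower().split())
    for category, keywords in CATEGORIES.items():
        if words & keywords:  # any keyword from this category appears in the input
            return RESPONSES[category]
    # No category matched: fall back to a web search rather than "I don't know".
    return f"Searching the web for: {user_input}"
```

The point of the fallback is that a question outside every configured topic (like the Christmas one above) still produces some response, just not a topical one.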

