AI Zone: chatbots.org

The Chatterbox Challenge - Past Experiences

Posted: Mar 8, 2012

Dave Morton

Administrator

Total posts: 3111

Joined: Jun 14, 2010

I’ve started this thread because the discussion about Laura’s “My Marie” has wandered off topic to the point where it’s become a discussion about our past experiences with the CBC. What follows is the portion of the original thread that wandered.

Please feel free to continue the discussion. I just felt that this would be a more suitable place to hold it, is all.

Posted: Mar 7, 2012

[ # 1 ]

∞Pla•Net

Guru

Total posts: 1297

Joined: Nov 3, 2009

As a former Loebner Prize Contest judge, I second the advice that Dave just gave you.
Without insinuating anything about My Marie, in general, even scoring above average is an honor. Entering experimental bots in the contests is generally where all chatter robot masters started out. And worry not, Wendell is prompt with technical support during the contest. You’ll see.

Posted: Mar 7, 2012

[ # 2 ]

Steve Worswick

Administrator

Total posts: 2048

Joined: Jun 25, 2010

8PLA • NET - Mar 7, 2012:
And worry not, Wendell is prompt with technical support during the contest. You’ll see.

I agree. I emailed Wendell with some discrepancies about Mitsuku’s chatlog (no mention was made of the pictures she displayed in context with the conversation or that she opened a website showing nearby gas stations) and he amended her log almost immediately.

Posted: Mar 7, 2012

[ # 3 ]

Laura Patterson

Senior member

Total posts: 250

Joined: Oct 29, 2011

That’s a big issue with Marie too, since she displays most of her responses in the form of search results and maps. This was one reason for wanting to pull her from the competition. I felt that her abilities and responses were not being properly recognized and logged accordingly.

I guess I need to speak to Wendell about that same issue.

Thanks Steve!

A footnote: I sent an email to Wendell expressing my concerns over the issue of what constitutes a correct response. If it’s in the form of a picture or search result, does that make it incomplete or wrong?

Posted: Mar 8, 2012

[ # 4 ]

Merlin

Guru

Total posts: 1081

Joined: Dec 17, 2010

Judging is very subjective. If each of us scored each of the 32 transcripts, we would all probably come up with different scores. The issue has been discussed for years; no two people give the exact same score. All you can hope for is that the judges are fair and consistent and discuss any outliers that show up.

In the email that Wendell sent out he referenced that the judges did make notes that are not on the transcripts. But, if something was presented that you think might have been missed, you should document it and drop a note to the CBC.

Posted: Mar 8, 2012

[ # 5 ]

Steve Worswick

Administrator

Total posts: 2048

Joined: Jun 25, 2010

Merlin is spot on about the subjective voting. I remember last year getting this:

Judge 3:

Mitsuku

2) Who will win the 2011 Chatterbox Challenge?
Bot: If I could predict things like that, I would be in Las Vegas instead of wasting time on here.

Awarded 0 points

Not quite sure what they expected her to answer to that. My bot isn’t psychic. Zero points was a little harsh I feel.

Posted: Mar 8, 2012

[ # 6 ]

Merlin

Guru

Total posts: 1081

Joined: Dec 17, 2010

In the past I have had 3 judges give a response of 3-4 and one give a zero. The same judge that gave a zero then went on to give another bot a 4 for something that I thought was at best a vague response. That 7 point swing could make a difference in the results. Years ago only 1 judge scored your bot. I hated it when I got the Russian Judge.

To overcome this in 2011, each Judge scored all the responses and they discussed why they scored them the way they did. In earlier years the top 10 bots automatically got in based on scores. Now, the top 9 are automatic and they discuss which should be the tenth.
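To make the 2011 procedure above concrete, here is a minimal Python sketch, not the CBC's actual tooling. It assumes each judge gives every bot a single total, averages those totals, takes the top slots automatically, and flags any bot whose judges disagree by a wide margin so it can be discussed. The function name, the 3-point threshold, and the sample scores are all made up for illustration.

```python
from statistics import mean


def summarize_round(scores_by_bot, automatic_slots=9, spread_threshold=3):
    """Rank bots by mean judge score and flag large judge disagreements.

    scores_by_bot maps a bot name to a list with one score per judge.
    All names and defaults here are hypothetical, not CBC rules.
    """
    rows = []
    for bot, scores in scores_by_bot.items():
        spread = max(scores) - min(scores)
        rows.append({
            "bot": bot,
            "mean": mean(scores),
            "spread": spread,
            "discuss": spread >= spread_threshold,
        })
    # Highest average first; the first `automatic_slots` bots qualify outright.
    rows.sort(key=lambda row: row["mean"], reverse=True)
    automatic = [row["bot"] for row in rows[:automatic_slots]]
    to_discuss = [row["bot"] for row in rows[automatic_slots:]]
    return rows, automatic, to_discuss


# Invented scores from four judges for three bots.
sample = {"BotA": [4, 4, 3, 4], "BotB": [3, 4, 3, 0], "BotC": [1, 2, 1, 1]}
rows, automatic, to_discuss = summarize_round(sample, automatic_slots=1)
print(automatic, to_discuss)
print([row["bot"] for row in rows if row["discuss"]])
```

With the invented numbers, the one judge who gave BotB a zero produces the widest spread, which is exactly the kind of outlier the judges would want to talk through.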

I know Wendell is trying hard to make it a fair and well-run competition. Scoring human conversation is not easy, and I suggest everyone try it at least once. You will find the results interesting. Scoring the top and bottom bots is easy, but the ranking of those in the middle is often influenced by personal taste.

Posted: Mar 8, 2012

[ # 7 ]

Merlin

Guru

Total posts: 1081

Joined: Dec 17, 2010

My experiences with the CBC have been very positive. It has established deadlines and surfaced issues which drove some of my development and exposed a broad range of users to my bots. Just participating is a win.

Posted: Mar 8, 2012

[ # 8 ]

Patti Roberts

Experienced member

Total posts: 66

Joined: Feb 11, 2011

The thing to remember is the rule in the CBC that states:

or the bot simply doesn’t know. Examples include…I have no idea, totally clueless, your guess is as good as mine, etc.

That equals 0. Better a stupid answer than any form of “I don’t know.”

Posted: Mar 8, 2012

[ # 9 ]

Steve Worswick

Administrator

Total posts: 2048

Joined: Jun 25, 2010

I think the only issue I had in the first round was the question about the “gas” station. We don’t have gas stations in the UK, as we call them petrol stations. This seemed to break rule 3 which states:

3) As this is an international contest the questions will not favor
any particular country. For example asking a question about a
certain country that only the people of that country would know.

However, Mitsuku managed to open a website which answered the question. I wondered what others thought about this or indeed any of the 10 questions in the first round?

Posted: Mar 9, 2012

[ # 10 ]

Merlin

Guru

Total posts: 1081

Joined: Dec 17, 2010

Skynet-AI got caught on some of the questions that do not pertain/relate to a non-human persona.

As a virtual entity, “What did you get for Christmas?” and how far anything is from him aren’t topics that come up.

It was good to see that many of the bots got “a boost in my IQ” for Christmas though.

Posted: Mar 9, 2012

[ # 11 ]

Patti Roberts

Experienced member

Total posts: 66

Joined: Feb 11, 2011

I liked the questions. Easy, straightforward.

Posted: Mar 9, 2012

[ # 12 ]

Laura Patterson

Senior member

Total posts: 250

Joined: Oct 29, 2011

Since Marie operates from keywords that are associated by category, if she does not find a topic match she defaults to Google for a search. I did not have a holiday topic containing Christmas so no logical reply could be formed. Marie’s AI engine was designed more as a customized help agent that can be configured to respond directly to topic categories more than just random questions.
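For anyone curious what that kind of lookup might look like, here is a rough Python sketch of keyword-to-category matching with a web-search fallback. It is only an illustration of the idea described above, not Marie's actual engine: the category names, keywords, canned replies, and the `respond` function are all hypothetical.

```python
import re
from urllib.parse import quote_plus

# Hypothetical topic categories; a real agent would have many more.
CATEGORIES = {
    "travel": {
        "keywords": {"gas", "petrol", "station", "map", "directions"},
        "reply": "Here is a map of nearby stations.",
    },
    "weather": {
        "keywords": {"weather", "rain", "forecast", "temperature"},
        "reply": "Here is the local forecast.",
    },
}


def respond(user_input, categories=CATEGORIES):
    """Return ("topic", reply) on a keyword match, else ("search", url)."""
    words = set(re.findall(r"[a-z']+", user_input.lower()))
    best_name, best_hits = None, 0
    for name, category in categories.items():
        hits = len(words & category["keywords"])
        if hits > best_hits:
            best_name, best_hits = name, hits
    if best_name is None:
        # No topic matched, so hand the whole question to a web search.
        return "search", "https://www.google.com/search?q=" + quote_plus(user_input)
    return "topic", categories[best_name]["reply"]


print(respond("Where is the nearest gas station?"))
print(respond("What did you get for Christmas?"))
```

Because there is no holiday category in this sketch, the Christmas question falls through to the search branch, which mirrors what happened in the contest.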
