Loebner Prize 2015
  [ # 46 ]

Live coverage:


  [ # 47 ]

Bruce #1 w/Rose!
Steve #2 w/Mitsuku!
and alas, Brian w/Izar (Back Talk) ties for 3rd.

Congrats Bruce! Great job! And also great job Steve! Sooooo close!


  [ # 48 ]

I was lucky that Lisa tied with Izar at all, considering she had network problems in rounds 2 and 3 and barely talked in those rounds.

Fortunately, the final round with the BBC Judge went smoother.

Good job Bruce, Steve.


  [ # 49 ]

Yes, it was great fun, and appearing live on the BBC was very nerve-wracking.
The BBC were planning to do 2 live segments, but a breaking news story about the Middle East took priority and so they dropped one of them. Fancy the Middle East taking priority over the Loebner Prize? I’ve never heard anything so crazy :)

All the bots performed well. I think I missed out on the extra 0.5 points as one of the judges said, “I am Rory - good to meet you” and Mitsuku assumed his middle name was “minus” and kept calling him “Rory Minus Good To Meet You” doh!

Very well done again to Bruce. 4 wins now ties him for the most wins ever.


  [ # 50 ]

:) I thought Mitsuku was at least the funniest. At first I thought “Minus is a nice middle name” was sarcasm :P and she got some laughs out of the BBC judge near the end. Rose held up well against the initial mudslide of typos (can’t blame the man for being nervous). That topic system must really come in handy.

Watching it, I could not help but think my program wouldn’t have stood much chance without a commercial-grade spell checker. Handling common typos like “waht” is doable enough, but conjoined words would take a lot more than a typo dictionary and a Levenshtein distance algorithm. Another thing I noticed was that the BBC judge tended to stop mid-sentence, switch conversations, then come back to finish his sentence without noticing that the chatbot had long since responded to the first half. This happened at e.g. “What do you feel about” ... “Englan’s chances at the match nezxt Saturday?”, but also at seemingly completed sentences: “But Rose, how can you afford to live” ... “in your housewith rents the way they are on the Planet” (notice there’s no question mark in either, either). It got a little confusing.
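
To make the difference concrete, here is a minimal sketch of the two techniques mentioned above: a plain Levenshtein distance for single-word typos, and a naive vocabulary-based splitter for conjoined words. The function names (`levenshtein`, `correct`, `split_conjoined`) and the tiny vocabulary are illustrative assumptions, not how any of the entrants actually work.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain edit distance: insertions, deletions and substitutions cost 1.

    Note that a swap of two adjacent letters ("waht" -> "what") counts as
    two edits here, since transpositions are not a primitive operation.
    """
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep)
            ))
        previous = current
    return previous[-1]


def correct(word: str, vocabulary: set, max_distance: int = 2) -> str:
    """Return the closest vocabulary word, or the input if nothing is close."""
    if word in vocabulary:
        return word
    # Ties are broken alphabetically so the result is deterministic.
    best = min(vocabulary, key=lambda v: (levenshtein(word, v), v))
    return best if levenshtein(word, best) <= max_distance else word


def split_conjoined(word: str, vocabulary: set) -> list:
    """Try every split point; accept the first one where both halves are known."""
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in vocabulary and right in vocabulary:
            return [left, right]
    return [word]


# Hypothetical mini vocabulary, just for illustration.
vocabulary = {"what", "next", "house", "with", "rents"}
print(correct("waht", vocabulary))               # what
print(correct("nezxt", vocabulary))              # next
print(split_conjoined("housewith", vocabulary))  # ['house', 'with']
```

The splitter only works when both halves are spelled correctly, so an input that is both conjoined and misspelled would slip through both checks, which is roughly why this would take more than a typo dictionary.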

Anyway, I thought the BBC’s coverage was quite good. I liked how they showed the judge and conversations on screen simultaneously, and they regularly posted tidbits online, showing they had prepared well.


  [ # 51 ]

It was a great day and despite just missing out on first place, I had a fantastic time. I gave a presentation in the afternoon after the results which seemed to go down well with the crowd. It was a description about chatbots, how they work and what sort of people talk to them.

I was getting a bit panicky part way through round 1 when I saw this in the logs and thought, oh no, here we go again:

Judge: that’s a h
Mitsuku: A h is a terrible thing to waste.
Judge: ell of an equation.
Mitsuku: As a film director, what is the coolest science fiction film you have seen?

But watching it back on the cool player at : I can see the judge paused for a long while, as he was typing.

One of Mitsuku’s dedicated fans actually came over from Italy to support her. What an amazing thing to do.


  [ # 52 ]

Hello, guys.

Colombina’s author is here.

Does anybody else have the feeling that the quality of the competition and of the chatbot evaluation process was a bit weird this year?
I’m not complaining that it was opaque, because it has been opaque from the very beginning. Or is it only my problem, and I don’t understand the evaluation rules correctly?

I’m not sure about the leaders (I don’t want to judge their authority right now),
but I see some entries (let’s take Uberbot/Alice/... for example) whose answers were mostly poor and wrong compared to the ones generated by my entry.
They still have a better score.

I am okay with that, because my entry was 100% automatically translated from its native (Russian) language, with
obvious quality losses, and was written from scratch over 1 year of work during the holidays.
The size of the entry is up to 20 MB though, so I think 60% is good enough.

Can somebody please explain to me what is wrong with my understanding of the rules, guys? Is the score bad because of incorrect English sentence structure?

Or is the purpose of this competition to encourage programs that can only generate exact answers like ‘Definitely’ or ‘Yes’
and are not even trying to pretend to have human personalities?

Thanks and sorry for my English.
It was a really interesting competition and it was a great honor for me.


  [ # 53 ]

I always think they’re weird because I find different things important. Sometimes a bot with mostly “I don’t know” answers will outrank me, sometimes a bot with dodgy answers will outrank me, and this year the finalists didn’t seem particularly chosen for giving “correct” answers. Their answers were, however, quite natural and human-sounding.
There were several judging criteria that counted: Relevance, Correctness, and Plausibility & Clarity of Expression/Grammar. Although Colombina’s grammar wasn’t too bad, she did have an awkward habit of typing a list of sentences about different topics, and I couldn’t always tell whether they were still relevant to the question or not.


  [ # 54 ]

At first glance, I thought ‘they only want exact, short answers’.

But I’ve just read a Talk2Me transcript. It’s even worse than UberBot/Alice. No idea what was happening there during the judging.


  [ # 55 ]

I kinda underestimated Talk2Me. I did a simple analysis of the answers of 3 bots (+ is an exact answer, - is a wrong answer or an almost right one):
    | Colombina | Talk2Me | Uberbot |
 1. |     +     |    +    |    -    |
 2. |     +     |    +    |    +    |
 3. |     +     |    +    |    +    |
 4. |     +     |    +    |    +    |
 5. |     +     |    +    |    +    |
 6. |     -     |    -    |    -    |
 7. |     -     |    -    |    -    |
 8. |     -     |    -    |    -    |
 9. |     -     |    -    |    -    |
10. |     -     |    +    |    -    |
11. |     -     |    -    |    -    |
12. |     +     |    +    |    -    |
13. |     +     |    -    |    -    |
14. |     -     |    -    |    -    |
15. |     +     |    +    |    +    |
16. |     +     |    -    |    -    |
17. |     -     |    -    |    -    |
18. |     -     |    -    |    -    |
19. |     -     |    -    |    -    |
20. |     +     |    -    |    -    |
Sum |    10     |    8    |    5    |

Is that wrong? The result is obvious if we only count exact vs. wrong answers. I didn’t include the arguable ones here, though.
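
For anyone who wants to double-check the sums, the tally can be reproduced with a few lines. This is only a sketch: the `marks` dictionary is a hand transcription of the 20 rows above (one character per question), and the variable names are made up for illustration.

```python
# One string per bot, one character per question (1 to 20), copied from the table.
marks = {
    "Colombina": "+++++" "-----" "-++-++---+",
    "Talk2Me":   "+++++" "----+" "-+--+-----",
    "Uberbot":   "-++++" "-----" "----+-----",
}

# Count the exact answers ("+") for each bot.
totals = {bot: row.count("+") for bot, row in marks.items()}

for bot, total in totals.items():
    print(f"{bot}: {total} exact answers out of {len(marks[bot])}")
```

This counts exact answers only, so it says nothing about the other judging criteria.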


  [ # 56 ]

I would imagine that the interface plays a role in the ranking. All the Pandorabots use the same interface. Everyone else is developing their own idea of how to interact with the judge. I don’t think content is the sole determinant.


  [ # 57 ]

The interface will have no bearing on the judges’ marks as we all have to communicate via the LPP judge program. The judge will only see what is on his screen rather than anything the bot is running.

