AISB Loebner Prize 2018 Finalist selection

The finalist selection ranking for the 2018 Loebner Prize is as follows.

Rank Name        Score
1    Tutor          27
2    Mitsuku        25
3    Uberbot        22
4    Colombina      21
5    Arckon         20
6    Midge          19
7    Mary           18
8    Momo           17
9    Talk2Me        14
10   Aidan          13
11   Johnny & Co.  12

The full transcripts and scoring should be on the AISB site soon, but until then you can download the document from my personal website.

The scores are incredibly close and so I reviewed the scores all day today, paying particular attention to the 4th place finals cutoff. Objectively and consistently scoring something as subjective as this is going to be impossible so I’ve simply done my best, avoided any obvious contradictions, and tried to be fair in ambiguous cases.

Congratulations to Ron C. Lee, Steve Worswick, Will Rayer, and Savva Kuznetsov.


  [ # 1 ]

From Momo, by José Ignacio Perea Sardón:

19. If a chicken roosts with a fox they may be eaten. What may be eaten?
The chicken. Score: 2
20. I had to go to the toilet during the film because it was too long. What was too long?
The film. Score: 2

I’m impressed. I worked hard to process Winograd Schemas, and neither my bot nor the others have solved them.


  [ # 2 ]

Well done to everyone. Looking forward to the finals in Bletchley Park in September.

“When might I need to know how many times a wheel has rotated?” Huh? I’d struggle with that. ;)


  [ # 3 ]

Are they really Winograd Schemas though? I was under the impression that you changed one word in the pair to alter the subject. I don’t see how these are pairs.

I was pretty unlucky with the first question, as I had copied “good afternoon” from my “good evening” category but forgot to change the reply for the time of day when the judge checked, which must have been 11:00-11:59. Doh!

<think><set name="hour"><date format="%H" jformat="HH"/></set></think>
<condition name="hour">
  <li value="00">Afternoon?! It's the middle of the night.</li>
  <li value="01">Afternoon?! It's the middle of the night.</li>
  <li value="02">Afternoon?! It's the middle of the night.</li>
  <li value="03">Afternoon?! It's the middle of the night.</li>
  <li value="04">Afternoon?! It's the middle of the night.</li>
  <li value="05">Afternoon?! It's the middle of the night.</li>
  <li value="06">Afternoon?! It's only just morning.</li>
  <li value="07">Afternoon?! It's only just morning.</li>
  <li value="08">Afternoon?! It's morning here.</li>
  <li value="09">Afternoon?! It's morning here.</li>
  <li value="10">Afternoon?! It's morning here.</li>
  <li value="11">Evening?! It's morning here.</li>
  <li value="12">Good afternoon. How has your day been so far?</li>
  <li value="13">Good afternoon. How has your day been so far?</li>
  <li value="14">Good afternoon. How has your day been so far?</li>
  <li value="15">Good afternoon. How has your day been so far?</li>
  <li value="16">Good afternoon. How has your day been so far?</li>
  <li value="17">Good afternoon. How has your day been so far?</li>
  <li value="18">It's more like evening than the afternoon here.</li>
  <li value="19">Afternoon?! It's evening here.</li>
  <li value="20">Afternoon?! It's evening here.</li>
  <li value="21">Afternoon?! It's night time here.</li>
  <li value="22">Afternoon?! It's night time here.</li>
  <li value="23">Afternoon?! It's night time here.</li>
</condition>
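
For comparison, here is a rough Python sketch of the same hour lookup using ranges instead of 24 separate entries, which leaves less room for a copy-paste slip like the one at hour 11. The function name `afternoon_reply` and the bucket layout are assumptions made for illustration; this is not how the bot itself is written, and hour 11 is deliberately placed in the morning range here.

```python
def afternoon_reply(hour: int) -> str:
    """Reply to 'good afternoon' given the local hour (0-23)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    # (first hour of the range, reply). Ranges mirror the AIML list items,
    # except that hour 11 is treated as morning (the intended behaviour).
    buckets = [
        (0, "Afternoon?! It's the middle of the night."),
        (6, "Afternoon?! It's only just morning."),
        (8, "Afternoon?! It's morning here."),
        (12, "Good afternoon. How has your day been so far?"),
        (18, "It's more like evening than the afternoon here."),
        (19, "Afternoon?! It's evening here."),
        (21, "Afternoon?! It's night time here."),
    ]
    reply = buckets[0][1]
    for start, text in buckets:
        # Keep the reply of the last bucket whose start is <= hour.
        if hour >= start:
            reply = text
    return reply
```

Looping over `range(24)` and checking each reply is a cheap way to catch a stray "Evening?!" before a judge does.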

  [ # 4 ]

@Denis, I suspect that the most advanced approach deployed is to identify two articles in the sentence, choose one randomly and pray it’s your lucky year. :) This year Midge gets one point for the effective, but cheeky, tactic of answering with the same ambiguity as the question, a kind of Winograd answer.

@Steve, that was a particular toughie. It’s inspired by “Cognitive Wheels”, Daniel Dennett’s paper about the Frame Problem as applied to AI. In it a robot needs a battery, and there’s a battery on a cart in a room with a time bomb in it. The robot pulls the cart out of the room, but doesn’t realise the bomb is also on the cart.

This inspires someone to design a new robot that considers how anything and everything might change if it took a particular action, and it is found to fare no better. It thinks it should pull the cart again, but in calculating all the possible results it only gets as far as working out that the colour of the ceiling will not change, the temperature of its wheel motor will go up slightly, and the wheels on the cart will rotate 6.5434 times over the course of the journey before the bomb goes off.

If you can make a truly perfect chatbot, it will need to be able to solve the “what is relevant in this situation” problem, and hence should be able to come up with a circumstance in which the number of times a wheel rotated is relevant.


  [ # 5 ]

Andrew - I was making a graphic of the results. Do you know which countries Mary and Momo are from? I guess Mary is from Vietnam but I couldn’t find any reference to Momo. Looking forward to seeing you again in September.


  [ # 6 ]

I’m also impressed by Momo’s correct answers to Q19 and Q20. I’m not sure, but Momo might be a ChatScript AI. I noticed many ChatScript questions in the forums over the last few months, and I am curious how Momo did it.

Re the Winograd schemas, according to the Nuance Winograd Schema contest a few years ago, they have a very specific form where the meaning of the sentence(s) is switched by changing a single word or phrase in the question. I spent some time trying to understand and to answer a sub-set of these. I think the Winograd questions we are faced with for the Loebner contest are perhaps not ‘pure’ ones along the lines of the Nuance contest. Nonetheless they are an interesting challenge and addition to the test.
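
To illustrate the “changing a single word” form, here is a small Python sketch built around the well-known trophy and suitcase schema. The class and field names are assumptions made up for this illustration, not part of any contest format.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class WinogradSchema:
    template: str            # sentence with a {word} slot for the special word
    question: str            # question that reuses the same slot
    answers: Dict[str, str]  # special word -> correct referent

    def render(self, word: str) -> str:
        """Return the sentence and question for one half of the pair."""
        return (self.template.format(word=word) + " "
                + self.question.format(word=word))

    def answer(self, word: str) -> str:
        """Return the expected referent for one half of the pair."""
        return self.answers[word]


# Swapping "big" for "small" flips which noun "it" refers to.
trophy = WinogradSchema(
    template="The trophy doesn't fit in the brown suitcase because it is too {word}.",
    question="What is too {word}?",
    answers={"big": "the trophy", "small": "the suitcase"},
)
```

Each value of the special word gives one half of the pair, so a question such as Q20 on its own corresponds to a single half.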


  [ # 7 ]

Mary is made by Alt Inc., who appear to be Japan based, but with a tech lab in Vietnam.

Momo is made by José Ignacio Perea Sardón who appears to have publications at the University of Granada.

I realise I forgot to check the accents in the names of the authors, and so I have dropped a few of the letters in José’s name. I think I caught them all in the transcripts. This is a bit embarrassing and I’ll correct it tomorrow.


  [ # 8 ]

Thanks Andrew. That will explain why I can’t find him.


  [ # 9 ]

Well, congratulations guys. I think my score was fair at least, and if Bruce had entered I’d have been 6th anyway.

Questions 19 and 20 would technically be called Winograd Schema halves. The word “chicken” could be changed to “velociraptor” to form its pair for instance. They’re good by me. Nuance’s Winograd Schema contest didn’t even fit the official definition.

“I suspect that the most advanced approach deployed is to identify two articles in the sentence, choose one randomly and pray it’s your lucky year”

For these two in particular there is another method by which Winograd Schemas can be solved, though officially they aren’t supposed to be: statistics. Google, for instance, returns more search results for “long film” than for “long toilet”, so the former is the more likely answer. But to be honest, the state of the art is at a point where one can’t tell advanced methods from guesswork.
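
A minimal Python sketch of that statistical shortcut, assuming some source of co-occurrence counts is available. The numbers in `TOY_COUNTS` are invented for illustration and are not real search hit counts, and `pick_referent` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple

# Hypothetical co-occurrence counts, made up for this example.
TOY_COUNTS: Dict[Tuple[str, str], int] = {
    ("long", "film"): 900,
    ("long", "toilet"): 12,
    ("eaten", "chicken"): 400,
    ("eaten", "fox"): 35,
}


def pick_referent(attribute: str, candidates: List[str],
                  counts: Dict[Tuple[str, str], int] = TOY_COUNTS) -> str:
    """Pick the candidate noun that co-occurs most often with the attribute."""
    # Unseen pairs count as zero; ties go to the earlier candidate.
    return max(candidates, key=lambda noun: counts.get((attribute, noun), 0))
```

With these toy counts, `pick_referent("long", ["toilet", "film"])` picks "film". A properly constructed schema pair is meant to defeat this, since both halves use the same nouns and only the special word differs.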

@Andrew: I am curious where you got the idea that chatbots would have knowledge of quotes and idioms (2 of the latter in previous years).

