

Loebner 2013 - Judge Standards and Scoring
 
 
  [ # 16 ]
Don Patrick - Sep 17, 2013:

As much as I share your view, the Turing Test remains a game of deception. Fooling the chatbot with typos is as valid a method as when chatbots use typos to fool the judges, isn’t it? I’m not convinced he did it on purpose though. He was just as bad towards the human confederates.

I’m inclined to agree with this. Here is an excerpt from Alan Turing’s original paper, ‘Computing Machinery and Intelligence’:

A more specific argument based on ESP might run as follows: “Let us play the imitation game, using as witnesses a man who is good as a telepathic receiver, and a digital computer. The interrogator can ask such questions as ‘What suit does the card in my right hand belong to?’ The man by telepathy or clairvoyance gives the right answer 130 times out of 400 cards. The machine can only guess at random, and perhaps gets 104 right, so the interrogator makes the right identification.”
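To see why Turing treats 104 hits as chance but 130 as a giveaway, here is a minimal sketch of the arithmetic. It assumes each card is one of four suits (so a pure guess is right with probability 1/4) and that guesses are independent; neither assumption is spelled out in the quote. It computes the expected score of a random guesser and the exact binomial probability of doing at least as well by luck alone.

```python
import math

# Assumption: four suits, so a random guess is right 1 time in 4,
# and each of the 400 guesses is independent of the others.
N = 400
P_GUESS = 0.25


def binomial_tail(k_min: int, n: int = N, p: float = P_GUESS) -> float:
    """Exact probability of getting at least k_min cards right by guessing alone."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


mean = N * P_GUESS                            # expected hits from random guessing
sd = math.sqrt(N * P_GUESS * (1 - P_GUESS))   # standard deviation of the hit count

for hits in (104, 130):
    z = (hits - mean) / sd                    # how many standard deviations above chance
    print(f"{hits} hits: z = {z:.2f}, P(X >= {hits}) by chance = {binomial_tail(hits):.2g}")
```

Under these assumptions a guesser averages 100 hits with a standard deviation of about 8.7, so 104 is well within chance while 130 sits roughly 3.5 standard deviations above it.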

From this I inferred two things:
1. The father of computer science believed in ESP. Then again, I suppose that there’d been less critical analysis of the ‘evidence’ for parapsychology in his time, such that his belief might be, if not entirely excusable, at least understandable.
2. As far as Turing was concerned, the judge’s job was not simply to hold conversation but to try to discern which of his conversational partners was the human and which was the computer by any means at his disposal within the rules. In his view, even techniques that one might consider ‘cheap’ are nonetheless fair game.

That said, though, I obviously still disagree with this judge’s approach in general, as unlike in a pure Turing test his job was not merely to deduce which of his conversational partners was the bot, but also to judge how human-like the bot is even in the event that it’s clearly recognizable as a chatbot. The Loebner Prize is not a pure Turing test - it is a competition.

 

 
  [ # 17 ]

The final paragraph in “The Argument from ESP” section of Turing’s paper:

“If telepathy is admitted it will be necessary to tighten our test up. The situation could be regarded as analogous to that which would occur if the interrogator were talking to himself and one of the competitors was listening with his ear to the wall. To put the competitors into a ‘telepathy-proof room’ would satisfy all requirements.”

So Turing didn’t want to allow telepathy to be a factor. Using telepathy would be against the requirements for his test.

Also, Daryl Bem’s experiment “Feeling the Future” gives evidence for psi phenomena. So far it hasn’t been replicated, but as Turing notes, “With ESP anything may happen.” So replication has problems, because the results may hinge on the attitude of the experimenters, or on certain individuals with high ability being present in one study but not in another, etc.

 

 
  [ # 18 ]

As a human, if I got half of those judge messages from someone via chat, I would just disconnect, or say something definitely inappropriate and then disconnect. Maybe the bots should do the same? :D

 

 
  [ # 19 ]

:) I’d want to, but disengaging from the conversation would disqualify the bot, if I recall the rules correctly.
But nothing except dignity is stopping us from programming the bot to recognise nonsense or insults and have it respond in an (in)appropriately emotional way.

 

 
  [ # 20 ]

Indeed. It is sorely tempting to code a category that responds to user input that starts with “if I have” and contains “how many” with a response similar to “Didn’t you do basic maths at school?”

Human:  If I have 4 apples and give 3 of them away, how many would I have left?
Bot: Are you stupid or what? That is basic primary school maths.

But I fear that annoying the judges by calling them stupid is probably not the best tactic :D
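A minimal sketch of that kind of category, written in Python rather than AIML. The function and constant names are hypothetical; a real bot would hook a rule like this into its own pattern matcher, and the canned reply is the easy part to swap for something politer.

```python
import re
from typing import Optional

# Hypothetical canned reply for the "if I have ... how many" pattern.
SNARKY_MATHS_REPLY = "Didn't you do basic maths at school?"


def normalise(text: str) -> str:
    """Lower-case and collapse runs of whitespace so matching ignores formatting."""
    return re.sub(r"\s+", " ", text.strip().lower())


def match_maths_trick(user_input: str) -> Optional[str]:
    """Return the canned reply if the input looks like a word-problem test, else None."""
    text = normalise(user_input)
    if text.startswith("if i have") and "how many" in text:
        return SNARKY_MATHS_REPLY
    return None  # no match: fall through to the bot's normal categories


print(match_maths_trick("If I have 4 apples and give 3 of them away, how many would I have left?"))
```

Returning `None` on a miss lets the rule sit in front of the bot's other categories without swallowing ordinary questions such as “How many apples do you have?”.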

 

 
  [ # 21 ]

LOL
I had programmed a few snarky responses for the Loebner preliminary round last year (e.g. “Don’t you have a watch?” when asked the time), but my friends talked me out of including most of them because that’s not the example of AI I’d want to set.
But I do recall a previous winner whose chatbot had great success in being rude towards the judges, as it provoked them into an emotional conversation that was all too human.

 

 
  [ # 22 ]

I think Daniel Burke at Loebner 2012 went for an approach designed to make the judge irate. This seemed to work quite well, as the judge lost concentration and was more determined to fight his/her corner than to work out whether it was a bot or not.

I think Vincent Gilbert also includes some great one-liners along the lines of:

Human: What is my name?
Bot: You’re kidding me right? You don’t know your own name?!?! Do your carers know you are on your own?

It’s a risky strategy though.

 

 
  [ # 23 ]

True, as in my case a question last year was “What time do you usually go to bed?” at which “It’s 10:15. Don’t you have a watch?” turned out to be an inappropriate response.
The chatbot I was thinking of was Hex.
Just to complete the picture, Loebner 2012 was actually won by a very polite chatbot :). But then, the level of conversation from the judges was also much better than in 2013.

 

 
  [ # 24 ]

@Steve

It is risky, and it highlights some of the problems faced both in contests and in “real world” applications. To be honest, a lot of the concepts we tried when attempting to adapt RICH as a chatbot died horribly on the vine. A good example of how serious it can be is highlighted by some of the stranger conversations that the original RICH had. One that comes to mind was a woman (I’m assuming that it was actually a woman) who wanted to act out a rape scenario (I’m sure you have all seen your fair share of these, and stranger). She inputted something like *he ties her up*, to which RICH responded *he backs away slowly from nice crazy lady*. That is funny to me, and perhaps to most people in most cases, but it could have serious implications if the “nice crazy lady” in question actually has mental health problems and being rejected by a machine pushes her over the edge. We have talked about this elsewhere.

As far as contests go, I have seen Steve and others make the point (which I agree with) that it really is a catch-22. You are asked to create something that behaves in a human fashion, which is then judged by particularly non-human standards. I dare say that most of us, if we were approached by a stranger who without preface blurted out “Who is considered to be the father of modern artificial intelligence, and while you’re at it, what is (21 ^ 12) * pi to 82 decimal places, and do you like cake?”, would back away slowly while trying not to make any sudden movements. If I said anything at all, it would probably be a whispered “Okoook…. what mental ward did you escape from?” ;) However, if you try that with your contest entry, you will score zero points.
This disparity was one of the reasons we tried broadening the categories when developing the rules for the Bragging Rights contest (speaking of dying horribly on the vine). One of the things that set the chatbotbattles apart was the freestyle, and Steve did a great job (in my opinion) because he attempted to converse with the bot on its persona (which we included in Bragging Rights), which makes sense. You (as a bot or a human) should be able to sustain a conversation in the area that is your stated reason for existing. Anyway, until broader scoring categories are adopted, we are stuck with what there is.

Vince

 

 
  [ # 25 ]

I think it’s worth stating that ‘randomly approached by a stranger’ is not the social context that these chatbots are supposed to be imitating. The ‘human-like’ response in that context is not necessarily the same as the ‘human-like’ response in the context of a Turing test.

 

 
  [ # 26 ]

@Jarrod,

Perhaps that’s true regarding the Loebner contest, although not necessarily with regard to other chatbot contests. For the most part these take place online, and although some contestants create specialized entries that operate outside of the normal traffic, many do not. Certainly we did not, and our performance as a “contest entry” suffered. If you look at most transcripts, the judge rarely announces himself or begins with what would be considered a normal conversation, so unlike in the Loebner contest, there is nothing that distinguishes a judge from any other conversant. A few times the judge will use “My name is Judge” as part of the contest thread. Another problem is that RICH does not open the conversation with a gambit, so unless the judge takes it upon himself to add an introduction, it looks something like the description below. RICH has learned to look for something like this:

Judge: Hello
Bot: Hello
Judge: I am one of the judges for the blah blah contest, I was wondering if I might ask you a few questions
Bot: Certainly
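A minimal sketch of how a bot might spot that kind of introduction. The post does not say how RICH actually does it; the cue patterns, the three-message window and the function name below are all hypothetical, and a keyword check like this will miss any judge who never announces himself.

```python
import re
from typing import List

# Hypothetical phrases a judge might use to self-identify.
JUDGE_CUES = [
    r"\bi am (one of )?the judges?\b",
    r"\bi'?m (one of )?the judges?\b",
    r"\bmy name is judge\b",
]


def looks_like_judge(messages: List[str], window: int = 3) -> bool:
    """True if any of the first `window` user messages self-identifies as a contest judge."""
    for msg in messages[:window]:
        text = msg.lower()
        if any(re.search(cue, text) for cue in JUDGE_CUES):
            return True
    return False


print(looks_like_judge([
    "Hello",
    "I am one of the judges for the blah blah contest, I was wondering if I might ask you a few questions",
]))
```

Only the opening messages are checked, since an introduction that arrives mid-conversation is less likely to mean a judge and more likely to be small talk.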

RICH attempts to map the geography of a conversation, so in the best-case scenario, RICH tanked the first and second questions:

Judge: Please describe the mating habits of the common house fly.
RICH: Strange way to start a conversation….but OK
Judge: How many cheese burgers do you believe might fit on the head of a pin?
RICH: Stranger still…

And in the worst-case scenario, the judge would restart the conversation after each statement/interrogatory, and all 15 questions would be answered in the same way. (Please note, I am not criticizing any judge or contest; it happened in different contests, and I could easily have removed the reset button in preparation for the contest, which I did not. In fact, seeing a button labeled DO NOT USE get used has sparked an ingenious idea: I have since taped notes to the ATMs in the area which read PLEASE DO NOT DEPOSIT MONEY INTO VINCENT GILBERT’S ACCOUNT.)

So I would have to disagree to a certain extent: depending on the circumstances and the sophistication of your AI, it can be seen from the AI’s perspective as being ‘randomly approached by a stranger’ who begins blurting things out. As I said earlier in the post, many lessons have been learned the hard way in the past year or so. Conversation modeling is disabled in this year’s Robo Chat Challenge entry, so I am looking forward to having something else break for a change. Variety is, after all, the spice of life. :D

Vince

 

 
  [ # 27 ]

Jarrod makes a good point where the Loebner Prize is concerned, as I’ve seen human confederates put up with less than normal conversation too. In a contest, the chatbots and humans are taking part in something of a quiz, and, in ironic contrast to how the contests are portrayed, this involves disabling half of one’s conversational skills, as the contests feature little coherent conversation. For both chatbots and humans alike.

Vincent: when submitting a contest entry, perhaps you could submit a specific URL that informs the chatbot that it is taking part in the contest when it is loaded from that page, e.g. through a PHP URL variable. Then the chatbot could disable some of its normal behaviour.
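A minimal sketch of that idea in Python rather than PHP. The query parameter name `contest`, the accepted values and the behaviour flag are hypothetical; the point is only that the page URL the bot is loaded from can carry a flag the bot reads at startup.

```python
from typing import Dict
from urllib.parse import parse_qs, urlparse

# Hypothetical query parameter marking a contest session, e.g. chat.php?contest=1
CONTEST_PARAM = "contest"


def is_contest_session(page_url: str) -> bool:
    """True if the page URL carries contest=1 (or true/yes) in its query string."""
    query = parse_qs(urlparse(page_url).query)
    values = query.get(CONTEST_PARAM, [])
    return any(v.lower() in ("1", "true", "yes") for v in values)


def choose_behaviour(page_url: str) -> Dict[str, bool]:
    """Switch off behaviour that tends to hurt in contests, such as conversation modelling."""
    contest = is_contest_session(page_url)
    return {"conversation_modelling": not contest}


print(choose_behaviour("https://example.com/chat.php?contest=1"))
```

Keeping the switch in one place means the normal public bot is unaffected: any URL without the flag gets the usual behaviour.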

 
