
What I’ve learned from the Loebner contest—and I haven’t even entered (yet)
 
 
  [ # 16 ]
Bruce Wilcox - Jul 20, 2014:

You get potentially multiple sentences in one blow followed by infinite patience waiting for your reply.

As last year’s transcripts of Mitsuku and Arckon show in the form of “blank” responses and shifted answers, I unfortunately have to disagree on expecting infinite patience in the selection round. Other than that, you are full of wise advice.

 

 
  [ # 17 ]

OK. I’m not really saying infinite patience waiting for a reply. But VASTLY reasonable.
And the blank response from the transcript can well mean that the program answered with RETURN rapidly.
OTHERWISE, if the program was still computing, it wouldn’t be moving on to other questions.
The basic point is, they will give you one or more sentences and then wait. They won’t interrupt you or tack on yet more input the way a real judge might.

 

 
  [ # 18 ]

I don’t mean to oppose your greater experience in the matter; my case could have been a deviation from the norm. I’m just sharing the facts and pitfalls that I fell into the one time I entered.

I am certain that my program processed the two questions as a single input. These are the facts, should it intrigue you:
1. The program was wired to process the input 4 seconds after the last keypress or immediately on a return.
2. The time between processing the input and starting the output is microseconds (unless it was run directly from USB, which I explicitly told them not to do). As soon as the 4 seconds are up or Enter is pressed, the program starts outputting letters.
3. The program does not wipe the user’s input upon a return. Accidentally pressing return twice in rapid succession would process the same input twice and output the same answer twice.
4. At the time, the program had a rule to block multiple “I don’t know.” answers from a single output. So it answered the first question still present in the unprocessed input (“I don’t know if I have a sign”) and thereby blocked the answer to the second question in the same input (“I don’t know if I have children”). Had it processed the second question as a separate input in any way, it would not have neglected to address it.
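To make the timing issue concrete, here is a minimal sketch of how an idle-timeout input handler like the one described in points 1 to 4 can end up treating two judge questions as one input. Everything in it is hypothetical: the keystroke format (time, character) pairs, the function names, the naive question splitter and the stand-in bot are assumptions for illustration, not the actual program. It replays keystrokes offline, so "the timer expired" is detected when the next key arrives 4 seconds or more after the previous one.

```python
import re

# Hypothetical sketch of the input handling described above; names and the
# keystroke format are assumptions, not the actual program.
IDLE_TIMEOUT = 4.0  # seconds of silence after the last keypress


def batch_inputs(keystrokes, idle_timeout=IDLE_TIMEOUT):
    """Group (time_s, char) keystrokes into inputs. An input closes on Enter
    (a newline character) or when the next key arrives idle_timeout seconds
    or more after the previous one (an offline replay of a live idle timer)."""
    batches, current, last_time = [], [], None
    for t, ch in keystrokes:
        if current and t - last_time >= idle_timeout:
            batches.append("".join(current).strip())
            current = []
        if ch == "\n":
            if current:
                batches.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        last_time = t
    if current:  # trailing text would be flushed once the timer expires
        batches.append("".join(current).strip())
    return batches


def split_questions(text):
    """Naively split one input into its questions (each ending in '?')."""
    return [q.strip() for q in re.findall(r"[^?]*\?", text)]


def respond(text, answer):
    """Answer every question in one input, but allow at most one
    "I don't know" reply per output, as in the rule described above."""
    replies, said_dont_know = [], False
    for q in split_questions(text):
        reply = answer(q)
        if reply.startswith("I don't know"):
            if said_dont_know:
                continue  # the second "I don't know" is silently dropped
            said_dont_know = True
        replies.append(reply)
    return " ".join(replies)


def toy_answer(question):
    """Stand-in bot that only handles 'Do you ...?' and never knows."""
    return "I don't know if I " + question[len("Do you "):].rstrip("?") + "."


def typed(text, start, interval=0.2):
    """Simulate typing text at a steady pace starting at time `start`."""
    return [(start + i * interval, ch) for i, ch in enumerate(text)]


# The judge types the second question 1 s after the first, without pressing Enter.
q1 = typed("Do you have a sign? ", 0.0)
q2 = typed("Do you have children?\n", q1[-1][0] + 1.0)
inputs = batch_inputs(q1 + q2)
print(inputs)
print(respond(inputs[0], toy_answer))
```

With a pause of 4 seconds or more before the second question, the same keystrokes become two separate inputs and each question gets its own “I don’t know” reply, which is the behaviour the judges would presumably have expected.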

Personally I hope you’re right.

 

 
  [ # 19 ]

Please. Oppose me whenever you wish. I am not all wise.

 

 
  [ # 20 ]

One might, of course, merely ask the organizers HOW they do the qualifiers. It’s a fair question. One might expect that, because they don’t want to introduce a typo for one program and not another, the input is automated somehow.

 

 
  [ # 21 ]

True, perhaps we should ask things more often smile. I initially assumed, or thought I had read, that the questions were automated. However, besides the mystery of the two blank responses, another contestant and I both experienced an inconsistent misspelling of the word “atheist” (again, my program’s response offered two clues that this occurred during the actual test), and one of my program’s replies that was set to trigger at the 20th input triggered at the 19th.

Shortly after the results were made public, I asked Professor McKevitt whether the selection round was automated or not, and he told me that the test had been performed by a human. This pretty much explained all of the mishaps. But as I wouldn’t have made the cut even with another point or two, I didn’t make an issue of it. I would have if I had come in 5th though.

 

 
  [ # 22 ]

Right. So now we know it is not in fact automated and they are sloppy. Yet more randomness involved.

 

 
  [ # 23 ]

The pre-selection judges definitely did not wait for the bots to answer. Mitsuku was configured to process the judge’s input after 3 seconds of no input, but the transcripts show this:

JUDGE: I have a Mazda. What make of car do you have?
MITSUKU:

JUDGE: I like Linux. Which computer operating system do you like?
MITSUKU: How much did it cost?  All the very latest and best make of car.
  My favorite band is The Trashmen.

I can tell from Mitsuku’s response that the 2nd question was asked within 3 seconds of the first one being typed in, as Mitsuku answered the car question in her second response.

Mitsuku was also asked:
I like Waiting for Godot. What you your favorite play?
Where other entrants got the correct input of:
I like Waiting for Godot. What is your favorite play?

Hopefully, the pre-selection this year will be more consistent.

 

 
  [ # 24 ]

IMHO, the pre-selection process should be run via script, to make sure that all bots receive the same inputs/timings. The judges can then read the transcripts and pose any follow-up questions as needed. I fully realize that this method also has its “warts”, but they are both fewer and less severe than those of the current status quo. smile
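A scripted qualifier of the kind suggested above could be as simple as the sketch below. It is only an illustration under loud assumptions: each bot is modelled as a hypothetical Python callable that takes one input and returns one reply, whereas a real harness would have to drive the contest’s judge program and protocol instead. The point it shows is that every bot receives the identical question list, in the same order, one question per input, and the next question is only sent after the previous reply has come back.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: one input line in, one complete reply out.
Bot = Callable[[str], str]
Entry = Tuple[str, str, float]  # (question, reply, seconds taken)


def run_qualifier(bots: Dict[str, Bot], questions: List[str]) -> Dict[str, List[Entry]]:
    """Send the same questions, in the same order, one input at a time,
    to every bot, and record each reply with how long it took."""
    transcripts: Dict[str, List[Entry]] = {}
    for name, bot in bots.items():
        log: List[Entry] = []
        for q in questions:
            start = time.monotonic()
            reply = bot(q)  # blocks until the bot has finished replying
            log.append((q, reply, time.monotonic() - start))
        transcripts[name] = log
    return transcripts


def format_transcript(name: str, log: List[Entry]) -> str:
    """Render a log in the JUDGE:/BOT: style of the published transcripts."""
    lines = []
    for q, reply, _elapsed in log:
        lines.append(f"JUDGE: {q}")
        lines.append(f"{name.upper()}: {reply}")
    return "\n".join(lines)


# Example with two trivial stand-in bots.
questions = [
    "I have a Mazda. What make of car do you have?",
    "I like Linux. Which computer operating system do you like?",
]
bots: Dict[str, Bot] = {
    "echo": lambda q: q.split(". ")[-1],  # repeats the last sentence back
    "shrug": lambda q: "I don't know.",
}
transcripts = run_qualifier(bots, questions)
print(format_transcript("shrug", transcripts["shrug"]))
```

Because the harness, not a human typist, decides when each question is sent, no bot can see two questions merged into one input, and the recorded timings make any slow replies visible to the judges afterwards.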

 

 
  [ # 25 ]

Good news! I have the judge program running, and the judge program picks up the responses from my program. I still have to scan for the judge’s input, etc., but I am making positive headway. When I started this thread, I was quickly coming to the conclusion that it wouldn’t be this year. Now there is at least hope.

I also hope the pre-selection process will be more consistent.

 

 
  [ # 26 ]

Glad to hear it, Jim smile. There’s another week and a half left; I’m confident that you’ll manage the other part too.

You know what, I’m just going to ask Dr. Keedwell how much time the chatbots will be given to answer each question. At the very least it’ll address the issue. I’ll post here if I get an answer.

What I am personally worried about is how many of my advanced features to disable for the qualifying round. I’ll leave conversational rules on for the sake of interest, though this can cause the program to ask questions of its own, after which any following input could be mistaken for a reply to the program’s question. It also looks for connections with previous conversation, such as insinuations, and it might well mistake some references, e.g. in the following case:
question 1: What colour is a tortoise?
question 2: What would one do with a hammer?
it might interpret the questions as “What would a tortoise do with a hammer?”. Worse, it might take “one” to refer to a subject from its own answer to the first question. That wouldn’t be the worst thing, except that the answers are said to be shown to a jury on slides, one at a time, so any contextual answer on the program’s part could be mistaken for an error.
I think it’s ironic that I should disable intelligent functions to pass for a test of intelligence.

 

 
  [ # 27 ]

Don, I have the same problem regarding disabling features. Once input is received, my bot checks its local database for an answer, checks a set of functions to determine whether it can work out the answer, and searches MIT START, Wolfram Alpha and a couple of other online databases. From these three sources it then determines the confidence level of the answer and gives a reply. Obviously the searching of online databases is out, as well as the lip-sync avatar, TTS and numerous other features. I wrote a routine so that if a question is answered by an online database, the question and answer are stored in the local database. Honestly, though, that is of very limited value.
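The pipeline described above (local database, answer-computing functions, online lookups, pick by confidence, cache online answers locally) might be sketched roughly as follows. This is a hypothetical illustration, not the actual bot: the class and method names, the fixed confidence values, the single “addition” function and the `fake_online` lookup are all invented stand-ins, and no real MIT START or Wolfram Alpha API is called. The `allow_online` flag shows one way the online part could be switched off for a round without internet access while still using whatever was cached earlier.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# (answer, confidence in 0..1) or None when a source has nothing to offer
Candidate = Optional[Tuple[str, float]]
Lookup = Callable[[str], Candidate]


class AnswerEngine:
    """Hypothetical sketch: gather candidate answers from a local database,
    built-in functions and (optionally) online lookups, reply with the most
    confident one, and cache online answers locally. The confidence values
    and source names are illustrative assumptions."""

    def __init__(self, online_sources: List[Lookup], allow_online: bool = True):
        self.local_db: Dict[str, str] = {}
        self.online_sources = online_sources
        self.allow_online = allow_online  # switch off for a no-internet round

    def from_local(self, q: str) -> Candidate:
        ans = self.local_db.get(q.strip().lower())
        return (ans, 0.9) if ans is not None else None

    def from_functions(self, q: str) -> Candidate:
        # Stand-in for "functions that work out the answer": addition only.
        m = re.fullmatch(r"what is (\d+) plus (\d+)\?", q.strip().lower())
        return (str(int(m.group(1)) + int(m.group(2))), 1.0) if m else None

    def answer(self, q: str) -> str:
        candidates: List[Tuple[Tuple[str, float], str]] = []
        for source in (self.from_local, self.from_functions):
            c = source(q)
            if c is not None:
                candidates.append((c, "local"))
        if self.allow_online:
            for lookup in self.online_sources:
                c = lookup(q)
                if c is not None:
                    candidates.append((c, "online"))
        if not candidates:
            return "I don't know."
        # Highest confidence wins; ties go to the earlier (local) source.
        (ans, _conf), origin = max(candidates, key=lambda item: item[0][1])
        if origin == "online":
            self.local_db[q.strip().lower()] = ans  # remember it for offline use
        return ans


def fake_online(q: str) -> Candidate:
    # Hypothetical stand-in for a web service query.
    return ("Paris", 0.8) if "capital of france" in q.lower() else None


engine = AnswerEngine([fake_online])
print(engine.answer("What is the capital of France?"))  # fetched online and cached
engine.allow_online = False                               # contest mode
print(engine.answer("What is the capital of France?"))  # served from the local cache
print(engine.answer("What is 2 plus 3?"))
print(engine.answer("Who wrote Hamlet?"))
```

As the post says, the cache only helps with questions that happened to be asked while online, which is why its value in a closed qualifier is limited.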

I look forward to the answers you receive.

 

 
  [ # 28 ]

I read that the pre-selection questions will follow the usual Loebner-type questions, so plenty of “Harry kicked the ball. Who kicked the ball?” type questions with the names changed. No conversational pieces, but who knows.
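For the simplest form of these questions, even a toy pattern gets the right answer. The sketch below is a hypothetical illustration that only handles the exact shape “&lt;Name&gt; &lt;predicate&gt;. Who &lt;predicate&gt;?” with a single capitalized first name; the pattern and function name are invented here, and real qualifier variants (“Who did Harry kick?”, pronouns, several sentences) would need actual parsing rather than a regular expression.

```python
import re

# Toy matcher for the exact shape "<Name> <predicate>. Who <predicate>?"
PATTERN = re.compile(
    r"\s*(?P<name>[A-Z][a-z]+) (?P<pred>[^.?]+)\.\s*Who (?P<qpred>[^.?]+)\?\s*"
)


def answer_who(text):
    """Return '<Name> <predicate>.' when the question repeats the statement's
    predicate, otherwise None (meaning: hand the input to something smarter)."""
    m = PATTERN.fullmatch(text)
    if m and m.group("pred").strip().lower() == m.group("qpred").strip().lower():
        return f"{m.group('name')} {m.group('pred')}."
    return None


print(answer_who("Harry kicked the ball. Who kicked the ball?"))
print(answer_who("Harry kicked the ball. Who caught the ball?"))
```

Returning None instead of guessing leaves room for the rest of the bot to handle anything outside this narrow template.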

Like Don, I too do not like removing the intelligence in my work to dumb it down to a human level. Answering questions like “What is the square root of 10?” with “3 and a bit” instead of the correct answer seems a backwards step to me.

Such is the Turing Test I suppose…

 

 
  [ # 29 ]

HUGE HAPPY DANCE!  I have the protocol working!!

Of course the pre-selection questions have to be of the type “Harry kicked the ball. Who kicked the ball?” That and word problems are two of my bot’s weakest areas. But at least now I have a couple of days to focus on that and, with a ton of luck, get it working like it should.

Again, THANKS to everyone; without your help I would have had zero chance of entering.

 

 
  [ # 30 ]

smile Well done Jim. See you on the battlefield wink

I suppose that “similar format to previous competitions” is my best bet then, together with the fact that Dr. Keedwell was involved in a Loebner Prize before and did not pull any 180s then. I’ll disable recognition of some long-range context and limit the scope of reference words to a single input, but I’ll leave the program free to make observant comparisons to previous context, at the risk of being mistaken for a non sequitur. This round may be all the opportunity I get, and I’d rather lose while showing off than lose unremarkably. Other than that, I’m 90% confident I’ll make it to the finals cheese, but then so was I last year.

 
