AI Zone: chatbots.org

My thoughts on the 2014 competition

Posted: Nov 20, 2014

[ # 31 ]

Steve Worswick

Administrator

Total posts: 2048

Joined: Jun 25, 2010

Some of the lags I saw were up to 10 seconds.

It’s a shame the official logs don’t show what actually happened, as it makes our bots look pretty dumb in the public eye.

Posted: Nov 20, 2014

[ # 32 ]

Will Rayer

Experienced member

Total posts: 94

Joined: Jun 13, 2013

It seems Rose and Mitsuku both had fragmented input. From my own logs I can see Uberbot’s input was OK, and I don’t know about Izar. I thought fragmented input only happens if (1) the judge program batches up its output and creates the special folders all at once, instead of in real time, or (2) the bots only allow a short wait time between characters, thus accepting input prematurely. But there’s no reason to assume either of these happened, so we need some other cause.

On stackoverflow.com I found this: http://stackoverflow.com/questions/5159220

It looks like on a LAN a client PC may cache files and directories for up to 10 seconds, to avoid network traffic. This would mean the AI bots (client PCs) may check the comms folder every 0.5 seconds (or whatever), but the cached listing won’t be refreshed with the ‘real’ contents of the folder until the 10 seconds are up. This caching doesn’t always occur, e.g. if you have an Explorer window open to view the folder, caching isn’t used. This explanation is consistent with what happened on the day, although it doesn’t explain why Uberbot did not suffer the problem. It may be because I made my own custom client program.
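
To make the effect concrete, here is a minimal simulation sketch in Python. Nothing in it is actual LPP or bot code: the function name `simulate`, the 0.3 second typing interval, the 2 second idle timeout and the one-folder-per-character model are all assumptions picked for illustration. It only shows how a stale directory listing could turn one sentence into fragments.

```python
def simulate(message, key_interval_ms, cache_ttl_ms, poll_ms, idle_timeout_ms):
    """Return the fragments a polling bot would accept (all times in ms).

    Hypothetical model: the judge program creates one folder per character,
    and the bot treats `idle_timeout_ms` without new characters as end of input.
    """
    # Time at which each character's folder appears on the server.
    created = [(i * key_interval_ms, ch) for i, ch in enumerate(message)]
    fragments, buffer = [], []
    seen = 0              # characters the bot has already consumed
    visible = 0           # characters present in the (possibly stale) listing
    last_refresh = None   # when the client last fetched the real listing
    last_new_input = 0
    end = created[-1][0] + cache_ttl_ms + idle_timeout_ms + poll_ms
    now = 0
    while now <= end:
        # Client-side cache: only fetch the real listing once the old one expires.
        if last_refresh is None or now - last_refresh >= cache_ttl_ms:
            visible = sum(1 for t, _ in created if t <= now)
            last_refresh = now
        if visible > seen:
            buffer.extend(ch for _, ch in created[seen:visible])
            seen = visible
            last_new_input = now
        elif buffer and now - last_new_input >= idle_timeout_ms:
            # The bot gives up waiting and accepts what it has so far.
            fragments.append("".join(buffer))
            buffer = []
        now += poll_ms
    if buffer:
        fragments.append("".join(buffer))
    return fragments


if __name__ == "__main__":
    msg = "Open the pod bay doors"
    print(simulate(msg, 300, 0, 500, 2000))      # no caching
    print(simulate(msg, 300, 10000, 500, 2000))  # 10 second directory cache
```

With these made-up numbers the uncached run delivers the sentence in one piece, while the cached run accepts the first character on its own and then receives the rest as a burst ten seconds later.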

Anyway, it should be easy to reproduce the problem on a small LAN, and the caching can be deactivated via some registry tweaks as explained in the article. So hopefully this problem won’t happen again.

Posted: Nov 20, 2014

[ # 33 ]

Bruce Wilcox

Guru

Total posts: 2372

Joined: Jan 12, 2010

Normally I look forward to debugging the logs to see what I can improve. This year Rose’s logs are an unmitigated pile of carp. And similarly the online logs of the humans are meaningless to me. Rose may have been the unanimous pick of the judges, but I have no clue why. She could equally have been the unanimous fourth place pick.

Posted: Nov 20, 2014

[ # 34 ]

Dave Morton

Administrator

Total posts: 3111

Joined: Jun 14, 2010

Will, that’s an interesting and very plausible explanation as to what may have happened at the LC. The only questions I have at this point would be whether the computers used in the competition were running Windows Server 2k8, or if not, whether the OS used might be vulnerable to the same issue. I’ve looked through the Stack Overflow article and then peeked at my registry, and did not see the setting that was referred to in the answer to the question. So I did a little more digging. It seems that, though that setting does not exist, according to this article, Win 7 still uses SMB2 (well, 2.1, actually) and could well be vulnerable to the caching issue mentioned, though I’m not 100% sure. I suppose I could try to set up an experiment and see if I can detect a caching problem, though it would take me a few days to work out a script to detect such an issue. Lord knows I have enough computers on my network to try it out.
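
One possible shape for such a detection script, as a rough Python sketch and not anything that was actually run for the contest: one machine creates timestamped "probe" folders on the share, and the client under test polls the same share and reports how long each probe took to show up. The `probe_` naming scheme and the function names are hypothetical, and the measured delay is only meaningful if the two machines' clocks are reasonably in sync.

```python
import os
import time

PREFIX = "probe_"  # hypothetical naming scheme for the test folders


def create_probe(share):
    """Writer side: create a folder whose name records when it was made."""
    name = f"{PREFIX}{int(time.time() * 1000)}"
    os.mkdir(os.path.join(share, name))
    return name


def scan_once(share, seen):
    """Watcher side: return {name: delay_in_seconds} for probes not seen before."""
    now_ms = time.time() * 1000
    delays = {}
    for name in os.listdir(share):
        if name.startswith(PREFIX) and name not in seen:
            seen.add(name)
            created_ms = int(name[len(PREFIX):])
            delays[name] = (now_ms - created_ms) / 1000.0
    return delays


def watch(share, duration_s=60.0, poll_s=0.5):
    """Poll the share for `duration_s` seconds and collect visibility delays."""
    seen, results = set(), {}
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        results.update(scan_once(share, seen))
        time.sleep(poll_s)
    return results
```

On the writer, `create_probe` would be called every second or so against the shared folder; on the client, `watch` on the same share would show delays well above the polling interval if the listing is being cached.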

Posted: Nov 20, 2014

[ # 35 ]

Merlin

Guru

Total posts: 1081

Joined: Dec 17, 2010

As I said, debugging the LPP is not straightforward.

Makes me wonder if further effort in this contest is worth it.

Posted: Nov 20, 2014

[ # 36 ]

Don Patrick

Guru

Total posts: 1009

Joined: Jun 13, 2013

Will Rayer - Nov 18, 2014:
I would hate to have to code something using sockets, or TCP/IP, or serial comms or whatever. So please keep the LPP!

Will, you’re just saying that because you don’t want to go through it again. Consider the newcomers who didn’t have your amount of help. Transcripts too would be untangled with the use of carriage returns. The one thing I do like about the current LPP is its compatibility, so if I propose a new LPP it should be easier still or it wouldn’t be an improvement, now would it?

One more thing I wonder: Does this mean that the human confederates and judges really had the patience to wait 5 to 10 seconds for people to finish their sentences? Didn’t anybody comment on having this problem and suggest fixing it before proceeding?
I understand, Bruce, that it feels like a hollow victory. Still, I consider Rose a likely victor and as having the smarter language system.

Posted: Nov 20, 2014

[ # 37 ]

Steve Worswick

Administrator

Total posts: 2048

Joined: Jun 25, 2010

Bruce is indeed a worthy winner. All 4 bots were in the same situation. The way that ChatScript handles topics in a stricter manner than the AIML bots meant Rose was able to stay on topic (when she could find one from the garbage) and so provided more sensible responses.

I may be wrong, but being top in the qualifiers and being voted top bot by all 4 judges is a first in the Loebner Prize, and congratulations must go to Bruce again for that achievement too.

Posted: Nov 20, 2014

[ # 38 ]

Bruce Wilcox

Guru

Total posts: 2372

Joined: Jan 12, 2010

I’m not saying I’m turning down $4K, because I can certainly use it. And I thank you for your applause. I am just sad at the uselessness of the data.

Posted: Nov 21, 2014

[ # 39 ]

Rob Ellis

Member

Total posts: 22

Joined: Nov 11, 2014

Hi everyone. Long time watcher, first time poster.

It seems silly to use the LPP over something that is much more commonly used and supported like TCP/IP. Since I’m new to the scene, how do other contests compare? With TCP you could even have remote bots being submitted which could be interesting.

Posted: Nov 21, 2014

[ # 40 ]

Dave Morton

Administrator

Total posts: 3111

Joined: Jun 14, 2010

Hi, Rob, and welcome!

The problem with remote chatbots is that you can’t really guarantee that it’s not a human on the other end, and that negates the value and importance of the contest. That said, however, if the competition is conducted over a closed internal network, the use of TCP/IP (or even sockets, for that matter) shouldn’t be a problem.

Posted: Nov 21, 2014

[ # 41 ]

Steve Worswick

Administrator

Total posts: 2048

Joined: Jun 25, 2010

Dave Morton - Nov 20, 2014:
The only questions I have at this point would be whether the computers used in the competition were running Windows Server 2k8, or if not, whether the OS used might be vulnerable to the same issue.

The chatbot computers were running Windows 7 64-bit. I wasn’t allowed on the judge computers but would assume they were the same. I don’t think it’s the LPP that’s at fault. All things considered, and especially with Will’s link, everything points to the cached folder setting on Windows 7.

Posted: Nov 21, 2014

[ # 42 ]

Don Patrick

Guru

Total posts: 1009

Joined: Jun 13, 2013

That much is clear. Still it does not help matters that the LPP sends messages in bits and pieces in the first place. If messages were sent as whole sentences, then even a 10 second delay would have been no obstacle to anyone trying to understand the conversation.
Judges could then still use the trick of pressing enter to break up their sentences - ha-hah you got me -, and the imaginary AI could still use their superior understanding to deal with this, but there wouldn’t be as many technical difficulties with the interface itself as there have always been.
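
As a small illustration of the whole-sentence idea, here is a hedged Python sketch of the kind of buffering it implies: characters are collected and only handed on as a message once Enter arrives. The class name `LineAssembler` is made up for this example and is not part of the LPP or of any existing proposal.

```python
class LineAssembler:
    """Collect single characters and release them only as whole lines."""

    def __init__(self):
        self._pending = []

    def feed(self, ch):
        """Add one character; return a complete line on Enter, else None."""
        if ch in ("\r", "\n"):
            line = "".join(self._pending)
            self._pending = []
            return line or None  # ignore Enter on an empty line
        self._pending.append(ch)
        return None

    @property
    def pending(self):
        """Text typed so far that has not yet been sent."""
        return "".join(self._pending)


if __name__ == "__main__":
    assembler = LineAssembler()
    for ch in "Open the pod bay doors, HAL.\rWhat":
        message = assembler.feed(ch)
        if message is not None:
            print("send:", message)
    print("still typing:", assembler.pending)
```

With this kind of framing a delay in delivery would postpone a message but could not split it, which is the property being argued for here.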

Posted: Nov 21, 2014

[ # 43 ]

Will Rayer

Experienced member

Total posts: 94

Joined: Jun 13, 2013

Well I think Hugh’s point “none of the humans had problems with the fragmented input” is a valid one. IMO the LPP is valid and useful as everyone can use it and it only takes a day or so to code for it. To get past this current issue we need to:

1. Reproduce the problem on a LAN where the PCs use Windows 7.
2. Make the registry tweak on the clients and check the problem goes away.
3. (Optional) If you are coding an LPP client program, add code to check for Windows 7+ and check the registry setting. If the setting is not present, show a warning and optionally add the setting. It doesn’t need a reboot, according to the article.
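
Step 3 could look roughly like the Python sketch below. It rests on assumptions that should be checked against the article before use: that the relevant values are the SMB2 client redirector cache lifetimes (`DirectoryCacheLifetime`, `FileInfoCacheLifetime`, `FileNotFoundCacheLifetime`) under the `LanmanWorkstation\Parameters` key, and that the tweak is to set them to 0. The helper names are hypothetical, and the sketch only reads and warns; it does not write to the registry.

```python
import sys

try:
    import winreg  # only available on Windows
except ImportError:
    winreg = None

# Assumed location and names of the SMB2 client cache settings.
KEY_PATH = r"SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters"
VALUE_NAMES = (
    "DirectoryCacheLifetime",
    "FileInfoCacheLifetime",
    "FileNotFoundCacheLifetime",
)


def is_windows7_or_later():
    """True on Windows 7 (version 6.1) or newer, False elsewhere."""
    if not hasattr(sys, "getwindowsversion"):
        return False
    version = sys.getwindowsversion()
    return (version.major, version.minor) >= (6, 1)


def read_cache_settings():
    """Return {name: value or None if absent}, or None when not on Windows."""
    if winreg is None:
        return None
    settings = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
        for name in VALUE_NAMES:
            try:
                value, _value_type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                value = None  # value not present, so the default applies
            settings[name] = value
    return settings


def settings_needing_attention(settings):
    """List the values that are missing or not set to 0 (cache still active)."""
    return [name for name in VALUE_NAMES if settings.get(name) != 0]


if __name__ == "__main__":
    current = read_cache_settings()
    if current is None or not is_windows7_or_later():
        print("Not Windows 7+; nothing to check.")
    else:
        for name in settings_needing_attention(current):
            print(f"Warning: {name} is {current[name]!r}; folder caching may be active.")
```

A client program could run this check at startup and show the warning to the operator. Adding the setting automatically is left out here because writing under HKEY_LOCAL_MACHINE needs administrator rights.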

Posted: Nov 21, 2014

[ # 44 ]

Brian Rigsby

Experienced member

Total posts: 47

Joined: Mar 8, 2013

I believe it to be an assumption that none of the humans had difficulty with the fragmented input. We would all have trouble replying to fragmented sensory input. A full analysis of the human logs would be required to determine that. I think we have to be careful about going down a path where we explain away technical difficulties with this logic, and should instead steer towards an interface that provides clean input to both humans and the bots. If we do not correct this, perhaps next year we should all code our bots to spit fragmented, completely nonsensical responses back at the human judges to simulate technical difficulties. This would likely not only win the event but perhaps even unfairly pass the Turing test, making this contest a complete joke. Is this what we want? Trickery to win a contest and $100,000? Something to think about.

Likewise, what does allowed trickery from the judges really prove? Why do we bother trying to simulate/create an A.I. if the tests used to determine if the A.I. is thinking are nothing but trickery and garbage inputs? Everyone knows the famous “open the pod bay doors” dialog between Dave and HAL from 2001: A Space Odyssey. Would HAL still have been deemed sentient if the dialog went from this:

Dave Bowman: Hello, HAL. Do you read me, HAL?
HAL: Affirmative, Dave. I read you.
Dave Bowman: Open the pod bay doors, HAL.
HAL: I’m sorry, Dave. I’m afraid I can’t do that.
Dave Bowman: What’s the problem?
HAL: I think you know what the problem is just as well as I do.
Dave Bowman: What are you talking about, HAL?

... Yes, HAL is showing sentience

to this:

Dave Bowman: Hello, HAL. Do you re
HAL: I can’t understand you Dave
Dave Bowman: ad me, HAL?
HAL: I can’t understand you Dave
Dave Bowman: Op
HAL: What are you talking about Dave?
Dave Bowman: en the pod ba
HAL: I can’t understand you Dave
Dave Bowman: y doors, HAL.
HAL: I can’t understand you Dave
Dave Bowman: What’s the problem?
HAL: The problem is I can’t understand you Dave
Dave Bowman: What ar
HAL: I can’t understand you Dave
Dave Bowman: e you tal
HAL: I can’t understand you Dave
Dave Bowman: king about, HAL?
HAL: No I am not a king Dave

... Is HAL showing sentience? Who the heck knows, as HAL was not truly tested

Likewise, I wonder if HAL would have understood this typed input from Dave:

Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe. Azanmig huh?

HAL would have just as likely replied “I can’t understand you Dave” to that input, also proving nothing.

Come on people…when someday we have a first encounter situation with another life form how will we test them to see if they are sentient? Nonsensical and fragmented communication? I think not. I hope not.

Posted: Nov 21, 2014

[ # 45 ]

Dave Morton

Administrator

Total posts: 3111

Joined: Jun 14, 2010

Brian Rigsby - Nov 21, 2014:

Come on people…when someday we have a first encounter situation with another life form how will we test them to see if they are sentient? Nonsensical and fragmented communication? I think not. I hope not.

I think that, in that situation, there would likely be a smoking cinder where the Earth used to be.
