AI Zone: chatbots.org

My thoughts on the 2014 competition

Posted: Nov 17, 2014

Hugh Loebner

Member

Total posts: 29

Joined: Jul 7, 2009

I want to start out by offering my congratulations to the finalists, and especially Bruce for his victory.

I also want to thank Ed Keedwell for organizing the contest, the AISB for being so kind as to take the contest under its wing, and Bletchley for providing a location.

I’d like to explain, once again, the origin and raison d’être of the LPP. One year in the distant past the contest was to be held in my apartment. Originally, I had absolutely no desire to develop a comm. protocol or the comm. programs. I required each submitter to provide them. One entrant used telnet. Another used sockets. A third used TCP/IP. It was quite tedious setting up, but I succeeded, and the contest was held in my apartment without, I thought, any problems. Then, after the contest had been concluded, one of the humans looked at the transcript of her conversation, and said “That isn’t what I typed.”(!!!!) I disqualified that entrant.

It seems that the submitter’s protocol had garbled the human’s conversations. I decided that, of necessity, I would have to write the programs and decide on the comm. protocol.

Naturally, the person who used telnet was convinced telnet was the only way to go; the person who used TCP/IP was convinced TCP/IP was the only way to go; and the person who used sockets was convinced that was the only way to go.

I had no desire to waste my time learning the ins and outs of any of them. I wanted a simple (and yes, the LPP is extremely simple) means of communicating. I wanted a character by character interaction, purely for aesthetic reasons. I decided to let Bill Gates and his minions, who had developed 40 million lines of OS, do the heavy lifting. I considered using the file system, either as a drop box for information, or by their names. However, that led to problems of opening, closing, sharing, protections, etc. It seemed to me that using directory names would require minimal OS overhead.

Advantages of LPP.
1. Debugging programs is a snap. By simply looking at the comm directory, one can see, in real time, the interaction.
2. By (temporarily) eliminating the delete directory command, one would have a log of the interactions.
3. It favored no one. It was unique, so no submitter would have the advantage of familiarity with it. I am happy to say that everyone hated it.
4. And, most important of all, I understood it.

With regard to its implementation at Bletchley 2014:

I must confess that I did not observe any interactions, so I cannot speak to the existence of “bursts” of communications. I do know that there was some sort of problem with the webcasts.

However, no human, and no judge, seemed to have the slightest difficulty communicating using it.

If a human can deal with something, your program must do so also.

In any case, I can’t understand why there should be a problem with the timing of the characters, since when I used the protocol years ago things went very smoothly using more primitive hardware.

The bottom line is that the AISB will (I fervently hope) be in charge of the contest, and they are the people to whom to address your requests for change. I did not require that the LPP be used in the future, and there have been mutterings about changing the protocol at some point in the future.

Once again, though, congratulations, and heartfelt thanks to all.

Hugh

Posted: Nov 17, 2014

[ # 1 ]

Dave Morton

Administrator

Total posts: 3111

Joined: Jun 14, 2010

Thanks for the insights, Hugh. That goes a long way toward understanding the ‘whys’ behind the LPP.

Posted: Nov 17, 2014

[ # 2 ]

Christophe Finas

Experienced member

Total posts: 41

Joined: Sep 24, 2014

I can’t express my thanks enough, as I know how invasive, awkward and relentless I was during the event. I came from France to get the most out of the Loebner experience, and I found the people there very welcoming despite my persistent questions. And that goes for everyone, from the AISB staff to the Bletchley staff ... and of course yourself, Hugh. Thanks a lot.

Regarding the LPP:

Not being a PERL programmer myself (and I am sorry about that), I confess that working with the LPP made me scratch my head quite a few times and improved my “computer-insulting skills”, but it ended up being part of the fun.

I don’t think we can have a go at “making a super-human-bot” if we are stopped by these LPP trivialities.
It doesn’t take that long to learn it, make it work, and then go back to our main tasks.
If, like me, you do not have a PERL background, it may just take a few hours to customize it.

This doesn’t mean it is a perfect solution, or a good or bad one imho ; I believe you hit the most important point by mentioning that it is unique.

Being unique makes the Loebner contest maybe less accessible for newcomers, but still accessible to one hundred percent of the programmers really willing to be successful at the Loebner objective. So we won’t lose any “good guy” who wants the real thing done. Of course, the same could be said if the AISB decides to change this and adopts another exchange method.

As for the networking oddities observed lately, we’ll get around that too.
There are enough skills to have all that explained and addressed, and improved with the years.
I may sound like a natural optimist, but it is the way I feel.

I would think the best improvement would be to allow entrants and staff a little more time before the live event begins. This time could be used to test the live feed and the network behaviour on all platforms. And by that I mean that the entrants can reasonably do that, and would be very willing to, considering they want everything to go smoothly and visitors to get the best feed.

How do the previous entrants feel about that?

Posted: Nov 17, 2014

[ # 3 ]

Hugh Loebner

Member

Total posts: 29

Joined: Jul 7, 2009

You don’t need to know Perl. It’s the language I used for the Judge and Human programs, but all an A.I. program needs to do is detect, create, and delete directories. Almost any language can do that.
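
A minimal sketch of those three operations (detect, create, delete) in Python. The directory naming scheme here (a zero-padded sequence number, a dot, then a character label) and the function names are assumptions made for illustration; they are not the official LPP format, which this thread does not spell out.

```python
import os
import tempfile
from typing import List


def send_char(comm_dir: str, seq: int, label: str) -> str:
    """Create one sub-directory whose name carries a single character."""
    # Hypothetical naming scheme: "<sequence>.<label>", e.g. "000002.space".
    path = os.path.join(comm_dir, f"{seq:06d}.{label}")
    os.mkdir(path)
    return path


def receive_chars(comm_dir: str) -> List[str]:
    """Detect pending sub-directories, read their labels in order, delete them."""
    labels = []
    # Zero-padded sequence numbers make lexical order equal to send order.
    for name in sorted(os.listdir(comm_dir)):
        path = os.path.join(comm_dir, name)
        if not os.path.isdir(path):
            continue
        _, _, label = name.partition(".")
        labels.append(label)
        os.rmdir(path)  # consuming a character means deleting its directory
    return labels


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as comm:
        for i, label in enumerate(["h", "i", "space"]):
            send_char(comm, i, label)
        print(receive_chars(comm))
```

Only `os.listdir`, `os.mkdir` and `os.rmdir` are involved, which is why almost any language with basic file-system access could do the same.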

Posted: Nov 17, 2014

[ # 4 ]

Christophe Finas

Experienced member

Total posts: 41

Joined: Sep 24, 2014

Indeed.
There is a sample UI for judge and human that is available and is PERL-based.
But on the program end, the LPP itself relies on managing directories and the program just has to do that.
I conflated the UI with the real program requirement here.

Posted: Nov 17, 2014

[ # 5 ]

Don Patrick

Guru

Total posts: 1009

Joined: Jun 13, 2013

I do admire the LPP for its cross-platform simplicity, but the implementation has provided enough hitches to make entire rounds go to waste (e.g. last year’s complete disconnection with Bruce’s program provided a blank round, and I know this to be a side-effect of how the judge program works). I’m not just a programmer but also a comics artist who has helped organise comics conventions, so I understand the value of running a good show. Although theoretically AI should be able to do everything a human can, and such things would be possible with grammatical analyses, perhaps you can agree that these kinds of technical difficulties reduce the entertainment value of the event for the public.

I can understand that letter-by-letter display looks nicer. It does add a bit more “life” to the chat although I think most people could do without watching people backspace. I wonder how you would feel about a sentence-per-sentence transfer of messages, as long as the text is still being displayed on screen as if it were gradually typed. i.e. if the programs were to send a complete sentence at once, and then the LPP ‘types’ it out letter by letter as an animation.
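
A rough sketch of that idea, assuming the whole sentence has already arrived in one transfer and only the display is animated. The function names and the delay value are hypothetical, not part of any existing LPP program.

```python
import time
from typing import Iterator


def typed_frames(sentence: str) -> Iterator[str]:
    """Yield the successive prefixes a screen would show while 'typing'."""
    for end in range(1, len(sentence) + 1):
        yield sentence[:end]


def animate(sentence: str, delay_s: float = 0.05) -> None:
    """Redraw the line one character at a time to imitate live typing."""
    for frame in typed_frames(sentence):
        print("\r" + frame, end="", flush=True)
        time.sleep(delay_s)  # assumed per-character delay
    print()


if __name__ == "__main__":
    animate("Hello, judge.")
```

Because the transfer and the animation are separate, a network stall would at worst delay a whole sentence rather than split it.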

As for the toll on processing power that the LPP causes, I’m sure Perl has the equivalent of a rest() or pause() function to put in its main while() loop and reduce its searching from 120 times per second to 30 times per second.

Posted: Nov 17, 2014

[ # 6 ]

Hugh Loebner

Member

Total posts: 29

Joined: Jul 7, 2009

There is such a function. I have used it, and to the best of my knowledge the current programs use it. I don’t think that has anything to do with network delays. Assuming only one program per computer (judge or human comm program), the computer has nothing to do except scan the comm folder for new subdirectories and be interrupted when a key is pressed.

Posted: Nov 17, 2014

[ # 7 ]

Steve Worswick

Administrator

Total posts: 2048

Joined: Jun 25, 2010

The problem wasn’t detecting the directory names in the folder but simply that the folder was not being populated in real time. Characters were being sent in clumps and then a wait and then another clump. Because most of the bots use a rule of “if the judge doesn’t type anything for x amount of seconds, assume he has finished his message” this wait was splitting inputs.
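
To illustrate how a stall splits an input under that kind of rule, here is a small sketch. The 3-second threshold, the timestamps and the function name are made up for the example; each bot uses its own value of x.

```python
from typing import List, Optional, Tuple


def split_by_idle(events: List[Tuple[float, str]], idle_s: float) -> List[str]:
    """Group (arrival_time, char) events; a gap longer than idle_s ends a message."""
    messages: List[str] = []
    current = ""
    last_t: Optional[float] = None
    for t, ch in events:
        if last_t is not None and t - last_t > idle_s and current:
            messages.append(current)
            current = ""
        current += ch
        last_t = t
    if current:
        messages.append(current)
    return messages


# Steady delivery: one character every 0.2 s.
steady = [(i * 0.2, ch) for i, ch in enumerate("how are you")]

# Clumped delivery: the same text, but the second clump arrives 4 s later.
clumped = [(i * 0.01, ch) for i, ch in enumerate("how ")] + \
          [(4.0 + i * 0.01, ch) for i, ch in enumerate("are you")]

if __name__ == "__main__":
    print(split_by_idle(steady, idle_s=3.0))   # ['how are you']
    print(split_by_idle(clumped, idle_s=3.0))  # ['how ', 'are you']
```

The judge typed one sentence in both cases; only the arrival times differ, yet the bot sees two separate inputs in the clumped case.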

I would suggest next year that the bot program and judge program are run on the same computer to remove any network issues.

I agree with Hugh’s 4 advantages of using the LPP, and it was thanks to the ease of just looking at the comms directory that the issue was spotted and can hopefully be corrected for next year.

It certainly didn’t spoil the day for me though. It was great to meet up with everyone again and congratulations to both the finalists and contest management for running a great day. I’m already looking forward to next year.

Posted: Nov 17, 2014

[ # 8 ]

Don Patrick

Guru

Total posts: 1009

Joined: Jun 13, 2013

You’re right Hugh, the rest() fix is unrelated to this year’s network problems; it is an older and smaller problem that can cause (and during testing has caused) communication delays of up to one second, and makes computer fans run overtime. If a rest() call is already in there, then I’d just increase the microseconds parameter because humans don’t type at superspeed anyway. Even though the computer’s task is simple, it is scanning so often per second that it isn’t giving the processor many breaks to actually create the subdirectories. I’m just handing in solutions to problems that are at best trivially related to conversational AI.
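
A minimal sketch of a throttled scan loop, written in Python rather than Perl. The function names, the 30-scans-per-second default and the `max_polls` cap are assumptions for illustration; this is not the actual judge or human program.

```python
import os
import time
from typing import Callable, List, Set


def scan_once(comm_dir: str, seen: Set[str]) -> List[str]:
    """Return names that appeared since the previous scan and remember them."""
    new = sorted(set(os.listdir(comm_dir)) - seen)
    seen.update(new)
    return new


def poll(comm_dir: str, handle: Callable[[str], None],
         interval_s: float = 1 / 30, max_polls: int = 100) -> None:
    """Scan the comm directory at a fixed rate instead of as fast as possible."""
    seen: Set[str] = set()
    for _ in range(max_polls):
        for name in scan_once(comm_dir, seen):
            handle(name)
        # Sleeping hands the CPU back to the other programs between scans.
        time.sleep(interval_s)


if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryDirectory() as comm:
        os.mkdir(os.path.join(comm, "000000.h"))
        poll(comm, print, max_polls=3)
```

Raising `interval_s` trades a little latency for less CPU use; at 30 scans per second the worst-case added delay is about 33 ms per character.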

Anyway, I’ll put my faith in the AISB from here on. It is rare that I find an event organised in a professional manner, but apart from the network problem I was quite satisfied with the organisation this year, and I’m glad to hear it’s staying where it is.

Posted: Nov 17, 2014

[ # 9 ]

Merlin

Guru

Total posts: 1081

Joined: Dec 17, 2010

Hugh Loebner - Nov 17, 2014:

I can’t understand why there should be a problem with the timing of the characters, since when I used the protocol years ago things went very smoothly using more primitive hardware.

The bottom line is that the AISB will (I fervently hope) be in charge of the contest, and they are the people to whom to address your requests for change. I did not require that the LPP be used in the future, and there have been mutterings about changing the protocol at some point in the future.

Hugh,
thanks for starting the contest. I think it will be in good hands with the AISB.

Let me give the alternate developer perspective of the LPP.

For me, the LPP had been a barrier to entering the Loebner Prize competition. This has been the case for about 5 years. Only this year, after the hype around Eugene Goostman and a thread focused on the LPP, did I decide to take a crack.

Limitations of the LPP
- LPP is non-standard
Nothing I do for the LPP has any value for any other project. Like you, I had little desire to learn the ins and outs of a non-standard protocol that I can’t use for any other purpose. I found this to be a non-trivial task. I would say 60% of my effort this year was spent on creating and testing how the LPP and my interpreter interacted. This proved to be a problem for me, where a last minute change I made did not allow me to fully test my entry (and as luck would have it, broke the bot).

- Debugging the program is not a snap.
It is written in PERL and there is no technical support. I don’t do PERL.
To compete, I need to make sure I can run the Judge program, my protocol interface, and my chatbot interpreter and have them interact correctly in a single computer and network environment. There is no one to ask questions to or help identify problems. I didn’t know where the official benchmark judge program can be found, and no one running the contest seemed to be able to answer technical queries. The Chatbots.org thread did provide me with enough information to get something working.

- Every year, there seems to be some problem with the LPP
I have been watching this contest for years, and there have been many instances of the protocol getting in the way of a bot being at its best. I don’t mind the judge not thinking my bot is human, or getting beaten by better bots. But, glitches in the protocol prevent bots from showing their best and can make hundreds of man hours of work go up in smoke.

I suspect that this year’s problem is a result of adding in the ability to send out a real-time web feed. One of the problems I hit when trying to implement the polling protocol is that if you do not throttle the polling, you will max out the CPU and not leave time for the other programs to multitask. This could have caused the burst-like network problems.

I believe many of the problems could be eliminated if instead of “polling”, the bots and humans use the “carriage return” as the indication of a complete input. Hopefully, the AISB will be willing to make this change.
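
A small sketch of what carriage-return delimiting could look like on the receiving side. The function name and the backspace handling are assumptions for illustration, not a description of any existing contest software.

```python
from typing import Iterable, Iterator, List


def complete_lines(chars: Iterable[str]) -> Iterator[str]:
    """Yield one message each time a carriage return (or newline) arrives."""
    buffer: List[str] = []
    for ch in chars:
        if ch in ("\r", "\n"):
            if buffer:
                yield "".join(buffer)
                buffer = []
        elif ch == "\b":
            # Assumed convention: backspace removes the last typed character.
            if buffer:
                buffer.pop()
        else:
            buffer.append(ch)
    # Text without a terminating carriage return is not treated as complete.


if __name__ == "__main__":
    stream = "hello\rhow are yuo\b\bou\r"
    print(list(complete_lines(stream)))  # ['hello', 'how are you']
```

With an explicit end-of-message marker, pauses or bursts in delivery no longer decide where one input ends and the next begins.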

Posted: Nov 17, 2014

[ # 10 ]

Don Patrick

Guru

Total posts: 1009

Joined: Jun 13, 2013

Merlin - Nov 17, 2014:
One of the problems I hit when trying to implement the polling protocol is that if you do not throttle the polling, you will max out the CPU and not leave time for the other programs to multitask. This could have caused the burst-like network problems.

This is the same problem I mentioned, though I wouldn’t know if it has any particular effects on networks. I haven’t found any rest() calls or similar in the perl file to slow its infinite loop down, but I also just read that perl’s sleep() function only works with full seconds. So if we’re going to fix the CPU usage lag time, we’re going to need something other than perl, or indeed, simply a carriage return instead of a continuous loop.

Posted: Nov 17, 2014

[ # 11 ]

Merlin

Guru

Total posts: 1081

Joined: Dec 17, 2010

The problem gets bigger when you run everything on the same machine.
You could have 3 things polling; the LPP, the chatbot interface, and a web connection (if you are broadcasting).
Not to mention needing time for the chatbot interpreter itself.

If you go to:
http://www.kentm.co.uk/loebner2014/index.php?round=1&judge=1#base

the web page is still polling, making it impossible to try to scroll through the transcript.

Posted: Nov 17, 2014

[ # 12 ]

Denis Robert

Experienced member

Total posts: 92

Joined: Apr 24, 2012

Merlin - Nov 17, 2014:

If you go to:
http://www.kentm.co.uk/loebner2014/index.php?round=1&judge=1#base

the web page is still polling, making it impossible to try to scroll through the transcript.

You can see it, but you have to disable the JavaScript in your browser.

On IE:

Tools > Internet options > tab “security” > button “custom level” > “Scripting” + “active scripting” > radio button “disable”

Posted: Nov 17, 2014

[ # 13 ]

Merlin

Guru

Total posts: 1081

Joined: Dec 17, 2010

Yeah, with IE I lose the formatting, making it harder to tell who says what.
With Firefox the formatting is better, but they took out the button that makes it easy to turn JS on and off.
Opera is better, you get the formatting and you can turn JS off.

But, in the end I am hoping someone will just publish the transcripts without all the hoopla (or just republish the pages without the polling refresh JS).

Posted: Nov 18, 2014

[ # 14 ]

Don Patrick

Guru

Total posts: 1009

Joined: Jun 13, 2013

Okay, so that’s how that works. From the looks of things they’ve changed the broadcast format, presumably also the broadcasting program. While I didn’t have a broadcasting program running during my one-second-delay tests, if one was running on the judge computers and ran no-holds-barred just like the LPP program, then together they could have occupied all the computer’s resources and stalled it. From the looks of things the broadcast format is the only element that is different from previous years.

Now as to the logic that faster computers shouldn’t have any speed issues: If computers are twice as fast, then the LPP will be scanning twice as often. What an unrestrained while() loop does is push CPU usage to the ceiling, regardless of how high the ceiling is. It’s like revving a car that can go 300mph: You’ll still overheat the engine if you keep the pedal to the metal. Judging by Mitsuku’s transcript the problem didn’t come about until after 10 minutes. I’d say that’s the point at which the CPU overheated.

Thus Inspector Lestrade, we come to the conclusion that it must have been the broadcast program, with his accomplice the judge program, who murdered the transcripts in a heated rage. I would suggest the AISB look there.

Posted: Nov 18, 2014

[ # 15 ]

Don Patrick

Guru

Total posts: 1009

Joined: Jun 13, 2013

Since everyone is all over this now, I’ll just mail the AISB and ask if we can cooperate in designing and programming an LPP with fewer technical difficulties for next year.
