In the past I have had three judges score a response a 3-4 while one judge gave it a zero. That same judge then went on to give another bot a 4 for something that I thought was at best a vague response. That 7 point swing could make a difference in the results. Years ago only one judge scored your bot, and I hated it when I got the Russian judge.
To overcome this, in 2011 each judge scored all the responses, and the judges discussed why they scored them the way they did. In earlier years the top 10 bots automatically got in based on scores. Now the top 9 are automatic, and the judges discuss which bot should be the tenth.
I know Wendell is trying hard to make it a fair and well-run competition. Scoring human conversation is not easy, and I suggest everyone try it at least once; you will find the results interesting. Scoring the top and bottom bots is easy, but the ranking of those in the middle is often influenced by personal taste.