

New Annual Contest with $25000 First Prize
 
 

http://www.benzinga.com/press-releases/14/07/4733216/nuance-announces-the-winograd-schema-challenge-to-advance-artificial-in

Nuance Communications, Inc. (NASDAQ: NUAN) today announced an annual competition to develop programs that can solve the Winograd Schema Challenge, a test developed by Hector Levesque, Professor of Computer Science at the University of Toronto, and winner of the 2013 IJCAI Award for Research Excellence. Nuance announced the challenge at the 28th AAAI Conference in Quebec, Canada.

 

 
  [ # 1 ]

AWESOME! cheese I was hoping for a contest like this. I’m game.
Thanks for posting Andrew!

 

 
  [ # 2 ]

fta-

Sounds like an interesting contest… but I hope this example question is not representative of them all:

‘An example of a Winograd Schema question is the following:

“The trophy would not fit in the brown suitcase because it was too big.

What was too big?

Answer 0: the trophy or
Answer 1: the suitcase?”

A human who answers these questions correctly typically uses his abilities
in spatial reasoning, his knowledge about the typical sizes of objects,
and other types of common sense reasoning, to determine
the correct answer.’

lol- what is the “it” referring to in this example?!  The way it is worded, ‘it’ is the suitcase that is too big, and thus the sentence actually makes no sense (the suitcase was too big for the trophy… wat/huh?)!  Would it actually take a machine to get the “correct” answer (thus defeating the purpose of the contest)?

 

 
  [ # 3 ]

This is really just a restricted Turing Test. The computer’s answer is compared to a “phantom” human’s, i.e. what a presumed human would answer to a restricted type of question.

There are several problems with the premises of the contest: 1. They presume that a TT is a short conversation. 2. They presume that during a TT the judge will not ask questions requiring “intelligence.” 3. It does not permit questions of an a/v nature.

BTW, Carl, it is the trophy that’s too big. The sentence makes no sense if the suitcase is too big.

This is one reason that one really needs a human to whom to compare the computer.

 

 
  [ # 4 ]
Hugh Loebner - Jul 29, 2014:

BTW, Carl, it is the trophy that’s too big. The sentence makes no sense if the suitcase is too big.

This is one reason that one really needs a human to whom to compare the computer.

My point exactly. Does this then discern what someone was “thinking” when they answered the question, or just force the response to be a syntactic qualifier (due to a sentence with an “it” trailing multiple nouns)?


Judge: “The trophy would not fit in the brown suitcase because it was too big. What was too big?”
Bot: “What ‘what’ (‘it’) are you referring to (that ‘was too big’)- the suitcase (last stack noun) or the trophy (second-to-last stack noun)?”
Judge: “the trophy”
Bot: “OK, the trophy was too big.”
Judge: “doh!”

 

 

 
  [ # 5 ]

It’s a practical linguistic challenge. Of course a human would figure out the answer after a “huh?” and a re-read, but this ambiguity in language is something computers have difficulty with, and it would be jolly useful if solved. There is no judge to engage in conversation; the AI has to figure out the logic by itself. Replace the words “too big” with “too small” and you’ll see how the meaning of “it” changes even though the grammar and syntax stay the same. Since the grammar is a constant, the distinction must be made through other means, such as knowledge and comparison.
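To make that concrete, here is a toy sketch (my own illustration, not anything from the contest rules) of resolving “it” for the trophy/suitcase schema. The single piece of knowledge encoded is what “fit” means: a thing fails to fit when the contents are too big or the container is too small. Grammar alone can’t decide; this one fact does.

```python
def resolve_it(item, container, adjective):
    """Resolve 'it' in:
    '<item> would not fit in the <container> because it was too <adjective>'.
    Encodes one piece of common-sense knowledge about fitting:
    'too big' can only block the fit if it describes the contents,
    'too small' only if it describes the container."""
    candidates = {"big": item, "small": container}
    return candidates[adjective]

print(resolve_it("trophy", "suitcase", "big"))    # -> trophy
print(resolve_it("trophy", "suitcase", "small"))  # -> suitcase
```

Of course this only covers one schema; the hard part is that every schema needs a different bit of world knowledge, so the lookup table above would have to become something far more general.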

 

 
  [ # 6 ]

http://spectrum.ieee.org/automaton/robotics/artificial-intelligence/winograd-schemas-replace-turing-test-for-defining-humanlevel-artificial-intelligence

Here is another article about the new contest. It has a much better explanation of exactly how Winograd Schemas are formulated. It’s interesting to note that this test doesn’t really invalidate the principle of the Turing Test, it is just a lot less tolerant of the cheating that has characterised the Turing Test in practice.

 

 
  [ # 7 ]

The trophy would not fit in the brown suitcase because it was too big.
What was too big?

The trophy would not fit in the brown suitcase because it was too small.
What was too small?

Surely, these mean the same thing? The trophy is too big and the suitcase is too small.

 

 
  [ # 8 ]

My understanding is it’s a more complex set of questions a bot should be able to understand. E.g. if C = client and R = robot reply, you should see a conversation something like:

C: The trophy would not fit in the brown suitcase because it was too big.
R: OK, I got what you said.
C: What was too big?
R: The trophy was too big because it did not fit in the suitcase.
...
C: The trophy would not fit in the brown suitcase because it was too small.
R: OK, got that.
C: What was too small?
R: The suitcase was too small, otherwise it would have fitted the trophy.

So it’s a test of parsing sentences and understanding the input. Whether any bots are up to this standard is another question smile I haven’t tested it with Uberbot yet.

 

 
  [ # 9 ]

But surely, there’s no need to say the second variant of the sentence? All the information is given in the first sentence.

C: The trophy would not fit in the brown suitcase because it was too big.
R: OK, I got what you said.
C: What was too big?
R: The trophy was too big because it did not fit in the suitcase.
C: What was too small?
R: The suitcase was too small, otherwise it would have fitted the trophy.

In fact, you don’t even need the second part of the message:

The trophy would not fit in the brown suitcase

indicates that the trophy is too big to fit in.

 

 

 
  [ # 10 ]

I see, so you’re saying that “The trophy would not fit in the brown suitcase” would be sufficient for a program of some capability to deduce the answer to “What was too big/small?”. I hadn’t thought of it that way but that would be correct in this case. The object of the game however is just to identify what the word “it” refers to, and there are other questions whose two versions are not as related as being an opposite. Well observed though, there are different ways to answer different Winograd Schemas.

The lawyer asked the witness a question, but -he- was reluctant to [answer/repeat] it.

Winograd Schemas do not feature back-and-forth conversational interaction. It’s much like a high-school multiple-choice test, and all the program has to do is say A or B at each question. While this makes the program’s job easier, it also prohibits it from dodging it with a “human” answer, or faking it with an interpretable answer.
http://www.hlt.utdallas.edu/~vince/data/emnlp12/train-emnlp12.txt

With multiple-choice tests like this, the biggest problem will be guesswork. It is possible that programs that mindlessly guess their way to a 50% score will at first upstage the more serious attempts, and I am sure there will be some about.
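To put a number on the guesswork worry: on two-choice questions a coin-flipper’s score follows the binomial distribution, so with a large enough question set, lucky scores well above 50% become vanishingly rare. A quick sketch (the question counts below are made up for illustration):

```python
from math import comb

def p_guess_at_least(n_questions, n_correct):
    """Probability that pure 50/50 guessing answers at least
    n_correct out of n_questions two-choice schemas correctly."""
    favourable = sum(comb(n_questions, k)
                     for k in range(n_correct, n_questions + 1))
    return favourable / 2 ** n_questions

# Guessers hover around chance, but rarely far above it:
print(p_guess_at_least(60, 30))  # at least 50% on 60 questions: just over half
print(p_guess_at_least(60, 54))  # at least 90% on 60 questions: astronomically small
```

So a guesser may well hit 50%, but a scoring threshold comfortably above chance on enough questions would filter the coin-flippers out.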

 

 
  [ # 11 ]

- is say A or B at each question. While this makes the program’s job easier, it also prohibits it from dodging it with a “human” answer, or faking it with an interpretable answer.

Determine what each “it” refers to. Options: The saying of A or B, each question, the program, the job, the making things easier, the prohibiting, the dodging, an answer tongue rolleye

 

 
  [ # 12 ]

You missed one, Don. “Faking it” may also relate to the unmentioned (but indirectly alluded to) notion of understanding the subject matter/context. cheese

 

 
  [ # 13 ]

Well spotted smile. Not to mention the ever elusive proverbial “it”. Sometimes even I don’t know what I’m saying.

 

 
  [ # 14 ]

The Winograd Schema Challenge tests knowledge and common sense, but not intelligence. For me, knowing that a big enough suitcase can contain any trophy, or anything else, is only common sense.

The food in the refrigerator went bad because it was not plugged in.

This question, for example, is purely about knowledge. If you know that food can’t be plugged in, it is easy to figure out what “it” means.

However, the test is interesting, because it requires good syntactic analysis. I will see if my bot is able to process this type of question. Maybe I will try to enter this contest.

 

 

 
  [ # 15 ]

I agree that most of the Winograd Schemas can be solved by looking up two simple facts and weighing which is the more commonly/statistically true. However, one can in fact plug a potato into an outlet. It wouldn’t be pretty or functional, but to an open mind unhindered by human conventions, it is a definite possibility. Common knowledge, as “common sense” should be termed in my opinion, is still one of the greatest challenges in AI. Even if it’s not so intelligent in its application, just gathering or programming such a huge amount of common knowledge with flexible application would be a grand breakthrough.
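The “weigh which fact is more statistically true” idea can be sketched very simply: compare how often each candidate noun co-occurs with the predicate in some corpus, and pick the more frequent pairing. The counts below are invented placeholders standing in for real corpus statistics:

```python
# Made-up co-occurrence counts; a real system would derive these
# from a large text corpus or knowledge base.
cooccurrence = {
    ("refrigerator", "plugged in"): 120,
    ("food", "plugged in"): 2,
}

def pick_referent(candidates, predicate):
    """Choose the candidate noun most often seen with the predicate,
    treating unseen pairs as count zero."""
    return max(candidates, key=lambda c: cooccurrence.get((c, predicate), 0))

print(pick_referent(["food", "refrigerator"], "plugged in"))  # -> refrigerator
```

Which is exactly the point of the potato-in-the-outlet objection: this method yields the statistically typical answer, not a reasoned one, and the two only come apart in the unusual cases.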

Although I would certainly put a brake on the claims of intelligence that one might assume from yet another behavioristic test, I have to wonder at what point the comparison of probabilities of two bits of knowledge still differs from a process of reasoning.

 
