
Kaggle: 4th-grade exam contest

This October, Kaggle will launch a new contest inviting AI to take fourth-grade science tests (examples). With this they hope to stimulate the development of common-sense knowledge in AI. The prize is $50,000.

One example question already handled by Aristo, which seems to be using inference and/or a web lookup:

Growing thicker fur in winter helps some animals to:
A) hide from danger
B) attract a mate
C) find food
D) keep warm

Another professor suggested that the following would be a tougher test of common knowledge than a children’s exam.

Sally’s favorite cow died yesterday. The cow will probably be alive again
A) tomorrow;
B) within a week;
C) within a year;
D) within a few years;
E) the cow will never be alive again.

Although I think I could answer the first one, most of this is a little out of my league for now: both the format of the questions and the required cause-and-effect mechanisms, which I haven't started on. But whatever further challenges we're going to throw at AI, I think this is a good direction.


  [ # 1 ]

Sounds exciting. Aristo isn’t close to passing the fourth-grade science test, and the Kaggle competition is going for 8th grade level. Should be a good test.

The Allen Institute for Artificial Intelligence has some good resources, for AI types. I like their attempt to see if we can get an AI to handle 4th grade work before anything tougher. One of the biggest problems is getting a standardized format for question input/output.


  [ # 2 ]

The competition for 8th grade science is now in progress.

The Allen Institute for Artificial Intelligence (AI2) is working to improve humanity through fundamental advances in artificial intelligence. One critical but challenging problem in AI is to demonstrate the ability to consistently understand and correctly answer general questions about the world. 


  [ # 3 ]

170 teams entered this competition; the top score was 60%. From what I've gathered, many entries used statistical methods with word-association probabilities rather than reasoning their way through the questions. As such, the scores don't surprise me, and most people seem to agree that neural nets and the like just won't cut it, because the questions require precise interpretation.
A more rigorous approach, however, isn't something one could have built in the time between the contest's announcement and the submission deadline.


  [ # 4 ]

I took a crack at this competition and found exactly what Don indicated:
- Random choice gives 25% correct
- Statistical methods get you to the mid-30s (easy and quick to do, but limited success)
- The Lucene-based Wikipedia benchmark gets 43.25% (using all of Wikipedia)

I stopped when I realized that there was no easy way to get a passing grade on the test (to date, no one has scored over 60%). Some approaches need lots of data (Wikipedia and other sources), and there is no good training data that relates well to these questions.
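For anyone curious what a word-association baseline looks like, here is a minimal sketch. This is my own illustration of the general idea, not anyone's actual contest code: score each answer choice by how often its words co-occur with the question's words in a corpus (real entries drew on Wikipedia-scale text, not a four-sentence toy corpus).

```python
# A minimal word-association baseline for multiple-choice questions.
# Assumption: "statistical methods" here means co-occurrence counting;
# the toy corpus below stands in for Wikipedia-scale text.
CORPUS = [
    "thick fur keeps animals warm in cold winter weather",
    "camouflage lets prey hide from predators and danger",
    "bright feathers help birds attract a mate",
    "foraging is how creatures find food",
]

def score_choice(question, choice, corpus):
    """Count corpus sentences mentioning both a question word and a choice word."""
    q_words = set(question.lower().split())
    c_words = set(choice.lower().split())
    hits = 0
    for sentence in corpus:
        s_words = set(sentence.lower().split())
        if q_words & s_words and c_words & s_words:
            hits += 1
    return hits

def answer(question, choices, corpus=CORPUS):
    """Pick the choice whose words co-occur most often with the question's."""
    return max(choices, key=lambda c: score_choice(question, c, corpus))

question = "Growing thicker fur in winter helps some animals to"
choices = ["hide from danger", "attract a mate", "find food", "keep warm"]
print(answer(question, choices))  # picks "keep warm" on this toy corpus
```

Note the weakness: the overlap test counts stop words like "in" the same as content words like "fur", and nothing here interprets the question. That is exactly why this family of methods stalls in the 30s on the real exam, where precise interpretation matters.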

The best approaches would have taken longer than the time allowed and would require lots of resources and testing. I hear they may run the test again next year.

