OpenEphyra
 
 

This thread brings up the interesting task of answering questions such as: Who wrote “Moby Dick”? Who was the first president of the United States? etc.

At first I started trying to get my logic agent to handle these questions. Then, while reading about the architecture of Watson, I came across a citation to OpenEphyra. I guess IBM tried it in the early stages of the Watson effort, but it didn’t perform at championship level.

Although Open Ephyra is rather slow, it is able to answer questions such as “Who wrote ‘East of Eden’?” and “Who was the 42nd President?”.

In keeping with Watson’s approach, I think I’ll try to make an agent out of Open Ephyra, as well as continuing to try to make my logic agent handle these types of questions. From the paper on Watson’s architecture:

For the Jeopardy Challenge, we use more than 100 different techniques for analyzing natural language, identifying sources, finding and generating hypotheses, finding and scoring evidence, and merging and ranking hypotheses. What is far more important than any particular technique we use is how we combine them in DeepQA such that overlapping approaches can bring their strengths to bear and contribute to improvements in accuracy, confidence, or speed.

I have 29 agents so far :) And the controller to select among their responses is not very sophisticated. But at least I’m not using UIMA to wrap messages; its XML seems far too overengineered and verbose to me. I think simply passing natural language strings among agents is far easier, and it forces the programmer to handle natural-language constructs such as ambiguous delimiters at a high level in their programs’ interfaces, making the programs more independent (a user can interact with each agent directly without having to wrap queries in UIMA or anything else).
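To make the idea concrete, here is a rough sketch of the plain-string protocol: every agent takes the raw question string and returns an answer with a confidence, and the controller just keeps the most confident reply. The agent names, the placeholder lookups, and the max-confidence rule are all my own illustration; the post doesn’t describe the controller’s internals.

```python
def wiki_agent(question: str):
    """Answers only questions it recognizes; returns (answer, confidence)."""
    if question.lower().startswith("who wrote"):
        return ("Herman Melville", 0.9)   # placeholder lookup, not a real search
    return (None, 0.0)

def math_agent(question: str):
    if "integral" in question.lower():
        return ("x^3/3 + C", 0.8)         # placeholder
    return (None, 0.0)

AGENTS = [wiki_agent, math_agent]

def controller(question: str) -> str:
    """Broadcast the raw NL string to every agent; keep the most confident reply."""
    answers = [agent(question) for agent in AGENTS]
    best, score = max(answers, key=lambda pair: pair[1])
    return best if best is not None else "no answer"
```

The point of the sketch is that the whole interface is one natural-language string in and one out, so a user can also talk to each agent directly.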

Anyways, here’s the (heavily abbreviated) output of Open Ephyra:

Question: who was the 42nd president?

+++++ Analyzing question (2011-06-13 13:17:59) +++++
Normalization: who be the 42nd president

Answer types:
NEproperName->NEperson

Interpretations:
Property: IDENTITY
Target: 42nd president
Property: NAME
Target: 42nd president

Predicates:
-

+++++ Generating queries (2011-06-13 13:17:59) +++++
Query strings:
42nd president
(42nd OR “atomic number 60” OR neodymium) president
“42nd president” 42nd president
“42nd president” 42nd president
“was the 42nd president”
“the 42nd president was”

+++++ Searching (2011-06-13 13:17:59) +++++

+++++ Selecting Answers (2011-06-13 13:18:01) +++++
[...]

Answer:
[1]    Bill Clinton
      Score: 2.6550815
      Document: http://www.whitehouse.gov/about/presidents/williamjclinton

Question: Who wrote “East of Eden”?

+++++ Analyzing question (2011-06-13 13:18:33) +++++
Normalization: who write east of eden

Answer types:
NEproperName->NEperson

Interpretations:
Property: AUTHOR
Target: East of Eden

Predicates:
-

+++++ Generating queries (2011-06-13 13:18:33) +++++
Query strings:
wrote East Eden
(wrote OR indited OR pened OR penned OR composed) “East of Eden”
“East of Eden” wrote East Eden
“wrote East of Eden”

+++++ Searching (2011-06-13 13:18:33) +++++

+++++ Selecting Answers (2011-06-13 13:18:35) +++++
Filter “AnswerTypeFilter” started, 523 Results (2011-06-13 13:18:35)
[...]

Answer:
[1]    John Steinbeck
      Score: 2.3400908
      Document: http://www.chacha.com/question/who-wrote-east-of-edan

Question: Who starred in East of Eden?

+++++ Analyzing question (2011-06-13 13:18:47) +++++
Normalization: who star in east of eden

Answer types:
NEproperName->NEperson

Interpretations:
Property: ACTOR
Target: East of Eden

Predicates:
-

+++++ Generating queries (2011-06-13 13:18:47) +++++
Query strings:
[...]

Answer:
[1]    James Dean
      Score: 0.98565185
      Document: http://www.killermovies.com/e/eastofeden/articles/4248.html

Question: what would I use a knife for?

[...]

Answer:

—-

Note that it provides no response to the last question, I guess because it isn’t a factoid question. So another agent would have to handle that…

 

 
  [ # 1 ]

Robert, that’s good stuff! The only bit of information that I would have liked to see would be elapsed time for each volley. Any idea of roughly how long each one took?

 

 
  [ # 2 ]

Think frameworks and libraries .. multi-agent systems are built on frameworks using libraries ..

For example, distributed multi-agent systems might take advantage of SaaS, API, or Webhooks ..

A cloud based framework, for instance Google App Engine or Amazon AWS, might be used to integrate various and sundry APIs into a unified frontend ..

I believe that Apple Siri was made in this way ..

I would remind everyone reading to think about potential “Open Chatbot Standards” .. 

I’m particularly interested in XMPP as an open communication protocol among intelligent agents ..

- Marcus Endicott

 

 
  [ # 3 ]

Question: Who was the first President of the United States?
[...]
Answer: George Washington [about 10 seconds, two times]

Question: Who is the Secretary-General of the UN?
Answer: Ban Ki [about 15 seconds]

Question: Who wrote “King Lear”?
Answer: Shakespeare [about 8 seconds]

Question: Who wrote The Iliad and The Odyssey?
Answer: Robert Graves [heh. About 7.5 seconds]

Question: Who wrote The Iliad?
Answer: Virgil [heh. About 7 seconds]

Question: Who wrote The Odyssey?
Answer: Samuel Butler [heh. About 8 seconds]

Question: Who wrote “2001: A Space Odyssey”?
Answer: Clarke [about 10 seconds]

So it gets some of the answers wrong, and takes longer than my default timeout for agent responses. However, I have a way of responding with one agent while letting another agent continue to work on the problem, and outputting its answer to the user when it arrives ... An Open Ephyra agent will certainly put that mechanism to the test :)
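That answer-now, refine-later mechanism could look something like this minimal sketch: each agent runs in its own thread and drops replies into a shared queue, the first reply within the timeout goes to the user, and stragglers are delivered when they finish. The agent bodies and timings are stand-ins, not the actual subbot code.

```python
import queue
import threading
import time

def slow_agent(question, out):
    """Stands in for an OpenEphyra-style agent that may take many seconds."""
    time.sleep(0.2)                       # simulate a long search
    out.put(("slow-agent", "Bill Clinton"))

def fast_agent(question, out):
    """Replies immediately so the user isn't left waiting."""
    out.put(("fast-agent", "Let me think about that..."))

def ask(question, timeout=0.05):
    """Return the first reply within the timeout, then collect the late one."""
    out = queue.Queue()
    for agent in (fast_agent, slow_agent):
        threading.Thread(target=agent, args=(question, out), daemon=True).start()
    first = out.get(timeout=timeout)      # immediate response for the user
    late = out.get(timeout=1.0)           # delivered once the slow agent finishes
    return first, late
```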

Just brainstorming for a minute: I would like to make corrections when OpenEphyra gets an answer wrong. The way that would work is that another agent (for example, the logic agent) would store the correct response, and the score of OE’s response to that specific input would be lowered, so that in the future the same inquiry would not get the same (incorrect) response…
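A minimal sketch of that correction scheme, using the Iliad example from above; the dictionary storage and the penalty value are assumptions for illustration, not anything OpenEphyra or subbot actually implements.

```python
corrections = {}   # question -> correct answer, kept by e.g. the logic agent
penalties = {}     # (question, wrong answer) -> score penalty

def record_correction(question, wrong_answer, right_answer, penalty=10.0):
    """Store the right answer and penalize the wrong one for this question."""
    corrections[question] = right_answer
    penalties[(question, wrong_answer)] = penalty

def rescore(question, answer_text, raw_score):
    """Lower OE's score for an answer previously marked wrong."""
    return raw_score - penalties.get((question, answer_text), 0.0)

def answer(question, oe_answer, oe_score):
    """Prefer a stored correction; otherwise use OE's answer if it still scores above zero."""
    if question in corrections:
        return corrections[question]
    return oe_answer if rescore(question, oe_answer, oe_score) > 0 else None
```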

I just noticed that if you type “Who wrote The Iliad” into Google, it will come back with a “Best Guess” that correctly answers “Homer”. However, it doesn’t have a Best Guess if you ask “who wrote The Iliad and The Odyssey”, just the regular listing of web pages. Still, a Google Agent might be faster than Open Ephyra. You just have to parse the Google output yourself looking for the Best Guess; and if it changes in the future, make the bot figure out how to modify the HTML parsing :)
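A Google Agent’s extraction step might start out as simple as the sketch below. The “Best Guess” markup here is purely a guess of mine; Google’s real HTML is different and, as noted, changes over time, which is exactly why the parsing would need to be revisited.

```python
import re

def extract_best_guess(html: str):
    """Pull the answer out of a results page, if a 'Best Guess' line is present.

    Assumes the label appears as text followed by a colon and the answer,
    possibly wrapped in tags -- a hypothetical layout, not Google's actual one.
    """
    m = re.search(r"Best [Gg]uess[^:]*:\s*(?:<[^>]+>)*([^<\n]+)", html)
    return m.group(1).strip() if m else None
```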

 

 

 
  [ # 4 ]

Have you seen this one yet?  How to build your own “Watson Jr.” in your basement => http://tinyurl.com/623te8x ..

 

 
  [ # 5 ]

Thanks, Marcus. Maybe Jeopardy will start letting computers compete regularly :)

Watson uses UIMA extensively: ‘The “heart and soul” of IBM Watson is Unstructured Information Management Architecture’ (from the Tony Pearson blog linked to above). To me, UIMA seems overly complicated. For example, from the UIMA tutorial:

UIMA uses the concept of an Analysis Engine, which analyzes the data (possibly in conjunction with other Analysis Engines) and saves the information in a common analysis structure (CAS) object. Because the CAS object is a standard structure, any application that understands it, no matter what the platform or development environment, can use it.

Why do we need a structure for the data? Why not encode it in the way we’re encoding information here, in these forum posts, in natural language? That makes it human-readable, and removes the overhead of translating between natural language and some arbitrary structure that doesn’t have the benefit of thousands of years of evolution. In my humble opinion UIMA and other protocols for communicating between agents create too much of an “impedance mismatch”. The questions are in natural language, the documents are in natural language, the answers are in natural language, why not use natural language for the in-between steps too?

That’s my working hypothesis at any rate :)

 

 
  [ # 6 ]
Robert Mitchell - Jun 14, 2011:

Why do we need a structure for the data? Why not encode it in the way we’re encoding information here, in these forum posts, in natural language? That makes it human-readable, and removes the overhead of translating between natural language and some arbitrary structure that doesn’t have the benefit of thousands of years of evolution. In my humble opinion UIMA and other protocols for communicating between agents create too much of an “impedance mismatch”. The questions are in natural language, the documents are in natural language, the answers are in natural language, why not use natural language for the in-between steps too?

That’s my working hypothesis at any rate :)

The problem comes down to the fact that there are many ways to express the same information in natural language. And although we express information in the form of language, this isn’t the way our brains store and process that info. So I don’t think one can argue that language is an evolved storage mechanism. At any rate, the advantages that language offers the brain have more to do with the limitations of our specific brain hardware: the number of ideas we can process at a time and the ways in which they can be processed.

Computers are hardware with different limitations. Having a protocol that is more conducive to the way a computer searches/accesses/processes information would be useful. This being said, I’ve never looked at UIMA and won’t comment on how well it accomplishes this.

EDIT: After a cursory glance at UIMA’s Wikipedia page, I’m inclined to think it is designed more as an NL parsing tool for use with software that requires only certain unambiguous, structured input. Since equipping all such tools with extensive parsers of their own would be time-consuming, repetitive, and ancillary to what they are trying to accomplish, it is better to have UIMA do the work of structuring the information in an easily accessible way.

Perhaps it is insufficient for parsing information of great subtlety, but for “data such as repair logs and service notes [translated] into relational tables”, it is probably highly effective.

 

 
  [ # 7 ]
Marcus Endicott - Jun 13, 2011:

Have you seen this one yet?  How to build your own “Watson Jr.” in your basement => http://tinyurl.com/623te8x ..

Thanks for the link! I especially like the comparison of processing times depending on the number of cores—really emphasizes the power of parallelization.

I wonder how effective OpenNLP actually is in practice, when one is not limited to questions only of the form recognized by “Watson Jr”.

 

 
  [ # 8 ]
C R Hunt - Jun 14, 2011:

EDIT: After a cursory…

Did you guys finally get your “Edit” button? :)

 

 
  [ # 9 ]

Computers don’t store and process info the way UIMA does either.

My experience with over-engineered software like UIMA is that, invariably, it’s not designed flexibly enough to do what I want with it. I downloaded it and tried it out; although it worked on their examples, when I pasted parts of the tutorial text into their demo window, IIRC (it’s on a different computer) it didn’t find that “UIMA” was a named entity (or that “John” was a name in “John read a book”)... so I would have to dig around to find out why, and probably end up constructing a bunch of Java factories and consumers, etc. (I used to do Java, and chose to leave it for a reason :)

It’s not very hard to put a natural-language wrapper around a program. I use my MyAgent (subbot.org/myagent) framework to add basic natural-language processing to programs and scripts written by others (such as Google calculator and translator scripts written in PHP, a wiki lookup program written in Python, and WordNet, written in C).

So instead of being limited to “.wik encyclopedia” I can, at runtime, define new ways of invoking the wikipedia agent, such as “look up (.*) in wikipedia”, “wiki (.*)”, etc.
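The regex-to-method mapping behind this could be sketched roughly as below. The method name `wikipedia` and the synonym command come from the “show api” listing later in the thread; the actual MyAgent implementation is in Ruby and surely differs, so treat this as an illustrative guess.

```python
import re

class Agent:
    """Toy version of a MyAgent-style wrapper: patterns dispatch to methods."""

    def __init__(self):
        # Each entry maps a compiled regex to the method it invokes.
        self.api = [(re.compile(r"\.wik (.*)", re.I), self.wikipedia)]

    def wikipedia(self, term):
        return f"first sentence of the article on {term}"   # placeholder lookup

    def add_synonym(self, new_pattern, old_pattern):
        """Define a new invocation at runtime, as in
        'look up (.*) in wikipedia is a synonym for .wik (.*)'."""
        for pat, method in self.api:
            if pat.pattern == old_pattern:
                self.api.append((re.compile(new_pattern, re.I), method))
                return f"Okay, {new_pattern} will now call {method.__name__}."
        return "Unknown pattern."

    def handle(self, line):
        """Try each known pattern against the input line."""
        for pat, method in self.api:
            m = pat.match(line)
            if m:
                return method(*m.groups())
        return None
```

With this structure, “show api” is just a matter of printing `pat.pattern | method.__name__` for each entry.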

 

 
  [ # 10 ]
Dave Morton - Jun 14, 2011:
C R Hunt - Jun 14, 2011:

EDIT: After a cursory…

Did you guys finally get your “Edit” button? smile

As Jan put it,

Jan Bogaerts - Jun 14, 2011:

and long live the edit button!

 

 
  [ # 11 ]

Robert: As I said, I’ve never messed with UIMA and wouldn’t begin to speculate on how well it performs and where its limits lie. Just trying to point out that a non-NL protocol might work better than an NL-based data storage system because it could better take advantage of the way computers process/search data.

Out of curiosity, how well does MyAgent handle parsing wikipedia pages? What would you say is the success rate? Can you show some examples, or point me to a site with some examples? Just interested to see it at work. smile

 

 
  [ # 12 ]

C R Hunt: The wikiagent (subbot.org/wikiagent) uses the MyAgent (subbot.org/myagent) wrapper to add basic natural-language-like commands to the native commands of a Python module that I took from the irc bot Phenny (inamidst.com/phenny). The wikiagent looks up a term in wikipedia and returns the first sentence of the article. For example (also at subbot.org/wikiagent/lookup.txt):

> .wik encyclopedia
“An encyclopedia (also spelled encyclopaedia or encyclopædia) is a type of reference work, a compendium holding a summary of information from either all branches of knowledge or a particular branch of knowledge.” - http://en.wikipedia.org/wiki/Encyclopedia

> show api
add_synonym | (.*?) is a synonym for (.*)
save_synonyms | save syn
load_synonyms | load syn
add_pattern | add pattern (.*), (.*)
defining_method | >(.*)
restart | restart
show_api | show api
show_one_method_api | show (.*)’s api
wikipedia | who is (.*)

> show wikipedia’s api
wikipedia | who is (.*)
wikipedia | .wik (.*)

> look up (.*) in wikipedia is a synonym for .wik (.*)
Okay, (?i-mx:look up (.*) in wikipedia) has been added, and will now call wikipedia.

> Please, look up “chatbots” in wikipedia. Thanks!
“Chatterbot, a type of conversational agent, a computer program designed to simulate an intelligent conversation with one or more human users via auditory or textual methods.” - http://en.wikipedia.org/wiki/Chatbot

>

—-

“Show api” tells the wikiagent to list the mapping between the regexes it can handle and method names. More than one regex can be mapped to a method; for brevity, the “show api” command lists only one. “Show method_name’s api” lists all the regexes associated with method_name. Then I use “____ is a synonym for ____” to create a synonym for an existing regex.

I notice that the .wa (look up in Wolfram Alpha) command seems to be new since I last checked Phenny. I’ll have to make an agent out of that module too, so that I can create new ways of invoking it that are more intuitive than “.wa” :)

—-

In conclusion, the Phenny agent’s parsing of Wikipedia articles is limited to extraction of the first sentence. I would like to do far more with wikipedia, of course; this is only a start. Next step would be to extract more of the text and learn basic factoids such as (.*) is (.*), etc.
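That “(.*) is (.*)” factoid step might begin as a sketch like this, run on the kind of first sentence the wikiagent already returns. The regex and the article-stripping are my own simplification; real article text would need much more care.

```python
import re

def extract_is_a(sentence: str):
    """Return a (subject, description) pair from a simple 'X is Y' sentence,
    dropping leading/trailing articles and punctuation; None if no match."""
    m = re.match(r"(?:An?|The)?\s*(.+?)\s+is\s+(?:an?|the)?\s*(.+?)[.\"]*$", sentence)
    return (m.group(1), m.group(2)) if m else None
```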

 

 
  [ # 13 ]

Neat stuff. This reminds me of a course I took in undergrad that focused on how people interact with and experience technology. For our final assignment, one student developed a Python “know-it-all” bot that would look up and print the first sentence of a Wikipedia article of your choice, then look up an article linked within that sentence, and so forth. It was interesting to see where all the chaining of topics led. Question: does MyAgent care if you use quotes around the term you are searching for? It doesn’t seem necessary from the regex.

I wonder what Wolfram Alpha returns, since much of its output isn’t structured as NL sentences.

Mostly I was curious when you mentioned parsing Wikipedia, since building a bot that can do this is one of my goals. There exist Python programs that will extract Wikipedia text from the website; turning this raw text into accurate parse trees is another level entirely.

 

 
  [ # 14 ]

You might look into Wordnik also, you can extract (using their API) a number of things from their db including definitions, synonyms, antonyms, example usage sentences, etc.

 

 
  [ # 15 ]
C R Hunt - Jun 15, 2011:

I wonder what Wolfram Alpha returns, since much of its output isn’t structured as NL sentences.

True, and Wolfram Alpha often returns a lot more information than asked for.

The following dialog with subbot.org/waagent demonstrates some of the capability of Wolfram Alpha. It uses Sean Palmer’s calc.py module (included in http://inamidst.com/phenny), which isn’t perfect - it doesn’t handle quotation marks properly, and returns an error message (containing “precioussss” :) when WA appears not to understand the question but does return something. It does do algebra and calculus, however! (The quotation mark issue shouldn’t be difficult to fix, I’ve dealt with it before in other agents.)
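One simple way to deal with the quotation-mark issue is to normalize the question before handing it to calc.py: drop double quotes (curly or straight) and turn curly apostrophes into plain ones. Whether stripping them entirely is the right fix for WA is an assumption on my part.

```python
# Translation table: remove double quotes, normalize curly apostrophes.
QUOTE_MAP = str.maketrans({
    "\u201c": "",   # left curly double quote
    "\u201d": "",   # right curly double quote
    '"': "",        # straight double quote
    "\u2018": "'",  # left curly single quote
    "\u2019": "'",  # right curly single quote
})

def normalize(question: str) -> str:
    """Strip quote characters that confuse the calc.py query builder."""
    return question.translate(QUOTE_MAP)
```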

Full dialog at subbot.org/dialogs/wa.txt.

—-

> who wrote “Moby Dick”?
D->i->c->k->_;d->i->c->k->y;Dickens, Dickensian, Dickensians, Dickerson, Dickinson, Dickson(6 words)

> who wrote Moby Dick?
Moby Dick->author;Herman Melville;original title->Moby-Dick; or, the Whale., author->Herman Melville, first published->1851 (160 years ago), publisher->Richard Bentley (UK) -> Harper and Brothers (USA), original language->English

> who starred in East of Eden?
East of Eden->cast and roles;actor->character(s), Julie Harris->Abra, James Dean [...]

> who starred in Blue Velvet?
Blue Velvet->cast and roles;actor->character(s), Isabella Rossellini->Dorothy Vallens, Kyle MacLachlan->Jeffrey Beaumont, Dennis Hopper->Frank Booth, Laura Dern->Sandy Williams, [...]

> Who wrote East of Eden?
East of Eden->author;John Steinbeck;author->John Steinbeck

> who wrote the Iliad and The Odyssey?
Iliad, Odyssey->author;Iliad->Homer, Odyssey->Homer, [...]

> who was the second president of the United States?
United States->President->2nd;John Adams [...]

> when was Herman Melville born?
Herman Melville->date of birth;Sunday, August 1, 1819;08/01/1819 (month/day/year);191 years 10 months 16 days ago;10011 weeks 5 days ago;70082 days ago;191.88 years ago;213th day;30th week;(no official holidays or major observances);Birth of Herman Melville (author);sunrise->4:55 am LMT, sunset->7:15 pm LMT, duration of daylight->14 hours 21 minutes;waxing gibbous moon

> what is 90 degrees in radians?
convert 90°  (degrees) to radians;pi/2 radians;1 quad (quadrant);0.5 semicircles;0.25 rev (revolutions);plane angle;[angle]

> 2x = 10
2 x = 10;2 x-10 = 0;x = 5

> what books did Dostoyevsky write?
Couldn’t grab results from json stringified precioussss.

> who is the current Secretary of State of the United States?
Couldn’t grab results from json stringified precioussss.

> what is the integral of x^2
integral x^2 dx = x^3/3+constant

> what is the derivative of x^2?
d/dx(x^2) = 2 x;line

—-

It’s faster than OpenEphyra :) With a little post-processing, it should prove useful…
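As a first cut, that post-processing could just split the arrow/semicolon layout seen in the output above into key-value facts. The layout is taken from these samples only; real WA replies vary (and semicolons inside values, as in the Moby-Dick title, would break this naive split).

```python
def parse_wa(raw: str) -> dict:
    """Turn 'East of Eden->author;John Steinbeck;author->John Steinbeck'
    into a dict of key->value facts, skipping fields without an arrow."""
    facts = {}
    for field in raw.split(";"):
        if "->" in field:
            key, _, value = field.partition("->")
            facts[key.strip()] = value.strip()
    return facts
```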

 
