

WordNet definitions not meaningfully ordered
 
 

I’ve been experimenting with ChatScript’s WordNet database to develop a chat ‘AI’ which tries to ask the user further questions about subjects they mention, e.g.:

User: I like apples
AI: Does this mean you also like pears?
User: Yes
AI: So is it fair to assume you like all types of fruit?

The problem I have is that ChatScript’s WordNet database isn’t ordered according to common usage, so the system often ends up using one of the more obscure definitions.

eg. In ChatScript:

:word fruit

  Meanings:
  noun
  1: fruit~1 an amount of a product
    synonyms:  yield~2 *fruit~1
  2: fruit~2 the consequence of some effort or action
  3: fruit~3 the ripened reproductive body of a seed plant


... but the WordNet online search gives what I’d expect:

S: (n) fruit (the ripened reproductive body of a seed plant)
S: (n) yield, fruit (an amount of a product)
S: (n) fruit (the consequence of some effort or action) “he lived long enough to see the fruit of his policies”

Did this information get lost forever when WordNet was integrated with ChatScript, or is there still a way to obtain it?

 

 
  [ # 1 ]

I have altered the code to reverse the order of display, which is what you are seeking. This will apply to
:word
and
^define

 

 
  [ # 2 ]

Thanks - I initially thought that might be what was going on, but then I thought I’d discovered exceptions to that rule. Obviously not!

This seems rather unintuitive though.  Using ‘:word design’ as an example, I get 14 definitions.  design~7 is the most common noun definition.  Is it possible to obtain this in script without hardcoding ~7, i.e. in pseudo-code:

design~(design.num_nouns) ?

Verbs appear to always follow on from the nouns so I guess for those I’d need something like:

design~(design.num_nouns + design.num_verbs) ?

 

 
  [ # 3 ]

Currently there is no reason to believe you can write script to differentiate between the definitions, other than noun vs verb. So what are you really hoping to do?  When you want to create ontology, you need to know the specific synset involved, which may or may not be the most common one.

 

 
  [ # 4 ]

I’ve just noticed that going back to the first example I gave (of the word ‘fruit’), CS doesn’t seem to list them in reverse order.  WordNet gives:

(10) the ripened ... body
(3) an amount of a product
(1) consequence ...

(where frequency count is in brackets)

but CS gives:

an amount of a product
consequence ...
the ripened ... body

I see that the ontology data is the way to explore relationships between the words, but it would still be very useful to obtain the most common definition.  My application’s probably a little different to the usual as it’s for a robot - part chat, part learning and part information.  The most basic example where this would be useful is:

User: What does ‘pure’ mean?
Bot: Well, the word pure can mean a few things but the definition I’m most familiar with is ‘free of extraneous elements of any kind’.

Even better, if WordNet’s frequency count were also available then it might be plausible to throw out more unusual meanings and/or ask the user for clarification if the most common meanings rate similarly.

Is it possible to iterate backwards through the word’s data in script and search for the first noun/verb found in order to find the most common definition?  My apologies if this is an obvious question, I’ve only explored CS for a day so far.
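
To make the frequency-count idea concrete, here is a minimal Python sketch of what I have in mind. It is not ChatScript and not anything CS provides: the function name pick_senses and the two thresholds (min_share, tie_ratio) are my own placeholders. The counts are the WordNet ones for ‘fruit’ quoted above, listed in the order CS shows them.

```python
from typing import List, Tuple

Sense = Tuple[str, int]  # (gloss, WordNet frequency count)


def pick_senses(senses: List[Sense],
                min_share: float = 0.1,
                tie_ratio: float = 0.8) -> Tuple[List[Sense], bool]:
    """Rank senses by frequency, drop rare ones, and flag near-ties.

    min_share and tie_ratio are arbitrary illustrative thresholds.
    Returns (kept_senses, needs_clarification).
    """
    ranked = sorted(senses, key=lambda s: s[1], reverse=True)
    total = sum(count for _, count in ranked)
    if total == 0:
        # No frequency data at all: keep everything, ask if more than one.
        return ranked, len(ranked) > 1
    # Throw out meanings that account for too small a share of usage.
    kept = [s for s in ranked if s[1] / total >= min_share]
    # Ask the user when the runner-up is nearly as common as the top sense.
    needs_clarification = len(kept) > 1 and kept[1][1] >= tie_ratio * kept[0][1]
    return kept, needs_clarification


# Glosses in the order CS lists them, with WordNet's counts attached.
fruit = [
    ("an amount of a product", 3),
    ("the consequence of some effort or action", 1),
    ("the ripened reproductive body of a seed plant", 10),
]

kept, needs_clarification = pick_senses(fruit)
print(kept[0][0], needs_clarification)
```

With these thresholds the rarest meaning of ‘fruit’ is discarded and no clarifying question is needed, because the top sense clearly dominates. Two senses with similar counts would set the flag instead.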

 

 
  [ # 5 ]

^define gives you the definitions (currently in reverse order, but this will be fixed in the next release), so you could ^burst its output to get the most popular definition. You want the meanings in English, not something for script pattern matching.

 

 
  [ # 6 ]

OK, thanks for the pointers re. ^define and ^burst.

You responded so efficiently that I was still editing my last post; I thought I’d get away with it!  Since you probably missed it, I’ll run the risk of a cut and paste:

I’ve just noticed that in the first example I gave (of the word ‘fruit’), CS doesn’t seem to quite list them in reverse order.  WordNet gives:

(10) the ripened ... body
(3) an amount of a product
(1) consequence ...

(where frequency count is in brackets)

but CS gives:

an amount of a product
consequence ...
the ripened ... body

So the most common definition does come last, but the less common ones have been swapped.

Another example: for the word ‘instrument’, WordNet’s most common noun definition appears in CS as ~2 of 6 possible definitions.

 

 
  [ # 7 ]

I get them in the order read in from Wordnet files. And at this point I cannot afford to change that order.

 

 
  [ # 8 ]

OK, that’s fair enough.  Maybe I should investigate the possibility of using the SQL version of the WordNet database and get it from the horse’s mouth, so to speak.

If your main concern about re-ordering is backwards compatibility, a solution could be to augment your current word database with WordNet’s frequency count.  This would preserve the current ordering and provide the required info.  However I suspect your other concern is also about extra workload, and I can see that being non-trivial if the journey from WordNet to CS was one-way.

Actually this has given me a third idea - maybe I could write something to query the SQL database, try to match each definition with those found in DICT/BASIC, and then add the frequency count to that definition.
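
A rough Python sketch of that matching step, under some loud assumptions: the (gloss, count) rows are hardcoded here and merely stand in for whatever the SQL query would return (I have not checked the real schema), the helper names normalise and attach_counts are my own, and I am assuming the glosses match exactly once case and punctuation are ignored. The CS ordering is left untouched; the count is only attached alongside.

```python
import re
from typing import List, Optional, Tuple


def normalise(gloss: str) -> str:
    """Lowercase and collapse punctuation/whitespace so glosses compare cleanly."""
    return re.sub(r"[^a-z0-9]+", " ", gloss.lower()).strip()


def attach_counts(cs_glosses: List[str],
                  wordnet_rows: List[Tuple[str, int]]
                  ) -> List[Tuple[str, Optional[int]]]:
    """Pair each CS gloss (order preserved) with its WordNet count, or None."""
    counts = {normalise(gloss): count for gloss, count in wordnet_rows}
    return [(gloss, counts.get(normalise(gloss))) for gloss in cs_glosses]


# Glosses in CS order: fruit~1, fruit~2, fruit~3.
cs_fruit = [
    "an amount of a product",
    "the consequence of some effort or action",
    "the ripened reproductive body of a seed plant",
]

# Hypothetical query result: (gloss, frequency count) in WordNet order.
wordnet_fruit = [
    ("the ripened reproductive body of a seed plant", 10),
    ("an amount of a product", 3),
    ("the consequence of some effort or action", 1),
]

annotated = attach_counts(cs_fruit, wordnet_fruit)

# The most common sense is the matched gloss with the highest count.
matched = [(i, g, c) for i, (g, c) in enumerate(annotated) if c is not None]
index, gloss, count = max(matched, key=lambda row: row[2])
print(f"fruit~{index + 1}: {gloss} ({count})")
```

Any gloss that fails to match simply gets None rather than a guessed count, so mismatches between the two sources would be visible instead of silently wrong.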

 

 