Pseudo Parsing - Determining if a word is a noun
 
 

Ok, so I am having some challenges detecting if a word is a noun.  I have created a test pattern to illustrate different results with different methods of testing if a word is a noun:

u: (is {a} _*1 a noun) ^noerase() ^repeat()
  if (_0 ? ~NOUN) {1. yes \n} else {1. no \n}
  if (_0 ? ~NOUNS) {2. yes \n} else {2. no \n}
  if (_0 ? ~noun) {3. yes \n} else {3. no \n}
  if (_0 ? ~nouns) {4. yes \n} else {4. no \n}
  if (^HASPROPERTY(_0 NOUN)) {5. yes} else {5. no}

results:

alaric:_> is a corner a noun?
HARRY:_ 1. yes
2. no
3. yes
4. no
5. yes
alaric:_> is a car a noun?
HARRY:_ 1. yes
2. yes
3. yes
4. yes
5. yes
alaric:_> is driving a noun?
HARRY:_ 1. no
2. no
3. no
4. no
5. yes

Why are the results different?  Are the concepts built from the WordNet dictionary definitions or are they different?  Where are the concepts defined that are created when you run :build0?
What is the difference between ~NOUN and ~NOUNS? 

I know that you said Part of Speech (POS) tagging was a work in progress.  I tried :SHOW POS and it showed some interesting tags but not a tag for each word in the sentence.  I would like to try “parsing” a sentence to build rules when the input sentence contains “If” and “then”.  I have the logic to burst and loop through each word but I am getting some inconsistent results when testing and flagging nouns as shown above.  What is the best way to tell if a word in a variable might be a noun (nouns can be used as adjectives so I realize that context will need to be considered eventually)? 

Thanks.

 

 
  [ # 1 ]

Here is more output illustrating my problem:

alaric:_> I went to the store today on the corner in the historic district.
HARRY:_ This sentence has 13 words.
I (?)  went (verb)  to (preposition)  the (article)  store (verb)  today (adverb)  on (preposition)  the (article)  corner (?)  in (preposition)  the (article)  historic (adjective)  district (?)
alaric:_> is district a noun?
HARRY:_ 1. yes
2. yes
3. yes
4. yes
5. yes

 

 
  [ # 2 ]

Here is the code to generate the POS tagging; I created my own concepts for articles and prepositions:

concept: ~myarticles(a an the)
concept: ~myprepositions(to for of with from at in on toward above below before after under beneath over through by)

topic: ~myparsing (parse parsing)

t: ^noerase() ^repeat() I like to parse sentences.

u: (is {a} _*1 a noun) ^noerase() ^repeat()
  if (_0 ? ~NOUN) {1. yes \n} else {1. no \n}
  if (_0 ? ~NOUNS) {2. yes \n} else {2. no \n}
  if (_0 ? ~noun) {3. yes \n} else {3. no \n}
  if (_0 ? ~nouns) {4. yes \n} else {4. no \n}
  if (^HASPROPERTY(_0 NOUN)) {5. yes} else {5. no}

s: (_*)  ^noerase() ^repeat()
  @1 = ^burst('_0)
    $sentencelength = ^length(@1)
  This sentence has $sentencelength words. \n
  $word = ^FIRST(@1subject)
  $count = 0
  loop($sentencelength)
  {
    $count = $count + 1
    If ($count != $sentencelength) {$nextword = ^FIRST(@1subject)} else {$nextword = _ }
    if($word ? ~pronoun) {$word (pronoun)}
    else if($word ? ~noun) {$word (noun)}
    else if($word ? ~myarticles) {$word (article)}
    else if($word ? ~myprepositions) {$word (preposition)}
    else if($word ? ~adjectives) {$word (adjective)}
    else if($word ? ~verbs) {$word (verb)}
    else if($word ? ~adverbs) {$word (adverb)}
    #else if(^ENDSWITH($word ly) == true) {$word (adverb)}
    else if($previousword ? ~verbs) {$word (adverb)}
    else {$word (?)}
    $previousword = $word
    $word = $nextword  
  }

 

 
  [ # 3 ]

Now turning my attention to these questions…

1. ~noun is the “correct” set to use for nouns, being based on either POS tagging or what the dictionary has on a word.
2. ~nouns is from a private hierarchy I was using in an application.
3. All concept set names are treated as lower case, so ~noun and ~NOUN both refer to ~noun.
4. The concepts created when you run :build0 are defined in TOPIC/keywords0.*
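
To make point 3 concrete, here is a small Python model of a lookup that lowercases the concept name before using it. It is not ChatScript engine code. The `CONCEPTS` table, its word lists, and the `is_member` helper are made up for this sketch, with the word lists chosen only so that the outcome mirrors the "corner" results posted above.

```python
# Toy model of case-insensitive concept names; not the ChatScript engine.
# The word lists below are invented for illustration.
CONCEPTS = {
    "~noun": {"corner", "car", "district"},
    "~nouns": {"car", "district"},
}


def is_member(word: str, concept: str) -> bool:
    """Return True if word is in the named concept; the name is lowercased first."""
    members = CONCEPTS.get(concept.lower(), set())
    return word.lower() in members


print(is_member("corner", "~NOUN"))   # looks up the same set as ~noun
print(is_member("corner", "~NOUNS"))  # looks up a different set, ~nouns
```

Under this model, tests 1 and 3 in the original pattern always agree with each other, as do tests 2 and 4, because each pair names the same set.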

 

 
  [ # 4 ]

Then there’s the question of driving…..
It is a gerund (a verb acting like a noun).  The system will label it as a NOUN_GERUND, but not consider it a noun member of ~noun.  It is considered a member of ~verb.  It’s an interesting question how to label it. I could be argued into making it be considered a noun, not a verb.

 

 
  [ # 5 ]

Under current house rules, you could check for ~noun_bits to detect anything the pos tagger considers an actual noun or a gerund.  NOUN_BITS = NOUN_SINGULAR | NOUN_PLURAL | NOUN_PROPER_SINGULAR | NOUN_PROPER_PLURAL | NOUN_GERUND | NOUN_CARDINAL | NOUN_ORDINAL .
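
The NOUN_BITS definition is an ordinary bitmask union, which can be sketched in Python as follows. This is only an illustration of the idea: the flag names come from the list above, but the bit positions, the `VERB_PRESENT` flag, and the `is_noun_like` helper are assumptions for the example and are not taken from the ChatScript source.

```python
from enum import IntFlag


class Pos(IntFlag):
    # Bit positions are arbitrary here; the real engine defines its own values.
    NOUN_SINGULAR = 1 << 0
    NOUN_PLURAL = 1 << 1
    NOUN_PROPER_SINGULAR = 1 << 2
    NOUN_PROPER_PLURAL = 1 << 3
    NOUN_GERUND = 1 << 4
    NOUN_CARDINAL = 1 << 5
    NOUN_ORDINAL = 1 << 6
    VERB_PRESENT = 1 << 7  # hypothetical non-noun flag, for contrast only


# Union of every noun-like flag, mirroring the NOUN_BITS definition above.
NOUN_BITS = (
    Pos.NOUN_SINGULAR | Pos.NOUN_PLURAL
    | Pos.NOUN_PROPER_SINGULAR | Pos.NOUN_PROPER_PLURAL
    | Pos.NOUN_GERUND | Pos.NOUN_CARDINAL | Pos.NOUN_ORDINAL
)


def is_noun_like(tags: Pos) -> bool:
    """True if any noun-like bit is set in the word's tag bits."""
    return bool(tags & NOUN_BITS)


print(is_noun_like(Pos.NOUN_GERUND))   # True: gerunds are covered by the mask
print(is_noun_like(Pos.VERB_PRESENT))  # False: no noun-like bit is set
```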

 

 
  [ # 6 ]

ChatScript 2.97, now released, treats gerunds and “to infinitives” as ~noun.

 

 
  [ # 7 ]

I’m playing with the tagger. An example sentence
“How well does this system understand me?”

If my responder has _{~VERB} _{~NOUN},
then both variables are matched to “well”. 
Getting the word to match accurately as a verb is one thing, but in the meantime, is there a way I can prevent ~noun from matching “well” if “well” has already matched ~verb?
It seems ^unmark and ^retry can help…

Thanks

 

 
  [ # 8 ]

I never seem to get the time to upgrade the tagger/parser world. I keep hoping to.

Any responder that has {~verb} {~noun} would not match “well” as a noun if it has already matched it as a verb, because the pattern says to find a noun immediately after the matching verb. So if it did, that would be a ChatScript engine bug.
This pattern is spurious, of course, because it says match ANYTHING, since both terms are optional.  So, in theory, if it finds a verb it will note that. And then if it can find a noun immediately afterward it will note that. If noting fails for either, it notes nothing. If it can find no verb, but can find a noun, it will note that.

 

 

 
  [ # 9 ]

Me again oversimplifying.
The responder is more complex:
( how _[ ~AUX_VERB ~ADVERB] * _{ I you } << _{ ~VERB } _{ ~DETERMINER } _{ ~NOUN }>> )

 

 
  [ # 10 ]

Finding a verb, a determiner, and a noun in any order… why would you NOT expect that the same word could match both verb and noun? Any order means it doesn’t care about what came before.
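
The point about any-order matching can be illustrated with a toy Python model. It is not ChatScript's matcher: the `TAGS` table (which concepts each word belongs to) and the `match_any_order` function are assumptions made up for this sketch. Each concept is searched for independently across the whole sentence, so an ambiguous word such as "well" can satisfy more than one of them. The optional `distinct` flag shows what changes if a word already claimed by one concept is skipped, which is roughly the effect being asked for.

```python
# Toy model of any-order matching; not ChatScript's matcher.
# TAGS maps each word to a hypothetical set of concepts it belongs to.
TAGS = {
    "how": {"~adverb"},
    "well": {"~adverb", "~verb", "~noun"},
    "does": {"~aux_verb", "~verb"},
    "this": {"~determiner"},
    "system": {"~noun"},
    "understand": {"~verb"},
    "me": {"~pronoun"},
}


def match_any_order(words, concepts, distinct=False):
    """Find, for each concept, the first word that carries it.

    With distinct=False every concept is searched independently, so one
    word may be returned for several concepts. With distinct=True a word
    already claimed by an earlier concept is skipped.
    """
    used = set()
    result = {}
    for concept in concepts:
        for i, word in enumerate(words):
            if distinct and i in used:
                continue
            if concept in TAGS.get(word, set()):
                result[concept] = word
                used.add(i)
                break
    return result


sentence = "how well does this system understand me".split()
wanted = ["~verb", "~determiner", "~noun"]
print(match_any_order(sentence, wanted))
print(match_any_order(sentence, wanted, distinct=True))
```

In the first call "well" is reported for both ~verb and ~noun; in the second, ~noun falls through to "system" because "well" was already taken.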

 

 