Help in POS tagging
 
 

I downloaded some code posted by Schenk for POS-tagging a sentence using burst, but most of the POS concepts listed in the system variables manual don’t seem to work. Is this because of a version change, or has something else changed in POS tagging?

It works for most of them if I append “list” to the concept name, but I’m not sure how to handle pronouns and plurals. I have tried ~pronoun, ~pronoun_bits and ~pronoun_subject/object, and also ~noun_plural and ~plural.

 

 
  [ # 1 ]

Insufficient data. Start by doing

:prepare My dog is green

and give me the trace from that.

 

 
  [ # 2 ]

“My dog is green” does not contain any pronouns, numbers or plurals. Like I said, concept names like ~noun or ~adjective don’t match, but ~nounlist or ~adjectivelist do. Here is the script I am using to match POS tags. It runs inside a loop. The first three tests don’t match what they should. The last three work by comparing against variables assigned from rule matches of ~mainsubject and so on. ~prowords is a custom concept of pronouns, but I don’t understand why the built-in POS concepts don’t work. Walking a burst seems the best way to traverse a sentence and assign POS values to its words, better than using rule matches and retries. I guess I only need a working concept for plurals and numbers.

  if ( $$NextWord ? ~pronoun) { $$Output = ^join( $$Output _ \( pronoun \) ) }
  if ( $$NextWord ? ~plural) { $$Output = ^join( $$Output _ \( plural \) ) }
  if ( $$NextWord ? ~number ) { $$Output = ^join( $$Output _ \( number \) ) }
  if ( $$NextWord ? ~determinerlist ) { $$Output = ^join( $$Output _ \( determiner \) ) }
  if ( $$NextWord ? ~nounlist ) { $$Output = ^join( $$Output _ \( noun \) ) } 
  if ( $$NextWord ? ~prowords ) { $$Output = ^join( $$Output _ \( pronoun \) ) }
  if ( $$NextWord ? ~verblist ) { $$Output = ^join( $$Output _ \( verb \) ) }
  if ( $$NextWord ? ~auxverblist ) { $$Output = ^join( $$Output _ \( auxverb \) ) } 
  if ( $$NextWord ? ~adjectivelist ) { $$Output = ^join( $$Output _ \( adjective \) ) }
  if ( $$NextWord ? ~possess ) { $$Output = ^join( $$Output _ \( possessive \) ) }
  if ( $$NextWord ? ~adverblist ) { $$Output = ^join( $$Output _ \( adverb \) ) } 
  if ( $$NextWord ? ~extensions ) { $$Output = ^join( $$Output _ \( conjunct \) ) }
  if ( $$NextWord ? ~prepositionlist ) { $$Output = ^join( $$Output _ \( prep \) ) } 
  if ( $$NextWord = $$mainsubject ) { $$Output = ^join( $$Output _ \{ MAINSUBJECT \} ) }
  if ( $$NextWord = $$mainverb ) { $$Output = ^join( $$Output _ \{ MAINVERB \} ) }
  if ( $$NextWord = $$mainobject ) { $$Output = ^join( $$Output _ \{ MAINOBJECT \} ) }
  $$Output \n

 

 
  [ # 3 ]

Here are the traces for “My dog is green”, which tags well in my script posted above, and “I have three books to give him”, which doesn’t.

Command: :prepare my dog is green
TokenControl: DO_NUMBER_MERGE DO_PROPERNAME_MERGE DO_DATE_MERGE DO_SPELLCHECK DO_PARSE


Original User Input: my dog is green
Tokenized into: my dog is green
Actual used input: my dog is green

Xref: 1:my >2   2:dog   3:is c4   4:green  
Fragments: 1:my   2:dog   3:is   4:green  
Tagged POS 4 words: my/I (Pronoun_possessive)  dog (MAINSUBJECT Noun_singular)  is/be (MAINVERB Verb_present_3ps)  green (SUBJECT_COMPLEMENT Adjective_normal) 
  MainSentence: Subj: [ my] dog   Verb: is   Compl: green PRESENT


Concepts:

1: my (raw):  +~pronoun_possessive +~possessive_bits +~kindergarten +my +~possess //
1: I (canonical):  +I +~prowords +~vowels +~omnivore +~letters // 

2: dog (raw):  +~noun +~noun_singular +~singular +~normal_noun_bits +~noun_bits +~kindergarten +~mainsubject
.  +dog +~soundmaker +~moving_generic +~animate_move_verbs +~use_movement +~animate_verbs
.  +~verblist +~active_verbs +~use_intentionverbs +~carnivore +~pet_animals +~pet_store +~store_type
.  +~animals +~rideable +~functions +~eatable +~burnable +~beings +~tool +~animate_thing +~objects +~nounlist
.  +~animals_generic +~animal_kingdoms +~mammals +~stronggoodness +~goodness +dog~n +being~1 +~nounroot //
2: dog (canonical):  // 

3: is (raw):  +~verb_present_3ps +~verb_bits +~verb +~kindergarten +~mainverb +is +~linkingverb +~auxverblist
.  +~wordnetpropogate +~equals //
3: be (canonical):  +be +~tobe +~be_verbs +~states_of_being +~static_verbs +~usefulfactverb // 

4: green (raw):  +~adjective +~adjective_normal +~kindergarten +~subjectcomplement +~sentenceend +green
.  +~green +~colors +~color_adjectives +~esthetic_adjectives +~physical_properties_adjectives
.  +~adjectivelist +~goodness //
4: green (canonical):  // 

Sequences:
After parse TokenFlags: PRESENT USERINPUT

 


Command: :prepare i have three books to give him
TokenControl: DO_NUMBER_MERGE DO_PROPERNAME_MERGE DO_DATE_MERGE DO_SPELLCHECK DO_PARSE


Original User Input: i have three books to give him
Tokenized into: I have three books to give him
Actual used input: I have three books to give him

Xref: 1:I   2:have o4   3:three >4   4:books   5:to >6   6:give >2   7:him  
Fragments: 1:I   2:have   3:three   4:books   5:to v1   6:give v1   7:him v1  
Tagged POS 7 words: I (MAINSUBJECT Pronoun_subject)  have (MAINVERB Verb_present)  three/3 (Adjective_number)  books/book (MAINOBJECT Noun_plural)  to (<Verbal To_infinitive)  give (ADVERBIAL VERB2 Noun_infinitive)  him/he (OBJECT2 Pronoun_object Verbal>) 
  MainSentence: Subj: I   Verb: have [v1]    Obj: [ three] books PRESENT
Verbal 1:  Verb: [to] give


Concepts:

1: I (raw):  +~pronoun +~pronoun_subject +~pronoun_bits +~kindergarten +~mainsubject +I +~prowords +~vowels
.  +~omnivore +~letters //
1: I (canonical):  // 

2: have (raw):  +~verb_present +~verb_bits +~verb +~kindergarten +~grade3_4 +~mainverb +have
.  +~causal_to_infinitive_verbs +~misc_parsedata +~auxverblist +~mental_states +~static_verbs +~own +~possess
.  +~possession_verbs +~social_verbs +~animate_verbs +~verblist +~active_verbs +~use_intentionverbs
.  +~do_with_titles //
2: have (canonical):  // 

3: three (raw):  +~adjective +~adjective_number +~kindergarten +~daynumber +~number +~timebasedreference
.  +three //
3: 3 (canonical):  +3 +~month_names_index // 

4: books (raw):  +~noun +~noun_plural +~plural +~normal_noun_bits +~noun_bits +~kindergarten +~mainobject +books //
4: book (canonical):  +book +~tool +~thrift_shop +~store_type +~flammable +~functions +~classroom +~bookmagazine
.  +~reading_stuff +~entertainment_stuff +~artifacts +~objects +~nounlist +~human_data +~book_store +~surfaces
.  +~stronggoodness +~goodness +book~n +~nounroot // 

5: to (raw):  +~lowercase_title +~to_infinitive +~kindergarten +~locationword +~locatedentity +~there
.  +~verbal(5-7) +to +~directionpreposition +~spacepreposition +~prepositionlist +~focus +~directions //
5: to (canonical):  // 

6: give (raw):  +~noun_infinitive +~verb +~kindergarten +~verb2(5-6) +give +~give +~acquiring_verbs +~gain_verbs
.  +~possession_verbs +~social_verbs +~animate_verbs +~verblist +~active_verbs +~use_intentionverbs
.  +~acquire_imperatives +~stronggoodness +~goodness //
6: give (canonical):  // 

7: him (raw):  +~pronoun +~pronoun_object +~pronoun_bits +~kindergarten +~object2 +~sentenceend +him
.  +~prowords //
7: he (canonical):  +he // 

Sequences:
After parse TokenFlags: PRESENT USERINPUT

 

 
  [ # 4 ]

I’m guessing (since your script isn’t complete) that your loop looks something like this:

u: (_*1)  $$NextWord = _0
  if ( $$NextWord ? ~pronoun) { $$Output = ^join( $$Output _ \( pronoun \) ) }
  ^retry(rule)
The problem is that POS concept sets are not actually normal concepts (there is no list of words for them). The engine marks them dynamically as a result of POS tagging. Since the ? operator in the if is looking at a real set, it will fail. Pattern matching, however, does work with the marking system, which is why you see ~pronoun marked when you do :prepare I love you. So the way to do this is as follows:
u: (_*1)  ^refine()
  a: (_0?~pronoun) ...  ^retry(toprule)
  a: (_0?~noun) ...  ^retry(toprule)
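
A rough way to picture the difference between a real set and the engine's marks, written as a small Python model rather than ChatScript. The dictionaries and function names are hypothetical stand-ins and say nothing about how the engine is implemented; the words and concept names are copied from the :prepare trace for “I have three books to give him” earlier in this thread.

```python
# Hypothetical model only, not ChatScript internals.

# Real sets: fixed word lists that exist before any input is seen.
STATIC_CONCEPTS = {
    "~prowords": {"I", "him"},
    "~nounlist": {"dog", "book"},
}

# Dynamic marks: attached by the tagger to positions of ONE sentence.
# Values copied from the ":prepare I have three books to give him" trace.
SENTENCE_MARKS = {
    1: ("I", {"~pronoun", "~pronoun_subject", "~mainsubject"}),
    3: ("three", {"~adjective", "~number"}),
    4: ("books", {"~noun", "~noun_plural", "~plural", "~mainobject"}),
}


def in_static_set(word, concept):
    """Membership test against a word list; POS concepts have no list."""
    return word in STATIC_CONCEPTS.get(concept, set())


def marked_at(position, concept):
    """Test against the marks the tagger put on one sentence position."""
    _word, marks = SENTENCE_MARKS[position]
    return concept in marks
```

In this model a list lookup for ~plural or ~pronoun can never succeed because no such list exists, while a lookup by sentence position does, which mirrors why the pattern form `_0?~pronoun` matches where the `if` test on a bare word does not.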

 

 
  [ # 5 ]

Actually, in this script $$NextWord is a fact subject retrieved from a burst factset of the input. I knew the POS tags were dynamic concepts, but I was still able to find real sets containing the data to compare against. I found an equivalent real set for most categories except plural and number. Are there any?

Won’t the matching script you have shown match one POS tag for one word and then fail because of _*1? My script matches all possible tags (e.g. verb and noun), which can then be worked on to clear up the ambiguities.

 

 
  [ # 6 ]

a) There are no equivalents for plural and number.
b) Using ~nounlist and ~verblist will react to words that are NOT nouns or verbs in the sentence.

Using u: (_*1) ....
  ^retry(toprule)  will keep retrying with new matches of _0 until all the words are used up and the rule fails.
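
To make point (b) concrete with the trace from post #3: “dog” carries both ~nounlist and ~verblist, even though the tagger marked it only as a noun in “my dog is green”. Below is a small Python model (hypothetical data structures and names, not ChatScript) of visiting one word at a time, the way repeated retries advance _0, and comparing the two kinds of answer.

```python
# Hypothetical model only. Data copied from the
# ":prepare my dog is green" trace in this thread.

# Word-list concepts: the same answer whatever the sentence is.
LIST_CONCEPTS = {
    "dog": {"~nounlist", "~verblist"},
    "is": {"~auxverblist"},
    "green": {"~adjectivelist"},
}

# What the tagger decided for this particular sentence.
TAGGER_MARKS = [
    ("my", {"~pronoun_possessive"}),
    ("dog", {"~noun", "~noun_singular"}),
    ("is", {"~verb", "~verb_present_3ps"}),
    ("green", {"~adjective", "~adjective_normal"}),
]


def walk(sentence):
    """Visit one word per step until the words run out."""
    for word, marks in sentence:
        yield word, sorted(marks), sorted(LIST_CONCEPTS.get(word, set()))


for word, pos_marks, list_hits in walk(TAGGER_MARKS):
    print(word, pos_marks, list_hits)
```

The list lookup reports “dog” as a possible verb here, while the per-sentence marks do not, so any tagging built on the list concepts would need its own disambiguation step.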

 

 