

Catching Tokenized Input
 
 

Hi, all.

This is, I fear, an incredibly simple question.

I have the following line in my .top:

#! Who owns the fox?
u: ( << who [has got owns] [fox "the fox"] >> ) It’s Lucy’s fox.

Interestingly, the system tokenizes “the fox” in a manner that prevents my test text (or user input) from triggering the desired output. See below for the :prepare. I, for the life of me, can’t figure out how to write a version of this script that actually catches what I want it to. Help!

Thanks,
Rob


_> :prepare the fox

Original User Input: the fox
Tokenized: the fox
TokenControls: DO_SUBSTITUTE_SYSTEM NUMBER_MERGE PROPERNAME_MERGE SPELLCHECK INTERJECTION_SPLITTING POSTAG PARSE
Spelling fixed: the fox

Concepts:

1: the raw=  +~determiner(1)  +~determiner_bits(1)  +~kindergarten(1)  +the(1)  +~determinerlist(1)
.  +~good_intelligence_adjectives(1)  +~intelligence_adjectives(1)  +~adjectives(1)  +the~1(1)  //
1: a canonical=  +a(1)  +~letters(1)  +a~1(1)  //  +The(1)

2: fox raw=  +~verb_present(2)  +~verb_tenses(2)  +~verb_infinitive(2)  +~noun(2)  +~verb(2)  +~noun_singular(2)  +~noun_bits(2)
.  +~kindergarten(2)  +fox(2)  +T~bookreport (2)  +~mammals(2)  +~animals(2)  +T~animal_kingdom (2)  +~rideable(2)  +~functions(2)  +~eatable(2)
.  +~burnable(2)  +~beings(2)  +~tool(2)  +~animate_thing(2)  +~objects(2)  +~nounlist(2)  +~animal_kingdoms(2)  +fox~1(2)  +canine~1(2)
.  +carnivore~2(2)  +eutherian_mammal~1(2)  +mammal~1(2)  +craniate~1(2)  +chordate~1(2)  +fauna~1(2)  +being~1(2)  +animate_thing~1(2)
.  +whole~1(2)  +object~1(2)  +physical_entity~1(2)  +entity~1(2)  +~nounroot(2)  +slyboots~1(2)  +fox~3(2)  +pelt~2(2)  +animal_skin~1(2)
.  +animal_product~1(2)  +~animal_product(2)  +T~animal_rights (2)  +animal_material~1(2)  +stuff~7(2)  +substance~1(2)  +matter~1(2)
.  +portion~6(2)  +relation~1(2)  +abstract_entity~1(2)  +fox~4(2)  +throw~6(2)  +trick~8(2)  //
2: fox canonical=  //  +Fox(2)  +Fox~1(2)  +Algonquian~1(2)  +Amerind~1(2)  +tongue~4(2)  +language~3(2)  +communication~1(2)  +Fox~2(2)  +Algonquian~2(2)
.  +Red_Indian~1(2)  +Amerindian~1(2)  +~americanindian(2)  +T~american_indian (2)  +~intelligent_being(2)  +somebody~2(2)
.  +causal_agency~1(2)  +person_of_color~1(2)  +George_Fox~1(2)  +Charles_James_Fox~1(2)

  sequences=
  +the_fox(1-2) +~noun_title_of_work(1-2) +~noun(1-2) +~noun_proper_singular(1-2) +~noun_bits(1-2) +The_Fox(1-2) +~movie(1-2) +~book(1-2)
.  +~library(1-2) +~store_type(1-2) +~store(1-2) +~attributes(1-2) +~nounlist(1-2) +a_fox(1-2)


Tagged POS 2 words: the/a (Determiner)  fox (Noun_infinitive Noun_singular Verb_infinitive Verb_present) 
—-
TokenFlags: PRESENT SIMPLE_TENSE USERINPUT
TokenFlags: PRESENT SIMPLE_TENSE USERINPUT

 

 
  [ # 1 ]

Interestingly, I’ve had a similar issue with certain other kinds of input not catching as I expected, specifically names from cartoons.

Example code:
u: (<< [pyro pyromaniac "into fire" "obsessed with fire" "talking about fire" "light a fire" "start a fire" "love fire"] >> ) Some people just want to watch the world burn.
#! Alfred said that.
a: ([Batman joker alfred]) You got me. I love that movie.

The test text fails and the input doesn’t catch. I’ve had the same issue with other cartoons like South Park.

Anyway, just in case this makes sense/helps/whatever.

Thanks,
Rob

:_> :prepare alfred

Original User Input: alfred
Tokenized: alfred
TokenControls: DO_SUBSTITUTE_SYSTEM NUMBER_MERGE PROPERNAME_MERGE SPELLCHECK INTERJECTION_SPLITTING POSTAG PARSE

Concepts:

1: Alfred raw=  +~noun(1)  +~noun_proper_singular(1)  +~noun_bits(1)  +~propername(1)  +T~trashrask (1)  //  +Alfred(1)  +Alfred_the_Great(1)  +~military_man(1)  +~cartoon_character(1)  +T~cartoons (1)  +~malename(1)  +~he(1)
.  +~age_gender(1)
1: Alfred canonical=  //

Tagged POS 1 words: Alfred (Noun_proper_singular) 
—-
TokenFlags: PRESENT SIMPLE_TENSE USERINPUT
TokenFlags: PRESENT SIMPLE_TENSE USERINPUT

 

 
  [ # 2 ]

The underlying issue is that CS comes with a large set of book and movie titles in the worlddata. These are detected and merged as proper names, so “the fox” becomes “The_Fox” because it matches a title. A new release of CS this weekend should fix this issue: the system will no longer merge into one token anything considered a NOUN_TITLE_OF_WORK unless you capitalized it (meaning you intended to refer to it as a title).

The system will still detect the potential title, however, because it also checks sequences of words against keywords. So the individual words “the fox” will remain and be parsed, etc., but the system will still return a mark on ~movie because it saw “the fox”. In an ideal world this would not happen: if a user typed “the fox” in lower case, the system would trust the user. But since users pay little attention to casing, CS tries to work out both upper- and lower-case intentions, in case the user is not capitalizing things.
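
Until that release is installed, one possible workaround is to list the merged title token as an extra alternative in the pattern. This is only a sketch: it assumes the merged token is spelled The_Fox, as described above, and it has not been checked against any particular CS version.

```
# Sketch only: The_Fox is the merged token described in this thread.
# Adding it as an alternative is an assumption, not a confirmed fix.
#! Who owns the fox?
u: ( << who [has got owns] [fox "the fox" The_Fox] >> ) It's Lucy's fox.
```

Running :prepare on the full test sentence, rather than on “the fox” alone, should show which tokens and marks the pattern actually has to match.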

 

 