
Pattern matching issue with period.
 
 

Hi,

I am running into a strange problem here.  Here is the test output:

=====================================
:testpattern (salad)salad.

Incoming data- francis | aimee | :testpattern (salad)salad.

Original User Input: salad.
Tokenized: salad. 

Concepts:

1: salad. raw=  ~mainsubject ~noun ~noun_singular ~noun_bits salad.  //
1: unknown-word canonical=  unknown-word // 

Tagged POS 1 words: salad./unknown-word (MAINSUBJECT Noun_singular) 
—-
  MainSentence: salad.  PRESENT

  he=  she= chef heshe=  it=salad.  they= many meetings here=  there=
. ( salad Failed
========================================
If I do:

:testpattern (salad.)salad. then it matches.

It seems that it is treating “salad.” as a single whole word.

Thanks

 

 
  [ # 1 ]

Sounds like you are using a slightly earlier version, in which I accidentally lost the period substitution behavior.
I cannot reproduce the problem in version 2.9, since I had already fixed the systemessentials.txt livedata file.

 

 
  [ # 2 ]

Thanks for the quick reply.  I am using the 2.9 version on Linux.  It seems to only happen to the word “salad”.  Is there a way for me to fix it directly? 

Thanks,

===========================

./LinuxChatScript32 local
ChatScript Version 2.9 32 bit
  Full dictionary code.
: Aug 18 09:07:58 2012 words=52,168 specials=2261 waste=28465 hash=65536 avgseek=1.1 maxseek=7 facts=32,475 dtext=3961052
Keyword0 altered Micronesian
Build0: words=59188 facts=116931 dtext=1042044 stext=0
Build1: words=621 facts=20147 dtext=19040 stext=74672
Currently have a substitute for jewellery in jewellery jewelry => jewellery
Livedata: entries=11877 facts=919 dtext=202912 posrules=485
Used 27MB: dict 154,844 (14245kb) fact 170,472 (6818kb) text 5302kb buffer (12x80000= 960kb) user (1x7000= 7kb)
Free 49MB: dict 893,731 fact 629,528 text 24,697KB

:prepare salad.

Original User Input: salad.
Tokenized: salad. 

Concepts:

1: salad. raw=  ~noun ~noun_singular ~noun_bits salad.  //
1: unknown-word canonical=  unknown-word //

Tagged POS 1 words: salad./unknown-word (Noun_singular) 
—-
  MainSentence:  PRESENT

  he=  she= chef heshe=  it=  they= many meetings here=  there=

 

 
  [ # 3 ]

For me, on Windows and Linux (Amazon cloud), there is no problem. The system tokenizes correctly.
The substitutions file does not play a part here, since your output shows the period still attached to “salad” after tokenizing.
Obviously it shouldn’t be.  If, somehow, you had a dictionary entry with salad.  as a legal word, then that would be one explanation.
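
To make that explanation concrete, here is a toy model in Python. It is not ChatScript's actual tokenizer; the function name and the rule "keep the trailing period only when the whole token is a known word" are assumptions made purely for illustration.

```python
def tokenize(sentence, dictionary):
    """Toy tokenizer (hypothetical, not ChatScript code).

    A trailing period is split off unless the whole token, period
    included, is already a known word in `dictionary`.
    """
    tokens = []
    for raw in sentence.split():
        if raw == "." or raw in dictionary or not raw.endswith("."):
            # Known word (or no trailing period): keep the token intact.
            tokens.append(raw)
        else:
            # Unknown word ending in a period: separate the period.
            tokens.extend([raw[:-1], "."])
    return tokens


clean = tokenize("salad.", {"salad"})
polluted = tokenize("salad.", {"salad", "salad."})
print(clean)     # the period is split off
print(polluted)  # "salad." survives as one token
```

Under these assumptions, a stray salad. entry in the dictionary is enough to keep the period glued to the word, which matches the symptom in the :prepare output.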

I’m guessing that you actually have u: (salad.) somewhere, which would add that word to the local dictionary in TOPIC,
even though salad.  is an erroneous pattern keyword.  You could search dict0 and dict1 in TOPIC to find it, or you can search your source code for it.
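
If searching by hand is tedious, a small script can do the source-code search. The following is a rough sketch in Python, not part of ChatScript. The helper names, the ".top" suffix default, and the heuristic (flag any word ending in a period inside the first parenthesised group on a line) are assumptions for illustration.

```python
import re
from pathlib import Path

# A run of letters followed by a period, then whitespace or a closing bracket.
TOKEN_WITH_PERIOD = re.compile(r"[A-Za-z]+\.(?=[\s)\]}])")


def find_trailing_period_keywords(text):
    """Return (line_number, token) pairs for suspicious pattern keywords."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        start = line.find("(")
        end = line.find(")", start + 1)
        if start == -1 or end == -1:
            continue  # no parenthesised pattern on this line
        pattern = line[start:end + 1]
        for match in TOKEN_WITH_PERIOD.finditer(pattern):
            hits.append((lineno, match.group(0)))
    return hits


def scan_directory(root, suffix=".top"):
    """Scan every file under `root` with the given suffix (assumed layout)."""
    results = {}
    for path in sorted(Path(root).rglob("*" + suffix)):
        hits = find_trailing_period_keywords(path.read_text(errors="replace"))
        if hits:
            results[str(path)] = hits
    return results


sample = "u: (salad.) I like salad.\nu: (salad) Fine.\n"
print(find_trailing_period_keywords(sample))
```

The heuristic only looks at the first parenthesised group per line and will also flag legitimate abbreviations such as “St.”, so treat the hits as candidates to review rather than confirmed errors.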

 

 
  [ # 4 ]

You are right!  I made a mistake in my pattern, writing (salad.).  Deleting the TOPIC files and rebuilding solved the problem.  Live and learn!  Thanks again.

 

 
  [ # 5 ]

Normally, of course, ChatScript would give you a warning message on keywords it doesn’t recognize.
But that warning is explicitly disabled for words containing _ or a period, and in a few other cases where it expects the keyword might be a phrase or an abbreviated word like “St. Helena”, etc. So you got no helpful warning from the system.

 

 