How does ChatScript handle misspelled words?
 
 

In the manual it says:

u: ([scare afraid])

means: find anywhere in the sentence one of the words scare or afraid, where scare can be in any of its related forms: scared, scare, scaring, scares.

But what if the user writes “scre” with a 1-letter mistake, or “horizntl” with a 2-letter mistake, or makes some other kind of mistake like “enviromnent” instead of “environment”? How does ChatScript deal with wrong words?

Also I need to know: can we write patterns like [scare scre scar sare] to guard against users’ mistakes?

 

 
  [ # 1 ]

Probably better to handle spelling errors using spellfix.txt. You can enter as many variations of a word as you want and CS will substitute the correct word before your pattern is matched.
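For example, entries might look like this (I am assuming one substitution per line, misspelling first and correction second):

scre scare
enviromnent environment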

You can also use wildcards to account for common errors. My students can’t seem to spell hospitalizations correctly, so I just use ?: ([hosp*l hosp*s hosp*n]) to catch all of the variants.

You can also just do what you suggested.

#! Are you scared
?: ([scare scre scar sare]) I am not scared

 

 
  [ # 2 ]

Editing spellfix.txt subjects you to overwrite issues when bringing in a new release. The equivalent in script, which is safe because the script is yours, is:
replace: badspell goodspell
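For example, to correct the misspellings from the original question:

replace: horizntl horizontal
replace: enviromnent environment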

And I advise against putting non-words in patterns, because the dictionary will not function as well (it will think sare and scre are real words and will spell-check other wrong words toward them). Real words are fine, as is Doug’s suggestion of wildcarding some words.

 

 
  [ # 3 ]

I should have considered that. I use my own LIVEDATA and spellfix.txt so things don’t get overwritten with updates.

 

 
  [ # 4 ]

CS will automatically fix some simple spelling mistakes, and not just those in spellfix.txt.

In certain situations we are using a Levenshtein routine (actually Damerau-Levenshtein) to correct spelling. There are many JavaScript implementations out there that can be embedded into CS with little to no modification beyond the wrapper definition.
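For reference, here is a minimal sketch of the algorithm (restricted Damerau-Levenshtein, also called optimal string alignment) in TypeScript; this is just an illustration of the technique, not the actual routine we embed:

function damerauLevenshtein(a: string, b: string): number {
  const m = a.length, n = b.length;
  // d[i][j] = edits to turn the first i chars of a into the first j chars of b
  const d: number[][] = Array.from({ length: m + 1 }, () => new Array<number>(n + 1).fill(0));
  for (let i = 0; i <= m; i++) d[i][0] = i;
  for (let j = 0; j <= n; j++) d[0][j] = j;
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1,        // deletion
        d[i][j - 1] + 1,        // insertion
        d[i - 1][j - 1] + cost  // substitution
      );
      // Swapping two adjacent characters is the Damerau extension
      if (i > 1 && j > 1 && a[i - 1] === b[j - 2] && a[i - 2] === b[j - 1]) {
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
      }
    }
  }
  return d[m][n];
}

// damerauLevenshtein("scre", "scare") === 1 (one missing letter)
// damerauLevenshtein("enviromnent", "environment") === 1 (one adjacent transposition)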

And in a similar vein, I’ve just been experimenting with Metaphone to find misspellings, particularly of proper names. I’m not finished with it yet, but it is looking promising.

 

 
  [ # 5 ]

We have been experimenting with a combination of ChatScript and a convolutional neural network to manage dialogue. The CNN doesn’t help all that much yet, but one area where it consistently improves our accuracy is with misspelled words.

 

 
  [ # 6 ]
Doug Danforth - Apr 3, 2018:

We have been experimenting with a combination of ChatScript and a convolutional neural network to manage dialogue. The CNN doesn’t help all that much yet, but one area where it consistently improves our accuracy is with misspelled words.

Hey Doug,
How did you do the integration? Do you have a “router” service layer where you integrate the CNN with CS, or some other architecture?

Sounds very interesting regardless of the specific results.
Would be really nice to hear more.

 

 

 
  [ # 7 ]

Well, I certainly don’t know all of the specifics, as most of that is done by my linguistics and computer science collaborators, but we/they have been trying to use machine learning (ML) to help with our virtual patient dialogues for the past several years. Just using ML by itself does not work well due to the sparse data problem: we only have a few hundred dialogues with a few thousand (<15,000) questions, and ML generally needs a lot more than that to be any good. As such, we have looked at ways to combine ChatScript (CS) with ML to see if we could improve our accuracy.

As we examined the questions missed by both systems, we found that the CNN (we currently use a word and character based CNN classifier but have tested others) was bad at rare questions (no surprise) but good at questions with mangled words. There are other bits to the model (for example deciding whether to suggest a match when CS does not match anything) but one of the biggest gains was on horribly misspelled words. Based on these data we built a classifier that would choose either CS or the CNN based on the probability that one or the other was correct.

The classifier is part of a web services back end which gets the input from the student, routes it to CS, gets the response from CS, asks CS :why to get the corresponding label, and decides whether CS is likely correct or whether the CNN is likely correct. If CS is chosen then the original answer is provided to the student. If CNN is chosen then the system sends that label (as the canonical form of the question) back to CS and gets the corresponding answer and sends it back to the student.
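In rough outline, the flow looks something like this (a TypeScript sketch with illustrative names, not our actual service API):

interface CsResult { answer: string; label: string }   // response text plus the :why label
interface CnnResult { label: string; prob: number }    // predicted label plus its probability

// Stand-ins for the real services:
declare function chatScriptRespond(input: string): Promise<CsResult>;
declare function cnnClassify(input: string): Promise<CnnResult>;
declare function chooserPrefersCS(cs: CsResult, cnn: CnnResult): boolean;

async function respond(studentInput: string): Promise<string> {
  const cs = await chatScriptRespond(studentInput);   // route input to CS, get answer and :why label
  const cnn = await cnnClassify(studentInput);        // CNN prediction for the same input
  if (chooserPrefersCS(cs, cnn)) {
    return cs.answer;                                 // CS judged likely correct
  }
  // Otherwise resubmit the CNN's label as the canonical form of the question
  // and return CS's answer to that.
  const resubmitted = await chatScriptRespond(cnn.label);
  return resubmitted.answer;
}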

It is a roundabout approach, I know (we were in a hurry to get this implemented), and we are working on making it more efficient, but the drag on the system is the chooser, not CS. CS is so fast that it matches the input, provides the answer, answers :why, and (if necessary) responds again in a quarter of the time it takes the chooser to decide. The other drawback is that resubmitting the canonical form of the question does not ALWAYS result in the correct match, but we can solve that pretty easily.

We are currently working on paraphrasing and memory-augmented models to combat the sparse data problem, as well as ways to generate more training data, such as getting all patterns that match an input, translating from English to other languages and back, and even putting a version of our Virtual Patient in the local Center of Science and Industry to get random schoolchildren to ask him questions. What could go wrong with that? (grin) The problem with all of these is that some human generally has to decide if the responses/matches are correct, and that human is me.

Probably more than you wanted to know, but I hope this helps,

Doug

 

 