AI Zone Admin Forum

Is there any possibility to implement a Spanish chatbot with spell checking in RiveScript?

Hi, I would like to know if there is any possibility to implement a Spanish chatbot with spell checking in RiveScript.
Thanks in advance.


  [ # 1 ]

I am not an expert in RiveScript, but I faced this problem before, with an early AIML system about 10 years ago.
I wanted to make a Spanish chatbot, and the real problem was not only the spell checking: the whole approach of simple pattern matching collapses on any highly inflected language like Spanish, French, Italian, Portuguese, etc.
Each word in those languages has too many forms, so simple patterns fail: either you get lots of misses, or you are forced to write so many patterns that the system stops being responsive, and the combinatorial power of AIML's reduction system goes to hell!

Therefore I decided to build my own dictionary, capable of dealing with the inflections of Spanish (and of other languages). But the way Spanish is used (too many compound verbs, too many adjectives, with gender, number, etc.) forced me to walk away from simple pattern-matching schemes. So I decided to design a new system. Back then there was no ChatScript or RiveScript, or at least I had not come across them, and I began to build a whole new concept: a semantic + syntactic pattern matcher. Also, the dialog flow of AIML was not very controllable, and its handling of the flow of a conversation was too weak for me.

The inflection-capable dictionary worked fine! But several problems remained. Spanish has many diacritics (written accents and marks) and people do not use them much, so the pattern-matching capability was severely undermined by simple spelling errors, which are very common among chat users. So I took on a second challenge: a Spanish one-time spell corrector. I say one-time because Word, for example, only corrects by giving you a list of possible words, and in a chat you are not there to select one! MS Word also spells too slowly. I tried to use it as a backend through COM and was severely disappointed: the spelling speed was only 2-3 words per second, and the precision was less than 60%. Getting about 2 of every 3 words corrected while spending that long on each word is no good!
So, as a second stage, I set out to build a Spanish spell-correction system capable of correcting at least 90% of the input word stream and being reasonably fast: at least 10 words/second was my goal!
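
To make the diacritics problem concrete, here is a minimal Python sketch (it is not the corrector described in this post) that restores missing accents by looking a word up under its accent-stripped form. The names `strip_diacritics`, `build_index` and `restore`, and the tiny lexicon, are made up for illustration.

```python
import unicodedata


def strip_diacritics(word):
    """Lowercase the word and drop combining marks (á -> a, ñ -> n, ü -> u)."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def build_index(lexicon):
    """Map each accent-stripped form to the correctly written words sharing it."""
    index = {}
    for entry in lexicon:
        index.setdefault(strip_diacritics(entry), []).append(entry)
    return index


def restore(word, index):
    """Return the candidate spellings for a word typed without accents."""
    return index.get(strip_diacritics(word), [])


# Toy lexicon, an assumption for this example only.
LEXICON = ["el", "él", "comía", "carne", "esta", "está", "mañana"]
INDEX = build_index(LEXICON)

print(restore("comia", INDEX))
print(restore("el", INDEX))
```

The lookup for "el" returns two candidates ("el" and "él"), which is why a word-by-word lookup is not enough and some context is needed to choose between them.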

After 4 years of hard work, including my thesis on NLP (Electronic Engineering) at the University of Buenos Aires, done with two engineers holding PhDs and a recognized linguist (also a PhD), plus a lot of research and many international research publications, I succeeded (I guess) with a fair spell corrector (not a spell checker).

It is now capable of fixing severe spelling errors in context (local syntactic context, not yet semantic) at a speed of up to 900 words/second, about 300 times better than Word in many ways, and with a precision of at least 98% on the most common spelling errors, like diacritics. It can also handle multiple errors in the same word and make sound-alike (phonetic) corrections. As an example, you may type (in Spanish) VAYEMA meaning BALLENA, which gives 67% misspelled characters, including an added one, and the system doesn't hesitate to give a single correct answer.
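
As a rough illustration of why a sound-alike comparison helps with the VAYEMA example, here is a small Python sketch. It is an assumption of mine about one possible approach, not the algorithm of the system described here: words are reduced to a crude phonetic key (only four rules: LL→Y, V→B, Z→S, and C before E/I→S) and the lexicon entry with the smallest edit distance between keys wins. `phonetic_key`, `edit_distance`, `correct` and the lexicon are hypothetical names and data.

```python
import re


def phonetic_key(word):
    """Crude Spanish sound-alike key; a tiny, incomplete rule set."""
    w = word.upper()
    w = w.replace("LL", "Y").replace("V", "B").replace("Z", "S")
    return re.sub(r"C(?=[EI])", "S", w)  # soft C sounds like S


def edit_distance(a, b):
    """Classic Levenshtein distance (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def correct(word, lexicon):
    """Pick the lexicon entry whose phonetic key is closest to the input's."""
    key = phonetic_key(word)
    return min(lexicon, key=lambda c: edit_distance(key, phonetic_key(c)))


# Toy lexicon, for illustration only.
LEXICON = ["ballena", "galleta", "vajilla", "botella"]

print(edit_distance("VAYEMA", "BALLENA"))                              # letters
print(edit_distance(phonetic_key("VAYEMA"), phonetic_key("BALLENA")))  # sounds
print(correct("VAYEMA", LEXICON))
```

On raw letters the two words are four edits apart; on the phonetic keys (BAYEMA vs. BAYENA) only one edit remains, so "ballena" is the clear winner in this toy lexicon. A real corrector would need a far richer rule set and, as described above, context.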

As a side effect, the system is also capable of "correcting" correctly written words. What does this mean? If a word is well written but syntactically out of place, and another word that is one spelling error away is the best candidate for that position, the system changes it without hesitation, as in the following sentence: “el komia carrne ezta manana” -> “él comía carne esta mañana”, in a breeze!

I designed a Dialog Description Language, DDL for short: a whole new concept, like AIML but not an XML-like pattern-matching script. Instead of that crap I designed a whole new language, defining a regular syntax and then building a full-fledged compiler (not an interpreter) for it, for speed. It has over 100 built-in linguistic functions, is extensible, can include any .NET code and is geared towards high availability and speed. It is so robust that you can change parts of a compiled, running program on the fly, dynamically recompiling all internal links, like no other computer language I know of. (The main idea was taken from web frameworks, where you drop in a new source file and the system recompiles it and puts it to work.)

After 3 years, the system includes many state-of-the-art semantic matching algorithms, including SVM + LSA methods, for matching whole chunks of text. They are trained on the fly (as soon as you drop new text into the repository), and a whole SQL-selected data set from any database can be accessed as a single pattern, including phonetic matching and spell correction. We also built a question classifier capable of extracting the subject and matter of any question, even one with ill-behaved grammar. The F-score of this multi-class classifier is over 88%, being close to 100% for person, time and place, and failing only on common objects vs. matters.

to be continued…


  [ # 2 ]


A few years ago, the DDL system was talking with 60 simultaneous users, at a constant flow of 10 messages every second, and we needed to upgrade the language personality core. We simply dropped all the new code into the repository directory over FTP. The system detected the new code right away, tested it, instantiated a new engine class, and ran both engines in parallel in a weighted round-robin fashion. Each ongoing conversation was allowed to finish cleanly, while each new user or conversation was transparently redirected to the newly instantiated engine. Once the old engine had finished processing its pending answers, its memory and the state of each user conversation were transferred to the new engine, and the old one was killed and unloaded from memory, all dynamically and transparently! No user saw any slowness, unresponsiveness or disruption of the service! End of story!
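
For readers who want to picture the hand-over, here is a heavily simplified Python sketch of the idea: conversations already in progress stay on the old engine, new ones go to the new engine, and the old engine is dropped once nobody uses it. The `Engine` and `Router` classes are hypothetical; the weighted round-robin, the testing step and the state transfer described above are left out.

```python
class Engine:
    """Stand-in for a compiled dialog engine (hypothetical)."""

    def __init__(self, version):
        self.version = version

    def reply(self, user, text):
        return f"[{self.version}] {user}: {text}"


class Router:
    """Sends each conversation to one engine and drains old engines."""

    def __init__(self, engine):
        self.current = engine
        self.draining = []  # old engines still serving conversations
        self.owner = {}     # user -> engine serving that conversation

    def _retire_idle(self):
        # Keep only the old engines that still own at least one conversation.
        in_use = {id(e) for e in self.owner.values()}
        self.draining = [e for e in self.draining if id(e) in in_use]

    def deploy(self, new_engine):
        """Switch new conversations to new_engine; the old one drains."""
        self.draining.append(self.current)
        self.current = new_engine
        self._retire_idle()

    def handle(self, user, text):
        # An ongoing conversation stays with the engine that started it.
        engine = self.owner.setdefault(user, self.current)
        return engine.reply(user, text)

    def end_conversation(self, user):
        self.owner.pop(user, None)
        self._retire_idle()


router = Router(Engine("v1"))
print(router.handle("ana", "hola"))   # served by v1
router.deploy(Engine("v2"))
print(router.handle("ana", "sigo"))   # still v1, conversation in progress
print(router.handle("luis", "hola"))  # new conversation, served by v2
router.end_conversation("ana")        # v1 is now idle and gets dropped
```

The sketch is single-threaded; a real server would also need locking around `owner` and `draining`.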

At this point, in 2015, the platform has undergone many tests, including internationally recognized publications at conferences, and is fairly mature and very advanced (I guess). It has a real-world compiler inside that converts DDL into machine language so it executes fast on the server. It is also capable of full Spanish parsing (only recommended for special patterns, because chat slang is not as grammatical as you might wish!) and of heavy linguistic pre- and post-processing.

We recently included some unique NLG (Natural Language Generation) skills, like saying ordinal or cardinal numbers in words, as well as in Roman numerals.
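
To give an idea of what this kind of generation involves, here is a small Python sketch (my own illustration, not DDL code) that spells Spanish cardinals from 0 to 99 and writes Roman numerals from 1 to 3999. The function names `cardinal` and `to_roman` are assumptions, and ordinals are not covered.

```python
# Spanish cardinals 0-29 are single words, so they are simply listed.
UNITS = [
    "cero", "uno", "dos", "tres", "cuatro", "cinco", "seis", "siete",
    "ocho", "nueve", "diez", "once", "doce", "trece", "catorce", "quince",
    "dieciséis", "diecisiete", "dieciocho", "diecinueve", "veinte",
    "veintiuno", "veintidós", "veintitrés", "veinticuatro", "veinticinco",
    "veintiséis", "veintisiete", "veintiocho", "veintinueve",
]
TENS = {30: "treinta", 40: "cuarenta", 50: "cincuenta", 60: "sesenta",
        70: "setenta", 80: "ochenta", 90: "noventa"}

ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
         (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
         (5, "V"), (4, "IV"), (1, "I")]


def cardinal(n):
    """Spell a cardinal number between 0 and 99 in Spanish."""
    if not 0 <= n <= 99:
        raise ValueError("only 0-99 is supported in this sketch")
    if n < 30:
        return UNITS[n]
    tens, unit = divmod(n, 10)
    word = TENS[tens * 10]
    return word if unit == 0 else f"{word} y {UNITS[unit]}"


def to_roman(n):
    """Write an integer between 1 and 3999 as a Roman numeral."""
    if not 1 <= n <= 3999:
        raise ValueError("only 1-3999 is supported in this sketch")
    parts = []
    for value, symbol in ROMAN:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


print(cardinal(42))    # cuarenta y dos
print(to_roman(2015))  # MMXV
```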

You can also inflect any word, not just by specifying gender, number, tense and mood, but also based on the complex inflection extracted from another inflected word provided by the user!

For example, you may say something and the user answers with a strange inflected verb which is unknown or unexpected for the pattern, but seems to be ‘slightly correct’. You can then re-ask, applying this user-supplied inflection to a phonetically similar (system-picked) verb from a list you manage. Here is an example:

        bot - “decime que hicistes ayer?”   
        user - “ayer, despunté con amigos”

> Here “despunté” is in the past tense, 1st person, and you may want to ask back using your phonetically guessed verb, “desfilar” instead of “despuntar”, which is related to the conversation and imitates the original verb, simply answering:

        bot - “no habrás querido decir: ‘ayer desfilé con amigos’ en lugar de lo otro..”

In this case, you simply swap the “captured verb”: a function takes your infinitive form “desfilar” and the inflected form provided by the user, and produces the new inflected form. That's simple!
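
A toy version of such a function could look like the Python sketch below. It is only my guess at the idea, not the DDL function itself: it assumes both verbs are regular, belong to the same conjugation class (-ar, -er or -ir), and that the infinitive of the user's verb is already known. `transfer_inflection` is a hypothetical name.

```python
def transfer_inflection(inflected, source_inf, target_inf):
    """Copy the ending of an inflected regular verb onto another verb.

    inflected  -- form typed by the user, e.g. "despunté"
    source_inf -- its infinitive, e.g. "despuntar" (assumed known)
    target_inf -- infinitive of the verb we want to say, e.g. "desfilar"
    """
    src_stem, src_class = source_inf[:-2], source_inf[-2:]
    tgt_stem, tgt_class = target_inf[:-2], target_inf[-2:]
    if src_class not in ("ar", "er", "ir") or src_class != tgt_class:
        raise ValueError("verbs must share a conjugation class")
    if not inflected.startswith(src_stem):
        raise ValueError("irregular or unrelated form, cannot transfer")
    ending = inflected[len(src_stem):]  # "é" for "despunté"
    return tgt_stem + ending


print(transfer_inflection("despunté", "despuntar", "desfilar"))  # desfilé
```

Irregular verbs and stem changes (for example "pienso" from "pensar") fall outside this sketch and would need a real inflection dictionary like the one described earlier.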

I hope this sheds some light on the existing Spanish chatbot platforms!



  [ # 3 ]

Hi Andres,
Yes, indeed, it looks like a great contender to ChatScript, able to match many of its meaning-recognition features, but in Spanish. It would be nice to test it, or to see examples of where it has been applied. Thanks for the info.


  [ # 4 ]

Hi Eduardo,

It does not try to be a contender to ChatScript, AIML or RiveScript; they are indeed good options!

I prefer to say it is a thorough and earlier design, started sometime in 2005. It necessarily shares many features with other products out there, simply because it is completely logical for those features to exist, and it is even good to have some compatibility at the concept level, like the star (Kleene) pattern. But it also has a set of ‘other’ unique characteristics which make it especially useful for all kinds of languages, not being constrained to English as most NLP packages nowadays are.

Nothing more, but also nothing less.

If you like, please contact me directly and I’ll supply you with more information by email.
Hope you like it!


