
How to implement a ChatScript bot that talks Spanish
 
 
  [ # 76 ]

Hi Bruce,
could you please tell me the difference between level 0 and level 1? I see fact0, fact1, dict0, and dict1 files in the TOPIC folder. What is the difference between them, and why are there two of each?
Thanks in advance. I’ll keep reading the manuals.

 

 
  [ # 77 ]

Level 0 loads first. Then level 1 loads.  How you use the levels is up to you.

Some people put all their code and the CS Ontology and world facts data in level 0.

I put all the basic ChatScript predefined data in level 0 and my bot in level 1. That way my compiles are faster because I don’t need to recompile all the basic CS stuff every time I change my bot.

In a dynamic app context, level 0 is typically the stable stuff you don’t expect to change, and level 1 is all the stuff you might revise and redownload onto the app. Again, that cuts down on the network transfer size.

 

 
  [ # 78 ]

Eduardo,

I have been experimenting for a few years off and on, sometimes with long breaks.  I’ve been more active recently as there have been many updates to ChatScript this year.  I have never really tried to create a general discussion chatbot, but have just explored various issues, problems, and challenges with implementing a more advanced chatbot.  I am interested in AI, machine learning, NLP, NLU, and knowledge representation.  I like that ChatScript provides a way to store and reference facts, and I would really like to be able to parse an English sentence and map its semantics to stored facts, so that a chatbot could actually learn from conversation or from reading. ChatScript has POS tagging, which helps, but I am still learning.  As I learn, I post examples and questions so that I and others can learn.

My occupation involves business intelligence and data warehousing, so I have some technical background, but I am not a developer.  I can read C# and JavaScript but have not done much low-level coding.  ChatScript allows me to focus on creating a chatbot at a higher level, rather than writing all of the code myself as others posting on this forum have done.  It may seem advanced, but it saves you a lot of time.  Many people who post on the forums have been working on their chatbots for years, and I find their insights helpful and interesting.  However, if you keep at it, you should be able to create a basic chatbot in a few months that can respond to common questions, etc.

I am interested in the creation of a Spanish chatbot because, by starting from scratch and not using the built-in dictionary, etc., I come to better understand how ChatScript works and what all of the files that I have taken for granted in the past are used for.

Since creating a chatbot is a large task, I am interested in collaborating with others.  Several people have suggested creating a common base for an open source chatbot.  I will continue to post things as I learn and perhaps others may have some insight into creating a Spanish chatbot and will post their ideas and/or questions.

The key is to focus on specific tasks and post specific questions and keep working on it a little at a time.

 

 
  [ # 79 ]

While we figure out the verbs, here is an example of how to add prepositions.  Add the following to a new file called prepositions.tbl:

concept: ~Preposiciones (~preposition)

table: ~Preposiciones(^preposición)
^addProperty(^preposición PREPOSITION)

DATA:
a
abajo  
alrededor 
apagado 
arriba 
athwart 
circa 
con 
contra 
contrario 
dado 
de 
desde  
durante 
en
encendido 
encima 
entre 
excepto 
exterior 
fuera 
gustar 
hacia 
hasta 
interior 
junto 
más 
menos 
no 
para 
pasado 
pero 
por 
que 
relativa 
restricción 
ritmo 
ronda 
salvar 
según
siguiente 
sin 
sobre 
totalmente 
tras
vale
veces

 

 
  [ # 80 ]

Hi Alaric,

It is good to hear about your experience and your willingness to collaborate with others on this task. I’m new to ChatScript and have never built a chatbot before; I have just read the basic manuals and seen how the engine works. But I do have to build this Spanish chatbot, and now I finally have time to do it. I hope we can collaborate.

 

 
  [ # 81 ]

Hi Bruce, other tasks took up a few days, so it took me some time to read most of the ChatScript manuals (Basic User, PosParser, System Functions, Fact, and Advanced User). I hope reading them all in one go was the right approach; at least now I’m familiar with most CS terms, though I still have some doubts here and there.

I also looked for a Spanish WordNet and found a project called MCR, which claims to use the same format as Princeton WordNet 3.0. Bruce, could you please take a look? Here is the link; it is free to download and compatible with PostgreSQL: http://adimen.si.ehu.es/web/MCR/
That site also links to a README file that explains what MCR is about.

I hope this Spanish data can be useful and can somehow be converted into a Spanish dictionary placed in the CS DICT folder; please correct me if I’m wrong. I know it will work only for spell checking, but can the system load from DICT/SPANISH, or does it only load from DICT/ENGLISH? Sorry if that question is a bit basic.

Thanks in advance, Bruce.

 

 
  [ # 82 ]

It tells you how to map a meaning in one language to one in another. Its actual Spanish raw data is nowhere close to the format of Princeton WordNet, and my tools won’t work on it.

 

 
  [ # 83 ]

Oh, I see. I will transfer the most important words to the DICT folder, following the format you suggested.

Bruce, could you please post an example of how I should write the script table so that it works with a verb (enviar = to send || envié = sent || enviaré = will_send)? Could you include an example with an input and an output? How can CS tell the difference between the user saying “I sent you” (te envié) and “I will send you” (te enviaré)?

So far I understand that the script table will create the verb conjugations automatically in the DICT folder and in the topic. Am I right? Please correct me if I am wrong.

What is the advantage of using the script table to handle verb conjugations, compared with just placing all the conjugations in the DICT file? I would like to differentiate between a verb in the past tense and a verb in the future tense.

Thanks in advance,
Bruce.
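
To make the question concrete, here is a minimal, language-neutral sketch in Python (not ChatScript, and not how a ChatScript table works internally) of the lookup being asked for: each conjugated form maps to its infinitive and a tense label. The `CONJUGATIONS` dictionary, the tense labels, and the `classify` helper are all hypothetical names used only for illustration.

```python
import unicodedata

# Hypothetical lookup: conjugated form -> (infinitive, tense label).
# Only the forms mentioned in the post are listed.
CONJUGATIONS = {
    "envié": ("enviar", "past"),
    "enviaré": ("enviar", "future"),
}


def classify(sentence):
    """Return (infinitive, tense) for the first known verb form, else None."""
    # Normalize to NFC so that accented letters compare equal however typed.
    normalized = unicodedata.normalize("NFC", sentence.lower())
    for token in normalized.split():
        if token in CONJUGATIONS:
            return CONJUGATIONS[token]
    return None


past = classify("te envié")
future = classify("te enviaré")
unknown = classify("te envie")  # no accent, so it is a different token
```

Note that the accent carries the distinction here, so the forms only stay apart if the input arrives with its accented characters intact.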

 

 
  [ # 84 ]

Hi Bruce
I deleted all the contents of the DICT folder, created a small file with basic Spanish words, and saved it as “m” in the DICT folder.

Then, in my Lucia bot (a variation of HARRY), I created a topic for the word mañana and placed the word mañana in the keywords as well as in the pattern. It works: it recognizes the word and also does the spell checking, but only if I create a concept with that word at the top, like this…
concept: ~spanishnouns NOUN ( mañana)

If I erase that line, it won’t work. Bruce, please correct me if I’m wrong, but I guess the DICT/m file that I made with the Spanish words is not working on its own, right? Only the concepts at the beginning of the .top file are adding Spanish words, am I right? I would really like to avoid writing concepts for all the words and use only a DICT file. Is there something I am doing wrong?

Thanks in advance.

 

 
  [ # 85 ]

The file “m” will not be recognized by the dictionary builder.  The prior file was named “m.txt”.

 

 
  [ # 86 ]

Hi Bruce,
sorry, that was my mistake when typing the post. Yes, I named it “m.txt” and saved it as UTF-8. Here is a sample of its contents; every word is on its own line…

áéíóú ( NOUN_ABSTRACT NOUN ADVERB NOUN_SINGULAR COMMON4 COMMON1 KINDERGARTEN posdefault:ADVERB TIMEWORD )
mañana ( NOUN_ABSTRACT NOUN ADVERB NOUN_SINGULAR COMMON4 COMMON1 KINDERGARTEN posdefault:ADVERB TIMEWORD )
instante ( NOUN_ABSTRACT NOUN ADJECTIVE ADJECTIVE_NORMAL NOUN_SINGULAR ADJECTIVE_NOT_PREDICATE COMMON4 COMMON1 KINDERGARTEN posdefault:NOUN posdefault:ADJECTIVE TIMEWORD )
cerocoma ( NOUN_ABSTRACT NOUN ADJECTIVE ADJECTIVE_NORMAL NOUN_SINGULAR ADJECTIVE_NOT_PREDICATE COMMON4 COMMON1 KINDERGARTEN posdefault:NOUN posdefault:ADJECTIVE TIMEWORD )
bastante ( ADVERB PREDETERMINER EXTENT_ADVERB COMMON4 COMMON1 KINDERGARTEN posdefault:ADVERB )
mogollón ( ADVERB PREDETERMINER EXTENT_ADVERB COMMON4 COMMON1 KINDERGARTEN posdefault:ADVERB )
mazo ( ADVERB PREDETERMINER EXTENT_ADVERB COMMON4 COMMON1 KINDERGARTEN posdefault:ADVERB )

But my childhood.top topic is not recognizing the word mañana through the DICT/m.txt file. It only works (including the spell checking) if I create the concept at the top of the file, like this…
concept: ~spanishnouns NOUN ( mañana)
Why isn’t it working with the DICT/m.txt file alone?

PS: I’ve placed the word mañana in the keywords as well as in the pattern.

Thanks in advance, Bruce.
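
Before suspecting the engine, it may help to confirm that the file really is UTF-8 and that every line has the same shape as the sample above. The following Python sketch assumes the line shape `word ( FLAG FLAG ... )`, which is inferred only from the sample in this post and is not an official description of the dictionary format; `parse_entry` and `check_dictionary_bytes` are hypothetical helper names.

```python
import re

# Assumed line shape, inferred only from the sample: word ( FLAG FLAG ... )
ENTRY = re.compile(r"^(?P<word>\S+) \( (?P<flags>[^()]+) \)$")


def parse_entry(line):
    """Return (word, [flags]) for a well-formed line, or None otherwise."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    return match.group("word"), match.group("flags").split()


def check_dictionary_bytes(raw):
    """Decode raw file bytes as UTF-8 and return (entries, bad_line_numbers)."""
    text = raw.decode("utf-8")  # raises UnicodeDecodeError if not UTF-8
    entries, bad = {}, []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        parsed = parse_entry(line)
        if parsed is None:
            bad.append(number)
        else:
            entries[parsed[0]] = parsed[1]
    return entries, bad


# Two lines copied from the sample, encoded the way the file should be saved.
sample = (
    "mañana ( NOUN_ABSTRACT NOUN ADVERB NOUN_SINGULAR COMMON4 COMMON1 "
    "KINDERGARTEN posdefault:ADVERB TIMEWORD )\n"
    "mazo ( ADVERB PREDETERMINER EXTENT_ADVERB COMMON4 COMMON1 KINDERGARTEN "
    "posdefault:ADVERB )\n"
).encode("utf-8")

entries, bad = check_dictionary_bytes(sample)
```

To check the real file, one could pass `open("m.txt", "rb").read()` to `check_dictionary_bytes`; a `UnicodeDecodeError` would mean the file was not saved as UTF-8.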

 

 
  [ # 87 ]

sorry double post

 

 
  [ # 88 ]

First do a :prepare mañana and see if it survives or if it gets spell checked.  In my test I used cerecoma so I didn’t have to enter accented Spanish characters from the keyboard. Despite the fact that my dictionary only had your words, it got changed. This is because all of the :build 0 ontology was there and I had a flaw in spell check. I have fixed the flaw in spell check and removed build0, and now when I enter cerecome, it is correctly changed to cerecoma. So spell check worked to fix wrong to right.  I’ll release a new version of CS today (but that fixes wrong to right).  You should email me your modified harry files and a :trace all log from your test. Then I can fully reproduce what happens to you. Otherwise I’m guessing.  You could also just send me a log from :trace all of your input and I can probably diagnose it.

 

 
  [ # 89 ]

Hi Bruce, thanks for your reply.

I did “:prepare mañana” and this is what it showed…

>:prepare mañana
TokenControl: DO_SUBSTITUTE_SYSTEM DO_NUMBER_MERGE DO_PROPERNAME_MERGE DO_DATE_MERGE DO_SPELLCHECK DO_INTERJECTION_SPLITTING DO_PARSE


Original User Input: ma&eana;
Tokenized into: ma&eana;
Spelling changed into: ma±ana
Actual used input: ma±ana
Xref: 1-ma±ana
Fragments: 1:ma±ana
badparse Tagged POS 1 words: ma±ana <Adverb>
MainSentence: PRESENT


Concepts:

1: na±ana raw= +~adverb<1> +~Kindergarten<1> +~timeword<1> +~sentenceend<1>
+ma±ana<l> //
1:  na±ana canonical=  // +*utf8<1>

sequences =
After parse TokenFlags: SPELLCHECK PRESENT USERINPUT FAULTY_PARSE NOT_SENTENCE

>

PS: the & signs stand for odd symbols that I couldn’t type or copy-paste.
I will email my modified Harry files ASAP.

Thanks for your support Bruce

 

 
  [ # 90 ]

Since the original input was shown as maheana, I would expect that the incoming data was not the UTF-8 Spanish word you wanted.  By that point it’s already too late.  Try putting your word in a file saved as UTF-8, and then doing :source filename.
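
A small Python sketch of what “saved as UTF-8” means for this word. It also shows one plausible, unconfirmed reading of the ± in the :prepare output: if the console sent ñ as the single byte F1 (as in Latin-1), that byte is not valid UTF-8, and an old DOS code page such as 437 displays it as ±. The file name `spanish_test.txt` is a hypothetical choice.

```python
import tempfile
from pathlib import Path

word = "mañana"

# The same word produces different bytes depending on the encoding used.
utf8_bytes = word.encode("utf-8")      # ñ becomes two bytes: C3 B1
latin1_bytes = word.encode("latin-1")  # ñ becomes one byte: F1

# A lone F1 byte is not valid UTF-8, so a UTF-8 reader cannot recover ñ.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

# The byte F1 displayed through DOS code page 437 is the plus-minus sign.
shown_in_cp437 = latin1_bytes.decode("cp437")

# Write the word as UTF-8 bytes to a file (hypothetical name) for :source.
source_file = Path(tempfile.gettempdir()) / "spanish_test.txt"
source_file.write_bytes(utf8_bytes + b"\n")
```

Writing the bytes directly sidesteps whatever encoding the console or editor defaults to, which is the point of testing through a file rather than the keyboard.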

 
