creating a new dictionary

I plan, in the long run, to add a French dictionary. I expect quite a lot of work, but top level linguistic resources are available for French.

At the moment I’m just looking at it from a purely technical point of view, not a linguistic one.

I created a French dict (DICT\FRENCH). It’s an ambitious three-word dictionary for the moment:

immobilier (NOUN)
fiscal (ADJECTIVE)
tutu (NOUN)

When I run ChatScript with the language=FRENCH command-line parameter, the log says:
  read 3 raw words
  binary dictionary 50008 written
and dict.bin & facts.bin files are created in DICT\FRENCH
which seems quite good.

However, when I use words that are defined in this dictionary (here, in a rejoinder), I still get:
  *** Warning- line ... immobilier is not a known word. Is it misspelled?
which is exactly what I expected to avoid.

Where am I wrong?


  [ # 1 ]

Sorry for not being able to answer your question and for digging up an old post :o(

But I wanted to say that what you have undertaken is fantastic! Is your work integrated in recent versions of ChatScript? There is still work ... If you need help with some simple but tedious tasks, I’m here.

I’m working on a French translation of the ChatScript manuals myself. It seems to me that it has not been done yet ... am I wrong?


  [ # 2 ]

Note that current CS already comes with a French dictionary, and supports TreeTagger for POS tagging and a French livedata folder for substitutions, date and currency info. (TreeTagger requires a commercial license.)


  [ # 3 ]

French support is quite good in ChatScript now. There’s a French dictionary in the release which works well.
We have developed support for numbers, which is also in the open source release.
The French POS tagger is not integrated (proprietary licence) but is not mandatory; it depends on the use case.

Do you really want to translate the whole manual into French? The quality of the English manual is very good; moreover, the translation and maintenance effort would be quite substantial.

I’d rather start by creating just a specific tutorial in French (I think that the current ChatScript tutorial in the manual is not that good).


  [ # 4 ]

Hi all!

I contributed a bit to the GitHub wiki documentation updates, converting the old PDFs to Markdown for better web browsing usability/readability (phase I).

I think that just maintaining and improving the English-language documentation mentioned above could be more helpful/universal than translating it into different languages, also because CS is a live project and the docs change frequently!

In fact, the “tutorial” doc originally by Erel Segal is, IMO, very useful, even if it contains a few minor code errors, and I agree it could probably be updated a bit.

Maybe language localization (using CS for non-English languages) could be explained better (e.g. showing in detail the features of TreeTagger under its free/commercial licences).

I have been thinking about a possible small reorganization of the documentation (phase II), to be discussed with Bruce, and I’m happy to share our gists with all contributors.

I noticed that the non-English dictionaries (French, Italian) contain tagging errors in places (I mean the Italian dictionary especially), and updates to the GitHub .txt files contributed by native speakers could be very helpful (which could mean possible sub-projects).
BTW, what is the source of the Italian dictionary you KINDLY updated on GitHub? The TreeTagger database?



  [ # 5 ]

Non-English dictionaries come from TreeTagger. The labelling of the words with English POS tags is a translation of the nominal POS tags in the native tongue. E.g., for Dutch, the tag file at the top level of the dict says:
det__demo # attributively used demonstrative pronoun
det__indef # attributively used indefinite pronoun
det__poss # attributively used possessive pronoun
det__quest # attributively used question pronoun
det__rel # attributively used relative pronoun

And if the ENGLISH is wrong, then it could be changed and I could rebuild the dictionary. Of course, there is not always a direct translation across.
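
For anyone who wants to review those tag translations, here is a minimal Python sketch that reads lines like the Dutch ones quoted above into a tag-to-description table. It assumes every entry is a tag, then "#", then the English description, one per line, which is all the excerpt shows; the real tag file may carry more than that. The function name parse_tag_file and the SAMPLE text are made up for illustration and are not part of ChatScript.

```python
def parse_tag_file(text):
    """Map each native tag to its English description.

    Assumed line format (taken only from the excerpt in this thread):
        tag # description
    """
    tags = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and lines that are only a comment.
        if not line or line.startswith("#"):
            continue
        tag, sep, description = line.partition("#")
        tag = tag.strip()
        # Skip lines without a "#" separator or without a tag.
        if not sep or not tag:
            continue
        tags[tag] = description.strip()
    return tags


# Hypothetical sample: the five Dutch lines quoted in the post above.
SAMPLE = """\
det__demo # attributively used demonstrative pronoun
det__indef # attributively used indefinite pronoun
det__poss # attributively used possessive pronoun
det__quest # attributively used question pronoun
det__rel # attributively used relative pronoun
"""

tags = parse_tag_file(SAMPLE)
for tag, description in tags.items():
    print(f"{tag}: {description}")
```

Dumping the table like this makes it easier for a native speaker to spot an English description that looks wrong and report it so the dictionary can be rebuilt.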
