
CLUES Chatbot Engine in C++
 
 
  [ # 16 ]

Glad to hear it Victor.

Here is a ranked RegExp “dna” benchmark for a variety of languages:

JavaScript V8 1
C GNU gcc 1.28
C++ GNU g++ 1.41
JavaScript TraceMonkey 1.52
Python PyPy 3.46
Python CPython 4.92
Python 3 4.94
Ruby 1.9 5.93
Java 6 -server 6.47
Scala 6.49
ATS 7.13
Ada 2005 GNAT 7.14
PHP 7.85
Ruby JRuby 8.43
Racket 9.19
Perl 9.23
Clojure 11.75
Ruby MRI 11.98
C# Mono 17.9
Python IronPython 18.75
Erlang HiPE 22.94
Java 6 -Xint 69.12
Smalltalk VisualWorks 69.95
Go 6g 8g 111.31

 

 
  [ # 17 ]

@Vic: actually, strings are just the coded representation of what we want to say verbally anyway. In turn, what we want to say, i.e. which words we want to generate and with which intonation, is itself just a model of the intention we want to express.

In other words: the sooner you convert strings into some coded meaning, the better it can match reality.

You’re heading in the right direction!

(I had some difficulties phrasing this posting, hopefully it’s clear)

 

 
  [ # 18 ]

Erwin - yes, I know exactly what you were trying to say.

Yes, the words themselves don’t play a central role in CLUES.  It is relationships that give meaning.

I can’t wait until I have completed “V3” (V1 and V2 were CLUES in Perl; V3 is CLUES in C++).

Merlin - what do the numbers indicate? Is JavaScript V8 the ‘unit’ (the ‘best’, that is, the ‘fastest’) and Perl 9.23 times slower, or is Go 6g/8g the very quickest?

 

 
  [ # 19 ]

V1 & V2: proof of concept; heavy on hash tables, strings and regexes; low on memory, heavy on CPU.

V3: no hash calcs, no strings (except stage 1, not stage 2, which is the most work), no regexes, much higher memory usage (but still nothing, 300 MB is NOTHING these days), and all kinds of smart caching and space/time trade-offs to reduce CPU tremendously.

 

 
  [ # 20 ]

Victor,
JavaScript V8 is the fastest and is the “unit”.

Perl is 9.23 times slower. Again, this is a very specific, regular-expression-only test. But you get the idea that, because of the “Browser Wars”, JavaScript development is happening much faster than we would have thought a few years ago, and some of its functions are being specifically optimized to gain a competitive edge.
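
To make the arithmetic concrete, here is a tiny sketch (the 5-second V8 baseline is a made-up placeholder, not a figure from the benchmark) showing how the relative factors translate into absolute times:

#include <cstdio>

int main() {
    // Hypothetical baseline: suppose the V8 run took 5.0 seconds (placeholder only).
    const double v8_seconds = 5.0;

    // A few relative factors from the table above (V8 = 1).
    const struct { const char* lang; double factor; } rows[] = {
        { "JavaScript V8", 1.00 },
        { "C++ GNU g++",   1.41 },
        { "Perl",          9.23 },
    };

    // Estimated absolute time = baseline time * relative factor.
    for (const auto& r : rows)
        std::printf("%-15s ~%4.1f s\n", r.lang, v8_seconds * r.factor);
    return 0;
}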

 

 
  [ # 21 ]

By the way, V8 is actually written in C++.
V8 can run standalone, or can be embedded into any C++ application.
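
For anyone curious what “embedded into a C++ application” looks like in practice, here is a rough sketch along the lines of V8’s old getting-started example from around the time of this thread. Treat it as an assumption-laden illustration: the embedding API has changed a great deal in later V8 releases, so the exact calls below may not match your version.

#include <v8.h>
#include <cstdio>

int main() {
    // Old-style (circa 2010) V8 embedding; newer releases require an explicit
    // Isolate and different handle types, so consult the docs for your version.
    v8::HandleScope handle_scope;
    v8::Persistent<v8::Context> context = v8::Context::New();
    v8::Context::Scope context_scope(context);

    // Compile and run a snippet of JavaScript from the host C++ program.
    v8::Handle<v8::String> source = v8::String::New("'Hello' + ' from embedded V8'");
    v8::Handle<v8::Script> script = v8::Script::Compile(source);
    v8::Handle<v8::Value> result = script->Run();

    // Convert the JavaScript result back to a C string and print it.
    v8::String::AsciiValue ascii(result);
    std::printf("%s\n", *ascii);

    context.Dispose();
    return 0;
}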

 

 
  [ # 22 ]

Ahhh… Browser Wars, of course… that makes sense! Perl being that far behind is interesting. I wonder how hash table lookup performance compares between Perl and the others. Well, I’m in the top 3, since I am using GNU C++ (g++). I’m not using regexes anyway, but it’s good to know.

 

 
  [ # 23 ]
Merlin - Dec 29, 2010:

Glad to hear it Victor.

Here is a ranked RegExp “dna” benchmark for a variety of languages:

JavaScript V8 1
C GNU gcc 1.28
...

Impressive list. What exactly does it say? JavaScript is on top, apparently a winner, but a winner at what?

 

 
  [ # 24 ]

Impressive list. What exactly does it say? JavaScript is on top, apparently a winner, but a winner at what?

This was a regular-expression benchmark, run on the same hardware but implemented in different computer languages. V8/JavaScript is the fastest on this test and has been assigned a value of 1. Everything else is slower; Perl, for example, is about 9 times slower.

It is only testing regex performance (and a very specific benchmark, but one that has been used to test a number of browsers/languages). This benchmark may also be impacted by the quality of the code written for the various languages.

My original point was that it will depend more on how your AI is implemented than on the language. Victor has gone from “proof of concept, heavy on hash tables, strings, regexes, low on memory, heavy on CPU” to “no hash calcs, no strings (except stage 1, not stage 2 which is the most work), no regexes, much higher memory usage”. Had he implemented both of these in either Perl or C++, I think he would have found the second version faster in either language.

It all depends on your goals and the approach you take. If your AI makes heavy use of regular expressions (like Skynet-AI) then regex performance is very important. In Victor’s second implementation he has eliminated regular expressions so it won’t matter to him.
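
As a toy illustration of that trade-off (my own example, not code from Skynet-AI or CLUES): classifying a token with a regular expression does much more work per call than a pre-built table lookup, which is why a regex-heavy design cares so much about regex speed. Using C++11’s <regex> here purely for illustration:

#include <iostream>
#include <regex>
#include <string>
#include <unordered_set>

// Toy example: decide whether a token is a known preposition.

// Regex-based check: even with the pattern compiled once, matching still
// runs the token through the regex engine on every call.
bool is_prep_regex(const std::string& tok) {
    static const std::regex prep("to|from|at|in|on|with");
    return std::regex_match(tok, prep);
}

// Table-based check: a single hash lookup against a prebuilt set.
// (CLUES V3 avoids even the hashing by working with integer codes.)
bool is_prep_table(const std::string& tok) {
    static const std::unordered_set<std::string> preps{
        "to", "from", "at", "in", "on", "with"};
    return preps.count(tok) != 0;
}

int main() {
    std::cout << is_prep_regex("to") << " " << is_prep_table("to") << "\n";   // 1 1
    std::cout << is_prep_regex("dog") << " " << is_prep_table("dog") << "\n"; // 0 0
}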


For those interested in some of the other benchmark comparisons:

Computer Language Benchmark Game
http://shootout.alioth.debian.org/help.php#inputvalue

 

 
  [ # 25 ]

I agree that if I made the same changes in Perl, it would speed up tremendously as well. But since C++ is a strictly typed language in which I am in full control, I think it was the best choice.

 

 
  [ # 26 ]

Thus, V3 is the first version of CLUES in a strictly typed language, C++.

 

 
  [ # 27 ]

Thanks Merlin, now it’s clear to me.

 

 
  [ # 28 ]

Victor, I’m just starting my own Chatbot in C++ (I’ll be posting another thread in the “New to Chatbot programming” area).


Obviously you’re quite advanced in your knowledge of the field; the method you’re using seems to be the most sophisticated I’ve heard of so far.

Would you happen to have any links you could point me towards to start reading up on methods like that? By no means do I expect to understand them anytime soon, but I would love to at least read a high level paper describing the method you’re using.


I look forward to hearing about the evolution of CLUES V3, it sounds very promising.

 

 
  [ # 29 ]

Hi Garrett

Yes, the philosophy of version 3 is fundamentally the same as V1 & 2.  V1 & 2 were proofs of concept, and worked nicely.    The main reason for rewriting in C++ was purely speed.  Now, some of the optimizations I am using in the C++ version would have sped up the Perl version tremendously as well had I used them there, but I chose C++ because I wanted full control of dynamic memory and strongly typed variables.

Actually, I am still using Perl; I have Perl scripts which dynamically generate C++ code.  The main reason is that the core parse tree generation deals ONLY with integers: no string manipulation, string concatenation, or regexes (all of which are rather CPU intensive).    Now, instead of writing a rule that consists of a series of integers (for example “IF 5, 8, 15, 21 THEN 29”, which could mean: if 5 = subject noun, 8 = verb, 15 = direct complement, 21 = prepositional phrase, then 29 = simple sentence), I want to be able to just use the textual descriptions.

So, I wrote ppp.pl (a Perl pre-processor): it takes a parse tree generator which uses only textual names of parts of speech and converts them to integers via a lookup table.  ppp.pl generates the C++ version of the parse tree generators (which now contain only integers instead of string representations of parts of speech).  Then I use the C++ compiler (GNU g++ on Linux), and thus the parse tree generators are compiled to machine code, with all “part of speech” identifiers being integers, not strings.  We all know how much faster microprocessors work with integers than with strings, so it is an incredible speed-up.
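
A rough sketch of the idea, with made-up names and codes rather than the real ppp.pl output: the textual part-of-speech names exist only at generation time, and the generated C++ compares nothing but integers at runtime.

#include <cstdio>

// Hypothetical part-of-speech codes; in CLUES the real mapping is produced
// by ppp.pl from a lookup table of textual names.
enum Pos {
    SUBJECT_NOUN    = 5,
    VERB            = 8,
    DIRECT_COMPL    = 15,
    PREP_PHRASE     = 21,
    SIMPLE_SENTENCE = 29
};

// "IF 5, 8, 15, 21 THEN 29": if the four constituents match, classify the
// whole phrase as a simple sentence. Only integer comparisons at runtime.
int classify(const int parts[4]) {
    if (parts[0] == SUBJECT_NOUN && parts[1] == VERB &&
        parts[2] == DIRECT_COMPL && parts[3] == PREP_PHRASE)
        return SIMPLE_SENTENCE;
    return 0; // no rule matched
}

int main() {
    const int parts[4] = { SUBJECT_NOUN, VERB, DIRECT_COMPL, PREP_PHRASE };
    std::printf("rule fired -> %d\n", classify(parts)); // prints 29
    return 0;
}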

Now, if I had continued working on the algorithm in Perl, it would still have worked and I would probably be at the point right now where you could start teaching it things via natural language.  I decided to take a detour and rewrite for speed.  The other point was that, in addition to parse tree generation (which is very CPU intensive, especially if you do deep parsing), there is the whole reactor logic: when you have figured out which of the many parse trees you have generated is probably the correct interpretation, what do you do?  That is, once you know what the user meant by their string of input characters, what processing do you do?

As for documentation, I have yet to write it.  I will keep you posted!

 

 
  [ # 30 ]

Here is an example of the textual representation of a CLUES parse tree:

pos = simple-sentence
subject.num-noun = 1
subject.noun.1.val = he
num-predicate = 1
predicate.1.num-verb = 1
predicate.1.verb.1.val = going
predicate.1.verb.1.num-auxiliary-verb = 1
predicate.1.verb.1.auxiliary-verb.1.val = was
predicate.1.verb.1.num-prep-phrase = 1
predicate.1.verb.1.prep-phrase.1.num-prep = 1
predicate.1.verb.1.prep-phrase.1.prep.1.val = to
predicate.1.verb.1.prep-phrase.1.num-noun = 1
predicate.1.verb.1.prep-phrase.1.noun.1.val = north gower
predicate.1.verb.1.prep-phrase.1.kow = to-place-any

Now, internally, V3 deals only with integer representations of all that. Here is the same tree, but with more verbose output:

pos (as-num: 1,types: 0) = 3 ‘simple-sentence’ (type 1)
subject.num-noun (as-num: 2.5, types: 0.0) = 1 ‘1’ (type 2)
subject.noun.1.val (as-num: 2.3.1.4,types: 0.0.1.0) = 1000002 ‘he’ (type 0)
num-predicate (as-num: 9, types: 0) = 1 ‘1’ (type 2)
predicate.1.num-verb (as-num: 6.1.7, types: 0.1.0) = 1 ‘1’ (type 2)
predicate.1.verb.1.val (as-num: 6.1.8.1.4,types: 0.1.0.1.0) = 3000004 ‘going’ (type 0)
predicate.1.verb.1.num-auxiliary-verb (as-num: 6.1.8.1.19, types: 0.1.0.1.0) = 1 ‘1’ (type 2)
predicate.1.verb.1.auxiliary-verb.1.val (as-num: 6.1.8.1.20.1.4,types: 0.1.0.1.0.1.0) = 2000003 ‘was’ (type 0)
predicate.1.verb.1.num-prep-phrase (as-num: 6.1.8.1.27, types: 0.1.0.1.0) = 1 ‘1’ (type 2)
predicate.1.verb.1.prep-phrase.1.num-prep (as-num: 6.1.8.1.28.1.13, types: 0.1.0.1.0.1.0) = 1 ‘1’ (type 2)
predicate.1.verb.1.prep-phrase.1.prep.1.val (as-num: 6.1.8.1.28.1.14.1.4,types: 0.1.0.1.0.1.0.1.0) = 4000005 ‘to’ (type 0)
predicate.1.verb.1.prep-phrase.1.num-noun (as-num: 6.1.8.1.28.1.5, types: 0.1.0.1.0.1.0) = 1 ‘1’ (type 2)
predicate.1.verb.1.prep-phrase.1.noun.1.val (as-num: 6.1.8.1.28.1.3.1.4,types: 0.1.0.1.0.1.0.1.0) = 5000007 ‘north gower’ (type 0)
predicate.1.verb.1.prep-phrase.1.kow (as-num: 6.1.8.1.28.1.17,types: 0.1.0.1.0.1.0) = 10 ‘to-place-any’ (type 1)
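
For readers trying to picture how a tree like that might be held in memory, here is a minimal, hypothetical sketch (my own types, not the actual CLUES internals): each attribute is a numeric path plus an integer value code, so lookups and comparisons never touch strings.

#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical storage for one parse-tree attribute: the numeric "as-num"
// path (e.g. 6.1.8.1.4) and the integer value code (e.g. 3000004 = 'going').
struct Attribute {
    std::vector<std::uint32_t> path;   // one element per level of the dotted path
    std::uint32_t              value;  // a separate side table maps codes to words
};

int main() {
    const std::vector<Attribute> tree = {
        { {1},             3       },  // pos = simple-sentence
        { {2, 3, 1, 4},    1000002 },  // subject.noun.1.val = 'he'
        { {6, 1, 8, 1, 4}, 3000004 },  // predicate.1.verb.1.val = 'going'
    };

    // Integer-only scan: find the value stored at path 6.1.8.1.4.
    const std::vector<std::uint32_t> wanted = {6, 1, 8, 1, 4};
    for (const Attribute& a : tree)
        if (a.path == wanted)
            std::printf("value code: %u\n", static_cast<unsigned>(a.value)); // 3000004
    return 0;
}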


So the stages of processing are:

(a) word information retrieval on the user’s input (find out the parts of speech of each term).
(b) parse tree generation - all possibilities.
(c) common knowledge and inference to ‘judge’ each parse tree, pick the most likely one.
(d) reactor logic (or domain specific rules and facts) applied to selected tree.
(e) from the response deduced (or found) in (d), synthesize a reply and display it on screen (or, later, via speech synthesis).

I’m currently busy on (a) through (c), hoping by mid-summer, or the fall at the latest, to be on (d) and (e).

Hoping by New Year’s 2012 to start teaching it ‘about the world’ via NLP.

 
