

CLUES Chatbot Engine in C++
 
 

I have digressed and decided to do something very ambitious—I am completely re-writing my CLUES engine in C++.

 

 
  [ # 1 ]

Welcome back, Victor, Chuck, CR, Jan, Raymond, and everyone else who has been on hiatus from the boards! It’s really nice to see everyone posting again. And since this thread is already the “most hijacked thread” at chatbots.org, I’m fairly sure nobody will mind if you update us on things here. :)

 

 
  [ # 2 ]

Ha ha, yes, this thread has taken many twists and turns before. Fortunately the OP seems a gracious host. :) Further spurring derailment, what made you decide to switch to C++, Victor?

 

 
  [ # 3 ]

Victor is back! WOW! Welcome back Victor, pleased to see you again. To be fair, we were indeed a little bit worried (more than a little, actually; we really were concerned).

About hijacking this thread, shall I just split it off into a cosy chit-chat thread? Any suggestions for the name?

EDIT: This thread was split from http://www.chatbots.org/ai_zone/viewthread/137/ on Dec 7th

 

 
  [ # 4 ]

Further spurring derailment, what made you decide to switch to C++, Victor?

Can’t help myself, I’m interested in that as well. :D

 

 
  [ # 5 ]

Thanks Erwin, Dave, CR, Jan !

Yes, my Perl implementation was a very good proof of concept and worked great for deep parsing, and semantic inference, and shows very good promise.  I *could* have continued with teaching grammar and ‘world knowledge’ that helps it infer semantics and thus choose the correct parse tree from the ‘parse forest’ that is generated.

However, once we get into sentences of more than about 20 words, where many of those words can have several parts of speech, the processing was taking a bit long.  One example took 18 seconds for a complex 22-word sentence.  The concept of the algorithm, I believe, is sound, so I have decided to completely ditch all string processing and all regular expressions, and have the engine look at the words only after they are ‘digitized’ to integers (32 bit on a 32 bit machine, or 64 bit on a 64 bit machine).  Not dealing with string concatenation, hash calculation, or regexes will improve performance.  Also, I have decided to index as many of these integers as possible, thus reducing looping to an absolute minimum (see http://en.wikipedia.org/wiki/Space-time_tradeoff - I am on the ‘use more memory for faster execution time’ side, not the ‘use more CPU for less memory’ side).  I figure that in a time when one can buy a $599 laptop that comes with 4 GB, using more memory to speed up my routines makes a whole lot of sense.
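For anyone who wants to picture the ‘digitizing’ step, here is a minimal sketch of one way to do it. It is not the CLUES code: TermID, TermTable and digitize() are made-up names, and it assumes plain whitespace splitting. Note that it still hashes each word once on the way in; the point is only that everything downstream sees integers.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical names: TermID and TermTable are illustrative only,
// they are not taken from the CLUES source.
using TermID = std::uint32_t;

class TermTable {
public:
    // Return the ID for a word, assigning the next free ID on first sight.
    TermID intern(const std::string& word) {
        auto it = ids_.find(word);
        if (it != ids_.end()) return it->second;
        TermID id = static_cast<TermID>(words_.size());
        ids_.emplace(word, id);
        words_.push_back(word);
        return id;
    }

    // Reverse lookup is a plain array index: memory is spent to avoid searching.
    const std::string& word(TermID id) const {
        assert(id < words_.size());
        return words_[id];
    }

private:
    std::unordered_map<std::string, TermID> ids_;  // word -> ID (used in stage 1 only)
    std::vector<std::string> words_;               // ID -> word
};

// Stage 1: the only place strings are touched. Splits on whitespace
// and returns the sentence as a sequence of integer term IDs.
std::vector<TermID> digitize(TermTable& table, const std::string& sentence) {
    std::vector<TermID> out;
    std::istringstream in(sentence);
    std::string w;
    while (in >> w) out.push_back(table.intern(w));
    return out;
}
```

Calling digitize() on “the dog saw the cat” gives five IDs in which the first and fourth are equal, so later stages can compare words with a single integer comparison.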

One function, treeAttach(), was taking about 200 microseconds in the ‘Perl/strings’ proof-of-concept code, but that same function in C++, using no strings and no regexes and utilizing the space/time tradeoff, now takes 300 nanoseconds!!!  So yes, I decided to digress and make the engine as fast as possible.  Another reason is that, in addition to grammar parsing, the engine needs time for semantic inference, also spell check (much later), and of course ‘reactor’ execution (once we know exactly what the user means, we find and run a reactor which retrieves or deduces the desired information).
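To illustrate the space/time tradeoff mentioned above, here is a small sketch that replaces a loop over rules with a precomputed table lookup. Everything in it is an assumption made for illustration: the categories, the rules, and the names AttachIndex and canAttachScan() are invented, and the real treeAttach() is certainly more involved.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical part-of-speech categories; the real CLUES categories are not given.
enum Category : std::uint8_t { Det, Adj, Noun, Verb, Prep };
constexpr std::size_t kCategoryCount = 5;

struct Rule { Category parent; Category child; };

// Hypothetical attachment rules: "a child of this category may attach to this parent".
constexpr Rule kRules[] = { {Noun, Det}, {Noun, Adj}, {Verb, Noun}, {Prep, Noun} };

// Time-heavy version: scan every rule on every query.
bool canAttachScan(Category parent, Category child) {
    for (const Rule& r : kRules)
        if (r.parent == parent && r.child == child) return true;
    return false;
}

// Memory-heavy version: build a dense table once, then answer by indexing.
class AttachIndex {
public:
    AttachIndex() {
        for (const Rule& r : kRules) table_[r.parent][r.child] = true;
    }
    bool canAttach(Category parent, Category child) const {
        assert(parent < kCategoryCount && child < kCategoryCount);
        return table_[parent][child];
    }
private:
    // Value-initialized, so every entry starts out false.
    std::array<std::array<bool, kCategoryCount>, kCategoryCount> table_{};
};
```

The table costs kCategoryCount squared entries no matter how few rules exist, but each query becomes two array indexings instead of a loop.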

So I have been pulling double shifts (5 am to 5 pm each Sat+Sun!).    Sorry for the ‘radio silence’.  Yes, can you split this thread, Erwin, unless of course Chuck doesn’t mind!

it’s good to be back!

 

 
  [ # 6 ]

Hello Chuck, Victor, Dave, C R, Erwin,  Jan and whoever else is reading this. 

C R Hunt asked, “Further spurring derailment, what made you decide to switch to C++, Victor?”

I think derailment may simply mean Chuck and his great idea are popular at chatbots.org.  Aside from that, I bet Victor may be rewriting in C++, so he can include inline assembly language, and possibly distribute his engine application.


Victor in part said, “the words after they are ‘digitized’ to integers (32 bit on a 32 bit machine, or 64 bit on a 64 bit machine).”

Just sayin’ with zero criticism ... Why would you ever need 64 bit (quadword) for a word?  If 32 bit (dword) gives 4 billion values, then why not (roughly, casually speaking) pack two words (as two dwords) into one 64 bit register, something like 32 bit x 2, or 16 bit x 2, or with segment/offset… fit 32 bit through 20 bit?  Just asking in general, not specifically about your design, nor is this humorous reply meant to indirectly criticize your design… Just having some fun with the math of it, Victor… What you said was interesting and cool!

By the way, floating point works cool in binary too.  Just keep counting backwards and you are there, dude (mentioned for the entertainment value of it):

0  0  0  0 . 1    1     1     ...
8, 4, 2, 1,  .5,  .25,  .125  ...

Of course there is no actual (decimal) point in the symbol table.  So it is just: 0000111
Someone is going to say, “Wait a minute!  The point isn’t called decimal in binary. Decimal is base 10, Binary is base 2.” and I know, but we’re among friends.  I love binary, and I am glad it has its own shorthand: hexadecimal.
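For fun, the place values above can be checked with a few lines of C++. This is just an illustrative helper (binaryToDouble() is a made-up name), and strictly speaking it reads a number with a fixed binary point rather than a true floating-point encoding.

```cpp
#include <cassert>
#include <string>

// Illustrative helper, not from the thread: read a binary numeral such as
// "0000.111" and return its value. Digits left of the point weigh
// ..., 8, 4, 2, 1; digits right of it weigh .5, .25, .125, ...
double binaryToDouble(const std::string& bits) {
    double value = 0.0;
    double weight = 0.5;      // weight of the next fractional digit
    bool afterPoint = false;
    for (char c : bits) {
        if (c == '.') {
            assert(!afterPoint);  // at most one binary point
            afterPoint = true;
            continue;
        }
        assert(c == '0' || c == '1');
        int bit = c - '0';
        if (!afterPoint) {
            value = value * 2 + bit;   // shift left, add the new digit
        } else {
            value += bit * weight;     // add .5, .25, .125, ... as needed
            weight /= 2;
        }
    }
    return value;
}
```

With the digits from the example, “0000.111” comes out as .5 + .25 + .125 = 0.875.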

 

 

 
  [ # 7 ]

8PLA - yes, agreed, 64 bit is a bit of ‘overkill’, giving 18446744073709551616 possible values.  Even though we are adding words to the English dictionary every day (and apparently we hit the 1,000,000th word last May, I believe), I don’t think 64 will be necessary.  I simply have to change, in my terms.h file:

typedef unsigned int STORE;

That user-defined type (STORE) is the name of the type I use for a TermID.

The reason I suggested 64 bit for a 64 bit CPU is that when the CPU fetches an operand from memory on a 64 bit machine, it moves 64 bits over the data bus in one shot, so why not just go with that. I don’t want to start masking out and doing bit-wise operations - anything that could possibly slow it down I want to avoid.
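A small sketch of how the width of that typedef could be pinned down or checked. Only STORE comes from the post; STORE32, STORE64, bitWidth() and toStore() are hypothetical names added for illustration. One thing worth knowing: on most mainstream 64 bit compilers `unsigned int` is still 32 bits wide, so getting a 64 bit TermID would mean changing the typedef rather than just recompiling.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// From the post: the type used for a TermID.
typedef unsigned int STORE;

// Hypothetical alternatives with an explicit width, if the size should
// not depend on the compiler's idea of 'unsigned int'.
typedef std::uint32_t STORE32;
typedef std::uint64_t STORE64;

// Number of value bits in an unsigned integer type.
template <typename T>
constexpr int bitWidth() { return std::numeric_limits<T>::digits; }

// 32 bits is already far more than the roughly one million English words
// mentioned above.
static_assert(std::numeric_limits<STORE32>::max() >= 1000000u,
              "a 32 bit TermID must be able to hold a million words");

// Convert a running word count to a TermID, checking that it fits.
inline STORE toStore(unsigned long long n) {
    assert(n <= std::numeric_limits<STORE>::max());
    return static_cast<STORE>(n);
}
```

Printing bitWidth<STORE>() on the target machine is a quick way to see what the typedef actually gives you there.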

Now, I’m not sure if I will have to go down to assembly language or not, and I’m a bit fuzzy on that - haven’t coded assembly in 20 years, but it is an option.

Right now, grammar rules are actually going to be compiled right into machine language via the C++ compiler.

Also, another reason I went to C++ is pthreads.  I was actually VERY disappointed with the multi threading implementation in Perl !  Passing messages between threads was slow.

But now, with the ability to basically process a grammar rule in the nanosecond range, I’m pretty sure I won’t need to go parallel, but even if need be, I think threads in C++ will be better than in Perl.. we’ll see how it goes.

 

 
  [ # 8 ]

You know, everyone, apart from analyzing first-order logic issues, I haven’t even started to write software for my chatbot (which I call a “TRAVATAR”). I’ve been waiting for some of you to talk about this so that I can get an idea about selecting the appropriate software, and I was also convinced that C++ would probably end up being quicker than Perl or other languages. Ben Goertzel has given me a lot of inspiration, even though we have never spoken together.

I’m curious if anyone thinks that a “very” intelligent agent actually needs to use something other than C++ in the first place. Remember that iPhones and Android-based portable devices already have a lot to deal with, and I’m not sure the processing power of these portable devices will be enough when it comes to voice recognition and processing.

Raymond

 

 
  [ # 9 ]

So, Christmas day (merry Christmas by the way, everyone), and CLUES engine ver. 3 (the first one in C++) is now generating parse trees.  Now, for the 3rd time, I will be converting grammar rules into this latest version.  The speed-up will be tremendous (on the order of 300 to 400 times), which will allow extremely deep parsing and meaning inference.  I start work on converting my existing 100 rules today.  It will probably take a month or so to get enough grammar rules (and world knowledge rules) coded for it to be able to converse with completely free-form text (and deduce the correct antecedents of prepositional phrases).  Ten days off work for xmas to work on it like crazy.

 

 
  [ # 10 ]

MERRY X-MAS VIC!

Have to run now, details later…

 

 
  [ # 11 ]

Merry Christmas everybody.

 

 
  [ # 12 ]
Raymond Lavas - Dec 6, 2010:

I’m curious if anyone thinks that a “very” intelligent agent actually needs to use something other than C++ in the first place. Remember that iPhones and Android-based portable devices already have a lot to deal with, and I’m not sure the processing power of these portable devices will be enough when it comes to voice recognition and processing.

Raymond

Raymond,
I believe that your choice of language will have more to do with your goals for your bot than it does around speed of processing. My bot Skynet-AI (http://www.tinyurl.com/Skynet-AI) was programmed in a JavaScript framework I created called JAIL(TM) (JavaScript Artificial Intelligence Language). I can’t say that it is “very” intelligent, but it is fast and runs on cell phones, video game consoles, and just about anything else with a browser on it.

Recent advances in JavaScript development allow its speed to rival that of compiled languages (and in fact it is compiled automatically in some environments).
To give you an idea, here is a page where you can compare various algorithms across different language implementations. Google’s V8 JavaScript implementation is about the fastest, but Opera and Microsoft have also come a long way recently.
http://shootout.alioth.debian.org/u32/measurements.php?lang=v8

 

 
  [ # 13 ]

In my experience, the closer a programming language is to the hardware (C++ is quite close to the chipset, and assembler is closer still), the better the performance and the simpler the devices it can run on.

However, this requires far more computer knowledge than programming in a higher-level language like Common Lisp or Visual Basic, and development is much more complex and so takes longer. Eventually, growth in processing power might even cancel out the gain from choosing C++; but my feeling would say that C++ is always better.

Erwin

 

 
  [ # 14 ]

“but my feeling would say that C++ is always better”
If you go to the benchmark page you will find that that assumption is no longer correct.

In the Regular Expression tests V8 can be faster than C++:

Program Source Code   N            CPU secs

JavaScript V8
regex-dna #2          50,000       0.09
regex-dna #2          500,000      0.5
regex-dna #2          5,000,000    4.55

C++ GNU
regex-dna #2          50,000       0.06
regex-dna #2          500,000      0.64
regex-dna #2          5,000,000    6.4

Depending on how your AI is constructed, string handling and regular expression processing can be much more important to the performance of the program. All of the JavaScript development teams are trying to improve RegEx performance. I don’t know if that is true with C++ compilers.

As an example, here are some comments from the V8 development team:

“A fundamental decision we made early in the design of Irregexp was that we would be willing to spend extra time compiling a regular expression if that would make running it faster. During compilation Irregexp first converts a regexp into an intermediate automaton representation. This is in many ways the “natural” and most accessible representation and makes it much easier to analyze and optimize the regexp. For instance, when compiling /Sun|Mon/ the automaton representation lets us recognize that both alternatives have an ‘n’ as their third character. We can quickly scan the input until we find an ‘n’ and then start to match the regexp two characters earlier.”
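The /Sun|Mon/ trick described in that quote can be sketched by hand in C++. This is only an illustration of the idea, not V8 or Irregexp code, and findSunOrMon() is a made-up name: both alternatives have ‘n’ as their third character, so the scan looks for ‘n’ and then checks the two characters before it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hand-written illustration of the optimization described for /Sun|Mon/.
// Returns the index where "Sun" or "Mon" starts, or std::string::npos
// if neither occurs in the input.
std::size_t findSunOrMon(const std::string& s) {
    // A match needs two characters before the 'n', so start at index 2.
    for (std::size_t i = 2; i < s.size(); ++i) {
        if (s[i] != 'n') continue;                  // cheap filter on the shared letter
        bool sun = (s[i - 2] == 'S' && s[i - 1] == 'u');
        bool mon = (s[i - 2] == 'M' && s[i - 1] == 'o');
        if (sun || mon) return i - 2;               // match starts two characters earlier
    }
    return std::string::npos;
}
```

For example, findSunOrMon("see you Monday") returns 8, the position of the ‘M’. Most input characters are rejected with a single comparison against ‘n’.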

 

 
  [ # 15 ]

Perl has served me wonderfully for my proof of concept of my algorithms (parse tree generation, semantic inference).  Now that I have a solid algorithm, it was time to move to C++.  To the “bare metal”.

Your regular expression tests above are interesting, but I have removed all regular expressions, and the entire core of parse tree generation and evaluation has been replaced.  Only stage 1 now deals with strings.  Can’t get away from that unless you want the user to enter every word as a 32 bit word LOL.

So, after stage 1 digitizes, the entire PT generation and semantic evaluation of each is done with no regexes, no string class at all, simply processing of 32 bit integers.

 
