Programmatic access to RiveScript’s guts
 
 

Hey

I’ve added a couple new features to the Perl RiveScript library today that will allow developers to get convenient access to the internal data structures of the RiveScript object.

There’s a new deparse() method, which returns a big ol’ Perl data structure that represents the entire RiveScript brain as loaded into the object. With this it’s easy to loop over the topics, triggers, config variables and everything else that was loaded in from the RiveScript documents.

There’s also a write() method, which will use deparse() and write RiveScript code to disk using all the data it has in memory.

What’s the application for this? Well, you can use it to create a user interface for manipulating your RiveScript documents without needing to touch the RiveScript code. The module could load an RS file, deparse it to give your program really clean and easy access to the data represented by that RS file, make some changes to it, and write the changes back to disk.

I’m slowly working on creating a Pandorabots-like service for RiveScript bots, and this was a necessary step along the way to that. If there’s enough interest in this feature, I may port it over to the Python library at some point as well.

This feature comes with perl-RiveScript 1.24, which should be on CPAN in a few hours, or can be downloaded from http://www.rivescript.com/interpreters#perl (there are RPMs available for Fedora and EL6) or https://github.com/kirsle/rivescript-perl

 

 
  [ # 1 ]

> https://code.google.com/p/rivescript-cpp/

Noah, is there any RiveScript Interpreter for C++ ?

 

 
  [ # 2 ]

I was working on one (that one there), got stuck when I needed regular expressions (couldn’t figure out how to link Boost with it), stalled and haven’t touched the code since. :) C++ isn’t my strong suit, obviously.

In the meantime though, that’s why I wrote the `rivescript` script for the Perl version and made rivescript.py executable in itself, so that a third party program (written in C++ maybe) could open a pipe to one of those scripts and communicate back and forth using JSON… so C++ bots don’t have to miss out on RiveScript just because there isn’t a rivescript.cpp yet. ;)

 

 
  [ # 3 ]

Here’s some C++ code I scraped together from googling things… it opens the Perl `rivescript` program using a pipe to read/write from it… apparently doing this is harder than I thought it would be. :)

#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string>
#include <iostream>
#include <ostream>

#define READ 0
#define WRITE 1

// popen2 adapted from http://dzone.com/snippets/simple-popen2-implementation
pid_t
popen2(const char *command, int *infp, int *outfp)
{
    int p_stdin[2], p_stdout[2];
    pid_t pid;

    if (pipe(p_stdin) != 0 || pipe(p_stdout) != 0) {
        return -1;
    }

    pid = fork();

    if (pid < 0) {
        return pid;
    }
    else if (pid == 0) {
        // Child: wire the pipes to stdin/stdout, then exec the command.
        close(p_stdin[WRITE]);
        dup2(p_stdin[READ], READ);
        close(p_stdout[READ]);
        dup2(p_stdout[WRITE], WRITE);

        execl("/bin/sh", "sh", "-c", command, NULL);
        perror("execl");
        exit(1);
    }

    if (infp == NULL) {
        close(p_stdin[WRITE]);
    }
    else {
        *infp = p_stdin[WRITE];
    }

    if (outfp == NULL) {
        close(p_stdout[READ]);
    }
    else {
        *outfp = p_stdout[READ];
    }

    return pid;
}

int main (int argc, char **argv) {
    int infp, outfp;
    char buf[1024];

    if (popen2("./rivescript --json", &infp, &outfp) <= 0) {
        printf("Unable to exec rivescript\n");
        exit(1);
    }

    while (true) {
        std::string input;
        std::cout << "You> ";
        std::cin >> input; // reads a single whitespace-delimited word

        std::string outgoing = "{\"username\":\"cpp\", \"message\":\"" + input + "\"}\n";
        write(infp, outgoing.c_str(), outgoing.length());
        write(infp, "__END__\n", 8);
        close(infp); // TODO: can't we just flush the stream and not close it? :/

        // read() does not null-terminate, so leave room and terminate by hand
        ssize_t n = read(outfp, buf, sizeof(buf) - 1);
        buf[n > 0 ? n : 0] = '\0';
        printf("Got: %s", buf);
    }

    return 0;
}

If you run it, it prompts for a message and sends it to RiveScript and prints the JSON response back:

$ g++ -o popen popen.cpp
$ ./popen
You> hi
Got: {
   "reply": "How do you do. Please state your problem.",
   "status": "ok",
   "vars": {
      "topic": "random",
      "__lastmatch__": "(hello|hi|hey|howdy|hola|hai|yo) [*]"
   }
}
__END__
You> ^

Unfortunately that’s where it stops. The only way I can figure out to get the response from the command is to close the input filehandle (see the “TODO” comment in the code). I was trying to find a workaround (flush the filehandle maybe) but to no avail.
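As a rough illustration of the flush-instead-of-close idea, here is a minimal Python sketch of a pipe that stays open across several messages. The child process is a hypothetical stand-in (a tiny echo script), not the real `rivescript` program, and the `ask` and `demo` names are made up for this example. It assumes the child reads up to an `__END__` line, writes its reply followed by `__END__`, and flushes its own output; whether `rivescript --json` does the latter is not confirmed in this thread.

```python
import subprocess
import sys

# Hypothetical stand-in for the bot process: collects lines until "__END__",
# then writes a reply, its own "__END__" marker, and flushes stdout.
CHILD = r'''
import sys
buf = []
for line in sys.stdin:
    line = line.rstrip("\n")
    if line == "__END__":
        sys.stdout.write("echo: " + " ".join(buf) + "\n__END__\n")
        sys.stdout.flush()
        buf = []
    else:
        buf.append(line)
'''


def ask(proc, text):
    """Send one framed message and read one framed reply, keeping the pipe open."""
    proc.stdin.write(text + "\n__END__\n")
    proc.stdin.flush()  # flush rather than close, so the pipe can be reused
    lines = []
    for line in iter(proc.stdout.readline, ""):
        line = line.rstrip("\n")
        if line == "__END__":
            break
        lines.append(line)
    return "\n".join(lines)


def demo():
    """Start the stand-in child and exchange two messages over the same pipe."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,
    )
    try:
        return [ask(proc, "hi"), ask(proc, "who are you")]
    finally:
        proc.stdin.close()
        proc.stdout.close()
        proc.wait()
```

The parent reads line by line up to the marker instead of waiting for end-of-file, which is why it does not need to close its write end. If the child buffers its output and never flushes, the parent would still block, so both sides matter.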

Of course, you could just open a pipe, print one JSON input, close it and get the response back, but I’d think having a stateful pipe would be better (and faster, too, since the RS code doesn’t have to be parsed on every single request). A C++ JSON library could also be used to encode/decode the JSON data instead of just dealing with it as raw strings.
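The point about using a JSON library instead of raw strings can be sketched in a few lines of Python. The `username` and `message` keys and the `__END__` marker come from the code above; the function names are made up for this example, and the framing is an assumption based on that code rather than a documented protocol.

```python
import json

END_MARKER = "__END__"


def encode_request(username, message):
    """Build one framed request: a JSON object on one line, then the end marker."""
    payload = json.dumps({"username": username, "message": message})
    return payload + "\n" + END_MARKER + "\n"


def decode_response(raw):
    """Parse the JSON that precedes the end marker line in a raw reply."""
    lines = []
    for line in raw.splitlines():
        if line.strip() == END_MARKER:
            break
        lines.append(line)
    return json.loads("\n".join(lines))
```

Letting the library do the encoding means a message containing quotes or backslashes still produces valid JSON, which plain string concatenation does not guarantee.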

Anyway, getting a proper C++ library for RiveScript isn’t super high on my list of priorities, but I wouldn’t mind if somebody else wants to go ahead and write one. :)

 

 
  [ # 4 ]

CAUTION: This experiment does not suggest nor imply anything about RiveScript.  As a new chatter robot experiment it is not optimized, so it is most likely the cause of its own performance issues and therefore is not meant to reflect poorly on RiveScript Python.  It is just an experiment.

RiveScript Python Experiment Description:

Here is an AJAX interface I am experimenting with for RiveScript Python. So far in the experiment, AJAX provides constant feedback while waiting for the response, which is quite slow under Python CGI right now but does arrive eventually. On the other hand, the response content benefits from the latest RiveScript Converter. The interface logs the latest entries in reverse chronological order, like a social network.

An easy way to optimize response time right now would probably be to switch from Python CGI to Perl CGI, which is easy to do. However, there are plenty of Python optimizations to try out first in this experiment, which makes it good for learning, especially since Python supports plenty of very interesting A.I. development tools such as the Natural Language Toolkit.

http://8-i.us/rivescript-python

 

 
  [ # 5 ]

Thanks for the report, I did some digging into it with Alice’s brain… the Python lib takes a good 24 seconds or so to load Alice’s brain from disk (as opposed to 8 for the Perl version). Furthermore, I found one somewhat-annoying quirk in Alice’s brain and a way to crash the Python library.

Alice (the RS version at least) has these triggers:

// salutations.rs
+ hello
- {@hi}
- Hi there!

// reduction2.safe.rs
+ hi
@ hello

It’s bad enough that the Python module can take a good 18 seconds sometimes to fetch a reply, but these two triggers pointing at each other can make that last much longer.
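One way to spot this kind of pair ahead of time is to scan the redirects for triggers that point at each other. The sketch below is only an illustration: the `redirects` dictionary is a hypothetical, hand-built summary of which triggers redirect where, not a structure the RiveScript libraries expose.

```python
def mutual_redirects(redirects):
    """Return sorted pairs of triggers that redirect to each other.

    `redirects` maps a trigger to the set of triggers it may redirect to.
    """
    pairs = set()
    for src, targets in redirects.items():
        for dst in targets:
            if src != dst and src in redirects.get(dst, ()):
                pairs.add(tuple(sorted((src, dst))))
    return sorted(pairs)


# Hypothetical summary of the two files above, plus one harmless redirect.
brain = {
    "hello": {"hi"},
    "hi": {"hello"},
    "howdy": {"hello"},
}
```

With this data, `mutual_redirects(brain)` reports the `hello`/`hi` pair and leaves `howdy` alone, since nothing redirects back to it.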

It took your Python CGI about 60 seconds to get an answer to “who is dr wallace”... it said “processing” for 42 seconds and then took the rest of the time to give me the response. If it takes about 24 seconds to load the RS from disk and then 18 to get the reply, that gets you 42, so it looks like your Python CGI is needing to load the whole reply set for each request. I’d suggest looking into using FastCGI or something to help out with that.

As for how I crashed the Python library… I found a bug in the code, where I called random.choose instead of random.choice (random.choose doesn’t exist, so the call fails).

I have some good news though! I’ve made a few improvements to the Python lib to speed things up a lot.

Python’s regular expression engine is slow compared to Perl’s, so when the library goes through and matches possible triggers against your message, it now takes a shortcut: if the trigger it’s looking at is atomic (no wildcards, optionals, alternations, etc.; just a verbatim string, like “who is richard wallace”), it does a quick == comparison with your message. It only uses the regexp engine for the triggers that need it. With that change I was able to get replies to questions like “who is dr wallace” in 3 seconds, “who are you?” in 5 seconds and “my name is Kirsle” in 8 seconds, all of which is much better than the 18 seconds I was getting before.
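The atomic shortcut described above can be sketched roughly as follows. This is not the code from the commit: the set of "special" characters and the trigger-to-regex conversion (which only handles the `*` wildcard) are simplifying assumptions made for this example.

```python
import re

# Assumed set of characters that make a trigger non-atomic (illustrative only).
SPECIAL_CHARS = set("*#_()[]<>|@{}")


def is_atomic(trigger):
    """True if the trigger is a verbatim string with no pattern syntax."""
    return not any(ch in SPECIAL_CHARS for ch in trigger)


def trigger_to_regex(trigger):
    """Simplified conversion: every '*' becomes a capturing wildcard."""
    parts = [re.escape(part) for part in trigger.split("*")]
    return "^" + "(.+?)".join(parts) + "$"


def matches(trigger, message):
    """Compare atomic triggers with ==, and fall back to a regex otherwise."""
    if is_atomic(trigger):
        return trigger == message
    return re.match(trigger_to_regex(trigger), message) is not None
```

For example, `matches("who is richard wallace", "who is richard wallace")` never touches the regex engine, while `matches("who is *", "who is dr wallace")` still does.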

I’ll have to look later at whether anything can be done to improve the RS parsing time; 24 seconds to parse is unacceptable considering Perl can do it in 8.

The new rivescript-python fixes have been uploaded to github: https://github.com/kirsle/rivescript-python/commit/708e77554d5553b20570f91d4b67c9aaa4128eba

I’ve also fixed Alice’s AIML in the aiml2rs project on github so that the “hello -> hi -> hello -> hi -> ...” nonsense is taken care of. I also fixed a bug in the conversion script that was resulting in “blank” lines getting added to the RS files (i.e. an “@” command with no data). The Python lib cares more about this than the Perl one and gives out warnings when it sees that, so I’ve fixed most of them.

(edit: there was another git commit since that one, which fixes the split(" ") when the {random} is separated with “|”)

 

 
  [ # 6 ]

CAUTION:
This experiment is not meant to suggest nor imply anything about RiveScript or AIML.


Experiment UPDATE:

Between the latest RiveScript Python lib and the latest “Better AIML-to-RiveScript Converter”, which optimizes RS data to fix needless recursion, the response time in this experiment has now improved to almost twice as fast as before.**  Stability has also improved now that its brain runs on a super clean dataset, which may qualify this experiment as running one of the most optimized, intelligent chatter robots on the web:

http://8-i.us/rivescript-python.

** Python response time optimization remains in research.

 

 
  [ # 7 ]

Is the python rs response time still being optimised? I’m trying to think of how I could improve it myself, but have had no luck.  Noah’s optimisation is great for the atomic responses, but what about the regex searches?

 

 
  [ # 8 ]

I haven’t done much work on the Python lib since optimizing the atomic replies.

Python’s regular expression engine is pretty slow. One possible way to optimize it a little bit would be to pre-compile a majority of the regular expressions (generate them ahead of time, before any replies are fetched). The Python lib already pre-compiles some commonly used regexes, like the one for “splitting at space characters”; this could be extended to the triggers too.

This won’t work for *all* triggers, though. For example, any trigger that uses a <bot>, <input> or <reply> tag can’t be precompiled, because these tags insert dynamic values that can change. All the other triggers should be able to be precompiled.
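A minimal sketch of that idea, assuming a simplified trigger-to-regex conversion that only understands `*`: compile every trigger once up front, and skip the ones containing the dynamic tags named above. The function names and the tag check are made up for this example and are not taken from the library.

```python
import re

# Triggers containing these tags depend on values that can change at runtime.
DYNAMIC_TAGS = ("<bot", "<input", "<reply")


def trigger_to_regex(trigger):
    """Simplified conversion: every '*' becomes a capturing wildcard."""
    parts = [re.escape(part) for part in trigger.split("*")]
    return "^" + "(.+?)".join(parts) + "$"


def precompile(triggers):
    """Compile each static trigger once; leave dynamic ones for request time."""
    compiled = {}
    for trigger in triggers:
        if any(tag in trigger for tag in DYNAMIC_TAGS):
            continue
        compiled[trigger] = re.compile(trigger_to_regex(trigger))
    return compiled
```

At reply time the matcher would look the trigger up in the returned dictionary and only build a regex on the fly for the dynamic ones. Python’s `re` module keeps its own cache of recently compiled patterns, but that cache is small, so a brain with thousands of triggers is likely to overflow it.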

 

 
  [ # 9 ]

Thanks for the tip! That seems to have cut down the time by quite a bit… I’m going to need more ammo though.  I looked into pre-compiling regexes and didn’t realise it would save that much time.

Would distributing the match search between threads and then joining the results help?

 

 
  [ # 10 ]

I’m not sure threading will be useful here. RiveScript creates a sort buffer where it arranges all the triggers in a way that they’ll find the best matches for the message (from the most specific down to the least). With threading, you’d probably divide up the sort list and have each thread look through a different part of it, but I don’t think that would be useful.

For example if you had the following triggers:

  • who is linus torvalds
  • who is *
  • who *

The internal sort buffer would order those triggers in the way they’re listed here (from the most specific, “who is linus torvalds” to the least specific, “who *”). The best match is the first one. If you had several thousand triggers, and a dozen threads looping over different sections, one of the threads would find the best match (the first one) but the other threads might find matches in those other triggers. It may be a tricky problem to figure out. :)
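To make the ordering concrete, here is a small Python sketch of a sorted trigger list where the first hit wins. The sort key (atomic triggers first, then more words, then longer text) is a deliberately simplified assumption and not RiveScript’s real sorting algorithm; the regex conversion only handles `*`.

```python
import re


def specificity_key(trigger):
    """Simplified ordering: atomic first, then more words, then longer text."""
    has_wildcard = "*" in trigger
    return (has_wildcard, -len(trigger.split()), -len(trigger))


def trigger_to_regex(trigger):
    """Simplified conversion: every '*' becomes a capturing wildcard."""
    parts = [re.escape(part) for part in trigger.split("*")]
    return "^" + "(.+?)".join(parts) + "$"


def best_match(triggers, message):
    """Walk the triggers from most to least specific; the first hit wins."""
    for trigger in sorted(triggers, key=specificity_key):
        if re.match(trigger_to_regex(trigger), message):
            return trigger
    return None
```

Because the answer is whichever matching trigger comes first in the sorted order, a threaded version would have to compare positions across all threads and keep the earliest one, not simply take whichever thread finishes first.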

For a while I was considering Program V’s approach to organizing its AIML brain, where it would take the first word of each trigger and use that as a hash key, and then have a sorted list of the remainders of the triggers under it, like:

$brain = {
   "WHO" => [
      "IS LINUS TORVALDS",
      "ARE YOU",
      "IS *",
      "*",
   ],
};

I’m not sure this will be the best way to go either, because it would radically change RiveScript’s method of reply matching: given an existing RiveScript brain, it would match different triggers for certain inputs than it did before. For example, a trigger that begins with a wildcard might be a better match for what the user said, but because the user’s first word happened to also be the first word in some other triggers, one of those triggers might erroneously be chosen to match their message.

Example:

+ * or something
- Or something. <@>

+ are you *
- I dont know if I am <star>.

Using Program V’s style, there would be this:

$brain = {
   "ARE" => [
      "YOU *",
   ],
   "*" => [
      "OR SOMETHING",
   ]
};

For the user’s input of “are you a computer or something”, Program V would see that the message begins with “are”, look through “ARE => YOU *” and choose that as the match. The way RiveScript currently works, though, “* or something” would be the matching trigger, because it has the same number of words as “are you *” but is longer. :)
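The difference between the two strategies can be reproduced with a short Python sketch. Both functions are simplified stand-ins written for this example: the sorted version orders triggers by word count and then length, and the first-word version checks the bucket for the message’s first word before falling back to the `*` bucket. Neither is the actual Program V or RiveScript code.

```python
import re


def trigger_to_regex(trigger):
    """Simplified conversion: every '*' becomes a capturing wildcard."""
    parts = [re.escape(part) for part in trigger.split("*")]
    return "^" + "(.+?)".join(parts) + "$"


def sorted_match(triggers, message):
    """One global list, ordered by word count and then by length."""
    ordered = sorted(triggers, key=lambda t: (-len(t.split()), -len(t)))
    for trigger in ordered:
        if re.match(trigger_to_regex(trigger), message):
            return trigger
    return None


def first_word_match(triggers, message):
    """Bucket triggers by first word; try the message's bucket before '*'."""
    buckets = {}
    for trigger in triggers:
        buckets.setdefault(trigger.split(" ", 1)[0], []).append(trigger)
    first = message.split(" ", 1)[0]
    for trigger in buckets.get(first, []) + buckets.get("*", []):
        if re.match(trigger_to_regex(trigger), message):
            return trigger
    return None
```

With the triggers `"* or something"` and `"are you *"` and the message “are you a computer or something”, `sorted_match` picks `"* or something"` while `first_word_match` picks `"are you *"`, which is the behaviour change described above.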

Anyway, if you find ways to speed up RiveScript.py, I’ll accept patches so that I can get those improvements back into the main git repo.

 

 
  [ # 11 ]

I agree, the “first word key” method doesn’t work for the pattern matching for the reasons you gave.  I’m sticking to my compiled regexes for the non-atomic ones.

I have however tried implementing the “first word key” method on the atomic ones.  That hasn’t saved me much, but at least it’s not going to try and match all the atomic triggers, only those which start with the same word (given atomic has more priority than pattern matched triggers according to your draft). 
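A first-word index over the atomic triggers might look something like this in Python. It is only a guess at the shape of the approach described above; the function names are made up for this example.

```python
from collections import defaultdict


def build_atomic_index(atomic_triggers):
    """Group atomic triggers by their first word."""
    index = defaultdict(list)
    for trigger in atomic_triggers:
        index[trigger.split(" ", 1)[0]].append(trigger)
    return index


def match_atomic(index, message):
    """Compare the message only against triggers sharing its first word."""
    first = message.split(" ", 1)[0]
    for trigger in index.get(first, ()):
        if trigger == message:
            return trigger
    return None
```

Since atomic triggers are compared verbatim, a plain set membership test on the whole message would probably do the same job with even less work.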

Now I’m just trying to be smarter with my triggers and recursion.  Trying to use more atomic phrases and saving the star variables.  It’s not as clean, but it’s faster!

 

 
 