
Unicode in RiveScript
 
 

Are there any Perl RiveScript users out there?

I added a bit of experimental UTF-8 support to RiveScript-Perl last week. You can enable it by passing the option “utf8 => 1” to the constructor, or use the “-u” option with the “rivescript” script that comes with the Perl library.

What does this change allow for?

[ul]
[li]Your triggers can contain foreign characters, like accented letters or Japanese symbols or whatever.[/li]
[li]Filtering on user messages is relaxed, so the user can say words with foreign characters (or even characters like @ and ., as in an e-mail address). Those messages can still match triggers, and the <star> tag will include the special characters as well.[/li]
[/ul]

More specifically:

[ul]
[li]Triggers can contain literally any character except for uppercase letters, backslashes, and periods (because periods are special characters in regular expressions).[/li]
[li]User messages can contain any character except for backslashes, and HTML tag characters < and > (to protect careless RiveScript developers from extremely obvious cross site scripting attacks).[/li]
[/ul]
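As a rough illustration of the trigger rule above, here is a minimal Python sketch. The function name `forbidden_trigger_chars` is made up for this example and is not part of any RiveScript library; it also assumes that "uppercase letters" includes non-ASCII uppercase letters, which the post does not spell out.

```python
def forbidden_trigger_chars(trigger):
    """Return the characters in `trigger` that the UTF-8 rule above would reject.

    Hypothetical helper: rejects uppercase letters, backslashes and periods.
    Assumption: non-ASCII uppercase letters (e.g. "É") count as uppercase too.
    """
    return [ch for ch in trigger if ch.isupper() or ch in "\\."]


if __name__ == "__main__":
    for trigger in ["xin chào *", "Hello.", "my email is *@*.*"]:
        print(repr(trigger), "->", forbidden_trigger_chars(trigger))
```

An empty list means the trigger only uses characters the rule allows.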

Note that Unicode support is optional and must be explicitly enabled (and this will probably continue to be the default, even after the UTF-8 support is no longer “experimental”). Without UTF-8 mode enabled, all the same restrictions you know and love are in place (no foreign characters allowed in triggers, input messages stripped of all characters except letters and numbers).
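The two filtering modes for user messages can be sketched in Python as follows. This is only an illustration of the rules as described in this post, not the library's actual code; `sanitize_message` is a made-up name, and keeping spaces in the default mode is an assumption so that words stay separated.

```python
import re


def sanitize_message(message, utf8=False):
    """Hypothetical filter mirroring the rules described in the post."""
    if utf8:
        # UTF-8 mode: only backslashes and the HTML tag characters are dropped.
        return re.sub(r"[\\<>]", "", message)
    # Default mode: keep only ASCII letters and digits.
    # Assumption: spaces are kept as well.
    return re.sub(r"[^A-Za-z0-9 ]", "", message)


if __name__ == "__main__":
    msg = "my email is bob@example.com"  # made-up address
    print(sanitize_message(msg))             # default mode strips @ and .
    print(sanitize_message(msg, utf8=True))  # UTF-8 mode leaves them in
```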

I call this support “experimental” for now because I haven’t fully verified that allowing a much wider range of characters doesn’t open the code to some kind of security hole, especially once the regular expressions come into play. There might be more characters in the triggers that should be restricted, for example. Perl shouldn’t be able to execute any code from them without the /e switch on the regexp, and even with /e, Perl won’t execute code from an interpolated variable… but ya can’t be too careful. :)
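To show why stray regex characters in a trigger matter, here is a small Python sketch of one common precaution: escaping every literal part of a trigger before turning the `*` wildcard into a capture group. This is not how RiveScript builds its patterns; `trigger_to_regex` is a hypothetical helper written only for this example.

```python
import re


def trigger_to_regex(trigger):
    """Hypothetical helper: turn a trigger with * wildcards into a regex.

    Every literal piece is passed through re.escape, so characters such as
    "." lose their special regex meaning; each * becomes a capture group.
    """
    literal_parts = [re.escape(part) for part in trigger.split("*")]
    return re.compile("(.+?)".join(literal_parts))


if __name__ == "__main__":
    # An unescaped period matches any character...
    print(bool(re.fullmatch("a.c", "abc")))
    # ...while the escaped version only matches a literal period.
    print(bool(trigger_to_regex("a.c").fullmatch("abc")))
    match = trigger_to_regex("my email address is *").fullmatch(
        "my email address is bob@example.com"  # made-up address
    )
    print(match.group(1))
```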

If anyone wants to play around with this you can get the latest code from github: https://github.com/kirsle/rivescript-perl

This feature will eventually come to the Python version, then JavaScript (maybe; JS isn’t known for its outstanding Unicode support), then Java if I’m feeling up for it. :)

 

 
  [ # 1 ]

I am not a Perl user, but this is a very useful option for making a bot in my language.

I have already used some tricks to build a Vietnamese QA system with the JS interpreter. I hope you will bring this improvement to each of the interpreters: Python, JS, Java, C#.

First, I have a question. Could I capture part of a trigger from an e-mail address directly (with no spaces, and without running substitutions or doing work outside of RiveScript), like this?

+ my email is *@*.*
- Your username is <star1>

  Your triggers can contain foreign characters, like accented letters or Japanese symbols or whatever.
  Relaxes the filtering on user messages, so the user can say words with foreign characters (or even characters like @ and ., like for an e-mail address), and match triggers and the <star> tag will include these special characters in their message as well).

I hope you can give a complete example for this case. And will there be a new RiveScript version?

Thanks!

 

 
  [ # 2 ]

With the UTF-8 update, this actually works:

+ my email address is *
- <set email=<star>>I'll remember your e-mail address.

+ what is my email
- Your email address is <get email>. 

You> my email address is [e-mail address]
Bot> I’ll remember your e-mail address.

You> what is my email?
Bot> Your email address is [e-mail address].
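For readers who don't have RiveScript at hand, the exchange above can be mimicked in plain Python. This is only a toy stand-in for what the two triggers do, not the interpreter itself: `user_vars` and `reply` are made-up names, the address is invented, and lowercasing the message and ignoring a trailing "?" are assumptions about how the input is normalized.

```python
import re

# Hypothetical per-user variable store, standing in for <set> and <get>.
user_vars = {}


def reply(message):
    """Toy imitation of the two triggers shown above."""
    # UTF-8 mode filtering as described in the thread: drop \, < and >.
    # Assumption: the message is also lowercased before matching.
    message = re.sub(r"[\\<>]", "", message).lower()

    match = re.fullmatch(r"my email address is (.+)", message)
    if match:
        user_vars["email"] = match.group(1)  # like <set email=<star>>
        return "I'll remember your e-mail address."

    # Assumption: a trailing question mark is ignored when matching.
    if message.rstrip("?") == "what is my email":
        return "Your email address is {}.".format(
            user_vars.get("email", "undefined")  # like <get email>
        )
    return "No reply matched."


if __name__ == "__main__":
    print(reply("my email address is bob@example.com"))
    print(reply("what is my email?"))
```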

 

 
  [ # 3 ]

Easy to get an email :)

 

 
  [ # 4 ]

If there are any Python RiveScript users here, I’d like some help in testing Unicode in the Python library. I have a branch on github for it here: https://github.com/kirsle/rivescript-python/tree/unicode

It includes a unicode.rs file in the root folder that has a few simple tests I wrote, and you can test it by doing:

python rivescript.py -u ./

in the root folder. The -u (or --utf8) option is necessary to “activate” the UTF-8 support, which is not on by default. In my light testing it seemed to work fine, but there’s a bug report on github where the original reporter has problems with it (https://github.com/kirsle/rivescript-python/pull/2), so I’d like a wider audience to test it and see what they can get to work (or not).

UTF-8 in JavaScript will be the next target, but that one may be tricky for a whole other set of reasons (e.g. requiring the web server to be configured to send the *.rs files as UTF-8).
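The encoding concern is easy to demonstrate in Python: if the bytes of a UTF-8 file are decoded with the wrong charset, non-ASCII characters no longer look like what the author typed, so they cannot match what the user types. The sample text below is made up, and Latin-1 is just one example of a wrong charset.

```python
# Sample trigger text with a non-ASCII character (made up for illustration).
text = "xin chào"

# What a UTF-8 encoded file would contain on disk or on the wire.
raw = text.encode("utf-8")

# Decoded with the right charset, the text round-trips unchanged.
decoded_ok = raw.decode("utf-8")

# Decoded as Latin-1 instead, the two bytes of "à" turn into two characters.
decoded_wrong = raw.decode("latin-1")

if __name__ == "__main__":
    print(decoded_ok == text)
    print(decoded_wrong == text)
    print(len(text), len(decoded_wrong))
```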

 

 