Input Storage

In what way do you format user input to learn from? Do you let your bot learn from input? I have been playing around with various formats for storing input, and so far have settled on a log-file-style format:

(sessionid), [timestamp], <speaker>, message 

Right now, these log files are acting just as logs - nothing interesting. They keep track of user and bot exchanges. I have been thinking about building a small analyzer that will go through the log and parse out all non-bot messages, and apply various filters to them for learning. This would allow me to have logs of each interaction as well as a way to rebuild a corpus based on user input, find common questions, etc.
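A minimal sketch of such an analyzer, assuming every log line follows the layout above exactly and that the bot always logs under the speaker name "bot". The names LogEntry, parse_line and user_messages, and the sample lines, are made up for illustration:

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Assumed line layout: (sessionid), [timestamp], <speaker>, message
LINE_RE = re.compile(
    r"^\((?P<session>[^)]*)\),\s*"
    r"\[(?P<timestamp>[^\]]*)\],\s*"
    r"<(?P<speaker>[^>]*)>,\s*"
    r"(?P<message>.*)$"
)


class LogEntry(NamedTuple):
    session: str
    timestamp: str
    speaker: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Return a LogEntry, or None if the line does not fit the layout."""
    m = LINE_RE.match(line.strip())
    return LogEntry(**m.groupdict()) if m else None


def user_messages(lines: Iterable[str], bot_name: str = "bot") -> Iterator[LogEntry]:
    """Yield only the entries that were not spoken by the bot."""
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry.speaker.lower() != bot_name.lower():
            yield entry


# Hypothetical sample lines, not taken from a real log.
sample = [
    "(a1), [2014-08-05 10:00:00], <user>, hello there",
    "(a1), [2014-08-05 10:00:01], <bot>, Hi!",
    "not a log line",
]
print([e.message for e in user_messages(sample)])
```

The filters for learning would then run over the output of user_messages rather than over the raw file, so the original logs stay untouched.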

Does anyone else do something like this?


  [ # 1 ]

I get my bot to email me with things that people have tried to teach her. That way, I can decide whether to manually include the knowledge in her database. At the moment, she will learn whatever facts are taught to her for the current user only. I tried making the learning an automatic thing to apply to all users but found she was learning rubbish and it took me longer to clear all the dross out than just to add the genuine facts.

I turned the automatic learning on for a day and she learned 1,500 new things, of which only 3 were of any use.

Most of it was rubbish like:

Human: John is fat
Mitsuku: Ok, I will remember that.
Human: Who is fat?
Mitsuku: John.
Human: lololol

Be wary of making a bot that learns automatically unless you have a trusted client base.
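The setup described in this post (facts learned for the current user only, plus a manual review step before anything becomes global) could be sketched roughly as below. This is an illustration under assumptions, not how Mitsuku is actually built: the FactStore class, its method names and the sample facts are all hypothetical, and the review queue stands in for the email notification.

```python
class FactStore:
    """Per-user facts layered over a shared, manually curated fact base."""

    def __init__(self, global_facts=None):
        self.global_facts = dict(global_facts or {})
        self.user_facts = {}      # user_id -> {key: value}
        self.review_queue = []    # (user_id, key, value) awaiting a human decision

    def teach(self, user_id, key, value):
        """Learn a fact for this user only and queue it for manual review."""
        self.user_facts.setdefault(user_id, {})[key] = value
        self.review_queue.append((user_id, key, value))

    def lookup(self, user_id, key):
        """Prefer the user's own facts, then fall back to the global ones."""
        personal = self.user_facts.get(user_id, {})
        if key in personal:
            return personal[key]
        return self.global_facts.get(key)

    def approve(self, index):
        """Promote a reviewed fact to the global fact base."""
        _user_id, key, value = self.review_queue.pop(index)
        self.global_facts[key] = value


# Hypothetical sample data.
store = FactStore({"sky colour": "blue"})
store.teach("user-1", "favourite colour", "green")
print(store.lookup("user-1", "favourite colour"))  # known to user-1
print(store.lookup("user-2", "favourite colour"))  # unknown to everyone else
```

Nothing reaches other users until approve is called, which is the point where the botmaster decides whether a taught fact is genuine.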


  [ # 2 ]

Input-Output logs can be extremely useful in a number of ways. 

[unknown word error] 

In this simplest of examples, the log file can be searched for “[unknown word error]”, and every input that was not recognized can be examined and the appropriate action taken (in this case, maybe a spell check to convert “hy”->“hi”).

Once the log has a sufficient number of volleys, it becomes invaluable for determining the frequency, and common variations, of specific input and for further automated analysis/learning.
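As a rough sketch of both uses, assume the log has already been parsed into (user input, bot output) pairs and that the bot writes the literal marker “[unknown word error]” into its output when it fails. The function names and the sample volleys are hypothetical:

```python
from collections import Counter

MARKER = "[unknown word error]"


def unrecognized_inputs(volleys):
    """Return the user inputs whose bot output contains the error marker."""
    return [user_input for user_input, bot_output in volleys if MARKER in bot_output]


def input_frequencies(volleys):
    """Count inputs after lowercasing and collapsing whitespace."""
    return Counter(" ".join(user_input.lower().split()) for user_input, _ in volleys)


# Hypothetical sample volleys.
volleys = [
    ("hy", "[unknown word error]"),
    ("Hi", "Hello!"),
    ("hi ", "Hello again!"),
]
print(unrecognized_inputs(volleys))
print(input_frequencies(volleys).most_common(1))
```

The normalization here is deliberately crude; anything smarter (spell correction, stemming) would change which inputs count as the same variation.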

Recording metadata is really helpful too. Depending on the sophistication of your bot, you may also want to log:

-user id
-user ip
-user name (if they can tell your bot their name)
-state variables (session id; AIML type “topic”, “it”, “that”; POS “who/what/where/when”; bot “emotional” state; etc)
-bot version (since responses will vary with different versions as you modify the bot)
-any other meta data that shapes the bot response
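One possible way to keep these fields together is a single record per volley, written as one JSON line. This is only a sketch: the VolleyRecord name, the field names and the sample values are assumptions, not a format anyone in this thread has said they use.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class VolleyRecord:
    user_id: str
    user_ip: str
    bot_version: str
    user_input: str
    bot_output: str
    user_name: str = ""                        # empty until the user gives a name
    state: dict = field(default_factory=dict)  # session id, topic, "it", "that", mood, ...


def to_json_line(record: VolleyRecord) -> str:
    """Serialize one volley as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


# Hypothetical sample values.
record = VolleyRecord(
    user_id="u42",
    user_ip="192.0.2.10",
    bot_version="0.3.1",
    user_input="who plays it?",
    bot_output="Many people play chess.",
    state={"topic": "chess", "it": "chess"},
)
line = to_json_line(record)
print(line)
```

Storing the bot version with every volley is what makes it possible to tell later whether a bad response came from the current rules or from an older build.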

A real example: we have trouble responding correctly to the input “Some people *”, where “*” can be anything.

A sampling of inputs in our db that match “Some people *”:

-some people call themselves cracks
-Some people have had bad experiences.
-some people say many things, but only few speak the truth
-Some people on this site are bored, some are testing your logic.
-some people hate clowns
-some people are really weird
-some people want to become like them,others get along better with those types of people
-Some people like being slaves
-some people like darkness
-some people want to go to mars
-Some people find bellybuttons sexy.
-Some people are old.
-Some people make a living from playing chess.
-Some people have great breasts and some do not.
-Some people manage it.
-some people like it and some don’t, if you do not like it then do not do it you buffoon
-Some people think love is comfort, other people think love is sex
-Some people can do it.
-some people envy rich girls
-Some people say.
-Some people fool themselves into believing that they are something that they’re not.

You can now try variations of parsing/responding to “some people *”, preferably in an automated way. You could also use a corpus approach to get more context, using fragments beginning with or containing “Some people”.
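A small sketch of how the matching inputs could be pulled out and grouped automatically, assuming a pattern with exactly one “*” and case-insensitive matching. The helper names are made up, and grouping by the first word after “some people” is just one possible variation to try:

```python
import re
from collections import Counter


def wildcard_to_regex(pattern):
    """Turn a pattern with exactly one '*' into a case-insensitive regex."""
    head, _, tail = pattern.partition("*")
    return re.compile(
        "^" + re.escape(head) + "(.+)" + re.escape(tail) + "$", re.IGNORECASE
    )


def match_wildcard(pattern, text):
    """Return the text captured by '*', or None if the input does not match."""
    m = wildcard_to_regex(pattern).match(text.strip())
    return m.group(1) if m else None


def first_word_counts(pattern, inputs):
    """Count the first word of whatever '*' captured, ignoring case and punctuation."""
    counts = Counter()
    for text in inputs:
        captured = match_wildcard(pattern, text)
        if captured is not None:
            counts[captured.split()[0].strip(".,!?").lower()] += 1
    return counts


inputs = [
    "some people hate clowns",
    "some people are really weird",
    "Some people are old.",
    "Nobody likes clowns",  # hypothetical non-matching input
]
print(match_wildcard("Some people *", inputs[0]))
print(first_word_counts("Some people *", inputs).most_common(1))
```

Grouping like this shows quickly which sub-patterns (“some people are *”, “some people like *”) are frequent enough to deserve their own response.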





  [ # 3 ]
Steve Worswick - Aug 5, 2014:

I get my bot to email me with things that people have tried to teach her.

Steve Worswick - Aug 5, 2014:

learn whatever facts are taught to her for the current user only.

Both key points about “automated” (by direct user input) learning!

It can also be useful to have the bot email (or SMS, or whatever) you about errors or unknown inputs.


  [ # 4 ]

IP address is key if you use a web bot and consider banning a user.

Learning in my bot is also only for a single session (although I believe I will turn it on for a single user).
I also believe there is too much junk to let learning happen automatically on a global basis.

I archive and mine the logs on a regular basis. You need to be able to replicate the response from a given input.

Another useful tool is the ability to have the bot talk to itself.


  [ # 5 ]

Ah yes, how could I forget something as important as IP addresses!

I have a whitelist of users / host masks / addresses that I allow as “trusted sources” for learning, to try to mitigate some of the junk input.
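Roughly, a check like that could look as follows. This is a sketch under assumptions: it treats host masks as IRC-style glob patterns and addresses as IP networks, and every name and value in it (is_trusted, the example masks and networks) is hypothetical.

```python
import fnmatch
import ipaddress

# Hypothetical whitelist entries.
TRUSTED_USERS = {"alice"}
TRUSTED_HOST_MASKS = ["*!*@*.example.org"]
TRUSTED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_trusted(user, host_mask, ip):
    """Return True if the user, host mask or IP address is whitelisted."""
    if user in TRUSTED_USERS:
        return True
    if any(fnmatch.fnmatchcase(host_mask.lower(), mask) for mask in TRUSTED_HOST_MASKS):
        return True
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        return False  # not a parseable IP address
    return any(address in network for network in TRUSTED_NETWORKS)


print(is_trusted("bob", "bob!b@irc.example.org", "203.0.113.5"))
print(is_trusted("bob", "bob!b@elsewhere.test", "203.0.113.5"))
```

Only input from speakers that pass this check would be fed to the learning step; everything else is still logged as usual.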

Do you manually input items for training? I am experimenting with ways to try and automate some of this stuff, since right now I do it all manually.

Your bots “learn” in a per-session environment, so they quickly tailor themselves to the person they’re talking to, but don’t actually insert new stuff into the database, because a lot of chat is garbage. That is really sensible, and after I finish implementing this, I think I will not have to clean the data so often. (Who would have thought a bot could bring out the worst-mindedness in people…)

Hmm, this has my brain formulating new ideas! Thank you. Since my bot has only had a select audience for a good amount of its time online, it is really easy to overlook these (now obvious) steps.


  [ # 6 ]

I dump it all as rows in a MySQL database, with a timestamp and a ‘conversation id’ as well as a brief summary of how my bot interpreted it. It makes it possible to show the entire conversation pretty nicely.
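A row-per-message layout along those lines might look like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server, and the table name, column names and sample rows are assumptions rather than the actual schema:

```python
import sqlite3


def open_log(path=":memory:"):
    """Open (or create) the message log; an in-memory database by default."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               conversation_id TEXT NOT NULL,
               ts TEXT NOT NULL,
               speaker TEXT NOT NULL,
               message TEXT NOT NULL,
               interpretation TEXT
           )"""
    )
    return conn


def log_message(conn, conversation_id, ts, speaker, message, interpretation=None):
    """Store one message together with the bot's summary of how it read it."""
    conn.execute(
        "INSERT INTO messages (conversation_id, ts, speaker, message, interpretation) "
        "VALUES (?, ?, ?, ?, ?)",
        (conversation_id, ts, speaker, message, interpretation),
    )
    conn.commit()


def transcript(conn, conversation_id):
    """Return one conversation as formatted lines, oldest first."""
    rows = conn.execute(
        "SELECT ts, speaker, message FROM messages "
        "WHERE conversation_id = ? ORDER BY ts, id",
        (conversation_id,),
    )
    return [f"[{ts}] {speaker}: {message}" for ts, speaker, message in rows]


# Hypothetical sample rows.
conn = open_log()
log_message(conn, "c1", "2014-08-05T10:00:00", "user", "hello", "greeting")
log_message(conn, "c1", "2014-08-05T10:00:01", "bot", "Hi!")
print("\n".join(transcript(conn, "c1")))
```

Keeping the interpretation in its own column means the transcript view can show or hide it without touching the message text.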

All transcripts are open to read, and with that storage I can format them nicely.

