
AIML Generator


I have been reading about software that can generate AIML sets from different input corpora, such as dialogs.

Searching Google produces some extremely interesting scholarly articles on the subject. Some of these describe the process and outline the challenges in great detail and are a fantastic read.

Has anyone ever actually seen one in the wild, though? I can’t find any example code anywhere. Is releasing code not something that academics do?

Maybe I’m not looking in the right places. If anyone knows of an example of a program to convert corpora to AIML, even if it doesn’t work too well, PLEASE let me know.

I’m moving towards building my own scripts to tackle this, but I don’t want to reinvent any wheels. If anyone knows a good place to start, I would be grateful.



  [ # 1 ]

Hi, Michael. This is an area I’ve discussed before but haven’t really pursued, I’m afraid. The main reason is that I’ve lacked access to decent transcripts in a standardized format that would make for useful conversations. One of the challenges in creating a conversion script or program is that the documents to be converted need to be in a format the conversion tool can read consistently, so that the various parts of the conversation can be identified, analyzed and parsed correctly. This is no mean feat, and sorting it all out can be difficult, at best.

That said, once those hurdles (and probably a few others) have been overcome, converting the parsed input to AIML is fairly straightforward: it can be done by simply “filling in the blanks” of a template. As the thread I’ve linked to indicates, I have an AIML creation tool (written in PHP) that can easily be adapted to that use, and its code is freely available upon request. Even if you use a different programming language, the code should (I think, at least) be easy to port, so if you want to use it as a basis for something you’re working on, you’re welcome to it.
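To illustrate the “filling in the blanks” step, here is a minimal Python sketch (not the PHP tool mentioned above) that turns already-parsed (question, answer) pairs into an AIML 1.0.1 file. The function names and the rough normalization rule (drop apostrophes and punctuation, upper-case the words) are assumptions for illustration; real interpreters apply their own normalization and substitutions, so generated patterns may need adjusting.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical category template: parsed input is "filled in" to the blanks.
CATEGORY_TEMPLATE = """  <category>
    <pattern>{pattern}</pattern>
    <template>{template}</template>
  </category>"""


def normalize_pattern(text: str) -> str:
    """Rough approximation of AIML input normalization (an assumption):
    drop straight and curly apostrophes, keep letters/digits, upper-case."""
    cleaned = text.replace("'", "").replace("\u2019", "")
    words = re.findall(r"[A-Za-z0-9]+", cleaned)
    return " ".join(word.upper() for word in words)


def make_category(question: str, answer: str) -> str:
    """Fill one question/answer pair into the category template."""
    return CATEGORY_TEMPLATE.format(
        pattern=escape(normalize_pattern(question)),
        template=escape(answer),  # escape &, < and > so the XML stays well-formed
    )


def make_aiml(pairs) -> str:
    """Wrap every generated category in a single <aiml> document."""
    categories = "\n".join(make_category(q, a) for q, a in pairs)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<aiml version="1.0.1">\n'
        f"{categories}\n"
        "</aiml>\n"
    )


if __name__ == "__main__":
    print(make_aiml([("What's your name?", "My name is Bob & I like trains.")]))
```

Escaping the template text matters more than it looks: transcripts routinely contain `&` or `<`, and a single unescaped one makes the whole AIML file unloadable.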

I’d be pleased to assist with this endeavor, if you’re interested.


  [ # 2 ]


It’s an area I’ve got a lot of interest in. To combat the variety of challenges involved in parsing different types of inputs, I envisioned a modular system. Think command-line framework tools that combine multiple tools and modules such as Recon-ng, BeEF or the various exploitation frameworks.

Maybe something like a set of general-purpose parsers, applied or skipped in a specific order, to produce the kind of AIML you’re looking for from the specific text input at hand.
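A minimal Python sketch of that idea, assuming each parser is a plain function that returns a list of (input, response) pairs when it recognizes the text and `None` when it doesn’t. The two parsers and their formats (“Q:/A:” interviews, “NAME: line” scripts) are hypothetical examples; `run_pipeline` simply tries them in order and uses the first one that accepts the input.

```python
from typing import Callable, List, Optional, Tuple

Pair = Tuple[str, str]
# A parser returns None when it does not recognize the input format.
Parser = Callable[[str], Optional[List[Pair]]]


def qa_parser(text: str) -> Optional[List[Pair]]:
    """Hypothetical parser for interview-style 'Q: ...' / 'A: ...' lines."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("Q:"):
        return None
    pairs, question = [], None
    for line in lines:
        if line.startswith("Q:"):
            question = line[2:].strip()
        elif line.startswith("A:") and question is not None:
            pairs.append((question, line[2:].strip()))
            question = None
    return pairs or None


def script_parser(text: str) -> Optional[List[Pair]]:
    """Hypothetical parser for 'NAME: utterance' scripts: each utterance is
    paired with the next one, provided the speaker changes."""
    turns = []
    for line in text.splitlines():
        speaker, sep, utterance = line.partition(":")
        if sep and speaker.strip() and utterance.strip():
            turns.append((speaker.strip(), utterance.strip()))
    pairs = [(a[1], b[1]) for a, b in zip(turns, turns[1:]) if a[0] != b[0]]
    return pairs or None


def run_pipeline(text: str, parsers: List[Parser]) -> Tuple[Optional[str], List[Pair]]:
    """Try each parser in order; the first one that accepts the text wins."""
    for parser in parsers:
        result = parser(text)
        if result is not None:
            return parser.__name__, result
    return None, []
```

Order matters here: a “Q:/A:” transcript would also satisfy the looser script parser (with “Q” and “A” as speakers), so the more specific parser goes first in the list.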

Standardizing the input is another job for parsers, though these would be a lot simpler than the ones that generate the AIML knowledge. I have a lot of experience with web scraping and with parsing and processing often badly structured HTML, so I think I’ve got transferable skills.
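As an example of one of those simpler standardizing parsers, here is a Python sketch using only the standard library’s `html.parser` module. It flattens badly structured HTML into clean text lines (one per block-level element), drops `<script>` and `<style>` content and collapses whitespace, so a downstream transcript parser only ever sees plain lines. The set of tags treated as line breaks is an assumption and would need tuning per site.

```python
import re
from html.parser import HTMLParser
from typing import List

# Tags assumed to start a new line of text (adjust per source site).
BLOCK_TAGS = {"p", "br", "div", "li", "tr", "dd", "dt", "blockquote",
              "h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style"}


class TranscriptTextExtractor(HTMLParser):
    """Collects visible text; entities such as &nbsp; arrive already decoded
    because HTMLParser uses convert_charrefs=True by default."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_lines(html: str) -> List[str]:
    """Return non-empty, whitespace-normalized text lines from messy HTML."""
    parser = TranscriptTextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks)
    lines = (re.sub(r"\s+", " ", line).strip() for line in text.split("\n"))
    return [line for line in lines if line]
```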

I was thinking of starting with something like interview transcripts to produce an AIML set that answers the way the interviewee might. Basically, try to mirror a personality.
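A minimal Python sketch of pulling (question, answer) pairs out of a speaker-labeled interview transcript, keeping only the interviewee’s answers so the resulting categories reflect that one voice. It assumes each turn starts with “NAME:” and that unlabeled lines continue the current answer; both the format and the function name `interview_pairs` are assumptions. The pairs could then be filled into an AIML category template.

```python
from typing import List, Tuple


def interview_pairs(lines: List[str], interviewer: str, interviewee: str) -> List[Tuple[str, str]]:
    """Collect (interviewer question, interviewee answer) pairs.

    Consecutive interviewee turns and unlabeled continuation lines are
    merged into one answer; turns by anyone else are ignored.
    """
    pairs = []
    question, answer = None, []
    for line in lines:
        speaker, sep, text = line.partition(":")
        speaker = speaker.strip().upper()
        if sep and speaker == interviewer.upper():
            # A new question closes the previous question/answer pair.
            if question and answer:
                pairs.append((question, " ".join(answer)))
            question, answer = text.strip(), []
        elif sep and speaker == interviewee.upper():
            if question:
                answer.append(text.strip())
        elif not sep and answer:
            # Unlabeled line: treat it as a continuation of the current answer.
            answer.append(line.strip())
    if question and answer:
        pairs.append((question, " ".join(answer)))
    return pairs
```

A continuation line that happens to contain a colon (“He said: no”) is taken for a speaker label and dropped, so real transcripts would need a stricter label check, for example matching against a known list of speakers.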


  [ # 3 ]

Interview transcripts are, generally speaking, more easily and consistently parsed than other sources such as short stories and novels. To my way of thinking, though, they’re also the least useful for generating categories for general conversation, since they’re usually tightly focused on very specific topics. Still, they would certainly be a step in the right direction. Another potentially good source might be the minutes of community meetings, such as city council meetings and non-criminal court proceedings (even the criminal ones may have some minimal value). A great number of these are public record, so in theory they should be fairly easy to obtain. The big challenge here would be maintaining the privacy of the people involved in those proceedings.
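On the privacy point, one simple (and admittedly partial) measure is to replace participants’ names with consistent placeholders before any categories are generated. The Python sketch below assumes the names are already known, for example from the attendance list in the minutes; it will not catch names it isn’t given, nickname variants or other identifying details, so it is a starting point rather than real anonymization.

```python
import re
from typing import Dict, List, Tuple


def pseudonymize(text: str, names: List[str]) -> Tuple[str, Dict[str, str]]:
    """Replace each listed name with a stable placeholder (PERSON1, PERSON2, ...).

    Matching is case-insensitive and limited to whole words. The name list
    is an assumption: the function only hides names it is given.
    """
    if not names:
        return text, {}
    mapping: Dict[str, str] = {}
    for name in names:
        mapping.setdefault(name.lower(), f"PERSON{len(mapping) + 1}")
    # Longest names first, so "Mary Jones" is matched before a bare "Mary".
    ordered = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in ordered) + r")\b", re.IGNORECASE
    )
    redacted = pattern.sub(lambda m: mapping[m.group(0).lower()], text)
    return redacted, mapping
```

Returning the mapping lets a human reviewer check which placeholder stands for whom; it should of course be kept out of anything published.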

Given that “proper grammar” dictates that anything a character says within a work of literature be enclosed in quotation marks, it may not be a bad idea to experiment with a “story parser” that first isolates matched pairs of quotation marks, then tries to match each quotation with adjacent ones to generate a transcript from the body of the work. There are, of course, challenges with this approach, such as conversations involving more than two people, soliloquies, and quoted trains of thought where what’s “said” isn’t actually spoken. I think these can be overcome with a bit of real-time human intervention during the process, though. Something to consider, at least.
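A rough Python sketch of the first two steps of such a story parser: find matched pairs of double quotation marks (straight or curly), then pair each quotation with the next one when only a short stretch of narration separates them. The 200-character gap threshold is an arbitrary assumption, and the sketch ignores the harder cases above (more than two speakers, multi-paragraph quotations that reopen without closing, unspoken thoughts), which is where the human review would come in.

```python
import re
from typing import List, Tuple

# Text between straight ("...") or curly (\u201c...\u201d) double quotes.
QUOTE_RE = re.compile(r'"([^"]+)"|\u201c([^\u201d]+)\u201d')


def _quote_text(match) -> str:
    """Return the quoted words, trimming the trailing comma that dialogue
    often carries inside the quotation marks ("To the market," said Tom)."""
    return (match.group(1) or match.group(2)).strip().rstrip(",")


def pair_adjacent_quotes(text: str, max_gap: int = 200) -> List[Tuple[str, str]]:
    """Pair each quotation with the following one if the narration between
    them is at most max_gap characters long (a hypothetical threshold)."""
    quotes = list(QUOTE_RE.finditer(text))
    pairs = []
    for first, second in zip(quotes, quotes[1:]):
        if second.start() - first.end() <= max_gap:
            pairs.append((_quote_text(first), _quote_text(second)))
    return pairs
```

Consecutive pairs overlap, so each line of dialogue serves both as the response to the previous line and as the prompt for the next one.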


  [ # 4 ]

Yes, interviews would be no good for general conversation, but for emulating a specific personality they may be the perfect source. Perhaps a parser/generator configuration could be developed that uses parts of the responses to generate general-conversation AIML: something like a version of the generic conversational sets, but with elements of the interviewee’s responses.

This page mentions some sources for data, including 5.5 million movie subtitles.
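If those subtitles come as SubRip (.srt) files (an assumption; the page doesn’t say which format), a Python sketch like the one below could turn them into (line, next line) pairs. Subtitles don’t say who is speaking, so the sketch only pairs consecutive captions separated by a short pause; the 2-second `max_pause` is a guess, and many pairs will be two lines from the same speaker.

```python
import re
from typing import List, Tuple

Entry = Tuple[float, float, str]  # (start seconds, end seconds, caption text)

TIMING_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt_text: str) -> List[Entry]:
    """Parse SubRip blocks: an index line, a timing line, then text lines."""
    text = srt_text.lstrip("\ufeff").replace("\r\n", "\n")
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if len(lines) < 3 or not lines[0].isdigit():
            continue  # skip malformed blocks
        timing = TIMING_RE.match(lines[1])
        if not timing:
            continue
        g = timing.groups()
        caption = re.sub(r"<[^>]+>", "", " ".join(lines[2:]))  # drop <i>-style tags
        entries.append((_seconds(*g[:4]), _seconds(*g[4:]), caption))
    return entries


def subtitle_pairs(entries: List[Entry], max_pause: float = 2.0) -> List[Tuple[str, str]]:
    """Pair each caption with the next one if the gap between them is short."""
    return [
        (a[2], b[2])
        for a, b in zip(entries, entries[1:])
        if b[0] - a[1] <= max_pause
    ]
```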

As for the big challenges, I’m going to look into using third-party APIs to help. Have you seen the text analysis features available from Alchemy? I’m interested in learning how they’re implemented, but for now, if I can delegate that part and still get the desired result, I’ll be happy.



  [ # 5 ]

That looks interesting, but I really don’t see anything that involves conversation extraction or detection. Of course, I just gave it a cursory inspection for now, as I’m supposed to be working, but I’ll try to give it a more in-depth perusal when I have more time, later this morning.

Still, as you mentioned earlier, “reinventing the wheel” is something to avoid whenever possible. But given that we’ve already established there’s very little out there in this specific area, at this point it’s more like “inventing” than “reinventing”.


  [ # 6 ]

The Alchemy API is actually really promising for this use case. From one of the API intro pages:

AlchemyLanguage is a collection of APIs that offer text analysis through natural language processing. The AlchemyLanguage APIs can analyze text and help you to understand its sentiment, keywords, entities, high-level concepts and more

And phrases like this get me quite excited. In reference to concept extraction:

For example, if an article mentions CERN and the Higgs boson, the Concepts API functions will identify Large Hadron Collider as a concept even if that term is not mentioned explicitly in the page.
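One way extracted keywords or concepts could feed into AIML is as keyword-spotting categories that reduce any input mentioning a keyword to a single canonical category via `<srai>`. In the Python sketch below the keyword-to-concept mapping is hard-coded as a stand-in for whatever a concept-extraction service might return; no actual AlchemyLanguage call is made, and the “CONCEPT” prefix on the target category is an assumption. Four patterns per keyword are needed because, in AIML 1.x, `*` and `_` each match one or more words, so a keyword can appear alone, first, last or in the middle of the input.

```python
from typing import Dict, List
from xml.sax.saxutils import escape


def keyword_categories(keyword: str, concept: str) -> List[str]:
    """Build AIML 1.x categories that send any input containing `keyword`
    to one canonical category for `concept` through <srai>."""
    kw = keyword.upper()
    target = escape("CONCEPT " + concept.upper())  # hypothetical naming scheme
    # * and _ match one or more words, so cover: alone, at the start,
    # at the end, and in the middle of the input.
    patterns = [kw, f"{kw} *", f"_ {kw}", f"_ {kw} *"]
    return [
        f"<category><pattern>{escape(p)}</pattern>"
        f"<template><srai>{target}</srai></template></category>"
        for p in patterns
    ]


def categories_from_concepts(concepts: Dict[str, List[str]]) -> str:
    """`concepts` maps a concept name to keywords that should trigger it
    (a stand-in for the output of a concept-extraction API)."""
    out: List[str] = []
    for concept, keywords in concepts.items():
        for keyword in keywords:
            out.extend(keyword_categories(keyword, concept))
    return "\n".join(out)


if __name__ == "__main__":
    print(categories_from_concepts({"Large Hadron Collider": ["CERN", "Higgs boson"]}))
```

Using `_` rather than `*` means these keyword categories take priority over ordinary word-based patterns when matching, which is usually the point of keyword spotting but can also shadow other categories.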

Let’s take someone else’s wheels and make a new car :D


