Chatbots, AI, wildcards, random phrases and Pattern Matching
 
 
  [ # 46 ]

Interesting notion, Victor! That’s probably a better approach than what I was suggesting, I think. cheese

 

 
  [ # 47 ]
Dave Morton - Dec 16, 2011:

Interesting notion, Victor! That’s probably a better approach than what I was suggesting, I think. cheese

Pretty much the same idea.  At first, I'm sure like others, I initially thought 'the hell with case'.  But I decided to treat the original value of all terms as simply another property (their '*.orig.val'), similar to their part-of-speech ('pos') properties.
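In rough JavaScript terms, the idea looks something like this (property names are just illustrative, not my actual code):

// Each term keeps a normalised value for matching, plus its original
// form and part of speech as ordinary properties.
function makeTerm(word) {
  return {
    val: word.toLowerCase(),  // normalised value used for matching
    orig: { val: word },      // original casing preserved
    pos: []                   // part-of-speech tags filled in later
  };
}

var terms = "We visited Nice in June".split(/\s+/).map(makeTerm);
console.log(terms[2].val);      // "nice"
console.log(terms[2].orig.val); // "Nice"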

 

 

 
  [ # 48 ]

Victor and Dave,

I did not say that the original user input is not preserved. The query string is transformed into a variable array, and the original query string is used for other processes, including the chat log, which is always available to the script. This is important, since the session memory is based on these archived query arrays and is called upon as needed.

One reason I decided upon extensive client-side processing and the JavaScript run-time is to have the static variable memory available at all times during the user session, which enhances processing speed and functionality and adds to the user experience.
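In rough outline, the shape of it is something like this (simplified, not the actual code):

// The query is tokenised into an array, while the original string is
// archived; the session memory is simply the collection of these arrays.
var session = { memory: [] };

function processQuery(queryString) {
  var queryArray = queryString.split(/\s+/); // transform into a variable array
  session.memory.push({
    orig: queryString,  // original input, kept for the chat log
    tokens: queryArray  // archived query array for later recall
  });
  return queryArray;
}

processQuery("What is your name?");
console.log(session.memory[0].orig); // always available during the session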

A few years back, when client memory was a bit more limited and the browser state of the art would not support the use of multiple nested variable arrays without crashing the script, I would never have dreamed of this approach.

As Bob Dylan would say, “The times they are a-changin’.” cool smile

 

 
  [ # 49 ]
Andrew Smith - Dec 15, 2011:

When all you’ve got is a hammer, everything looks like a nail.

Sometimes, when you are trying to drive in a nail, it is better to use a hammer than an oil well.

I duplicated your parser and have enclosed it in the attached HTML page.

I did it using 12 regular expressions. gulp
Technically it is only one line of JavaScript that does all the magic, but I formatted it into a couple of lines to make it easier to follow. You might be able to use it to explore parser techniques.
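Not the attached code itself, but to give the flavour of the approach (patterns simplified):

// A chain of replace() calls carries a whole parsing pass:
// break sentences, then mark quoted and parenthetical spans.
var text = 'He said "stop". Then (quietly) he left. Did he?';
var parsed = text
  .replace(/([.!?])\s+/g, "$1\n")        // newline after sentence enders
  .replace(/"([^"]*)"/g, "<q>$1</q>")    // mark quoted words
  .replace(/\(([^)]*)\)/g, "<p>$1</p>"); // mark parenthetical statements
console.log(parsed);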

What threw me a little was your parser's strange handling of sentences and quotes (a sentence is not really a sentence, and you required funny quote handling).

I also took the opportunity to add a couple of features. Your parser doesn’t handle quoted words or parenthetical statements, so I threw those in for fun. wink

I only tested this in Opera, but the page should work in any browser. Also, the best web browsers compile regexes, so after you run it once, the next time it will be lightning fast (and it is still pretty fast the first time).
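That is also why it pays to declare a pattern once and reuse it; the compiled form is then built only once, for example:

// The pattern is compiled once, not recreated on every call.
var SENTENCE_END = /([.!?])\s+/g;
function splitSentences(text) {
  return text.replace(SENTENCE_END, "$1\n").split("\n");
}
console.log(splitSentences("One. Two! Three?").length); // 3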

Even people who don't know JavaScript should be able to understand and modify the code to try their own parsers.

Now, this parser uses only the simplest of regex features. It is nowhere near the biggest regex I've built (~50k) and doesn't use the most advanced features or have the support of my full JAIL framework, but I hope that it does show that regular expressions are a very capable technology.

File Attachments
Parse.zip  (File Size: 10KB - Downloads: 97)
 

 
  [ # 50 ]

I forgot to mention: you can paste your own text into the text area to try different passages. Also included in the file is the output of one of my other regex-ish tools; it does a word frequency analysis on the text.
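That tool boils down to something like this (simplified sketch, not the included code):

// Regex-based word frequency analysis: extract word tokens, then tally.
function wordFrequency(text) {
  var counts = {};
  var words = text.toLowerCase().match(/[a-z']+/g) || [];
  words.forEach(function (w) {
    counts[w] = (counts[w] || 0) + 1;
  });
  return counts;
}

console.log(wordFrequency("the cat sat on the mat"));
// { the: 2, cat: 1, sat: 1, on: 1, mat: 1 }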

 

 
  [ # 51 ]

Sorry to disappoint you, Merlin, but your first attempt is wrong. Even if you don't understand why the output is the way it is (there is in fact a very good reason for every detail being the way it is), at the very least I would have expected you to be able to reproduce the behaviour of my sample parser, character for character; to do otherwise, whether deliberately or accidentally, is just plain sloppy. That shouldn't be too hard for you, should it?

I'm also disappointed that you picked such a trivial example for your first attempt. It is, after all, a solved problem now, even if you don't understand all the subtleties of it.

Like everybody else, I’m here in the hope of learning something, and I’d view your efforts to demonstrate your extreme cleverness more favorably if you were able to point out what is wrong with the wikitext parsing program that I wrote, and where it could be improved.

 

 
  [ # 52 ]

Maybe next week, when I have more time, I'll do the wiki example. But then again, that is also a solved problem, so it may not be worth the effort. My goal is not to point out that any of your programs have something wrong (I believe that there are many approaches to a correct solution), only that regular expressions should be another tool in your solution arsenal.

The only thing I am challenging is your assertion:
“In a nutshell, you can divide all languages (i.e. patterns of interest) into four broad categories according to how difficult it is to parse (i.e. recognise) them. The simplest are called “regular expressions” and these have a precise mathematical definition.”
People should not confuse “regular languages” with “regular expressions”. There has been a lot of development on the latter, fueled by the web. What I had hoped people would learn is that regular expressions have overcome many of the limitations that the technology had in the past.
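One concrete example of that development: backreferences, which every modern JavaScript engine supports, let a regex recognise the language of doubled strings, something that is provably not a regular language:

// { ww : w is a word } is not a regular language, yet a
// backreference matches it directly.
var doubled = /^(\w+)\1$/;
console.log(doubled.test("hoho"));  // true  ("ho" + "ho")
console.log(doubled.test("hohum")); // false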

Even though I may not have duplicated the example character for character (extra features, formatting for web output, probably missed something, etc.), I don't think you are saying that regular expressions could not have duplicated the output (it is just a question of implementation). You would have to agree that the JavaScript does a good job of parsing the page with a trivial number of regular expressions.

 

 
  [ # 53 ]
Laura Patterson - Dec 16, 2011:

One reason I decided upon extensive client-side processing and the JavaScript run-time is to have the static variable memory available at all times during the user session, which enhances processing speed and functionality and adds to the user experience.

That will help ‘big time’ smile  So it is kind of an Ajax-type app? Putting more processing on the client side was a great idea; your users will appreciate it. Good thinking! I hate it when I visit a website and every little thing I do (like changing one drop-down box to re-populate another) has to go all the way to the server and rebuild the entire GUI of the app, just to repopulate that dropdown. arrrrgghh!!

 

 
  [ # 54 ]

Victor,

You hit the nail on the head! smile It only makes sense that all the processing and parsing takes place in the client’s browser. In addition, I have an animated avatar and other enhancements that benefit greatly by using this approach.

Another reason for using JS and Ruby scripting is to keep the application portable and cross-platform, meaning no Flash or other third-party plug-ins are required. As a matter of fact, in testing, my bot runs great on my iPhone 3GS with 8 GB of storage.

 

 
  [ # 55 ]

The only thing is, since this is a commercial product you're developing, perhaps you have to watch not to have too much of the ‘secret sauce’ (as you mentioned above) on the client side (since a simple ‘show page source’ would allow some reverse engineering)... but I suspect that the secret sauce is in the server-side code smile
OMG - you can tell this is the 21st century... 8 GB of storage on a hand-held device... how far we've come!!!!!

 

 
  [ # 56 ]
Victor Shulist - Dec 14, 2011:

And Andrew, I understand you made some good progress on your GLR parser?

Sorry, I meant to reply to this days ago, but all my forum time was getting used up elsewhere. It is really good to see you back and active again, and I am looking forward to us encouraging/goading each other on to ever greater achievements. smile

Yes, I've made huge progress, largely thanks to the generosity of my wife, who is funding my research, and I've been working on my project full-time for most of the year.

I’ve finally completed a parsing engine that has the scalability and flexibility to handle the kinds of grammars that I think are needed for natural language processing. Probably the main items of potential interest to you would be the discourse analysis grammar that’s been referenced in this thread, and also “English Verbs” over in “My Chatbot Project”.

Rather than hand coding parsing software for different purposes, I’ve been defining the grammar using a very simple notation called “Chomsky Normal Form”. My parser compiles this into highly optimised lookup tables which are interpreted by a driver. I’ve also experimented with compiling the grammar directly to C code, but for the complex grammars that I’m working with now, there isn’t a whole lot to be gained by doing this because most of the processing time is spent manipulating the directed graph structures required by the GLR algorithm. Either way, the performance is extraordinary.
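For anyone unfamiliar with the notation: in Chomsky Normal Form every rule is either A → B C or A → terminal, which is exactly what makes table-driven parsing possible. A toy table-driven recogniser, nothing like my real engine but the same shape of idea:

// Grammar in Chomsky Normal Form, recognised with a CYK-style table.
var rules = [
  ["S",  "NP", "VP"], // binary rules: A -> B C
  ["VP", "V",  "NP"]
];
var lexicon = { they: "NP", fish: "V", rivers: "NP" }; // A -> terminal

function recognise(words) {
  var n = words.length, table = [], i, j, k, len;
  for (i = 0; i < n; i++) {
    table[i] = [];
    for (j = 0; j < n; j++) table[i][j] = {};
    table[i][i][lexicon[words[i]]] = true; // length-1 spans from the lexicon
  }
  for (len = 2; len <= n; len++) {         // build longer spans from shorter ones
    for (i = 0; i + len - 1 < n; i++) {
      j = i + len - 1;
      for (k = i; k < j; k++) {
        rules.forEach(function (r) {
          if (table[i][k][r[1]] && table[k + 1][j][r[2]]) table[i][j][r[0]] = true;
        });
      }
    }
  }
  return !!table[0][n - 1]["S"];
}

console.log(recognise(["they", "fish", "rivers"])); // true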

While simple grammars such as the discourse analyser and various programming language parsers are easy enough to hand code in Chomsky Normal Form, I’m really just treating it as an intermediate language, the kind that compilers produce on the backend before it is assembled into binary. The real magic starts to happen when I use a higher level language to define the grammar and generate the definition from that, such as one based on typed feature structures.

The English verb forms that I've been trying to map out are a good example. It takes only a few hundred lines of high-level code to generate all the grammar rules for parsing all the different possible forms that an English verb can take. Currently this produces around 10000 grammar rules in Chomsky Normal Form (or, if you like, lines of grammar assembly language) from the few hundred lines of high-level definitions. Those grammar rules compile to a binary that is about 20 megabytes in size (compared to 11k for the discourse analyser binary).

The resulting parser is then able to parse, disambiguate, and assign semantic values to any verb phrase in one step. I posted an example file which lists some of the verb forms that it can handle now. Although it is only a subset, there are more than 19000 distinct phrases in that list, all built around one verb. As I’ve added more forms since then, the actual total is closer to 30000 distinct forms for each verb, and once I get started on idioms there will be even more.

 

 

 
  [ # 57 ]

Here's an example of the input and output from the test program. Using XML as the output format is just a convenience; it could just as easily be Common Lisp or something else. Internally it is all processed as directed graph structures. The semantic content could be encoded as XML attributes instead of compound element names too, something I've been meaning to change.

tmp/hello.bin -if i am not asked
<Grammar>
 <Subjunctive_Simple_Present_First_Singular_Passive_Negative>
  <First_Singular_Subject>
   <Common_First_Singular_Subject>i</Common_First_Singular_Subject>
  </First_Singular_Subject>
  <Past_Participle_ASK>asked</Past_Participle_ASK>
 </Subjunctive_Simple_Present_First_Singular_Passive_Negative>
</Grammar>
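The attribute-based encoding just mentioned might look something like this (attribute names purely illustrative):

<Grammar>
 <Clause mood="subjunctive" tense="simple-present" person="first"
         number="singular" voice="passive" polarity="negative">
  <Subject person="first" number="singular">i</Subject>
  <Verb form="past-participle" lemma="ASK">asked</Verb>
 </Clause>
</Grammar>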

 

 

 
  [ # 58 ]
Victor Shulist - Dec 16, 2011:

The only thing is, since this is a commercial product you're developing, perhaps you have to watch not to have too much of the ‘secret sauce’ (as you mentioned above) on the client side (since a simple ‘show page source’ would allow some reverse engineering)... but I suspect that the secret sauce is in the server-side code smile


Actually, it is a bit better protected than just a hidden file on the server. If you can “reverse engineer” 128-bit encryption that is domain-dependent, good luck! Peeking at the source code will get you the HTML wrapper. Not to say a seasoned hacker with a whole lot of time on his hands could not crack it, but if they are that good, then it might be easier for them to just write it themselves. smile

Victor Shulist - Dec 16, 2011:

OMG - you can tell this is the 21st century... 8 GB of storage on a hand-held device... how far we've come!!!!

I remember how excited I was when I bought my first 10-gig hard drive!!
We have come a long way, baby!! wink

 

 

 
  [ # 59 ]

Andrew,

Very impressive!! My parser is not nearly that sophisticated. Of course, it does not have to be to accomplish the intended results. The focus of mine is more on word association than grammar rules, but you still need the basic grammar-identification capabilities. I have really enjoyed working on my project for the pure challenge of accomplishment.

Members like yourself and Victor are inspirational and motivate my own outside-of-the-box thinking. wink

 

 
  [ # 60 ]

Andrew,

You have been working as hard as I have, then!  FULL TIME on your project? OK, I am seriously jealous! smile (Me, weekends only… except I will put in 10 full days over Xmas hehe.)  You mentioned compiling down to C; I did exactly that myself. Each grammar rule is as close to actual assembly language as you can get, and it sounds like yours is equally efficient.

Are you planning to create some video demos?  I have been meaning to do so myself but have been just so busy actually working on the project itself. Perhaps this weekend I will post to my thread (a bit dusty now!), but I have a new name for both the engine and the first ‘bot’ to be created from the platform. CLUES/Grace is now GLI (General Language Intelligence)/Abel (as in Cain & Abel, but I'm not a ‘fan’ of Cain lol), respectively.

I also want to do some very detailed profiling of my engine; any ideas? I'd like to get an idea of how many parse trees per second it can produce.
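I was thinking even something as simple as wrapping the engine's entry point in a timer would give a first estimate (parseAll below is just a stand-in for whatever the engine actually exposes):

// Hypothetical micro-benchmark: count every parse tree produced
// over a batch of sentences and divide by elapsed time.
function benchmark(sentences, parseAll) {
  var trees = 0, start = Date.now();
  sentences.forEach(function (s) {
    trees += parseAll(s).length; // number of parse trees for this sentence
  });
  var secs = (Date.now() - start) / 1000;
  console.log(trees + " trees in " + secs + "s = " +
              (trees / secs).toFixed(1) + " trees/sec");
}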

Thanks for the very detailed update .. keep up those long hours!!

 
