I’m a first-year computer science undergraduate, and I’m undertaking a project to write an English-speaking chatbot with some limited semantic capabilities. Specifically, I would like to recognize queries regarding people, model ongoing conversations, and transform statements and requests into database queries composed in a compiled semantic language and resolved by a reasoning system. Ambitious, I know!
Whether I get all of that done isn’t important right now; for the moment, I’d just like to work out a simple system for parsing English. I’m writing in Python and would prefer to keep it that way, but I’m open to switching if absolutely necessary.
I would like to be able to build parse trees of simple statements, queries, and recursive compositions thereof. I have a fairly reliable POS tagger I’ve corpus-trained in NLTK, a natural language toolkit for Python.
My first task is to develop a complete grammar and select a parser that can construct valid sentence trees from an input. I can do this to some extent, but not at all reliably, and my regex is a mess! Is anyone aware of a good way to construct English parse trees, given POS tags that are assumed correct? NLTK’s tutorial book leads you down a few neat paths, but it seldom, if ever, mentions the industry standard or what to do if you just want to skip the development step. I absolutely want a parser that’s already written, because writing one isn’t my goal! I would also really like to keep using Python.
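To make the task concrete, here is a minimal sketch of parsing over POS tags rather than words, using NLTK’s `CFG.fromstring` and `ChartParser`. The toy grammar, the Penn Treebank-style tags, and the `parse_tags` helper name are illustrative assumptions only; a grammar for real English would need far more rules than this.

```python
import nltk

# Toy grammar (an assumption for illustration): the terminals are
# POS tags, not words, so the parser only ever sees the tag sequence.
TOY_GRAMMAR = nltk.CFG.fromstring("""
    S  -> NP VP
    NP -> 'DT' 'NN' | 'NNP' | 'PRP'
    VP -> 'VBZ' NP | 'VBD' NP | 'VBZ' | 'VBD'
""")


def parse_tags(tagged_sentence, grammar=TOY_GRAMMAR):
    """Return the first parse tree for a list of (word, tag) pairs, or None."""
    tags = [tag for _, tag in tagged_sentence]
    parser = nltk.ChartParser(grammar)
    try:
        # parse() yields every tree the grammar licenses; take the first.
        return next(parser.parse(tags), None)
    except ValueError:
        # NLTK raises ValueError when a tag is not covered by the grammar.
        return None


tagged = [("the", "DT"), ("dog", "NN"), ("likes", "VBZ"), ("Mary", "NNP")]
tree = parse_tags(tagged)
print(tree)
```

The leaves of the resulting tree are the tags themselves, so the original words would have to be reattached afterwards by position. A sentence whose tag sequence the grammar does not cover simply returns `None`, which is exactly the reliability problem I’m running into.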
If anybody has any tips, I would really appreciate them.