
POS tagging, is it good enough?

Is It Time for Some Linguistics

Christopher D. Manning (taught the Coursera NLP course)
Departments of Linguistics and Computer Science
Stanford University

Manning states that current part-of-speech tagging performance is about 97.3% token accuracy, which corresponds to only about 56% whole-sentence accuracy.

What is the source of the belief that 97% is the limit of human consistency for part-of-speech tagging? It is easy to test for human tagging reliability: one just makes multiple measurements and sees how consistent the results are.
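That test is easy to sketch: have two annotators (or one annotator on two passes) tag the same tokens and compute the raw agreement rate. A minimal illustration with invented tags:

```python
def agreement_rate(tags_a, tags_b):
    """Fraction of tokens on which two annotators chose the same tag."""
    assert len(tags_a) == len(tags_b)
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return matches / len(tags_a)

# Two hypothetical annotators tagging the same ten tokens.
annotator_1 = ["DT", "NN", "VBZ", "DT", "JJ", "NN", "IN", "DT", "NN", "."]
annotator_2 = ["DT", "NN", "VBZ", "DT", "NN", "NN", "IN", "DT", "NN", "."]

print(agreement_rate(annotator_1, annotator_2))  # 0.9
```

Raw agreement overstates consistency somewhat (annotators agree by chance on common tags), which is why chance-corrected measures are often reported as well.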

Is Part-of-Speech Tagging a Solved Task? Is it accurate enough for Natural Language Understanding?



  [ # 1 ]

Hey Merlin,

Just got started with Part of Speech tagging, among other things, while training Elizabot this evening. As you can see, this is somewhat rudimentary still, not fully developed and integrated like it is in mature chatbots, such as ChatScript and Skynet-AI.

Can you give me any tips?

>> Tell me about NLP
<< The branch of information science that deals with natural language information.  Part of Speech: n.
<< Synset: human language technology
<< Synset: natural language processing
<< Hypernym: informatics.  Part of Speech: n
<< Hypernym: information processing.  Part of Speech: n
<< Hypernym: information science.  Part of Speech: n
<< Hypernym: ip.  Part of Speech: n

>> What is robot
<< A mechanism that can move automatically.  Part of Speech: n.
<< Synset: golem
<< Hypernym: mechanism.  Part of Speech: n
<< Hyponym: android.  Part of Speech: n
<< Hyponym: humanoid.  Part of Speech: n
<< Hyponym: mechanical man.  Part of Speech: n

/* Explanation: as in C++, >> (cin) marks input and << (cout) marks output. */
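For readers curious what such a lookup involves: each WordNet-style entry bundles a gloss, a part of speech, and pointers to related synsets. A toy sketch with a hand-built one-word lexicon (not the real WordNet data or Elizabot's actual code):

```python
# A toy, hand-built stand-in for WordNet: each entry carries a gloss,
# a part of speech, and pointers to synonyms, hypernyms, and hyponyms.
LEXICON = {
    "robot": {
        "gloss": "a mechanism that can move automatically",
        "pos": "n",
        "synset": ["golem", "automaton"],
        "hypernyms": ["mechanism"],
        "hyponyms": ["android", "humanoid", "mechanical man"],
    },
}

def describe(word):
    entry = LEXICON.get(word)
    if entry is None:
        return ["I don't know the word '%s'." % word]
    lines = ["%s.  Part of Speech: %s." % (entry["gloss"].capitalize(), entry["pos"])]
    lines += ["Synset: %s" % s for s in entry["synset"]]
    lines += ["Hypernym: %s" % h for h in entry["hypernyms"]]
    lines += ["Hyponym: %s" % h for h in entry["hyponyms"]]
    return lines

for line in describe("robot"):
    print("<< " + line)
```

The real WordNet database stores tens of thousands of such synsets; NLTK's `wordnet` corpus reader exposes the same gloss/hypernym/hyponym structure programmatically.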


Elizabot Citations:

George A. Miller (1995). WordNet: A Lexical Database for English.
Communications of the ACM Vol. 38, No. 11: 39-41.

Christiane Fellbaum (1998, ed.) WordNet: An Electronic Lexical Database.
Cambridge, MA: MIT Press.

Princeton University “About WordNet.” WordNet. Princeton University. 2010.




  [ # 2 ]

Cool topic, Merlin, and nice paper! I need to read through it more thoroughly and I definitely have some thoughts on the subject. In the middle of traveling now, but when I get some time I’d love to talk about this more. smile


  [ # 3 ]

First, I should be clear. Skynet-AI does not use Part-Of-Speech (POS) tagging.
I believe ChatScript does, but you would have to ask Bruce about how he uses it.

It may be interesting to some why I don’t use it.
I looked at using POS in a semantic pipeline to help aid in understanding. My goal for an AI is to understand the input phrase or Natural Language Understanding (NLU). In such a system the POS would be fed to downstream tasks. Some popular taggers include:
- Brill Tagger
- Stanford Part-of-Speech Tagger
- NLTK Tagger

POS Tagger Limitations
In Skynet-AI, the AI interpreter is downloaded each time a user signs on. This creates constraints in practical resource sizes. Since most of the taggers use a dictionary, this increases overhead. A standard tagger is larger than all of Skynet-AI. I did create my own real-time tagger with a minimal dictionary and rules. It looked promising, but ultimately was not part of Skynet-AI’s design. If people have enough interest, I could freshen it up and put a demo on-line.
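A minimal dictionary-plus-suffix-rules tagger of the kind described might look like this sketch (the dictionary entries and rules are illustrative, not Skynet-AI's actual code):

```python
import re

# Tiny illustrative dictionary; a real tagger's lexicon is far larger,
# which is exactly the download-size problem described above.
DICTIONARY = {
    "the": "DET", "a": "DET", "robot": "NOUN", "robots": "NOUN",
    "moves": "VERB", "quickly": "ADV",
}

def tag_word(word):
    w = word.lower()
    if w in DICTIONARY:
        return DICTIONARY[w]
    # Fallback suffix rules for words outside the dictionary.
    if re.search(r"ly$", w):
        return "ADV"
    if re.search(r"(ing|ed)$", w):
        return "VERB"
    if re.search(r"s$", w):
        return "NOUN"
    return "NOUN"  # default guess: noun is the largest open class

def tag(sentence):
    return [(w, tag_word(w)) for w in sentence.split()]

print(tag("the robot moves quickly"))
```

The dictionary handles frequent words exactly; the suffix rules keep the lexicon small by guessing everything else, which is where most of the accuracy is lost.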

In the paper Manning gives a few facts about the accuracy of tagging. Current state of the art is about 97% (about the same as a human). But it drops dramatically for full sentences (56%). Since the goal is NLU of the input phrase, I am more concerned about the 56%. That would mean about every other sentence would have an error in it. This led me to conclude that humans have some fuzzy mechanism in the process that allows them to account for or ignore these errors and still understand the input.
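The 56% figure is roughly what independence would predict: at 97.3% per-token accuracy and around 20 tokens per sentence (an assumed average, typical of newswire), the chance of a fully correct sentence is 0.973^20. A quick check:

```python
token_accuracy = 0.973
avg_sentence_length = 20  # tokens; an assumed newswire-style average

# If tagging errors were independent per token, the chance of an
# entirely correctly tagged sentence would be:
sentence_accuracy = token_accuracy ** avg_sentence_length
print(round(sentence_accuracy, 2))  # ~0.58, close to the reported 56%
```

So the 56% number is not a separate failure mode; it is the 3% token error rate compounded across a whole sentence.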

Manning gives a number of reasons for the POS errors:
Frequency of different POS tagging error types:

Class                            Frequency
1. Lexicon gap                     4.5%
2. Unknown word                    4.5%
3. Could plausibly get right      16.0%
4. Difficult linguistics          19.5%
5. Underspecified/unclear         12.0%
6. Inconsistent/no standard       28.0%
7. Gold standard wrong            15.5%

My analysis showed that many of these problems would be increased when accounting for “text speak” with typos and shorthand phrases in it. As a result I took a different approach with Skynet-AI which has worked out well.

I have also been interested in universal parts of speech from a standpoint of intelligence/NLU. The basics may be: nouns, verbs, adverbs, pronouns, determiners/articles, numbers, and punctuation. You might also add tags for conjunctions, particles, and other (abbreviations, etc.) according to the paper. As the size of a bot grows and you are looking to handle ever larger numbers of responses, one area where POS tagging may help is Natural Language Generation (NLG).
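Collapsing a fine-grained tagset into those universal categories is just a lookup; a sketch (the Penn Treebank mapping shown is illustrative and partial):

```python
# Illustrative mapping from a few Penn Treebank tags to the coarse
# universal categories listed above.
PENN_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBZ": "VERB", "VBG": "VERB",
    "RB": "ADV",
    "PRP": "PRON",
    "DT": "DET",
    "CD": "NUM",
    ".": "PUNCT", ",": "PUNCT",
    "CC": "CONJ",
    "RP": "PRT",
}

def to_universal(penn_tag):
    # "X" is the catch-all "other" category for abbreviations, foreign
    # words, and anything not covered by the mapping.
    return PENN_TO_UNIVERSAL.get(penn_tag, "X")

print([to_universal(t) for t in ["DT", "NN", "VBZ", "RB"]])
```

The coarse categories lose distinctions like tense and number, but for NLU/NLG purposes they are far more portable across tools and languages.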

Other related areas I found interesting:
Link Grammar
Using a Neural Net to learn POS
Neural word vectors/Deep learning for NLP


  [ # 4 ]

...56% isn’t very useful. A dictionary comparison is only the start. I think one really needs grammar rules or some sort of context analysis to get that up. The best I’ve managed in terms of categorising words in sentences is about 70%-80% correct, and for that I programmed every language rule I could get my hands on: Grammar, spelling, morphology and some Latin (grammar rules helped the most).
I imagine you could get to something similar with a large database of conventional word occurrences to figure out ambiguous uses, but I consider that to be inefficient, and in the end you’re still going to need context and reasoning to cover the last 20% or so. Humans can keep multiple ambiguous meanings of words in mind while reading, deciding which is meant only once one of them starts making sense in the context.
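That idea of keeping multiple candidate readings and letting context decide can be sketched with a toy transition-preference table (all words, tags, and weights here are invented for the example):

```python
# Each ambiguous word keeps all of its candidate tags; a previous-tag
# preference table (toy weights, not real corpus counts) picks the
# most plausible one in context.
CANDIDATES = {
    "can": ["VERB", "NOUN", "AUX"],
    "fish": ["NOUN", "VERB"],
    "the": ["DET"],
    "i": ["PRON"],
}

# How plausible tag B is immediately after tag A.
TRANSITION = {
    ("PRON", "AUX"): 0.9, ("PRON", "VERB"): 0.7, ("PRON", "NOUN"): 0.1,
    ("AUX", "VERB"): 0.9, ("AUX", "NOUN"): 0.2,
    ("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.1,
}

def tag(words):
    tags, prev = [], "START"
    for w in words:
        options = CANDIDATES.get(w.lower(), ["NOUN"])
        # Unseen transitions get a neutral 0.5 rather than zero.
        best = max(options, key=lambda t: TRANSITION.get((prev, t), 0.5))
        tags.append(best)
        prev = best
    return tags

print(tag("i can fish".split()))  # "can" resolves to AUX, "fish" to VERB
```

A real statistical tagger does the same thing globally (e.g. Viterbi over an HMM) instead of greedily left to right, which is what lets it revise an early choice once later words arrive.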

But I’m no expert. Surely there are systems that do better than 56% in sentences?


  [ # 5 ]
Merlin - Sep 17, 2013:

First, I should be clear. Skynet-AI does not use Part-Of-Speech (POS) tagging.
I believe ChatScript does, but you would have to ask Bruce about how he uses it.

Oh, thanks!  That’s a great tip, Merlin.  Here’s what I did with it. I downloaded ChatScript version 3.62, just released today.  Browsing through the new source code and dictionaries, I was able to interrogate Elizabot on that basis… So, I imagine ChatScript may be taking advantage of a later WordNet version than Elizabot is during alpha testing.  During beta testing, I may upgrade Elizabot’s WordNet version, but that won’t be anytime soon.

>> Tell me about WordNet
<< A machine-readable lexical database organized by meanings; developed at Princeton University.  Part of Speech: n.
Any of the machine-readable lexical databases modeled after the Princeton WordNet.  Part of Speech: n.
Synset: princeton wordnet
<< Hypernym: lexical database.  Part of Speech: n




  [ # 6 ]

I am aware of the Stanford statistic of 97.x% POS-tagging accuracy and 56% sentence accuracy. But I don’t know whether sentence accuracy also includes which word to bind a prepositional phrase to. If the grammatical substructures are determined correctly but the overall hooking together of the tree is wrong in some pieces, that is less serious than if the actual sentence is badly divined.

The reason I am writing my own POS tagger/parser in ChatScript is that I have needs beyond chat for accurate and rapid parsing, and 97.x% POS-tagging accuracy and the corresponding sentence parsing rate are not acceptable, nor is the speed of the Stanford system. The ONLY systems I know of that can achieve 99.x% POS-tagging accuracy are rule-based ones, not statistically based ones. They are MUCH harder to write, obviously, and don’t port nicely to other languages. Fortunately I am primarily concerned with English (any other language will have to live with whatever POS/parser technology is already out there), and I am willing to beat my head against the wall over time to get my system up to snuff. It is not close to prime-time yet.
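For reference, a rule-based tagger in the Brill tradition starts from a naive initial tagging and then applies ordered transformation rules that patch tags in context; a toy illustration (the lexicon and rule are invented for the example, not ChatScript's actual code):

```python
# Start from a naive dictionary tagging, then patch it with ordered
# transformation rules of the form: change tag X to Y in context C.
def initial_tags(words, lexicon):
    return [lexicon.get(w, "NOUN") for w in words]

def apply_rules(words, tags, rules):
    for (from_tag, to_tag, test) in rules:
        for i, t in enumerate(tags):
            if t == from_tag and test(words, tags, i):
                tags[i] = to_tag
    return tags

LEXICON = {"the": "DET", "dog": "NOUN", "barks": "NOUN"}  # 'barks' mistagged

# One illustrative rule: a NOUN right after another NOUN that follows
# a DET is usually a VERB ("the dog barks").
RULES = [
    ("NOUN", "VERB",
     lambda ws, ts, i: i >= 2 and ts[i - 1] == "NOUN" and ts[i - 2] == "DET"),
]

words = "the dog barks".split()
tags = apply_rules(words, initial_tags(words, LEXICON), RULES)
print(list(zip(words, tags)))
```

The hard part Bruce alludes to is not this machinery but writing (and ordering) hundreds of such rules so they correct errors without introducing new ones, which is also why the rules rarely transfer to another language.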

