
Grammar and where AI went wrong
 
 

Some interesting reads on AI and grammar.

Noam Chomsky on Where Artificial Intelligence Went Wrong

In 1956, the computer scientist John McCarthy coined the term “Artificial Intelligence” (AI) to describe the study of intelligence by implementing its essential features on a computer. Instantiating an intelligent system using man-made hardware, rather than our own “biological hardware” of cells and tissues, would show ultimate understanding, and have obvious practical applications in the creation of intelligent devices or even robots.

In May of last year, during the 150th anniversary of the Massachusetts Institute of Technology, a symposium on “Brains, Minds and Machines” took place, where leading computer scientists, psychologists and neuroscientists gathered to discuss the past and future of artificial intelligence and its connection to the neurosciences.

The gathering was meant to inspire multidisciplinary enthusiasm for the revival of the scientific question from which the field of artificial intelligence originated: how does intelligence work? How does our brain give rise to our cognitive abilities, and could this ever be implemented in a machine?

Noam Chomsky, speaking in the symposium, wasn’t so enthused. Chomsky critiqued the field of AI for adopting an approach reminiscent of behaviorism, except in more modern, computationally sophisticated form. Chomsky argued that the field’s heavy use of statistical techniques to pick regularities in masses of data is unlikely to yield the explanatory insight that science ought to offer. For Chomsky, the “new AI”—focused on using statistical learning techniques to better mine and predict data—is unlikely to yield general principles about the nature of intelligent beings or about cognition.

UTOPIAN FOR BEGINNERS
An amateur linguist loses control of the language he invented.

Languages are something of a mess. They evolve over centuries through an unplanned, democratic process that leaves them teeming with irregularities, quirks, and words like “knight.” No one who set out to design a form of communication would ever end up with anything like English, Mandarin, or any of the more than six thousand languages spoken today.
“Natural languages are adequate, but that doesn’t mean they’re optimal,” John Quijada, a fifty-four-year-old former employee of the California State Department of Motor Vehicles, told me. In 2004, he published a monograph on the Internet that was titled “Ithkuil: A Philosophical Design for a Hypothetical Language.”

Ithkuil: A Philosophical Design for a Hypothetical Language

Ithkuil has two seemingly incompatible ambitions: to be maximally precise but also maximally concise, capable of capturing nearly every thought that a human being could have while doing so in as few sounds as possible.

The principle of linguistic relativity holds that the structure of a language affects the ways in which its speakers conceptualize their world, i.e. their world view, or otherwise influences their cognitive processes. Popularly known as the Sapir–Whorf hypothesis, or Whorfianism, the principle is often defined as having two versions: (i) the strong version that language determines thought and that linguistic categories limit and determine cognitive categories and (ii) the weak version that linguistic categories and usage influence thought and certain kinds of non-linguistic behaviour.

Is it possible that the problem of AI is the lack of an appropriate grammar?

 

 
  [ # 1 ]
Merlin - Dec 20, 2012:

Chomsky argued that the field’s heavy use of statistical techniques to pick regularities in masses of data is unlikely to yield the explanatory insight that science ought to offer. For Chomsky, the “new AI”—focused on using statistical learning techniques to better mine and predict data—is unlikely to yield general principles about the nature of intelligent beings or about cognition.

I agree wholeheartedly with Chomsky! In my experience, the moment you start relying on statistics in any part of a designed AI system, you’ve already gone astray. The only exception I can think of is preprocessing the original raw data, for example for smoothing purposes, before the data hits the analyzers / feature detectors; at that stage the raw data isn’t yet something the intelligent part of the system is aware of.

Therefore I have a low opinion of Bayesian nets and data mining (the trend which Chomsky called “new AI”), both of which are among the top five trendy directions I see in AI.

My reasoning is that data do not tell cause-and-effect. This is an extremely important thing for *everyone* to realize. I detest seeing the general public constantly misled by politicians who use statistics to promote some group’s specific political agenda, which happens all the time. That’s why it’s so easy to “lie” using statistics (“lies, damned lies, and statistics”). I’ve seen too many awful injustices enacted by foolish people who lacked this critical reasoning skill, and I’ve seen some very intelligent people make fools of themselves for the same reason. Intelligence is all about *understanding* and prediction, and statistics provides neither. To me, statistics is a last resort for fields like psychology, where the phenomena are so complex that cause-and-effect is too difficult to determine right away and the first step is merely to notice whether two things are correlated. But that should only be a first step: a way of detecting a pattern and beginning to think about possible hypotheses, at best *maybe* eventually a heuristic, but definitely not a dependable rule or physical law.

Merlin - Dec 20, 2012:

Is it possible that the problem of AI is the lack of an appropriate grammar?

In my opinion, the main problem of AI in general is that it currently lacks a good representation of the real world. The language understanding problem is just a subset of that basic problem. In my view, it shouldn’t be assumed that grammar must be represented as a written set of rules. To paraphrase Archimedes: give me a place to stand and a good enough representation and I’ll change the world!

The rest of your interesting post is too much for me to digest in one sitting. Good stuff, though.

 

 

 
  [ # 2 ]
Mark Atkins - Dec 20, 2012:
Merlin - Dec 20, 2012:

Chomsky argued that the field’s heavy use of statistical techniques to pick regularities in masses of data is unlikely to yield the explanatory insight that science ought to offer. For Chomsky, the “new AI”—focused on using statistical learning techniques to better mine and predict data—is unlikely to yield general principles about the nature of intelligent beings or about cognition.

I agree wholeheartedly with Chomsky! In my experience, the moment you start relying on statistics in any part of a designed AI system, you’ve already gone astray. The only exception I can think of is preprocessing the original raw data, for example for smoothing purposes, before the data hits the analyzers / feature detectors; at that stage the raw data isn’t yet something the intelligent part of the system is aware of.

Therefore I have a low opinion of Bayesian nets and data mining (the trend which Chomsky called “new AI”), both of which are among the top five trendy directions I see in AI.

I read an interesting paper a while ago that discussed how babies learn language. The authors made the point that young children can pick up new grammar from just a single example (they cited experiments to test this), and that a single counterexample can suppress the over-generalization of that new grammar. For example, a child may learn to link an object and verb with a certain preposition, and then learn that this same preposition isn’t used with another, similar verb. Obviously, humans aren’t using large data sets to generate rules, and yet the rules we generate can be quite accurate.
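As a rough sketch of that idea (my own toy illustration, not taken from the paper; the verb class and the word lists are invented), a learner can commit to a rule after one positive example and carve out an exception after one counterexample:

# A toy sketch of one-shot rule learning with counterexample suppression (my own
# illustration, not from the paper mentioned above). One heard example
# generalizes a preposition to a whole verb class; one counterexample narrows
# the rule back down with an explicit exception.

class PrepositionLearner:
    def __init__(self):
        self.rules = {}        # verb_class -> preposition learned from one example
        self.exceptions = {}   # verb_class -> set of verbs that block the rule

    def observe(self, verb, verb_class, preposition):
        """Learn from a single utterance, e.g. 'pour the water INTO the cup'."""
        self.rules.setdefault(verb_class, preposition)

    def observe_counterexample(self, verb, verb_class):
        """A single counterexample suppresses over-generalization for this verb."""
        self.exceptions.setdefault(verb_class, set()).add(verb)

    def predict(self, verb, verb_class):
        if verb in self.exceptions.get(verb_class, set()):
            return None        # rule suppressed for this verb
        return self.rules.get(verb_class)

learner = PrepositionLearner()
learner.observe("pour", "change-of-location", "into")          # one example is enough
print(learner.predict("drip", "change-of-location"))           # -> 'into' (generalized)
learner.observe_counterexample("fill", "change-of-location")   # one counterexample
print(learner.predict("fill", "change-of-location"))           # -> None (suppressed)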

On the other hand, I wonder if we don’t discount the sheer amount of raw data we are exposed to. For example, after seeing a cat once, a human will not confuse it with a dog. However, we aren’t just seeing one instant of a cat, are we? We see it in motion, from many angles, and we see how it moves from one position to another. How do we quantify the amount of data that we are exposed to? And how much is required to uniquely identify a cat in the future?

 

 
  [ # 3 ]
Mark Atkins - Dec 20, 2012:

Therefore I have a low opinion of Bayesian nets and data mining (the trend which Chomsky called “new AI”), both of which are among the top five trendy directions I see in AI.

Google would disagree. In the “Introduction to Artificial Intelligence” course (now offered by Udacity), given by Peter Norvig and Sebastian Thrun, most of the problems were translated into “Bayesian” or search problems. Peter Norvig is Director of Research at Google Inc. He is also a Fellow of the American Association for Artificial Intelligence and of the Association for Computing Machinery. Norvig is co-author of the popular textbook Artificial Intelligence: A Modern Approach. Thrun is best known for his research in robotics and machine learning, specifically his work on self-driving cars.

In anecdotes about Google, it is apparent that much of their philosophy is influenced by Bayes’ Rule. If anyone can make the best use of lots of data and hardware, Kurzweil and Google are probably a good bet.
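For readers less familiar with it, here is a minimal sketch of the kind of Bayes’ Rule update that course leans on (the topic and all of the numbers are made up purely for illustration):

# A toy Bayes' Rule update with invented numbers: how likely is it that a user
# is asking about the weather, given that their message contains the word "rain"?

p_weather = 0.2                # prior: P(topic = weather)
p_rain_given_weather = 0.6     # likelihood: P(word "rain" | topic = weather)
p_rain_given_other = 0.05      # likelihood: P(word "rain" | topic != weather)

# total probability of seeing the word "rain" at all
p_rain = p_rain_given_weather * p_weather + p_rain_given_other * (1 - p_weather)

# posterior: P(weather | "rain") = P("rain" | weather) * P(weather) / P("rain")
p_weather_given_rain = p_rain_given_weather * p_weather / p_rain
print(round(p_weather_given_rain, 3))   # -> 0.75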

On the other hand, after taking a look at the Bayesian and neural net approaches and how they might relate to chatbots and intelligence, I don’t believe that just slurping up big data is the optimal path to general computer intelligence.

Stepping through the:
DATA>INFORMATION>KNOWLEDGE>WISDOM
process will require more finesse. I don’t believe it is a matter of computing power. I think it has more to do with a lack of appropriate system/software design and architecture.

 

 
  [ # 4 ]
C R Hunt - Dec 20, 2012:

I wonder if we don’t discount the sheer amount of raw data we are exposed to.

I agree, CR. If an AI/chatbot had the same amount of input, in terms of man-hours and hand-holding, that a baby/child has for its education, then we might be surprised at how fast progress could be made in its education.

 

 
  [ # 5 ]
C R Hunt - Dec 20, 2012:

How do we quantify the amount of data that we are exposed to? And how much is required to uniquely identify a cat in the future?

“One exposure to the wise is sufficient”?

Recently I read about an interesting neurological hypothesis that there is a feedback loop that holds information in memory longer than the duration of its sensory exposure, sort of like a looped recording that keeps exposing memory cells to that one memory. That is roughly equivalent to repeated training on the same input pattern in artificial neural networks. I’ll find and type up the reference, if you like. Of course that immediately raises a lot of other questions, like why other memories don’t average that memory out, how generalizations are formed, and so on.
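To make that analogy concrete, here is a minimal sketch (my own toy example, with arbitrary numbers and learning rate) of how repeatedly replaying one input pattern strengthens a single artificial neuron’s response to it:

# A toy analogue of the replay idea: presenting the same input pattern to a
# single neuron several times strengthens its response, much as a feedback loop
# would re-expose memory cells to one sensory event. All numbers are arbitrary.

pattern = [1.0, 0.0, 1.0]      # one sensory input pattern
weights = [0.0, 0.0, 0.0]      # synaptic weights, initially naive
learning_rate = 0.2

def response(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

for replay in range(8):        # several internal replays of the same exposure
    # simple Hebbian-style update: strengthen weights on active inputs
    weights = [wi + learning_rate * xi for wi, xi in zip(weights, pattern)]
    print(f"replay {replay + 1}: response = {response(weights, pattern):.2f}")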

 

 

 
  [ # 6 ]
Merlin - Dec 20, 2012:

Google would disagree.

So would eBay. So let’s talk about an experience I had with eBay’s automatic recommendation system…

One year I was buying a lot of Disneyland memorabilia on eBay. Of course eBay’s statistical software had no trouble recognizing the Disneyland theme in my purchases and recommending other items I might like, and it did a very good job of that. But I also bought a number of items that seemed unrelated: a postcard of a restaurant in San Juan Capistrano, a postcard of orange groves in Irvine, and so on. So eBay began recommending postcards of orange groves in Florida to me. Its “logic” (which was just statistical clustering) was apparently that I was interested in three different topics: Disneyland, a restaurant, and orange groves. Of course another pattern is that all three of the locales were in Southern California, and eBay’s software eventually must have picked up on that statistical pattern too, since it began recommending items from Knott’s Berry Farm, among other things.

However, I had no real interest in orange groves in Florida or in Knott’s Berry Farm. There was in fact a very clear-cut pattern in my purchases, but eBay never caught it: I was buying memorabilia only from places along the I-5 freeway between San Diego and Anaheim, which was the standard route my family and I used to take to Disneyland. A really intelligent piece of software would have had more knowledge of the world in the form of maps, and would have noticed that my purchases were consistently of places that could be plotted along a clear-cut segment of a very prominent line on a map. That would have been truly intelligent, and it would have immediately hit 100% prediction accuracy in suggesting items that might interest me. Instead, all it had were three clusters that were statistically unrelated in either topic or locale, plus my rejection of its other locale-based suggestions, so overall its prediction, rooted in statistics rather than understanding, was weak.
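A minimal sketch of the kind of check such software could have made (this is my own illustration, not eBay’s actual system; the coordinates are rough latitude/longitude values and the tolerance is arbitrary):

# A toy version of the "route" insight: instead of clustering purchases by topic,
# test whether their locations all lie close to one line segment (a freeway route).
# Coordinates are rough illustrative lat/lon values, not real data.
import math

def distance_to_segment(p, a, b):
    """Distance from point p to the segment a-b (treating lat/lon as a flat plane)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))                 # clamp to the segment's endpoints
    cx, cy = ax + t * dx, ay + t * dy         # closest point on the segment
    return math.hypot(px - cx, py - cy)

route = ((32.72, -117.16), (33.84, -117.91))  # roughly San Diego -> Anaheim along I-5

purchases = {
    "Disneyland postcard":            (33.81, -117.92),
    "San Juan Capistrano restaurant": (33.50, -117.66),
    "Irvine orange groves":           (33.68, -117.83),
    "Florida orange groves":          (28.54, -81.38),   # the bad recommendation
}

for item, location in purchases.items():
    on_route = distance_to_segment(location, *route) < 0.3   # ~0.3 degrees tolerance
    print(f"{item}: {'fits the route' if on_route else 'off the route'}")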

I can probably give numerous other examples of how statistics fails miserably to produce intelligent reasoning, but I thought the above example was particularly clear-cut.

Therefore I’ll keep my bias against statistics-based “intelligence”, regardless of the stature or pay of Google’s employees, thank you. grin

 

 

 
  [ # 7 ]

The only exception I can think of is preprocessing the original raw data, for example for smoothing purposes, before the data hits the analyzers / feature detectors; at that stage the raw data isn’t yet something the intelligent part of the system is aware of.

Does this mean you live in a world where everything is certain to you?

 

 
  [ # 8 ]
Jan Bogaerts - Dec 21, 2012:

Does this mean you live in a world where everything is certain to you?

Of course the world is uncertain, so it sounds like you’re asking about the relationship between statistics and reasoning. Here’s how I believe the strength of understanding increases in science, depending on the methods used…

At the lowest level, there is only correlation, more formally correlation between at least two variables. There is no cause-and-effect implied. If grass is correlated with the color green, that doesn’t say whether grass causes greenness, or whether greenness causes grass, or neither (a third agent could be causing both effects and the correlation could be an artifact of some other mechanism).

“Correlation does not imply causation” is a phrase used in science and statistics to emphasize that a correlation between two variables does not necessarily imply that one causes the other.
http://en.wikipedia.org/wiki/Correlation_does_not_imply_causation
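A minimal sketch of that “third agent” case (synthetic data with made-up numbers): a hidden confounder drives both variables, so they correlate strongly even though neither causes the other.

# Synthetic illustration of "correlation does not imply causation": a hidden
# confounder drives both X and Y, so they correlate strongly even though
# neither one causes the other. All numbers are made up.
import random

random.seed(0)
confounder = [random.gauss(0, 1) for _ in range(1000)]   # the hidden "third agent"
x = [c + random.gauss(0, 0.3) for c in confounder]       # X depends only on the confounder
y = [c + random.gauss(0, 0.3) for c in confounder]       # Y depends only on the confounder

def correlation(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5

print(f"corr(X, Y) = {correlation(x, y):.2f}")   # high, despite no X->Y or Y->X link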

A step up from that is experimentation. If you create an experiment where you manipulate one of those variables while observing the other, then you know which is cause and which is effect.

Experiments establish cause and effect.
http://www.simplypsychology.org/correlation.html

However, at that point you have only shallow reasoning based on statistics. For example, knowing that drug X cures disease Y with certainty c does not tell you why. There could be a very complicated set of chemical changes, anatomical involvement, and even psychological factors at work. Or it could even be that the experiment happened to involve a place, time, and sampled population so atypical that the seeming cause-and-effect didn’t really exist at all.

http://www.j-paine.org/students/lectures/lect3/node25.html

A step up from that is reasoning from “first principles”. If you are trying to figure out why a light is not coming on and you notice the cord is unplugged, that is deep reasoning because there is clear-cut cause-and-effect. That’s where the level of intelligence lies, even though that was a simple example. Some authors even go so far as to consider intelligence to be nothing more than accurate prediction.

But that’s not all. I am arguing a much stronger proposition. Prediction is not just one of the things your brain does. It is the primary function of the neocortex, and the foundation of intelligence. The cortex is an organ of prediction. If we want to understand what intelligence is, what creativity is, how your brain works, and how to build intelligent machines, we must understand the nature of these predictions and how the cortex makes them. Even behavior is best understood as a by-product of prediction.
(“On Intelligence”, Jeff Hawkins with Sandra Blakeslee, 2004, page 89)

CR:

By accident I just came across that excerpt I mentioned, which fortuitously also relates to the prediction-as-intelligence philosophy I just mentioned above to Jan:

It is not possible to describe this complex mechanism here, but what is essential is that the succession of the animal’s positions in space will be replayed by the brain at least seven or eight times per second. In other words, seven or eight times per second, the animal will experience all the different sensory configurations that correspond to the positions he will occupy if he correctly negotiates the trajectory he has learned. I can think of no better example of what I mean when I say that perception uses memory to predict the future consequences of action. This interplay of past and future repeats on a time scale of several dozen milliseconds in the brain. It is also possible to imagine that slower mechanisms exist. These ideas are of course only models, but they are becoming increasingly plausible, even though the presence of theta rhythm and rhythm at 40 Hz in the hippocampus of the monkey have not yet been firmly established.
(“The Brain’s Sense of Movement”, Alain Berthoz, 2000, pages 133-134)

Also, if your question was more about how the various angles of view of a cat can allow a brain to understand a single concept of a cat, the answer is even simpler (in theory): the brain is highly dedicated to creating invariant representations, in nearly all components and in nearly all sensory modalities. Not the best quote, but here’s an example:

There are other very important examples of relativity in motion perception that are more representative of the vectorial nature of motion processing. These examples come mainly from the Gogel and Johansson studies. These latter studies indicate three important properties of motion perception (Johansson, 1978):
(1) that the visual system is capable of vector analyzing motions;
(2) that relative motions can be integrated into coherent shapes by the visual system;
(3) that there seems to be a hierarchy of preferred transformational invariants in (2) and (1) above, particularly when motion in depth is a consequent perception.
(“Visual Perception: Theory and Practice”, Terry Caelli, 1981, page 154)

 

 

 
  [ # 9 ]

I have noticed that, to get from correlations to reasoning, a weighted statistical approach can be beneficial for better locating the points of interest. Furthermore, from my perspective at least, there isn’t much of a difference between grammar and more statistically based approaches. As a real-world example, take the two decision trees that were generated for the (AIML) pattern definitions:

<pattern>This is a test</pattern>
<pattern>This is a test</pattern>   <!-- same as above, so a duplicate pattern -->
<pattern>This is no test</pattern>

These look like the images attached at the bottom.
I deliberately declared a duplicate because this is an interesting case for statistical approaches: what do you do with duplicates? Take the first one, take a random one, use the one that was most often spoken by the other person, or take the one that belongs to the topic that was most recently used,... (the latter two being more statistically based, since you need to keep track of weights, or something similar).
In the statistical approach, I changed every step in the tree into a double that keeps track of a weight. There’s also an extra layer at the end to allow for an extra weight between the different results. And that’s it. The interpretation of the tree can be done by the same code.
The additional weights can be used to find further relationships in combination with other decision trees, which don’t necessarily need to have the same structure.
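A minimal sketch of what such a weighted pattern tree might look like (my own reconstruction of the description above, not Jan’s actual code; the names and the reinforcement rule are invented):

# A toy weighted pattern tree, reconstructed from the description above (not
# Jan's actual code): every node carries a weight, and duplicate patterns end in
# an extra weighted layer so something can choose between them.

class Node:
    def __init__(self):
        self.weight = 0.0      # usage/strength weight for this step in the tree
        self.children = {}     # next word -> Node
        self.results = []      # extra layer at the end: [result_id, weight] pairs

def add_pattern(root, words, result_id):
    node = root
    for word in words:
        node = node.children.setdefault(word.lower(), Node())
    node.results.append([result_id, 0.0])     # duplicates land here side by side

def match(root, words):
    node = root
    for word in words:
        node = node.children.get(word.lower())
        if node is None:
            return None
        node.weight += 1.0                     # reinforce the path that was used
    # among duplicate patterns, prefer the most strongly weighted result
    return max(node.results, key=lambda r: r[1])[0] if node.results else None

root = Node()
add_pattern(root, "This is a test".split(), "response-1")
add_pattern(root, "This is a test".split(), "response-2")   # the duplicate pattern
add_pattern(root, "This is no test".split(), "response-3")
print(match(root, "this is a test".split()))                 # -> 'response-1'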

For me, it isn’t about one or the other, but both, which can be more than 1 + 1.

Image Attachments
TextPatternDecisionTree.JPG
WeightedTextPatternDecisionTree.JPG
 

 
  [ # 10 ]

Note: in the example above, I’ve given a weight to the neurons themselves. I could also have assigned the weight to the relationships, but that’s currently still a bit hard to depict graphically. It wouldn’t have needed the extra layer, though.

 

 
  [ # 11 ]

Jan,

I’m barely familiar with AIML and I don’t know what your goal is, so I don’t know why you’re using the representation you’re using. In the representation I mentioned in my thread about generalizing the ten sentence patterns, for example, I might represent “This is a test” as merely a Venn diagram, with either one or two circles, depending on the importance of the words “this is”. I might represent it as one of the following:

(1)
One circle labeled “test”, possibly with the modifier “a” hanging off of it.
(2)
Two circles, one labeled “this”, one labeled “test” with “this” completely embedded in the set “test”, possibly with the modifier mentioned in (1).

Then the entire Venn diagram would be placed in one of the four slots of my vector.

I don’t see that the verb “to be” / “is” is required to represent the concept of that sentence, so I view all the arrows in your diagram as doing nothing except confusing things. As I obliquely implied in passing in another thread, “to be” has several fundamentally different meanings, especially “subset of”/“element of”, “has attribute of”, and “equivalent to”. If I were setting up my own representation system, I might omit that verb altogether and use three separate mechanisms/representations to distinguish between those three concepts. In fact, in my current architecture I’m using a completely different representation from that Venn + vector + modifier-tree approach, so there is much flexibility in representation, largely influenced by one’s particular goal. Subconsciously, that’s probably why I haven’t been motivated to learn AIML: it’s not going to readily represent what I want represented, or in the way I want it represented. I would also represent the number of presentations in a completely different way, and correlations in a completely different way, and I might not even include correlations per se at all.
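A minimal sketch of what keeping those three senses of “to be” as separate mechanisms could look like (my own illustration, not Mark’s actual architecture; the relation names and example facts are invented):

# A toy illustration of splitting the ambiguous verb "to be" into three distinct
# relation types instead of one all-purpose link (not Mark's actual architecture).
from enum import Enum

class Relation(Enum):
    SUBSET_OF = "subset of / element of"   # "this is a test", "a dog is an animal"
    HAS_ATTRIBUTE = "has attribute of"     # "the sky is blue"
    EQUIVALENT_TO = "equivalent to"        # "the morning star is the evening star"

facts = [
    ("this", Relation.SUBSET_OF, "test"),  # the Venn picture: "this" inside "test"
    ("sky", Relation.HAS_ATTRIBUTE, "blue"),
    ("morning star", Relation.EQUIVALENT_TO, "evening star"),
]

for subject, relation, obj in facts:
    print(f"{subject} --[{relation.value}]--> {obj}")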

 

 