
Thoughts on the search for the math of intelligence
 
 

PART 1: TENSORS

(Part 1 of about 10 parts)

(This is necessarily going to be a long thread, so it will require many consecutive posts spanning at least several days. Everyone should feel free to post in between my various parts, since it might turn out that I can’t finish my entire intended thread anyway.)

My recent thread about the U.S. government searching for “Maxwell’s equations of thought” really got me thinking about where one would find a suitable mathematics to represent thought, intelligence, or AI, whether such a form of math even exists, and why or why not. I’ve spent the past month learning branches of math I’d never understood before, and I’ve been brainstorming extensively on all of this, so I thought I’d post my findings and thoughts.

In 1989 Patricia Churchland suggested three areas of AI research that she considered particularly promising. One such area was tensors, which are an area of mathematics related to arrays:

(p. 411)
  My approach here will be to present three quite different theoretical examples with a view to showing what virtues they have and why they are interesting. Each in its way is highly incomplete; of course each makes simplifications and waves its hands in many important places. Nevertheless, by looking at these approaches sympathetically, while remaining sensitive to their limitations, we may be able to see whether the central motivating ideas are powerful and useful and, most importantly, whether they are experimentally provocative.

(pp. 411-412)
  Two of the examples originate from within an essentially neurobiological framework. The first focuses on the fundamental problem of sensorimotor control and offers a general framework for understanding the computational architecture of nervous systems. The authors of this approach are Andras Pellionisz and Rodolfo Llinas, and owing to the very broad scope and the general systematicity their theory seeks to encompass, I shall discuss it at considerable length.

(p. 417)
The basic mathematical insight was that if the input is construed as a vector in one coordinate system, and if the output is construed as a vector in a different coordinate system, then a tensor is what effects the mapping or transformation from one vector to the other. Which tensor matrix governs the transformation for a given pair of input-output ensembles is an empirical matter determined by the requirements of the reference frames in question. And that matrix is implemented in the connectivity relations obtaining between input arrays and output arrays.

(p. 418)
  A tensor is a generalized mathematical function for transforming vectors into other vectors, irrespective of the differences in metric and dimension of the coordinate systems. If the basic functional problem of sensorimotor control is getting from one very different coordinate system to another, then tensorial transformations are just what the nervous system should be doing. Accordingly, the hypothesis is that the connectivity relations between a given input ensemble and its output ensemble are the physical embodiment of a tensor.

(“Neurophilosophy: Toward a Unified Science of the Mind-Brain”, Patricia Smith Churchland, 1989)
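Churchland’s point that a tensor “effects the mapping” between two coordinate systems can be sketched in a few lines of code. Here a made-up 2×3 matrix plays the role of the tensor, carrying a 3-component “sensory” vector in one frame into a 2-component “motor” vector in another; all the numbers are purely illustrative, not from the source:

```python
import numpy as np

# A hypothetical rank-2 tensor, written as a 2x3 matrix T, mapping a
# 3-dimensional "sensory" input frame to a 2-dimensional "motor"
# output frame. One linear map carries vectors between coordinate
# systems of different dimension.
T = np.array([[1.0, 0.5, 0.0],
              [0.0, 0.5, 1.0]])

sensory = np.array([2.0, 4.0, 6.0])   # input vector (3 components)
motor = T @ sensory                   # output vector (2 components)

print(motor)  # [4. 8.]
```

In Churchland’s reading, it is the entries of T (not the vectors) that would be physically embodied in the connectivity between the input and output neuron arrays.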

Coordinate transformations are certainly critically important, as Marr notes in his book “Vision”. Marr even suggests that our brains automatically assign a visual coordinate system to each one of our limbs, even when viewing somebody else’s limbs, from the arm to the hand to each finger joint, each of which rotates on its own unique axis:

(p. 307)
There are two kinds of object-centered coordinate systems that the 3-D model representation might use. In one, all the component axes of a description, from torso to eyelash, are specified in a common frame based on the axis of the whole shape. The other uses a distributed coordinate system, in which each 3-D model has its own coordinate system. The latter is preferable for two main reasons. First, the spatial relations specified in a 3-D model description are always local to one of its models and should be given in a frame of reference determined by that model for the same reasons that we prefer an object-centered system over a viewer-centered one. To do otherwise would cause information about the relative dispositions of a model’s components to depend on the orientation of the model axis relative to the whole shape. For example, the description of the shape of a horse’s leg would depend on the angle that the leg makes with the torso. Second, in addition to this stability and uniqueness consideration, the representation’s accessibility and modularity is improved if each 3-D model maintains its own coordinate system, because it can then be dealt with as a completely self-contained unit of shape description.

(“Vision: A Computational Investigation into the Human Representation and Processing of Visual Information”, David Marr, 1982)

That is a lot of coordinate systems to consider, and at first that suggestion seemed unrealistic to me, but after much thought I decided Marr was probably correct. At the very least our brains certainly convert viewer-centered (= self-centered) coordinate systems (where our own body is at the origin of a 3D graph) to object-centered coordinate systems (where the center of the object being viewed is at the origin of a 3D graph), so clearly our brains already perform such computations without our even being consciously aware of the underlying mathematics. Object-centered coordinate systems are almost required because the brain requires invariants: without them, our brains would have to update every model of every object we see as soon as we changed location, or the objects changed location, or shadows moved in the slightest way. Since such coordinate transformations already occur in our brains and are performed with ease, it’s not difficult to believe that extrapolations of this mechanism exist and are commonly used by the brain.
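As a toy sketch of the viewer-centered to object-centered conversion just described (a hypothetical 2D example of my own, not Marr’s construction), the transform is just a translation followed by an inverse rotation:

```python
import numpy as np

def viewer_to_object(p_viewer, obj_origin, obj_angle):
    """Convert a 2D point from viewer-centered to object-centered
    coordinates, given where the object sits in the viewer's frame
    (obj_origin) and how it is rotated there (obj_angle, radians)."""
    c, s = np.cos(obj_angle), np.sin(obj_angle)
    # translate so the object is at the origin, then undo its rotation
    rot_inv = np.array([[c, s], [-s, c]])
    return rot_inv @ (np.asarray(p_viewer, float) - np.asarray(obj_origin, float))

# A point 5 units ahead of the viewer, seen from an object that sits
# 3 units ahead and is rotated a quarter turn: in the object's own
# frame the point lies 2 units along its negative y axis.
print(viewer_to_object([5, 0], [3, 0], np.pi / 2))  # approximately [0, -2]
```

Marr’s distributed scheme would chain many such little transforms, one per limb or joint, each with its own origin and axis.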

Unfortunately, speaking for myself, I just don’t understand the gist of tensors. Twice in my life I’ve made a serious effort to learn them, and both times I plowed through many pages of linear algebra, most of which I already knew, then finally gave up in boredom since I didn’t understand where it all was headed. Are tensors just linear algebra with more flexible notation? If so, why doesn’t somebody just say so, and then give a tutorial on the differences and why those differences matter, instead of giving me a repeat course in linear algebra? Maybe I’ve just been unfortunate enough to get poor or unsuitable introductory material. Admittedly, there exist a lot of poor tutorials out there, especially in mathematics.

But at least Patricia Churchland admitted that she also had great trouble understanding tensor theory, at least as far as it related to AI. She wrote that she finally had to devise a cartoon of a crab trying to grab an apple in order to get a grip on a specific example of how tensors work…

(p. 419)
The phenomenological scenario here seems to be confusion and incomprehension in the first phase, followed, as understanding flowers, by a gathering sense of obviousness adhering to the general principles. The detailed hypotheses are, evidently, a further matter. My own understanding here began to find its feet as Paul M. Churchland and I constructed a cartoon story of a highly simplified creature who faces a sensorimotor control problem of the utmost simplicity. In what follows I shall use the cartoon story in trying to outline the Pellionisz-Llinas picture of the brain’s geometrical problems and its geometrical solutions. With that in hand, we shall return to the nervous systems and to the cerebellum in particular.

(“Neurophilosophy: Toward a Unified Science of the Mind-Brain”, Patricia Smith Churchland, 1989)

I still haven’t dug into the details enough to understand her cartoon example, but nevertheless I thought she did an excellent job of presenting her simplified example, especially with the accompanying 3D coordinate systems and bent lines showing the transformations involved.

My opinion of tensors relative to AI is mixed. While tensors may be ideal for matrix operations, especially since the cerebellum’s layout almost looks like a 3D matrix already, I have some doubts as to whether a matrix can provide flexible enough pattern recognition of the type humans have, especially for vision. Images, after all, aren’t inherently mathematical or inherently suitable for display in a rectangular grid: images are continuous objects that move in a continuous manner. Images in general are very difficult to describe accurately in *any* manner, especially via matrix representation. I have a hard time believing that the brain contains pixelized screens with their pixels discretely winking on or off as the edge of an image passes by.

(pp. 412-413)
  The place to start, then, is where the theory started: the cerebellum. With only some exaggeration it can be said that almost everything one would want to know about the micro-organization of the cerebellum is known. For neuroanatomists the cerebellum has been something of a dream of experimental approachability, because it has a limited number of neural types (five, plus two incoming fibers), each one morphologically distinctive and each one positioned and connected in a characteristic and highly regimented manner (figure 2.4). The output of the cerebellar cortex is the exclusive job of just one type of cell, the Purkinje cell (of which more anon), and the input is supplied by just two, very different cell systems, the mossy fibers and the climbing fibers. This investigable organization has made it possible to determine the electrophysiological properties of each distinct class of neuron and to study in detail the nature of the Purkinje output relative to the mossy fiber-climbing fiber input. The neuronal population in the cerebellum is huge—something on the order of 10^10 neurons—and there is at least another order of magnitude in synaptic connections. Nonetheless, basic structural knowledge of the cerebellum has made it possible to construct a schematic wiring diagram that illustrates the pathways and connectivity patterns of the participating cells (figure 10.1). The first point, then, is that a great deal is understood at the level of micro-organization.

(“Neurophilosophy: Toward a Unified Science of the Mind-Brain”, Patricia Smith Churchland, 1989)

I’m still keeping an open mind about arrays and tensors possibly being a universal representation method in the brain, but mostly I don’t believe that’s a promising direction. Engineers, mathematicians, and computer scientists, especially people modeling artificial neural networks (ANNs), have been using arrays for representation for years, and nothing particularly promising or novel seems to have arisen from such representations or ANNs. Engineering complaints like wasted storage space for sparse matrices seem to be common, and Simon Haykin mentioned the need for nonlinear analog components in ANNs. Most books on AI don’t even mention arrays in conjunction with knowledge representation (KR). Whereas arrays are good for describing how coordinates transform to one another, an underlying problem is that such an operation must be performed on *every point* in the visual object, and there are a *lot* of points in visual objects. In fact, in theory, there are not just infinitely many such points, but *uncountably* infinitely many such points. Math applicable to computers doesn’t handle uncountable infinity very well. grin

 

 
  [ # 1 ]

PART 2: AUTOMATA THEORY

Any use of digital computers is technically use of a branch of mathematics called automata theory (http://en.wikipedia.org/wiki/Automata_theory), regardless of the software programming paradigm you are using, and regardless of whether you are intentionally performing mathematical calculations. “Automata” include things like DFAs = deterministic finite automata (http://en.wikipedia.org/wiki/Deterministic_finite_automaton), Turing Machines (which are just simplified, abstracted models of digital computers), and PDAs = pushdown automata (http://en.wikipedia.org/wiki/Pushdown_automaton), each of which is considered a mathematical object that at its simplest level merely takes inputted words and decides whether to accept each word or not.
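A DFA of the kind mentioned above can be written in a few lines. This toy machine (my own example, not from any of the linked pages) accepts exactly the binary strings containing an even number of 1s:

```python
def accepts_even_ones(word):
    """A deterministic finite automaton (DFA) with two states that
    accepts binary strings containing an even number of '1's."""
    # transition table: state -> {symbol -> next state}
    delta = {
        'even': {'0': 'even', '1': 'odd'},
        'odd':  {'0': 'odd',  '1': 'even'},
    }
    state = 'even'          # start state
    for symbol in word:
        state = delta[state][symbol]
    return state == 'even'  # 'even' is the sole accepting state

print(accepts_even_ones("1001"))  # True
print(accepts_even_ones("101"))   # True  (two 1s)
print(accepts_even_ones("1"))     # False
```

Note how literal the “mathematical object” definition is here: the machine is nothing but a finite state set, a transition function, a start state, and an accepting set, and all it does is accept or reject words.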

Since digital computers have existed since the 1940s, the same decade in which the field of artificial intelligence began in earnest, attempts to produce AI have now spanned 60 years without any major known results, which has led some people to reconsider whether some other type of computing machine / automaton is better suited to AI. One problem is that most discrete-state machines of which we can conceive are theoretically equivalent to a Turing Machine, which in turn is theoretically equivalent to the same computers we already use every day that seem incapable of exhibiting intelligence. (By “theoretically” I mostly mean that efficiency issues aren’t considered.) For example, parallel processing computers and some cellular automata are still equivalent to Turing Machines. (From what I gather, there is some debate about whether genetic algorithms are equivalent to Turing Machines, however.)

The Rule 110 cellular automaton (often simply Rule 110) is an elementary cellular automaton with interesting behavior on the boundary between stability and chaos. In this respect it is similar to Game of Life. Rule 110 is known to be Turing complete. This implies that, in principle, any calculation or computer program can be simulated using this automaton.
http://en.wikipedia.org/wiki/Rule_110
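Rule 110 itself is tiny to simulate, which makes the Turing-completeness claim all the more striking. A minimal sketch (the rule’s number, 110, directly encodes its lookup table):

```python
def rule110_step(cells):
    """Apply one step of the Rule 110 elementary cellular automaton.
    Each cell's next value depends on (left, self, right); the binary
    expansion of 110 encodes the output for each of the eight possible
    neighborhoods. Cells beyond the edges are treated as 0."""
    n = len(cells)
    out = []
    for i in range(n):
        left  = cells[i - 1] if i > 0 else 0
        right = cells[i + 1] if i < n - 1 else 0
        neighborhood = (left << 2) | (cells[i] << 1) | right
        out.append((110 >> neighborhood) & 1)
    return out

row = [0, 0, 0, 0, 1]          # a single live cell at the right edge
for _ in range(4):
    print(row)
    row = rule110_step(row)    # the pattern grows leftward each step
```

That a universal computer can hide inside these ten-odd lines is exactly the kind of result that blurs the line between “simple automaton” and “full computer”.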

In generating a program automatically, if we do not know whether the problem is solvable or not in advance, then the representation of the program must be Turing-complete, i.e. the representation must be able to express any algorithms. However, a tree structure used by the standard Genetic Programming is not Turing-complete.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.152.9154&rep=rep1&type=pdf

James Bailey in his book “After Thought: The Computer Challenge to Human Intelligence” shows great interest in new “maths” based on exotic new computer architectures, and opined that our standard digital computer model (i.e., automata theory) is obsolete:

(p. 9)
  The designers of the first electronic computers deliberately designed them to carry out the same numerocentric operations of algebra and calculus that had been developed back when all computers were people and when available computing capacity was minuscule. These equational maths came to prominence in the Renaissance, replacing the circles and diagrams of geometry. Their entry into the schools almost three hundred years ago was the last substantial change in the secondary mathematics curriculum.
  In an age when computing power is abundant, these maths are obsolete. At a minimum, it is time to transfer responsibility for teaching geometry to the history department. If students should be introduced to the maths of the ancient Greeks, it should be in the same way they are introduced to the political theories and the art of the Greeks. The problems for which geometry originally entered the schools have been either solved or taken over by other methods.
  Reassigning responsibility for geometry opens up room in the curriculum for new evolutionary intermaths, maths with still-unfamiliar names like cellular automata, genetic algorithms, artificial life, classifier systems, and neural networks. These are maths that would have made no sense in previous centuries because they are maths that no people in their right minds would ever try to carry out “by hand.” They are maths that flourish in an environment where all information is uniformly encoded as bits. These are the maths with which electronic computers will evolve their own versions of scientific theories and formulations.

(“After Thought: The Computer Challenge to Human Intelligence”, James Bailey, 1996)

However, I view Bailey’s eclectic examples as carelessly tossed out, without deep thought about their foundations, definitions, relationships, or efficiency issues. I wouldn’t consider many of his examples “maths” unless they are already included in the math of automata theory. Specifically, as I just mentioned, some cellular automata are known to be Turing complete; “artificial life” as I understand it is just computer simulation (therefore essentially just a Turing Machine); “classifier systems” are not defined by Bailey and as far as I know are just examples of the other mentioned automata; and neural networks, per Simon Haykin’s later versions of his book “Neural Networks: A Comprehensive Foundation”, are largely a dead end because they are equivalent to known statistical methods, with the only exceptions being their nonlinear activation functions or possibly the addition of top-down methods. The same comments apply to turtle graphics, another “math” that Bailey mentions elsewhere. Therefore no “math” Bailey proposed is likely to give us anything more powerful than a Turing Machine: they are all likely Turing complete.

However, I believe Bailey is *roughly* on the right track, especially since he makes some comments with which I agree completely, particularly regarding moving beyond numerical math for futuristic maths, but his specific suggestions fall short, in my opinion. What other kind of machines should be considered, then? Weizenbaum cryptically mentions that there are a few exceptions to Turing’s theorem about Turing completeness / universal computation:

(p. 64)
Turing proved that all computers (save a few special-purpose types that do not concern us here) are equivalent to one another, i.e., are all universal.

(“Computer Power and Human Reason: From Judgment to Calculation”, Joseph Weizenbaum, 1976)

I haven’t researched enough to find out exactly which “types that do not concern us here” to see if they really do concern us here, but I suspect one of those types is analog computers. I regard conventional analog computers as essentially math equations where functions are implemented as resistors, operators like addition and multiplication are implemented as junctions of lines, and numerical values are represented as voltages (https://www.clear.rice.edu/elec301/Projects99/anlgcomp/). Since there is no memory, program, language, or decision-making ability in such machines, they are stupid “special-purpose” computers, probably the ones that Weizenbaum mentioned.

However, many people assume that because conventional analog computers are an outdated, special-purpose novelty, analog computation is no longer of interest or theoretical value. The main problem with that assumption is that the term “analog” doesn’t need to refer to *numerical* values: it can refer to any non-discrete part of a machine or its representations. “Analog” means a “continuously variable physical quantity”. Of particular interest to me is the possibility that representation of images would ideally require analog representation, since images have continuous contours, continuous shadings, and continuous motion if they are in motion. Different parts of a Turing Machine can be analog, too: a continuous tape instead of a tape of discrete symbols, continuous symbols instead of discrete symbols, continuous states instead of discrete states (possibly similar to quantum states), or continuous output (e.g., drawn diagrams) instead of printed discrete symbols.

Even quantum computing is a big question mark. It is widely agreed that they might be more powerful than Turing machines, but not that they definitely will be, because there are foundational issues in quantum physics at stake that we are not 100% sure of.
http://c2.com/cgi/wiki?NonTuringComputing

As usual, I believe that the *goal* of a Turing Machine, or more generally of the branch of math called automata theory, is critically important. The whole idea is that a Turing Machine is supposed to capture our common notion of an “effective method”, i.e., an “algorithm” (http://en.wikipedia.org/wiki/Algorithm). This is where I believe the whole direction of automata theory breaks down. While humans admittedly often describe things such as cooking recipes and travel directions in discrete steps, those discrete steps are always composed of low-level recognition tasks resting on an assumed foundation of commonsense reasoning, and it is exactly those low-level tasks that give us the most difficulty in producing AI. It may be necessary to use algorithms to model human reasoning at a higher level, but I cannot believe that algorithms (i.e., automata theory) can model *everything* that animals need to consider. If nothing else, we know the brain does not perform millions of mathematical steps, such as a Prolog program or math simulation might do, because we can measure the response times and neural conduction times of biological brains and determine that so many *serial* computations simply cannot explain intelligent behavior:

(p. 35)
For example, it is worth dwelling on the constraints imposed by this temporal fact: events in the world of silicon chips happen in the nanosecond (10^-9) range, whereas events in the neuronal world happen in the millisecond (10^-3) range. Brain events are ponderously slow compared to silicon events, yet in a race to complete a perceptual recognition task, the brain leaves the computer far back in the dust. The brain routinely accomplishes perceptual recognition tasks in something on the order of 100-200 milliseconds, whereas tasks of much lesser complexity will take a huge conventional computer days. This immediately implies that however the brain accomplishes perceptual recognition, it cannot be by millions of steps arranged in sequence. There is simply not enough time. (This will be discussed in more detail in chapter 10. See also Feldman 1985.)

(“Neurophilosophy: Toward a Unified Science of the Mind-Brain”, Patricia Smith Churchland, 1989)
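Churchland’s timing constraint reduces to back-of-the-envelope arithmetic, using the figures she cites:

```python
# Back-of-the-envelope version of the "100-step" timing argument.
recognition_time_ms = 100   # perceptual recognition: ~100-200 ms
neural_event_ms = 1         # one neuronal event: ~1 ms
silicon_event_ns = 1        # one silicon event: ~1 ns

# At ~1 ms per event, a 100 ms recognition allows at most ~100
# serial steps -- nowhere near "millions of steps in sequence".
max_serial_steps = recognition_time_ms // neural_event_ms
print(max_serial_steps)  # 100

# The raw per-event speed gap silicon enjoys (ms vs ns):
print(neural_event_ms * 1_000_000 // silicon_event_ns)  # 1000000
```

So whatever the brain does in those 100 ms, it can be at most about a hundred events deep, which is the whole force of the argument for massive parallelism or some non-serial computing style.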

There are some obvious solutions to this serial processing problem arising from automata: (1) use parallel computations; (2) use brute force speed to compensate for inefficiency resulting from an inappropriate computing architecture and/or limited connectivity (in contrast to the extremely high connectivity of real neurons); (3) use analog computation in some way. Solutions (1) and (2) bring us right back to the original problem, though, since despite decades of research into parallel processors and neural processors we still can’t figure out how to combine those computations or how to represent the associated data well enough to perform timely recognition of complex objects like faces. In my opinion, solution (3) deserves the most investigation, especially with regard to visual objects in the real world, and especially in contrast to the *numerical* mathematical representations we’ve been using, whether those numbers appeared in scalar form or array form (as used in tensors).

 

 
  [ # 2 ]

PART 3: GROUP THEORY

My interest in vision as key to AI stems from several observations. A few such observations are:

(1) Machines seem to be increasingly taking the same IQ tests that humans take (at University of Gothenburg: http://www.gizmag.com/swedish-program-150iq/21465/; at MIT: http://news.uic.edu/a-computer-as-smart-as-a-four-year-old), and many such IQ test problems like analogies (http://apps.nthfusion.com/1_analogy.php), Raven’s Progressive Matrices (http://wanttolearnmath.edublogs.org/2013/01/29/raven-matrices-test/), and box folding problems (http://www.fibonicci.com/spatial-reasoning/test/) require understanding based almost solely on vision. IQ tests are a practical and more logical substitute for the frequently mentioned Turing Test, so I’m glad to see this trend.
(2) Vision is also one of the most difficult problems in AI, so considering vision problems is a quick way to assess the potential power of a given proposed AI architecture.
(3) Language understanding seems to be based on visual information: words serve more to invoke images than to act as tokens in a symbol-manipulating system.
(4) One theory is that the brain simulates an inferential visual map of the universe, which serves as the basis of understanding everything in the “perceptual bubble” surrounding the organism. (see: “Henry Markram: A brain in a supercomputer”, http://www.ted.com/talks/henry_markram_supercomputing_the_brain_s_secrets.html, @ 1:46)
(5) Animals seem to use vision as a foundation for most higher level tasks such as path planning, following steps of an algorithm, playing a board game, logic, etc.

This emphasis on vision narrows down the type of math that is likely to be involved as relating to intelligence. I realized that the search for “equations of thought” is such a deep topic that one must consider not just equations and entities within the types of numerical math that we are all used to, like algebra and linear algebra, but one must consider entire *branches* of math that most of us never deal with, especially nonvisual branches of math. This makes sense in retrospect because scalars are 0D entities (essentially points on a line) whereas images are 2D or 3D entities, so one wouldn’t expect a higher dimensional entity to be easily describable using single numerical values. Even with an array you’d still be dealing with a finite set of numerical values to describe a continuous object with an uncountably infinite number of points, so there is obviously an inherent disparity there.

It is interesting to note that math isn’t officially divided into two superbranches called “numerical” and “non-numerical”: that is my own classification, which I find useful for this pursuit of the mathematics of thought. Fortunately, James Bailey makes a similar implication when he calls math operations on numbers-as-we-know-them “numerocentric operations” (“After Thought: The Computer Challenge to Human Intelligence”, James Bailey, 1996, page 9), so at least I know I’m not alone in my manner of thinking about this problem.

There exist only two non-numerical branches of math, as far as I know: group theory and topology. Even as a math major I never took any courses in either of these topics, so I had to learn them both on my own. This post is about group theory.

My default and paradigmatic visualization of group theory is the Rubik’s Cube, which is what initially motivated me to start studying group theory. In the Rubik’s Cube there are no numbers inherently involved, other than trivial and almost irrelevant things like a count of the cubelets (= 27) or colored facelets (= 54), so obviously numerical math doesn’t apply to that puzzle, and group theory is not numerical math, so the two topics fit each other well. The Rubik’s Cube is a physical object with a finite, discrete set of objects (the 27 cubelets) that affect each other in predictable ways as the faces of the cube are rotated. There is a finite (though extremely large!) number of possible states of the Rubik’s Cube, and the states relate to each other by obvious operations such as “turn the right-hand face one quarter turn clockwise” or “turn the front face one quarter turn counterclockwise”.

Group theory can describe Rubik’s Cube states with each state as a different node in a graph, with labeled lines connecting those state nodes by the operation (= face turn) that caused one state to be transformed into the other. Such a diagram is called a Cayley diagram. The entire Cayley diagram is essentially the “group” of the Rubik’s Cube, and each single state/node of a Cayley diagram is called an “element”, just as members of a set are called “elements” in set theory. The goal of group theory is typically either to compare a newly observed group to known groups to see if it matches, or to make observations about the transitions between the graph’s nodes, especially to look for patterns such as cycles, “normal” subgroups, and symmetry. It is easy to see that group theory is a type of math because clear-cut entities are being operated upon by clear-cut operations, which result in more clear-cut entities. The main differences are that the entities are states of an object rather than numbers, and the operations are physical manipulations rather than addition or multiplication. Basically, group theory arose by abstracting the concept of state instead of the concept of quantity, but it’s still just math with a different foundation.
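A minimal sketch of these ideas: treating one quarter-turn as a permutation (a toy stand-in for a single Rubik’s Cube face turn, acting on just four positions), repeatedly applying it walks a cycle in the Cayley diagram and returns to the identity after four steps:

```python
def compose(p, q):
    """Compose two permutations given as tuples: (p∘q)[i] = p[q[i]]."""
    return tuple(p[i] for i in q)

# A quarter-turn of a square face, written as a permutation of its
# four corner positions.
quarter_turn = (1, 2, 3, 0)
identity = (0, 1, 2, 3)

# Walk the Cayley diagram from the identity by repeatedly applying
# the generator: after four quarter turns we are back where we
# started, so the generator has order 4 and traces a 4-cycle.
state, n = identity, 0
while True:
    state = compose(quarter_turn, state)
    n += 1
    if state == identity:
        break
print(n)  # 4
```

The real Rubik’s Cube group works the same way, just with permutations of 54 facelets instead of 4 positions, and with six generators instead of one.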

It should be clear immediately from the above description that group theory on such a regular object with discrete operations does not apply to typical vision problems such as pattern recognition, because:

(1) In the real world, objects typically aren’t composed of well-defined, regular shapes like the cubelets in a Rubik’s Cube, and aren’t regular themselves.
(2) Modifications of objects in the real world are typically very different than the discrete operations performed on discrete objects like a Rubik’s Cube.
(3) The goals are different: in the real world the goal typically is to recognize an object or to predict how modification will affect it, whereas in a regular object like a Rubik’s Cube the goal of group theory is typically to compare two groups overall that have the same number of states, or to detect a pattern across a series of related operations.

However, there does exist a modification of group theory called “continuous group theory” that is more applicable to continuous real-world objects, and also, regular objects aren’t technically required by group theory, since group theory is interested in the operations, not the objects:

Groups don’t normally require any structure on their members beyond what’s required to make the group operator work properly. You can define a group whose values are a set of points, a set of numbers, a set of coins – very nearly anything you want.

http://scienceblogs.com/goodmath/2007/03/19/the-mapping-of-the-e8-lie-grou/

Some of the more common and more important continuous groups are the Lie groups (pronounced like “Lee”), which in my understanding treat the set of rotations of a physical object as a group (http://theoxyanionhole.blogspot.com/2013/06/what-fck-does-abstract-algebra-have-to_2.html), and which are also linear (http://www.euclideanspace.com/maths/discrete/groups/lie/index.htm), which makes them simpler, practical, and particularly well-suited to physics problems. Rotation, translation, and scaling are the primary desired invariants in object recognition, so such transformations are directly applicable to AI.
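The rotation groups mentioned here are easy to poke at numerically. This sketch checks the basic group properties of 2D rotations (the Lie group SO(2)), whose parameter varies continuously rather than in discrete clicks like a Rubik’s Cube move:

```python
import numpy as np

def rot(theta):
    """2D rotation matrix: an element of the Lie group SO(2)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Group properties at a glance: composing two rotations gives another
# rotation (closure), and rotating by -theta undoes rotating by theta
# (inverses). The angle parameter can be any real number.
a, b = 0.3, 1.1
assert np.allclose(rot(a) @ rot(b), rot(a + b))   # closure
assert np.allclose(rot(a) @ rot(-a), np.eye(2))   # inverse
print("SO(2) behaves as a (continuous) group")
```

Note that the group structure survives even though there is no finite Cayley diagram to draw: there are uncountably many elements, one per angle.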

Admittedly, at this point my understanding of group theory starts to falter. However, it seems to me that for a group of rotations of a single object, Cayley diagrams must be discarded, since there would be an infinite number of possible rotation angles; therefore part of the value of discrete group theory, namely the examination of subgroups, must disappear. This is an unfortunate loss of information, especially considering that it was the search for normal subgroups within the group A5 that allowed the famous proof of the unsolvability of the quintic polynomial by arithmetic and radicals to be discovered. When considering the group of possible rotations, I believe the entire group then becomes nothing more than a single object governed by a single operation. In that case it seems to me there is no need of group theory at all in AI, since such transformations are already describable by a simpler group called the “affine group”, based on the well-known affine transformations (http://en.wikipedia.org/wiki/Affine_transformation) that can be represented with small arrays.

This representation exhibits the set of all invertible affine transformations as the semidirect product of K^n and GL(n, K). This is a group under the operation of composition of functions, called the affine group.

http://en.wikipedia.org/wiki/Affine_transformation

Therefore group theory, whether in discrete form or continuous form, seems to me not to be directly suitable for real-world vision problems: either discrete objects aren’t applicable, or the usefulness of subgroups disappears, or the math reduces to the numerical math of affine transformations.
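To make the affine-group idea from the quote above a bit more concrete, here’s a quick Python sketch (the function names are my own invention, not from any library) showing that 2D affine maps behave like group elements under composition: composing two rotations yields another rotation by the summed angle.

```python
import math

def compose(f, g):
    """Compose two 2D affine maps (A, b), meaning x -> A x + b; g is applied first."""
    (A, b), (C, d) = f, g
    # (A, b) o (C, d): x -> A(C x + d) + b = (A C) x + (A d + b)
    AC = [[sum(A[i][k] * C[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    offset = tuple(sum(A[i][k] * d[k] for k in range(2)) + b[i] for i in range(2))
    return (AC, offset)

def rotation(theta):
    """Rotation about the origin as an affine map (A, b) with b = 0."""
    c, s = math.cos(theta), math.sin(theta)
    return ([[c, -s], [s, c]], (0.0, 0.0))

# Closure under composition: rotating by 60 degrees, then by 30, equals rotating by 90.
r = compose(rotation(math.radians(30)), rotation(math.radians(60)))
r90 = rotation(math.radians(90))
```

The point of the sketch is that all the group structure here is carried by plain matrix arithmetic, which is exactly why the math reduces to affine transformations.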

Nevertheless, group theory is fascinating, and it’s easy to learn because there are no numbers! A serious consideration would be to teach group theory to kids who hate math: instead of dealing with tedious carries and memorization of multiplication tables, the kids could draw pictures, which would make the topic much more understandable. Group theory is also valuable to scientists in all fields who seek to describe regular patterns they encounter in their work. Groups are found everywhere in life: in music (symmetry of song composition, chord modifications producing new types of chords, rotation of a given scale configuration to produce a new scale), in crystals (where atoms form lattices of repeating structures), in quantum mechanics, dancing, friezes on walls, flipping a mattress over, the proof of quintic equation (un)solvability, error-correcting computer codes, various puzzles (Rubik’s Cube, 15 Puzzle), and more. If you’re interested, some great, easily understandable introductions to group theory are:

()
http://opinionator.blogs.nytimes.com/2010/05/02/group-think/?_r=0 - excellent intro
()
“Visual Group Theory”, Nathan C. Carter, 2009 - excellent, I want to buy this one to read whenever I need to kill time
()
“Group Theory in the Bedroom, and Other Mathematical Diversions”, Brian Hayes, 2008 - only one chapter on the group theory of flipping a mattress
()
“Mathematics: The Science of Patterns: The Search for Order in Life, Mind, and the Universe”, Keith Devlin, 1994 - only one section on group theory
()
“The Language of Mathematics: Making the Invisible Visible”, Keith Devlin, 1998 - identical text to the Devlin book above, just a different title

 

 
  [ # 3 ]

PART 4: AFFINE TRANSFORMATIONS

—————
ERRATA ON GROUP THEORY:
(1) I forgot that Cayley diagrams can also represent states by listing the operations taken to reach those states, not just by showing the states themselves. Therefore a group describing continuous rotation could have a finite, discrete Cayley diagram showing the operations; in fact, the “multiplication table” equivalent of such a Cayley diagram was already shown in one of my links (http://theoxyanionhole.blogspot.com/2013/06/what-fck-does-abstract-algebra-have-to_2.html). However, my general conclusion is still unchanged: in object recognition the series of rotations that led to a given angle of the viewed image is not of interest, especially since affine transformations presumably can move the object to that state directly, and the object still needs to be recognized in its final state, which is the difficult AI problem that group theory doesn’t address.
(2) Automata theory technically would be another non-numerical branch of math, along with group theory and topology, even though the computers it models in practice typically perform mathematical operations. That’s because Turing Machines specify only change of states, not arithmetic operations as assembly language would do.
—————

One of the most striking abilities of animal brains is rapid object recognition, which again emphasizes the importance of vision in intelligence. Real-world objects are routinely and rapidly recognized by biological visual systems, regardless of the size, location, or orientation of those objects in space. If animal brains are so adept at that task and computers are so poor at it, that suggests it would be particularly valuable to examine the mathematics that describe those variations of objects. It turns out that the mathematics that describes the changes in coordinates for any of the three operations of scaling, translation, and rotation is relatively simple, and is called affine transformations. (“Affine” is pronounced exactly as in the sentence “That’s a FINE transformation you got there, Jethro!”)

Affine transformations are transformations that operate on images and leave straight lines straight, keep parallel lines parallel, and preserve ratios of distances along lines, though they may alter angles or line-segment lengths in the image. Except for shear (also called skew), which is one of the handful of named affine transformations, the shape of the object is preserved (up to size) through the transformation.

In geometry, an affine transformation or affine map[1] or an affinity (from the Latin, affinis, “connected with”) is a function between affine spaces which preserves the affine structure. This means that an affine map sends points to points, lines to lines, planes to planes, etc. An affine transformation does not necessarily preserve angles or lengths, though it does preserve ratios of distances between points lying on a straight line. Also, sets of parallel lines remain parallel after an affine transformation.

http://en.wikipedia.org/wiki/Affine_transformation

Affine transformations work equally well in 3D as in 2D: the arrays just need to be one row and one column longer (= an “augmented matrix”) for 3D.
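As a sanity check on the “parallel lines remain parallel” property quoted above, here’s a small Python sketch (purely my own illustrative code): a shear-plus-translation map distorts angles, yet two parallel segments are still parallel after the transformation.

```python
def apply_affine(A, b, p):
    """Apply the affine map x -> A x + b to the 2D point p."""
    return (A[0][0] * p[0] + A[0][1] * p[1] + b[0],
            A[1][0] * p[0] + A[1][1] * p[1] + b[1])

# A shear plus a translation: angles change, but lines stay straight and parallel.
A = [[1.0, 0.5],
     [0.0, 1.0]]
b = (2.0, -1.0)

# Two parallel segments, both with direction vector (1, 1).
seg1 = [(0.0, 0.0), (1.0, 1.0)]
seg2 = [(3.0, 0.0), (4.0, 1.0)]

def image_direction(seg):
    """Direction vector of a segment after the affine map is applied."""
    p = apply_affine(A, b, seg[0])
    q = apply_affine(A, b, seg[1])
    return (q[0] - p[0], q[1] - p[1])

d1 = image_direction(seg1)
d2 = image_direction(seg2)
# A zero cross product means the transformed segments are still parallel.
cross = d1[0] * d2[1] - d1[1] * d2[0]
```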

While animal brains certainly must and do recognize objects regardless of their size (= scaling) in the visual field, since perceived objects can occur at any distance from the eye, and likewise regardless of where an object is arbitrarily placed (= translated), there is evidence that brains don’t handle rotation nearly as well. In fact, brains are notably poor at spotting distortions in faces that have been turned upside down, so much so that this phenomenon has been given the name “The Thatcher Effect”:

http://en.wikipedia.org/wiki/Thatcher_effect
http://www.bbc.co.uk/bang/article_thatcher.shtml

That effect probably sheds some light on how humans recognize faces, a particularly subtle and difficult task, as computer programmers have discovered repeatedly. (I hear Google has recently made astonishing progress in this field, however.) It also suggests that animals lack a good built-in ability to produce rotation invariance when recognizing *any* particular object, which has also been shown to be true via psychological studies. How do animals do that task so well, then? They apparently rotate an image in their brain until it matches an orientation they *do* recognize! There have been famous experiments showing that humans perform such mental rotation…

(p. 225)
  A famous and striking example is Shepard and Metzler’s (1971)
mental rotation experiment. They showed subjects pairs of line
drawings (see figure 1) and asked them whether the two drawings
depicted the same object. Sometimes the objects were the same
but oriented at different angles; at other times, they were different
(p. 227)
objects. Participants claimed that they answered by rotating one
object in their minds until it lined up with the other, permitting
them to see whether it matched. To check this claim, the exper-
imenters measured how fast the subjects answered and then plot-
ted these times against the angles of rotation required to line up
the respective pairs. The reaction time (minus a small constant)
is directly proportional to the amount of rotation needed. It’s hard
to see how this could happen if subjects weren’t really rotating
images; so the natural conclusion is that they are—and, moreover,
at a constant angular velocity. (The rate varies from person to
person, but averages about 60 degrees per second or 10 rpm.)
  Subsequent experiments supported that interpretation by con-
firming that images pass through intermediate angles as they
rotate, just as if they were glued to a rigid turntable.

(“Artificial Intelligence: The Very Idea”, John Haugeland, 1985)

...but a lesser known and possibly very important fact is that even bees perform such mental rotation!

(p. 354)
  Individual bees learn patterns of colors and can recognize a horizontal target from
a new direction, indicating an ability to do mental rotation.

(“What Is Thought?”, Eric B. Baum, 2004)

Bees have only about 960,000 neurons in their brains (http://en.wikipedia.org/wiki/List_of_animals_by_number_of_neurons), roughly five orders of magnitude fewer than the human brain’s tens of billions. That makes mental rotation a fairly impressive feat, and it lends credence to my belief that biological vision uses some particularly clever and fast method for manipulating images, which in turn suggests it uses some clever and efficient means of *representing* those images. That gets to the very heart of AI: we haven’t yet learned how brains represent knowledge in general, and yet here hints of how it’s done are screaming at us. That is one reason I believe major clues to AI can be found by studying vision. (By the way, bees are generally more intelligent than wasps, and vision in flies is very different from vision in humans, so one has to be careful about assuming that all animals have similar visual systems.)

We can see easily from the mathematics of affine transforms why rotation is extra difficult: whereas all affine transforms involve some scalar multiplication and scalar addition (since affine transforms involve multiplying a vector and an array), rotation additionally uses trigonometric functions. Trig functions are difficult to compute by hand, which anybody who has taken high school geometry has probably learned, and which anybody who has seen the infinite series definitions of trig functions *definitely* has learned (http://en.wikipedia.org/wiki/Sine). The formulas for affine transforms can be seen at: http://people.cs.clemson.edu/~dhouse/courses/401/notes/affines-matrices.pdf

The presence of trig functions might have later implications in AI: if we can build our intelligent machines however we want, we might prefer to build in trig functions from the start, making our machines even better than biological brains, since specialized hardware could then save computation time on the particularly difficult rotation operation. This might be quite easy to do, since analog components already exist that produce the needed trig functions, such as the Wien bridge oscillator (http://sf0.org/levitatingpotato/Analog-Computer-Creation/; http://en.wikipedia.org/wiki/Wien_bridge_oscillator). Another implication is that even if we produce an idealized form of mathematics that performs object recognition using rotation invariance, we might not be modeling accurately what brains actually do.

As an aside, I used to think that affine transformations were *more* than was needed to describe image motion since affine transforms also include shear, and shear distorts the shape of a 2D image, which seems undesirable for object recognition. But I was thinking 2-dimensionally instead of 3-dimensionally: in three dimensions if you rotate a cube and look at one of the faces slanted away from you, the 2D shape you see for that face is a parallelogram, not a square, and a parallelogram is produced by a shear operation applied to a square. So affine transformations are exactly what is needed, no more and no less, for translation, rotation, and scaling of 3D objects, unless you try to include perspective.

Another aside: Whereas most of the affine transformations can be represented as simple 2x2 arrays, the translation transformation cannot be, since translation is not a linear transformation. However, there is a simple mathematical trick that allows translation to look just like all the other affine transformation formulas: use 3x3 arrays whose bottom row is the constant (0, 0, 1) and whose last column holds the translation offsets. Those resized/augmented arrays then implement what are called “homogeneous coordinates”, and are the standard representation method used. (http://en.wikipedia.org/wiki/Transformation_matrix)
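A minimal Python sketch of the homogeneous-coordinates trick (function names are my own choosing): translation, which is not linear in 2x2 form, becomes an ordinary matrix multiply in 3x3 form.

```python
def mat3_vec3(M, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(M[i][j] * v[j] for j in range(3)) for i in range(3))

def translation(tx, ty):
    """Translation as a 3x3 matrix in homogeneous coordinates; bottom row is (0, 0, 1)."""
    return [[1.0, 0.0, tx],
            [0.0, 1.0, ty],
            [0.0, 0.0, 1.0]]

# The point (2, 3) becomes (2, 3, 1) in homogeneous coordinates (extra coordinate w = 1).
p = (2.0, 3.0, 1.0)
q = mat3_vec3(translation(5.0, -1.0), p)
# q == (7.0, 2.0, 1.0), i.e. the point (7, 2): translation done purely by matrix multiply.
```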

Are affine transformations all the math we need for AI, then? No, almost certainly not, though they *might* be *one* of the needed parts. One big problem, which I mentioned before, is that *every point* in the image must be manipulated using array multiplication in order to transform the image mathematically, which is probably infeasible for a brain even in theory. Even if a brain could perform that calculation, it probably wouldn’t have any conception of what it was doing, especially the concept that all those points it just manipulated belong to the same object, which means our brains might still be incapable of recognizing objects as complete, unique entities separate from their parts. Also, affine transformations don’t do much toward object recognition: they just move the object around. To perform recognition in this fashion, the object would have to be intelligently matched in size, position, and orientation to some internally stored template, then a comparison performed on both overlaid images; but using intelligence to do that brings us right back to where we started in trying to describe intelligence mathematically. There are even bigger problems in relating image manipulation to AI that I won’t cover here. But at least affine transformations provided some insight into the level of mathematics needed for image manipulation, and at least we saw the importance of image manipulation in intelligence and problem solving.

 

 

 
  [ # 4 ]

We can see easily from the mathematics of affine transforms why rotation is extra difficult: whereas all affine transforms involve some scalar multiplication and scalar addition (since affine transforms involve multiplying a vector and an array), rotation additionally uses trigonometric functions. Trig functions are difficult to compute by hand.

You don’t need trig functions to do image rotation, rotation by 3 shears is probably more akin to what a biological system would use.


http://www.leptonica.com/rotation.html
http://www.ocf.berkeley.edu/~fricke/projects/israel/paeth/rotation_by_shearing.html
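For the curious, the three-shear method from the links above can be sketched in a few lines of Python (my own paraphrase of Paeth’s decomposition, with a made-up function name): an x-shear, a y-shear, then another x-shear, which together rotate a point exactly.

```python
import math

def rotate_by_three_shears(p, theta):
    """Rotate point p about the origin by theta (radians) using three shears."""
    x, y = p
    t = -math.tan(theta / 2)   # shear factor for the two x-shears
    s = math.sin(theta)        # shear factor for the y-shear
    x = x + t * y              # shear 1 (along x)
    y = y + s * x              # shear 2 (along y)
    x = x + t * y              # shear 3 (along x)
    return (x, y)

# (1, 0) rotated 90 degrees counterclockwise lands at (0, 1).
q = rotate_by_three_shears((1.0, 0.0), math.radians(90))
```

Each shear only slides rows or columns sideways, which is why the method is attractive for raster images: no pixel resampling in two directions at once.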

 

 

 
  [ # 5 ]
Merlin - Aug 18, 2013:

You don’t need trig functions to do image rotation, rotation by 3 shears is probably more akin to what a biological system would use.

Great information, thanks! I’d never heard of those alternative methods. If triple shears are the method biological systems are using, then it should take exactly 3 times as long to rotate an image as to shear it (or translate or scale it). That hypothesis should be easily testable via experiment. (Anybody out there looking for a good science project idea, or a good psychology experiment to publish?)

 

 

 
  [ # 6 ]

Hi Mark,

just a quick note that I have just finished reading through all of these posts, and have found it really interesting to have such a detailed insight into your thoughts - I hope you keep them up! From my educational background I am quite familiar with most topics you describe. I studied Lie groups quite intensively at one point for example, and the coordinate-system-juggling going on when describing connected solid bodies - which is what it boils down to I think - also rang quite some bells, including those darn tensors, which I agree are annoying smile I found myself having many remarks while reading that may hopefully contribute to some aspects of your understanding. I’m exhausted now though, so it will be for later smile Just wanted to comment here as a thank you note and to express my hope that you go on.

It really IS mind-blowing to contemplate how the heck the brain manages to process / store / retrieve visual information so rapidly; I definitely share your sense of wonder there. My bet would still be on parallel rather than analog computing for the answer though.

 

 
  [ # 7 ]

Wouter,

Thanks, at last some feedback from people today! For a while it felt like I was posting messages into a void where the statistics said people were reading them, but nobody had anything to say. Were they all secretly muttering, “There he goes again.”? And yes, of course I’ll keep up my research, thoughts, and posts. To borrow a phrase from “The Terminator”: “That’s what he does. That’s *all* he does!” grin

 

 
  [ # 8 ]
Mark Atkins - Aug 19, 2013:
Merlin - Aug 18, 2013:

You don’t need trig functions to do image rotation, rotation by 3 shears is probably more akin to what a biological system would use.

Great information, thanks! I’d never heard of those alternative methods. If triple shears are the method biological systems are using, then it should take exactly 3 times as long to rotate an image as to shear it (or translate or scale it). That hypothesis should be easily testable via experiment. (Anybody out there looking for a good science project idea, or a good psychology experiment to publish?)

Triple shears taking 3 times as long would only relate to items processed in a serial fashion. In a biological system, it could be done in parallel. Additionally, if repeated frequently enough, a mapping would be created for a direct answer.

In vision systems, the raw pixel data is fed into the neural net that looks for edges and motion. The goal is to find something interesting. Noise or similar pixels are much less interesting than edges and motion. These higher level approximations are learned. Some neurons develop for specifically recognizing the orientation of a line. Image segmentation could be key to object recognition.
http://cs.brown.edu/courses/cs195-5/spring2012/lectures/2012-01-26_overview.pdf
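A toy illustration of the edge-detection idea (purely my own sketch, far simpler than any real neural net): a finite-difference filter responds exactly where pixel intensity jumps, which is why edges are “interesting” while uniform regions are not.

```python
# A tiny 4x4 "image" with a vertical boundary between dark (0) and bright (1) pixels.
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 1, 1]]

def horizontal_gradient(image, r, c):
    """Finite-difference edge response at pixel (r, c): large where intensity jumps."""
    return image[r][c + 1] - image[r][c]

# The response is nonzero only at the boundary between columns 1 and 2.
responses = [horizontal_gradient(img, 0, c) for c in range(3)]
# responses == [0, 1, 0]
```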

As a biological entity continues to build experiences, neurons are added, allowing an ability to recognize things like a place as a shortcut. Place cells in the hippocampus are an example.
https://en.wikipedia.org/wiki/Place_cell


Other image processing items of potential interest:
http://www.phrogz.net/SVG/animation_on_a_curve.html
http://cs.brown.edu/~pff/segment/
http://cs.brown.edu/courses/cs195-5/spring2012/calendar.html

 

 

 
  [ # 9 ]

Funny thought: all these ponderings on the incredibly complex math that seems to be connected to such elementary-seeming low-level (in the biological sense) stuff as making sense of what we see is a wonderful illustration of Moravec’s Paradox:

http://en.wikipedia.org/wiki/Moravec’s_paradox

(‘in AI, what is easy is hard, what is hard is easy’)

[edit by Dave]
The above link is another example of the forum software “breaking” a link in order to keep us safe. I’ve tried fixing it, but once again, the forum software isn’t playing nice. Nothing I can do about it, I’m afraid. The link shows an apostrophe (single quote) in the title, but it’s missing from the actual link. Sorry.
[/edit]

 

 
  [ # 10 ]

Wouter, please don’t think I’m picking on you. This isn’t your doing at all, but is a problem with… well, it’s not on you.

Anyway, what I may suggest is any time you need to post a link that contains “non-traditional” characters (single quotes/apostrophes, left or right parens, or others that I can’t currently recall), it may well be best to take the extra step and create a tinyURL or bit.ly link that CAN be posted here. Just be sure to warn folks about the redirect, is all. smile

 

 
  [ # 11 ]

Ah, whoops, hadn’t noticed that the link didn’t work. Anyway, WORKING LINK TO WIKIPEDIA ARTICLE: http://tinyurl.com/4qrpvda

 

 
  [ # 12 ]

That’s ok, Wouter. I’ve got your back. smile

Well, actually, I have everyone’s back, here. cheese

 

 
  [ # 13 ]

PART 5: TOPOLOGY

The other well-known, non-numerocentric branch of math is topology. I’ve been learning topology in only the past few weeks and already I’m really being drawn to it, despite my expectations to the contrary: it’s very clever stuff, it relies on relationships I never imagined could even exist, it contains some very advanced, interesting modern topics like high dimensional manifolds that I’d been wondering about recently even before I knew what a “manifold” was, and it appears to have some brilliant people involved in the field such as Stephen Smale (of chaos theory fame, http://en.wikipedia.org/wiki/Stephen_Smale). Unlike group theory, some branches of topology do *start* to relate to AI, to my surprise.

There exist many subfields within topology (http://en.wikipedia.org/wiki/Topology), especially:

point-set topology (= general topology) - very general, deals with open & closed sets
algebraic topology - classification of surfaces, homology
geometric topology - manifolds, knot theory
network topology - connections of nodes and links (http://en.wikipedia.org/wiki/Network_topology)
differential topology - differentiable functions on differentiable manifolds (http://www.drchristiansalas.org.uk/MathsandPhysics/Topology/topbook.pdf)

I believe the main branch of topology that most people first hear about is algebraic topology, which deals with surfaces like tori and Klein bottles. The main goal of algebraic topology seems to be classification of surfaces, although one math professor told me the goal of point-set topology is to generalize the concept of convergence in point sets.

Classification of surfaces is superficially similar to object recognition because both goals abstract objects so that some attributes become irrelevant. In object recognition the usually irrelevant attributes are size, location, orientation, and the presence of mild distortion such as skewing or stretching, whereas in topology the main irrelevant attribute is stretching. In both cases it is the general semblance of an object that is important, especially in that features originally near each other remain near each other in a distorted version of the object. Also, both goals usually look at only the surfaces of objects, though for different reasons: algebraic topology is concerned *only* with surfaces by its definition, whereas object recognition looks at surfaces only because most surfaces are opaque rather than transparent or semi-transparent.

Inherently, though, topology fails to meet many of AI’s requirements of object recognition. The old example of a donut being topologically the same as a coffee cup (http://en.wikipedia.org/wiki/Topology) is a good example of this failure: although both objects have the same topological classification (genus 1), they look different enough that one object could not be confused with the other, and the fact that a different name exists for each object suggests that humans think of them as completely different kinds of objects, all of which is true. You wouldn’t want a vision system to mistake a donut for a coffee cup, for example, so right away it’s clear that mainstream topology isn’t useful for object recognition.

That fact discouraged me but it also got me thinking deeply and creatively… Why does no current branch of math apply to object recognition, which is such a fundamental goal? Is there yet another branch of math, as yet undiscovered, that *does* apply to object recognition? If so, what would that math be like, and how would one find it? How does one go about creating an entirely new branch of mathematics?

It’s perhaps fortunate that I tried to learn the basics of two branches of non-numerocentric mathematics within a few weeks, because in doing so I was forced to rethink, repeatedly and over a short period, what constitutes the abstract basis of each branch of math. Math nowadays can encompass some very abstract entities, like Rubik’s Cubes, arbitrary surfaces, and knots, all of which can be operated upon with equally abstract operators to produce new entities of the same type, though usually by altering them in the process. I’m not sure a name even exists for my concept of “an abstract concept that forms the basis of a top-level branch of mathematics”, although “algebraic structures” (http://en.wikipedia.org/wiki/Algebraic_structure) are the basis of a very general branch of math called “universal algebra” (http://en.wikipedia.org/wiki/Universal_algebra), whose basis is sets, so my “fundamental abstract entity” might generally be called an “element” of a set.

After my first day of learning about the abstract basis of topology, especially after learning about the abstract basis of group theory, I brainstormed that night on what would be needed for an AI math. I soon realized that every top-level branch of math was based on its own unique abstract concept that caused disparate things to be related. Some examples:

()
numerocentric math: the abstract concept of quantity
Example: 3 apples have something in common with 3 pebbles.
()
automata theory: the abstract concept of an effective procedure
Example: A digital computer has something in common with a cellular automaton.
()
group theory: the abstract concept of a group (= a set of elements + a certain type of operation on those elements)
Example: The 4 ways to flip a mattress have the same graphical relationship as the 4 ways to display your two hands. (http://opinionator.blogs.nytimes.com/2010/05/02/group-think/?_r=0)
()
topology: the abstract concept of connectedness
Example: A donut’s surface has something in common with a coffee cup’s surface.
()
transfinite arithmetic: the abstract concept of cardinality
Example: Adding element “0” to the infinite set {1, 2, 3, ...} seems not to affect the “size” of that set.
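As a tiny illustration of the mattress-flip example above, here’s a Python sketch (the corner labels and operation names are my own assumptions) verifying that the four mattress operations form a closed, self-inverse group, the Klein four-group:

```python
from itertools import product

# Model each mattress operation by how it permutes the four corners A, B, C, D.
I     = {'A': 'A', 'B': 'B', 'C': 'C', 'D': 'D'}   # do nothing
LONG  = {'A': 'B', 'B': 'A', 'C': 'D', 'D': 'C'}   # flip over one axis (assumed labeling)
SHORT = {'A': 'D', 'B': 'C', 'C': 'B', 'D': 'A'}   # flip over the other axis
SPIN  = {'A': 'C', 'B': 'D', 'C': 'A', 'D': 'B'}   # rotate 180 degrees in place

ops = [I, LONG, SHORT, SPIN]

def compose(f, g):
    """Perform operation g, then operation f."""
    return {k: f[g[k]] for k in g}

# Closure: composing any two operations gives another operation in the set.
closed = all(compose(f, g) in ops for f, g in product(ops, ops))
# Every element is its own inverse, which is the hallmark of the Klein four-group.
self_inverse = all(compose(f, f) == I for f in ops)
```

Note that no numbers appear anywhere: the whole group lives in the pictures (here, corner relabelings), which is exactly the pedagogical point.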

Such abstract concepts were clearly the key: all the well-known, top-level branches of math were based upon such concepts, but none of those existing concepts were particularly useful for object recognition. What concept *was* useful for object recognition, then? The answer: shape! A branch of math that used shape as an invariant would recognize that a big red square sitting flat has something in common with a little blue square tottering on its corner, and that recognition was exactly what was needed for a mathematical foundation of vision systems. Shape is something that would be invariant to all affine transformations except possibly shear, plus invariant to attributes like color, texture, and reflectance. It seemed I had just invented a new top-level branch of math! I began wondering what I would call my newly envisioned branch of math.

By striking coincidence, the very next day I stopped in a used bookstore and took a look at their math section. With my proposed shape-based branch of math on my mind, one book title on the shelf just about jumped out at me: it was entitled “Shape Theory”! I became even more excited when I read the introduction page:

(p. 7)
What is shape? What is form? To say that two objects have the same shape has an
intuitively obvious, but very imprecise meaning. As was pointed out by Lord and
Wilson in their preface to [69], a mathematics of form description and analysis is
greatly needed. This monograph summarises a theory that may help towards this
goal, at least in the study of irregular shapes.
  The majority of the techniques of geometric pattern description require that the
objects being studied be smooth and fairly regular. Methods from differential
geometry, for instance, require smoothness whereas algebraic topological methods
require that the object, whether a physical or an abstract one, may be built up from
cells or simplices (see Spanier [96]). Increasingly, these methods have been applied,
to varying extents, to diverse practical problems of shape, and pattern description
and analysis (see Faux and Pratt [39] and Gasson [46]).
  Naturally occurring objects are rarely smooth and, as the work on fractals has
shown, are by their very nature irregular. Within the abstract setting also, objects
frequently occur that can be arbitrarily irregular, for instance closed bounded subsets
of an Euclidean space. Is it possible to extend methods used in the study of smooth
or regular geometric objects to such as these? With regard to the methods of algebraic
topology, the answer is positive and the resulting theory is known as shape theory.

  Geometric shape theory can be seen as an extension of the methods of algebraic
topology to arbitrary spaces, but its general methodology is much more widely
applicable and the main aim of this monograph is the study of that shape theoretic
methodology. We hope to show that this methodology embodies its own form of
logic, a logic that is fascinating in its own right.

(“Shape Theory: Categorical Methods of Approximation”, J.-M. Cordier & T. Porter, 1989)

Darn! It looked like somebody beat me to my great new idea and had already done appreciable work in the field! But as I read farther down the page I began to become disillusioned with shape theory because it seemed they weren’t really tackling the same general problem I was:

(p. 7)
  Geometric shape theory as developed by Borsuk and others exemplifies a process
which is common in mathematical reasoning. Typically one has a class of objects on
which one has a reasonably complete set of information. In the case of Borsuk’s
shape theory, these objects are the finite polyhedra. This class is considered as a class
(p. 8)
of ‘models’ or ‘prototypes’ within a larger class of objects of interest; in Borsuk’s
shape theory, this larger class consists of the compact metric spaces. The aim of the
exercise is to use approximations to the objects of interest by the models to study the
objects of the larger class. One may, for instance, seek to extend invariants known
to give good information on the models, to be applicable to the larger class of objects.
The best classical example of this is, in topology, probably the definition of Cech
homology groups extending the simplicial homology groups to polyhedra.

(“Shape Theory: Categorical Methods of Approximation”, J.-M. Cordier & T. Porter, 1989)

In other words, shape theory in its current form is still ad hoc, it requires limited classes of objects and approximations, and it is concerned with metrics. Note also that shape theory was described as being part of topology, namely algebraic topology. So two pressing questions came to my mind: (1) Why was shape theory included under topology instead of being a standalone top-level branch of math in its own right? (2) Why did shape theory deal so much with approximations instead of the absolutely precise math that is standard in almost all other branches of math? In a few days of research I was able to answer both questions. (Stay tuned to Mark’s Adventures in Mathland! smile)

At least now I knew I was on the right path: my earlier intuition that “Maxwell’s equations of thought” either didn’t exist or were part of some little-known branch of math (http://www.chatbots.org/ai_zone/viewthread/1432/) turned out to be exactly true, at least in all likelihood, because I had just discovered a very applicable branch of math that either hadn’t existed before or at least had not been sufficiently developed for AI people generally to know about it.

If anybody’s interested in learning the basics of topology in a hurry, below are the best such sources I’ve found so far:

“The Language of Mathematics: Making the Invisible Visible”, Keith Devlin, 1998 - my favorite, very good
“Group Theory in the Bedroom, and Other Mathematical Diversions”, Brian Hayes, 2008 - limited coverage, but good

 

 
  [ # 14 ]
Merlin - Aug 19, 2013:

Triple shears taking 3 times as long would only relate to items processed in a serial fashion. In a biological system, it could be done in parallel. Additionally, if repeated frequently enough, a mapping would be created for a direct answer.

I thought about your comments for quite a while. If there were three parallel hardware layers in 2D, each layer oblivious to the other layers except for input/output, and oblivious to the overall goal of the computation, then yes, the computation could be done in parallel. It would be weird to have a given point on the original image jostled around a total of 3 times in the layers just to move it a small desired amount that could have been a smooth, direct motion, but such a method would certainly work if the system knew where the boundaries of the object were. There would still be some kind of tripling effect of duration even in that case, due to the 3 *layers* operating serially, but the added duration might be negligible since the hardware would be so fast. If such a system were hardcoded, it wouldn’t ever learn, but if it were capable of learning, yes, it would eventually learn the simple, direct mapping on its own. Since sine functions of small angles can be approximated well by either lines (sin x ≈ x near x = 0) (http://en.wikipedia.org/wiki/Small-angle_approximation) or a truncated Taylor series, trig functions might not be difficult for the brain to learn, or at least to approximate.
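A quick Python check of how good those approximations are for a small angle (the variable names are mine; the exact value comes from the standard library’s math.sin):

```python
import math

x = 0.1  # a small angle, in radians

linear = x                 # small-angle approximation: sin x ~ x
taylor = x - x**3 / 6      # truncated Taylor series: sin x ~ x - x^3/6
exact = math.sin(x)

err_linear = abs(exact - linear)   # roughly 1.7e-4
err_taylor = abs(exact - taylor)   # roughly 8.3e-8: far more accurate, still cheap
```

Both approximations need only multiplication and addition, which is the kind of computation a learning system could plausibly acquire.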

Merlin - Aug 19, 2013:

In vision systems, the raw pixel data is fed into the neural net that looks for edges and motion. The goal is to find something interesting. Noise, or similar pixels, are much less interesting than edges and motion. These higher level approximations are learned. Some neurons develop for specifically recognizing the orientation of a line. Image segmentation could be key to object recognition.
http://cs.brown.edu/courses/cs195-5/spring2012/lectures/2012-01-26_overview.pdf

Typically such feature detectors are built into the hardware (both in biological and artificial vision systems), since they are used so routinely and don’t need to vary. One biological exception: an animal must first pass through its “critical period” of learning, at the low level of features, how its environment works:

A well-studied example of a critical period in biology is the
development of orientation specificity in the visual cortex for cats (e.g.
Baxter, 1966): when cats are only exposed to certain patterns (e.g. only
horizontal stripes) during the first weeks of their lives, they will never be
able to perceive other patterns (e.g. vertical stripes).
http://www.lotpublications.nl/publish/articles/001314/bookpart.pdf

By the way, that PDF file link of yours has a slide labeled “Target Tracking” that essentially shows an example of Marr’s multiple coordinate systems within a single human body, the kind I was mentioning earlier in this thread. That file also shows what looks like Voronoi tessellation, Bayes nets, and invariant detection among multiple objects, each of which is a heavy topic.

I haven’t yet looked at your other image processing links, but thanks for the information. I’ll get to those links eventually.

 

 

 
  [ # 15 ]

The “critical period” in biological systems trains feature detectors based upon the interesting items experienced during the early part of the animal’s life. These features are later used to encode memories. This leads to a number of potential problems for people trying to build AI systems that replicate biological ones.

Not only do you need to codify the math of the neurons, you also need to codify how they connect to each other and how those connections strengthen or weaken in response to environmental stimulus. Additionally, you need an environment that is compatible and will reinforce correct connections (if you want to direct the system to do specific things). If you were to build an AI that had its own “critical period” where it built up appropriate feature detectors, the resulting memory coding model would be specific to that AI. You would not be able to transfer memories without the coding model, and each AI that undergoes a different critical period would have a unique memory encoding method. This would allow you to duplicate a seed AI from a specific point (like a digital DNA), but not share memories directly between different AIs.
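A toy sketch of that last point, with a seeded random permutation standing in for each AI’s learned feature coding (the permutation stand-in and all names here are purely illustrative, not anyone’s actual proposal):

```python
import random

def make_encoder(seed, size=8):
    # Stand-in for a "critical period": each AI ends up with its own
    # fixed feature ordering, determined by its early experience
    # (modeled here as a random seed).
    rng = random.Random(seed)
    perm = list(range(size))
    rng.shuffle(perm)
    encode = lambda features: [features[i] for i in perm]
    decode = lambda code: [code[perm.index(i)] for i in range(size)]
    return encode, decode

memory = [1, 0, 0, 1, 1, 0, 1, 0]

enc_a, dec_a = make_encoder(seed=1)   # AI "A"
enc_b, dec_b = make_encoder(seed=2)   # AI "B": different critical period
enc_c, dec_c = make_encoder(seed=1)   # seed AI duplicated from A

code = enc_a(memory)
print(dec_a(code) == memory)   # A reads its own memory back correctly
print(dec_c(code) == memory)   # A's duplicate shares the coding model
print(dec_b(code))             # B's decoding is almost surely garbled
```

The duplicate (same seed, like “digital DNA”) decodes A’s memories perfectly, while B, whose coding model arose from a different “critical period,” cannot, exactly as described above.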

 
