Perceptrons
These networks learn to recognize patterns by adjusting synaptic weights. They work well for some
lower-order phenomena but fail at higher-order problems. The book Perceptrons (Minsky and Papert, 1969) proved that these basic,
CNS-unit-like perceptrons cannot carry out
tasks that are often quite trivial for humans, such as determining whether an object in an image is "connected".
The underlying rationale for perceptrons and other pattern-recognition
networks is usually left unstated in both earlier and current works, but the starting point seems to be that when we experience
a pattern of some sort (good or bad), we want to recognize it at some later point in time (this thing is good
to eat; this is bad to touch; etc.). By adjusting weights in a perceptron, we create (I think) a device that
recognizes an item in the sense that it responds selectively to that item and tells its fellow networks, "here is that thing
that we previously decided was good."
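The weight-adjustment idea above can be sketched in a few lines of Python. This is only a minimal illustration of the classic perceptron learning rule, not a model of any real CNS circuit: the 4-pixel "retina", the stored pattern GOOD_TO_EAT, the sample list, and the function names predict and train_perceptron are all assumptions made up for the example.

```python
# Minimal perceptron sketch. All names and data here are illustrative assumptions.

def predict(weights, bias, x):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=100, lr=1.0):
    """Classic perceptron rule: adjust weights only when a sample is misclassified."""
    n = len(samples[0][0])
    weights = [0.0] * n
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            delta = target - predict(weights, bias, x)  # +1, 0, or -1
            if delta != 0:
                errors += 1
                weights = [w + lr * delta * xi for w, xi in zip(weights, x)]
                bias += lr * delta
        if errors == 0:  # every sample classified correctly; stop early
            break
    return weights, bias


# Hypothetical 4-pixel "retina": the unit should respond only to one stored pattern.
GOOD_TO_EAT = (1, 0, 1, 1)
samples = [
    (GOOD_TO_EAT, 1),
    ((0, 1, 0, 0), 0),
    ((1, 1, 0, 0), 0),
    ((0, 0, 1, 1), 0),
    ((1, 0, 0, 1), 0),
    ((1, 0, 1, 0), 0),
]

weights, bias = train_perceptron(samples)
for x, target in samples:
    print(x, "->", predict(weights, bias, x), "(target", target, ")")
```

After training, the unit fires for the stored pattern and stays silent for the other samples. This works here because the toy patterns happen to be linearly separable.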
Below is an excerpt from a 1960 report by Minsky:
"G. Using random nets for Bayes decisions: The nets of Fig. 6 are very orderly in structure. Is all this structure necessary? Certainly if
there were a great many properties, each of which provided very little marginal information, some of them would not be missed.
Then one might expect good results with a mere sampling of all the possible connection paths w_ij. And one might thus, in this
special situation, use a random connection net. The two-layer nets here resemble those of the “perceptron” proposal
of Rosenblatt [22]. In the latter, there is an additional level of connections coming directly from randomly selected points
of a “retina.” Here the properties, the devices which abstract the visual input data, are simple functions which
add some inputs, subtract others, and detect whether the result exceeds a threshold. Equation (1), we think, illustrates what
is of value in this scheme. It does seem clear that such nets can handle a maximum-likelihood type of analysis of the output
of the property functions. But these nets, with their simple, randomly generated, connections can probably never achieve recognition
of such patterns as “the class of figures having two separated parts,” and they cannot even achieve the effect
of template recognition without size and position normalization (unless sample figures have been presented previously in essentially
all sizes and positions). For the chances are extremely small of finding, by random methods, enough properties usefully correlated
with patterns appreciably more abstract than are those of the prototype-derived kind. And these networks can really only separate
out (by weighting) information in the individual input properties; they cannot extract further information present in nonadditive
form. The “perceptron” class of machines has facilities neither for obtaining better-than-chance properties nor
for assembling better-than-additive combinations of those it gets from random construction."
This passage from Minsky, written 45 years ago [Steps Toward Artificial Intelligence;
http://web.media.MIT.edu], is still quite relevant and indicates the kinds of problems that machine learning researchers and
practitioners are still struggling with today.
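Minsky's point about information "present in nonadditive form" can be illustrated with the standard XOR example. The sketch below is my own illustration, not taken from the report: it searches a small grid of weights for a single threshold unit that fits each toy problem. The grid, the helper name separable, and the added product property x1*x2 are assumptions chosen for the example, and a grid search is only a demonstration, not a proof (although for XOR it is easy to show that no weights at all can work).

```python
from itertools import product


def separable(samples, grid):
    """Search a grid of (weights..., bias) for one threshold unit that fits every sample."""
    n = len(samples[0][0])
    for *w, b in product(grid, repeat=n + 1):
        if all(
            (sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == bool(t)
            for x, t in samples
        ):
            return True
    return False


# Candidate weight and bias values: -3.0 to 3.0 in steps of 0.5 (an arbitrary choice).
GRID = [i / 2 for i in range(-6, 7)]

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

# XOR again, but with one extra nonadditive property: the product x1 * x2.
XOR_PLUS = [((x1, x2, x1 * x2), t) for (x1, x2), t in XOR]

and_ok = separable(AND, GRID)            # additive weighting is enough
xor_ok = separable(XOR, GRID)            # no additive weighting fits
xor_plus_ok = separable(XOR_PLUS, GRID)  # fits once the product property is supplied

print("AND:", and_ok, "XOR:", xor_ok, "XOR + product:", xor_plus_ok)
```

The unit itself is unchanged in the third case; what changes is that someone supplied a better-than-additive property. That is the facility Minsky says the perceptron class lacks when its properties come from random construction.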
More text from Steps Toward AI:
" H. Articulation and Attention—Limitations of the Property-List Method.
M. Minsky: [Note: I substantially revised this section
in December 2000, to clarify and simplify the notations.] Because of its fixed size, the property-list scheme is limited in
the complexities of the relations it can describe. If a machine can recognize a chair and a table, it surely should be able
to tell us that "there is a chair and a table." To an extent, we can invent properties in which some such relationships are
embedded, but no formula of fixed form can represent arbitrary complex relationships. Thus, we might want to describe the
leftmost figure below as,

"A rectangle (1) contains two
subfigures disposed horizontally. The part on the left is a rectangle (2) that contains two subfigures disposed vertically,
the upper part of which is a circle (3) and the lower a triangle (4). The part on the right . . . etc."
This section from Minsky's Steps Toward
AI (1960) illustrates an aspect of the problem of recursive decomposition and emphasizes the limitations of list-making
and similar approaches. It seems to me that making lists is OK, and is indeed something that animal brains have evolved
to do; part of our problem is that we are trying to make lists that are TOO BIG, and we therefore run into
combinatorial explosion. It seems that we should instead start by trying to make AI-zebrafish, AI-dogs and AI-toddlers
before taking on AI-adult humans. The world is highly complex to be sure, but the key issue is how much of that complexity
we can ignore, and how compactly we can condense the important complexities into category schemas that provide the details
necessary to compete against conspecific and interspecific competitors.
Eric Baum's "What is Thought?" postulates that inductive bias (IB),
i.e., our inclination to learn specific kinds of things and to be quite good at learning them, is encoded in our genome. A
variant of this applies to larval zebrafish (discussed on another page here?): they respond very effectively to things like
moving stripes, moving shadows, and moving spots, but without any actual learning (see e.g. McElligott and
O'Malley, 2005). Indeed, it seems certain that the first time a larval zebrafish sees a paramecium it will hunt and
kill it. This could certainly be viewed as inductive bias against the paramecium. It should presumably be more
properly called instinct, but the point here is that there is no learning curve: the fish knows what stimulus to "hunt" and it knows
how to track and capture it (also see Borla et al., 2002). There is no perceptron training time here. Evolution
has compacted a description of these things (detection, ID, motor program selection and implementation) into the zebrafish
genome, and the developmental processes that have taken place by 5 days post-fertilization have created suitable architectures
for carrying out this and many other sensory-motor tasks.
I lump this inductive guidance in with inductive learning bias because I think that
these are related things: the genome establishes a starting point on this learning curve (and many others) and then uses
inductive bias to ensure that we continue to learn and implement adaptive behaviors with regard to things that matter.
This aspect of IB may be a powerful ally in solving some of the problems of combinatorial explosion and recursive decomposition.
These ideas conflict with views of human cortex as a general-purpose learning machine, but neurology tells us this description
is wrong: cortex and the rest of the CNS have many specialized parts. We share ancestry with teleost fishes, which use specialized
parts of the CNS to discriminate signals telling them to escape, attack or navigate at a very early developmental age.
This does not mean that these same parts cannot contribute to a diverse set of other behaviors and processes: our systems
are flexible in this regard. Each cortical module has evolutionarily conserved features (6 layers, layer 4 input, etc.,
etc.) and Hawkins's view of a general-purpose architecture seems correct within limits. But
the critical question is: what are these limits? What are the specializations between modules? It is these specializations
that will express inductive bias and help solve the problems that were brought to the forefront in Perceptrons.