Digital Entities Action Committee

Minsky and the Magic Perceptron

Marvin Minsky, inventor and visionary, co-wrote, with Seymour Papert, a seminal work of AI titled "Perceptrons".  This threw down a gauntlet to those who believed that simple neural networks could do "big things".
Minsky was also a technical consultant on Kubrick's 2001: A Space Odyssey and patented the confocal microscope in the 1960s, the most revolutionary development in microscopic imaging in the 20th century.
Perceptrons are simple neural networks that learn to recognize patterns by adjusting synaptic weights.  They work well for some lower-order phenomena but fail at higher-order problems: the book Perceptrons proved that these basic, CNS-unit-like devices cannot carry out tasks that are often quite trivial for humans, such as determining whether an object in an image is "connected".
        The underlying rationale for perceptrons and other pattern-recognition networks is usually left unstated in both earlier and current work, but the starting point seems to be that when we experience a pattern of some sort (good or bad), we want to recognize it at some later point in time (this thing is good to eat; this is bad to touch; etc.).  By adjusting the weights in a perceptron, we create (I think) a device that recognizes an item in the sense that it responds selectively to that item and tells its fellow networks, "here is that thing that we previously decided was good."
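The weight-adjustment idea can be sketched in a few lines of Python. This is only a minimal illustration of the classic perceptron learning rule, not code from Minsky or Rosenblatt; the 4-pixel "retina", the patterns, and the names `respond` and `train_perceptron` are all assumptions made up for the example.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[int], int]  # (input pattern, 1 = "good" item, 0 = anything else)


def respond(weights: Sequence[float], bias: float, x: Sequence[int]) -> int:
    """Threshold unit: fire (1) only if the weighted sum of the inputs exceeds zero."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples: List[Sample], epochs: int = 20, lr: float = 1.0):
    """Perceptron rule: the weights change only when the unit answers wrongly."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - respond(weights, bias, x)  # -1, 0, or +1
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Hypothetical 4-pixel "retina"; the first pattern is the item to be remembered.
GOOD_ITEM = (1, 0, 1, 0)
OTHERS = [(0, 1, 0, 1), (1, 1, 0, 0), (0, 0, 1, 1)]
samples = [(GOOD_ITEM, 1)] + [(p, 0) for p in OTHERS]

weights, bias = train_perceptron(samples)
print(respond(weights, bias, GOOD_ITEM))            # response to the remembered item
print([respond(weights, bias, p) for p in OTHERS])  # responses to the other patterns
```

After training, the unit fires for the remembered pattern and stays silent for the three others, which is the "selective response" described above. These toy patterns happen to be linearly separable, which is exactly the condition under which this rule is guaranteed to settle.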
Below is an excerpt from a 1960 report by Minsky:
"G. Using random nets for Bayes decisions:  The nets of Fig. 6 are very orderly in structure. Is all this structure necessary? Certainly if there were a great many properties, each of which provided very little marginal information, some of them would not be missed. Then one might expect good results with a mere sampling of all the possible connection paths w_ij. And one might thus, in this special situation, use a random connection net. The two-layer nets here resemble those of the “perceptron” proposal of Rosenblatt [22]. In the latter, there is an additional level of connections coming directly from randomly selected points of a “retina.” Here the properties, the devices which abstract the visual input data, are simple functions which add some inputs, subtract others, and detect whether the result exceeds a threshold. Equation (1), we think, illustrates what is of value in this scheme. It does seem clear that such nets can handle a maximum-likelihood type of analysis of the output of the property functions. But these nets, with their simple, randomly generated, connections can probably never achieve recognition of such patterns as “the class of figures having two separated parts,” and they cannot even achieve the effect of template recognition without size and position normalization (unless sample figures have been presented previously in essentially all sizes and positions). For the chances are extremely small of finding, by random methods, enough properties usefully correlated with patterns appreciably more abstract than are those of the prototype-derived kind. And these networks can really only separate out (by weighting) information in the individual input properties; they cannot extract further information present in nonadditive form. The “perceptron” class of machines has facilities neither for obtaining better-than-chance properties nor for assembling better-than-additive combinations of those it gets from random construction."
This text, written by Minsky 45 years ago [Steps Toward Artificial Intelligence], is still quite relevant and indicates the kinds of problems that machine learning researchers and practitioners are still struggling with today.
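Minsky's remark that such nets "cannot extract further information present in nonadditive form" can be made concrete with the exclusive-or (XOR) problem. XOR is my choice of illustration, not an example taken from the excerpt, and the helper names below are assumptions. The sketch trains a single threshold unit on XOR twice: once on the raw inputs, and once after a hand-built product property `x1 * x2` has been added.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[int], int]


def respond(weights: Sequence[float], bias: float, x: Sequence[int]) -> int:
    """Threshold unit: fire only if the weighted (additive) sum exceeds zero."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def train_perceptron(samples: List[Sample], epochs: int = 200):
    """Perceptron rule with unit learning rate; weights move only on errors."""
    weights, bias = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - respond(weights, bias, x)
            weights = [w + error * xi for w, xi in zip(weights, x)]
            bias += error
    return weights, bias


def accuracy(samples: List[Sample], weights, bias) -> float:
    """Fraction of samples the trained unit classifies correctly."""
    hits = sum(respond(weights, bias, x) == t for x, t in samples)
    return hits / len(samples)


# XOR: the answer depends on the *combination* of inputs, not on either alone.
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
w, b = train_perceptron(XOR)
raw_accuracy = accuracy(XOR, w, b)  # never reaches 1.0: no additive weighting separates XOR

# Same task, with one hand-built nonadditive property (the product x1 * x2).
XOR_WITH_PRODUCT = [((x1, x2, x1 * x2), t) for (x1, x2), t in XOR]
w2, b2 = train_perceptron(XOR_WITH_PRODUCT)
augmented_accuracy = accuracy(XOR_WITH_PRODUCT, w2, b2)

print(raw_accuracy, augmented_accuracy)
```

The second run succeeds only because someone supplied the right property in advance. That is the point of the excerpt: the weighting stage cannot manufacture better-than-additive combinations for itself.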

More text from Steps Toward Artificial Intelligence:
"H. Articulation and Attention—Limitations of the Property-List Method.  M. Minsky:  [Note: I substantially revised this section in December 2000, to clarify and simplify the notations.] Because of its fixed size, the property-list scheme is limited in the complexities of the relations it can describe. If a machine can recognize a chair and a table, it surely should be able to tell us that "there is a chair and a table." To an extent, we can invent properties in which some such relationships are embedded, but no formula of fixed form can represent arbitrary complex relationships. Thus, we might want to describe the leftmost figure below as,

"A rectangle (1) contains two subfigures disposed horizontally. The part on the left is a rectangle (2) that contains two subfigures disposed vertically, the upper part of which is a circle (3) and the lower a triangle (4). The part on the right . . . etc."

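The contrast between a fixed-size property list and an articulated description can be sketched as a recursive data structure. This is only an illustration under my own assumptions: the `Figure` class and the `describe` and `depth` functions are hypothetical names, and because the excerpt elides the right-hand part of the figure, it appears here as a placeholder rather than a guess.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Figure:
    """A shape that may contain sub-figures, which may contain sub-figures, and so on."""
    shape: str
    layout: str = ""  # how the parts are disposed: "horizontally" or "vertically"
    parts: List["Figure"] = field(default_factory=list)


def describe(fig: Figure) -> str:
    """Build a description by recursing into the parts; no fixed-length form is needed."""
    if not fig.parts:
        return f"a {fig.shape}"
    inner = " and ".join(describe(p) for p in fig.parts)
    return f"a {fig.shape} containing, disposed {fig.layout}, {inner}"


def depth(fig: Figure) -> int:
    """Nesting depth of the description; a fixed property list has no room for this to grow."""
    return 1 + max((depth(p) for p in fig.parts), default=0)


# Left half of the figure as described in the excerpt.
left = Figure("rectangle", "vertically", [Figure("circle"), Figure("triangle")])
# The right half is elided in the excerpt ("etc."), so it stays a placeholder.
right = Figure("part (elided in the excerpt)")
whole = Figure("rectangle", "horizontally", [left, right])

print(describe(left))
print(describe(whole))
print(depth(whole))
```

The same two functions handle a figure nested to any depth, whereas a property list would need a new slot for every additional level of nesting.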
This section from Minsky's Steps Toward Artificial Intelligence (1960) illustrates one aspect of the problem of recursive decomposition and emphasizes the limitations of list-making approaches.  It seems to me that making lists is OK, and is indeed something that animal brains have evolved to do; part of our problem is that we are trying to make lists that are TOO BIG, and we therefore run into combinatorial explosion.  Instead, we should start by trying to make AI-zebrafish, AI-dogs and AI-toddlers before taking on AI-adult humans.  The world is highly complex, to be sure, but the key issues are how much of that complexity we can ignore, and how compactly we can condense the important complexities into category schemas that provide the details needed to compete against conspecific and interspecific competitors.
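A little arithmetic shows why list size matters. Suppose, purely as an assumption for illustration, that a list needs one entry for every category and for every unordered combination of up to three categories. The category counts below are hypothetical round numbers, not estimates for any real animal or system, and `property_list_size` is a name invented for this sketch.

```python
from math import comb


def property_list_size(n_categories: int, max_arity: int) -> int:
    """Entries needed to list every combination of 1..max_arity distinct categories."""
    return sum(comb(n_categories, k) for k in range(1, max_arity + 1))


# Hypothetical category counts: a small repertoire versus progressively larger ones.
for n in (10, 100, 1000):
    print(n, property_list_size(n, 3))
# 10 categories   ->         175 entries
# 100 categories  ->     166,750 entries
# 1000 categories -> 166,667,500 entries
```

A hundredfold increase in categories produces roughly a millionfold increase in list entries, which is one way to see why starting with small repertoires keeps the lists manageable.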

Eric Baum's "What is Thought?" postulates that inductive bias (IB), i.e. our inclination to learn specific kinds of things and to be quite good at learning them, is encoded in our genome.  Larval zebrafish (discussed on another page here?) offer a variant of this: they respond very effectively to things like moving stripes, moving shadows, and moving spots, but without any actual learning (see e.g. McElligott and O'Malley, 2005).  Indeed, it seems certain that the first time a larval zebrafish sees a paramecium it will hunt and kill it.  This could be viewed as inductive bias against the paramecium.  It should presumably be called instinct, but the point here is that there is no learning curve: the fish knows what stimulus to "hunt" and it knows how to track and capture it (also see Borla et al., 2002).  There is no perceptron training time here.  Evolution has compacted a description of these things (detection, ID, motor program selection and implementation) into the zebrafish genome, and the developmental processes that have taken place by 5 days post-fertilization have created suitable architectures for carrying out this and many other sensory-motor tasks.
       I lump this inductive guidance in with inductive learning bias because I think these are related things: the genome establishes a starting point on this learning curve (and many others) and then uses inductive bias to ensure that we continue to learn and implement adaptive behaviors with regard to things that matter.  This aspect of IB may be a powerful ally in solving some of the problems of combinatorial explosion and recursive decomposition.  These ideas conflict with views of human cortex as a general-purpose learning machine, but neurology tells us this description is wrong: cortex and the rest of the CNS have many specialized parts.  We evolved from teleost fishes that used specialized parts of the CNS to discriminate signals telling them to escape, attack or navigate at a very early developmental age.  This does not mean that these same parts cannot contribute to a diverse set of other behaviors and processes: our systems are flexible in this regard.  Each cortical module has evolutionarily conserved features (6 layers, layer 4 input, etc.), and Hawkins's view of a general-purpose architecture seems correct within limits.  But the critical question is: what are these limits?  What are the specializations between modules?  It is these specializations that will express inductive bias and help solve the problems that were brought to the forefront in Perceptrons.
