FRED S. ROBERTS AND PATRICK SUPPES
|
|
SOME PROBLEMS IN THE GEOMETRY OF VISUAL PERCEPTION
|
|
1. A GENERAL MODEL
|
|
In view of the long history of discussion about the nature of perception, both by philosophers and psychologists, it may seem foolhardy to propose to begin afresh with a general discussion. However, as we hope to show in this paper, many of the most fundamental problems about perception have not as yet been clearly settled and are just now receiving careful formulation. We believe that from a scientific standpoint the problems of perception are difficult, and we want to say at once that we do not propose to solve many of them here. Our purpose is mainly to set up a general model which we may use as a framework for discussion. In these general terms, we shall try to summarize a class of empirical observations about perception, organize some of the fundamental problems into sharply defined classes, and suggest one or two possible explanations.
|
|
1.1. Physical Space vs. Perceptual Space

We shall, as our title suggests, limit ourselves to a discussion of visual perception, and also to those visual phenomena involving perception of geometrical characteristics as opposed to such things as color, texture, and the like, although many of our remarks are more generally applicable. Our discussion for the most part will deal with binocular vision, although several of our explanations, notably those in terms of eye movements, will be monocular in nature. To begin with, we shall distinguish between physical space and perceptual space, the space from which we draw our 'conscious' perceptions. For the latter we shall also use the phrases visual space or subjective visual space. It seems sensible to take as physical space ordinary three-dimensional Euclidean space. This space we denote by E³, or simply by E. (It is possible to argue about this choice of physical space.)
|
|
For perceptual space, we propose no a priori structure of a general nature. Indeed, we shall try to study how one might infer the geometric
|
|
Synthese 17 (1967) 173-201; © D. Reidel Publishing Co., Dordrecht-Holland
|
|
certain hyperbolic curves. Moreover, light arranged in physically straight lines is not always seen as straight by the observer. If a subject is asked to align two rows of lights in 'parallel straight lines' (parallel alleys), and then alternatively into 'lines with corresponding points equidistant' (distance alleys), the two resulting configurations are different, whereas in a Euclidean geometry they would be the same. If L and R denote the idealized centers of rotation of the left and right eyes respectively, then Figure 1 shows some of the configurations in the horizontal plane at eye level which are judged aligned. Figure 2 shows the parallel and distance alleys in the same plane.

Fig. 1. [Figure not reproduced; surviving labels: c, L, R.]

Fig. 2. [Figure not reproduced; surviving labels: parallel alley, distance alley, L, R.]

Helmholtz [10] obtained similar results by the use of after-images. Fixate at the center of a horizontal straight line at eye level in the frontal plane in physical space, and then shift your gaze rapidly to the center of a parallel line below it. The after-image of the first line does not coincide with the second, but instead, the first line continues to appear straight while the new line appears concave up. Conversely, the after-image coincides with a physical hyperbola, concave down. Similar results hold if we move our gaze upwards or deal with vertical lines. Thus the physical curves 'seen as straight' appear to be hyperbolas that are convex toward the primary point of fixation.² (This includes the original horizontal line.) The curves seen as straight are at least qualitatively like those of Figure 3 below.

Such results indicate that the primitive visual geometry differs from the physical geometry. Presumably the role of learning is to help us overcome this difference. Thus in the case of straightness, for example, we have to learn to see physically straight lines as straight.
|
|
|
|
2.2. The Eye-Movement Explanation
|
|
|
|
There are two approaches to the study of primitive visual geometry. One is to try to describe precisely the properties of this geometry, and the other is to try to explain why our primitive geometry is as it is.

Explanations of the latter type are presumably physiological. In our present kinematical situation, it is natural to try to use eye movement as a basis for understanding primitive visual space. Such an approach goes back to Helmholtz [9, 10], and it is of interest to follow his presentation, concentrating on straightness. The idealized eyeball may for our purposes be considered a rigid body which rotates about a fixed point O. If we fix our gaze at any point P in 'external space', then OP will be called the visual axis. In particular, the point of fixation A when we are looking straight ahead toward the horizon will be called the primary point. It is natural to assume that the position E(P) of the eyeball when the fixation point is P is completely determined by the primary position E(A) and the visual axis OP.³ This fundamental law of eye movement is known as Donders' Law. Thus, under Donders' Law, no matter how much we move our eyes, if we return to looking at the same point in external space, the eyeball returns to the same position. It is implicitly assumed that the correspondence P → E(P) is not trivial (i.e., not a constant map) and also continuous. Finally, the discussion is really limited to points within a reasonable angular distance of the primary point. Donders' Law, which postulates the existence of a correspondence P → E(P), should be distinguished from any particular law of eye position, which specifies for each point P the corresponding position E(P).
|
|
|
|
We would like to give, independent of any particular law of eye position, an eye-movement definition of our primitive perception of straightness. Using the notion of alignment as motivation, we say with Helmholtz that a curve C in physical space is 'seen as straight' (in the primitive sense) provided that as we move our fixation point along C (scan C), successive portions of the curve are imaged on exactly the same elements of the central portion of the retina.

Using this definition, Horace Lamb [12] proves the following striking theorem:

THEOREM: Under Donders' Law, it is not possible for every physically straight line (segment) to be 'seen as straight'. (More precisely, under any particular law of eye position, the class of those physical curves seen as straight does not include all straight line segments.)

This theorem is a strong argument for the view that our primitive visual geometry, for physiological reasons, cannot be Euclidean, and so learning must enter. Thus, it seems likely we cannot perceive Euclidean straightness at birth. Because we think this theorem is very important, because it does not seem to be a well-known result, and because Lamb's proof is not particularly rigorous, we sketch a proof in an appendix.

It might be objected that the Helmholtz definition of primitive straightness corresponds more to constant curvature than to straightness. However, we would argue that there does not seem to be any distinction on the primitive level between these two concepts. And, even if the objection is well taken, Lamb's theorem (with the words 'judged as aligned' replacing the words 'seen as straight') remains just as startling.
|
|
|
|
Accepting the Helmholtz definition, and given the negative result of Lamb, it becomes of interest to calculate exactly what curves in physical space are seen as straight. To do this, we need a specific law of eye position. It is sufficient to describe how to find position E(P) from point P and the primary position E(A). Since the same visual axis corresponds to many points of external space, we may limit ourselves to points P on the surface S of a sphere about O surrounding the eyeball. We shall call S the spherical field. In particular, we may assume A lies on S.

Probably the simplest law of eye position is Listing's Law, which says that E(P) may be obtained from E(A) by a rotation of the entire eyeball corresponding to the great circle arc AP on the spherical field.⁴ (That is, in moving our eyes in a haphazard path from fixation point A to fixation point P, the end result is 'as if' we just rotated the eye directly.) Helmholtz proved the following:

THEOREM: Under Listing's Law, those curves in external space which are seen as straight are exactly those corresponding under projection from the point O to arcs of circles on the spherical field which pass through the point B diametrically opposite to the primary point A.

In particular, in the plane perpendicular to the line OA, all of the curves shown in Figure 3 are seen as straight. This result agrees with the experimental data mentioned.
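
To make the statement of Listing's Law concrete, the following is a minimal sketch of how E(P) might be computed as a single rotation along the great circle arc AP. It rests on assumptions that are ours and not the paper's: O is taken as the origin, directions are represented as unit vectors, the primary direction OA is placed on the z-axis, an eye position is represented by a 3x3 rotation matrix relative to E(A), and the helper names (`listing_rotation`, `apply`, and so on) are hypothetical. The rotation is built with Rodrigues' formula about the axis A x P, which is perpendicular to OA.

```python
import math

def cross(a, b):
    """Vector product a x b of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Scalar product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale v to unit length."""
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)

def listing_rotation(A, P):
    """Rotation matrix carrying direction A to direction P about the axis A x P.

    This is the direct rotation along the great circle arc AP. The case
    P = -A (the point B opposite A) is excluded, in line with the restriction
    to points within a reasonable angular distance of the primary point.
    """
    A, P = normalize(A), normalize(P)
    axis = cross(A, P)
    s = math.sqrt(dot(axis, axis))  # sine of the angle between A and P
    c = dot(A, P)                   # cosine of the angle between A and P
    if s < 1e-12:
        return ((1, 0, 0), (0, 1, 0), (0, 0, 1))  # P coincides with A
    n = tuple(x / s for x in axis)
    # K is the matrix of the map v -> n x v.
    K = ((0, -n[2], n[1]), (n[2], 0, -n[0]), (-n[1], n[0], 0))
    # Rodrigues' formula: R = c I + s K + (1 - c) n n^T
    return tuple(
        tuple(c * (i == j) + s * K[i][j] + (1 - c) * n[i] * n[j]
              for j in range(3))
        for i in range(3))

def apply(R, v):
    """Apply the rotation matrix R (given as rows) to the vector v."""
    return tuple(dot(row, v) for row in R)

A = (0.0, 0.0, 1.0)             # primary direction OA (assumed axis choice)
P = normalize((1.0, 1.0, 2.0))  # an arbitrary fixation direction OP
R = listing_rotation(A, P)      # eye position E(P) relative to E(A)
```

Because `listing_rotation` depends only on A and P, the sketch also satisfies Donders' Law by construction: the path by which the gaze reaches P plays no part in the result.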
|
|
|
|
Fig. 3. [Figure not reproduced; surviving label: A.]

It should be remarked, as a final comment here, that our judgments of straightness in 'real-life' situations can be made without eye movement. We can even recognize straight lines which are flashed on a tachistoscope so fast that no eye movements can be made. How then can eye movements be used to account for the perception of straightness if the perception of straightness can be accomplished without eye movements? The answer here is that it is only our learned concept of straightness which can be perceived without scanning. Before learning, we require eye movements to perceive alignment (cf. Hebb [8]). These observations indicate then, not that our eye-movement definition of primitive straightness is misguided, but rather that learning plays a crucial role. For it seems that we can see a certain familiar configuration on the retina and immediately infer that it is straight without scanning at all (cf. Platt [15]). We describe a specific mechanism for such inferences in Section 3.3.
|
|
|
|
2.3. Recovering the Primitive Visual Geometry
|
|
The other approach to primitive visual space is to try to recover its geometry from certain observables. We are interested in studying what structures are compatible with our primitive visual perceptions, what relations are meaningful in our primitive visual space, etc. Hence we are using the word 'geometry' in a very general sense. The problems involved here are what observables to choose and what properties to study and derive.
|
|
For example, one whole collection of observables are our judgments of comparative distance, alignment or betweenness, parallelism, etc. We shall limit our discussion to these with the remark that choice of appropriate observables for the study of primitive visual space is very much an open question. These particular concepts all make sense in an abstract metric space. Blank [3], following Luneburg [13], has investigated to what extent primitive visual space is a metric space. He starts with observed relations Q* and B*, the comparative distance and betweenness relations; i.e., Q* consists of all quadruples (x, y, u, v) of points of visual space so that the distance between x and y is observed to be smaller than the distance between u and v; and B* consists of all triples (x, y, z) so that x, y, and z are observed to lie on a line, with y between x and z. It should be noted that Q* and B* do not necessarily agree with the corresponding Euclidean relations.
|
|
Given a metric d, we may speak of its comparative distance and betweenness relations, Q_d and B_d, defined respectively as {(x, y, u, v): d(x, y) < d(u, v)} and {(x, y, z): d(x, y) + d(y, z) = d(x, z)}. Blank proves a representation theorem of the following form:

THEOREM: Under a set of axioms A on the relations Q* and B*, there exists a metric d on primitive visual space such that Q* = Q_d and B* = B_d. Moreover, such a metric is unique up to a similarity transformation.
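
The relations induced by a metric can be illustrated on a finite set of points. The following is only a sketch under our own assumptions: the sample points, the Euclidean metric `euclid`, the function names, and the small numerical tolerance `tol` used to test the betweenness equality in floating point are all illustrative choices, and none of them comes from Blank's axiomatization.

```python
from itertools import product

def comparative_distance(points, d):
    """Q_d: all quadruples (x, y, u, v) with d(x, y) < d(u, v)."""
    return {(x, y, u, v)
            for x, y, u, v in product(points, repeat=4)
            if d(x, y) < d(u, v)}

def betweenness(points, d, tol=1e-9):
    """B_d: all triples (x, y, z) with d(x, y) + d(y, z) = d(x, z).

    The equality is tested up to tol because d returns floats.
    """
    return {(x, y, z)
            for x, y, z in product(points, repeat=3)
            if abs(d(x, y) + d(y, z) - d(x, z)) <= tol}

def euclid(p, q):
    """Euclidean distance in the plane (an illustrative choice of metric)."""
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def scaled(p, q):
    """The same metric after a similarity transformation (scale factor 3)."""
    return 3 * euclid(p, q)

points = [(0, 0), (1, 0), (2, 0), (0, 1)]
Q = comparative_distance(points, euclid)
B = betweenness(points, euclid)
```

On this sample, replacing `euclid` by `scaled` leaves both relations unchanged, which is one way to see why a metric recovered from Q* and B* can be determined at best up to a similarity transformation.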
|
|
We quote this theorem, but without a detailed list of the axioms, because we feel that it is an example of the type of consideration involved here. Namely, it is an attempt to recover the geometry of visual space from certain observed relations. We are not asserting that a subject consciously makes judgments of numerical distance. The numerical measure is, in the words of Blank, "something superimposed upon his visual experience". It might, on the other hand, be a reasonable approach merely to describe the primitive visual space by describing the various observed relations, without trying to give numerical representations.

The specific Blank axiomatization has several serious drawbacks of which Blank is aware. For example, the observed relation xy < uv defined by Q*(x, y, u, v) is probably not transitive on 'pairs' as it must be if Q* comes from a metric. Such difficulties stem in large part from what happens near the threshold of discrimination. This is typical, then, of the difficulty in studying primitive visual geometry: local geometric and topological properties are obscured.
|
|
|
|
Beals, Krantz, and Tversky [2] list a set of axioms based on the relation Q* alone, which may be compared with Blank's, although the axioms do not refer specifically to visual space. Their representation theorem is also essentially the same. These axioms encounter the same difficulty as do Blank's.
|
|
|
|
To go a step further, a good description of primitive visual space might be some sort of coordinatization, together with a set of functions relating the physical and psychological coordinates. Blank obtains a coordinatization by adding several powerful axioms, allowing him to prove:

THEOREM: Primitive visual space is a Riemannian space of constant negative curvature, i.e., a hyperbolic space.

The result of this theorem is actually the starting point of the Luneburg theory. Using the extremely specific geometry implied by the theorem, Blank and Luneburg coordinatize visual space, write down an explicit metric in terms of these coordinates, and investigate experimentally the relation between psychological and physical coordinates. We feel, once again, that these results are more significant for their approach to the study of primitive visual space than for their detailed accuracy as a description of it, although it should be added in all fairness that the hyperbolic geometry explains several of the experimental results described earlier.
|
|
|
|
As a final comment, it should be noted that one of the advantages of both the Blank and the Beals, Krantz and Tversky axiomatizations discussed above is that they allow for a simple algorithm to recover the metric from the observables. This is not true of the additional Blank axioms required for the representation as a hyperbolic space, and hence of his metric based on the explicit psychological coordinates, without knowing the non-explicit relation between physical and psychological coordinates. We feel in general that such representations should be constructive in nature, allowing us to approximate the primitive visual geometry by simple algorithmic procedures from the observed data.
|
|
3. LEARNING
|
|
We would like to turn next to an investigation of how our primitive visual space becomes modified through learning. We assume that a primary role of learning is to overcome the difference between primitive visual space and physical space. (It is immediately clear that it is not possible to completely overcome this difference. To a large extent, for example, we cannot avoid restrictions on our visual acuity and hence the existence of thresholds in our visual space.)
|
|
It should be noted that perception after learning involves both learned and primitive factors. By learned perceptual behavior, we mean perceptual behavior after learning, thus including the invariant primitive factors. An investigation of learning in perception should, we feel, be divided into two problems. First is the problem of giving a precise description or definition of the learned perceptual behavior. And second is that of suggesting how we develop a mechanism for exhibiting this behavior. We have been disappointed not to be able to find many mathematically oriented papers that discuss these problems.
|
|
3.1. Perception of Constancy
|
|
We shall restrict ourselves to discussion of the so-called 'perceptual constancies'. The fact of perceptual invariance has been commented upon by philosophers and psychologists for a long time, and for good reason. The phenomena of size constancy and shape constancy are basic to our very ability to move around in the world, to relate to our environment, and so forth. It is almost inconceivable to imagine what would happen if we could not identify an object seen in different positions or orientations, under different conditions, and transformed in various ways. It seems correct to claim that to a very large extent, these constancies are learned, but the precise sense in which they are learned needs to be formulated with care. Proceeding with our twofold approach, we would like to start by giving a precise definition of 'constancy' and 'perception of constancy'. Our discussion should be viewed not so much as a definite proposal but rather as a hopefully fruitful framework for discussion.
|
|
Any description of the phenomenon of constancy leads to the idea that there is a certain group⁵ G of transformations of the physical space under which figural identity is preserved.⁶ Namely, if S and S' are subsets of physical space E, then S and S' are identified if and only if there is a transformation t in G so that S = t|S'. (The notation t|S' means the restriction of t to S'. By abuse of notation, we shall hereafter write S = t(S').) We might prefer to think of G as a group of transformations on a selected collection F of subsets of E, called the 'relevant configurations'.

The idea of 'squareness' might then be looked at in the following platonic way: we have in mind an 'ideal' square in a standard position in physical space. Given a configuration in physical space, we identify it as a square if and only if it can by a series of allowable transformations be superimposed on or made to coincide with this ideal square. Similar remarks might be made for 'right angle', 'circularity', etc.
|
|
For every group of transformations we have a different type of constancy. 'Size constancy', for example, might correspond to taking the group of congruence (or distance-preserving) transformations of Euclidean three-space.⁷ A certain type of shape constancy arises by taking G to be the group of rotations of E. We view it as an empirical problem to identify the relevant groups G for different types of significant constancies, and prefer to speak in this generality here.
|
|
The group G determines an equivalence relation ≡_G on the subsets of E, defined by: S ≡_G S' if and only if there is a transformation t in G so that S = t(S'). We think of this equivalence relation as the constancy and we shall refer to ≡_G as the 'constancy of type G'. Then our 'perception of constancy' is the identification of two figures equivalent with respect to G.

Pitts and McCulloch [14] suggest a mechanism for perception of constancy which is of some interest, though we choose not to pursue it in detail here. Generalizing their ideas, we might suppose, as in Section 2, that the image eventually becomes stabilized. Then, it is reasonable to assume that corresponding to each physical stimulus S there is a sensory image φ(S). This might be thought of as a retinal image or a firing pattern on the neurons in 'area 17' of the cortex. If Φ denotes the class of all φ(S), then, we suggest, there is induced on the space Φ an equivalence relation ≈_G corresponding to ≡_G. And the constancy in question is 'computed in the brain' by means of a functional F with the property that F(φ) = F(φ') if and only if φ ≈_G φ'. Pitts and McCulloch give a physiological interpretation of F.
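
The definitions of this section can be tried out on a toy case. In the sketch below, every specific is an assumption made for illustration: figures are finite sets of lattice points in the plane, G is the four-element group of quarter-turn rotations about the origin, and the invariant `F` simply picks a canonical representative of each orbit. This canonical-representative construction is our own stand-in; it is not the mechanism proposed by Pitts and McCulloch, and it is applied directly to figures rather than to sensory images.

```python
def rotate90(p):
    """Rotate a lattice point a quarter turn counterclockwise about the origin."""
    x, y = p
    return (-y, x)

def orbit(S):
    """All images t(S) of the figure S under the four rotations in G."""
    images, current = [], frozenset(S)
    for _ in range(4):
        images.append(current)
        current = frozenset(rotate90(p) for p in current)
    return images

def equivalent(S, S2):
    """S and S2 are identified iff some t in G carries S2 onto S."""
    return frozenset(S) in orbit(S2)

def F(S):
    """An invariant of the orbit: its lexicographically smallest sorted image."""
    return min(tuple(sorted(image)) for image in orbit(S))

L_shape = {(0, 0), (1, 0), (2, 0), (2, 1)}
turned = {rotate90(p) for p in L_shape}           # same figure, rotated once
mirrored = {(0, 0), (1, 0), (2, 0), (2, -1)}      # reflection, not in G
```

Here `F(L_shape) == F(turned)` while `F(L_shape) != F(mirrored)`, so F takes equal values exactly on figures equivalent under this particular G. Enlarging G to include reflections would identify the mirrored figure as well, which illustrates how the choice of group fixes the type of constancy.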
|
|
3.2. Elementary Properties and Concepts
|
|
Our definition of constancy is actually independent of whether the perception of constancy is a learned or innate process. But it is clear our constancies are to a large part learned, and so we would like to turn to an explanation of how such learning might come about.
|
|
We begin by distinguishing two types of equivalence classes under a given constancy. One is the class of 'elementary properties' and the other is the class of 'concepts'. We shall propose that a constancy is acquired through the learning of numerous elementary properties and concepts which are invariant under it.
|
|
We cannot be extremely precise here, but what we have in mind for the elementary properties are such things as straightness, parallelism, perpendicularity, and roundness. On the other hand, a concept may be thought of as a collection of elementary properties, in a sense to be formalized in Section 3.4. For example, 'squareness' 'consists' of the elementary properties 'two pairs of parallel lines', 'four right angles', 'four equal line segments', etc.
|
|
We feel that a precise determination of the elementary properties will have to be to a large part experimental in nature. (It should be noted, by the way, that a determination of the elementary properties depends on the constancy. Thus, if the underlying constancy group G is the group of all rotations, then any two Euclidean straight lines are equivalent under G and the property of being a straight line is a candidate for an elementary property - it corresponds to one equivalence class. Similarly, if G consists of just parallel displacements of E into itself, then the property of being a horizontal straight line is a candidate for an elementary property.)
|
|
Experimental determination of elementary properties should center around what basic properties we use to organize our perceptions. Any study of visual perception must come to grips with the vast information-processing problems involved in organizing our perceptions. How do we pass from a mass of perceptual input to an organized conscious perception? How do we arrange our percepts into meaningful parts? How do we select what aspects of the stimulus are consciously perceived?
|
|
Our organization of figures seems to be greatly influenced by our learning, although there are probably some innate or primitive factors involved here too. We propose to identify the elementary properties with the meaningful units into which complex percepts are arranged. There is a particularly fruitful and relatively new source of data in the light of which we can be a little more specific in our ideas here. This source is the collection of experiments in which even involuntary eye movements are eliminated. By various means, the image is stabilized on the retina.⁸ If this happens, the conscious perception soon fades. After a time, however, the image alternately reappears and fades out in various 'meaningful' units. Pritchard [16] describes some of these phenomena in detail.

From a geometric standpoint, some of the organizing factors appear to be straightness, parallelism, and similarity, all in the Euclidean sense. That is, straight lines, parallel lines and planes, and similar figures, usually appear and disappear together. Also, contiguity, symmetry, convexity, boundary or 'closedness', and angle, among other factors, appear to play a role. Adjacent curves and closed figures, for example, appear as wholes. These are the types of concepts Gestaltists often use.

To give some examples, given an array consisting of rows of squares, the parts reappearing together are usually horizontal, vertical, and diagonal rows. A Necker cube breaks up into single lines, parallel lines, a pair of parallel planes or a pair of adjacent lines or planes. Finally, given a triangle and a circle, either these appear singly or alternatively adjacent boundaries appear together. Summarizing, in some sense 'simpler' figures act as units. (Simplicity also seems to play a role in length of reappearance time.)
|
|
|
|
These data indicate that our elementary properties should be divided into two classes, primitive and learned. Those such as Euclidean straightness, parallelism and the like are learned, while such factors as contiguity, boundary and closedness are probably primitive. A second observation here, and one that we made earlier, is that our perception of the elementary properties can be accomplished even without eye movements. Experiments in which the stimulus is flashed on a tachistoscope bear out this observation. We may recognize Euclidean (or learned) straightness without scanning. This is in direct opposition to our perception of primitive straightness, which, as we saw in Section 2, is crucially tied to eye movement. Any theory of learning in perception will have to account for these facts.
|
|
|
|
The observation that the elementary properties correspond in some sense to the 'simpler' figures leads to one theoretical attempt at defining the elementary properties. Attneave [1] suggests that simplicity has something to do with regularity or redundancy, in particular "predictability of the whole from a part". Thus, for example, a Euclidean straight line is so simple because we only need to know two points on it to know all others. And, two parallel lines are so simple because two points on one line and one on the second determine the pair. If a subject is shown a closed curve and is asked to represent it from memory, by, say, ten points, then the points he chooses are those points where regularity is interrupted: corners, sudden bends, and changes in convexity. The reader is referred to the paper by Attneave and to Hochberg [11] for a more detailed discussion.
|
|
An alternative theoretical attempt at defining the elementary properties is considerably different from this one. It involves a study of the neural configurations or firing patterns in the cortex, and aims to describe the elementary properties as those corresponding to neural patterns satisfying certain criteria. This neural theory at the same time provides a mechanism for perception of these elementary properties.
|
|
Zeeman [18] provides a model of the brain which is particularly useful for the points we shall make. His model of the brain is a triple (C, γ, p), where C is the set of nerve cells in the cortex; γ is the binary relation on C consisting of all pairs of cells (a, b) such that a can fire b; and p is a function p: γ → [0, 1] which represents the strength of the connection.⁹ We may alternatively think of p(a, b) as the probability that if a fires, then b will fire.
|
|
The cortex C consists of three distinct classes of cells: S, T, and R. The cells in S receive sensory inputs; those in T may be fired by other cells in C; and those in R are the 'self-firers'. S and T are disjoint, for otherwise we would confuse sense data and imaginings. A 'thought' or 'perceptual image' is then a 'firing pattern' on C, or more precisely a function t: C → [0, 1], where t(c) represents the 'rate of firing' of c ∈ C.
|
|
Zeeman introduces a measure of the 'sharpness of an image', and suggests that straight lines, parallel lines, boundaries, and so forth, produce sharp images. His definition of sharpness s(t) of an image t is:

\[ s(t) = \frac{\sum_{c \in C} (t(c))^2}{\sum_{c \in C} t(c)}. \]

Note that s(t) is between 0 and 1, and a sharp image occurs if the cortex is divided into two parts, one firing rapidly and the other slowly. This, as Zeeman says, is a crude measure. But it is, we feel, on the right track, and at least exemplifies how the notion of elementary property might be explicated by the equation 'elementary property = sharp image'.
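
The sharpness measure is easy to evaluate on small examples. The sketch below assumes, purely for illustration, that a firing pattern t is stored as a dictionary from hypothetical cell labels to rates in [0, 1]; the function name and the two sample patterns are ours.

```python
def sharpness(t):
    """s(t): sum of squared firing rates divided by the sum of firing rates."""
    total = sum(t.values())
    if total == 0:
        raise ValueError("sharpness is undefined when no cell fires")
    return sum(rate * rate for rate in t.values()) / total

# Cortex split into a rapidly firing part and a silent part.
two_level = {"c1": 1.0, "c2": 1.0, "c3": 0.0, "c4": 0.0}
# Every cell firing at the same intermediate rate.
uniform = {"c1": 0.5, "c2": 0.5, "c3": 0.5, "c4": 0.5}

sharp_value = sharpness(two_level)   # 2.0 / 2.0
blurred_value = sharpness(uniform)   # 1.0 / 2.0
```

The two-level pattern attains the maximal value 1, while the uniform pattern at rate 0.5 gives 0.5. Since each t(c) lies in [0, 1], each squared rate is at most the rate itself, which is why s(t) can never exceed 1.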
|
|
|
|
3.3. Learning Elementary Properties
|
|
|
|
We propose as suggested above that a constancy is in part innate and in part acquired through the learning of numerous elementary properties and concepts that are invariant under it. In our idealized model, a concept cannot be learned until various relevant elementary properties are learned, and so we shall divide our discussion of learning into two parts, first dealing with elementary properties and then with concepts in Section 3.4.

A mechanism for perception of elementary properties is easy to describe: each elementary figure (or instance of an elementary property, e.g., a straight line) corresponds to a particular firing pattern in the sensory input cortex S (to use the notation of Zeeman's model). Suppose that after learning, each steady sensory input coming from a fixed elementary figure always gives rise to the same firing pattern or image t not only on S but after stabilization, on all of C. Then, each elementary property corresponds to one or more such images.
|
|
|
|
l
|
|
|
|
To discuss the mechanism in more detail, let us concentrate on straight-
|
|
|
|
ness. Thus, for example, a horizontal line at a particular location in E
|
|
|
|
will produce a certain familiar image which becomes conditioned to the
|
|
|
|
phrase 'straight line'. Now, although there are infinitely many different
|
|
|
|
r
|
|
|
|
horizontal straight lines in physical space, physiological data indicate
|
|
|
|
r
|
|
|
|
that there is a relatively small number ofcorresponding excitation patterns
|
|
|
|
r
|
|
|
|
in S, only about 10 or 15 (Hebb [8]). If corresponding to each such pattern
|
|
|
|
1
|
|
|
|
there is after stabilization only one image, then, cortically we can dis-
|
|
|
|
tinguish only a small number of horizontal lines. Similarly, we probably
|
|
|
|
1
|
|
|
|
can distinguish only a small number of different slopes of lines, and for
|
|
|
|
each slope only a small number of lines of that slope. Thus, we must
|
|
|
|
condition to the words 'straight line' only a small number of firing
|
|
|
|
patterns. In this way, we have a mechanism for perceiving straight lines,
|
|
|
|
or alternatively horizontal lines, and similarly other elementary figures.
|
|
|
|
The major problem we face with such a model is in justifying the
|
|
|
|
supposition that a certain steady sense input, and hence, a fixed firing
|
|
|
|
pattern on S, always gives rise, under the influence of learning, to the
|
|
|
|
187
|
|
|
|
FRED S. ROBERTS AND PATRICK SUPPES
|
|
same image t on C. This is especially doubtful considering all the randomness built in~o our model. The first thing to do is to modify our demands. It is certainly unreasonable to require that a given steady sensory input coming from an elementary figure always give rise to the exact same image. Instead, we would be happy to have, using the terminology of Zeeman [18], a tolerance relation on the class of all perceptual images, so ~ha~ such an input always gives rise to two images that are 'close' or within tolerance.
|
|
Hebb [8] suggests a method by which this can come about, and this method is easily understood in the framework of Zeeman's model of the brain. Recall that for Zeeman the brain is a triple (C, γ, p), where C and γ are essentially unchanging physiological constants. Thus, our learning must involve change of p. This change occurs, both Zeeman and Hebb suggest, through the process of facilitation: if cell a fires cell b, then in the future it is slightly easier for a to fire b.¹⁰ The physiological process of facilitation has been observed, and appears to be basically chemical in nature. It is easy enough to suggest a neural model for facilitation which might be compared with physiological data.
|
|
Suppose for simplicity that firing patterns over a course of time are governed by a distribution so that for every pair of neurons a and b at each unit time t_n, the probability that a and b both fire is u, the probability that only one of the two fires is v, and the probability that neither fires is 1 − u − v. Suppose that p_n(a, b) = p_n represents the 'strength' of connection at time t_n, and that

$$ r_n(a, b) = r_n = \begin{cases} 1 & \text{if } a \text{ and } b \text{ both fire between time } t_n \text{ and time } t_{n+1}, \\ 0 & \text{if only one fires in this time interval,} \\ p_n & \text{if neither fires in this time interval.} \end{cases} $$

Finally, if θ is a constant between 0 and 1, one possible learning procedure is to modify p_n according to the equation

$$ p_{n+1} = (1 - \theta)\, p_n + \theta\, r_n . $$

A simple computation shows that lim E(p_n) = u/(u + v). If now v is small, i.e., if, frequently, when one of these neurons fires, then so does the other one; and if u is not too small, then u/(u + v) ≈ 1. Hence, the expected value of the strength of connection between a and b, or of the probability that if a fires then b does, approaches one as time passes. This is independent of the original strength of connection. (More complicated models may be developed for more complicated distributions.)
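The limit can be checked numerically. Taking expectations in the update rule p_{n+1} = (1 − θ)p_n + θr_n, with r_n equal to 1, 0, or p_n with probabilities u, v, and 1 − u − v, gives E(p_{n+1}) = (1 − θ)E(p_n) + θ(u + (1 − u − v)E(p_n)), whose fixed point is u/(u + v). The sketch below iterates this recursion and also runs a single random trajectory; the function names and all parameter values are illustrative assumptions of ours, not quantities taken from physiological data.

```python
import random


def expected_strength(p0, u, v, theta, steps):
    """Iterate the recursion satisfied by the expected connection strength."""
    p = p0
    for _ in range(steps):
        p = (1.0 - theta) * p + theta * (u + (1.0 - u - v) * p)
    return p


def simulate_strength(p0, u, v, theta, steps, rng):
    """Follow one random trajectory of p_n under the facilitation rule."""
    p = p0
    for _ in range(steps):
        x = rng.random()
        if x < u:            # both neurons fire
            r = 1.0
        elif x < u + v:      # only one of the two fires
            r = 0.0
        else:                # neither fires: strength is left unchanged
            r = p
        p = (1.0 - theta) * p + theta * r
    return p


u, v, theta = 0.3, 0.02, 0.05
print(expected_strength(0.1, u, v, theta, 5000), u / (u + v))
print(simulate_strength(0.1, u, v, theta, 5000, random.Random(1)))
```

Starting the recursion from a weak or a strong initial connection gives the same limit, which is the sense in which the result is independent of the original strength.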
|
|
|
|
Hebb's suggestion may now be formulated in this framework. There are in the cortex certain neurons arranged in loops or cycles, i.e., groups a_1, a_2, ..., a_n such that (a_1, a_2) ∈ γ, (a_2, a_3) ∈ γ, ..., (a_{n−1}, a_n) ∈ γ, (a_n, a_1) ∈ γ. Let A_1, A_2, ..., A_n be sensory input neurons so that A_i can fire a_i and so that the connections A_i to a_i are strong in the sense that p(A_i, a_i) is close to 1. If we indicate the relation x can fire y by an arrow, the picture is as shown in Figure 4.

Fig. 4.

Suppose now that A_1, A_2, ..., A_n correspond to the sensory input in the cortex when, say, a given straight line (or other elementary figure) is perceived. Then A_1, ..., A_n are often stimulated, and hence fired, together; or more importantly, usually when one fires, then all fire. It follows in general, since the connections A_i → a_i are strong, that some time later the neurons a_1, a_2, ..., a_n will all fire. Thus, the elementary figure, or in particular the straight line in question, after learning almost always gives rise to an image within tolerance of the image t such that t(a_i) = 1 for all i = 1, 2, ..., n and t = 0 otherwise. Such images then become conditioned to the phrase 'straight line'.

To test this model, let us return to the observation that elementary properties, once learned, can be perceived without eye movement or scanning. Does the model have sufficient structure to account for this phenomenon? The answer is affirmative. Suppose the elementary figure gives rise to firing of the neurons A_1, ..., A_n as above. Now, because a_1, ..., a_n often fire together, the connections p(a_i, a_{i+1}) become strong through facilitation. Thus, after learning, if the single sensory neuron A_i or any subclass of the class A_1, ..., A_n is fired by a sensory input, it follows that all the neurons a_1, ..., a_n are likely to fire. Hence, without waiting for more 'input', we immediately 'reach the same conclusion' or 'have the same image' t, as if A_1, ..., A_n all had fired. Without any scanning, we instantaneously infer 'straight line'.
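The claim that one input neuron suffices once the loop connections are strong can be given a rough quantitative form. The sketch below assumes, purely for illustration, that each transmission succeeds independently, that every loop connection has the same strength, and that firing spreads once around the loop from the single stimulated cell; none of these simplifications is asserted in the text, and the function name `completion_probability` is hypothetical.

```python
def completion_probability(n, input_strength, loop_strength):
    """Chance that a_1, ..., a_n all fire after a single input neuron A_i fires.

    input_strength stands for p(A_i, a_i) and loop_strength for the common
    value of p(a_j, a_{j+1}); the firing must cross n - 1 loop connections.
    """
    if n < 1:
        raise ValueError("the loop must contain at least one neuron")
    return input_strength * loop_strength ** (n - 1)


# Before facilitation (weak loop) versus after facilitation (strong loop).
print(completion_probability(10, 0.95, 0.5))
print(completion_probability(10, 0.95, 0.99))
```

On these assumptions a ten-cell loop with weak connections almost never completes from one input, whereas after facilitation it completes most of the time.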
|
|
3.4. Learning Concepts
|
|
We turn now to the problem of providing mechanisms that will account for the learning of concepts that are the invariants of a given group of transformations. This problem is formidable, and we do not pretend to offer a detailed theory here. We would like to sketch at least one approach that seems promising enough to be outlined. This approach builds on the ideas of stimulus-response learning theories, particularly stimulus-sampling theory. The psychological processes of stimulus sampling and conditioning are central to the theory, but some additional aspects are needed to account for the phenomena at hand.
|
|
Let us begin by considering a concrete problem, that of recognizing regular polygons under rotation. A square is visually a square, no matter what angle its base forms with the horizontal. How do we recognize it in different orientations? We may think of the stimulus elements here as the elementary properties of squares whose bases are horizontal. Typical properties are these: four sides (f), a horizontal segment (h), a vertical segment (v), parallel sides (p), interior angles that are right angles (r), no curves but only segments (c), all segments of equal length (l), intersections only at end points of segments (i), and so forth. We are not trying to give an exhaustive list. We may describe the set enumerated as {f, h, v, p, r, c, l, i}. It is perhaps important to point out that the step from the firing of individual neurons or the activation of individual receptor cells to recognition of these elementary properties is a large one conceptually. We are assuming only that it has been made already, hopefully along the lines suggested in Section 3.3.

Presented with a square whose base is horizontal, the individual can sample all the elementary properties listed and condition them to the concept of a square, or, to be more concrete, to the word square. He is now asked to pick out squares from a number of plane figures presented to him. For simplicity, let us assume that the individual stores an ordered list of elementary properties, all of which are highly salient. He thus converts the unordered set {f, h, v, p, r, c, l, i} into the ordered set (f, h, v, p, r, c, l, i). Presented with a figure, he then checks off the presence or absence of each elementary property. Asked if a triangle is a square, he can say 'no' immediately because the triangle has property f̄, the negation of f.
|
|
Suppose the subject in our hypothetical experiment is now presented with a square whose base is at a 45° angle to the horizontal. Our subject will respond that this figure is not a square, because it has elementary properties h̄ and v̄. He is corrected and told that it is a square. At this point learning and conditioning enter. With probability e he eliminates each elementary property that has varied, i.e., that is not an invariant. To eliminate here means, in the formal representation, to replace h or v by 0, to indicate neutrality, not to replace h by h̄, of course, or v by v̄. Thus with probability e², h and v are both eliminated; with probability e(1 − e) only h is; again with probability e(1 − e) only v is; and with probability (1 − e)² neither is. More realistically we would probably want to introduce a different elimination parameter e for each elementary property, with the intention that e varies directly with the saliency of the property. Once both h and v are eliminated, the tilted square will be recognized as a square.
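The four outcome probabilities for the tilted square follow from treating the elimination of each varied property as an independent event of probability e. A minimal sketch of that calculation is given below; the function name `elimination_distribution` and the value e = 0.3 are illustrative assumptions only.

```python
from itertools import product


def elimination_distribution(varied, e):
    """Map each set of eliminated properties to its probability.

    Each property in `varied` is eliminated independently with probability e.
    """
    dist = {}
    for flags in product([True, False], repeat=len(varied)):
        eliminated = frozenset(p for p, gone in zip(varied, flags) if gone)
        prob = 1.0
        for gone in flags:
            prob *= e if gone else 1.0 - e
        dist[eliminated] = prob
    return dist


# The tilted square: h and v are the properties that varied.
dist = elimination_distribution(["h", "v"], 0.3)
for eliminated, prob in dist.items():
    print(sorted(eliminated), round(prob, 4))
```

The same function covers the more realistic case of more than two varied properties, although a separate parameter for each property would require passing a probability per property instead of a single e.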
|
|
What we have just described is the approach to using positive instances of the concept to obtain information about the concept. If figures are presented on a randomized schedule and the subject has no choice of what is the next figure, then little information can be obtained from negative instances, at least little information at the elementary level. Indeed, if only certain elementary inference mechanisms are assumed it is easy to show that under randomized presentation schedules negative instances add no information whatsoever. It is true that more complicated hypotheses can be settled by inspection of negative instances, but in the present account we shall limit ourselves to the use of elementary steps of inference to pass from elementary properties to concepts and so learning will apply only to positive instances.
|
|
We now make the theoretical approach just sketched more precise and detailed. We let the set S be the set of elementary properties or stimulus elements. There are various ways of talking about these elementary properties. Here we shall simply treat the properties extensionally, so that each elementary property s_i is a finite partition of F, where F is the set of geometric figures whose invariance under the group G of transformations is being learned. We assume that F is closed under G, i.e., if f ∈ F and T ∈ G, then Tf ∈ F, where Tf is the figure that results from applying T to f. We also assume that the set S is finite, and enumerate its elements in the order s_1, ..., s_N. Thus, the basic situation facing an organism is described in the present theory by a triple (S, F, G). In the psychological literature it is also common to call what we have termed properties, dimensions, and then to talk about the values of the dimensions, corresponding to the elements in the partition. For property s_i we shall use the notation s_ij to refer to the jth value (jth partition element) of the dimension or property. The simplest case would be a two-element partition, e.g., s_11 = at least one line segment (in the figure), s_12 = s̄_11 = no line segment. A slightly more complicated example would be: s_11 = exactly one segment, s_12 = exactly two segments, s_13 = more than two segments, s_14 = no segment.
|
|
|
|
Following another terminology, which is increasingly used in the literature of concept learning, we may say that a concept is then formally represented by a template which is an N-tuple (t_1, ..., t_N) such that each t_i is some s_ij or the whole set F. The meaning of the last alternative is that no restriction is placed with respect to property s_i on the figures exemplifying the concept.
|
|
|
|
It is clear that the number of concepts that can be defined in terms of S is large, even for an S of modest size. For example, if we restrict ourselves to five elementary properties each of which has five values, then the number of extensionally different concepts, given that each value of each property is exemplified in the set F, is 6^5 = 7776. (The base is 6 rather than 5 because for each property we include the possible value F.)
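The count can be verified by brute force. Each of the N positions of a template independently takes either one of the values of its property or the unrestricted value F, so a property with k values contributes a factor of k + 1. The following sketch, in which the function names and the integer coding of values are our own assumptions, computes the product and also enumerates the templates directly.

```python
from itertools import product


def count_templates(values_per_property):
    """Number of templates when property i has values_per_property[i] values."""
    total = 1
    for k in values_per_property:
        total *= k + 1          # k specific values plus the unrestricted value F
    return total


def enumerate_templates(values_per_property):
    """Yield every template; j in 1..k stands for s_ij, 'F' for no restriction."""
    choices = [list(range(1, k + 1)) + ["F"] for k in values_per_property]
    return product(*choices)


five_by_five = [5, 5, 5, 5, 5]
print(count_templates(five_by_five))
print(sum(1 for _ in enumerate_templates(five_by_five)))
```

For five properties with five values each, both computations give 7776.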
|
|
|
|
A concept C immediately defines a two-element partition of F, namely, the partition of F into those figures that possess the property defined by the concept and those that do not. With a slight abuse of language we shall say that figures are elements or members of C, even though C is an N-tuple and not a subset of F. The meaning is clear: for every f in F, if C = (t_1, ..., t_N), then f ∈ C if and only if for 1 ≤ i ≤ N, f ∈ t_i.

Following standard lines, we shall say that a concept C is invariant with respect to a group G if for every figure f in F and every transformation T in G, f ∈ C if and only if Tf ∈ C. We shall restrict ourselves for the following discussion to concepts invariant under a given G.
|
|
|
|
For the present context, assumptions about sampling or observing elementary properties will be highly simplified by assuming that all elementary properties are observed or sampled whenever a figure is presented. Almost certainly this assumption will not be satisfied in actual experiments, but the use of it here affects only slightly the central ideas. Each figure f in F presented possesses some degree or value s_ij of each elementary property s_i. Thus each figure f may be described by an ordered N-tuple (u_1, ..., u_N), where for 1 ≤ i ≤ N, u_i is always s_ij for some j. This N-tuple will be called the elementary pattern of the figure f. Again, for the sake of conceptual simplicity, we are assuming that no perceptual errors occur in deciding whether a figure has property s_ij.

We now assume that learning a concept C involves learning the appropriate template T = (t_1, ..., t_N). This is accomplished gradually. On each trial n, the subject has a template T_n = (t_{1,n}, ..., t_{N,n}) associated with the concept C. Through conditioning, T_n is modified until it eventually becomes T.

Prior to stating any specific axioms we need to define the notion of the elementary pattern of a figure matching the template of a concept.

DEFINITION: Let U = (u_1, ..., u_N) be the elementary pattern of figure f in F, and let T_n = (t_{1,n}, ..., t_{N,n}) be the template of concept C on trial n. Then U matches T_n if and only if for every 1 ≤ i ≤ N, u_i ⊆ t_{i,n}.
|
|
|
|
Following familiar treatments of stimulus-sampling theory, we now state axioms divided into the three categories of sampling axioms, response axioms, and conditioning axioms. (For such a treatment of stimulus-sampling theory see Suppes and Atkinson [17].)

Sampling Axiom

S1. On every trial the elementary pattern (u_1, ..., u_N) of the presented figure f in F is completely sampled.

Response Axioms

R1. Until a positive instance of the concept is presented, the subject does not have a template of the concept, and each response is made in terms of a fixed guessing probability p.

R2. On trial n, if the subject has a template T_n for C, then figure f in F is classified as an instance of C if the elementary pattern of f matches the template, otherwise not.

Conditioning Axioms

C1. On every trial, the subject has at most one template for the concept.

C2. The initial template is the elementary pattern of the first positive instance of the concept.

C3. Let (u_1, ..., u_N) be the elementary pattern of the figure f presented on trial n.
(i) If f ∈ C, then for each i such that u_i ⊈ t_{i,n}, t_{i,n+1} becomes F on trial n + 1 with probability e, and remains t_{i,n} with probability 1 − e.
(ii) If f ∈ C and u_i ⊆ t_{i,n}, then t_{i,n+1} remains t_{i,n}.
(iii) If f ∉ C, then t_{i,n+1} = t_{i,n} for all 1 ≤ i ≤ N.
|
|
It is easy to prove that if the probability e of conditioning is not zero, then for a wide range of presentation schedules of figures, a concept C definable in terms of elements of S and invariant under a group G will be learned with probability 1. We hasten to add that somewhat more complicated axioms are needed to give an empirically realistic account of classification responses in the early trials of learning. The most unrealistic aspect of the present axioms is that they predict only one kind of error after the first positive instance of the concept appears, namely, misclassification of positive instances, with classification of negative instances always being correct. Secondly, it is unrealistic to build a completely specific template on the basis of the first positive instance of the concept, as called for in Axiom C2. A closely related, but more complicated scheme calls for the probabilistic construction of a complete template, in terms of noticing or sampling all the elementary properties, over a number of trials. The modifications required also naturally lead to the prediction of errors in classifying negative instances.
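The behavior of the axioms can be illustrated by a small simulation. In the sketch below a figure's elementary pattern is a tuple of value codes, a template is a list whose entries are either a value code or the marker FREE (standing for the whole set F), the first positive instance supplies the initial template, and on each later positive instance every mismatched entry is freed with probability e. The names `FREE`, `matches` and `learn`, the coding of values as integers, and the sample schedule are all illustrative assumptions; guessing responses before the first positive instance are not scored.

```python
import random

FREE = None  # marker for "no restriction", i.e. the whole set F


def matches(pattern, template):
    """True if every entry of the pattern agrees with the template or is unrestricted."""
    return all(t is FREE or u == t for u, t in zip(pattern, template))


def learn(target, figures, e, rng):
    """Apply the sampling, response and conditioning rules to a schedule of figures.

    Returns the final template and the number of misclassifications made
    after the first positive instance.
    """
    template = None
    errors = 0
    for pattern in figures:
        positive = matches(pattern, target)
        if template is None:
            if positive:
                template = list(pattern)   # initial template = first positive instance
            continue
        if matches(pattern, template) != positive:
            errors += 1
        if positive:
            for i, (u, t) in enumerate(zip(pattern, template)):
                # Free each mismatched entry with probability e.
                if t is not FREE and u != t and rng.random() < e:
                    template[i] = FREE
        # Negative instances leave the template unchanged.
    return template, errors


# Hypothetical concept: property 0 must have value 0, the other two are irrelevant.
target = (0, FREE, FREE)
figures = [(0, 1, 2), (0, 2, 2), (1, 1, 1), (0, 1, 0)]
print(learn(target, figures, 1.0, random.Random(0)))
```

With e = 1 every mismatch on a positive instance is eliminated at once, so this short schedule already recovers the target template; the only errors are two misclassified positive instances, in line with the one-sided error pattern noted above.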
|
|
Other shortcomings of the theory formulated in the axioms just given are not hard to find, but we shall not pursue the analysis further here. All that we have intended is to illustrate the kind of theory that seems promising for giving an account of learning perceptual invariants. More detailed theoretical developments and the consideration of quantitative empirical data we leave to another time and place.
|
|
|
|
APPENDIX: PROOF OF LAMB'S THEOREM
|
|
We shall attempt to define the terminology, explicitly state the assumptions, and then sketch the arguments needed for a proof of Lamb's Theorem, thus explicating his intuitive presentation.
|
|
1. Let us begin by assuming a fixed coordinate system in Euclidean 3-space and letting O denote the origin. Let the eyeball (in primary position, say) be denoted by E. As in Section 2.2 we make the following assumption:

Assumption 1: E is a rigid body centered at O.

Let T be a sphere centered at O and containing E. We describe the movement of E by the movement of T. We shall mean by 'external space' that collection of points in our fixed 3-space which is external to T and within a fixed angular distance of the primary point A. We shall in general denote points of external space by upper-case letters and points of T by lower-case letters, except that the point O lies in both.
|
|
For each point P of external space, there is by Donders' Law a corresponding position T(P) of T. Let us identify T with T(A) and then think of T(P) as the image of T in our fixed 3-space which results from mapping each point b in T into the point b(P) occupied by the point b when T is in the position T(P). In this way we may describe the movements of the sphere T.
|
|
2. It is probably reasonable to assume that there is an idealized center of the retina, say a point c ∈ E ⊆ T. Moreover, there is a point f ∈ E ⊆ T which is the focal point of the visual system. To explicate Lamb's argument carefully, we make the following additional assumptions:

Assumption 2: If P is the point of fixation in external space, then the point P is imaged in the point c of the retina.

Assumption 3: The retinal image of a point Q lies on the line determined by f and Q.

Assumption 4: The points c, f and O are collinear.

Note that by Assumptions 2 and 3, the points c, f and P are collinear when P is the point of fixation. Hence, Assumption 4 is equivalent to the assumption that the point f always lies on (the extension of) the visual axis.

For convenience, we introduce the point a ∈ T which is the point of intersection of the surface of the sphere T with the visual axis OA in this primary position (see Figure 5). Summarizing all of the above remarks, we have

LEMMA 1: If the point P in external space is the point of fixation, then the points c, f, a, O and P are collinear. To use alternative notation, the points c(P), f(P), a(P), O and P are always collinear (see Figure 6).

Remark: To determine the new position T(P) from point P and primary position T(A), note that the new position of axis Oa is completely determined: it must coincide with the line OP. Thus, by Assumption 1, the only remaining degree of freedom is a rotation about the axis OP. This will be crucial in the proof of Lamb's Theorem.
|
|
LEMMA 2: If the point of fixation P lies on the line segment MN in external space, then the image on the retina of line segment MN in this situation lies in the plane O, M, N.

Fig. 5. [Labels: surface of T; points c, O, f, a, A.]

Fig. 6. [Labels: surface of T; surface of T(P); points O, f, c, c(P), P.]

Proof: Since f(P), O and P are collinear by Lemma 1, f(P) lies in the plane O, M, N. Thus, for every point R on MN, the line determined by R and f(P) also lies in this plane. Finally, by Assumption 3, so does the image of R (see Figure 7). Q.E.D.

Fig. 7. [Labels: points M, R, P, N; f(P).]

3. Let us introduce, as in Section 2.2, a spherical surface S centered at O and completely surrounding T. As before, each point of external space corresponds to one point of S under projection from O. A given line segment MN in external space corresponds to a great circle arc M'N' on the spherical surface S (see Figure 8).

Fig. 8. [Labels: S; surface of T; points M, N.]

The crucial result in the proof of Lamb's theorem may now be formulated as follows:
|
|
LEMMA 3: If a straight line segment MN in external space is seen as straight, then as the fixation point moves from M to N along MN, the eye rotates along the corresponding great circle arc M'N'. (More precisely, if R and R' are any two points on MN, then the position T(R') is reached from the position T(R) by a rotation τ about an axis perpendicular to plane O, M, N, and through an angle ROR', as shown in Figure 9.)

Fig. 9. [Labels: points O, M, R, N; R' = τ(R).]

It is this lemma which Lamb sloughs over in one line and which is at the very heart of his result.

Proof: Suppose R and R' are two points on the line segment MN. As the point of fixation moves from R to R' along MN, Lemma 1 implies that the axis Oa rotates in the plane O, M, N as shown in Figure 10. Let B be the point of intersection of the surface of sphere T(R) with a line in the plane O, M, N perpendicular to line Oa(R) (see Figure 11). Suppose B = b(R) for some point b in T. (Thus, the lines Ob and Oa are perpendicular in T.) Using the remark after Lemma 1, the new position T(R') may be found from T(R) by describing the angle θ between the lines Oτ(b(R)) and Ob(R') (see Figure 12).

Fig. 10. [Labels: points O, M, R, N; R' = τ(R); a(R); a(R') = τ(a(R)).]

Fig. 11. [Labels: points O, M, R, N; B = b(R); a(R).]

By Lemma 2, if the point of fixation is R, then the image of a portion of line segment MN centered at R will lie in plane O, M, N = plane O, b(R), a(R) = plane O, b, a of T. If the point of fixation is moved to R', the image of a portion of line segment MN centered at R' will by Lemma 2 again lie in plane O, M, N. But since line MN is seen as straight, it follows by definition that the image will again hit the same elements of the retina and so lie again in plane O, b, a of T.

Thus plane O, τ(b(R)), τ(a(R)) = plane O, M, N = plane O, b(R), a(R) = plane O, b, a of T = plane O, b(R'), a(R'), and hence the angle θ must be 0.

Fig. 12. [Labels: points O, M, N; b(R'); τ(b(R)); a(R') = τ(a(R)).]

Thus, the position T(R') is reached from T(R) by a rotation τ along the great circle arc corresponding to RR'. Q.E.D.

It should be noted that we have not used anywhere near the full strength of the seen-as-straight assumption in this proof. This is why Lamb can say that the proof is really independent of the shape of the retina.

4. The rest of the argument is simple but ingenious. Here, we follow Lamb closely. Since a straight line in external space corresponds to a great circle arc on S, a triangle PQR in external space corresponds to a spherical triangle P'Q'R'. By Donders' Law, as we scan the perimeter of the triangle starting with point P (and eyeball in position E(P) or T(P)) and eventually return to P, the eye returns to its original position T(P). Suppose now that each straight line in external space is seen as straight. Then by Lemma 3, as the eye scans the perimeter of the triangle starting with point P, the eyeball rotates in order along the great circle arcs P'Q', Q'R', R'P'. But by a classical theorem of Hamilton [7], the resultant motion of the eye is a rotation about the line OP' through an angle equal to the spherical excess of the triangle P'Q'R'. This in general is not the same as returning to the original position. We conclude that not every straight line in external space can be seen as straight. Q.E.D.
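The theorem of Hamilton invoked here can be checked numerically in a simple case. The sketch below composes the three great-circle rotations for the spherical triangle whose vertices lie on the coordinate axes, which has three right angles and hence spherical excess π/2, and compares the result with a rotation of π/2 about the line through O and the first vertex. The choice of triangle and the helper names `rotation`, `arc_rotation`, `matmul`, `cross` and `dot` are our own illustrative assumptions.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotation(axis, angle):
    """Rotation matrix about a unit axis (Rodrigues' formula)."""
    x, y, z = axis
    c, s = math.cos(angle), math.sin(angle)
    k = 1.0 - c
    return [[c + x * x * k, x * y * k - z * s, x * z * k + y * s],
            [y * x * k + z * s, c + y * y * k, y * z * k - x * s],
            [z * x * k - y * s, z * y * k + x * s, c + z * z * k]]


def matmul(a, b):
    return [[sum(a[i][m] * b[m][j] for m in range(3)) for j in range(3)]
            for i in range(3)]


def arc_rotation(p, q):
    """Rotation carrying unit vector p to unit vector q along their great circle."""
    n = cross(p, q)
    norm = math.sqrt(dot(n, n))
    axis = tuple(c / norm for c in n)
    return rotation(axis, math.atan2(norm, dot(p, q)))


# Vertices P', Q', R' of the octant triangle on the unit sphere.
P, Q, R = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)

# Rotate along P'Q', then Q'R', then R'P' (the rightmost factor acts first).
total = matmul(arc_rotation(R, P), matmul(arc_rotation(Q, R), arc_rotation(P, Q)))

# Rotation about OP' through the spherical excess pi/2.
expected = rotation(P, math.pi / 2)
print(total)
```

In this case the three rotations bring the fixation direction back to P' but leave the sphere turned a quarter turn about OP', so the eye has not returned to its original position.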
|
|
Stanford University
|
|
BIBLIOGRAPHY
|
|
[1] F. Attneave, 'Some Informational Aspects of Visual Perception', Psychological Review 61 (1954) 183-193.
[2] R. Beals, D. H. Krantz, and A. Tversky, 'Foundations of Multidimensional Scaling: Metric Models Based on Ordering of Pairs', Psychometric Society Meetings, 1966.
[3] A. A. Blank, 'Axiomatics of Binocular Vision: The Foundations of Metric Geometry in Relation to Space Perception', Journal of the Optical Society of America 48 (1958) 328-333.
[4] A. A. Blank, 'Analysis of Experiments in Binocular Space Perception', Journal of the Optical Society of America 48 (1958) 911-925.
[5] D. Fender, 'Control Mechanisms of the Eye', Scientific American 211 (1964) 24-32.
[6] D. Fender, 'The Eye Movement Control System: Evolution of a Model', in Neural Theory and Modelling, Proceedings of the 1962 Ojai Symposium (ed. by R. F. Reiss), Stanford University Press, Stanford, Calif., 1964, pp. 306-324.
[7] W. Hamilton, Lectures on Quaternions, 1853.
[8] D. O. Hebb, The Organization of Behavior, Wiley, New York, 1949.
[9] H. Helmholtz, Wissenschaftliche Abhandlungen, Vol. 2, J. A. Barth, Leipzig, 1883.
[10] H. Helmholtz, Handbuch der physiologischen Optik, L. Voss, Hamburg and Leipzig, 1896. (English translation: Treatise on Physiological Optics (ed. by J. P. C. Southall), The Optical Society of America, Menasha, Wisc., 1925.)
[11] J. E. Hochberg, Perception, Prentice Hall, Englewood Cliffs, N. J., 1964.
[12] H. Lamb, 'The Kinematics of the Eye', Philosophical Magazine 38 (1919) 685-695.
[13] R. K. Luneburg, 'The Metric of Binocular Visual Space', Journal of the Optical Society of America 40 (1950) 627-642.
[14] W. H. Pitts and W. S. McCulloch, 'How We Know Universals - the Perception of Auditory and Visual Forms', Bulletin of Mathematical Biophysics 9 (1947) 124-147.
[15] J. R. Platt, 'How We See Straight Lines', Scientific American 202 (1960) 121-129.
[16] R. M. Pritchard, 'Stabilized Images on the Retina', Scientific American 204 (1961) 72-78.
[17] P. Suppes and R. C. Atkinson, Markov Learning Models for Multiperson Interactions, Stanford University Press, Stanford, Calif., 1960.
[18] E. C. Zeeman, 'The Topology of the Brain and Visual Perception', in The Topology of Three Manifolds (ed. by M. K. Fort), Prentice Hall, Englewood Cliffs, N. J., 1962, pp. 240-256.
|
|
REFERENCES
|
|
1 This is an assumption which should be studied more closely if the kinematic restrictions are weakened.
2 This notion is more precisely defined below.
3 Some such assumption is necessary if we are to make any sense out of our visual sensations. For, if the eye can fixate on the same scene in different positions, then different retinal images result.
4 It should be noted that the relation between E(P) and E(Q), which can be derived from Listing's Law, is not quite so simple if Q ≠ A.
5 We are using 'group' in the technical algebraic sense. It might be more reasonable to think of a small set of allowable transformations and G as the group generated by them.
6 Cf., for example, the discussion in Pitts & McCulloch [14].
7 This is probably a little too simple-minded, for size constancy is observed only out to a reasonable distance from the observer.
8 For example, the target may be fitted directly to the eye by means of a contact lens mounted with a tiny projector. See Pritchard [16].
9 [0, 1] is the set of all real numbers x with 0 ≤ x ≤ 1.
10 For simplicity we shall in the following discussion disregard rates of firing and identify firing with firing at (or near) full strength. If this simplification is dropped, the discussion is easily modified.
|
|
|
|
|
|
|