slide 1
Modelling foveated vision in Matlab 

To study algorithms for an active foveate vision system, functions and
scripts have been developed in the Matlab language; they can also be
used with GNU Octave.

At any time, our eyes can see only a small part of the viewing field
in high detail, namely the 5 degrees mapped onto the fovea. We change
our fixation target about three times per second, mostly without
noticing it.  These eye movements are called saccades.  Subjectively,
the part of the world of which we have a detailed and veridical view
seems to be a lot wider.  When we try to imagine a concrete algorithm
for accumulating details from foveate fixations, a task called saccadic
integration, things become really hard.

slide 2
Fovea example

The slide shows a simple example of a fovea on a hexagonal grid with
an inner, almost uniformly sampled, hexagon of side length 6, and an
outer grid in log-polar form with 6 circles of 36 positions each. For
the given numbers, this is the grid with the smallest number of
disruptions, namely the 6 corners of the inner hexagon. All other inner
points have exactly 6 neighbours, and each of them lies at the mean of
its adjacent positions.  The examples presented later use a side
length of 20 and 64 outer circles, intended to map a viewing field
of about 10 degrees, for Cartesian images of size 512x512.  Thus,
the original 262,144 positions are reduced to 8941.
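
The position counts quoted above can be checked with a little arithmetic.
The sketch below assumes (this is not stated in the talk) that an inner
hexagon of side length n consists of a centre plus n hexagonal rings of
6k points each, and that every outer circle carries 6n positions, as the
36 positions per circle for n = 6 suggest. The function name is made up
for this illustration, and Python is used only to keep the example
self-contained; the original functions are written in Matlab.

```python
def foveate_position_count(side, circles):
    """Number of grid positions under the assumptions stated above."""
    inner = 1 + sum(6 * k for k in range(1, side + 1))  # centre + hexagonal rings
    outer = circles * 6 * side                          # log-polar circles
    return inner + outer

print(foveate_position_count(6, 6))     # small example grid of slide 2
print(foveate_position_count(20, 64))   # grid used for the 512x512 examples
print(512 * 512)                        # positions in the Cartesian image
```

Under these assumptions the second call gives 8941, the number quoted
for the examples, which supports this reading of the grid layout.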

slide 3
Essential questions

Some tasks in processing foveate images are hard even before we try
to put fixations together. Finding contours and lines in an image
is solved by an approximation of the Laplace operator, which gives,
in David Marr's terms, a primal sketch of the content of the image.
The detection of common geometric primitives such as straight lines,
circles, right angles and parallels is not possible without knowing how
the image content transforms under the movements that define these
primitives in the Euclidean geometry of the physical world.  If a Fourier
transform were available, these features would be detectable as patterns
in the Fourier coefficients at fixed places. Similarly, lengths could be
compared by means of the wavelength in the frequency domain.  And lastly,
though this is not implemented yet, the algorithms should be extensible
to images containing motion.

slide 4
Positions in eye space and perception space

The subjective perception space should contain many more positions than
the eye space, so that there is room to accumulate the details from
several fixations. This perception space is probably located in a region
of the brain cortex called V1, and, like the eye space, it is organised
in an almost log-polar way. The fact that it is split between the two
brain halves makes it possible to map the halves of the perception space
to the V1 surface in an area-preserving manner.  In addition to luminance
and colour, the brain maintains the distance variable, making the
image surface a relief, two-and-a-half-dimensional in Marr's terms.

Complex numbers are used for the positions, because the additional
operations available in the complex number field have a useful
interpretation in image processing. If a complex number is considered as
a vector in the plane having a magnitude and an angle to the real axis,
the multiplication of positions becomes scaling with the magnitude and
rotation by that angle.
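
As a small illustration of this interpretation, the following Python
sketch (the values are arbitrary, and the talk's own code is in Matlab)
multiplies a position by a complex factor and checks that the result is
scaled by the factor's magnitude and rotated by its angle.

```python
import cmath
import math

position = 2 + 1j                                  # a grid position in the plane
factor = cmath.rect(0.5, math.pi / 2)              # magnitude 0.5, angle 90 degrees
moved = position * factor

print(abs(moved) / abs(position))                  # scaling by the magnitude
print(cmath.phase(moved) - cmath.phase(position))  # rotation by the angle
```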

For simplicity, the examples make no distinction between the
perception space and the original Cartesian image.

slide 5
Foveate graph

The foveate grid is constructed by a custom function named "hexnet".
It returns a vector of complex numbers for the positions, and a sparse
matrix denoting the adjacency relations of the foveate graph. Using sparse
matrices wherever possible is essential for implementing these
algorithms on the computer. But sparseness is also present in physiology,
because any cell can interact directly with only a few other cells.
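
The following Python sketch is not the "hexnet" function; it only
illustrates the kind of output described above for the uniformly sampled
inner hexagon, without the log-polar circles. The name hex_patch, the
axial coordinates and the dictionary of neighbour sets (standing in for
a sparse adjacency matrix) are assumptions made for this example.

```python
import cmath

def hex_patch(n):
    """Complex positions and neighbour sets of a hexagon with n rings."""
    w = cmath.exp(1j * cmath.pi / 3)               # second lattice direction
    coords = [(q, r) for q in range(-n, n + 1)
              for r in range(-n, n + 1) if abs(q + r) <= n]
    index = {c: k for k, c in enumerate(coords)}
    positions = [q + r * w for q, r in coords]
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]
    adjacency = {k: set() for k in range(len(coords))}
    for (q, r), k in index.items():
        for dq, dr in steps:
            j = index.get((q + dq, r + dr))
            if j is not None:                      # neighbour lies inside the patch
                adjacency[k].add(j)
    return positions, adjacency

positions, adjacency = hex_patch(6)
inner = [k for k, nb in adjacency.items() if len(nb) == 6]
# Largest distance between an inner position and the mean of its neighbours.
worst = max(abs(positions[k] - sum(positions[j] for j in adjacency[k]) / 6)
            for k in inner)
print(len(positions), len(inner), worst)
```

Each node stores only its handful of neighbours, which is the sparseness
the text refers to. In this isolated patch the boundary nodes have fewer
than 6 neighbours, because the outer circles are left out.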

slide 6
Nyquist-Shannon sampling theorem

Operators are needed to interpolate and downsample images between the
three spaces. Matlab provides built-in functions for this purpose,
but they have a deficiency, which becomes apparent in a simple
interpolation situation: 1-dimensional uniform samples.

For this situation, the Nyquist-Shannon sampling theorem gives an explicit
frequency-bandlimited interpolation on the whole real line.
The interpolation is a sum of weighted, translated copies of the cardinal
sine (sinc) function. This solution has properties
not present in local interpolation strategies: interpolated values
may fall outside the input range, either because of the negative values
of the sinc function or because contributions from distant samples add up.
For high-frequency input, when the input samples change sign often, effects
may reach over a long distance, like the effect found in the Hermann grid,
if the straightness of the lanes is considered a sort of high frequency.
For 2-dimensional images, a circularly symmetric interpolation function is
wanted. It is given by the FT of the unit disk.  Its radial part is the
quotient of the Bessel function J1 and its argument.  It may be named "jinc".
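
The 1-dimensional interpolation sum described above can be tried out in
a few lines. The Python sketch below uses a finite list of samples and
treats all samples outside the list as zero, which is an assumption of
the example; the theorem itself concerns the whole real line. It shows
that the interpolated values leave the input range near a step.

```python
import math

def sinc(x):
    """Cardinal sine sin(pi x) / (pi x), with the limit value 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def interpolate(samples, t):
    """Band-limited interpolation of samples taken at t = 0, 1, 2, ..."""
    return sum(f * sinc(t - k) for k, f in enumerate(samples))

step = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]    # all inputs lie in the range [0, 1]
for t in (1.5, 3.0, 3.5):
    print(t, interpolate(step, t))
```

At the sample point t = 3 the input value is reproduced, while the value
at t = 1.5 is negative and the value at t = 3.5 exceeds 1.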

slide 7
Foveate example

The interpolation operators in this talk are based on the jinc function.  On the
left side of the slide, the interpolation of the foveated Lena test image
is shown. The effect of foveation is seen better in the Laplacian on
the right, as it separates and enhances the interesting features.
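
For readers who want to experiment with the jinc function, here is a
minimal Python sketch. It evaluates J1 numerically from Bessel's
integral so that the example needs no extra library (in Matlab or
Octave, besselj(1, r) would serve instead). The normalisation J1(r)/r is
an assumption; some authors include a factor of 2, and the talk's
interpolation operators themselves are not reproduced here.

```python
import math

def bessel_j1(x, n=200):
    """J1(x) from Bessel's integral (1/pi) * int_0^pi cos(tau - x sin tau) dtau,
    evaluated with the midpoint rule on n intervals."""
    h = math.pi / n
    total = sum(math.cos(tau - x * math.sin(tau))
                for tau in (h * (k + 0.5) for k in range(n)))
    return total / n

def jinc(r):
    """Radial profile J1(r) / r, with the limit 1/2 at r = 0."""
    if abs(r) < 1e-4:
        return 0.5 - r * r / 16.0      # leading terms of the power series
    return bessel_j1(r) / r

for r in (0.0, 1.0, 3.8317059702075125, 5.0):
    print(r, jinc(r))
```

The third argument is the first positive zero of J1; beyond it the
profile becomes negative, which is where interpolated values can drop
below the input range.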

slide 8
Discrete Laplace operator

The Laplace operator itself is an infinitesimal operator, so a discrete
approximation is used on our spaces, which is only a low-frequency
approximation of the real Laplacian. In each space, the difference between
a suitable multiple of the adjacency matrix and the identity matrix is
used, scaled such that its value on quadratic forms is the same
in each space. On constant and linear image regions the Laplacian is zero.
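
A minimal Python sketch of such an operator on a uniform hexagonal
patch follows. It is not the talk's implementation: the helper
hex_patch, the restriction to nodes with 6 neighbours, and the scale
factor 4 (chosen so that, for unit grid spacing, the result on
x^2 + y^2 equals the continuous value 4) are assumptions of this
example.

```python
import cmath

def hex_patch(n):
    """Complex positions and neighbour sets of a hexagon with n rings."""
    w = cmath.exp(1j * cmath.pi / 3)
    coords = [(q, r) for q in range(-n, n + 1)
              for r in range(-n, n + 1) if abs(q + r) <= n]
    index = {c: k for k, c in enumerate(coords)}
    positions = [q + r * w for q, r in coords]
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]
    adjacency = {k: set() for k in range(len(coords))}
    for (q, r), k in index.items():
        for dq, dr in steps:
            j = index.get((q + dq, r + dr))
            if j is not None:
                adjacency[k].add(j)
    return positions, adjacency

def laplacian(values, adjacency, scale=4.0):
    """scale * (A / 6 - I) applied to values, at nodes with 6 neighbours."""
    return {k: scale * (sum(values[j] for j in nb) / 6 - values[k])
            for k, nb in adjacency.items() if len(nb) == 6}

positions, adjacency = hex_patch(4)
linear = [3.0 + 2.0 * z.real - 5.0 * z.imag for z in positions]
quadratic = [abs(z) ** 2 for z in positions]

print(max(abs(v) for v in laplacian(linear, adjacency).values()))
print(set(round(v, 9) for v in laplacian(quadratic, adjacency).values()))
```

The linear image gives zero up to rounding, and the quadratic form gives
the same value at every inner node, which is the property the scaling
across the spaces has to preserve.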

slide 9
Fourier transform on a foveate grid

Comparing the Fourier transforms of the original image and of its
foveation shows that the FT is itself foveate: rapid changes are present
only in the centre, where the low frequencies reside. On the computer,
we can get the FT in a foveate space by interpolating to Cartesian,
applying the 2-dimensional FFT there, and transforming back again. In a
physiological system, this is not a feasible operation, just as the direct
application of a Fourier matrix is not.

slide 10
Saccadic integration

A concrete algorithm for saccadic integration can now be formulated.
In the pair of Laplacian images on this slide, the fixation point has moved
from one eye of Lena to a point near the other.  First, the new state
is predicted by anticipating, in perception space, the translation
caused by the saccade.  If the image content is moving, the changes
caused by the motion have to be predicted as well. Then the prediction is
foveated down to eye space, where it is subtracted from the new fixation
input. The difference is interpolated back to perception space and added
there. The details from the previous fixations will be
preserved as long as the perception space can store them.
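
The loop described above (predict, foveate, subtract, interpolate, add)
can be sketched on a 1-dimensional toy model. Everything in the
following Python example is an assumption made for illustration: the
"eye" samples a small fovea exactly and averages the periphery in
blocks, the interpolation spreads each sample over its block, and the
image content is static. The talk's own operators are based on the jinc
function and work on the foveate grid.

```python
C, B, NB = 2, 5, 3          # fovea half-width, peripheral block size, blocks per side
H = C + NB * B              # half-width of the gaze-centred perception space
M = 2 * H + 1               # number of perception positions

def cells():
    """Index groups (into a length-M array) that each form one eye-space sample."""
    groups = []
    for j in range(NB, 0, -1):                      # left periphery, outermost first
        start = H - C - j * B
        groups.append(list(range(start, start + B)))
    groups += [[H + i] for i in range(-C, C + 1)]   # fovea: one sample per position
    for j in range(NB):                             # right periphery
        start = H + C + 1 + j * B
        groups.append(list(range(start, start + B)))
    return groups

CELLS = cells()

def foveate(p):
    """Perception space -> eye space: average each cell."""
    return [sum(p[i] for i in g) / len(g) for g in CELLS]

def interpolate(e):
    """Eye space -> perception space: spread each sample over its cell."""
    p = [0.0] * M
    for value, g in zip(e, CELLS):
        for i in g:
            p[i] = value
    return p

def shift(p, s):
    """Predict the percept after a saccade of s positions to the right."""
    return [p[i + s] if 0 <= i + s < M else 0.0 for i in range(M)]

def integrate(p, eye_input, s):
    """One fixation: predict, foveate, subtract, interpolate back, add."""
    predicted = shift(p, s)
    residual = [a - b for a, b in zip(eye_input, foveate(predicted))]
    return [a + b for a, b in zip(predicted, interpolate(residual))]

def look(world, fix):
    """Eye input for a fixation at world index fix."""
    return foveate(world[fix - H: fix + H + 1])

world = [float((7 * k * k) % 23) for k in range(60)]
p = integrate([0.0] * M, look(world, 20), 0)   # first fixation
p = integrate(p, look(world, 25), 5)           # saccade of 5 positions
print(p[H - 7: H + 3])                         # old and new foveal details
```

The saccade length is chosen so that the old fovea coincides with one
peripheral block of the new fixation; in that case the residual for the
block vanishes and the old details survive unchanged. With other
alignments this simple interpolation spreads a correction over the whole
block.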

slide 11
Stereographic projection and Riemann sphere

Complex numbers are a perfect tool for dealing with the eye data. The
central perspective map from the world to a plane resembles the
construction of the Riemann sphere for the complex projective line (which
is 2-dimensional when viewed as a real manifold).  Any transformation
caused by a Euclidean transformation of the plane relative to the Riemann
sphere can be described by a simple formula in complex numbers, called a
Möbius transformation. It is the quotient of two affine-linear terms in one
complex variable. In a series of papers on the "conformal camera",
J. Turski has shown that Fourier transforms are possible in this setting,
and he has published results for several situations in active vision.
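
The Möbius formula itself is easy to try out. The Python sketch below is
generic and not taken from the talk or from Turski's conformal camera;
the function names and example coefficients are made up. It also shows
the standard fact that chaining such maps corresponds to multiplying
their 2x2 coefficient matrices.

```python
def mobius(z, a, b, c, d):
    """Moebius transformation z -> (a z + b) / (c z + d), with a d - b c != 0."""
    return (a * z + b) / (c * z + d)

def compose(m1, m2):
    """Coefficients of z -> m1(m2(z)): the 2x2 matrix product of the coefficients."""
    a1, b1, c1, d1 = m1
    a2, b2, c2, d2 = m2
    return (a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2)

rotation = (1j, 0, 0, 1)          # z -> i z
shift = (1, 2 - 1j, 0, 1)         # z -> z + (2 - i)
inversion = (0, 1, 1, 0)          # z -> 1 / z

z = 0.3 + 0.4j
step_by_step = mobius(mobius(mobius(z, *shift), *rotation), *inversion)
combined = compose(inversion, compose(rotation, shift))
print(step_by_step, mobius(z, *combined))
```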

slide 12
Schrödinger equation of the harmonic oscillator

In quantum mechanics, the Schrödinger equation for the harmonic
oscillator has the property that the initial wave function Psi is
transformed into its FT after a time of pi/2, and after another pi/2 it is
the original again, but reversed in space.  On the right side of the equation,
only local operators, representable by sparse matrices, appear. Therefore
the equation can be solved approximately in small time steps (about 5000
for 128x128 images). The fact that the original image reappears after the
same time can be used for iterative refinement of the approximate
solution by negative feedback.
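
The equation shown on the slide is not reproduced in these notes. One
common normalisation (unit mass, unit frequency and hbar = 1, all
assumed here) reads as follows for an image in n dimensions, with F the
unitary Fourier transform using the kernel exp(-i x.xi). In this
normalisation the statements above hold up to constant phase factors.

```latex
i \frac{\partial \psi}{\partial t}
  = \frac{1}{2} \left( -\Delta + |x|^2 \right) \psi ,
\qquad
\psi(\cdot, \tfrac{\pi}{2}) = e^{-i \pi n / 4} \, \mathcal{F} \psi(\cdot, 0) ,
\qquad
\psi(x, \pi) = e^{-i \pi n / 2} \, \psi(-x, 0)
```

Both terms on the right side of the equation, the Laplacian and the
multiplication by |x|^2, are local operators, which is why sparse
matrices suffice for the time stepping.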

slide 13
Conclusions

  1. FT is needed for active vision.

  2. FT is feasible under the locality restriction.

  3. There are indeed signs that nonlocal (FT-like) effects
are present in low-level vision (scintillating version of the Hermann
grid).

  4. Movements are the link between the groups of the physical world
and the transformations in the image plane, especially the small ones:
microsaccades.

  5. The functions and scripts are available on the web site
"http://pkeus.de/dokuwiki/doku.php?id=en:foveate".

References:
Kloke, W.B. (2009). Microsaccades may establish the geometric space in visual perception. European Conference on Eye Movements (p. 147). Southampton.
Turski, J. (2004). Geometric Fourier Analysis of the Conformal Camera for Active Vision. SIAM Review, 46(2), 230-255.
Marr, D. (1982). Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. W.H. Freeman and Company. ISBN 0-7167-1284-9.