slide 1 Modelling foveated vision in Matlab
In order to study algorithms for an active foveate vision system, functions and scripts have been developed in the Matlab language; they can also be used on GNU Octave. At any time, our eyes can see only a small part of the viewing field in high detail, namely the 5 degrees mapped onto the fovea. We change our fixation target about three times per second, mostly without noticing it. These eye movements are called saccades. Subjectively, the part of the world of which we get a detailed and veridical view seems to be a lot wider. But when we try to imagine a concrete algorithm for accumulating the details from foveate fixations, a task called saccadic integration, things become really hard.

slide 2 Fovea example
The slide shows a simple example of a fovea on a hexagonal grid with an inner, almost uniformly sampled hexagon of side length 6, and an outer grid in log-polar form with 6 circles of 36 positions each. For the given numbers, this is the grid with the fewest disruptions, namely the 6 corners of the inner hexagon. All other inner points have exactly 6 neighbours, and each inner position is the mean of its adjacent positions. The examples presented later use a side length of 20 and 64 outer circles, intended to map a viewing field of about 10 degrees, for Cartesian images of size 512x512. Thus, the original 262,144 positions are reduced to 8941.

slide 3 Essential questions
Some tasks in processing foveate images are hard even before we try to put fixations together. Finding contours and lines in an image is solved by an approximation of the Laplace operator, which gives, in David Marr's terms, a primal sketch of the content of the image.
The detection of common geometric primitives such as straight lines, circles, right angles and parallels is not possible without knowing how the image content transforms under the movements that define these primitives in the Euclidean geometry of the physical world. If a Fourier transform were available, these features would be detectable as patterns in the Fourier coefficients at fixed places. Similarly, lengths could be compared by means of the wavelength in the frequency domain. And lastly, though this is not implemented yet, the algorithms should be extensible to images containing motion.

slide 4 Positions in eye space and perception space
The subjective perception space should contain many more positions than the eye space, so that there is room to accumulate the details from several fixations. This perception space is probably located in a region of the cerebral cortex called V1, and like the eye space it has an almost log-polar organisation. The fact that it is split between the two halves of the brain makes it possible to map the halves of the perception space to the V1 surface in an area-preserving manner. In addition to luminance and colour, the brain maintains a distance variable, which makes the image surface a relief, two-and-a-half-dimensional in Marr's terms. Complex numbers are used for the positions, because the additional operations available in the complex number field have a useful interpretation in image processing. If a complex number is considered as a vector in the plane with a magnitude and an angle to the real axis, multiplying positions by it scales them by the magnitude and rotates them by that angle. For simplicity, the examples do not distinguish between the perception space and the original Cartesian image.

slide 5 Foveate graph
The foveate grid is constructed by a custom function named "hexnet". It returns a vector of complex numbers for the positions, and a sparse matrix denoting the adjacency relations of the foveate graph.
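The kind of output described for "hexnet" can be illustrated by a rough sketch. It is written in Python rather than Matlab, it is not the author's function, and all names in it (hex_patch, total_points, STEPS) are made up for the illustration. It builds only the inner hexagon, with complex positions and neighbour lists as a stand-in for the sparse adjacency matrix. It assumes that "side length n" means n lattice steps from the centre to a corner and that every outer circle holds 6*n positions; under these assumptions the totals agree with the numbers quoted on slide 2.

```python
import cmath
import math

# Axial steps to the six neighbours on a hexagonal lattice.
STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def hex_patch(n):
    """Positions and neighbour lists of a hexagon with n steps per side."""
    w = cmath.exp(1j * math.pi / 3)  # second lattice direction, 60 degrees from 1
    cells = [(a, b)
             for a in range(-n, n + 1)
             for b in range(-n, n + 1)
             if abs(a + b) <= n]
    index = {cell: i for i, cell in enumerate(cells)}
    positions = [a + b * w for a, b in cells]  # complex numbers as positions
    # Sparse adjacency: only the existing neighbours of each point are stored.
    neighbours = [[index[(a + da, b + db)]
                   for da, db in STEPS if (a + da, b + db) in index]
                  for a, b in cells]
    return positions, neighbours


def total_points(n, rings):
    """Inner hexagon plus `rings` outer circles of 6*n positions (assumed layout)."""
    return 3 * n * (n + 1) + 1 + rings * 6 * n


positions, neighbours = hex_patch(6)
centre = positions.index(0)
print(len(positions), len(neighbours[centre]))    # 127 6
print(total_points(6, 6), total_points(20, 64))   # 343 8941
```

In this sketch the points on the rim of the hexagon have fewer than 6 neighbours, because the outer log-polar circles that would complete them are left out.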
Using sparse matrices wherever possible is essential for implementing these algorithms on the computer. But sparseness is also present in physiology, because any cell can interact directly with only a few other cells.

slide 6 Nyquist-Shannon sampling theorem
Operators are needed to interpolate and downsample images between the 3 spaces. Matlab has built-in functions for this purpose, but they have a deficiency that becomes apparent in a simple interpolation situation: 1-dimensional uniform samples. For this situation, a frequency-bandlimited interpolation over the whole real line is known explicitly from the Nyquist-Shannon sampling theorem. The interpolation is a sum of weighted, translated copies of the cardinal sine (sinc) function. This solution has properties not present in local interpolation strategies: interpolated values may fall outside the input range, either because the interpolation function takes negative values or because contributions from distant samples add up. In the case of high-frequency input, when the input samples change sign often, effects may reach over a long distance, like the effect found in the Hermann grid if the straightness of the lanes is regarded as a sort of high frequency. For 2-dimensional images, a circularly symmetric interpolation function is wanted. It is given by the FT of the unit disk. Its radial part is the quotient of the Bessel function J1 and its argument. It may be named "jinc".

slide 7 Foveate example
The interpolation operators in this talk are based on the jinc function. The left side of the slide shows the interpolation of the foveated Lena test image. The effect of foveation is seen better in the Laplacian on the right, as it separates and enhances the interesting features.

slide 8 Discrete Laplace operator
The Laplace operator itself is an infinitesimal operator, so a discrete approximation is used on our spaces, which agrees with the real Laplacian only at low frequencies.
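What "low frequency approximation" means can be tried out on a single point of a hexagonal grid. The following is a minimal sketch in Python, not the Matlab code of the talk: it takes the mean over the six neighbours at unit distance, subtracts the value at the point itself, and multiplies by 4. The factor 4 is an assumption chosen here so that the result matches the continuous Laplacian on quadratic functions for grid spacing 1; the names hex_laplacian and STEPS are illustrative only.

```python
import cmath
import math

# The six unit steps to the neighbours on a hexagonal grid with spacing 1.
STEPS = [cmath.exp(1j * math.pi * k / 3) for k in range(6)]


def hex_laplacian(f, z):
    """Scaled graph Laplacian of f at z: 4 * (mean over neighbours - f(z))."""
    mean = sum(f(z + d) for d in STEPS) / 6
    return 4 * (mean - f(z))


def constant(z):
    return 3.0


def linear(z):
    return 2.0 * z.real - 0.5 * z.imag + 1.0


def quadratic(z):
    return z.real ** 2 + z.imag ** 2  # continuous Laplacian is 4 everywhere


def wave(k):
    # cos(k*x); its continuous Laplacian at the origin is -k**2
    return lambda z: math.cos(k * z.real)


z0 = 0.3 + 0.7j
print(hex_laplacian(constant, z0), hex_laplacian(linear, z0))  # both close to 0
print(hex_laplacian(quadratic, z0))                            # close to 4
print(hex_laplacian(wave(0.1), 0j), -0.1 ** 2)                 # nearly equal
print(hex_laplacian(wave(3.0), 0j), -3.0 ** 2)                 # clearly different
```

For the slowly varying wave the discrete and the continuous value almost coincide, while for the rapidly varying one they differ substantially.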
In each space, the approximation is the difference between a suitable multiple of the adjacency matrix and the identity matrix, scaled such that its value on quadratic forms is the same in every space. On constant and linear image regions the Laplacian is zero.

slide 9 Fourier transform on a foveate grid
Comparing the Fourier transforms of the original image and of its foveation shows that the FT is itself foveate. Rapid changes are present only in the centre, where the low frequencies reside. On the computer, we can get the FT in a foveate space by interpolating to the Cartesian image, applying the 2-dimensional FFT there, and transforming back again. In a physiological system, this is not a feasible operation, just as the direct application of a Fourier matrix is not.

slide 10 Saccadic integration
A concrete algorithm for saccadic integration can now be formulated. In the pair of Laplacian images on this slide, the fixation point has moved from one eye of Lena to a point near the other. First, the new state is predicted by anticipating, in perception space, the translation caused by the saccade. In the case of moving image content, the changes caused by the motion have to be predicted as well. Then the prediction is foveated down to eye space, where it is subtracted from the new fixation input. When the difference is interpolated back to perception space, it can be added there. The details from the previous fixations are preserved as long as the perception space can store them.

slide 11 Stereographic projection and Riemann sphere
Complex numbers are a perfect tool for dealing with the eye data. The central perspective map from the world to a plane resembles the construction of the Riemann sphere for the complex projective line (which is 2-dimensional when viewed as a real manifold). Any transformation caused by a Euclidean transformation of the plane relative to the Riemann sphere can be described by a simple formula in complex numbers, called a Möbius transformation.
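Such a formula can be tried out directly with complex arithmetic. The sketch below is in Python, not the Matlab of the talk, and uses the standard form z -> (a*z + b)/(c*z + d) with a*d - b*c nonzero. The function names mobius and compose and the example coefficients are chosen for this illustration only and do not model the eye or the conformal camera. It also shows that applying two such maps in sequence gives again a map of the same form, whose coefficients follow from a 2x2 matrix product.

```python
import cmath


def mobius(z, a, b, c, d):
    """Apply z -> (a*z + b) / (c*z + d); requires a*d - b*c != 0."""
    if a * d - b * c == 0:
        raise ValueError("degenerate coefficients: a*d - b*c must be nonzero")
    # Raises ZeroDivisionError at the pole, where c*z + d == 0.
    return (a * z + b) / (c * z + d)


def compose(m, n):
    """Coefficients of 'apply n first, then m' (a 2x2 matrix product)."""
    a, b, c, d = m
    e, f, g, h = n
    return (a * e + b * g, a * f + b * h, c * e + d * g, c * f + d * h)


# Multiplication by a complex number scales and rotates (compare slide 4).
turn = (cmath.rect(2.0, cmath.pi / 2), 0, 0, 1)  # scale by 2, rotate by 90 degrees
shift = (1, 0.5 + 0.5j, 0, 1)                    # translation by 0.5 + 0.5i
invert = (0, 1, 1, 0)                            # z -> 1/z

z = 1 + 1j
step_by_step = mobius(mobius(z, *shift), *invert)
in_one_go = mobius(z, *compose(invert, shift))
print(mobius(1, *turn))         # approximately 2j
print(step_by_step, in_one_go)  # the same value up to rounding
```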
A Möbius transformation is the quotient of 2 affine linear terms in one complex variable. In a series of publications on the "conformal camera", J. Turski has shown that Fourier transforms are possible in this setting, and he has published results for several situations in active vision.

slide 12 Schrödinger Equation of the Harmonic Oscillator
In quantum mechanics, the Schrödinger equation for the harmonic oscillator has the property that the initial wave function Psi is transformed into its FT after a time of pi/2, and after a further pi/2 into the original, but reversed in space. On the right side of the equation, only local operators representable by sparse matrices appear. Therefore the equation can be solved approximately in small time steps (about 5000 for 128x128 images). The fact that the original image reappears after the same time again can be used for iterative refinement of the approximate solution by negative feedback.

slide 13 Conclusions
1. FT is needed for active vision.
2. FT is feasible under the locality restriction.
3. There are indeed signs that nonlocal (FT-like) effects are present in low-order vision (scintillating version of the Hermann grid).
4. Movements are the link between the groups of the physical world and the transformations in the image plane, especially the small ones: microsaccades.
5. The functions and scripts are available on the web site "http://pkeus.de/dokuwiki/doku.php?id=en:foveate".

References:
Kloke, W.B. (2009). Microsaccades may establish the geometric space in visual perception. European Conference on Eye Movements (p. 147). Southampton.
Turski, J. (2004). Geometric Fourier Analysis of the Conformal Camera for Active Vision. SIAM Review, Vol. 46, No. 2, pp. 230-255.
Marr, D. (1982). Vision. A Computational Investigation into the Human Representation and Processing of Visual Information. W.H. Freeman and Company. ISBN 0-7167-1284-9.