Modelling foveated vision in Matlab

This talk is about some functions and scripts in the programming language Matlab (or GNU Octave) which I have prepared over the past few years to study algorithms for an active vision system. At any time, our eyes can see only a small part of the viewing field in high detail, namely the 5 degrees mapped onto the fovea. We change our fixation target about three times per second, mostly without noticing. These eye movements are called saccades. Subjectively, the part of the world in which we get a detailed and veridical view seems a lot larger. When we try to imagine a concrete algorithm for the task of accumulating details from foveate fixations, called saccadic integration, things become really hard.

Here we have a simple example of a fovea on a hexagonal grid, with an inner, almost uniformly sampled hexagon of side length 6, and an outer grid in log-polar form with 6 circles of 36 positions each. For the given numbers, this is the grid with the least number of disruptions, namely the 6 corners of the inner hexagon; all other inner points have exactly 6 neighbours, and each inner position is the mean value of its adjacent positions. The examples I present later use a side length of 20 and 64 outer circles, intended to map a viewing field of about 10 degrees, and Cartesian images of size 512x512.

Some tasks in processing foveate images are hard even before we try to put fixations together. Finding contours and lines in an image is solved by an approximation of the Laplace operator, which gives us, in David Marr's terms, a primal sketch of the image content. The detection of common geometric primitives such as straight lines and circles is not possible without knowing how the image content transforms under the movements that define these primitives in the Euclidean geometry of the physical world.
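The log-polar part of such a grid can be written down directly with complex positions. Here is a minimal NumPy sketch of the idea; the talk's own code is Matlab, and the function name and parameters below are mine, not those of the actual scripts:

```python
import numpy as np

# Hypothetical sketch: complex positions of a log-polar outer grid with
# 6 rings of 36 positions each. The ring radii grow geometrically, so the
# grid looks the same at every scale (the growth factor 1.15 is my choice).
def logpolar_positions(n_rings=6, n_angles=36, r0=6.0, growth=1.15):
    rings = r0 * growth ** np.arange(n_rings)                 # geometric radii
    angles = np.exp(2j * np.pi * np.arange(n_angles) / n_angles)
    return np.outer(rings, angles).ravel()                    # complex positions

z = logpolar_positions()
```

Every position is a single complex number, so a ring is just a real radius times the unit-circle samples.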
If we had a Fourier transform available, these features would be detectable as patterns in the Fourier coefficients at fixed places. Similarly, length comparison comes down to comparing wavelengths in the frequency domain. And lastly, though this is not implemented yet, the algorithms should be extensible to images containing motion.

The subjective perception space should contain many more positions than the eye space, so that there is room to put together the details from several fixations. This perception space is probably sited in a region of the brain cortex called V1, which is organised almost log-polarly, like the eye space. The fact that it is split between the brain halves helps to map the two halves of the perception space onto the V1 surface in an area-preserving manner. On the perception space, in addition to luminance and colour, the brain also maintains a distance variable, making the image surface a relief; David Marr termed this the 2-and-a-half-dimensional sketch.

I use complex numbers for the positions, because the additional operations available in the complex number field have useful interpretations in image processing. If we consider a complex number as a vector in the plane, with a magnitude and an angle to the real axis, then multiplication of positions becomes scaling by that magnitude and rotation by that angle. In the examples, for simplicity, we do not distinguish between the perception space and the original Cartesian image.

The foveate grid is constructed by a custom function, which I named "hexnet". It returns a vector of complex numbers for the positions, and a sparse matrix encoding the adjacency relations of the foveate graph. Using sparse matrices wherever possible is essential for implementing these algorithms on a computer. But sparseness is also present in physiology, because any cell can interact directly with only a few other cells. First of all, operators are needed to interpolate and downsample images between the three spaces.
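The second return value of hexnet, the sparse adjacency matrix, can be illustrated with a toy graph. The following is my own miniature example in NumPy/SciPy, not the hexnet code itself: a single log-polar ring of 36 positions, each linked to its two angular neighbours and to one position on the next ring.

```python
import numpy as np
from scipy import sparse

# My own toy (not the talk's hexnet): one log-polar ring of 36 positions.
# Each position gets one edge to the next position on the ring and one
# outward edge to the corresponding position on the next ring.
n = 36
i = np.arange(n)
rows = np.concatenate([i, i])
cols = np.concatenate([(i + 1) % n, i + n])     # ring edge, outward edge
A = sparse.coo_matrix((np.ones(2 * n), (rows, cols)), shape=(2 * n, 2 * n))
A = (A + A.T).tocsr()                           # make the relation symmetric

deg = np.asarray(A.sum(axis=1)).ravel()         # vertex degrees
```

The matrix stores only the few existing edges, which is the point of the sparse representation: each cell interacts with a handful of neighbours, not with the whole graph.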
Matlab has built-in functions for this purpose, but they have a deficiency, which is seen when we look at a simple interpolation situation: 1-dimensional uniform samples. For this situation, a frequency-band-limited interpolation of the whole real line is known explicitly from the Nyquist-Shannon sampling theorem: the interpolation is a sum of weighted, translated copies of the cardinal sine (sinc) function, here plotted in red for positive arguments. This solution has properties not present in simple interpolation strategies: interpolated values may fall outside the input range, either through the negative lobes or through the addition of distant values. With high-frequency input, when the input samples change sign often, we get effects reaching over a long distance, like those we find in the Hermann grid, if we consider the straightness of the lanes as a sort of high frequency.

For 2-dimensional images, a circularly symmetric interpolation function is wanted. Its radial part is given by the Fourier transform of the unit disk, the quotient of the Bessel function J1 and its argument. I named this function jinc, and it is plotted in blue. My interpolation operators are based on this function.

Next I show an example of the foveation of the Lena test image. The effect of foveation is better seen in the right image, which shows its Laplacian; we see how this operator enhances contours. The Laplace operator itself is an infinitesimal operator, so we have to use a discrete approximation on our spaces, which is mostly a low-frequency approximation of the real Laplacian. In each space I use the difference between a suitable multiple of the adjacency matrix and the identity matrix, scaled so that the value on quadratic forms is the same in every space. Remember that the Laplacian is zero on constant and linear functions.

How does foveation act on the Fourier transform?
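The jinc function is small enough to state directly. Here is a sketch in NumPy/SciPy (the talk's implementation is in Matlab); I take the plain quotient J1(r)/r as normalisation, which tends to 1/2 at the origin because J1 behaves like r/2 there:

```python
import numpy as np
from scipy.special import j1

def jinc(r):
    """Radial profile of the Fourier transform of the unit disk: J1(r)/r.

    Since J1(r) ~ r/2 near zero, the limiting value at the origin is 1/2.
    """
    r = np.asarray(r, dtype=float)
    out = np.full(r.shape, 0.5)          # value at r = 0
    nz = r != 0
    out[nz] = j1(r[nz]) / r[nz]
    return out
```

Like sinc in one dimension, jinc oscillates with negative lobes, so jinc-based interpolation can also overshoot the input range.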
Comparing the transforms of the original image and of its foveation, we see that the Fourier transform is foveate itself: only in the centre, where the low frequencies reside, are rapid changes present. In the computer, we can do the Fourier transform in a foveate space by interpolating to the Cartesian space, using the 2-dimensional FFT there, carrying out any operation - like the multiplication with phase factors for a translation - and transforming back again. In a physiological system this is not a feasible operation, and neither is the direct application of a Fourier matrix, which is possible only for the low dimension of the foveate space. I come back to this point later.

We are now ready to give a concrete algorithm for saccadic integration. In this pair of Laplacian images, the fixation point is moved from one eye of Lena to a point near the other. First, the new state is predicted by anticipating the translation caused by the saccade in perception space. In case of motion in the images, the changes caused by the motion would have to be predicted as well. Then the prediction is foveated down to eye space, where it is subtracted from the new fixation input. When the difference is interpolated back to perception space, it can be added there, and the details from the previous fixations are preserved for as long as the perception space can store them.

Now I come to the reason why complex numbers are a perfect tool for dealing with the eye data. The central perspective map from the world to a plane resembles the construction of the Riemann sphere for the complex projective line (which is 2-dimensional when viewed as a real manifold). Any transformation of the image caused by a Euclidean transformation of the plane relative to the Riemann sphere can be described by a simple formula in complex numbers, a Möbius transformation. Jacek Turski has shown, in a series of papers on the "conformal camera", that Fourier transforms are possible in this setting, and he has published meaningful results for several situations in active vision.
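The predict-subtract-add cycle of saccadic integration can be written down abstractly. In this NumPy sketch, all operator names and the toy spaces are mine, not the talk's code: foveation is modelled as keeping every second sample, interpolation as repeating each sample, and the saccade as a cyclic shift - just enough structure to show the bookkeeping of the algorithm.

```python
import numpy as np

# Toy model (my own, not the talk's code): perception space is an array of
# 8 samples, eye space has 4. foveate() maps perception -> eye space,
# interp() goes back, translate() anticipates the saccade.
def foveate(p):
    return p[::2]

def interp(e):
    return np.repeat(e, 2)

def translate(p, d):
    return np.roll(p, d)

def saccade_update(perception, eye_input, d):
    pred = translate(perception, d)        # 1. predict the post-saccade view
    residual = eye_input - foveate(pred)   # 2. subtract the prediction in eye space
    return pred + interp(residual)         # 3. add the interpolated difference back
```

With these toy operators, foveating the interpolation gives the identity, so at the fovea samples the update reproduces the new eye input exactly, while elsewhere the predicted detail from earlier fixations is retained.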
Now for my last topic: the implementation of a Fourier transform. We take a look into quantum mechanics, where the Schrödinger equation for a simple system - the harmonic oscillator - has the property that the initial wave Psi is transformed into its Fourier transform after a time of pi/2, and after another pi/2 it becomes the original again, but reversed in space. On the right side of the equation we see only local operators, representable by sparse matrices. Therefore the equation can be solved approximately in small time steps - about 5000 of them for 128x128 images. The fact that the original image reappears after the same total time can be used for iterative refinement of the first solution.

Now my conclusions:
1. A Fourier transform is needed for active vision.
2. A Fourier transform is feasible under the locality restriction.
3. There are indeed signs that nonlocal (FT-like) effects are present in low-order vision (the scintillating version of the Hermann grid).
4. Movements are the link between the groups of the physical world and the transformations in the image plane - especially the small ones: microsaccades.
5. The functions and scripts are available on my Dokuwiki web site.
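As a small appendix to the last topic: the time stepping can be sketched with sparse matrices in a 1-dimensional toy. This is my own discretisation (Crank-Nicolson, which keeps every step unitary), not the talk's Matlab code; it checks only norm conservation and the fact that the Gaussian ground state, which is its own Fourier transform, keeps its shape under the evolution.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import splu

# My own 1-D sketch (not the talk's code): Crank-Nicolson stepping of
#   i dPsi/dt = 1/2 (-d^2/dx^2 + x^2) Psi
# using only local (sparse) operators. After a total time of pi/2 the
# state approximates the Fourier transform of the initial wave.
N, L = 128, 10.0
x = np.linspace(-L, L, N)
dx = x[1] - x[0]
lap = sparse.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(N, N)) / dx**2
H = 0.5 * (-lap + sparse.diags(x**2))        # harmonic-oscillator Hamiltonian

steps = 500                                  # the talk uses ~5000 for 128x128 images
dt = (np.pi / 2) / steps
I = sparse.identity(N, format="csc")
lu = splu((I + 0.5j * dt * H).tocsc())       # factor once, reuse every step
rhs = (I - 0.5j * dt * H).tocsc()

psi = np.exp(-x**2 / 2).astype(complex)      # Gaussian: its own Fourier transform
psi /= np.linalg.norm(psi)
psi0 = psi.copy()
for _ in range(steps):
    psi = lu.solve(rhs @ psi)                # one unitary Crank-Nicolson step
```

Each step involves only the sparse tridiagonal-plus-diagonal operator, so the locality restriction of the talk is respected; the global Fourier transform emerges from many local steps.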