Calibration procedure for measuring binocular coordination

Introduction

Eye movements serve to project the object of regard onto the centre of the fovea, since detailed spatial information can be detected only in this region. Under natural viewing conditions, this happens during the short fixation periods between saccadic eye movements. Because the eyes must move to the target, an observer can infer where the subject’s objects of regard are, which objects they are, and how long the subject dwells on each. To obtain this information, researchers use eye trackers to monitor the movements of the eyes.

To link the eye tracker output to locations in the visual world, a process called calibration is needed. For some purposes it is sufficient to rely on the procedures provided by the device manufacturer; for others, especially research on binocular coordination, these may not be fully sufficient. One reason may be the need for monocular calibration. Some researchers regard monocular calibration as mandatory, while others publish results on binocular coordination based on binocular calibration. Whether monocular calibration is necessary or advantageous, and whether the resulting data are affected at all, therefore needs to be evaluated on experimental data. What is clear is that monocular presentation is expensive in terms of calibration time and instrumental effort.

It is the aim of the present study to explore some of the conditions of the calibration process with respect to questions of resolution, reliability and validity, especially with respect to the benefits of monocularity.

Several methods have been developed for tracking eye movements. Today the most widespread devices rely on recording the pupil location with digital video systems. Other methods are based on the location of the reflection of an external light source on the cornea, which has a different curvature from the eyeball; on two of the Purkinje images, which is more elaborate; on the change in luminance reflected from the moving limbus onto a light sensor; on the change of magnetic induction in a coil embedded in a contact lens on the sclera; or on the change in electric potential on the skin caused by the electric dipole moment of the eye. Among these methods, the scleral search coil traditionally has the best reputation for accuracy, although comparative studies have shown that wearing the contact lens may affect the eye movements themselves. From the calibration point of view, this system has the fewest conceptual problems, as the magnetic induction is affected by nothing other than the angle of the coil plane with respect to the reference frame. Some calibration is nevertheless needed to establish the relation between these angles and the relevant angles of the visual axis. Oculomotor research requires angles with respect to a head-oriented coordinate system, so an additional coil is needed to monitor head movements.

For some methods it is useful to distinguish between two different concepts of an axis in the visual system. For most applications the relevant axis is the visual axis, or line of sight: the straight line from the centre of the foveola to the object of regard. It lies about five degrees nasal to the optical axis of the eye, which is defined as the axis of symmetry of the optical system and can be determined without reference to perception. The angle between these axes is constant, and its spatial orientation is fully determined by the horizontal plane and Listing’s law, which describes how the rotation angle of the eye depends on the other two degrees of freedom of the eyeball.

For instance, in a scleral search coil system it suffices to place the coil symmetrically around the pupil to ensure proper determination of the direction of the optical axis. The location of the centre of the eyeball is not critical, as a translation moves both the eye and the projection of the optical axis in the plane of regard by equal amounts. For a head-mounted system the same argument holds with respect to a head-based coordinate system, provided the position of symmetry can be determined. This is normally not worth the effort, because the head position and the 6 degrees of freedom of the device in the head coordinate system have to be calibrated anyway.

Actual calibration procedures therefore attempt to shorten the chain of transformations to a single one for the visual axis. For an analysis of error, however, it is important to track the full chain. In a video-based system, the image of the pupil moves on the surface of the eyeball and is monitored by a camera that is either head-mounted or fixed in an external coordinate system. In the first case, slippage of the head mount may degrade the results; in the second, head movements result in spurious translational movements of the pupil image, both in distance and in the plane of observation.

The determination of the actual visual axis can be a difficult task. The only known way to do this exactly is to determine the location of the projected image of the object of regard on the retina itself. This can be achieved by viewing the fundus of the eye through a half-mirror in the light path. Using this method, Putnam et al. reported that the locus of fixation was generally not exactly in the centre of the foveola but quite near it, at about 10 arc minutes, with a standard deviation of the same order of magnitude. It is not clear, however, under which conditions this statement holds, in particular whether the locus of fixation can be assumed constant when repeated in a different experimental session or under different workload situations.

Without the necessary adaptive optics technology, or in more convenient situations, it may be sufficient to rely on the subject’s judgement of proper fixation. In this case, the accuracy that can be achieved, or that was actually achieved, is not easily determined. For instance, for microstrabismic subjects the estimated direction is likely to be wrong altogether. In any case, there is no evidence that direction judgement is tightly coupled to visual acuity, or that this coupling is constant in time. In depth viewing, the horizontal direction additionally depends on the viewing distance. This may be the case even in monocular viewing, where the subject may adjust the locus of fixation to accommodation, proximity or other non-visual distance stimuli. It therefore seems highly questionable whether the tedious monocular calibration procedure is warranted. On the other hand, a fixation disparity, meaning that the (monocularly) optimal fixation condition is not achieved in binocular viewing because of problems in binocular coordination, cannot be determined at all without data from monocular viewing conditions. The proper approach seems to be to use binocular calibration, so that the calibration data are obtained under the most stable viewing conditions, and to make the monocular conditions part of the data to which the calibration is applied.

Given these restrictions on the absolute validity of our results, we concentrate on the questions of reliability and relative validity. The only evidence available for this evaluation is the statistics of the calibrations themselves and of the differences in the calibrated data caused by choosing different calibration procedures. The most important information is how consistent the recorded data are with the actual pattern of calibration targets, mainly whether these data reproduce the common rectangular 3×3 grid sufficiently well. Sufficiency in this case probably has to be interpreted as "not being an outlier in the set of calibrations done". We try to collect this basic statistical information from a series of experiments designed to evaluate the calibration procedure used in binocular experiments, specifically for the purpose of correctly determining horizontal vergence.

Experiments

We report data from 4 different experiments in this paper. In all of them we used an Eyelink II in detached mode, where the cameras are mounted fixed with respect to the workspace and to the display device. Two different display devices were used:

a shutter haploscope built from a CRT monitor and TFT shutter glasses, and
a mirror haploscope built from two TFT monitors.

All experiments were carried out at a viewing distance of 60 cm. As calibration target, we used a white square that shrank from 10 pixels down to one pixel in 100 ms steps to attract attention, and then changed to an x-shaped cross of size 3×3. In some conditions this cross was displayed for a fixed time of 400 ms, in others until the subject acknowledged correct fixation with a mouse click. The head was loosely stabilised using a chin rest and a flexible tape that provided feedback to help avoid unintentional movements. The position was aligned at the forefront of one of the eyes for all trials in the same experiment; the cameras could be adjusted for an individual, though this was generally not needed. Calibration targets were displayed in pseudo-random order, starting with a central fixation for the left eye, in a standard 3×3 grid covering a range of 10 degrees horizontally and 6 degrees vertically. Unless otherwise noted, the targets were displayed monocularly, pseudo-randomly interleaved between the eyes.
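The target layout and presentation order described above can be sketched as follows. This is only an illustration in Python: the function names are hypothetical, the stated ranges are assumed to be total extents centred on the screen (so ±5 and ±3 degrees), a flat screen is assumed, and the seeded shuffle stands in for whatever pseudo-random scheme was actually used.

```python
import math
import random

VIEWING_DISTANCE_CM = 60.0  # viewing distance given in the text
H_RANGE_DEG = 10.0          # assumed to be the total horizontal extent
V_RANGE_DEG = 6.0           # assumed to be the total vertical extent


def grid_targets_deg():
    """Return the 9 targets of a 3x3 grid as (h, v) angles in degrees."""
    hs = (-H_RANGE_DEG / 2, 0.0, H_RANGE_DEG / 2)
    vs = (-V_RANGE_DEG / 2, 0.0, V_RANGE_DEG / 2)
    return [(h, v) for v in vs for h in hs]


def deg_to_screen_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Offset from the screen centre (flat screen) for a visual angle."""
    return distance_cm * math.tan(math.radians(angle_deg))


def presentation_order(seed=0):
    """Shuffle all (eye, target) pairs, keeping the central left-eye target first."""
    rng = random.Random(seed)
    first = ("left", (0.0, 0.0))
    rest = [(eye, target)
            for eye in ("left", "right")
            for target in grid_targets_deg()
            if (eye, target) != first]
    rng.shuffle(rest)  # monocular targets interleaved between the eyes
    return [first] + rest
```

Each call to presentation_order yields 18 monocular presentations, 9 per eye, which matches the number of monocular fixations per calibration block in experiment 1.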

For each experiment, we give the number of subjects n, the number of experimental sessions per subject s, the number of repetitions within a session r, and the number of experimental conditions c. In this paper, a repetition means one before-and-after pair of complete calibrations.

Experiment 1 -- Eich

( n = 19, s = 2, r = 4, c = 2 )

The simplest experiment consists of 4 blocks of calibrations immediately following each other. In the outer blocks we used 18 monocular fixations, in the inner blocks 9 binocular fixations. This allows us to compare the calibration coefficients and the statistical quality of monocular and binocular target fixations at identical distances. As a contrasting condition, a targetwise interleaved display was used, in which the 27 monocular and binocular target presentations were pseudo-randomly interleaved, twice in each trial. Only the mirror haploscope was used.

As only the 9 target positions were evaluated, the outcome of this experiment can be discussed without actually using the calibration, which would involve a linear or nonlinear transformation of the horizontal and vertical measurement channels. Only the fixations and the deviations under the different experimental conditions are used, and they can be discussed on the basis of the raw data, independently for the two directions.
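For orientation, the simplest form of such a transformation is an affine map from the two raw channels to one target angle, fitted by least squares over the 9 grid fixations. The sketch below is a generic illustration, not the Eyelink's internal calibration and not necessarily the regression used in this study; the function names are hypothetical.

```python
def solve3(a, b):
    """Solve a 3x3 linear system a x = b by Gauss-Jordan elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


def fit_affine(raw, target):
    """Least-squares fit of target ~ c0 + c1 * raw_h + c2 * raw_v.

    raw:    list of (raw_h, raw_v) tracker readings, one per fixation
    target: list of known target angles for one direction
    Returns the coefficients [c0, c1, c2] and the residual standard deviation.
    """
    rows = [(1.0, h, v) for h, v in raw]
    # Normal equations: (X^T X) c = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(3)]
    coef = solve3(xtx, xty)
    resid = [y - sum(c * x for c, x in zip(coef, r)) for r, y in zip(rows, target)]
    dof = len(rows) - 3  # 9 fixations minus 3 fitted coefficients
    sd = (sum(e * e for e in resid) / dof) ** 0.5
    return coef, sd
```

The fit would be run once per eye and per direction; the residual standard deviation is one possible measure of how well a single calibration reproduces the grid.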

This experiment minimises the time between monocular and binocular fixations in the targetwise interleave. Even in the blockwise interleave, it achieves the shortest possible interval between a conventional calibration and the binocular payload, at the price of twice as many calibration fixations as payload fixations.

Experiment 2 -- mishfd

( n = 25, s = 2, r = 6, c = 2 )

In this experiment we compared the two presentation devices in the context of a comparison of subjective and objective methods of determining fixation disparity (FD). As the subjective measurement of FD involves a relatively large number of psychophysical decisions about the relative location of nonius lines, we use only the last of them for the comparison. The calibration was therefore carried out before and after the last 10 experiment cycles. For the discussion of calibration issues this has the advantage that the experiment has a more realistic payload and time interval between the calibrations than experiment 1. As the length of the cycles varied considerably between the experimental conditions, from 10 fixations in one condition to 30 fixations in the other, this time interval is an experimental factor.

Experiment 3 -- varia

( n = 13, s = 2, r = 6, c = 4 )

Here we compared both devices while varying the calibration paradigm for the subject between the fixed time frame and the mouse click. As experimental payload we used a vergence shift paradigm. The stimulus in this experiment was a binocular fixation object consisting of several crosses in the foveal and parafoveal area at horizontal disparities varying between 20 arc degrees eso and exo. The subjects were instructed to blink between presentations in order to make the fixations as independent of each other as possible. This set of measurements is considered a direct test of the usability of the calibration process for determining fixation disparities, because the induced vergence movements are small enough not to be disturbed by proximal or accommodation-induced vergence. From the statistics of this experiment we may be able to give quantitative figures for the reliability of vergence measurements in this range.
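To make the quantities involved concrete, the following sketch computes the geometric vergence demand for a target straight ahead and the deviation of a measured vergence from it. The sign convention (horizontal gaze angles positive to the right, so that convergence gives a positive left-minus-right difference), the function names and the interpupillary distance passed in are assumptions for illustration, not values from the experiments.

```python
import math


def vergence_demand_deg(distance_cm, ipd_cm):
    """Vergence angle needed to fixate a target straight ahead at distance_cm."""
    return math.degrees(2.0 * math.atan((ipd_cm / 2.0) / distance_cm))


def measured_vergence_deg(left_h_deg, right_h_deg):
    """Vergence from calibrated horizontal gaze angles (positive = rightward)."""
    return left_h_deg - right_h_deg


def fixation_disparity_arcmin(left_h_deg, right_h_deg, distance_cm, ipd_cm):
    """Measured minus required vergence in minutes of arc.

    Under the assumed convention, positive values mean over-convergence (eso).
    """
    fd_deg = (measured_vergence_deg(left_h_deg, right_h_deg)
              - vergence_demand_deg(distance_cm, ipd_cm))
    return 60.0 * fd_deg
```

With the 60 cm viewing distance used here and a hypothetical interpupillary distance of 6.3 cm, the demand comes out at roughly 6 degrees, so a fixation disparity of a few minutes of arc is a very small fraction of the total vergence angle.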

Experiment 3a

( n = 13, s = 2, r = 6, c = 2 )

The same experimental condition was used to evaluate the Eyelink II corneal reflex mode on the CRT. As the lighting device for the reflex conflicted with the use of shutter glasses, the monocular presentation here was completely different: the eyes had to be calibrated blockwise for the left and right eye using a paper occluder. The idea of the corneal reflex method is that the location of the corneal reflex relative to the centre of the pupil is less affected by translational movement of the head than either of them alone. The firmware of the Eyelink does not allow these two variables to be recorded directly; only the internally computed value, probably the difference, is accessible. In our opinion, this is inferior to knowing both. Given this, and the restriction that both inputs have to be valid at every instant, we decided not to explore this mode further.

Experiment 4 -- 3wort

( n = 13, s = 2, r = 3, c = 2 )

As a step towards solving the problem of using binocular calibrations while still providing a monocular reference, we inserted monocular presentations of the central target at regular intervals during the course of the experiment. Both monocularly and binocularly calibrated trials were measured on the shutter haploscope, and the subjects were the same as in experiment 3. The payload was the 3-word paradigm, as in experiment 2.

Results

Experiment 1 simulates the best that can be achieved with monocular calibration without additional discomfort for the subjects from bite bars or other instruments. From it we report the monocular components of fixation disparity obtained in a specific mixed-effects model. Effectively, we compare the monocular and binocular fixations of the same target locations directly. The model assumes that various sources may determine the outcome of the monocular component measurement, namely the subject, the session and the trial within a session. The underlying assumption is that an FD is a trait of a person that is stable either over sessions or at least within a session (i.e. over a medium time range). There may also be non-subject effects on the FD in our design, e.g. the target location or a general time effect. Otherwise the measured FD may lie within a range of random variation, which is determined from the repetitions within the session. We take the significance of a random term on the subject level as an indication that an FD component was successfully determined.
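One possible reading of this model, inferred only from the term labels in the table below, is the following. It assumes that VP denotes the subject, that VP:exp denotes the session within a subject, and that bin is an indicator for binocular viewing; intercepts and the correlation structure of the random effects are not stated in the text and are omitted here.

```latex
% y_{ijk}: prescaled raw measurement of one channel of one eye,
%          subject i, session j, fixation k (notation assumed)
\begin{aligned}
y_{ijk} ={}& (\beta_h + u^{h}_{i} + w^{h}_{ij})\,\mathrm{hpos}_{ijk}
          + (\beta_v + u^{v}_{i} + w^{v}_{ij})\,\mathrm{vpos}_{ijk} \\
         & + \beta_{hv}\,\mathrm{hpos}_{ijk}\,\mathrm{vpos}_{ijk}
          + u^{b}_{i}\,\mathrm{bin}_{ijk}
          + \varepsilon_{ijk}, \\
u^{\cdot}_{i} \sim{}& \mathcal{N}(0,\sigma^2_{\mathrm{VP},\cdot}), \quad
w^{\cdot}_{ij} \sim \mathcal{N}(0,\sigma^2_{\mathrm{VP:exp},\cdot}), \quad
\varepsilon_{ijk} \sim \mathcal{N}(0,\sigma^2)
\end{aligned}
```

In this reading, the subject-level term u^b_i is the individual monocular component of FD, and the bin rows of the table report its standard deviation.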

Targetwise interleave
Group     Term          hor left   hor right   vert left  vert right
Fixed effects
          hpos             56.57       55.86        5.21       -3.63
          vpos             -0.54        0.48       62.12        62.2
          hpos:vpos        -0.01        0.02        0.22           0
Random effects (SD)
VP        hpos              7.74        0.03        1.28        2.02
          vpos               0.9        1.34       11.27        9.12
          bin                2.4        3.68        0.09         3.5
VP:exp    hpos               0.6        5.33        0.07        0.95
          vpos              0.74        0.82        1.04        1.03
Residual                   30.52       29.13       85.61        77.8

Blockwise interleave
Group     Term          hor left   hor right   vert left  vert right
Fixed effects
          hpos             55.65       54.09        3.98       -4.24
          vpos             -0.73        0.13        61.9       60.92
          hpos:vpos        -0.05        0.01           0        0.06
Random effects (SD)
VP        hpos              7.64        6.31        2.13        0.49
          vpos              1.13        1.64       10.51        8.75
          bin              16.25       12.87        33.6       31.05
VP:exp    hpos              0.11        0.37        0.06        0.48
          vpos              0.63        0.56        1.13        0.16
Residual                   23.03       21.74       77.34       75.85
Raw data are prescaled by dividing by 5 to yield approximate minutes of arc.

We expect the entries in this table to be near 60 for the fixed effect of hpos in the horizontal measurements and of vpos in the vertical measurements, and small everywhere else. Only the bin random effects of the subjects in the horizontal measurements are of real interest, because they capture the difference between the two experimental designs. At first sight, the blocked design yields larger values for these. The fact that the vertical bin random effects are even larger shows that the values should be interpreted with caution.

A further expected property of the estimates of the monocular components of FD can be used to verify that they are consistent with our assumptions: the left components should be approximately the negatives of the right components on every level where they are significant. The assumption is that an FD showing over- or undervergence should in most cases occur symmetrically in the two eyes. This implies that, if we find this symmetry in the better-conditioned case of targetwise interleave, seemingly significant FD components in the worse condition cannot be taken as valid data if they fail to be symmetric. Indeed, in targetwise interleave the estimates correlate strongly negatively (-0.864), whereas in the blockwise interleave condition they do not even correlate negatively (0.874). We searched for the reasons for this negative result and found that making the statistical method more robust by discarding data of lesser quality is sufficient to obtain the intended result for these data. Concretely, we discarded the quarter of the data with the largest residuals of the calibration regression. Discarding bad calibrations is also feasible when a calibration is applied in practice. We therefore consider this sufficient to be confident in the general correctness of our results.
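The discarding step can be sketched as follows. The text does not say whether the quarter was removed per calibration or per data point; this illustration assumes one residual value per calibration, and the record layout and function name are hypothetical.

```python
def keep_best_calibrations(calibrations, discard_fraction=0.25):
    """Drop the given fraction of calibrations with the largest residual.

    calibrations: list of dicts, each with a "residual_sd" entry
                  (assumed layout, one residual value per calibration)
    """
    ranked = sorted(calibrations, key=lambda c: c["residual_sd"])
    n_discard = int(len(ranked) * discard_fraction)
    return ranked[:len(ranked) - n_discard]
```

Because the criterion uses only the residual of the calibration regression itself, the same filter could be applied in a real experiment before any payload data are analysed.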

Correlation of estimates of individual monocular components of FD
[interleaved]
> cor.test(ranef(z2.hr.lmer)$VP$binr,ranef(z2.hl.lmer)$VP$binl)
-0.864 
[blockwise]
> cor.test(ranef(zb.hr.lmer)$VP$binr,ranef(zb.hl.lmer)$VP$binl)
0.874
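The symmetry check behind these two figures is an ordinary Pearson correlation between the per-subject estimates for the left and right eye. A minimal Python equivalent is sketched below; the example values are hypothetical and are not data from the experiment.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-subject monocular FD components (minutes of arc).
left_components = [2.1, -1.0, 0.4, -3.2, 1.5]
right_components = [-1.9, 1.2, -0.5, 3.0, -1.4]

# Symmetric over- or undervergence should give a value close to -1.
r = pearson_r(left_components, right_components)
```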

In experiment 4 we observed an unexpected result. In monocular calibration, the vergence state of the subjects during the monocular presentations is expected to be determined by the individual heterophoria. In this experiment, however, the deviations from binocular vergence were clearly smaller than on the mirror haploscope. The probable reason for this difference is that the separation provided by the shutter glasses was not sufficient to suppress the peripheral vergence stimuli from the monitor frame, although the calibration target itself was not visible to the non-viewing eye. It is therefore questionable whether fixations on this device are monocular in the full sense.

Conclusions

Although many aspects still need further investigation, some conclusions can be drawn with sufficient certainty. On the main question, whether monocular or binocular calibration is preferable, binocular calibration clearly prevails in at least some respects:

Binocular calibration needs only half of the experimental effort, and the calibration process puts less stress on the subjects.
The residual from the calibration regression is smaller. Therefore the confidence intervals of the calibrated values are tighter.
For a monocular reference, given the small size of the fixation disparities that occur in practice, it is important to minimise the time between the fixation disparity measurements and the points of monocular reference, so that the difference can reach statistical significance at all.

On the other hand, it is clear that no absolute fixation disparity can be determined from binocular calibration alone. Only changes can be monitored, unless additional effort is made to address the relation between true eye direction and measurement outcome. The method of inserting monocular control trials seems to be a solution at least as good as, and probably a lot better than, standard monocular calibration. From a purely geometric point of view, it makes no difference whether monocular calibrations are applied to binocular data or binocular calibrations to monocular data, but statistically the tighter bounds achievable make a difference. The instrumental requirements for both procedures are the same, so the choice should be easy.