====== Calibration procedure for measuring binocular coordination ======
===== Introduction =====
Eye movements serve to project the object of regard onto the centre
of the fovea, since detailed spatial information can be resolved
only in this region. Under natural viewing conditions, this occurs
during the short fixation periods between saccadic eye movements.
Because the eyes have to be moved to the target, an observer can
infer where the subject's objects of regard are, which objects they
are, and how long the subject dwells on each target.
To obtain this information, researchers use eye trackers to monitor
the movements of the eyes.
To link the eye tracker output data to locations in the visual
world, a process called calibration is needed. For some purposes it
is sufficient to rely on the procedures provided by the device
manufacturer; for others, especially research on binocular
coordination, these may not be fully sufficient. One reason may be
the need for monocular calibration. Some researchers consider
monocular calibration mandatory, while others publish results on
binocular coordination based on binocular calibration.
Whether monocular calibration is necessary or advantageous, or even
whether the resulting data are affected at all, therefore needs
to be evaluated on experimental data. At the least, it is clear that
providing monocular presentation is expensive in terms of
calibration time and instrumental effort.
It is the aim of the present study to explore some of the conditions
of the calibration process with respect to questions of resolution,
reliability and validity, especially with respect to the benefits
of monocularity.
Several methods have been developed for tracking eye movements.
Today, the most widespread devices rely on recording the pupil
location with digital video systems. Other methods are based on the
location of the reflection of an external light source on the
cornea, which has a different curvature from the eyeball; on two of
the Purkinje images, which is more elaborate; on the change of
reflected luminance of the moving limbus picked up by a light
sensor; on the change of magnetic induction in a coil embedded in a
contact lens on the sclera; or on the electric potential change on
the skin caused by the electric dipole moment of the eye. Among
these methods, the scleral search coil traditionally has the best
reputation for accuracy, though comparative studies have shown that
wearing the contact lens may affect the eye movements themselves.
With regard to calibration, this system has the fewest conceptual
problems, as the magnetic induction is affected only by the angle
of the coil plane with respect to the reference frame. Some
calibration is still needed, however, to establish the relation
between these angles and the relevant angles of the visual axis.
In oculomotor research, angles with respect to a head-oriented
coordinate system are needed. In this case, an additional coil is
required to monitor head movements.
For some methods it may be useful to distinguish between two different
concepts of an axis in the visual system. For most applications
the relevant axis is the visual axis, or line of sight: the straight
line from the centre of the foveola to the object of regard. It lies
about five degrees nasal to the optical axis of the eye, which is
defined as the axis of symmetry of the optical system. The latter
can be determined without reference to perception. The angle between
these axes is constant, and its spatial orientation is fully determined
by the horizontal plane and Listing's law, which describes the
dependency of the rotation angle of the eye on the other two
degrees of freedom of the eyeball.
For instance, in a scleral search coil system it suffices to place
the coil symmetrically around the pupil to ensure proper determination
of the direction of the optical axis. The location of the centre of
the eyeball is not critical, as a translation moves
both the eye and the projection of the optical axis in the plane
of regard by equal amounts. For a head-mounted system the same
argument holds with respect to a head-based coordinate system, if
the position of symmetry can be determined. This is normally not
worth the effort, because the head position and, in addition, the 6
degrees of freedom of the device in the head coordinate system have
to be calibrated anyway.
Therefore, actual calibration procedures attempt to shorten the chain
of transformations to a single one for the visual axis. For an
analysis of error, however, it is important to track the full chain.
In a video-based system, the image of the pupil moves on the surface
of the eyeball and is monitored by a camera that is either
head-mounted or fixed in an external coordinate system. In the first
case, slippage of the head mount may affect the results negatively;
in the second, head movements result in spurious translational
movements of the pupil image, both in distance and in the plane of
observation.
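
To give a feeling for why such translations matter, the following minimal sketch converts a pure translation of the pupil image into the eye rotation it would be mistaken for. It assumes that the pupil centre moves on a sphere around the centre of rotation of the eye; the radius of 10 mm is a hypothetical round figure chosen for illustration and is not taken from our setup.

```python
import math

def spurious_rotation_arcmin(translation_mm, pupil_radius_mm=10.0):
    """Eye rotation (arc minutes) that a pure translation would mimic.

    pupil_radius_mm is an ASSUMED distance between the pupil centre and
    the centre of rotation of the eye (hypothetical illustrative value).
    """
    # A pupil at radius r shifts sideways by r*sin(angle) when the eye rotates,
    # so a translation t is indistinguishable from a rotation asin(t / r).
    angle_rad = math.asin(translation_mm / pupil_radius_mm)
    return math.degrees(angle_rad) * 60.0

if __name__ == "__main__":
    for shift in (0.05, 0.1, 0.5):
        print(f"{shift} mm -> {spurious_rotation_arcmin(shift):.1f} arcmin")
```

Under this assumption, even a tenth of a millimetre of uncompensated head translation is of the same order as the fixation disparities of interest.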
Determining the actual visual axis can be a difficult task.
The only known way to do this exactly is to determine the projected
image location of the object of regard on the retina itself. This
can be achieved by viewing the fundus of the eye using a half-mirror
in the light path. Using this method, Putnam et al. reported that
the locus of fixation was generally not exactly in the centre of
the foveola but quite near it, at about 10 arc minutes with a standard
deviation of the same order of magnitude. It is not clear, however,
under which conditions this statement holds, especially whether the
locus of fixation can be assumed constant for repetition in a
different experimental session or under different workload situations.
Without the necessary adaptive optics technology, or in more
convenient situations, it may be sufficient to rely on the subject's
judgement of proper fixation. In this case, the accuracy that can
be achieved, or that was actually achieved, is not easily determined.
For instance, in microstrabismic subjects the estimated direction is
likely to be wrong altogether. In any case, there is no evidence that
direction judgement is tightly coupled to visual acuity, or that this
coupling is constant over time. When seeing in depth, the
horizontal direction additionally depends on the viewing
distance. This may be the case even in monocular viewing, when the
subject may adjust the locus of fixation to accommodation, proximity
or other non-visual distance stimuli. Therefore, it seems highly
questionable whether the tedious monocular calibration procedure
is warranted. On the other hand, a fixation disparity, meaning that
the (monocularly) optimal fixation condition is not achieved in
binocular viewing because of problems in binocular coordination,
cannot be determined at all without data from monocular viewing
conditions. The proper approach seems to be to use binocular
calibration, so that the calibration data are obtained under the most
stable viewing conditions, and to make the monocular conditions part
of the data to which the calibration is applied.
Given these restrictions on the absolute validity of our results, we
concentrate on the questions of reliability and relative validity.
The only evidence available for this evaluation is the statistics
of the calibrations themselves and that of the differences in the
calibrated data caused by choosing different calibration procedures.
The most important information is the consistency of the recorded data
with the actual pattern of calibration targets, mainly whether these
data reproduce the common rectangular 9-point (3x3) grid sufficiently
well. Sufficiency in this case probably has to be interpreted as "not
being an outlier in the set of calibrations done". We try to
collect this basic statistical information from a series of
experiments designed to evaluate the calibration procedure used
in binocular experiments, specifically for the purpose of
correctly determining horizontal vergence.
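
The consistency criterion just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis used for the results below: one raw measurement channel is fitted as a linear function of the target coordinates, the residual standard deviation serves as the quality figure, and a calibration is flagged as an outlier when its residual exceeds the median of all calibrations by more than a multiple of the median absolute deviation. The function names and the factor ''k = 3'' are hypothetical.

```python
import math
import statistics

def fit_plane(targets, raw):
    """Least-squares fit raw ~ a + b*h + c*v for one measurement channel.

    targets: list of (h, v) target positions; raw: recorded values.
    Returns ((a, b, c), residual standard deviation).
    """
    rows = [(1.0, h, v) for h, v in targets]
    # Normal equations, stored as an augmented 3x4 matrix.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * z for r, z in zip(rows, raw))] for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        for k in range(col + 1, 3):
            f = m[k][col] / m[col][col]
            m[k] = [mk - f * mc for mk, mc in zip(m[k], m[col])]
    # Back substitution.
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (m[i][3] - sum(m[i][j] * x[j] for j in range(i + 1, 3))) / m[i][i]
    res = [z - (x[0] + x[1] * h + x[2] * v) for (h, v), z in zip(targets, raw)]
    sd = math.sqrt(sum(e * e for e in res) / (len(raw) - 3))
    return tuple(x), sd

def flag_outliers(residual_sds, k=3.0):
    """True for calibrations whose residual SD exceeds median + k * MAD.

    The threshold rule and k are assumptions made for this sketch.
    """
    med = statistics.median(residual_sds)
    mad = statistics.median(abs(s - med) for s in residual_sds)
    return [s > med + k * mad for s in residual_sds]
```

A calibration would then be accepted when ''flag_outliers'' returns ''False'' for it within the set of all calibrations of the experiment.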
===== Experiments =====
We report data from 4 different experiments in this paper. In all of them we used an Eyelink II in detached mode, where the cameras are mounted fixed with respect to the workspace and to the display device. Two different display devices were used:
* a shutter haploscope built from a CRT monitor and TFT shutter glasses, and
* a mirror haploscope built from two TFT monitors.
All experiments were carried out at a viewing distance of 60 cm. As calibration targets, we used a white square shrinking from 10 pixels down to one pixel in 100 ms steps to attract attention, which then changed to an x-shaped cross of size 3x3. In some conditions this cross was displayed for a fixed time of 400 ms, in others until the subject acknowledged correct fixation with a mouse click. The head was loosely fixated using a chin rest and a flexible tape that provided feedback in order to avoid unintentional movements. The position was aligned at the forefront of one of the eyes for all trials in the same experiment; the cameras could be adjusted for an individual, though this was generally not needed. Calibration targets were displayed in pseudo-random order, starting with a central fixation for the left eye, in a standard 3x3 grid covering a range of 10 degrees horizontally and 6 degrees vertically. Unless otherwise noted, the targets were displayed monocularly, pseudo-randomly interleaved between the eyes.
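
The target sequence described above can be sketched as follows. This is an illustration only, not the presentation software used in the experiments. It assumes that the stated ranges are total extents centred on the screen (so targets lie at -5, 0 and +5 degrees horizontally and -3, 0 and +3 degrees vertically) and that every grid position is shown once to each eye; all function names are hypothetical.

```python
import math
import random

def grid_positions_deg(h_range=10.0, v_range=6.0):
    """3x3 grid of target positions in degrees (ASSUMED centred total extents)."""
    hs = (-h_range / 2, 0.0, h_range / 2)
    vs = (-v_range / 2, 0.0, v_range / 2)
    return [(h, v) for v in vs for h in hs]

def deg_to_cm(angle_deg, distance_cm=60.0):
    """Offset on a flat screen at the given viewing distance."""
    return distance_cm * math.tan(math.radians(angle_deg))

def calibration_sequence(seed=0):
    """Monocular targets interleaved between the eyes in pseudo-random order.

    The sequence starts with the central target for the left eye; the
    remaining 17 (eye, position) pairs are shuffled.
    """
    rng = random.Random(seed)
    trials = [(eye, pos) for eye in ("left", "right")
              for pos in grid_positions_deg()]
    first = ("left", (0.0, 0.0))
    trials.remove(first)
    rng.shuffle(trials)
    return [first] + trials

if __name__ == "__main__":
    for eye, (h, v) in calibration_sequence(seed=1):
        print(eye, f"{deg_to_cm(h):+.2f} cm", f"{deg_to_cm(v):+.2f} cm")
```

Fixing the seed makes the pseudo-random order reproducible across sessions, which may or may not be desirable depending on the design.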
For each experiment, we give the number of subjects n, the number of experimental sessions per subject s, the number of repetitions within a session r, and the number of experimental conditions c. In this paper, the number of repetitions is the number of before-and-after pairs of complete calibrations.
==== Experiment 1 -- Eich ====
( n = 19, s = 2, r = 4, c = 2 )
The simplest experiment was built from 4 blocks of calibrations
immediately following each other (blockwise interleave). In the outer blocks we used 18 monocular fixations,
in the inner blocks 9 binocular fixations. From this we can compare the calibration coefficients and
statistical quality for monocular and binocular target fixations
at identical distances. In contrast, in the targetwise interleaved condition the 27 monocular and binocular target presentations were pseudo-randomly interleaved, twice in each trial. Only the mirror haploscope was used.
As only the 9 target positions were evaluated, the outcome of this experiment can be discussed without actually applying the calibration, which would involve a linear or nonlinear transformation of the horizontal and vertical measurement channels. Only the fixations and their deviations under the different experimental conditions are used, and they can be discussed on the basis of the raw data, independently for the two directions.
This experiment minimises the time between monocular and binocular fixations in the targetwise interleave. Even the blockwise interleave achieves the best possible condition for the time distance between a conventional calibration and the binocular payload, at the price of twice as many calibration fixations as payload fixations.
==== Experiment 2 -- mishfd ====
( n = 25, s = 2, r = 6, c = 2 )
In this experiment we compared the two presentation devices in the context of a comparison of subjective and objective methods of determining fixation disparity (FD). As the subjective measurement of FD involves a relatively high number of psychophysical decisions about the relative location of nonius lines, we use only the last of them for the comparison. Therefore the calibration was carried out before and after the last 10 experiment cycles. For the discussion of calibration issues this has the advantage that the experiment has a more realistic payload and time distance between the calibrations than experiment 1. As the length of the cycles varied considerably between the experimental conditions, from 10 fixations in one condition to 30 fixations in the other, this time distance is an experimental factor.
==== Experiment 3 -- varia ====
( n = 13, s = 2, r = 6, c = 4 )
Here we compared both devices while varying the calibration paradigm for the subject between the fixed time frame and the mouse click. As experimental payload we used a vergence shift paradigm. The stimulus in this experiment was a binocular fixation object consisting of several crosses in the foveal and parafoveal area at varying horizontal disparities between 20 arc degrees eso and exo. The subjects were instructed to blink between presentations in order to make the fixations as independent of each other as possible. This set of measurements is considered a direct test of the usability of the calibration process for determining fixation disparities, because the induced vergence movements are small enough not to be disturbed by proximal or accommodation-induced vergence. We may be able to give quantitative figures for the reliability of vergence measurements in this range from the statistics of this experiment.
==== Experiment 3a ====
( n = 13, s = 2, r = 6, c = 2 )
The same experimental condition was used to evaluate the Eyelink II cornea reflex mode on the CRT. As the lighting device for the reflex conflicted with the use of shutter glasses, the monocular presentation was completely different here: the eyes had to be calibrated blockwise for the left and the right eye using a paper occlusion device. The idea of the cornea reflex method is that the location of the cornea reflex relative to the centre of the pupil is less affected by translational movement of the head than either of them alone. The firmware of the Eyelink does not allow these two variables to be recorded directly; only the internally computed value, probably the difference, is accessible. In our opinion, this is inferior to knowing both. Given this and the restrictions on use caused by the fact that both inputs have to be valid at every instant, we decided not to explore this mode further.
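
The idea behind the cornea reflex mode can be illustrated with a deliberately simple first-order model. It is an assumption made for this sketch, not the computation performed by the Eyelink firmware: a head translation shifts the pupil image and the reflex image by the same amount, whereas an eye rotation moves them with different gains. Both gain values below are hypothetical.

```python
def pupil_and_cr(theta_deg, head_shift, k_pupil=1.0, k_cr=0.4):
    """Image positions of pupil centre and cornea reflex (arbitrary units).

    First-order model with ASSUMED gains: both signals share the head
    translation, but respond differently to the eye rotation theta_deg.
    """
    pupil = head_shift + k_pupil * theta_deg
    cr = head_shift + k_cr * theta_deg
    return pupil, cr

def pupil_minus_cr(theta_deg, head_shift, k_pupil=1.0, k_cr=0.4):
    """Difference signal; the shared translation term cancels."""
    pupil, cr = pupil_and_cr(theta_deg, head_shift, k_pupil, k_cr)
    return pupil - cr

if __name__ == "__main__":
    for shift in (0.0, 3.0):
        print(shift, pupil_and_cr(2.0, shift), pupil_minus_cr(2.0, shift))
```

In this model the difference depends only on the rotation, while the pupil signal alone changes with every head shift. With access to both variables one could check how well such a model holds, which is the reason we consider the difference alone inferior.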
==== Experiment 4 -- 3wort ====
( n = 13, s = 2, r = 3, c = 2 )
Towards a solution of the problem of using binocular calibrations while still providing a monocular reference, we inserted monocular presentations of the central target at regular intervals during the course of the experiment. Both monocularly and binocularly calibrated trials were measured on the shutter haploscope, and the subjects were the same as in experiment 3. The payload was the 3-word paradigm as in experiment 2.
===== Results =====
Experiment 1 simulates the best that can be achieved with monocular calibration without additional inconvenience to the subjects from bite bars or other instruments. From it, we report the figures for the monocular components of fixation disparity obtained in a specific mixed effects model. Effectively, we compare the monocular and binocular fixations of the same target locations directly.
The model assumes that various sources may determine the outcome of the monocular component measurement, namely the subject, the session and the trial within a session. The underlying assumption is that an FD is a trait of a person that is stable either over sessions or at least within a session (i.e. over a medium time range).
There may also be non-subject effects on the FD in our design, e.g. the target location or a general time effect.
Otherwise the measured FD may lie within a range of random variation which is determined from the repetitions within the session. We take the significance of a random term on the subject level as an indication of a successful determination of an FD component.
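Targetwise interleave
^ Effect ^ Term ^ hor left ^ hor right ^ vert left ^ vert right ^
| Fixed effects | hpos | 56.57 | 55.86 | 5.21 | -3.63 |
| Fixed effects | vpos | -0.54 | 0.48 | 62.12 | 62.2 |
| Fixed effects | hpos:vpos | -0.01 | 0.02 | 0.22 | 0 |
| Random effects (SD), VP | hpos | 7.74 | 0.03 | 1.28 | 2.02 |
| Random effects (SD), VP | vpos | 0.9 | 1.34 | 11.27 | 9.12 |
| Random effects (SD), VP | bin | 2.4 | 3.68 | 0.09 | 3.5 |
| Random effects (SD), VP:exp | hpos | 0.6 | 5.33 | 0.07 | 0.95 |
| Random effects (SD), VP:exp | vpos | 0.74 | 0.82 | 1.04 | 1.03 |
| Random effects (SD) | Residual | 30.52 | 29.13 | 85.61 | 77.8 |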
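Blockwise interleave
^ Effect ^ Term ^ hor left ^ hor right ^ vert left ^ vert right ^
| Fixed effects | hpos | 55.65 | 54.09 | 3.98 | -4.24 |
| Fixed effects | vpos | -0.73 | 0.13 | 61.9 | 60.92 |
| Fixed effects | hpos:vpos | -0.05 | 0.01 | 0 | 0.06 |
| Random effects (SD), VP | hpos | 7.64 | 6.31 | 2.13 | 0.49 |
| Random effects (SD), VP | vpos | 1.13 | 1.64 | 10.51 | 8.75 |
| Random effects (SD), VP | bin | 16.25 | 12.87 | 33.6 | 31.05 |
| Random effects (SD), VP:exp | hpos | 0.11 | 0.37 | 0.06 | 0.48 |
| Random effects (SD), VP:exp | vpos | 0.63 | 0.56 | 1.13 | 0.16 |
| Random effects (SD) | Residual | 23.03 | 21.74 | 77.34 | 75.85 |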
Raw data are prescaled by dividing by 5 to yield approximate minutes of arc.
The expectation for the values in these tables is that the fixed effects have entries near 60 in hpos for horizontal measurements and in vpos for vertical measurements, and that all other entries are small. Only the ''bin'' random effects of the subjects in the horizontal measurements are really interesting, because these contain the difference between the two experimental designs. At first glance, the blocked design yields larger values for these. The fact that the vertical ''bin'' random effects are even larger shows that the values should be interpreted with caution.
A further expected property of the estimates for the monocular components of FD can be used to verify that they are consistent with our presumptions: the left components should be approximately the negatives of the right components on every level where they are significant. The presumption is that an FD showing over- or undervergence should occur symmetrically in most cases. This implies that if we find this symmetry in the better conditioned case of targetwise interleave, seemingly significant FD components in the worse condition cannot be taken as valid data if they fail to be symmetric. Indeed, in targetwise interleave the estimators correlate at nearly -1, whereas in the blockwise interleave condition they do not even correlate negatively. We searched for the reasons for this negative result and found that making the statistical method more robust by discarding data of lesser quality is sufficient to obtain the intended result for these data. Concretely, we discarded the quarter of the data with the largest residuals of the calibration regression. Discarding bad calibrations is also feasible in real applications of a calibration. We therefore consider this sufficient to be confident in the general correctness of our results.
Correlation of estimates of individual monocular components of FD:
* targetwise interleave: ''cor.test(ranef(z2.hr.lmer)$VP$binr, ranef(z2.hl.lmer)$VP$binl)'' yields -0.864
* blockwise interleave: ''cor.test(ranef(zb.hr.lmer)$VP$binr, ranef(zb.hl.lmer)$VP$binl)'' yields 0.874
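
The robustification step and the symmetry check described above can be sketched as follows. This is a minimal illustration rather than the R code used for the figures reported here: it drops the quarter of the records with the largest calibration residual and computes the product-moment correlation between left and right components. The function names and the data layout are hypothetical.

```python
import math

def drop_worst_quarter(items, residuals):
    """Keep the three quarters of items with the smallest residuals.

    items and residuals are parallel lists; the original order is kept.
    """
    n_drop = len(items) // 4
    order = sorted(range(len(items)), key=lambda i: residuals[i])
    keep = sorted(order[: len(items) - n_drop])
    return [items[i] for i in keep]

def pearson(x, y):
    """Product-moment correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def symmetry_check(left, right):
    """Correlation of left and right FD components; symmetry predicts about -1."""
    return pearson(left, right)
```

In an analysis of this kind, the mixed model would be refitted on the retained records before the per-subject left and right components are passed to ''symmetry_check''.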
In experiment 4 we observed a strange result. In the monocular calibrations, the vergence state of the subjects during the monocular presentations is expected to be determined by the individual heterophoria. In the experiment, we found the deviations from binocular vergence to be evidently smaller than on the mirror haploscope. The probable reason for this unexpected difference is that the separation provided by the shutter glasses was not sufficient to suppress the peripheral vergence stimuli from the monitor frame, though the calibration target itself was not visible to the other eye. Therefore it is questionable whether the fixations on this device are monocular in the full sense.
===== Conclusions =====
Though many aspects still need more insight, some conclusions can be drawn with sufficient certainty. On the main question, whether monocular or binocular calibration is preferable, binocular calibration clearly prevails in at least some respects:
* Binocular calibration needs only half of the experimental effort; the calibration process is more efficient in terms of stress on the subjects.
* The residual from the calibration regression is smaller. Therefore the confidence intervals of the calibrated values are tighter.
* For monocular reference, given the small size of the fixation disparities occurring in practice, it is important to minimise the distance in time between the fixation disparity measurements and the points of monocular reference, so that the difference can be made statistically significant at all.
On the other hand, it is clear that no absolute fixation disparity can be determined from binocular calibration alone. Only changes can be monitored without additional efforts to address the relation between true eye direction and measurement outcome. The method of inserting monocular control trials seems to be a solution at least as good as, and probably a lot better than, standard monocular calibration. From a purely geometric point of view, it is symmetrical whether monocular calibrations are applied to binocular data or binocular calibrations to monocular data, but in statistical terms the tighter bounds achievable make a difference. The instrumental needs for both procedures are the same, so the choice should be easy.