Human Vision, Saccades and Robots

The human eye and the camera in an artificial vision system are both image acquisition devices, so they share important features. Both have a lens, both have a mechanism for focusing that lens, and both have a light-sensitive surface (film, pixel array or retina) where the image is formed. However, the ordinary camera and the eye differ in one significant way: a camera in an artificial vision system captures an entire frame and then analyzes it, whereas our eyes would see poorly if they worked the same way. For one thing, the high-resolution part of the retina (the fovea) covers only the central 2 degrees of our field of view, so a single frame would be fuzzy everywhere except the very center. More importantly, the objects of interest almost never fill the entire 180 degrees of the human field of view, so it makes more sense to capture the image dynamically, placing only the regions of interest onto the fovea. Movements of the eye that quickly place different regions of interest onto the fovea are called "saccades."


Figure 1 (Source: schorlab.berkeley.edu)
Figure 2 (Source: en.wikipedia.org/wiki/Saccade)

Figures 1 and 2 show saccades in action. In these separate experiments, the saccadic movements of the eyes between fixations were tracked as each subject viewed a face. The most important areas of interest for processing a face appear to be the eyes, then the lips and nose area, and finally an overall view of the head. This hierarchy is indicated by the density of the saccadic movement lines and fixation points in each figure.

Unlike the simple single-frame method of image capture used by artificial vision systems, saccades require complex interaction between the eye and the higher-level circuits of the human vision system. Recognizing which objects in the scene are of interest, and therefore deserve a saccade and a fixation, obviously requires high-level processing. But even at a lower level, the system must block visual processing during a saccade so as to reject the blurry image that results from the motion of the eye. This is why we can never see the motion of our own eyes in the mirror. Try it! This suppression is known as "saccadic masking," and interestingly, we don't even consciously experience the brief period of blindness it causes.
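A camera on a motorized mount faces a comparable blur problem while it is turning. As a rough sketch of the masking idea only, the Python below drops any frame captured while the camera was rotating quickly. The frame format (timestamp, pan angle, image), the function name `mask_saccades`, and the 30 degrees-per-second threshold are all assumptions made for illustration; none of them comes from a real vision system or from the biology.

```python
def mask_saccades(frames, velocity_threshold=30.0):
    """Keep only frames captured while the camera was nearly still.

    frames: list of (timestamp_s, pan_angle_deg, image) tuples in time order.
    velocity_threshold: assumed cutoff in degrees per second (illustrative).
    The first frame is skipped because no velocity can be estimated for it.
    """
    kept = []
    for prev, curr in zip(frames, frames[1:]):
        dt = curr[0] - prev[0]
        speed = abs(curr[1] - prev[1]) / dt  # angular speed between the two frames
        if speed <= velocity_threshold:
            kept.append(curr)                # camera was steady: use this frame
        # otherwise the frame is discarded, mimicking saccadic masking
    return kept


# Hypothetical recording: the camera jumps from 0 to 10 degrees in 20 ms.
frames = [
    (0.00, 0.0, "a"),
    (0.01, 0.0, "b"),
    (0.02, 5.0, "c"),   # mid-movement, blurred
    (0.03, 10.0, "d"),  # mid-movement, blurred
    (0.04, 10.0, "e"),
]
print([image for _, _, image in mask_saccades(frames)])
```

Only the frames taken before and after the jump survive, so downstream analysis never sees the blurred ones.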

Another engineering problem that the saccadic approach to image capture must solve is that saccades can overshoot or undershoot the target. Once a saccade has started, there is no feedback for course correction. A saccade is an entirely ballistic process, like throwing a ball: once it is thrown, you can't change its trajectory. For this reason a saccade is sometimes followed by a smaller saccade to zero in on the target. But this correction creates its own problem: how do we recognize the target now that the first saccade has moved it in our field of view? We would need some sort of short-term visual memory to remember the identifying features of the original target so the corrective saccade can move to it. It turns out we do have a visual short-term memory, but whether or not it is used for this purpose is still being studied.
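The ballistic jump and its smaller follow-up can be illustrated with a toy one-dimensional simulation. This is a sketch under stated assumptions, not a model of the real oculomotor system: the `gain` parameter (below 1.0 undershoots, above 1.0 overshoots), the tolerance, and the function names are hypothetical. The sketch also simply assumes the target can still be located after each jump, which sidesteps the open question about visual short-term memory raised above.

```python
def saccade(eye_pos, target_pos, gain):
    """One ballistic jump: the size is planned once, before the eye moves."""
    planned = target_pos - eye_pos   # computed from the pre-saccade view
    return eye_pos + gain * planned  # no feedback while the movement runs


def fixate(eye_pos, target_pos, gain=0.9, tolerance=0.5, max_saccades=5):
    """Repeat saccades until the target is within `tolerance` degrees.

    Returns the list of eye positions, starting with the initial one.
    """
    path = [eye_pos]
    for _ in range(max_saccades):
        if abs(target_pos - eye_pos) <= tolerance:
            break                    # close enough: fixation achieved
        eye_pos = saccade(eye_pos, target_pos, gain)
        path.append(eye_pos)
    return path


# Target 20 degrees away, with an assumed 10 percent undershoot per jump.
path = fixate(eye_pos=0.0, target_pos=20.0)
print(path)
```

With these assumed numbers the first jump lands roughly 2 degrees short and one corrective saccade brings the target inside the tolerance, which mirrors the primary-plus-corrective pattern described in the text.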

The average industrial vision system doesn't need the sophisticated image capture mechanism of the human eye, but our fast and efficient saccadic approach is of high interest in robotics applications that require visual processing at a human level, such as driverless cars. The following link shows a robot programmed to become interested in a moving object in its periphery and then use saccades to center that object in its field of view.

Human Vision, Saccades and Robots
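
The behavior described for the robot, noticing movement and then turning to center it, can be sketched with simple frame differencing. This is not the linked robot's actual program, which the article does not describe. The grid-of-brightness image format, the function names, the change threshold, the field-of-view values, and the linear pixel-to-angle mapping are all assumptions chosen to keep the example short.

```python
def find_motion(prev, curr, threshold=10):
    """Return (row, col) of the largest brightness change, or None.

    prev, curr: equal-sized lists of rows of brightness values.
    threshold: assumed minimum change that counts as motion.
    """
    best, best_pos = threshold, None
    for r, (prev_row, curr_row) in enumerate(zip(prev, curr)):
        for c, (p, q) in enumerate(zip(prev_row, curr_row)):
            diff = abs(q - p)
            if diff > best:
                best, best_pos = diff, (r, c)
    return best_pos


def saccade_command(pos, shape, fov_deg=(40.0, 60.0)):
    """Convert a pixel position to (tilt, pan) offsets in degrees.

    Offsets are measured from the image center, using an assumed linear
    mapping over a hypothetical 40 x 60 degree field of view.
    """
    rows, cols = shape
    r, c = pos
    tilt = (r - (rows - 1) / 2) / rows * fov_deg[0]
    pan = (c - (cols - 1) / 2) / cols * fov_deg[1]
    return tilt, pan


# Two tiny 5 x 5 frames: something bright appears at the right-hand edge.
previous = [[0] * 5 for _ in range(5)]
current = [row[:] for row in previous]
current[2][4] = 200

spot = find_motion(previous, current)
if spot is not None:
    print(spot, saccade_command(spot, shape=(5, 5)))
```

Here the change is detected at the right edge of the middle row, so the command is a pan to the right with no tilt. A real system would also need the masking and corrective steps discussed earlier.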
