Human Visual System: A Quick Introduction (Part 1: Encoding)

The human visual system (HVS) comprises our sensory organs (the eyes) and parts of the central nervous system (the retina, optic nerve, and visual cortex). It gives us the ability to detect light and interpret the world around us. Insight into how it works is crucial if we want to design optimal displays and rendering algorithms. Though this topic is not usually covered in traditional computer graphics, it is well studied, both physiologically and behaviorally, in vision science. While there are many ways to classify and study the visual system, in this post I follow the taxonomy used by Wandell to give a broad overview of the stages of the HVS: encoding (transformation of light into neural signals), representation (organization of neural pathways), and interpretation (inference of scene properties from the neural image).


This post is meant as a quick introduction for readers unfamiliar with the workings of the human visual system. For more in-depth coverage of the topic, I refer readers to the seminal vision science textbook Foundations of Vision.


Encoding



The first stage of the visual system is the encoding of incoming light by the eye. The eye’s optical elements focus the light onto the retina, which contains light-sensitive cells called photoreceptors that convert the light into neural signals. The encoding stage imposes fundamental limits on our visual perception, the effects of which can be found in the later stages of the HVS. Let’s look at how the eye’s optics function and how its retina is structured.


Optics

Following the above figure from right to left, light enters the eye through the cornea. The cornea bends the light to help the eye focus and is responsible for about 2/3 of the eye’s optical power. Some of this light is blocked by the iris (the colored part of the eye), which controls the size of the pupil (the opening, similar to a camera’s aperture) and thereby the amount of light entering the eye. Changes in pupil size can vary the light intensity reaching the retina by up to a factor of about 10. A smaller pupil blocks the peripheral rays that are most affected by optical aberrations, which increases image sharpness. However, it also reduces the amount of energy available to activate photoreceptors on the retina and may introduce additional blurring due to diffraction. Next, light passes through the lens. Together, the lens and the cornea focus light on the retina. Objects at different depths are focused at different distances behind the lens. These distances can be calculated from the power of the lens using the lensmaker’s equation. When we view faraway objects, the combined power of the lens and cornea is roughly 60 diopters. To focus on nearby objects, the ciliary muscles attached to the lens change its shape to increase its power; this process is known as accommodation. Different wavelengths of light are also focused at slightly different distances, which leads to chromatic aberration. The chambers between the lens and cornea and between the lens and retina are filled with fluids called the aqueous humor and the vitreous humor, respectively. They help supply nutrients to the eye’s components and maintain its shape. Their refractive properties are similar to those of water.
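To make the relationship between power, object distance, and focus distance concrete, here is a minimal sketch. It assumes an idealized thin lens in air obeying 1/d_o + 1/d_i = P (with P in diopters and distances in meters) and ignores the refractive index of the fluid inside the eye, so the distances it produces are illustrative rather than anatomical. The function names and the 25 cm example distance are my own choices, not taken from any library or from the textbook.

```python
def image_distance_m(power_diopters: float, object_distance_m: float) -> float:
    """Solve the thin-lens relation 1/d_o + 1/d_i = P for the image distance d_i (meters)."""
    vergence = power_diopters - 1.0 / object_distance_m
    if vergence <= 0:
        raise ValueError("object is at or inside the focal length; no real image forms")
    return 1.0 / vergence


def accommodation_needed_diopters(object_distance_m: float) -> float:
    """Extra power needed to keep the image where a far object would be focused."""
    return 1.0 / object_distance_m


RELAXED_POWER_D = 60.0  # lens + cornea power for faraway objects, as stated in the text

# A far object (effectively at infinity) is focused at 1/P behind the lens.
retina_distance_m = 1.0 / RELAXED_POWER_D

# An object at 25 cm (assumed example) with no accommodation focuses behind that plane.
unaccommodated = image_distance_m(RELAXED_POWER_D, 0.25)

# Adding 1/0.25 = 4 diopters of accommodation brings the image back onto the same plane.
accommodated = image_distance_m(
    RELAXED_POWER_D + accommodation_needed_diopters(0.25), 0.25
)

print(f"far focus plane:      {retina_distance_m * 1000:.2f} mm")
print(f"near, unaccommodated: {unaccommodated * 1000:.2f} mm")
print(f"near, accommodated:   {accommodated * 1000:.2f} mm")
```

Under these simplifying assumptions, the closer the object, the more extra power the lens has to supply, which is the job of accommodation.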


When light hits the retina, some of it is reflected back. This escaping light can be captured with an ophthalmoscope to image the retina and thus measure the response of the eye’s optical system. Experiments using this procedure have determined that the eye’s optical system can be well modeled as a shift-invariant linear system. This finding has a very practical implication: if we measure the eye’s response to a point source of light, say, a single pixel, we can calculate the complete 2D image projected on the retina with a simple convolution. The measured response is called the point spread function (PSF) of the eye. It is usually circularly symmetric, but defects such as astigmatism can change that. When represented in terms of spatial frequency, the same function is called the optical transfer function (OTF). The OTF is a complex-valued function that describes the scaling and phase shift the optical system applies to each spatial frequency. A plot of the magnitude of our eye’s OTF (called the modulation transfer function, or MTF) is shown in the figure below.
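The PSF, convolution, and OTF relationship described above can be sketched in a few lines of NumPy. This is only an illustration: the Gaussian PSF is a hypothetical stand-in for a measured one, the grid size and sigma are arbitrary, and the convolution is circular (it wraps around at the image borders) because it is computed with FFTs.

```python
import numpy as np


def gaussian_psf(size: int, sigma: float) -> np.ndarray:
    """Hypothetical circularly symmetric PSF, centered at size // 2 and summing to 1."""
    coords = np.arange(size) - size // 2
    xx, yy = np.meshgrid(coords, coords)
    psf = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return psf / psf.sum()


def retinal_image(scene: np.ndarray, psf: np.ndarray) -> np.ndarray:
    """Circular convolution of the scene with the PSF, done in the frequency domain."""
    # ifftshift moves the PSF center to index (0, 0) so the result is not translated.
    otf = np.fft.fft2(np.fft.ifftshift(psf))
    return np.real(np.fft.ifft2(np.fft.fft2(scene) * otf))


size = 64
psf = gaussian_psf(size, sigma=2.0)

# A single bright pixel as the point source: blurring it should reproduce the PSF.
point = np.zeros((size, size))
point[size // 2, size // 2] = 1.0
blurred_point = retinal_image(point, psf)

# The OTF is the Fourier transform of the PSF; the MTF is its magnitude.
otf = np.fft.fft2(np.fft.ifftshift(psf))
mtf = np.abs(otf)

print("point source reproduces PSF:", np.allclose(blurred_point, psf))
print("MTF at zero frequency:", mtf[0, 0])
print("MTF at highest horizontal frequency:", mtf[0, size // 2])
```

Because the system is linear and shift-invariant, the same `retinal_image` call works for any scene, not just a point source.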



As seen in the plot, our eye’s optics do not respond equally to all spatial frequencies and act as a low-pass filter with a cut-off frequency of around 60 cycles per degree (cpd).
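As a rough sanity check on that number, the sketch below computes the cut-off that diffraction alone would impose on an ideal circular aperture under incoherent light, which is D / λ cycles per radian. The 2 mm pupil diameter and 555 nm wavelength are assumed example values, and the function name is my own. This is an upper bound from diffraction only: at larger pupil sizes, aberrations rather than diffraction limit the real eye.

```python
import math


def diffraction_cutoff_cpd(pupil_diameter_m: float, wavelength_m: float) -> float:
    """Diffraction-limited cut-off (D / lambda cycles per radian), in cycles per degree."""
    cycles_per_radian = pupil_diameter_m / wavelength_m
    return cycles_per_radian * math.pi / 180.0


# Assumed example values: a small 2 mm pupil and mid-spectrum green light.
cutoff_2mm = diffraction_cutoff_cpd(2e-3, 555e-9)

# Doubling the aperture doubles the diffraction limit (aberrations are ignored here).
cutoff_4mm = diffraction_cutoff_cpd(4e-3, 555e-9)

print(f"2 mm pupil: {cutoff_2mm:.1f} cpd")
print(f"4 mm pupil: {cutoff_4mm:.1f} cpd")
```

For the small pupil, the result lands close to the roughly 60 cpd cut-off quoted above.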


Retina

The light entering the eye is projected onto the retina, the photosensitive layer on the inner wall of the eye. This layer is covered with special neurons called photoreceptors that convert light into electro-chemical neural signals. There are two fundamentally different types of photoreceptors: rods and cones. Rods are active at low illumination (scotopic light levels), while cones require higher light levels (photopic light levels). Intermediate intensities at which both are active are called mesopic light levels. The cones are responsible for color vision and can be further divided into three sub-types, L-, M-, and S-cones, depending on the range of wavelengths they are sensitive to.





The spatial arrangement and density of rods and cones vary widely over the retina. This matters because it determines how the retinal image is sampled and passed on to the subsequent stages of the visual system. There are approximately 5 million cones and 120 million rods in each eye. Rods are present in high density, but many rods converge onto a single neuron, so rod vision has very poor acuity but high sensitivity. The region of highest visual acuity lies at the center of the retina and is called the fovea. The fovea has no rods but a high concentration of cones. The cones are tightly packed in the fovea and form a regular pattern that allows them to sample image signals up to 60 cpd accurately. The cone density (and with it visual acuity) drops rapidly as we move away from the fovea, where the cones intermix with rods in an irregular manner. This irregularity helps replace the aliasing that would result from a low sampling rate with visually insignificant noise.
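The sampling argument can be illustrated with a small 1D sketch. It assumes a perfectly regular row of 120 samples per degree, which is the rate the 60 cpd figure implies under the Nyquist criterion and is not a measured cone count, and it uses a sinusoidal grating as the test signal. The helper names are hypothetical.

```python
import numpy as np


def nyquist_limit_cpd(samples_per_degree: float) -> float:
    """Highest frequency a regular sampling grid can represent without aliasing."""
    return samples_per_degree / 2.0


def apparent_frequency_cpd(signal_cpd: float, samples_per_degree: int,
                           span_degrees: int = 1) -> float:
    """Sample a sinusoidal grating on a regular grid and return the dominant
    frequency found in the samples."""
    n = samples_per_degree * span_degrees
    positions = np.arange(n) / samples_per_degree  # sample positions in degrees
    samples = np.sin(2.0 * np.pi * signal_cpd * positions)
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(n, d=1.0 / samples_per_degree)
    return float(freqs[np.argmax(spectrum)])


SAMPLES_PER_DEGREE = 120  # assumed regular sampling rate implied by the 60 cpd limit

limit = nyquist_limit_cpd(SAMPLES_PER_DEGREE)

# A 40 cpd grating is below the limit and is recovered at its true frequency.
below_limit = apparent_frequency_cpd(40, SAMPLES_PER_DEGREE)

# An 80 cpd grating is above the limit and shows up at a lower, aliased frequency.
above_limit = apparent_frequency_cpd(80, SAMPLES_PER_DEGREE)

print("Nyquist limit (cpd):", limit)
print("40 cpd grating appears at:", below_limit)
print("80 cpd grating appears at:", above_limit)
```

On a regular grid, the grating above the limit masquerades as a coherent lower-frequency pattern. Irregular sampling, as in the periphery, would instead scatter that energy into noise, which this sketch does not model.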