3D graphics, artificial vision and the inner eye

 

Christopher Isaac Larnder, Chair
ACM SIGGRAPH-Montréal


Delivered as an introduction to the
October 2001 meeting of Montreal ACM SIGGRAPH

 


Tonight's invited speakers will explore the theme of artificial vision from two very different but complementary perspectives. Dr. Sawan, from the École Polytechnique, will present his daring work on an artificial eye that will send digitized images directly into the human visual cortex, bypassing the retina and the optic nerve altogether. Dr. Roy, from the Université de Montréal, will demonstrate computer programs that can reconstruct a 3D scene from pairs of images, analogous to our ability to perceive depth and move through 3D space by combining the streams of images captured by our left and right eyes.
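For those curious about what such a reconstruction involves at its most basic level, the heart of stereo depth recovery is a simple triangulation. The sketch below is only a hypothetical, stripped-down illustration, assuming an idealized, rectified camera pair; Dr. Roy's actual methods are considerably more sophisticated.

```python
import numpy as np

def depth_from_disparity(disparity, focal_length_px, baseline_m):
    """Recover depth (in metres) from a disparity map of a rectified stereo pair.

    disparity       : 2D array of pixel disparities between left and right images
    focal_length_px : camera focal length, expressed in pixels
    baseline_m      : distance between the two camera centres, in metres

    For rectified cameras, depth Z = f * B / d at each pixel.
    """
    disparity = np.asarray(disparity, dtype=float)
    depth = np.full(disparity.shape, np.inf)   # zero disparity means "at infinity"
    valid = disparity > 0
    depth[valid] = focal_length_px * baseline_m / disparity[valid]
    return depth

# Example: two "eyes" roughly 6.5 cm apart, focal length 800 px, a feature shifted 40 px
print(depth_from_disparity(np.array([[40.0]]), 800.0, 0.065))  # about 1.3 m away
```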

On the surface, these two Montreal researchers are working in completely non-overlapping fields of research. Dr. Roy is contributing to a relatively new but rapidly evolving field within the computer graphics community, lying somewhere between 3D modeling and image processing. Dr. Sawan's primary interest, on the other hand, is the biomedical application of modern microelectronics, of which his electronic eye is but one example.

Both of these researchers, however, are contributing to a vast change in how we understand the human brain and the nature of subjective perception. How do we understand and remember what we see? What goes on inside our head when we visualize something we have never actually seen or done? How do we store and recall old images? These questions, in one form or another, are as old as art and philosophy, but in the context of today's computer graphics technologies, they take on a new form with very practical implications for both industry and modern culture.

Dr. Sawan's electronic eye easily captures the popular imagination and the media's attention, fitting well into the futurist fascination with the cyborg man-machine, robotics, artificial organs and the bionic man. Various human-computer interaction devices have become familiar to the computer graphics community, such as head-mounted displays for virtual reality applications, or "exoskeleton"-style motion-tracking devices. An artificial eye, however, especially one that is grafted directly onto the visual cortex, is arguably the most extreme example of a man-machine interface in today's world.

While Dr. Sawan replaces the eyeball, the retina and the optic nerve, Dr. Roy's 3D reconstruction program functions very much like a visual cortex, the next organ after the optic nerve in the brain's visual perception pipeline. The input to the visual cortex is 2D image data; the output is, roughly speaking, a recognized set of 3D shapes arranged in a 3D space. In psychology, this is called the mental map, or internal representation. It is not the image itself; rather, it represents our understanding of the world that the image conveys.

The human brain runs an extremely elaborate reconstruction program inside our heads, continuously updating this mental map of the world using input from the senses. In today's computer graphics, the analogue of the mental map is the scenegraph, and long-term memories or snapshots of the map are the equivalent of 3D modeling file formats. So, somewhere in the visual cortex, some form of scenegraph-like data is being communicated to higher levels of consciousness. Of course, the representations actually used by the brain are more sophisticated than today's lists of vertices and surface normals, having been refined toward compactness and performance over our species' millions of years of evolution, but the underlying principle is the same. Visual simulation applications and video game engines everywhere are recreating, in crude form, the basic spatio-cognitive information processing going on inside the human brain.
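To make the scenegraph analogy concrete: at its simplest, a scenegraph is just a tree of named transforms and shapes, traversed by a renderer much as the mind's eye traverses its map of a room. The following is only a hypothetical, minimal illustration, not the data structure of any particular engine, and certainly not of the brain.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SceneNode:
    """A minimal scenegraph node: a named transform with optional geometry and children."""
    name: str
    translation: tuple = (0.0, 0.0, 0.0)       # position relative to the parent node
    geometry: Optional[str] = None             # e.g. a mesh reference; None for grouping nodes
    children: List["SceneNode"] = field(default_factory=list)

    def walk(self, depth=0):
        """Traverse the tree, as a renderer (or a mind's eye) would."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)

# A toy "mental map" of a room: objects arranged relative to one another, not as pixels.
room = SceneNode("room", children=[
    SceneNode("table", (2.0, 0.0, 1.0), geometry="table_mesh", children=[
        SceneNode("cup", (0.1, 0.8, 0.0), geometry="cup_mesh"),
    ]),
    SceneNode("door", (0.0, 0.0, 3.0), geometry="door_mesh"),
])

for depth, node in room.walk():
    print("  " * depth + node.name)
```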

Perhaps the 3D graphics card manufacturers, looking for faster and more compact 3D representations, should start paying closer attention to advances in neuroscience research. And neuroscientists could likely gain insight into the functioning of the visual cortex, if not of the entire brain, by learning more about the hardware and software issues particular to the computer graphics community.

The importance of cross-disciplinary cooperation with neuroscientists has also been recognized by the robotics community. The international conference on humanoid robotics, hosted just south of us at MIT last year, invited submissions from biologists as part of its core program. The exchanges bring insight into the development of motor control systems, as well as artificial vision, the latter being arguably the biggest bottleneck in the advancement of intelligent robotic systems.

The eye and the experience of seeing have always been an important metaphor for the nature of mind and perception. The existence of 3D reconstruction layers separating our mind's perceptions from the outer world of immediate experience has long been recognized among philosophers, artists and scholars in many cultural traditions. The perennial problem of mind could be formulated as follows: we continuously process image data from our outer eye in order to update our mental map, but our inner eye views only this resulting map, not the original image. It is all computed in real time, so to speak, so we don't notice the intervening layers of computation, and the sensation that we are "seeing through our eyes" directly is quite convincing. So convincing is this illusion that it is only recently that experiments have been devised to demonstrate that we do not, in fact, see directly what our eyes see. The best known is the one in which subjects were fitted with special glasses that made them "see upside down". After a period of disorientation, they actually recovered right-side-up vision, even though their eyeballs were still receiving upside-down images.

The evolutionary importance of this abstraction layer in maintaining a stable mental map in the face of variable input conditions is obvious. The drawback, however, is that a rich diversity of visual stimuli is always reduced to the same set of 3D "scenegraph" icons. Indeed, the history of the modern visual arts can be seen as a series of attempts to circumvent this built-in reconstruction program in order to open up new and more direct ways of responding to raw perception: Impressionism, Cubism and completely non-representational visual art simply cannot be assimilated by the standard 3D-reconstruction mindset, forcing the mind to take in the "raw" experience directly and producing new visual languages.

"I see" has long been used metaphorically to mean both "I perceive" and "I understand" in a general way. When computers became widely known and used, they were quickly adopted as a new metaphor for the functioning of mind. The advent of computer graphics and computer vision can marry the technological metaphor to the traditional one and thus bring a powerful new vocabulary to the age-old mysteries of mind and perception.

Individuals like ourselves, working in the fields of computer graphics or artificial vision, find ourselves thinking in new ways about sight, shape, light and the nature of perception. Full assimilation of this knowledge, initially formal and technological, inevitably leads to a personal reevaluation of our own subjective experiences of seeing, imagining and dreaming: following closely on the technological revolution is a cultural one. By playing with these metaphors, we not only bring a sense of unity to diverse fields of endeavor, but also create a basis for exchange with non-technological cultures. Such collaboration may be our best chance at making sense of our rapidly changing world.