3D Graphics and the Telecommunications Era
ACM SIGGRAPH-Montréal
It was exactly one week ago today that NaN Inc., the creators of Blender3D, shut down and filed for bankruptcy. Blender3D is a 3D animation tool that was distributed free of charge. It had gained a certain popularity: I remember having seen books devoted to it in Montreal bookstores, although I imagine they have already been removed from the shelves by now.
NaN's strategy was to distribute the tool for free but charge those who wanted to use it for web publishing. It didn't work, it seems. This is a big disappointment for independent animators and free software fans, and it underlines for everyone the importance of ensuring that a sound business model backs software projects that are not volunteer-based.
Tonight, we will be hearing about two tools with very different modes of financing. AXEL is developed by employees at MindAvenue, a privately-owned company right here in Montreal. Whereas NaN gave away their tool for free but charged for web publishing, MindAvenue charges for their tool but allows unlimited web publishing for free. Only time will tell the strength of this business model. FreeWRL, in contrast, a web3D tool funded by the Canadian government in Ottawa, is free for any use whatsoever and is open-source.
The fusion of 3D graphics and telecommunications leads to some interesting analogies with brain function and traditional spoken-word communication. I thought it would be fun to explore this a bit tonight, as it builds on some analogies that were discussed during our October meeting on the theme of artificial vision.
In October, Dr. Roy demonstrated the reconstruction of a 3D scene from stereo images. We had discussed how this reconstruction is analogous to the function of the human visual cortex, which receives image data and converts it into 3D scene data. Only the 3D data is stored, and presumably used in the higher levels of brain processing, while the original "raw image" data itself is thrown away.
One important result of this perception process, which is very relevant to telecommunications, is data compression: We all have the impression of being able to recall childhood images in detail, and yet there is just no way that the brain could store all the years of high-resolution stereo footage that a person views in a lifetime. The answer, of course, is that we only store shorthand descriptions derived from images, high-level cognitive symbols which, when recalled, are "composed" in some way into the sensation of seeing an image.
3D web players operate on exactly the same principles. The content is stored as digital "words", a high-level representation in terms of 3D objects. When somebody wants to view the content, no images are actually transmitted, only the "pure" high-level representation. The final image is generated only on the client's computer. The benefits are the same as those within a human brain: reduced bandwidth and data compression.
This is strikingly similar to traditional spoken-word communication: when I say "a blue house with white windows and two trees in front", you immediately generate a rough image in your mind, without me having to show you an actual picture. In the case of 3D web technology, every receiver of the message reconstructs the identical image, and considerable cleverness goes into the choice of vocabulary for this message and into the client program that reconstructs images from it.
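The bandwidth argument can be made concrete with a little arithmetic. The sketch below compares the size of a word-based description of the blue house with the size of one uncompressed image of it. The scene text is a made-up notation for illustration only (it is not valid VRML or the format of any particular player), and the 640 by 480 resolution at 3 bytes per pixel is an assumption.

```python
# Hypothetical word-based scene description. The syntax is invented for
# this illustration and is not the format of any real 3D web player.
SCENE = """\
house { color blue  size 8 6 10 }
window { color white  count 4 }
tree { height 5  position -6 0 4 }
tree { height 5  position 6 0 4 }
"""


def raw_image_bytes(width, height, bytes_per_pixel=3):
    """Size of one uncompressed image (assumes 24-bit RGB by default)."""
    return width * height * bytes_per_pixel


def description_bytes(text):
    """Size of the scene description when sent as UTF-8 text."""
    return len(text.encode("utf-8"))


def compression_ratio(text, width, height):
    """How many times larger the raw image is than the description."""
    return raw_image_bytes(width, height) / description_bytes(text)


if __name__ == "__main__":
    image = raw_image_bytes(640, 480)
    words = description_bytes(SCENE)
    print(f"raw image:   {image} bytes")
    print(f"description: {words} bytes")
    print(f"ratio:       {compression_ratio(SCENE, 640, 480):.0f}x")
```

Real images are usually compressed before transmission, so the gap in practice is smaller than this raw comparison suggests, but the description also has the advantage that it can be rendered from any viewpoint without sending anything more.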
The brain's perception process, which extracts 3D data from raw image data, is not only important in terms of data compression, but is also essential to our ability to develop higher and higher levels of abstraction, fundamental to reasoning and human thought. Unlike 3D web players, these higher levels of reasoning operate on abstract 3D data directly, without ever composing or "rendering" the data into 2D images.
Analogies for such non-visual operations can be found in the operation of distributed simulations and on-line distributed games. Events in the simulation are triggered, for example, by how close you are to a bomb, or what surface you are in contact with. Many such computations take place continuously throughout the duration of the game, and they all operate directly on abstract geometrical properties, independently of whether 2D images are being rendered or not. This also means that players need not be human: independent software agents can also "see" and interact with the game elements by inspecting and modifying the 3D data representations directly, similar to the way text-based search engines "see" web pages by inspecting the high-level markup tags in HTML pages.
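As a rough illustration of such a non-visual computation, here is a minimal sketch of a proximity trigger. It works purely on 3D coordinates and never renders anything, so a human player and a software agent are treated identically. The names `Entity`, `triggered_events`, and the trigger radius are hypothetical and are not taken from any particular game or simulation engine.

```python
import math
from dataclasses import dataclass


@dataclass
class Entity:
    """Any object in the simulation: a player, a software agent, a bomb."""
    name: str
    position: tuple  # (x, y, z) coordinates in the shared 3D scene


def distance(a, b):
    """Euclidean distance between two entities in 3D space."""
    return math.dist(a.position, b.position)


def triggered_events(agent, bombs, radius):
    """Return the names of all bombs within `radius` of the agent.

    Only geometric data is inspected; no image is ever produced.
    """
    return [bomb.name for bomb in bombs if distance(agent, bomb) <= radius]


if __name__ == "__main__":
    agent = Entity("agent", (0.0, 0.0, 0.0))
    bombs = [
        Entity("bomb-a", (3.0, 4.0, 0.0)),   # 5 units away
        Entity("bomb-b", (10.0, 0.0, 0.0)),  # 10 units away
    ]
    print(triggered_events(agent, bombs, radius=5.0))
```

A real distributed simulation would run checks like this continuously and share the results across the network, but the principle is the same: the event depends on the scene description, not on any picture of it.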
The power of 3D graphics on the web is the ability to replace the transmission of low-level, large-data images with the transmission of small, high-level, word-based descriptions of 3D scenes. The analogy with brain function and traditional human communication is undeniable, and seems to be a useful way of making the technologies we use understandable in human terms.
They say a picture is worth a thousand words. Next time you are stuck waiting for an image to download in your favorite browser, you will have time to note how this traditional saying continues to be relevant to the realities of the digital age: A picture today costs a lot more than a thousand words!