Version: v2, Published online: 2010
4. Computational models of vision: general approach
The predominant theoretical approach in cognitive psychology in recent years has been computationalism, which treats human cognitive processes, including perceptual processes, as a species of information processing (see Mind, computational theories of). Computational theories of vision attempt to specify the aspects of the external world that are represented by the visual system, and to characterize the operations that derive these representations from the information contained in the retinal image.
One of the most prominent early computational vision theorists was David Marr (1945–80), a researcher in the Artificial Intelligence Laboratory at the Massachusetts Institute of Technology. While the details of Marr’s specific computational model have been challenged by later theorists, his work is of continuing interest to philosophers and psychologists concerned with the foundations of the computational approach to vision. Accordingly, I will use Marr’s theory to highlight significant features of the computational approach.
Marr argued in his book Vision (1982) that an information-processing capacity can be analysed at three distinct levels of description. The ‘theory of the computation’ is a precise specification of the function computed by the mechanism, in other words, what the mechanism does. For example, the theory of the computation for a particular device may tell us that it adds numbers, or computes averages when given a list of numbers as input. The algorithm specifies the procedure or rule for computing the function, and the implementation level describes how the computation is carried out in neural or computer hardware. The first two levels in the hierarchy – the abstract characterization of the problem and the rule for its solution – exemplify a fundamental commitment of the computational approach: that cognitive processes can be understood in a way that is independent of the particular mechanisms that implement them in the brain.
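Marr's distinction between the first two levels can be illustrated with his own example of averaging. The following sketch (illustrative only, not from Marr's text) shows one "theory of the computation" — the arithmetic mean — realized by two different algorithms, either of which could in turn run on very different hardware:

```python
# The "theory of the computation" specifies only WHAT is computed:
# here, the arithmetic mean of a list of numbers. Two distinct
# algorithms compute that same function; the implementation level
# (neural tissue, silicon) is left open entirely.

def mean_two_pass(xs):
    """Algorithm 1: sum all the numbers, then divide by the count."""
    total = 0.0
    for x in xs:
        total += x
    return total / len(xs)

def mean_running(xs):
    """Algorithm 2: maintain a running average, updated per element."""
    avg = 0.0
    for i, x in enumerate(xs, start=1):
        avg += (x - avg) / i
    return avg

data = [2.0, 4.0, 6.0, 8.0]
print(mean_two_pass(data))  # 5.0
print(mean_running(data))   # 5.0
```

Both procedures satisfy the same abstract specification, which is the sense in which the topmost level is independent of the algorithm and of its physical realization.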
Computational models treat the visual system as computing from the retinal image a representation of the three-dimensional structure of the distal scene. Marr’s theory divides this process into three distinct stages, positing at each stage the construction of a representation that makes explicit (some of) the information contained in the image and represents it in a way that is efficient for later use. Various computational processes, some running in parallel, are defined over these representations. The algorithmic level of description characterizes the procedures the visual system uses to produce increasingly useful representations of the scene.
Most of the processes that Marr describes are data driven, or ‘bottom up’ – they operate on information contained in the image, without supplementation by information or beliefs about specific objects and features in the scene. These processes use information about intensity changes across the visual field, or the orientation of surfaces, not such facts as that objects of a particular shape typically make good cutting implements. Marr advocated ‘squeezing every ounce of information out of the image’ before positing the influx of supplementary knowledge.
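The flavour of a data-driven process can be conveyed by a toy example (a hypothetical sketch, not a model from Marr's theory): locating intensity changes in a one-dimensional strip of image, using nothing but the image data themselves.

```python
# A minimal "bottom-up" process: find sharp intensity changes in a
# 1-D strip of pixels. The procedure consults only the image data --
# no knowledge about what objects produced those intensities.

def intensity_changes(strip, threshold=10):
    """Return indices where adjacent pixel intensities differ sharply."""
    return [i for i in range(len(strip) - 1)
            if abs(strip[i + 1] - strip[i]) > threshold]

# A dark region followed by a bright region yields one sharp edge.
strip = [12, 11, 13, 12, 90, 92, 91, 93]
print(intensity_changes(strip))  # [3]
```

Nothing in the procedure depends on beliefs about cutting implements or any other specific objects; it extracts what is implicit in the intensity values alone.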
Data-driven models of perception have a number of advantages over hypothesis-driven models, which appeal to high-level knowledge very early in visual processing. Data-driven processes are generally faster – the visual system does not have to retrieve the relevant piece of specialized knowledge before processing the information in the image – and tend to be more reliable. In Marr’s model, the point at which high-level information becomes available to the visual system marks the distinction between early and late vision. Early visual processes are said to be ‘cognitively impenetrable’ by the subject’s beliefs about the world (see Modularity of mind). As a consequence, they cannot be influenced by learning.
Marr emphasized the importance of the ‘topmost’ level of description – the theory of the computation – in developing accounts of human cognitive capacities. He noted that there is no point attempting to describe how a mechanism works before knowing what it does. A crucial first step in constructing a theory of a perceptual capacity is discovering very general constraints on the way the world is structured that enable adapted organisms to solve perceptual problems in their normal environments. An example should make the point clear. Marr’s student and colleague Shimon Ullman (1979) proved that three distinct orthographic views of four noncoplanar points are sufficient to determine the three-dimensional structure of a rigid body (the ‘structure from motion’ theorem). If a body is not rigid, much more information is required to compute its shape. In a world such as ours, where most things are relatively rigid, a visual system built (that is, adapted) to assume that the objects in its environment are rigid would be able to compute the structure of those objects more easily and quickly than a visual system that had to consider the many nonrigid interpretations consistent with the data. Accordingly, Marr posited a mechanism that, given three views of four noncoplanar points as input, computes the unique rigid interpretation consistent with the data.
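The rigidity assumption can be illustrated with a toy simulation (a sketch for illustration only; it is not Ullman's proof or his recovery algorithm). Four noncoplanar points on a rigid body are rotated and orthographically projected three times. The three-dimensional inter-point distances are invariant under the rigid motion, while the projected two-dimensional distances change from view to view; it is this changing pattern that carries the information about depth structure which the theorem shows to be sufficient.

```python
# Toy sketch: three orthographic views of four noncoplanar points on
# a rotating rigid body. Rigidity = constant 3-D inter-point distances;
# the 2-D projections nevertheless change across views.

import math

def rotate_y(p, theta):
    """Rigid rotation of point p = (x, y, z) about the y-axis."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x + s * z, y, -s * x + c * z)

def project(p):
    """Orthographic projection: drop the depth coordinate."""
    return (p[0], p[1])

def dist(a, b):
    return math.dist(a, b)

# Four noncoplanar points (vertices of a tetrahedron).
body = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]

views = []
for theta in (0.0, 0.4, 0.8):  # three distinct views
    rotated = [rotate_y(p, theta) for p in body]
    # The rigid motion leaves 3-D distances unchanged:
    assert abs(dist(rotated[0], rotated[1]) - 1.0) < 1e-9
    views.append([project(p) for p in rotated])

# The projected 2-D distance between points 0 and 3 varies across
# views, reflecting the depth structure of the rigid body.
print([round(dist(v[0], v[3]), 3) for v in views])  # [0.0, 0.389, 0.717]
```

A mechanism built on the rigidity assumption need only find the unique rigid configuration consistent with such a triple of views, rather than searching the far larger space of nonrigid interpretations.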
Recall Berkeley’s objection to the geometric theorists’ accounts of size and distance perception. He claimed that the information required for the postulated calculations was not generally available to the visual system, nor to the organism. Such a criticism, if true, is devastating for a computational account of a cognitive capacity. Any computational theory that posits processing beyond the computing capabilities of the mechanism, or that relies on information unavailable to the mechanism, is a nonstarter as a biological model. An important lesson of Marr’s work is that the theorist must attend to the general structure of the organism’s environment before attempting to characterize computational mechanisms, because the environment determines the nature of the computational problems that the organism’s visual system needs to solve. The perceptual systems of adapted organisms can be assumed to ‘exploit’ very general information about the environment. Consequently, the problems they have to solve may be simpler and computationally more tractable than might initially be assumed.
The work by Gestalt theorists to characterize perception in terms of general organizational principles, mentioned above, can be seen as the articulation of general environmental constraints, and hence as contributing to the specification of theory at the topmost level in Marr’s hierarchy. These principles are justified by reference to very general features of the environment. For example, the principle of proximity, according to which nearby elements tend to be grouped together, reflects the fact that the parts of objects tend to be spatially cohesive.
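The proximity principle lends itself to a simple illustration (a hypothetical sketch, not a Gestalt-theoretic model): elements along a line are assigned to the same group whenever they are closer together than some gap threshold.

```python
# Illustrative sketch of grouping by proximity: sorted 1-D positions
# are clustered, with a new group started at each sufficiently large gap.

def group_by_proximity(positions, gap=2.0):
    """Cluster sorted positions; a gap larger than `gap` splits groups."""
    groups = [[positions[0]]]
    for prev, cur in zip(positions, positions[1:]):
        if cur - prev > gap:
            groups.append([])
        groups[-1].append(cur)
    return groups

# Two clumps separated by a large gap are grouped as two units.
print(group_by_proximity([0.0, 0.5, 1.1, 6.0, 6.4, 7.0]))
# [[0.0, 0.5, 1.1], [6.0, 6.4, 7.0]]
```

Such a rule succeeds in environments like ours precisely because the parts of a single object are usually near one another; that environmental regularity, not anything internal to the rule, is what justifies it.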
Egan, Frances and Nico Orlandi. Computational models of vision: general approach. Vision, 2010, doi:10.4324/9780415249126-W047-2. Routledge Encyclopedia of Philosophy, Taylor and Francis, https://www.rep.routledge.com/articles/thematic/vision/v-2/sections/computational-models-of-vision-general-approach.
Copyright © 1998-2020 Routledge.