Vision

DOI: 10.4324/9780415249126-W047-2
Version: v2, Published online: 2010
Retrieved July 18, 2026, from https://www.rep.routledge.com/articles/thematic/vision/v-2

5. Computational models of vision: modularity

Another characteristic feature of Marr’s theory is that it treats the visual system as comprising a number of individual components, or modules, that can be analysed independently of the rest of the system. A ‘module’ is, by definition, cognitively impenetrable: its operation is not influenced by information external to it that may be available to the cognitive system as a whole, for example, information in the system’s memory (see Modularity of mind). Marr posited a module responsible for computing three-dimensional structure from apparent motion, another for computing depth from the disparity information available in stereo images, and a third for computing shape from shading. Each of these modules is designed to exploit general environmental constraints, just as the ‘structure from motion’ module, described above, incorporates the rigidity assumption.
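
The last step of the disparity module's task can be made concrete with a minimal sketch. It assumes an idealised pair of parallel eyes (treated as cameras) with focal length f and separation (baseline) B, and it assumes that the correspondence problem, matching each point in one image with its partner in the other, has already been solved. Under these assumptions simple triangulation gives depth Z = fB/d for a disparity d. The function and parameter names are illustrative only, and the sketch omits the hard part: solving the correspondence problem is where Marr's stereo theory brings in environmental constraints.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth (in metres) of a point seen by two parallel, idealised eyes.

    Assumes the correspondence problem is already solved, so the horizontal
    disparity of the point between the two images is known (in pixels).
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point at finite depth")
    # Similar triangles: Z / B = f / d, hence Z = f * B / d.
    return focal_px * baseline_m / disparity_px


# Larger disparities correspond to nearer points.
for d in (40.0, 20.0, 10.0):
    print(d, depth_from_disparity(d, focal_px=800.0, baseline_m=0.065))
```

Depth is inversely proportional to disparity, so halving the disparity doubles the computed distance.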

The various modules operate in parallel, and since they derive information about the depth of the distal scene from different input data, they may give inconsistent results. This is an advantage for the organism: where the general environmental constraints assumed by one processing module do not hold, its output is subject to correction by another module that operates on different data and exploits different environmental constraints. For example, imagine a nonrigid mass of jelly moving through space. Since the ‘structure from motion’ module is built to assume rigidity, it will probably give an incorrect interpretation of the jelly’s structure. But its output is then likely to be inconsistent with, and correctable by, the output of the modules operating on shading or disparity information, which exploit other environmental constraints but do not assume rigidity.
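
The correction in the jelly example can be illustrated with a toy rule: compare each module's depth estimate for a patch of the scene with the consensus of all the modules, and override any estimate that departs from that consensus by more than a tolerance. Using the median as the consensus, the module names and the depth values are all hypothetical assumptions made for illustration; the rule is not presented as Marr's mechanism.

```python
from statistics import median


def reconcile(estimates: dict[str, float], tolerance: float) -> dict[str, float]:
    """Override any module's depth estimate that strays from the consensus.

    The consensus is taken to be the median of all the estimates; this rule
    is a hypothetical illustration, not Marr's own proposal.
    """
    consensus = median(estimates.values())
    corrected = {}
    for module, depth in estimates.items():
        if abs(depth - consensus) > tolerance:
            corrected[module] = consensus  # corrected by the other modules
        else:
            corrected[module] = depth  # consistent, so kept as is
    return corrected


# Hypothetical depths (metres) for one patch of a moving jelly: the
# rigidity-assuming motion module is off, while shading and stereo agree.
estimates = {"motion": 3.4, "shading": 1.9, "stereo": 2.0}
print(reconcile(estimates, tolerance=0.5))
```

Here the motion module's estimate of 3.4 is replaced by the consensus value 2.0, while the shading and stereo estimates are kept. A rule of this kind works only when most modules are right; with three modules it can absorb the failure of one.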

The principle of modular design has an evolutionary rationale. Modular processes are typically fast, because they avoid a time-consuming search of general memory. And provided that the constraints governing a module’s operation generally hold, the process will normally be reliable. Commitment to the principle of modular design also makes the computational theorist’s job easier, since modular processes can be studied and modelled without the theorist knowing how more central reasoning systems work. For all their theoretical advantages, however, modules do pose a general problem: the theorist has to explain how the outputs of the various modular processes are combined into a single representation of the structure of the scene. The possibility of inconsistent results from different modules suggests that this is a nontrivial problem.
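
The combination problem has no single agreed solution. One common approach in later research on depth-cue integration, which is not part of Marr's theory, is to average the modules' estimates, weighting each by its reliability (the inverse of its variance). A minimal sketch, assuming that each module reports a depth together with a variance and that the modules' errors are independent; the function name and the numbers are hypothetical.

```python
def combine(estimates: list[tuple[float, float]]) -> tuple[float, float]:
    """Fuse (depth, variance) pairs by inverse-variance weighting.

    Returns the fused depth and its variance. The weighting assumes the
    modules' errors are independent; it is not Marr's own proposal.
    """
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    fused = sum(w * depth for w, (depth, _) in zip(weights, estimates)) / total
    # The fused estimate is more reliable than any single input.
    return fused, 1.0 / total


# Hypothetical: stereo (variance 0.01) is more reliable than shading (0.04).
print(combine([(2.0, 0.01), (2.4, 0.04)]))
```

The fused depth lies closer to the more reliable estimate. The rule also shows why the problem is nontrivial: given two equally reliable but conflicting estimates, such as 2.0 and 3.4 for the jelly, it returns their midpoint, 2.7, which matches neither, so a system that averages must first detect and discount a module whose assumptions have failed.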

In general, then, the visual processes posited in Marr’s theory have three important features: they are data-driven, adapted to exploit general environmental constraints, and modular. The visual system, according to Marr, computes a series of intermediate representations of distal information, culminating in a representation of the three-dimensional structure of the scene. The input to the system is the image on the retina, in effect a grey-level intensity array. The initial processing of the image produces what Marr called the ‘primal sketch’, a representation of the way that light intensities change over the visual field. The primal sketch makes explicit precisely the information that is required for subsequent processing. Discontinuities in intensity tend to be correlated with significant features of the scene, such as object boundaries, although it is too early at this stage to assume that every sharp intensity change in the image indicates an edge in the world: some may be produced by changes in illumination or surface reflectance (see Colour and qualia).
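
One concrete proposal for the first step toward the primal sketch is Marr and Hildreth's: smooth the image with a Gaussian, take the second derivative (in two dimensions, the Laplacian), and mark the zero-crossings of the result as places where intensity changes sharply. The sketch below works on a single one-dimensional scanline rather than a full two-dimensional image, and the function names, kernel width and slope threshold are illustrative assumptions.

```python
import math


def second_derivative_of_gaussian(sigma: float, radius: int) -> list[float]:
    """Sampled 1-D kernel d^2G/dx^2, the one-dimensional analogue of the
    Laplacian of a Gaussian."""
    raw = []
    for x in range(-radius, radius + 1):
        g = math.exp(-x * x / (2.0 * sigma * sigma))
        raw.append((x * x / sigma**4 - 1.0 / sigma**2) * g)
    # Shift the samples so they sum to zero: a uniform region then gives
    # (almost exactly) zero response instead of a small constant offset.
    mean = sum(raw) / len(raw)
    return [w - mean for w in raw]


def convolve_clamped(signal: list[float], kernel: list[float]) -> list[float]:
    """Filter a scanline with a symmetric kernel, repeating the end values
    of the signal beyond its borders."""
    r = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - r, 0), n - 1)  # clamp index to the scanline
            acc += w * signal[j]
        out.append(acc)
    return out


def zero_crossings(response: list[float], min_slope: float) -> list[int]:
    """Indices i where the response changes sign between i and i + 1 with a
    jump larger than min_slope (this ignores floating-point ripple)."""
    return [
        i
        for i in range(len(response) - 1)
        if response[i] * response[i + 1] < 0
        and abs(response[i] - response[i + 1]) > min_slope
    ]


# A scanline with one sharp intensity change between positions 9 and 10.
scanline = [10.0] * 10 + [50.0] * 10
kernel = second_derivative_of_gaussian(sigma=2.0, radius=8)
print(zero_crossings(convolve_clamped(scanline, kernel), min_slope=1.0))
```

The step yields a single zero-crossing, at index 9, between positions 9 and 10. Nothing at this stage says whether that change marks an object boundary, a shadow, or a change in surface reflectance.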

The various processing modules described above operate on aspects of the information contained in the primal sketch. Their results are encoded in a representation that Marr called the ‘2.5-dimensional sketch’, which makes explicit the depth and orientation of the visible surfaces in the scene and serves as the input representation for later visual processing. The visual system is assumed to be cognitively impenetrable up to the production of the 2.5-dimensional sketch; hence its operation to this point cannot be influenced by learning.
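
The 2.5-dimensional sketch can be pictured as a viewer-centred map that stores, for each visible point, its depth and the orientation of the surface there. The sketch below illustrates only the representation, not how Marr's modules compute it: it shows how orientation, implicit in a grid of depth values, can be made explicit by finite differences. It assumes orthographic viewing, unit grid spacing, and a surface written as z = f(x, y), whose normal is proportional to (-dz/dx, -dz/dy, 1); the function name is hypothetical.

```python
import math


def surface_normals(depth: list[list[float]]) -> list[list[tuple[float, ...]]]:
    """Unit surface normals at the interior points of a depth map depth[row][col].

    Assumes orthographic viewing and unit grid spacing; border points are
    skipped because central differences need a neighbour on each side.
    """
    rows, cols = len(depth), len(depth[0])
    normals = []
    for r in range(1, rows - 1):
        row = []
        for c in range(1, cols - 1):
            dzdx = (depth[r][c + 1] - depth[r][c - 1]) / 2.0  # central difference
            dzdy = (depth[r + 1][c] - depth[r - 1][c]) / 2.0
            n = (-dzdx, -dzdy, 1.0)
            length = math.sqrt(sum(v * v for v in n))
            row.append(tuple(v / length for v in n))  # normalise to unit length
        normals.append(row)
    return normals


# A plane tilted along x (z = 0.5 * x), sampled on a 3 x 3 grid.
tilted = [[0.5 * c for c in range(3)] for _ in range(3)]
print(surface_normals(tilted))
```

For the tilted plane the single interior normal is about (-0.447, 0, 0.894); for a flat depth map it is (0, 0, 1).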

Citing this article:
Egan, Frances and Nico Orlandi. Computational models of vision: modularity. Vision, 2010, doi:10.4324/9780415249126-W047-2. Routledge Encyclopedia of Philosophy, Taylor and Francis, https://www.rep.routledge.com/articles/thematic/vision/v-2/sections/computational-models-of-vision-modularity.
Copyright © 1998-2026 Routledge.