DOI: 10.4324/9780415249126-W047-2
Version: v2,  Published online: 2010

7. Bayesian models of vision

Computational models based on Bayesian decision principles are currently very popular (see Probability theory and epistemology §§2–4). This approach follows Helmholtz (see Helmholtz, H. von) in treating vision as a species of unconscious inference, in particular, as probabilistic inference. Bayesian theories treat the visual system as an ideal observer that uses prior knowledge about visual scenes and information in the image to infer the most probable interpretation of the image.

The fundamental idea underlying Bayesian perceptual models is that the posterior probability of a possible real-world structure S is proportional to the product of the prior probability of S (that is, the probability before receiving the stimulus I) and the likelihood (the probability of I given S). Prior probability distributions in typical applications of the Bayesian strategy represent knowledge of the regularities governing object shapes, constituent materials, and illumination, and likelihood distributions represent knowledge of how images are formed through projection on the retina. Some examples of prior knowledge that figure in Bayesian models are that solids are more likely to be convex than concave and that the light source is above the viewer.
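
The core computation can be illustrated with a minimal sketch. The candidate structures, priors, and likelihoods below are hypothetical numbers, not drawn from any actual model; they simply show how a convexity prior can resolve an image that is equally consistent with two interpretations.

```python
# Minimal sketch of the Bayesian computation: posterior(S) is proportional
# to prior(S) x likelihood(I | S). All numbers are illustrative.

priors = {"convex": 0.8, "concave": 0.2}       # e.g. solids tend to be convex
likelihoods = {"convex": 0.5, "concave": 0.5}  # image equally consistent with both

unnormalized = {s: priors[s] * likelihoods[s] for s in priors}
evidence = sum(unnormalized.values())          # normalizing constant p(I)
posterior = {s: p / evidence for s, p in unnormalized.items()}

print(posterior)  # the prior breaks the tie: {'convex': 0.8, 'concave': 0.2}
```

Because the likelihoods are equal, the posterior here simply reproduces the prior; with a less ambiguous image the likelihoods would differ and pull the posterior towards the image data.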

The Bayesian approach provides a framework for taming the ambiguity and complexity of natural images. Perception in the Bayesian framework is explicitly seen as a trade-off between image reliability – p(I|S) – and the prior p(S). The less likely the image is given a structure – in other words, the more ambiguous the image – the greater the influence of prior knowledge in yielding an unambiguous percept. Some perceptions may be more data-driven, others more knowledge-driven. The Bayesian framework thus provides a schema for explicitly comparing the relative contributions of image data and prior knowledge in alternative proposals. Perceptual constancies – the fact that the visual system is able to detect a fixed structure across successive retinal transformations due to movement, or to viewpoint or illuminant changes – are modelled in the Bayesian framework as discounting: the confounding variables – motion, viewpoint, or illuminant – are discounted in the computation by integrating them out, or summing over them.
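
Discounting by "integrating out" can be sketched as follows. The surfaces, illuminants, and probabilities are hypothetical, chosen only to show the marginalization step: the posterior over surface reflectance is obtained by summing the joint posterior over the unknown illuminant.

```python
# Sketch of discounting a confounding variable: the posterior over surface
# reflectance S sums over the illuminant L, i.e.
#   p(S | I)  is proportional to  sum_L p(I | S, L) p(L) p(S).
# All numbers are illustrative.

surfaces = ["light", "dark"]
illuminants = ["bright", "dim"]
p_S = {"light": 0.5, "dark": 0.5}      # prior over surface reflectance
p_L = {"bright": 0.7, "dim": 0.3}      # prior over illuminant
# Likelihood of the observed image intensity given surface and illuminant:
p_I_given = {("light", "bright"): 0.9, ("light", "dim"): 0.2,
             ("dark", "bright"): 0.3, ("dark", "dim"): 0.1}

# Marginalize (sum) over the illuminant, then normalize:
unnorm = {s: sum(p_I_given[(s, l)] * p_L[l] for l in illuminants) * p_S[s]
          for s in surfaces}
Z = sum(unnorm.values())
posterior = {s: v / Z for s, v in unnorm.items()}
print(posterior)  # a posterior over surfaces alone; the illuminant is discounted
```

The result is a distribution over surface reflectance in which the illuminant no longer appears as a variable, which is the formal counterpart of the constancy phenomenon described above.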

Bayesian models rely on statistical analysis of natural images and their real-world causes to arrive at plausible hypotheses concerning image formation (p(I|S)) and prior knowledge about naturally occurring structures (p(S)). Uncovering statistical regularities relating image features to object or scene properties has enabled theorists to design systems that group images consistently with the natural constraints noted above (such as that nearby edges with similar orientations belong to the same contour). Such work has yielded computer-vision solutions for edge detection, face recognition, and the interpretation of bodily movement, and has provided insight into the functional nature of certain kinds of visual illusions.

The Bayesian framework affords several advantages for the study of human vision. Perhaps most obviously, it provides a convenient and natural framework for studying all aspects of perception in a unified manner, by treating perception as a Bayesian decision problem. Secondly, Bayesian methods allow the development of quantitative theories at Marr’s topmost level, avoiding premature commitment to specific neural mechanisms. Thirdly, Bayesian theories explicitly model uncertainty, and hence are an important tool in understanding how the visual system might combine large amounts of objectively ambiguous information to arrive at percepts that are rarely ambiguous. Finally, as noted above, the framework provides an explicit account of the interaction between information in the stimulus and prior knowledge of the world.

Nonetheless, Bayesian visual modelling raises some pressing questions about the appropriateness of particular Bayesian models, and of the Bayesian approach more generally, for understanding human vision. One question concerns how the visual system knows the relevant priors. Some priors, or strategies for learning priors, are assumed to be innate, encoded in our genes. For the reasons discussed above in §4, the idea of innate knowledge available to the visual system is quite plausible for general natural constraints but less so for specific knowledge concerning visible properties (shapes, textures, etc.) of specific objects. For those priors which are not plausibly innate, the question is whether the visual system can learn the relevant probability distributions p(S) and p(I|S) from the available data. There are further issues concerning priors. It is likely that some probability distributions will be context-sensitive; compare, for example, a forest with a city scene. It is conceivable (in fact, likely) that more than one prior will be applicable in a given context. What does the system do when priors are inconsistent? How are priors updated when the environment changes? These questions suggest directions for further research.
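
One simple way a context-sensitive prior could in principle be learned from data is by frequency counting over labelled scene samples. The sketch below is purely illustrative – the contexts, structures, and sample data are invented – and is not a claim about how the visual system actually acquires its priors.

```python
# Hypothetical sketch: estimating a context-sensitive prior p(S | context)
# from labelled (context, structure) samples by frequency counting.
from collections import Counter

samples = [("forest", "tree"), ("forest", "tree"), ("forest", "rock"),
           ("city", "car"), ("city", "building"), ("city", "car")]

def empirical_prior(context):
    """Relative frequencies of structures observed in the given context."""
    counts = Counter(s for c, s in samples if c == context)
    total = sum(counts.values())
    return {s: n / total for s, n in counts.items()}

print(empirical_prior("forest"))  # trees twice as probable as rocks here
print(empirical_prior("city"))
```

Even this toy example makes the difficulties in the text concrete: the estimate depends entirely on which context is queried, and it offers no answer to what happens when contexts overlap, when applicable priors conflict, or when the environment changes and the counts become stale.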

Another issue concerns the idealization inherent in the Bayesian framework itself. As mentioned above, this has certain advantages, notably generality and simplicity. But because human vision is limited not just by the partial nature of the information available – a feature nicely modelled in the Bayesian framework – but also by the available neural hardware, we might expect significant departures from optimality. The assumption of logical omniscience central to Bayesian epistemology – that degrees of belief satisfy the probability laws – is an issue for Bayesian perceptual theories as well. Is it reasonable to assume that the visual system knows the probability calculus and operates according to it?

Citing this article:
Egan, Frances, and Nico Orlandi. ‘Vision’ (§7, ‘Bayesian models of vision’). Routledge Encyclopedia of Philosophy, Taylor and Francis, 2010, doi:10.4324/9780415249126-W047-2.
Copyright © 1998-2020 Routledge.
