Physics-based vision is about interpreting an image and extracting information about its contents based on an understanding of the underlying physics which govern how the image was formed. Our work in this area has concentrated on one particular problem: that of illuminant estimation. Estimating the prevailing illumination in a scene, given only an image of the scene, is a much-studied problem and one that we have approached from a number of different perspectives. In this work we start from a particular physical model of image formation, called the dichromatic model, which was first proposed by Shafer et al [9].
[Figure 1: The two types of reflection in the dichromatic model: surface reflection and body reflection.]
In simple terms, the dichromatic model of image formation says that light
reflected from a surface is reflected in one of
two ways, as illustrated in Figure 1. The first type of reflection is known as
surface reflection and refers to the case in which the
surface acts like a mirror and simply reflects any light which is incident
upon it. In this case the light does not interact
with the surface in any way and thus the reflected light is spectrally the
same as the incident light. The second type of reflection
is referred to as body reflection and in this case light enters the surface
where it is selectively absorbed and emitted by
colorant particles within the body of a material before eventually leaving the
surface at some point. The process of body reflection
spectrally alters the incident light so that the reflected light depends on
properties of the surface as well as the incident light.
Light reflected by the body process can be reflected in any direction whereas
surface reflected light is reflected in a single
direction. Any light, however it arises, can be characterised by its spectral
power distribution (SPD) which is a continuous function
of wavelength. If we adopt the dichromatic model of reflection then the light $C(\lambda)$ reflected from a point on a surface consists of a linear combination of two distinct lights: body reflected light $C_b(\lambda)$ and surface reflected light $C_s(\lambda)$. That is:

$$C(\lambda) = m_b\,C_b(\lambda) + m_s\,C_s(\lambda) \qquad (1)$$

where the scalar weights $m_b$ and $m_s$ depend on the imaging geometry and determine the relative contributions of the two components.
Now suppose that an image is formed by a trichromatic camera with linear response. It follows that the RGB $\rho$ recorded by the camera for the light $C(\lambda)$ can be expressed as the same linear combination of two RGBs:

$$\rho = m_b\,\rho_b + m_s\,\rho_s \qquad (2)$$
Equation 2 is useful because it separates out the part of the RGB which depends on the surface ($\rho_b$) from the part which depends only on the incident light ($\rho_s$). In the context of illuminant estimation it is the latter quantity which we would like to determine, and Equation 2 provides the first step towards obtaining it.
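To make Equation 2 concrete, here is a minimal numpy sketch of dichromatic image formation; the RGB vectors and weight ranges below are invented purely for illustration:

```python
import numpy as np

# A minimal sketch of Equation 2: every pixel of a single surface is a
# positive combination of a body RGB (rho_b) and an illuminant RGB (rho_s).
# All numerical values here are invented for illustration.
rho_b = np.array([0.6, 0.3, 0.1])            # hypothetical body-reflection RGB
rho_s = np.array([0.9, 0.8, 0.7])            # hypothetical illuminant RGB

rng = np.random.default_rng(0)
m_b = rng.uniform(0.2, 1.0, size=(100, 1))   # shading varies per pixel
m_s = rng.uniform(0.0, 0.5, size=(100, 1))   # specularity varies per pixel

pixels = m_b * rho_b + m_s * rho_s           # (100, 3) dichromatic RGBs
```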
Suppose we consider all pixels corresponding to a single surface. By the analysis above we know that the RGB of any such pixel consists of two constituent RGBs corresponding to body and surface reflected light. Pixel values for different points on this surface can differ because the relative amounts of body and surface reflected light they contain can differ. But what Equation 2 tells us is that all pixels for this surface must fall on a plane in RGB space, and that this plane is spanned by two vectors: the body RGB $\rho_b$ and the surface RGB $\rho_s$. Consider also the pixels on a second, distinct surface. These pixels too must fall on a plane in RGB space, though the plane will be different to that of the first. However, the two planes share something in common: one vector in their span is $\rho_s$, the RGB corresponding to the prevailing illumination. In fact, the two planes must intersect at a line, and this line of intersection is exactly the line in the direction of $\rho_s$. Thus we have a simple method of determining the RGB of the scene illuminant. Such a method has been proposed by a number of different authors [10,11,7,2].
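The plane-intersection method can be sketched in a few lines of numpy: fit a plane (through the origin) to each surface's pixels via an SVD, then take the cross product of the two plane normals. The surface and illuminant RGBs here are again invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
rho_s = np.array([0.9, 0.8, 0.7])       # illuminant RGB (invented)
rho_b1 = np.array([0.6, 0.3, 0.1])      # body RGB of surface 1 (invented)
rho_b2 = np.array([0.2, 0.5, 0.3])      # body RGB of surface 2 (invented)

def dichromatic_pixels(rho_b, n=200):
    # Synthetic pixels of one surface, per Equation 2.
    m_b = rng.uniform(0.2, 1.0, (n, 1))
    m_s = rng.uniform(0.0, 0.5, (n, 1))
    return m_b * rho_b + m_s * rho_s

def plane_normal(pixels):
    # Normal of the best-fit plane through the origin: the right singular
    # vector with the smallest singular value.
    _, _, vt = np.linalg.svd(pixels, full_matrices=False)
    return vt[-1]

n1 = plane_normal(dichromatic_pixels(rho_b1))
n2 = plane_normal(dichromatic_pixels(rho_b2))

# The two planes intersect along n1 x n2, which is parallel to the
# illuminant RGB (up to sign and scale).
direction = np.cross(n1, n2)
direction *= np.sign(direction.sum())          # fix the arbitrary sign
print(direction / np.linalg.norm(direction),
      rho_s / np.linalg.norm(rho_s))           # the two should agree
```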
An alternative but related approach was suggested by Lee et al [8], who proposed that we look not at RGB values but rather at their projection in a 2-d chromaticity space such as the space defined by:

$$(r, g) = \left( \frac{R}{R+G+B},\; \frac{G}{R+G+B} \right) \qquad (3)$$
In this case chromaticity co-ordinates can again be split into a linear combination of the chromaticity corresponding to surface reflectance (and thus the illuminant) and that corresponding to body reflectance:

$$(r, g) = \alpha\,(r_b, g_b) + (1-\alpha)\,(r_s, g_s) \qquad (4)$$

where $\alpha \in [0,1]$ depends on the relative strengths of the two reflection components.
Equation 4 implies that chromaticity co-ordinates corresponding to a single surface will fall on a line in the 2-d space, as illustrated in Figure 2. Co-ordinates for pixels corresponding to a second surface will fall on a second, distinct line; pixels for a third surface fall on a third line, and so on. Once again the intersection of two or more lines identifies the illuminant, in this case in terms of its chromaticity co-ordinate.
[Figure 2: Chromaticities of pixels from two surfaces fall on two lines whose intersection is the illuminant chromaticity.]
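A sketch of this chromaticity version of the method, on the same kind of synthetic data: project each surface's RGBs to $(r,g)$ via Equation 3, fit a total-least-squares line per surface, and intersect the lines. All numerical values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
rho_s = np.array([0.9, 0.8, 0.7])                 # illuminant RGB (invented)

def pixels_for(rho_b, n=200):
    # Synthetic dichromatic pixels for one surface (Equation 2).
    return (rng.uniform(0.2, 1.0, (n, 1)) * rho_b
            + rng.uniform(0.0, 0.5, (n, 1)) * rho_s)

def rg(p):
    # Chromaticity projection of Equation 3.
    return p[:, :2] / p.sum(axis=1, keepdims=True)

def fit_line(pts):
    # Total least squares line a*r + b*g = c through 2-d points.
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid)
    a, b = vt[-1]
    return np.array([a, b, a * centroid[0] + b * centroid[1]])

l1 = fit_line(rg(pixels_for(np.array([0.6, 0.3, 0.1]))))
l2 = fit_line(rg(pixels_for(np.array([0.2, 0.5, 0.3]))))

# The two lines intersect at the illuminant chromaticity.
est = np.linalg.solve(np.array([l1[:2], l2[:2]]), np.array([l1[2], l2[2]]))
print(est, rho_s[:2] / rho_s.sum())               # the two should agree
```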
Intersecting planes (in RGB space) or lines (in chromaticity space) gives a
theoretically sound way to
discover the scene illuminant. However the algorithms rely, for their success,
on being able to reliably
distinguish pixels corresponding to a single surface. What is more, they
require at least two surfaces to be able
to operate at all. In our research we set out to remove the second of these
restrictions and to derive a physics
based algorithm which can estimate the scene illuminant given an image
consisting of only a single surface.
Our method works by incorporating prior knowledge of the physical nature of
the world into Lee's framework
for illuminant estimation. Specifically we enforce constraints on the set of
plausible illuminants which might
be encountered in the world. Figure 3 illustrates the chromaticities of a wide range of illuminants which occur in practice. It is clear that most of these lights are clustered around the red line drawn in this figure. This line is the Planckian locus: it traces the chromaticity co-ordinates of all blackbody radiators, whose SPDs are given by the following equation:
$$E(\lambda; T) = c_1\,\lambda^{-5}\left( e^{c_2 / (\lambda T)} - 1 \right)^{-1} \qquad (5)$$

where $T$ is the temperature of the radiator and $c_1$ and $c_2$ are constants.
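Equation 5 translates directly into code; $c_1$ and $c_2$ below are the standard radiation constants, and the visible-range sampling is an arbitrary choice:

```python
import numpy as np

# Equation 5 in code: the SPD of a blackbody radiator at temperature T.
C1 = 3.7418e-16   # first radiation constant, W m^2
C2 = 1.4388e-2    # second radiation constant, m K

def planck_spd(T, wavelengths_nm=np.arange(400, 701, 10)):
    lam = wavelengths_nm * 1e-9                        # metres
    return C1 / (lam**5 * (np.exp(C2 / (lam * T)) - 1.0))

spd_tungsten = planck_spd(2800.0)   # a warm, tungsten-like light
spd_daylight = planck_spd(6500.0)   # a cool, daylight-like light
```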
[Figure 3: Chromaticities of a wide range of measured illuminants; most cluster near the Planckian locus (shown in red).]
Now suppose that we restrict illuminants to those which fall on the Planckian locus and consider an image of a single surface under some arbitrary illuminant. Following Lee et al, all pixels in this image will project to a line in chromaticity space of the form of Equation 4. It follows that we can identify the scene illuminant by intersecting the line of image chromaticities with the Planckian locus, thus defining a method for estimating a scene illuminant which relies only on a single surface. Figure 5 illustrates the principle of the algorithm and Figure 6 shows an example of the algorithm's performance on an image of a single green plant. The left-hand image shows the raw image captured by a camera and the right-hand image shows the image corrected using an estimate of the scene illuminant obtained by the method set out above.
[Figure 5: The single-surface algorithm: the line of image chromaticities intersects the Planckian locus at the illuminant.]
[Figure 6: Left: raw camera image of a green plant. Right: the same image corrected using the illuminant estimated by the algorithm.]
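A sketch of the single-surface algorithm under stated assumptions: the camera is modelled with hypothetical Gaussian sensitivities (a real implementation would use measured ones), the locus is sampled over a range of temperatures, and the illuminant estimate is taken as the locus sample closest to the fitted dichromatic line:

```python
import numpy as np

lam = np.arange(400, 701, 10) * 1e-9                   # wavelengths in metres
C1, C2 = 3.7418e-16, 1.4388e-2                         # radiation constants

def planck(T):
    # Equation 5: blackbody SPD at temperature T.
    return C1 / (lam**5 * (np.exp(C2 / (lam * T)) - 1.0))

# Hypothetical Gaussian camera sensitivities (an assumption of this sketch).
centres = np.array([600, 540, 450]) * 1e-9
sens = np.exp(-0.5 * ((lam[None, :] - centres[:, None]) / 40e-9) ** 2)

def locus_chromaticity(T):
    rgb = sens @ planck(T)                             # camera RGB of the light
    return rgb[:2] / rgb.sum()                         # its (r, g) chromaticity

locus = np.array([locus_chromaticity(T)
                  for T in np.linspace(2500.0, 10000.0, 200)])

def estimate_illuminant(line):
    # line = (a, b, c) fitted to the image chromaticities as in the earlier
    # sketch; return the locus sample closest to the line a*r + b*g = c.
    a, b, c = line
    dist = np.abs(locus @ np.array([a, b]) - c) / np.hypot(a, b)
    return locus[np.argmin(dist)]
```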
That we can solve for an estimate of the scene illuminant given the simplest scene content possible, a single surface, is a satisfying theoretical result. However, most scenes have a more complex structure than this, and so we would like to extend our method to deal with the more common case of scenes with many surfaces. We would also like to relax the constraint that illuminants must lie on the Planckian locus and allow a wider range of non-Planckian lights to be considered. We consider first the relaxation of the illumination constraint. Looking again at Figure 3, which shows the chromaticities of a wide set of measured illuminant spectra, we see that a more accurate representation of these plausible lights is a small subset of chromaticity space. We propose, as others have done previously [3,5], to model this set by the convex hull of the points, as illustrated by Figure 7. Any chromaticity in this set might then correspond to an illuminant.
[Figure 7: Left: the convex hull of measured illuminant chromaticities. Right: the chromaticity line of a single surface intersects the illuminant set, restricting the feasible illuminants to a line segment.]
Extending
the algorithm to deal with multiple surfaces implies that we must be able to
segment our image into regions corresponding
to different surfaces. Suppose for the moment that such a segmentation is
attainable. Then we know that pixels belonging
to a single surface must fall on a line in chromaticity space (or a plane in
RGB space). The right-hand plot of Figure 7 illustrates
that a line corresponding to a single surface intersects the set of possible
illuminants and so restricts the feasible illuminants to
the line segment which lies within the set. Adding a second surface in theory
identifies the illuminant exactly: the lines from
the surfaces intersect at a point which falls within the set of possible
illuminants. In practice this ideal case might not occur: we have
many surfaces whose chromaticity lines may not all cross at a single point and
moreover, this point may fall outside the set
of possible lights. To deal with these practical difficulties we pose the
problem as an optimisation such that we wish to determine
the point which best represents the intersection of all chromaticity lines and
which is further constrained to fall within the
set of possible illuminants. It turns out that the necessary optimisation can be solved by a technique called Sequential Quadratic Programming [6]: essentially we minimise a quadratic error function subject to a set of linear and quadratic constraints.
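A sketch of this constrained intersection, with SciPy's SLSQP solver standing in for the Sequential Quadratic Programming formulation; the illuminant chromaticities and surface lines below are synthetic stand-ins:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial import ConvexHull

# Synthetic stand-ins: measured illuminant chromaticities, plus one fitted
# line (a, b, c) per segmented surface, with a*r + b*g = c.
illums = np.random.default_rng(3).uniform(0.2, 0.5, (50, 2))
lines = np.array([[1.0, -0.5, 0.15],
                  [0.3,  1.0, 0.45]])

hull = ConvexHull(illums)
A, b = hull.equations[:, :2], hull.equations[:, 2]    # inside: A @ x + b <= 0

def cost(x):
    # Sum of squared perpendicular distances from x to every surface line.
    n, c = lines[:, :2], lines[:, 2]
    return np.sum((n @ x - c) ** 2 / np.sum(n**2, axis=1))

# Minimise the intersection error subject to x lying inside the hull.
res = minimize(cost, illums.mean(axis=0), method="SLSQP",
               constraints=[{"type": "ineq", "fun": lambda x: -(A @ x + b)}])
print(res.x)                                          # constrained estimate
```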
To complete the algorithm it remains to specify how a scene should be
segmented. Accurate segmentation of image data is very
difficult to achieve but we have found that our algorithm is robust to even
large inaccuracies in the segmentation. In fact, we have
found that very good results can be achieved even using a very naive
segmentation in which we simply divide the image into a number
of small sub-images. If a sub-image corresponds to a single surface, its RGBs ought to lie on a plane in RGB space. If they do not, we reject the sub-image, and in this way obtain a set of blocks which usually correspond to single surfaces.
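A minimal sketch of this block-based segmentation, assuming the image is an $(h, w, 3)$ array of linear RGBs; the block size and planarity tolerance are arbitrary choices:

```python
import numpy as np

def is_single_surface(block_rgbs, tol=0.01):
    # Keep a block if its RGBs are well fitted by a plane through the origin,
    # i.e. the smallest singular value is a tiny fraction of the total.
    s = np.linalg.svd(block_rgbs, compute_uv=False)
    return s[2] / s.sum() < tol

def planar_blocks(image, size=32):
    # Naive segmentation: tile the image and keep only near-planar blocks.
    h, w, _ = image.shape
    blocks = []
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            rgbs = image[y:y + size, x:x + size].reshape(-1, 3).astype(float)
            if is_single_surface(rgbs):
                blocks.append(rgbs)
    return blocks
```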
In other work on this method we have experimented with representing the set of possible illuminants not by its convex hull but by an α-shape [1] representation, which we have found further improves accuracy. Work is ongoing to obtain a more robust segmentation of the scene and to combine this physics-based approach with statistical techniques for estimating the scene illuminant which have also been developed in the group [5,4].