Peter McCullagh’s Observation on Cauchy Statistics
Author: Nicolae Mazilu
Published on Saturday, January 21st, 2012 in category ProtoQuant
Visualizing a tensor quantity depends on how that quantity extends along different directions in space. When the representative matrix of the quantity is symmetric, there are three orthogonal directions along which the quantity takes extreme values, and those values are the eigenvalues of the matrix. The properties of these eigenvalues as roots of a cubic equation support a statistical argument in which the trilinear coordinates constructed from the eigenvalues are probabilities (Kindlmann, 2004). This theory seems too rigid to meet the needs of regularization in practical problems where the evolution of physical quantities is in focus. To illustrate the problem, let’s reproduce the “eigenvalue wheel” from Figure 2.6a of Kindlmann’s work just cited, using our own notation.
The three eigenvalues (x1, x2, x3) are ordered as suggested in the figure and, being real, can be represented on the same real axis. On a suitable circle centered at their arithmetic mean one can always find three points, at 120° from each other, whose projections on the axis through the center are our eigenvalues. The construction of the circle, its radius and the three points, obviously depends heavily on the eigenvalues. For instance, one cannot represent all three eigenvalues in our figure as projections of points on a circle of arbitrarily small radius, unless we accept some kind of non-Euclidean geometry. In general, a specific physical problem involves certain eigenvalues at a certain position in space, so that the statistical aspect of this construction, when related to measurement, lies more on the side of sampling than on the side of the probabilities themselves. On the other hand, the evolution inside a circle of fixed radius involves an arbitrary angle of rigid rotation of the triad of points whose projections are our eigenvalues, so that this evolution can appear as a gauge evolution.
In view of these observations, it seems that the regularization of the representations of a tensor may benefit substantially if we treat it as a problem of sampling, while keeping the initial idea of a probabilistic approach. It so happens that sampling has a nice algebraic connotation here. Professor Peter McCullagh of the University of Chicago noticed a curious property of the one-dimensional Cauchy distribution, which follows from a complex parameterization of its probability density (McCullagh, 1996). The parameters of a statistical distribution are usually taken as real, but McCullagh shows a clear advantage in representing them in complex form, at least for the Cauchy distribution. At the risk of repeating well-known algebraic facts, and perhaps for the benefit of some readers, we give here explicit calculations following Professor McCullagh’s guidance in the work just cited. He starts from the fact that the density of a single Cauchy variate X can be written in the form
$$f(x;\theta) = \frac{|\theta_2|}{\pi\,|x-\theta|^2}, \qquad \theta = \theta_1 + i\theta_2 \tag{1}$$
where θ is the ‘complex parameter’ of the distribution, and |…| denotes the absolute value or the modulus, as the case may be. The real part θ1 of this parameter gives the location of the data, while the imaginary part θ2 roughly characterizes the spread of their distribution. It is known that this class of distributions is closed under homographic transformations of the variable: any real linear-fractional transform of X also has a Cauchy distribution. In fact, this is the essential way to characterize this class of distributions (Knight, 1976; Knight, Meyer, 1976). However, Professor McCullagh has shown that the complex representation of the parameter brings to light one of the most important consequences of this theorem: if X belongs to the Cauchy class with the complex parameter θ, symbolically X ~ C(θ), then we have
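$$\frac{aX+b}{cX+d} \sim C\!\left(\frac{a\theta+b}{c\theta+d}\right), \qquad a, b, c, d \ \text{real}, \quad ad-bc \neq 0 \tag{2}$$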
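Property (2) can be checked numerically. The following is a minimal sketch, assuming the density has the form f(x; θ) = |Im θ| / (π |x − θ|²); the helper names `cauchy_pdf`, `mobius` and `transformed_pdf`, and the sample numbers, are illustrative choices and not part of McCullagh's paper. It compares the density of the transformed variable, obtained by a change of variables, with the Cauchy density carrying the transformed parameter.

```python
import math

def cauchy_pdf(x, theta):
    """Cauchy density with complex parameter theta = location + i*scale (assumed form)."""
    return abs(theta.imag) / (math.pi * abs(x - theta) ** 2)

def mobius(z, a, b, c, d):
    """Real linear-fractional map z -> (a z + b) / (c z + d)."""
    return (a * z + b) / (c * z + d)

def transformed_pdf(y, theta, a, b, c, d):
    """Density of Y = mobius(X) for X ~ C(theta), by change of variables."""
    x = (d * y - b) / (a - c * y)                # inverse map x(y)
    jac = abs(a * d - b * c) / (a - c * y) ** 2  # |dx/dy|
    return cauchy_pdf(x, theta) * jac

theta = complex(0.5, 2.0)          # arbitrary illustrative parameter
a, b, c, d = 2.0, -1.0, 1.0, 3.0   # arbitrary real homography, ad - bc != 0
theta_new = mobius(theta, a, b, c, d)

for y in (-4.0, -0.3, 0.7, 5.0):
    lhs = transformed_pdf(y, theta, a, b, c, d)
    rhs = cauchy_pdf(y, theta_new)
    print(f"y={y:5.1f}  change of variables={lhs:.6f}  C(theta')={rhs:.6f}")
```

The two columns should agree to rounding error for any real y other than the pole image y = a/c.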
This property allows us, as Professor McCullagh shows, to give efficient estimators for the complex parameter θ, based on the principle of maximum likelihood. The procedure is as follows.
As a rule, the likelihood function used in estimation is simply the product of the values of the probability density at the different measured values of X. When maximizing the likelihood with respect to the parameters it is convenient to work with its logarithm, and this is what is done in practice. For instance, if one measures two values of X having the probability density (1), say x1 and x2, the likelihood function constructed from this information is simply:
$$L(\theta; x_1, x_2) = \frac{\theta_2^2}{\pi^2\,|x_1-\theta|^2\,|x_2-\theta|^2} \tag{3}$$
The likelihood is maximum with respect to θ when the derivatives of this function with respect to θ1 and θ2 are null. In terms of the log-likelihood, which is a lot easier to handle, we then have:
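$$\frac{\partial \ln L}{\partial \theta_1} = 0, \qquad \frac{\partial \ln L}{\partial \theta_2} = 0 \tag{4}$$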
In view of the fact that
$$\ln L = 2\ln\theta_2 - 2\ln\pi - \sum_j \ln\left[(x_j-\theta)(x_j-\theta^*)\right] \tag{5}$$
where the summation index runs over the two measured values, and a star denotes complex conjugation, the two equations (4) become:
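$$\sum_j \left[\frac{1}{x_j-\theta} + \frac{1}{x_j-\theta^*}\right] = 0, \qquad \frac{2}{\theta_2} + i\sum_j \left[\frac{1}{x_j-\theta} - \frac{1}{x_j-\theta^*}\right] = 0 \tag{6}$$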
By the first of these equations, the sum of the terms 1/(xj − θ) is a purely imaginary number, since we assume that the values xj are real. The second equation then shows that
$$\sum_{j=1}^{2} \frac{1}{x_j-\theta} = \frac{i}{\theta_2} = \frac{2}{\theta^*-\theta} \tag{7}$$
If we sum up here and clear the denominators, we get
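$$\theta\theta^* - \frac{x_1+x_2}{2}\,(\theta+\theta^*) + x_1 x_2 = 0, \quad \text{i.e.} \quad (\theta_1-x_1)(\theta_1-x_2) + \theta_2^2 = 0 \tag{8}$$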
Solving this equation shows what is already well known about the Cauchy distribution. First, with only two measured values we cannot estimate the location: it can be any value between the two measured ones. The estimator of the spread is likewise indeterminate, which is quite natural for this type of distribution, since it has no finite moments of first or higher order.
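The indeterminacy with two measurements can be seen numerically. The sketch below assumes the density form f(x; θ) = |Im θ| / (π |x − θ|²) and uses the fact that the stationarity condition for two values describes the circle having the segment [x1, x2] as diameter; the function name `log_likelihood` and the sample values are illustrative. It evaluates the log-likelihood at several points of the upper semicircle and at one point off it.

```python
import cmath
import math

def log_likelihood(theta, xs):
    """Cauchy log-likelihood for real data xs and complex theta (assumed density form)."""
    return sum(math.log(abs(theta.imag) / (math.pi * abs(x - theta) ** 2)) for x in xs)

x1, x2 = 1.0, 4.0  # two illustrative measurements
center, radius = (x1 + x2) / 2, (x2 - x1) / 2

# Points on the upper semicircle having the segment [x1, x2] as diameter.
on_circle = [center + radius * cmath.exp(1j * phi) for phi in (0.3, 1.0, math.pi / 2, 2.5)]
values = [log_likelihood(t, (x1, x2)) for t in on_circle]
print(values)  # all entries agree up to rounding

# A point off the semicircle gives a strictly smaller value.
off_circle = complex(center, 0.5 * radius)
print(log_likelihood(off_circle, (x1, x2)))
```

Every point of the semicircle is therefore an equally good estimate, with location anywhere between x1 and x2.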
At this point we can easily see the advantage of equation (2): it shows that determining a Cauchy distribution from a sample requires just as many measured values of X as determining a real linear-fractional, or Möbius, transformation, to use McCullagh’s terms. Such a transformation is fixed by the images of three points, so we need three measurements of the statistical variable X in order to determine a Cauchy distribution in the best possible way. The general estimator will then be calculated from a particularly convenient Cauchy distribution through a well-defined transformation. Let’s do some calculations.
In equations (6) and (7) nothing changes, except the fact that the sum should be now performed on three values of X, say x1, x2, x3, instead of two. So, instead of (6) we have
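$$\sum_{j=1}^{3} \left[\frac{1}{x_j-\theta} + \frac{1}{x_j-\theta^*}\right] = 0, \qquad \frac{3}{\theta_2} + i\sum_{j=1}^{3} \left[\frac{1}{x_j-\theta} - \frac{1}{x_j-\theta^*}\right] = 0 \tag{9}$$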
and instead of (7) we have
$$\sum_{j=1}^{3} \frac{1}{x_j-\theta} = \frac{3i}{2\theta_2} = \frac{3}{\theta^*-\theta} \tag{10}$$
as well as the complex conjugate of this equation.
Now, the direct calculation of the estimators for θ1 and θ2 is rather tedious in general. Nevertheless, we can simplify it here using the property (2). The procedure amounts to choosing three particular values for X, say –1, 0, 1, and calculating the estimator of θ for them; then taking the homographic transform of this estimator through the homography that carries –1, 0, 1 into the values x1, x2, x3 of X. Such a real homography is indeed well determined. Let us assume that the values (x1, x2, x3) correspond to the values (–1, 0, 1) in this order. If the matrix of this homography has the entries a, b, c, d, then we can find it up to a normalization factor from the system of equations
$$x_1 = \frac{-a+b}{-c+d}, \qquad x_2 = \frac{b}{d}, \qquad x_3 = \frac{a+b}{c+d} \tag{11}$$
This gives
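$$a = x_2(x_1+x_3) - 2x_1x_3, \qquad b = x_2(x_3-x_1), \qquad c = 2x_2 - x_1 - x_3, \qquad d = x_3 - x_1 \tag{12}$$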
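A quick check of the homography can be sketched as follows. The coefficients used here are one solution of the system x1 = (−a + b)/(−c + d), x2 = b/d, x3 = (a + b)/(c + d), normalized by the choice d = x3 − x1; that normalization, the function name `homography_coeffs` and the sample values are assumptions made for illustration.

```python
def homography_coeffs(x1, x2, x3):
    """Entries (a, b, c, d), up to a common factor, of the real homography
    t -> (a t + b) / (c t + d) sending -1, 0, 1 to x1, x2, x3."""
    a = x2 * (x1 + x3) - 2 * x1 * x3
    b = x2 * (x3 - x1)
    c = 2 * x2 - x1 - x3
    d = x3 - x1  # assumed normalization
    return a, b, c, d

a, b, c, d = homography_coeffs(0.0, 1.0, 3.0)
images = [(a * t + b) / (c * t + d) for t in (-1, 0, 1)]
print(images)  # → [0.0, 1.0, 3.0]
```

The determinant ad − bc equals 2(x2 − x1)(x3 − x2)(x3 − x1), so the map is non-degenerate whenever the three values are distinct.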
The problem is now to find the estimator θ for the particular values (–1, 0, 1). This can be easily done from equation (10) and its complex conjugate, which give the system
$$\theta + \theta^* = 0, \qquad 3\,\theta\theta^* = 1 \tag{13}$$
Therefore, in this particular case we have simply i/√3 as an estimate of the complex parameter θ: it is purely imaginary. The estimator for arbitrary data (x1, x2, x3) is then obtained through the homography given by equations (12):
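$$\theta = \frac{a\,\dfrac{i}{\sqrt{3}} + b}{c\,\dfrac{i}{\sqrt{3}} + d} = \frac{\sqrt{3}\,b + i\,a}{\sqrt{3}\,d + i\,c} \tag{14}$$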
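The whole procedure can be sketched in a few lines. This is an illustration under stated assumptions, not McCullagh's own algorithm: the homography coefficients are one normalized solution of system (11), the data are taken in increasing order x1 < x2 < x3 so that the imaginary part comes out positive, and the names `cauchy_mle_three` and `score_residual` are hypothetical. The second function measures how well the result satisfies the stationarity condition for three values, namely that the sum of 1/(xj − θ) equals 3/(θ* − θ).

```python
import math

def cauchy_mle_three(x1, x2, x3):
    """Estimate of theta from three real values x1 < x2 < x3, obtained by carrying
    i/sqrt(3) through the homography that sends (-1, 0, 1) to (x1, x2, x3)."""
    a = x2 * (x1 + x3) - 2 * x1 * x3
    b = x2 * (x3 - x1)
    c = 2 * x2 - x1 - x3
    d = x3 - x1
    t0 = 1j / math.sqrt(3)  # estimate for the reference sample (-1, 0, 1)
    return (a * t0 + b) / (c * t0 + d)

def score_residual(theta, xs):
    """|sum_j 1/(x_j - theta) - n/(conj(theta) - theta)|, zero at a stationary point."""
    lhs = sum(1 / (x - theta) for x in xs)
    rhs = len(xs) / (theta.conjugate() - theta)
    return abs(lhs - rhs)

xs = (0.0, 1.0, 3.0)  # illustrative sample
theta_hat = cauchy_mle_three(*xs)
print(theta_hat, score_residual(theta_hat, xs))
```

For the sample (0, 1, 3) the estimate is 6/7 + i·3√3/7, and the residual vanishes up to rounding.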
In real terms we have:
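$$\theta_1 = \frac{\sum x_i\,(x_j-x_k)^2}{\sum (x_j-x_k)^2}, \qquad \theta_2 = \frac{\sqrt{3}\,(x_2-x_1)(x_3-x_2)(x_3-x_1)}{\sum (x_j-x_k)^2} \tag{15}$$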
where the summation extends over the positive permutations of the indices. Therefore, the complex estimator of the Cauchy distribution is in close relationship with the Hessian of the cubic having the roots (x1, x2, x3). More to the point, it is the root of that Hessian. Indeed, in terms of the roots of a cubic equation its Hessian is:
$$\sum (x_j-x_k)^2\,(x-x_i)^2 = 0 \tag{16}$$
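The link with the Hessian can be checked numerically. The sketch below assumes that the Hessian of the cubic with roots x1, x2, x3 is proportional to the quadratic obtained by summing (xj − xk)²(x − xi)² over the cyclic permutations of the indices, and compares its root in the upper half-plane with the estimate written in real terms; the function names and the sample values are illustrative.

```python
import cmath
import math

def hessian_quadratic(x1, x2, x3):
    """Coefficients (A, B, C) of A x^2 + B x + C, the sum over cyclic (i, j, k) of
    (x_j - x_k)^2 (x - x_i)^2, assumed proportional to the Hessian of the cubic."""
    triples = ((x1, x2, x3), (x2, x3, x1), (x3, x1, x2))
    A = sum((xj - xk) ** 2 for _, xj, xk in triples)
    B = -2 * sum(xi * (xj - xk) ** 2 for xi, xj, xk in triples)
    C = sum(xi ** 2 * (xj - xk) ** 2 for xi, xj, xk in triples)
    return A, B, C

def upper_root(A, B, C):
    """Root of the quadratic with positive imaginary part (A > 0, negative discriminant)."""
    return (-B + cmath.sqrt(B * B - 4 * A * C)) / (2 * A)

x1, x2, x3 = 0.0, 1.0, 3.0  # illustrative ordered sample
root = upper_root(*hessian_quadratic(x1, x2, x3))

# The estimate in real terms, for comparison.
s = (x1 - x2) ** 2 + (x2 - x3) ** 2 + (x3 - x1) ** 2
theta1 = (x1 * (x2 - x3) ** 2 + x2 * (x3 - x1) ** 2 + x3 * (x1 - x2) ** 2) / s
theta2 = math.sqrt(3) * (x2 - x1) * (x3 - x2) * (x3 - x1) / s
print(root, complex(theta1, theta2))
```

Both printed values should coincide; for this sample the quadratic is 14x² − 24x + 18.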
The roots of this equation are θ above and its complex conjugate. The expressions from equation (15) are the real and imaginary parts of these roots, as one can easily verify or find in the specialized literature (Burnside, Panton, 1960). Moreover, the sum and the product of the two complex estimators are determined by the mean and the standard deviation of the three values (the sum is twice the mean, the product is the square of the mean plus the variance), taken with respect to the system of probabilities:
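$$p_1 = \frac{(x_2-x_3)^2}{\sum (x_j-x_k)^2}, \qquad p_2 = \frac{(x_3-x_1)^2}{\sum (x_j-x_k)^2}, \qquad p_3 = \frac{(x_1-x_2)^2}{\sum (x_j-x_k)^2} \tag{17}$$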
which they determine quite naturally. This is exactly the initial ‘statistical’ spirit of Gordon Kindlmann’s discussion, with an extra bonus: one doesn’t need to restrict the values of the three eigenvalues, as is done when they are used in tensor glyphs, for instance. They can be placed anywhere along the real axis, and a statistical theory still makes sense. And not only from this limited point of view, but also from the more general space show of matter, so to speak.
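The statistical reading of the estimate can be illustrated with a short computation. This is a sketch under an explicit assumption: the probabilities are taken to be the weights pi proportional to (xj − xk)², the same weights that appear in the Hessian. The function name `weights` and the sample values are illustrative.

```python
import math

def weights(x1, x2, x3):
    """Weights p_i proportional to (x_j - x_k)^2 (assumed form of the probabilities);
    they are non-negative and sum to 1 for distinct values."""
    w = ((x2 - x3) ** 2, (x3 - x1) ** 2, (x1 - x2) ** 2)
    total = sum(w)
    return tuple(wi / total for wi in w)

xs = (0.0, 1.0, 3.0)  # illustrative sample
p = weights(*xs)
mean = sum(pi * xi for pi, xi in zip(p, xs))
variance = sum(pi * (xi - mean) ** 2 for pi, xi in zip(p, xs))
print(p, mean, math.sqrt(variance))
```

For the sample (0, 1, 3) the mean is 6/7 and the standard deviation is 3√3/7, which are the real and imaginary parts of the estimate of θ for the same three values.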
To conclude: in this “statistical interpretation”, the root of the Hessian of a cubic representing the characteristic equation of a tensor is the complex parameter of a Cauchy distribution. The roots of the corresponding cubic are three measurements of the Cauchy variate, as in fact they always are. The root of the Hessian then gives the most reliable estimate of the parameter of the distribution. The problem with this connection between a cubic and its Hessian is that the Cauchy distribution refers to a one-dimensional variate. However, everything falls into place if we take the Cauchy probability density as a marginal distribution of a Gaussian in the plane, for in that case even the parameter of the distribution can have a physical meaning, related for instance to the eccentricity of a Kepler problem describing the planetary motion or the atomic nucleus, or even to the polarization properties of light.
References
Burnside, W. S., Panton, A. W. (1960): The Theory of Equations, Dover Publications
Kindlmann, G. (2004): Visualization and Analysis of Diffusion Tensor Fields, PhD Dissertation, School of Computing, University of Utah
Knight, F. B. (1976): A Characterization of the Cauchy Type, Proceedings of the American Mathematical Society, Vol. 55, pp. 130–135
Knight, F. B., Meyer, P. A. (1976): Une Caractérisation de la Loi de Cauchy, Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, Vol. 34, pp. 129–134
McCullagh, P. (1996): Möbius Transformation and Cauchy Parameter Estimation, The Annals of Statistics, Vol. 24, pp. 787–808