A Redefinition of the Mahalanobis Depth Function

The depth function is a notion that has been intensively developed during the last decade in non-parametric statistics, computational geometry, algebra, and computer science. It is closely related to multivariate ordering, robust estimation, and outlier detection. One of the most widely used depth functions in statistics and related areas is the so-called Mahalanobis depth. In this paper we redefine that depth function by introducing a new one which is equivalent to the former, in the sense that they give the same multivariate ordering, which is less complicated to compute, and which generalizes the "vanishing at infinity" property of depth functions.

Keywords: center; covariance matrix; Mahalanobis depth; multivariate ordering


Introduction
Suppose a random cloud of points in $\mathbb{R}^p$ or a probability distribution on $\mathbb{R}^p$ is given. A depth function measures how central a point is with respect to that cloud or distribution. An application in multivariate control charts is presented in Liu et al. (1999) and Dai et al. (2006), and an application in aviation safety analysis is presented in Cheng et al. (2000).
By definition, a depth function is closely related to multivariate ordering in the sense of center-outward ordering in $\mathbb{R}^p$, to data outlyingness, and to robust estimation; see, for example, Zuo and Serfling (2000) for these notions. In the multivariate setting, an outlier region is defined as the complement of a depth-based central region. Furthermore, as in the classical approach, the primary concern in robust estimation of location and covariance matrix lies in attaining a high breakdown point. In the classical approach, the most popular robust estimators are those constructed by minimizing the volume of an ellipsoid (MVE) and by minimizing the determinant of the covariance matrix (MCD), both introduced by Rousseeuw (1985). Improved versions of these two methods have been proposed by many authors, such as the feasible solution algorithm in Hawkins (1994) and Hawkins and Olive (1999), fast MCD in Rousseeuw and van Driessen (1999), blocked adaptive computationally efficient outlier nominators (BACON) in Billor et al. (2000), and minimum vector variance in Herwindiati et al. (2006). It is to be noted that these versions were proposed in order to increase computational efficiency.
The popularity of MVE- and MCD-based robust estimators is due to their commendable properties: they are affine-equivariant and have a high breakdown point. See Lopuhaä and Rousseeuw (1991), Hadi (1992), Croux and Haesbroeck (1999), Rousseeuw and van Driessen (1999), Werner (2003), and Hardin and Rocke (2004) for further discussion of these properties, and Jensen et al. (2005) for a potential application in multivariate process control. However, because these robust estimators are constructed based on Mahalanobis depth, they are complicated to compute due to the need to invert the covariance matrix.
The computational complexity of Mahalanobis depth, in terms of the number of operations needed to compute it, is still questionable, especially for high-dimensional data sets: the higher the dimension of the data, the greater the number of operations in the computation of the Mahalanobis distance, and hence the higher the computational complexity and the lower the computational efficiency. Can we redefine that depth function so that it is less complicated to compute? This is the problem we intend to discuss in this paper. The main result consists of a new definition of Mahalanobis depth and a generalization of the "vanishing at infinity" property. The new definition is formulated by introducing a new depth function which is equivalent to the former, i.e., it gives the same multivariate ordering in the sense of center-outward ordering, and which is less complicated to compute. This paper is organized as follows. In Section 2 we propose a new depth function and redefine the Mahalanobis depth. Section 3 focuses on its computational complexity in terms of the number of operations in its computation; we show that its asymptotic relative complexity with respect to Mahalanobis depth is eight elevenths, which is a promising advantage. Additional remarks in Section 4 close this presentation.

A Proposed Depth Function
Let $\Phi$ be the class of $p$-variate distributions and $F_X$ the distribution of a given random vector $X$ in $\mathbb{R}^p$. The following formal definition of a depth function is given in Zuo and Serfling (2000).

Definition 1.
A non-negative and bounded mapping $D(\cdot, F): \mathbb{R}^p \to \mathbb{R}$ is called a depth function if it satisfies the following properties:

1. Affine invariance: $D(Ax + b, F_{AX+b}) = D(x, F_X)$ for any random vector $X$ in $\mathbb{R}^p$, any non-singular matrix $A$ of size $p \times p$, and any vector $b$ in $\mathbb{R}^p$;
2. Maximality at the center: $D(q, F) = \sup_{x \in \mathbb{R}^p} D(x, F)$ for any $F$ in $\Phi$, where $q$ in $\mathbb{R}^p$ is called the center of $F$;
3. Monotonicity relative to the deepest point: $D(x, F) \le D(q + \alpha(x - q), F)$ for any $F$ in $\Phi$ with center $q$, any $x$ in $\mathbb{R}^p$, and any $\alpha$ in $[0, 1]$;
4. Vanishing at infinity: $D(x, F) \to 0$ as $\|x\| \to \infty$.

Now let $X_1, X_2, \ldots, X_n$ be a random sample from a $p$-variate distribution whose second moment exists. The sample mean vector and sample covariance matrix are, respectively,
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \quad \text{and} \quad S = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})^t.$$
The sample Mahalanobis depth of $X_i$, see Liu (1990), is
$$MD_i = \frac{1}{1 + (X_i - \bar{X})^t S^{-1} (X_i - \bar{X})};$$
the larger $MD_i$, the closer the point $X_i$ lies to the center $\bar{X}$. The second term of the denominator on the right-hand side of $MD_i$ is the so-called Hotelling $T^2$ statistic or Mahalanobis distance. In the literature, see for example Hadi (1992), Liu et al. (1999), Rousseeuw and van Driessen (1999), Werner (2003), and Herwindiati et al. (2006), that distance is computed directly from the definition, so we need the inverse of the sample covariance matrix $S$. This is a very tedious job, especially for high-dimensional data sets; its computational complexity, in terms of the number of operations in its algorithm, is high.

In what follows we redefine the Mahalanobis depth by introducing a new depth function with the following properties: (1) it is equivalent to Mahalanobis depth in the sense that they give the same multivariate ordering, i.e., the same center-outward ordering described by the second and third properties in Definition 1 (see also, for example, Liu (1990) and Liu et al. (1999)); (2) its computation is less complicated than that of Mahalanobis depth. For this purpose we define
$$M_i = \frac{\left| S - (X_i - \bar{X})(X_i - \bar{X})^t \right|}{|S|},$$
which requires no matrix inversion, only determinants of symmetric matrices. The new definition of Mahalanobis depth is based on Proposition 1: by using the property of the determinant of a partitioned matrix, see Appendix A in Anderson (1966) or Mardia et al. (1979), we obtain
$$M_i = 1 - (X_i - \bar{X})^t S^{-1} (X_i - \bar{X}),$$
as we had to prove. Hence $M_i$ and $MD_i$ are both strictly decreasing functions of the Mahalanobis distance and give the same center-outward ordering. Note, however, that $M_i$ is not bounded below: it tends to $-\infty$ as $\|X_i - \bar{X}\| \to \infty$. For $M_i$ to qualify as a depth function, the fourth property in Definition 1 must therefore be extended as follows: $D(x, F)$ tends to $0$ or $-\infty$ at infinity.
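To make the proof step explicit, write $d_i = X_i - \bar{X}$ and evaluate the bordered determinant by its two Schur complements (a reconstruction of the cited partitioned-matrix property, not the paper's verbatim derivation):
$$\begin{vmatrix} S & d_i \\ d_i^t & 1 \end{vmatrix} = |S| \left( 1 - d_i^t S^{-1} d_i \right) = \left| S - d_i d_i^t \right|.$$
Dividing the two right-hand evaluations by $|S| > 0$ yields $M_i = \left| S - d_i d_i^t \right| / |S| = 1 - d_i^t S^{-1} d_i$, which is Proposition 1.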

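As a quick numerical sanity check, one may verify Proposition 1 and the equality of the two orderings on simulated data. The following sketch assumes numpy; the seed, sample size, and dimension are illustrative only and are not from the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 50, 3
    X = rng.normal(size=(n, p))            # illustrative sample only

    D = X - X.mean(axis=0)                 # rows are d_i = X_i - Xbar
    S = np.cov(X, rowvar=False)            # sample covariance (1/(n-1))

    # Mahalanobis depth: the inverse of S is computed once for all items.
    q = np.einsum('ij,jk,ik->i', D, np.linalg.inv(S), D)
    MD = 1.0 / (1.0 + q)

    # Proposed depth: one symmetric p x p determinant per sample item.
    M = np.array([np.linalg.det(S - np.outer(d, d)) for d in D])
    M /= np.linalg.det(S)

    # Proposition 1: M_i = 1 - q_i.
    assert np.allclose(M, 1.0 - q)
    # Both depths induce the same center-outward ordering.
    assert (np.argsort(MD) == np.argsort(M)).all()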
Further Results
An advantage of $M_i$ as a measure of the depth of $X_i$ is that it does not need any matrix inversion in its computation; it only needs the determinant of a symmetric matrix. This means that $M_i$ is less complicated to compute than $MD_i$, and its computational complexity is certainly lower than that of Mahalanobis depth. More precisely, by using the Cholesky decomposition to calculate the determinant of a matrix and the inverse of the covariance matrix, the asymptotic relative computational complexity of $M_i$ with respect to $MD_i$, i.e., the ratio of the numbers of operations in their computations, is less than 1; for $p$ sufficiently large it equals eight elevenths.

Table 1 illustrates the difference in the numbers of operations in the computation of the two depth functions for various values of $p$. We see that, as $p$ gets larger, the ratio of the two operation counts approaches eight elevenths.

Table 1. Number of operations in the computation of the two depth functions.

Figure 1 is a graphical display of Table 1; the upper curve is for Mahalanobis depth and the lower one for the proposed depth function. It shows how considerably the numbers of operations in the algorithms computing the two depth functions differ.
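To complement Table 1, the following sketch illustrates the determinant-based computation in code; it assumes numpy and is not the author's operation accounting. One caveat deserves a comment: $S - d_i d_i^t$ is indefinite exactly when $M_i < 0$, so a Cholesky factorization of that matrix can fail for far-out points; the sketch therefore uses Cholesky only for $|S|$ and a sign-aware, LU-based log-determinant for the per-item matrices.

    import numpy as np

    def proposed_depth(X):
        """Proposed depth M_i = |S - d_i d_i'| / |S|, with no matrix inversion."""
        D = X - X.mean(axis=0)
        S = np.cov(X, rowvar=False)
        # |S| via its Cholesky factor L: log|S| = 2 * sum(log diag(L)).
        L = np.linalg.cholesky(S)
        logdet_S = 2.0 * np.log(np.diag(L)).sum()
        M = np.empty(len(D))
        for i, d in enumerate(D):
            A = S - np.outer(d, d)                 # symmetric, possibly indefinite
            sign, logdet_A = np.linalg.slogdet(A)  # LU-based, sign-aware
            M[i] = sign * np.exp(logdet_A - logdet_S)
        return M

    X = np.random.default_rng(1).normal(size=(100, 5))
    M = proposed_depth(X)

For points near the center this returns values close to 1, while gross outliers produce negative values, consistent with the extended vanishing-at-infinity property stated in Section 2.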

Additional Remarks
The advantage of the proposed depth function $M_i$ lies in its computation, which is less complicated than that of the Mahalanobis depth $MD_i$: its computational complexity, i.e., the number of operations in the computation of $M_i$, is less than that of $MD_i$. Specifically, its asymptotic relative computational complexity is eight elevenths for $p$ sufficiently large. However, $M_i$ has its own limitation with respect to $MD_i$: in the latter we need to compute the inverse of $S$ only once for all sample items, whereas in the former we must involve $S$ in $M_i$ for each sample item $i$. We also note that both depth functions require that the second moment of the population exists.
