Robust start-up stage for beltline moulding process variability monitoring using vector variance

One of the primary problems encountered in monitoring the variability of the beltline moulding process in the automotive industry is the estimation of parameters in the start-up stage. The problem becomes more interesting because the process is in a multivariate setting and must be monitored based on individual observations, i.e., the sample size of each subgroup is 1. This paper deals with robust estimation of location and scale during the start-up stage. For this purpose, we use the Mahalanobis distance in the data ordering process but, in the data concentration process, we use the vector variance (VV). This method is highly robust and computationally efficient. Its advantage in monitoring the variability of the beltline moulding process is compared with the non-robust method.

Keywords: breakdown point; covariance determinant; Mahalanobis distance; robust estimation; vector variance

© 2010 Ibnu Sina Institute. All rights reserved. http://dx.doi.org/10.11113/mjfas.v6n1.179


INTRODUCTION
It is known that successful monitoring of a process in Phase II depends on a successful analysis during the start-up stage (SUS), or Phase I (Jensen et al., 2005). Even though the two phases are both dedicated to identifying out-of-control states, each phase has a unique objective. While SUS is used to estimate parameters, Phase II consists of monitoring future observations by using information from the in-control historical data set (HDS) of SUS to determine whether or not the process continues to be in a stable condition. Consider the situation where random sample data are stored in an n × p matrix, where n and p are the numbers of observations and variables, respectively. Let X_i be the vector representing the i-th row. We assume that X_i, i = 1, 2, ..., n, are independent and follow a multivariate normal distribution with mean vector μ and covariance matrix Σ. These data vectors will be used in the start-up stage to obtain an in-control data subset, which in turn will be used to estimate the process parameters. Since μ and Σ are unknown, they are replaced with appropriate estimators: the sample mean vector X̄ and the sample covariance matrix S. These estimators are needed to monitor the process variability as soon as a future data vector, or equivalently an individual observation, becomes available. Since the data are in a multivariate setting, it is not easy to identify outliers during the start-up stage because the analysis must be done on all variables simultaneously. Derquenne (1992) stated that multivariate outliers can be identified by a technique that transforms the random vectors into random variables so that candidate outliers become more clearly visible. The most popular transformation is the Mahalanobis squared distance (MSD). A large value of MSD may indicate that the corresponding observation is an outlier. However, as explained by Hadi (1992), outliers do not necessarily have large MSD values, and not all observations with large MSD values are necessarily outliers.
These problems are known as the masking and swamping effects; they arise because the classical mean vector and covariance matrix are not robust.
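As a minimal sketch of the classical MSD (Python with NumPy; our own illustration, not code from the paper), the following computes the squared distances and checks the known bound d_i² ≤ (n−1)²/n, which helps explain why masking can occur: a distance computed with the contaminated mean and covariance cannot grow arbitrarily large, however extreme the observation.

```python
import numpy as np

def mahalanobis_sq(X):
    """Classical squared Mahalanobis distances of each row of X
    from the sample mean, using the sample covariance matrix."""
    X = np.asarray(X, dtype=float)
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)          # unbiased: divides by n - 1
    S_inv = np.linalg.inv(S)
    diff = X - xbar
    # Row-wise quadratic form diff_i' S^{-1} diff_i
    return np.einsum('ij,jk,ik->i', diff, S_inv, diff)

rng = np.random.default_rng(0)
n, p = 30, 3
X = rng.normal(size=(n, p))
X[0] += 10.0                             # plant one extreme observation
d2 = mahalanobis_sq(X)

# No matter how far X[0] is moved, its classical squared distance is
# capped at (n - 1)^2 / n, because the outlier inflates S itself.
bound = (n - 1) ** 2 / n
print(d2.max(), bound)
```

Even the planted outlier stays below the bound, which is the mechanism behind the masking effect described above.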
To handle this problem, robust estimators can serve as the theoretical foundation for constructing a robust MSD. This paper is organized as follows. Sections 2 and 3 present the classical approach and the robust approach in SUS, respectively. We then present an illustrative example on real-life beltline moulding data to demonstrate the effectiveness of the robust approach compared with the classical approach.

CLASSICAL APPROACH
The classical method based on MSD is powerful when there is only one out-of-control point. Its power decreases if more than one out-of-control point is present in the data (Hadi, 1992). It is sensitive not only to shifts in the mean vector but also to shifts in the covariance matrix (Tracy, 1992). Any shift in the mean vector and/or covariance matrix will lead to an unstable process.

Classical distance and distribution
The classical distance is generated from the arithmetic mean, which is an estimate of the classical mean computed from the whole sample. Let X_1, X_2, ..., X_n be a random sample from a p-variate distribution whose second moment exists. The sample mean vector and sample covariance matrix are, respectively,

X̄ = (1/n) Σ_{i=1}^{n} X_i  and  S = (1/(n-1)) Σ_{i=1}^{n} (X_i - X̄)(X_i - X̄)',

and the squared Mahalanobis distance of the i-th observation is d_i² = (X_i - X̄)' S⁻¹ (X_i - X̄). Based on MSD, the in-control data are determined by plotting these values in a control chart. The presence of one or more extreme data points, called outliers, changes the arithmetic mean significantly and inflates the distances. Since our aim in SUS is to check whether the observations fall within the control limits, we need the distribution of MSD. When the parameters are estimated from the same sample, MSD follows a Beta distribution (Gnanadesikan and Kettenring, 1972, based on results of Wilks, 1962). Specifically,

(n / (n-1)²) d_i² ~ Beta(p/2, (n-p-1)/2).

Knowing the distribution of MSD, it is possible to construct the control limits:

UCL = ((n-1)²/n) B(1-α/2; p/2, (n-p-1)/2)
LCL = ((n-1)²/n) B(α/2; p/2, (n-p-1)/2),

where B(q; a, b) denotes the q-th quantile of the Beta(a, b) distribution.
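These Beta-based limits can be evaluated directly. The sketch below (Python with NumPy/SciPy) is our own illustration, not the paper's code; the choice of n = 57 and p = 8 simply mirrors the dimensions of the beltline data used later.

```python
import numpy as np
from scipy.stats import beta

def sus_control_limits(n, p, alpha=0.0027):
    """Beta-based control limits for the classical squared Mahalanobis
    distance in the start-up stage, where parameters are estimated from
    the same sample:  (n / (n-1)^2) * d_i^2 ~ Beta(p/2, (n-p-1)/2)."""
    scale = (n - 1) ** 2 / n
    lcl = scale * beta.ppf(alpha / 2, p / 2, (n - p - 1) / 2)
    ucl = scale * beta.ppf(1 - alpha / 2, p / 2, (n - p - 1) / 2)
    return lcl, ucl

lcl, ucl = sus_control_limits(n=57, p=8)
print(lcl, ucl)
```

Observations whose d_i² falls outside [LCL, UCL] are flagged as candidate outliers and investigated before the in-control data set is fixed.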

ROBUST APPROACH
Classical estimation methods will not yield appropriate control limits if there are unusual data points in SUS. Robust estimation methods have the advantage over classical methods that they are not unduly influenced by outlying data points. Jensen et al. (2005) discussed the use of robust MSD methods based on the minimum volume ellipsoid (MVE) and minimum covariance determinant (MCD) criteria in the start-up stage. Both criteria, introduced by Rousseeuw (1985), have good properties: they are affine equivariant and have a high breakdown point if the data set is in general position. Later on, in order to improve computational efficiency, Rousseeuw and Van Driessen (1999) introduced a faster algorithm called FMCD. However, as noted for example by Werner (2003) and Djauhari (2007), this algorithm is still cumbersome when the data set is of high dimension. Djauhari (2007) introduced a new robust estimation method called minimum vector variance (MVV). This estimator has the same structure as FMCD but uses a different criterion. Like FMCD, in the first step we use robust MSD for data ordering. In the data concentration step, instead of calculating the generalized variance, we use the vector variance (VV). The objective of FMCD, which also serves as its stopping rule, is to find the subset of h observations having the minimum covariance determinant or generalized variance (GV). The objective and stopping rule of MVV, in contrast, is to find the subset of h observations having the minimum vector variance. VV is the sum of squares of all elements of the covariance matrix. As a measure of multivariate variability, VV performs much better than GV for small shifts in the covariance matrix (Djauhari, 2008). There are two advantages of using VV for multivariate data concentration. First, the computation is far more efficient than GV, even for large matrices. Second, VV does not require the covariance matrix to be non-singular.
Unlike VV, GV requires the covariance matrix to be non-singular.
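To make the comparison concrete, the following sketch (Python with NumPy; our own illustration rather than the paper's code) computes GV and VV for a sample covariance matrix and shows that VV remains informative when the matrix is singular, where the determinant collapses to zero.

```python
import numpy as np

def generalized_variance(S):
    """GV: determinant of the covariance matrix. It is zero (hence
    uninformative) whenever S is singular."""
    return np.linalg.det(S)

def vector_variance(S):
    """VV: sum of squares of all elements of the covariance matrix,
    which equals trace(S @ S) for symmetric S."""
    return float(np.sum(S ** 2))

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))
S = np.cov(X, rowvar=False)
print(generalized_variance(S), vector_variance(S))

# With fewer observations than variables (n = 3 < p = 4), S is singular:
# GV degenerates to ~0 while VV stays strictly positive.
X_sing = rng.normal(size=(3, 4))
S_sing = np.cov(X_sing, rowvar=False)
print(generalized_variance(S_sing), vector_variance(S_sing))
```

Computationally, VV needs only elementwise squaring and summation, whereas GV requires a determinant, which is where the efficiency gain for large matrices comes from.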

Data concentration using VV
Consider a random data set of n p-variate normal observations which is in general position. The concentration step proceeds iteratively: from a current subset of h observations, compute its mean vector and covariance matrix, order all n observations by their squared distances from this mean, and take the h observations with the smallest distances as the new subset. If the VV of the new subset is no smaller than that of the current subset, the process stops. Otherwise, the process is continued until the k-th iteration.
Let us denote the location estimate and covariance matrix given by MVV as X̄_MVV and S_MVV, respectively.
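The concentration step can be sketched as follows (Python with NumPy). This is a simplified single-start illustration of the FMCD-style scheme with VV as the stopping criterion, under our own choices of starting subset and iteration cap; it is not Djauhari's exact MVV implementation.

```python
import numpy as np

def mvv_concentrate(X, h, max_iter=100, seed=0):
    """FMCD-style concentration using vector variance (VV) as the
    stopping rule: order all points by squared distance from the current
    subset's mean, keep the h closest, stop when VV no longer decreases.
    Simplified sketch with a single random start."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=h, replace=False)   # initial subset
    vv_old = np.inf
    for _ in range(max_iter):
        H = X[idx]
        xbar = H.mean(axis=0)
        S = np.cov(H, rowvar=False)
        vv = float(np.sum(S ** 2))               # VV of current subset
        if vv >= vv_old:                         # VV stopped decreasing
            break
        vv_old = vv
        diff = X - xbar
        d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(S), diff)
        idx = np.argsort(d2)[:h]                 # keep the h closest points
    return idx, xbar, S

# Toy data with the same shape as the beltline example (n = 57, p = 8)
# and five gross outliers planted at the first rows.
rng = np.random.default_rng(42)
X = rng.normal(size=(57, 8))
X[:5] += 10.0
idx, loc, scat = mvv_concentrate(X, h=33)
print(sorted(set(range(5)) & {int(i) for i in idx}))
```

On such data the concentration drives the subset away from the planted outliers, so the final h observations provide the robust estimates X̄_MVV and S_MVV.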

Robust distance and distribution
Using robust estimates gives an MSD with unknown exact distributional properties. Hardin and Rocke (2005) showed that squared robust distances based on the MCD estimator are well approximated by a scaled F distribution, which can be used to construct the control limit for the robust chart.

ILLUSTRATIVE EXAMPLE
One of the existing problems in the automotive industry is in the production process of beltline moulding. Beltline moulding over the outer lip of the drip rail prevents water from leaking into the car. If the lip is too short, the beltline moulding often will not position well. On the other hand, if the lip is too long, the window glass will not move smoothly (Bon, 2008). This type of problem is not easily solved by applying standard manufacturing procedures, since variability among the materials, machine processes, ambient conditions and end products exists and cannot be avoided.
The beltline moulding data are stored in an n × p data matrix, where n and p are the numbers of observations and variables, respectively. In this paper, we use the data in Bon (2008), for which n = 57 and p = 8. Since the beltline moulding data set is multivariate, it is not easy to recognize outliers in SUS.
Firstly, we show the difference in SUS between classical estimation and robust estimation by computing the mean vector and covariance matrix under each method.
The determinant of the robust covariance estimate is 2.74 × 10^-11. This value is small compared with that of the classical estimate, so the variability captured by the robust covariance matrix is smaller. The parameter estimates above are calculated from a sample subset of size h = 33. Figure 2 shows the SUS control chart based on robust MSD. From the table of the F distribution, with p = 8 and probability of false alarm α = 0.0027, we obtain c = 1.06455. Figure 1 does not signal any out-of-control condition. Consequently, we have obtained the best parameter estimates by using the robust approach.
After removing all four outlying observations and recalculating the parameter estimates, we reconstruct the control chart. As Figure 3 shows, none of the observations fall outside the control limits. The new control chart has been established by eliminating the special causes of variation from the outlying observations in Figure 2.
This beltline moulding data illustrates the effectiveness of the robust MVV estimator compared to the classical estimator in detecting process variability.

CONCLUSION
Monitoring variability during SUS using the beltline data indicates that VV is as effective as the covariance determinant (CD) as a robust method. It is practically suited for engineering use in controlling the production process because of its computational efficiency and ease of implementation. Since engineering experiments are quite particular about sample sizes, it is useful that VV can be applied in both conditions: subgrouped observations or individual observations.

PROBLEM TO SOLVE
Since this paper only demonstrates the MVV estimator during Phase I, or SUS, further analysis of the Phase II monitoring process will be pursued in future work.