There exist numerically stable alternatives. The formula states that the variance of a sum is equal to the sum of all elements in the covariance matrix of the components. The simplest estimators for the population mean and population variance are simply the mean and variance of the sample: the sample mean and the uncorrected sample variance. These are consistent estimators (they converge to the correct value as the number of samples increases), but they can be improved. The general formula for the variance of the outcome, X, of a fair n-sided die is Var(X) = (n² − 1)/12. The resulting estimator is unbiased, and is called the corrected sample variance or unbiased sample variance. Thus the total variance is given by the law of total variance, Var(X) = E[Var(X | Y)] + Var(E[X | Y]). In general, if two variables are statistically dependent, the variance of their product is given by Var(XY) = E[X²Y²] − (E[XY])². If a distribution does not have a finite expected value, as is the case for the Cauchy distribution, then the variance cannot be finite either. Variance has a central role in statistics, where some ideas that use it include descriptive statistics, statistical inference, hypothesis testing, goodness of fit, and Monte Carlo sampling. It is therefore desirable in analysing the causes of variability to deal with the square of the standard deviation as the measure of variability. Real-world observations, such as measurements of yesterday's rain throughout the day, typically cannot be complete sets of all possible observations that could be made. In general, the population variance of a finite population of size N with values xᵢ is given by σ² = (1/N) Σᵢ (xᵢ − μ)², where μ is the population mean. We take a sample with replacement of n values Y₁, …, Yₙ from the population. Correcting for this bias yields the unbiased sample variance. This expression can be used to calculate the variance in situations where the CDF, but not the density, can be conveniently expressed. In linear regression analysis a corresponding formula applies.
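The estimators above can be sketched in a few lines. This is a minimal illustration, not a reference implementation; the function names are invented for the example. It computes the uncorrected (divide-by-n) and corrected (divide-by-n−1) sample variances, and checks the fair-die formula (n² − 1)/12 for n = 6 against the standard library.

```python
import statistics

def biased_variance(xs):
    # Uncorrected sample variance: average squared deviation, dividing by n.
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def unbiased_variance(xs):
    # Corrected sample variance: Bessel's correction, dividing by n - 1.
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(biased_variance(data))              # 4.0
print(unbiased_variance(data))            # 32/7, matches statistics.variance
print(statistics.pvariance(range(1, 7)))  # fair six-sided die: (6² − 1)/12
```

Both estimators are consistent; they differ only by the factor n/(n − 1), which vanishes as the sample grows.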
Using the linearity of the expectation operator and the assumption of independence (or uncorrelatedness) of X and Y, this further simplifies to Var(X + Y) = Var(X) + Var(Y). In such cases, the sample size N is a random variable whose variation adds to the variation of X, such that Var(Σ X) = E[N] Var(X) + Var(N) (E[X])². The Sukhatme test applies to two variances and requires that both medians be known and equal to zero. We shall term this quantity the Variance. The variance of a probability distribution is analogous to the moment of inertia in classical mechanics of a corresponding mass distribution along a line, with respect to rotation about its center of mass. The resulting estimator is biased, however, and is known as the biased sample variance. For example, if X and Y are uncorrelated and the weight of X is two times the weight of Y, then the variance of X will receive four times the weight of the variance of Y. If the Yᵢ are independent and identically distributed, but not necessarily normally distributed, then [13] Var(s²) = (μ₄ − σ⁴(n − 3)/(n − 1))/n, where μ₄ is the fourth central moment. This also holds in the multidimensional case. The expression for the variance can be expanded as Var(X) = E[X²] − (E[X])². In this example that sample would be the set of actual measurements of yesterday's rainfall from available rain gauges within the geography of interest. In this sense, the concept of population can be extended to continuous random variables with infinite populations. The same proof is also applicable for samples taken from a continuous probability distribution. The great body of available statistics show us that the deviations of a human measurement from its mean follow very closely the Normal Law of Errors, and, therefore, that the variability may be uniformly measured by the standard deviation corresponding to the square root of the mean square error. Most simply, the sample variance is computed as an average of squared deviations about the sample mean, by dividing by n.
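The additivity of variances for uncorrelated variables can be checked by simulation. A rough sketch, assuming X and Y are independent Uniform(0, 1) draws (each with variance 1/12), so that Var(2X + Y) should be close to 4·(1/12) + 1/12 = 5/12:

```python
import random
import statistics

random.seed(0)
n = 200_000

# Independent uniform samples; independence implies uncorrelatedness.
xs = [random.random() for _ in range(n)]
ys = [random.random() for _ in range(n)]

# Var(2X + Y) = 2² Var(X) + Var(Y) when X and Y are uncorrelated.
v = statistics.variance([2 * x + y for x, y in zip(xs, ys)])
print(v)  # close to 5/12 ≈ 0.4167
```

Note how the weight of 2 on X contributes with its square, 4, to the variance of the sum, as described above.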
Resampling methods, which include the bootstrap and the jackknife, may be used to test the equality of variances. Being a function of random variables, the sample variance is itself a random variable, and it is natural to study its distribution. This formula is used in the theory of Cronbach's alpha in classical test theory. One reason for the use of the variance in preference to other measures of dispersion is that the variance of the sum or the difference of uncorrelated random variables is the sum of their variances: Var(X ± Y) = Var(X) + Var(Y). Other tests of the equality of variances include the Box test, the Box–Anderson test and the Moses test. The variance can also be thought of as the covariance of a random variable with itself: Var(X) = Cov(X, X). In other words, the variance of X is equal to the mean of the square of X minus the square of the mean of X. This implies that in a weighted sum of variables, the variable with the largest weight will have a disproportionately large weight in the variance of the total. So for the variance of the mean of standardized variables with equal correlations, or converging average correlation, we have Var(X̄) = 1/n + ((n − 1)/n) ρ. This implies that the variance of the mean increases with the average of the correlations. That is, it always has the same value. This equation should not be used for computations using floating-point arithmetic, because it suffers from catastrophic cancellation if the two components of the equation are similar in magnitude. The estimator is a function of the sample of n observations drawn without observational bias from the whole population of potential observations. The standard deviation is more amenable to algebraic manipulation than the expected absolute deviation, and, together with variance and its generalization covariance, is used frequently in theoretical statistics; however, the expected absolute deviation tends to be more robust, as it is less sensitive to outliers arising from measurement anomalies or an unduly heavy-tailed distribution.
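The catastrophic cancellation in the E[X²] − (E[X])² formula is easy to demonstrate. A sketch comparing the naive one-pass formula against Welford's online algorithm, one of the numerically stable alternatives mentioned above (the data set and function names are illustrative):

```python
def naive_variance(xs):
    # One-pass formula E[X²] − (E[X])²: cancels catastrophically when the
    # mean is large relative to the spread.
    n = len(xs)
    s, s2 = sum(xs), sum(x * x for x in xs)
    return (s2 - s * s / n) / n

def welford_variance(xs):
    # Welford's online algorithm: numerically stable, single pass.
    mean, m2, count = 0.0, 0.0, 0
    for x in xs:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return m2 / count

# Small spread around a huge mean; the true population variance is 22.5.
data = [1e9 + d for d in (4.0, 7.0, 13.0, 16.0)]
print(naive_variance(data))    # dominated by rounding error
print(welford_variance(data))  # ≈ 22.5
```

The naive version subtracts two quantities near 4 × 10¹⁸ whose difference is 90, so almost all significant digits cancel; Welford's recurrence accumulates squared deviations directly and avoids the subtraction.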
As such, the variance calculated from the finite set will in general not match the variance that would have been calculated from the full population of possible observations. There are cases when a sample is taken without knowing, in advance, how many observations will be acceptable according to some criterion. Similarly, the second term on the right-hand side becomes: Correcting for bias often makes this worse: one can always choose a scale factor that performs better than the corrected sample variance, though the optimal scale factor depends on the excess kurtosis of the population (see mean squared error: variance) and introduces bias. Variance is invariant with respect to changes in a location parameter. The standard deviation and the expected absolute deviation can both be used as an indicator of the "spread" of a distribution. The second moment of a random variable attains its minimum value when taken around the first moment (i.e., the mean). Secondly, the sample variance does not generally minimize the mean squared error between the sample variance and the population variance. For example, the approximate variance of a function of one variable is given by Var[f(X)] ≈ (f′(E[X]))² Var(X). In other words, additional correlated observations are not as effective as additional independent observations at reducing the uncertainty of the mean. This definition encompasses random variables that are generated by processes that are discrete, continuous, neither, or mixed. Unlike the expected absolute deviation, the variance of a variable has units that are the square of the units of the variable itself. Testing for the equality of two or more variances is difficult. In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its mean. This formula for the variance of the mean is used in the definition of the standard error of the sample mean, which is used in the central limit theorem.
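The invariance of variance under a location shift, and its quadratic scaling, can be verified directly. A minimal sketch using the standard library (the shift of 100 and scale of 3 are arbitrary choices for the example):

```python
import statistics

data = [1.0, 2.0, 4.0, 8.0]
v = statistics.pvariance(data)

shifted = [x + 100.0 for x in data]  # Var(X + a) = Var(X)
scaled = [3.0 * x for x in data]     # Var(aX) = a² Var(X)

print(statistics.pvariance(shifted))  # equals v
print(statistics.pvariance(scaled))   # equals 9 * v
```

Adding a constant moves every point and the mean by the same amount, so the deviations, and therefore the variance, are unchanged; multiplying by a scales every deviation by a, and the squared deviations by a².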
Using integration by parts and making use of the expected value already calculated: Informally, it measures how far a set of random numbers is spread out from their average value. If the mean is determined in some other way than from the same samples used to estimate the variance, then this bias does not arise and the variance can safely be estimated as that of the samples about the independently known mean. This makes clear that the sample mean of correlated variables does not generally converge to the population mean, even though the law of large numbers states that the sample mean will converge for independent variables. The population variance for a non-negative random variable can be expressed in terms of the cumulative distribution function F using Var(X) = 2∫₀^∞ u(1 − F(u)) du − (∫₀^∞ (1 − F(u)) du)². In general, the variance of the sum of n variables is the sum of their covariances: Var(Σᵢ Xᵢ) = Σᵢ Σⱼ Cov(Xᵢ, Xⱼ). Estimating the population variance by taking the sample's variance is close to optimal in general, but can be improved in two ways. Thus, independence is sufficient but not necessary for the variance of the sum to equal the sum of the variances. However, using values other than n improves the estimator in various ways. If two variables X and Y are independent, the variance of their product is given by [7] Var(XY) = (E[X])² Var(Y) + (E[Y])² Var(X) + Var(X) Var(Y). This means that one estimates the mean and variance that would have been calculated from an omniscient set of observations by using an estimator equation. A similar formula is applied in analysis of variance, where the total variability is partitioned into between-group and within-group components. This formula is used in the Spearman–Brown prediction formula of classical test theory. The next expression states equivalently that the variance of the sum is the sum of the diagonal of the covariance matrix plus two times the sum of its upper triangular elements (or its lower triangular elements); this emphasizes that the covariance matrix is symmetric.
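The CDF-based expression for the variance of a non-negative random variable can be checked numerically. A sketch assuming an Exponential(1) distribution, whose survival function is S(t) = 1 − F(t) = e^(−t) and whose true variance is 1; the crude midpoint integrator is purely illustrative:

```python
import math

def survival(t):
    # Survival function of Exponential(1): S(t) = 1 - F(t) = exp(-t).
    return math.exp(-t)

def integrate(f, a, b, steps=200_000):
    # Simple midpoint rule; accurate enough for this smooth integrand.
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

# E[X] = ∫ S(t) dt and E[X²] = 2 ∫ t S(t) dt for non-negative X,
# so Var(X) = 2 ∫ t S(t) dt − (∫ S(t) dt)².
ex = integrate(survival, 0.0, 50.0)
ex2 = 2.0 * integrate(lambda t: t * survival(t), 0.0, 50.0)
var = ex2 - ex * ex
print(var)  # ≈ 1.0
```

This is exactly the situation described above: the density never appears, only the CDF through its survival function.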
In the case that the Yᵢ are independent observations from a normal distribution, Cochran's theorem shows that s² follows a scaled chi-squared distribution: (n − 1)s²/σ² ~ χ² with n − 1 degrees of freedom. [11] The square root is a concave function and thus introduces negative bias by Jensen's inequality (which depends on the distribution), and thus the corrected sample standard deviation, using Bessel's correction, is biased. The F-test and chi-square tests are both adversely affected by non-normality and are not recommended for this purpose. Either estimator may be simply referred to as the sample variance when the version can be determined by context. Several variants of this test are known. That is, if a constant is added to all values of the variable, the variance is unchanged: Var(X + a) = Var(X). For example, a variable measured in meters will have a variance measured in meters squared. Therefore, the variance of the mean of a large number of standardized variables is approximately equal to their average correlation. An asymptotically equivalent formula was given in Kenney and Keeping, Rose and Smith, and Weisstein (n.d.). Samuelson's inequality is a result that states bounds on the values that individual observations in a sample can take, given that the sample mean and biased variance have been calculated. They allow the median to be unknown but do require that the two medians be equal. For this reason, describing data sets via their standard deviation or root mean square deviation is often preferred over using the variance. The population variance matches the variance of the generating probability distribution. Indeed, one can see that the variance of the estimator tends asymptotically to zero. If the variance of a random variable is 0, then it is a constant. The delta method uses second-order Taylor expansions to approximate the variance of a function of one or more random variables; see Taylor expansions for the moments of functions of random variables.
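Samuelson's inequality states that every observation lies within √(n − 1) biased standard deviations of the sample mean. A sketch checking the bound on arbitrary random data (the sample size and distribution are incidental; the bound holds for any sample):

```python
import math
import random

random.seed(1)
xs = [random.gauss(0, 3) for _ in range(50)]

n = len(xs)
mean = sum(xs) / n
# Biased (divide-by-n) standard deviation, as the inequality requires.
sigma = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
bound = math.sqrt(n - 1) * sigma

ok = all(abs(x - mean) <= bound for x in xs)
print(ok)  # True: no observation can violate the bound
```

The bound is tight: for n = 2 the two observations sit exactly at mean ± σ, one biased standard deviation (√(n − 1) = 1) from the mean.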
However, some distributions may not have a finite variance, despite their expected value being finite. Variance is an important tool in the sciences, where statistical analysis of data is common. When dealing with extremely large populations, it is not possible to count every object in the population, so the computation must be performed on a sample of the population. Moreover, if the variables have unit variance, for example if they are standardized, then this simplifies to Var(X̄) = 1/n + ((n − 1)/n) ρ̄. That is, the variance of the mean decreases when n increases. This can also be derived from the additivity of variances, since the total observed score is the sum of the predicted score and the error score, where the latter two are uncorrelated. In many practical situations, the true variance of a population is not known a priori and must be computed somehow. The Lehmann test is a parametric test of two variances. These results lead to the variance of a linear combination as Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y).
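The linear-combination identity can be verified exactly on a small paired data set, since sample covariance is bilinear. A minimal sketch using population (divide-by-n) moments; the data and coefficients are arbitrary:

```python
def pcov(us, vs):
    # Population covariance (divide by n); pcov(x, x) is the variance.
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    return sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / n

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 1.0, 5.0, 4.0]
a, b = 2.0, -1.0

zs = [a * x + b * y for x, y in zip(xs, ys)]
lhs = pcov(zs, zs)  # Var(aX + bY) computed directly
rhs = a * a * pcov(xs, xs) + b * b * pcov(ys, ys) + 2 * a * b * pcov(xs, ys)
print(lhs, rhs)  # both 2.5: 4(1.25) + 1(2.5) + 2(2)(−1)(1.25)
```

The covariance term carries the sign of ab, so negatively weighted combinations of positively correlated variables can have smaller variance than either term alone.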