Using Statistics


Batching averages non-overlapping, adjacent, equal-sized groups of data. The resulting series of batched means will often be less correlated, less erratic, and closer to normally distributed than the original observations. With about twenty batches, there are enough degrees of freedom for most inferences (Schmeiser, 1983). Batched means are computed after truncation. From within SIGMA, you are limited to 10,000 batches of output data.
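The batching step can be sketched in a few lines of Python. This is a minimal illustration, not SIGMA's own implementation: the function name batch_means, the truncate parameter, and the choice to drop an incomplete trailing group are all assumptions made for the example.

```python
from statistics import fmean


def batch_means(data, batch_size, truncate=0):
    """Average non-overlapping, adjacent, equal-sized groups of observations.

    truncate is the number of initial observations discarded before
    batching (hypothetical parameter). Observations left over after the
    last full batch are dropped (an assumption, not taken from the text).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    kept = list(data)[truncate:]
    n_batches = len(kept) // batch_size
    return [
        fmean(kept[i * batch_size:(i + 1) * batch_size])
        for i in range(n_batches)
    ]
```

For instance, a series of 5000 observations with a batch size of 10 yields 500 batch means.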

Sufficient statistics and maximum likelihood estimators are used for common inference. The assumptions behind the statistics are probably more important than the numbers themselves and should be understood before too much faith is placed in their values. When looking up the simple formulas, you can review the assumptions. Do not let unfamiliarity with statistics prevent you from looking at the charts. Averages of squares (scatter plots) are sufficient for batched means confidence interval estimation and other inferences. Approximately twenty to thirty large batches are recommended (Schmeiser, 1983). Batching tends to make the output a better approximation of an independent sample from a normal distribution.

To illustrate the effects of batching, 1000 observations of the EAR process discussed in Input Modeling were simulated. The mean of the process, μ, was 10. The sample mean was 9.43. The autocorrelation function and the histogram showed us that this output series appears neither independent nor to have the characteristic "bell-shaped" distribution expected of normally distributed observations. We also saw that the correlation between observations decreases slowly as the lag (time interval) between them increases. There appeared to be significant correlation between neighboring observations at lag 1. The sample standard deviation was 9.339. (This is approximately equal to the sample mean, as expected from exponential data.)
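An experiment of this kind can be sketched as follows, assuming the EAR process is the first-order exponential autoregressive model, in which each observation is rho times the previous one plus an innovation that is zero with probability rho and exponential otherwise. The function names and the value rho = 0.8 are hypothetical; the text does not state the correlation parameter that was used.

```python
import random
from statistics import fmean


def simulate_ear1(n, mean, rho, seed=None):
    """Simulate n observations of an EAR(1) process with exponential marginals.

    X[t] = rho * X[t-1] + e[t], where e[t] is 0 with probability rho and
    exponential with the given mean otherwise. rho is an assumed parameter.
    """
    rng = random.Random(seed)
    x = rng.expovariate(1.0 / mean)  # start from the stationary distribution
    series = []
    for _ in range(n):
        innovation = 0.0 if rng.random() < rho else rng.expovariate(1.0 / mean)
        x = rho * x + innovation
        series.append(x)
    return series


def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    m = fmean(series)
    denom = sum((v - m) ** 2 for v in series)
    num = sum(
        (series[i] - m) * (series[i + lag] - m)
        for i in range(len(series) - lag)
    )
    return num / denom
```

Under this model the lag-k autocorrelation is rho to the power k, so plotting autocorrelation(series, k) against k should show the slow decay described above.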

To see the effects of batching, we averaged groups of 10 observations in a sample of 5000 from the same process, giving 500 batched means. The sample mean of the batched process was 9.739.

Even with a batch size as small as 10, we saw from the autocorrelation plot that the data now appear to have very little serial correlation. We also saw that the sample standard deviation is 4.838. The histogram of the batched means showed that the data are beginning to look as if they came from a normal, bell-shaped distribution (as expected from the Central Limit Theorem). With averages of only 10 observations from a highly skewed, dependent exponential distribution, we begin to approximate an independent normal data set. The mean and variance of the batched process can safely be used to form a batched means confidence interval for the mean of the process, based on the usual t-statistic with 499 degrees of freedom. (We can use the normal approximation to the t distribution with so many degrees of freedom.) For example, a 90% confidence interval for the process mean, μ, is found to be the batched sample mean plus or minus 1.645 standard errors, where the standard error is the batched sample standard deviation divided by the square root of the number of batches.

Substituting our statistics into the above interval estimation formula, we get the following confidence interval for the true mean of the process.
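As a check on the arithmetic, the sketch below substitutes the batched statistics quoted above (mean 9.739, standard deviation 4.838, 500 batches) into the usual normal-approximation interval. The function name batched_means_ci and its parameters are illustrative assumptions, not part of SIGMA.

```python
from math import sqrt
from statistics import NormalDist


def batched_means_ci(batch_mean, batch_sd, n_batches, level=0.90):
    """Two-sided confidence interval for the process mean.

    Uses the normal approximation to the t distribution, which is
    reasonable only when the number of batches is large.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.645 for 90%
    half_width = z * batch_sd / sqrt(n_batches)
    return batch_mean - half_width, batch_mean + half_width


low, high = batched_means_ci(9.739, 4.838, 500)
print(f"{low:.3f} <= mu <= {high:.3f}")
```

Run as written, this should print roughly 9.383 <= mu <= 10.095.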

The true value of μ = 10 was within this confidence interval, as it was for the wider 90% confidence interval of 9.696 ≤ μ ≤ 10.763 constructed with 1000 unbatched observations.

In addition to averages, time averages, and standard deviations, the standardized time series (STS) area, A (STS{X}), can be used for confidence intervals as described by Goldsman and Schruben (1984). For large samples, A might behave like the standard deviation times a chi-square random variable with 1 degree of freedom, independent of the sample mean. (Use independently seeded replications to increase the degrees of freedom.)

The (unscaled) STS maximum, M, located at the Kth of N batches might behave like the standard deviation times (N-K)K/N times a chi-square random variable with 3 degrees of freedom if N is very large (and the batch size is moderate). See Goldsman and Schruben (1984). This can be used for computing the STS maximum confidence interval estimator.

