# Inputs/Outputs

(Difference between revisions)
 Revision as of 18:58, 1 August 2008 (edit) (→Standardized Time Series)← Previous diff Revision as of 18:59, 1 August 2008 (edit) (undo) (→Standardized Time Series)Next diff → Line 463: Line 463: [[Image:sts30.png|center]] [[Image:sts30.png|center]] - :4. '''''Apply Limit Theorems''''': The standardized series, [[Image:sts29.png]], will converge in probability distribution to that of a Brownian Bridge stochastic process. Thus, the Brownian Bridge process plays the role in time series standardization that the normal random variable played in scalar standardization. An important feature of the standardized series, [[Image:sts29.png]], is that it is constructed to be asymptotically independent of the sample mean, [[Image:sts30.png]]. + :4. '''''Apply Limit Theorems''''': The standardized series, [[Image:sts29.png]], will converge in probability distribution to that of a Brownian Bridge stochastic process. Thus, the Brownian Bridge process plays the role in time series standardization that the normal random variable played in scalar standardization. An important feature of the standardized series, [[Image:sts29.png]], is that it is constructed to be asymptotically independent of the sample mean, [[Image:sts31.png]]. - :There are several functions of [[Image:sts29.png]] that will also be asymptotically [[Image:sts31.png]] distributed. The area, [[Image:sts32.png]], will have a limiting [[Image:sts33.png]] normal distribution with zero mean and variance + :There are several functions of [[Image:sts29.png]] that will also be asymptotically [[Image:sts32.png]] distributed. The area, [[Image:sts33.png]], will have a limiting [[Image:sts34.png]] normal distribution with zero mean and variance :Therefore, :Therefore,

# Modeling Input Processes

Discrete event simulations are typically both dynamic (they change over time) and stochastic (they are random). So far, we have concentrated on learning to model the dynamics of a system. In this chapter, we look at some of the issues and techniques for modeling randomness. Random numbers, trace-driven simulations, parametric input distributions, and empirical input distributions are discussed. The exponential autoregressive process is used to illustrate dependence in the input process. Various sources of data with which to model input processes are presented. A discussion on reusing random number seeds to reduce variance in model output is also included. The chapter closes with some techniques for generating random variates, including the generation of non-homogeneous Poisson processes. These processes are useful models for generating exogenous events to drive a simulation model.

## Randomness

Although there is much literature in this area, here we will be brief. This is partly due to the fact that simulation modelers rarely write code for this part of their programs. Algorithms for imitating random sampling are well developed, and reliable codes are readily available. Furthermore, while the dynamics of most simulation models are unique, the stochastic logic tends to be the same. Almost all of the simulations we have discussed have had at least one random process as an input to the model. For instance, our carwash model was driven by customers arriving at randomly spaced intervals. We modeled the car arrival process by assuming that the intervals between arrivals were independent and had a particular uniform probability distribution. Realistic situations can be quite a bit more complicated: cars might arrive at a higher rate during rush hour, on weekends, or on days with nice weather. In this chapter we will review some of the considerations and techniques for generating random input processes to drive a simulation.

Broadly speaking, there are three popular approaches to modeling input processes: using pre-recorded data, using sample probability distributions, and using mathematical probability models. Pre-recorded data is also called a process "trace", a sample probability distribution is also called an "empirical" distribution, and mathematical probability models are sometimes referred to as theoretical or "parametric" models. All but trace-driven simulations require the use of random numbers.

## Trace Driven Simulations

In a trace-driven simulation whenever a value for a random variable is needed by the simulation, it is read from a data file. When it is practical, this input file contains actual historical records. In our carwash example, the trace might be a file of the intervals between successive car arrivals recorded while watching the system. Sometimes only a portion of the input is trace driven. In a fire department simulation, the times and locations of calls might be read from a file with data from a dispatcher's log book while other inputs, such as equipment repair status and travel times, might be generated as they are needed.

Trace data is simple to read into SIGMA using the DISK{} function. Recall that the DISK function has two arguments. The first argument is the full name of the data file (with drive and directory path if necessary); the second is an integer index telling which entry is to be read. When the index is zero, the file is read sequentially, wrapping around to start at the beginning again when the end of the file is reached.

We could place the values (separated by at least one space) of times between customer arrivals at our carwash in a data file called ARRIVAL.DAT. We then could use the function,

```DISK{ARRIVAL.DAT;0},
```

as the delay time on the self-scheduling edge for the ENTER vertex that generates successive customers arrivals. If ARRIVAL.DAT contains only five observations and looks like the following

```.32    2.6
.78    4.3  .85,
```

then the sequence read by the DISK{ARRIVAL.DAT;0}would be

```.32, 2.6, .78, 4.3, .85, .32 (wrapped around) 2.6, ...
```

There are some distinct advantages to having the values of random processes read from a data file. Foremost, there is less concern with the validity of the trace input data than when the inputs are artificially generated. We never really know how individual input variables might actually be distributed. Furthermore, it is quite difficult to capture the dependencies between different input processes or between successive values of the same input process. (We will re-examine the question of dependent input later in this chapter.)

When attempting to validate that a model accurately represents the behavior of a real system, there is probably no better test than to simulate previous system behavior using past input data. All discrepancies between the performance of the model and the system can then be attributed to model assumptions or errors in the simulation code. Assumptions and code errors are only dangerous if they are hidden. Assumptions can be evaluated as to their potential impact on decisions and the benefits they offer in model simplification. Known coding errors can be corrected. Accurate representation of the past is a reasonable minimal expectation. However, just because your model closely imitates last year's performance with last year's input data by no means makes the model correct or even useful. Driving a simulation with an artificial data trace rather than historical data is a useful technique for debugging the logic of a model. Specific sequences of otherwise random events can be forced to reoccur in a model that is being tested or enriched.

Many of the disadvantages of trace driven simulations are more or less obvious; some are not. While historical data traces provide a valuable source of simulation input when developing or changing a model, traces are not a good general approach to driving a simulation model when it is being used for analysis. Trace driven simulations require storage space for the data buffers, or they can be very slow due to the significant overhead of reading input files. Historical data that is detailed enough to drive a simulation is probably not available and would be time-consuming and expensive to collect. Actual data is also subject to errors in observation or may be invalid merely due to the intrusion of the observer. Even if detailed data is readily available, the simulation still could not be independently replicated or run for a longer period than the interval for which actual data was collected. In using trace input, we are giving up two of the major advantages to simulation modeling: time compression and independent replication.

Using a historical data trace as input when considering alternative policies or designs is probably not valid. Most trace driven simulations are closed systems. That is, the laws governing the input processes are not dependent on the state of the simulation. On the other hand, the actual system most likely is an open system that influences its environment. Although they are run all the time, trace-driven, "what if" simulation experiments are usually not appropriate. All statements about the effect of a change are based on the implicit assumption that the change has no influence on the environment in which the system operates. This is analogous to assuming that a system operates in an inelastic economy; demand is not altered by supply, price, quality, etc. What we most likely want to know is how sensitive the new system might be to changes in the input. It is difficult to do input sensitivity analysis with trace input.

Another disadvantage concerns rare but important events (i.e., a single, very long repair time at a service center). Since an unusual event is, by definition, unlikely to be on any given historical data trace, we may never see its influence in our simulation runs. Perhaps worse, if such a rare event happens to be in our trace, it will occur in every system we simulate. We might choose an alternative that handles this unusual case well but is too expensive or does not perform well in more typical situations.

## Random Number Generators

At the heart of stochastic modeling are random numbers. We define random numbers as positive fractions whose values are assumed to be independent of each other and equally likely to occur anywhere between zero and one. Random number generators are algorithms that imitate the sampling of random numbers. In SIGMA, RND has its values generated by such an algorithm. The algorithm simply takes an integer that you supply as the "seed" and recursively multiplies it by a fixed constant, divides by another constant, and uses the remainder as the next "seed." This remainder is also scaled to lie strictly between zero and one and used as the current value for RND. The random number generator we are using originated with Lewis, Goodman, and Miller (1969).

We will forego a discussion of the philosophy of random number generation. Suffice it to say that, by most common notions of what we mean by randomness, it is impossible to "generate" random numbers. Indeed, there have been some widely used algorithms that generate numbers that look far from random. Perhaps with the exception of the seed you give it, there is nothing whatsoever truly random in the values of RND or the outputs from any other random number generator; they just "look" random if you are not overly particular. Any random number generator that has passed all statistical tests for randomness simply has not been tested enough. Nevertheless, modeling the output from many random number generators as being true random numbers has been amazingly successful.

It is very easy to modify the SIGMA-generated source code in C to include multiple random number input streams. This is discussed when we introduce correlation induction techniques used in variance reduction in Section 9.8.

## Using Empirical Input Distributions

With this approach to input modeling, a sample of observations of an input variable is used to estimate an empirical probability distribution for the population from which the sample was taken. The customary estimate of the empirical probability distribution is to assign equal probability to each of the observed values in the sample. If the sample contains N observations, then the empirical probability distribution will assign a weight of 1/N to each of these observations.

Suppose that a trace of interarrival times of customers to our carwash is in the data file ARRIVALS.DAT. Sampling from the empirical distribution is equivalent to reading a randomly chosen value from this file. This is like shuffling and drawing from a data trace. In SIGMA this is done by making the index of the DISK function a random integer from 1 to N. For example, the file, ARRIVALS.DAT, may look like the following:

```.3   .42   .2   .54   .79
```

A delay time of

```DISK{ARRIVAL.DAT;1+5*RND}
```

will result in one of these five numbers (chosen at random) being used. Wrapping around the data file will not occur here since the index is never greater than 5. A considerably more efficient but perhaps less flexible approach is to place the data in an array and generate the index of the array uniformly. If the data is in the array, X, then

```X[1+5*RND]
```

would select one of these values with equal probability. In SIGMA the index is automatically rounded down to the nearest integer.

Similar to using historical data traces as input, the big advantage to using sample distributions to generate input is that there is less concern over validity. However, with sample distributions we can replicate and compress time. The disadvantages to using the empirical input distributions are similar to the disadvantages to using trace input: the data might not be valid, sensitivity analysis to changes in the input process is difficult, we cannot generalize the results to other systems, and it is hard to model rare events. The one major advantage trace input has over empirical distribution sampling comes in modeling dependencies in the input processes. The trace will capture these dependencies whereas the empirical distributions will not.

## Using Parametric Input Distributions

Efficient algorithms have been developed for imitating the sampling from a large number of parametric families of probability models. In Chapter 7, we saw some SIGMA functions for artificially generating samples that behave very much as though they were actually drawn from specific parametric distributions.

The values of parameters for these models determine the particular characteristics of the sample. This ability to easily change the nature of the input by changing a few parameter values is the primary advantage of using these models to drive a simulation. The variate generation algorithms in common use are fast and require very little memory. Furthermore, you can easily run replications, compress time, and generalize the results to other systems having the same structure. The major drawback to using parametric input distributions is that they can be difficult to explain and justify to people who have no background in probability and statistics.

Devroye (1986) provides a very complete reference on variate generation algorithms. The article by Leemis (1986) catalogs the relationships between dozens of probability laws.

There are several obvious classifications of probability models: finite or infinite range, continuous or discrete values. On a practical level we can also classify probability models as primarily providing good models for input processes or good models for output statistics.

Most of the common distributions that are used to model output statistics are derived from the normal (also called the Gaussian) distribution. In SIGMA the function NOR{M;S} will imitate sampling from a normally distributed population with a mean of M and a standard deviation of S. M can be any real-valued expression, and S can be any positive real-valued expression. Samples from other distributions such as the t, F, and χ-square can easily be derived from their relationship to the standard normal distribution (Leemis, 1986). A selection of parametric distributions that are good for input modeling are provided as SIGMA functions.

Standard Beta Density Shapes

The function BET{P;Q} imitates sampling from a beta random variate with parameters given by the positive real-valued expressions P and Q. This is a standard beta on the interval from 0 to 1 and can be scaled to the interval (C,C+D) in the usual way as

```C+D*BET{P;Q}.
```

The beta distribution is one of the most useful in simulation input modeling because of the richness of shapes it can take with simple changes of its two parameters. Figure 9.1 provides a convenient "matrix" of beta distribution shapes for various values of its parameters.

ERL{M} will give a sample imitating an M-Erlang random variate. The parameter, M, can be any positive integer-valued expression. Multiplication by a real number, A, will move the mean from M to A*M. Since the M-Erlang is the sum of M independent exponentially distributed random variates with mean 1, A*ERL{1} will be an exponential random variate with mean, A.

If you want a sample that is even more highly skewed than one having an exponential, the function, GAM{A}, will imitate sampling from a gamma distribution with a fractional parameter. Here the shape parameter, A, is a real variable strictly between 0 and 1. The result can be multiplied by a scale parameter to give a variety of distribution shapes. For integer values of M, M-Erlang random variates have the same distribution as gamma variates.

The function, TRI{C}, will imitate sampling from a triangular shaped distribution over the range from 0 to 1. The mode (peak) of the distribution is at the value of the real-valued expression, C, which is between 0 and 1 inclusive. The linear function,

``` A+(B-A)*TRI{(D-A)/(B-A)}
```

imitates a sample from a triangular distribution between A and B with a mode at D.

It is easy to code other probability models with SIGMA. We will illustrate this with two very useful models, the multinomial and the lambda. The multinomial probability law models independent sampling, with replacement, where there are only K possible mutually-exclusive outcomes. The simplest example of multinomial sampling is drawing a numbered ball from a jar where after each sample the chosen ball is placed back in the jar. While there are more efficient algorithms available, multinomial sampling is easily done in SIGMA using the DISK function. To illustrate, suppose that there are only three possible outcomes from our experiment. We will see a 1 with a probability of 1/2, a 2 with a probability of 1/3, and a 3 with a probability of 1/6. We simply set up a data file called, MULTINOM.DAT, which has the following entries

```1  1  1  2  2  3
```

The statement

```X = DISK{MULTINOM.DAT;1+6*RND}
```

will assign the values of X with the given probabilities. The index of the above DISK function is a randomly chosen integer from 1 to 6 (rounding the index down is automatic).

The lambda (more properly the "generalized" lambda) distribution is like the beta in that it can take on a wide variety of shapes. This distribution is discussed by Ramberg, et al. (1979). The major difference between the lambda and the beta is that the lambda can take on an infinite range of values whereas the beta is restricted to take on values only within a specified interval. There are four parameters to the lambda that can be estimated using subjective data. Generation of a lambda variate is very easy. Suppose that you have defined real-valued state variables, X and R, along with the four lambda parameters, L1, L2, L3, and L4. The statements

```R=RND,
X=L1+(R^L3-(1-R)^L4)/L2
```

will give values of X that imitate sampling from the lambda.

You should exercise caution when using Erlang, exponential, gamma, lambda, and normal distributions for input modeling. Variates from these families can take on very large values. You need to check and/or control for reasonableness. For instance, if you are using an exponential (ERL{1}) variate to model the service time at a store, it is not reasonable for the service time to exceed some upper limit. No one is going to wait years for service. Truncating these distributions at some upper bound is advised. To illustrate, a vertex with the state changes,

```X=ERL{1},
X=X*(X<5)
```

would produce a sample from an exponential variate truncated to be strictly less than 5. The truncated probability (the likelihood that an exponential variate exceeds 5) is added to the probability of zero occurring.

## Modeling Dependent Input

The book by Johnson (1987) along with the articles by Lewis (1981) and McKenzie (1985) are devoted primarily to the generation of dependent input processes. To illustrate the critical importance of recognizing and modeling dependence in the input processes for a simulation, we will use a simple process called the exponential autoregressive (EAR) process (Lewis, 1981).

Successive values of an EAR process, X, with mean, M, and correlation parameter, R, are generated from the recursion

```X = R*X+M*ERL{1}*(RND>R)
```

with an initial value of X given by the exponential M*ERL{1}. The values of this process will have an exponential distribution, but they are not independent. The correlation between two values of an EAR process that are K observations apart is Rk. At the extremes when R=1, the Boolean variable (RND>R) is always equal to zero and the above expression reduces to

``` X=X.
```

The process never changes value, so the serial correlation is a perfect 1. When R=0, (RND>R) is always equal to 1, and independent (zero correlation) exponential random variables are generated as

```X=M*ERL{1}.
```

The EAR process is easy to use since its serial dependency can be controlled with the single parameter, R. Although histograms of the values of this process look like a simple exponentially distributed sample, the line plots of successive values of an EAR process look rather strange. As is obvious from the EAR process equation, the value of X takes large randomly-spaced jumps and then decreases for a while.

To see the effects of dependent input, consider our simple queueing model, CARWASH.MOD, where we change service times. We will use an EAR process, with mean, M, and a correlation parameter of R. This model is called EAR_Q.MOD. If you run EAR_Q.MOD with the same M but very different values of R, you will see a radical difference in the output series. Dependence in the service times has made the two systems behave very differently. When building this model, if we had looked only at the histograms of service times and ignored the serial dependence on service times, we might have had a very poor model.

## Sources of Data

One of the most common excuses given for not successfully completing a simulation study is the lack of "real-world" data with which to model the input processes. While real data is valuable in establishing the credibility of a simulation, lack of data is not a good excuse for not proceeding with the study. You should be trying a wide range of reasonable input processes to assess system performance sensitivity to changes in the environment. Furthermore, there are many sources of data that should not be overlooked when planning and conducting a simulation study. Each source of data has different associated costs, risks, and benefits.

At the least detailed level, there are physical constraints of the system being modeled, such as space limitations for a waiting line. This is reliable, low-cost information that gives design insight. It tells a great deal about the rationale for the way a system was designed. Unfortunately, it is static information and is of little help in modeling the system dynamics. At the next level of detail, there are the subjective opinions of persons involved with the system. This is low-cost (but unreliable), static data that provides behavioral insights about the people involved with designing, operating, and managing the system. Increasingly detailed information can be obtained from aggregate reports on system operations such as labor, production, and scrap reports. This is low-cost, verifiable, static data that can provide performance insight. Information that is useful in modeling the dynamics of the system can sometimes be obtained from artificial data, classical Industrial Engineering MTM methodology. This type of information can give standard times (with allowances for fatigue) for performing different manual operations in a system. The cost of this information is moderate, and you do not need to have access to the actual system to obtain it. However, its validity depends on the skill and experience of the person doing the analysis. This data uses detailed motion analysis and provides policy insights in how the system managers and designers intend people to perform their jobs. Finally, the most expensive source of data is direct observation. The validity of this data depends on the skill of the observer and the relationship between the observer and the person being observed. Direct observations of a system's operations might be collected manually with time studies or mechanically with sensors. This data provides the operational insights needed to accurately model system dynamics.

Alternate sources of information are sometimes overlooked. For example, part routing sheets can be used to verify traced job flows in a factory. Production records might be used to augment and validate data on the reliability of machines. Knowing the number of machine cycles in a particular time period from production records along with the total number of failures from maintenance records permit you to estimate the probability that a machine will fail on a given operation cycle. It is unlikely that these failures are independent; however, at least you have a starting place for your sensitivity analysis and a potential consistency check for verifying more detailed machine failure testing data.

Finally, one needs to be alert for communication problems when collecting data. You might think that the data is about one thing when it is really about another. Or the data might have been translated, scaled, or simply recorded incorrectly. Fortunately, these are not fatal problems in carefully conducted simulation studies. We are going to change the input data during our sensitivity analysis experiments anyway.

To help keep the relative importance of real-word data in perspective it may be useful to remember the following Five Dastardly D's of Data. Data can be:

```1. Distorted: The values of some observations may be changed or not consistently defined. For
the value of a faster vehicle, or a product demand data may include only backlogged orders,
ignoring customers who refused to wait.
```
```2. Dated: The data may be relevant to a system that has or will be changed. Perhaps factory data
was collected for an older process or using last year's product mix.
```
```3. Deleted: Observations may be missing from the data set. This might be because the data was
collected over an interval of time, and events such as machine failures simply did not occur
during the study period. Medical trial data might be censored by patients dropping out of the
study for various and unknown reasons.
```
```4. Dependent: Data may be summarized (i.e., only daily averages are reported). This may remove
critical cycles or other trends in a data sequence or hide relationships between different
sequences. For example, data from a surgical unit might give very accurate estimates of the
distributions of preparation, operation, and recovery times. However, it may fail to capture the
fact that some procedures will tend to have large values for all three times while others
procedures may tend to have all small values.
```
```5. Deceptive: Any of the first four data problems might be intentional.
```

## Variance Reduction

It is often possible to obtain significantly better results by using the same random number streams for different simulation runs. For example, you might want to compare the performance of two different systems. When doing so, it is a good general experimental technique to make "paired" runs of each system under the same conditions. In simulations you do this by using the same random number seed in a run of each system. This technique, called using common random numbers, extends to more than two alternative systems, You would re-use the same seed for a run of each system. To replicate, choose another seed and run each system again. This technique reduces the variance of estimated differences between the systems.

Another example where re-using random number seeds can help reduce the variance of the output applies when making two runs of the same system. As before, you use the same seeds for both runs in the pair. However, for the second run, use 1-RND where RND was used before. If RND is a random number, then 1-RND is also. Furthermore, there is a perfect negative correlation between RND and 1-RND. If RND is a small random number, 1-RND will be large; if RND is large, 1-RND will be small. The pair of runs using RND and 1-RND are called antithetic replications. The hope is that the negative correlation between the input streams will carry over to the output. If one run produces an output that is unusually high, its antithetic run will have an output that is unusually low. When the antithetic replicates are averaged, a run with an unusually high result is canceled by its antithetic replicate having an unusually low outcome and vice versa.

Re-using random number seeds falls under the general category of variance reduction techniques. For a discussion of common and antithetic random numbers as well as some other techniques (which tend to be much less successful in practice), see the text by Bratley, Fox, and Schrage (1987).

The chances that the beneficial results of correlated input streams carry over to the output are greater if the runs can be synchronized as much as possible. That is to say, we want any unusual sequence of random numbers in a run also to be used in the same manner in its commonly seeded replicate(s). For example, if one run in a queueing simulation has an unusual sequence of long service times that causes the system to become very congested, we would like its antithetic replicate to have an unusual sequence of short service times that reduces congestion in the system. We would like all systems using common random numbers to have the same experiences. Synchronization of runs is generally improved if we use different random number streams exclusively for different stochastic components of our simulation. For example, in a queue we might use one stream to generate interarrival times and another stream to generate service times. Thus, we will want to use more than one sequence of random numbers in our simulation. A detailed example is in Appendix B.18.

## Using Multiple Random Number Streams (Development licensees only)

Function definitions in C make it very easy to change your SIGMA random number stream to a "vector" of random number streams. We discuss how to generate C simulation programs in Chapter 11. In your SIGMA-generated C code, replace RND with RND[I], where I is an integer indicating which stream you want to use. For example, to draw a random number from stream 3, replace RND with RND[3]. Then "vectorize" your library functions by replacing rndsd with rndsd[i] and RND with RND(I) in SIGMALIB.C (for development licensees only) Note that RND is now a function and rndsd becomes an array. If you are using three different random number streams in your model, you would make two changes in the library header file for your C compiler (SIGMALIB.H, SIGMAFNS.H). The two replacement lines would be:

```long rndsd[3];  /*makes rndsd a vector*/
#define RND(j) ((float) (rndsd[j] = lcg(rndsd[j]))*4.656612875e-10)
```

Like before, you still would have to read in the seeds for each stream when you run your model. You can always substitute the random number generator that comes with your compiler for RND. Appendix B.18 gives a general approach.

## Methods for Generating Random Variates

In this section, techniques for generating random variates are presented. Included among the techniques is the generation of non-homogeneous Poisson processes.

### Distribution Function Inversion

The cumulative distribution function, , for the random variable, , is the probability that the value of a random variable will be less than or equal to the function argument,

```
```

Discrete valued random variables can be generated by partitioning the interval between zero and one into sections whose widths are equal to the probability of each value. The value of the random variable is determined by the interval in which a pseudo-random number, U, falls. The probability of U falling in an interval is equal to the width of the interval, which, in turn, is the probability of the corresponding value occurring. This is equivalent to inverting the cumulative distribution function as illustrated in Figure 9.2.

The Cumulative Distribution Function of a Discrete Random Variable

The same technique can be applied to continuous valued random variables. For example, the cumulative distribution for an exponential random variable is,

```
```

So, the inverse distribution function of a uniform random number will generate an exponential variate as

```
```

When the cumulative distribution function is easily inverted, this technique is recommended. Unfortunately, not very many of the more commonly used probability distributions have easily inverted cumulative distribution functions. Exceptions that are particularly useful are order statistics from a sample of uniform random variables.

Order Statistics: Order statistics are sorted samples; the minimum order statistic is the smallest value in a sample. Suppose we want to know the first time that one of several identical independent components will fail. We could generate the lifetimes of each component and sort them to find the shortest. If there are many components, it would be easier to generate the minimum order statistic from the lifetime distribution.

Order statistics can be generated by evaluating an inverse cumulative distribution function at the corresponding uniform order statistic. The ith smallest of K uniform random variables has a beta distribution with parameters i and K-i+1. A common special case is where we want the smallest of K independent values of a random variable. The cumulative distribution function of the smallest of K independent uniform random variables is,

```
```

Therefore,

```.
```

If the lifetime of each of K independent components has an exponential distribution, the distribution of the time until the first component fails is equal to

```
```

### Other Methods

Other methods for generating random variates include acceptance/rejection, composition, and special relationships. Like inversion, these other methods are not always possible to use in their pure form, and special algorithms that combine these methods have been invented. The reference by Devroye (1986) contains many of these algorithms.

Acceptance/rejection involves bounding the probability density and generating a point uniformly within this bounding area. If the generated point falls below the density function, the horizontal value of the generated point is used as the random variate. If the point falls outside the density function, the generated point is rejected and a new point is tried. This is continued until an accepted point that falls within the region under the density function is found. This is analogous to throwing uniform points onto the bounding region and accepting those points that fall below the probability density. The algorithm for generating beta variates, beta_fn, in the SIGMA C library uses an acceptance/rejection algorithm due to Cheng (1978).

Composition involves breaking the probability distribution into regions from which it is relatively easy to generate variates. One of these regions is selected with the probability in that region and the variate is generated from that region.

Special properties exploit relationships between different types of random variables. Some commonly used relationships include generating an Erlang variate as a sum of exponentials, generating a geometric variate as the integer part of an exponential, generating a chi-square variate as the sum of squared normal variates, and generating a Poisson as the count of exponential variates whose sum falls within an interval. (When the Poisson rate is large, there are better methods for generating Poisson variates given in the references).

### Generating Non-Homogeneous Poisson Processes

Random arrivals to a service system can often be modeled using a Poisson process. This process has proven to be a rather good model for many processes that occur naturally. It is also used for other types of exogenous events that drive a simulation model, such as equipment failures or flaws occurring in a piece of fabric. The parameter for a Poisson process is its rate, λ, expressed in the number of events per time unit (e.g., customer-arrivals/hour or flaws/square-foot).

There are numerous methods for generating Poisson processes which exploit different properties of these processes. The fact that the times between Poisson events have exponential distributions can be used to simply make the delay time for a self- scheduling edge have an exponential distribution with a mean equal to the inverse of the Poisson rate. For example, if the arrivals to a queue are going to be modeled as a Poisson process with an rate equal to RATE, then the edge delay time for the self-scheduling edge that models sequential arrivals would have a delay time of

```(1.0/RATE)*ERL{1} (recall that Erlang-1 and exponential distributions are same).
```

If we know the total number of Poisson events that occur in an interval of time, we can better match the process by conditioning on this knowledge. We do this using the fact that the distribution of the times of K Poisson events in an interval have the same distribution as K uniform random numbers on the same interval. Say we know that K Poisson events occurred in an interval between 0 and T. We can generate the minimum order statistic of K uniforms over the interval for our first event time, T1. The second event time will be the minimum of the remaining K-1 uniforms distributed over the remaining interval between T1 and T. The rest of the K events in the original interval can be generated in this manner.

Unfortunately, processes in the real world do not often occur with a constant rate. The arrival rate may be relatively high during a rush hour and slack (or zero) late at night. A Poisson process with a changing rate is called a non-homogeneous Poisson process. An easy way to model a non-homogeneous Poisson process is by a technique called "thinning" (Lewis and Shedler, 1979). Here we simply generate Poisson events at the maximum rate and keep them with a probability proportional to the current rate. To illustrate: assume we know that during an 24-hour day, customers will arrive at our carwash with the following hourly rates: R[0], R[1],..., R[23] (the rate can be zero if the facility is closed). Let RMAX be the maximum of these rates. The event graph to generate Poisson arrival events according to this daily demand cycle is called NONPOIS.MOD (it is shown in Figure 9.3.

Non-Homogeneous Poisson Arrivals to our Carwash Model

# Graphical & Statistical & Output Analysis

In keeping with our philosophy of utilizing pictures whenever possible, several graphical methods of presenting simulation results have been incorporated into the SIGMA software. The graphical output charts available to you while in SIGMA are line plots, step plots, scatter plots, array plots, histograms, autocorrelations, and standardized time series. This chapter discusses these graphical output plots and concludes with an explanation of standardized time series.

## Keeping a Perspective

It is easy to become overwhelmed by the information produced by a simulation model. Different types of simulation output are most useful during different stages of a simulation study. Animations, graphs, and statistics all have their appropriate roles to play.

During the initial development and testing of the simulation model, animating the simulation logic while running SIGMA in Single Step or Graphics run mode is the most valuable. The logical animation in SIGMA is different from the physical animation of a system. Physical animations are useful in selling the simulation to prospective users and for catching gross logic errors in the simulation model. Physical animations using SIGMA are discussed in Chapter 8.

The Translate to English feature of SIGMA (found under the File menu) is an extremely effective tool for catching program logic errors or communicating the details of a model to persons not familiar with simulation. If the SIGMA-generated English description of your simulation is nonsense, it is likely that the logic in your model will not make sense either.

When evaluating alternative system designs at a high level, charts of the output are most useful. Here we are comparing the performance of very different systems (e.g., manual operations versus automated ones). Plots of the output not only offer information on overall system performance but also on the dynamics of the system. We can see, for example, if the manner in which we initialized the variables in our simulation has an inappropriate influence on the output. We will say more about the bias caused by initializing variables later in this chapter.

Once a particular design has been tentatively selected, it is important to do detailed sensitivity analysis and design optimization before a final recommendation is proposed. Here we are going to run a great many replications with different settings of the factors and parameters in our model. It is neither fun nor particularly informative to watch hundreds of different animations or look at hundreds of output plots. In the detailed design phase of a simulation study, numerical summaries of system performance in the form of output statistics are the most appropriate form of simulation output.

Finally, once a design has been finalized, the most effective form of output is a physical animation that lets people more fully understand the changes being suggested. Charts, statistical summaries, and the English description of your model can also be effective in helping you sell your ideas. The above discussion is summarized in Table 10.1.

```Typical Phases of a Simulation Project and Predominant Form of Output.
```
```Phase of the Study	        Predominant Form of the Output
```
```Model building and validation	Logical animations and English Descriptions
System evaluation	        Charts and plots
Detailed design	        Statistics
Implementation	                Physical animations
```

## Elementary Output Charts

The Output Plot dialog box is discussed in detail in Chapter 4. In SIGMA there are five basic output plots and two plots for advanced analysis. The basic charts are:

1. Step plots, which show the values of traced variables during a simulation run.
2. Line plots, which are similar to step plots except straight lines are drawn between successive data points.
3. Array plots, which show values of each element in an array.
4. Scatter plots, which show the relationship between pairs of traced output variables.
5. Histograms, which count the relative frequencies that different values of a variable occur.

More advanced output analysis is possible using the following charts:

6. Autocorrelation functions, which shows dependencies in the output.
7. Standardized time series (STS), which can be used to detect trends.

### Step and Line Plots

Step plots and line plots (also called index plots) are by far the most common form of graphical simulation output. Here we see how one variable changes during the simulation run. The variable of interest is chosen for the vertical axis of the plot and an indexing variable is chosen for the horizontal axis. The indexing variable is often simulated time, but other variables (such as customer identification number) might be used; the indexing variable should not decrease in value during the run for an index plot to make much sense. Figure 10.1 shows a line plot of the queue length in our simulated carwash (exponentially distributed service times were used here).

Queue Length as a Function of Customer Number

### Array Plots

Array plots show the maximum and current values of each element in an array. This looks like a series of "thermometers". This is useful, say, when you have an array of queues. You can see how each queue size changes and effects the other queues. (See Figure 10.2) A good example is to look at the variable Q in the model, FLOWSHOP.MOD.

Array Plot of Jobs in each Queue in FLOWSHOP.MOD

### Scatter Plots

When two variables are of equal importance, they can be chosen as the two axes of a scatter plot. A point is plotted for every observed pair of values for these variables. If the points tend to fall along a line with a positive slope, the two variables are likely to be positively correlated. (Small values of one variable are observed along with small values of the other variable and large with large.) Similarly, if the points in a scatter plot tend to fall along a line with a negative slope, negative correlation between the variables should be suspected. Figure 10.3 shows a scatter plot of the waiting time of each customer for our simulated bank in Chapter 5 and the number of customers in line when each customer departed; the expected positive correlation is evident.

Scatter Plot of Waiting Time and Line Length in a Bank

### Histograms

Histograms show counts of the number of times the observed values of a variable fall within a specified interval. These counts show the relative frequency that values of a variable are observed. Figure 10.4 shows a histogram of the number of customers in the carwash model with exponentially distributed interarrival times.

Histogram of Number of Customers in Carwash Queue

### Detecting Trends using Standardized Time Series

Using a standardized time series (STS) plot, it is possible to detect trends in the output that might not otherwise be visible. The STS plot should appear to be symmetric about zero if there is no trend in the data (adequate initial truncation). When there is a trend, the STS plots will be pulled either above (increasing trend) or below (decreasing trend) the zero line. STS can also be used for other types of inference such as confidence interval estimation. The STS plots in SIGMA are the unscaled standardized time series for the selected output measurement. The best way to learn how to use STS plots is to look at a few simple output series. Try generating a sequence of independent random variables, say, with the one state change

```X=NOR{0;1}.
```

Look at the line plot of the successive values of X and the corresponding STS plot. Now add a trend to the data, say,

```X=CLK*NOR{0;1}
```

and look at the difference in the same two plots. Do this for some more interesting and subtle trends (exponential decay, quadratic, sudden shift, etc.) and see how the STS behaves. You will find that familiarity with the behaviors of STS plots is a valuable visual tool for output analysis.

Standardizing a time series is similar to the familiar procedure of standardizing or normalizing a scalar statistic. Standardizing a scalar statistic, such as a sample mean, involves centering the statistic to have a zero mean and scaling its magnitude to generic units of measurement called standard deviations. Limit theorems can be applied that give us the asymptotic (large sample) probabilistic behavior of correctly standardized statistics under certain hypotheses. This limiting model for scalar statistics is typically the standard normal probability distribution. This model can be used for statistical inference such as testing hypotheses or constructing confidence intervals. Here we extend this concept to the standardization of an entire time series. More information about STS can be found in Section 10.5, Standardized Time Series.

The value of standardizing time series comes from the fact that the same mathematical analysis can be applied to series from a variety of sources. Thus, the technique of standardization serves as a mathematical surrogate for experience with the data under study. No matter what the original time series looks like, the standardized time series will be familiar if certain hypotheses are correct. An unusual appearance of a standardized time series can be used to conclude that these hypotheses are not valid. The statistical significance of these conclusions can be computed in the same manner as with standardized scalar statistics.

To illustrate using STS plots to detect trends, consider first the STS plot in Figure 10.5. Since this plot is mostly negative, a clear downward trend in the data is evident.

STS Plot Indicating a Downward Trend in the Data

A line plot of the actual data is shown in Figure 10.6, where the downward trend indicated by the STS plot is at best only marginally apparent.

Plot of the Raw Data Series for the STS Plot Above

Figure 10.7 is an STS plot that indicates the presence of a strong increasing trend in the data. The STS plot is pulled in the positive direction by this positive trend in the data. Figure 10.6 should be compared to Figure 10.7, where a negative trend was indicated.

An STS Plot indicating an Increasing Trend in the Data

The raw data for the STS plot in Figure 10.7 is plotted in Figure 10.8, where the increasing trend in the data is again only slightly detectable.

Raw Data for the STS Plot Above

The sensitivity of STS plots to trend has a down-side: they can indicate a trend which may disappear as more data is collected. However, given the potential seriousness of simulation initialization bias causing an artificial trend in the output, it seems better to be able to detect trends easily at the risk of falsely indicating a non-existent trend.

### Dependencies

The autocorrelation function is a plot of the correlation between two observations in the same output series as a function of how far apart (lag) the observations are. For example, the lag 1 autocorrelation function is the sample correlation between two adjacent output measurements. The autocorrelation function should drop off sharply at a lag of 1 if the observations are not correlated. This would indicate that the batch size is large enough to remove correlations between successive batched means and initial transient output observations have been truncated. Crude 95% confidence bounds at can be used as a very rough guide (Brockwell and Davis, 1987).

The standardized time series (STS) plots can also be used to visually assess whether or not there is significant positive serial correlation in an output series. The more jagged the STS plot appears, the less serial dependency in the output. If the STS plot is smoother than you are typically used to seeing, you can suspect either a serious trend in the data or significant positive serial dependency between successive observations of the output.

## Using Statistics

Batching is where non-overlapping, adjacent, equal-sized groups of data are averaged. The resulting series of batched means will often be more independent, less erratic, and have an approximately normal distribution. If there are about twenty batches, there is a sufficient number of degrees of freedom for most inferences (Schmeiser, 1983). Batched means are computed after truncation. From within SIGMA, you are limited to 10,000 batches of output data.

Sufficient statistics and maximum likelihood estimators are used for common inference. The assumptions behind the statistics are probably more important than the numbers themselves and should be understood before too much faith is placed in their values. When looking up the simple formulas, you can review the assumptions. Do not let an unfamiliarity with statistics prevent you from looking at the charts. Averages of squares (scatter plots) are sufficient for batched means confidence interval estimation and other inferences. Approximately twenty to thirty large batches are recommended (Schmeiser, 1983). Batching tends to make the output better approximate an independent sample from a normal distribution.

To illustrate the effects of batching, 1000 observations of the EAR process discussed in Chapter 9 were simulated. The mean of the process, μ was 10. The sample mean was 9.43. The autocorrelation function of this process and the histogram showed us that this output series does not appear to be independent nor does it have the characteristic "bell shaped" distribution function expected of normally distributed observations. We also saw that the correlation between the observations appears to slowly decrease as the lag (time interval) between the observations increases. There appeared to be significant correlation between neighboring observations at lag 1. The sample standard deviation was 9.339. (This is approximately equal to the sample mean, as expected from exponential data.)

To see the effects of batching, averaged groups of 10 observations in a sample of 5000 from the same process. The sample mean of the batched process was 9.739.

Even with a batch size as small as 10, we saw from the autocorrelation plot that the data appears now to have very little serial correlation. We also saw that the sample standard deviation is 4.838. The histogram of the batched means showed the data is beginning to look like it came from a normal, bell-shaped distribution (as expected from the Central Limit Theorem of statistics). With averages of only 10 observations from a highly-skewed dependent exponential distribution, we begin to approximate an independent normal data set. The mean and variance of the batched process can safely be used to form a batched means confidence interval for the mean of the process, based on the usual t-statistic with 499 degrees of freedom. (We can use the normal approximation to the t distribution for such a large degrees of freedom parameter.) For example, a 90% confidence interval for the process mean, μ, is found to be,

Substituting our statistics into the above interval estimation formula, we get the following confidence interval for the true mean of the process.

The true value of μ=10 was within this confidence interval, as it was for the wider 90% confidence interval of 9.696μ10.763 constructed with 1000 unbatched observations.

In addition to averages, time averages, and standard deviations, The STS area, A, STS{X} can be used for confidence intervals as described by Goldsman and Schruben (1984). For large samples, A might behave like the standard deviation times a chi-square random variable with 1 degree of freedom independent of the sample mean. (Use independently seeded replications to increase the degrees of freedom.) STS is discussed in the next section.

The (unscaled) STS maximum, M, located at the Kth of N batches might behave like the standard deviation times (N-K)K/N times a chi-square random variable with 3 degrees of freedom if N is very large (and the batch size is moderate). See Goldsman and Schruben (1984). This can be used for computing the STS maximum confidence interval estimator.

## Standardized Time Series

As a guide to standardizing a time series, we will first review the procedure of standardizing a scalar statistic. We will use the familiar t-statistic as an example. The data will consist of n observations

that are independent and have identical normal distributions. We wish to make inferences about the unknown population mean, μ. The average of the data sample

will be the statistic used for these inferences. The population variance, , is an unknown nuisance parameter.

Standardization involves the following steps.

1. Center the Statistic: The population mean, μ, is subtracted from the sample mean giving the random variable, , which has an expected value of zero. (Strictly speaking, this difference between the average and the mean would not be called a "statistic" since it includes the parameter, μ.)
2. Scale the Statistic Magnitude: Since statistics can come in an almost endless variety of measurement units, we will need to express the statistic in a common unit of measurement called a standard deviation. The magnitude of the statistic is scaled by dividing by . Our statistic is now
which is our standardized statistic. Standardized sample means will all have the same first two moments. The unknown scaling parameter, σ, can be either estimated or cancelled out of a ratio statistic; the cancelling out of this parameter in a ratio statistic is the more common approach and is followed here.
3. Cancel the Scale Parameter: The data is aggregated or batched into b exclusive adjacent groups of size m (if necessary, discard data from the front of the run so that ). The average of each batch is denoted as , i = 1,..., b. The usual unbiased estimator of the variance of the batched means is
Inferences about the parameter, µ, are based on the random t-ratio,
4. Apply Limit Theorems: The limiting distribution of Tb-1 is known. As (making since b is fixed), the distribution function of converges to that of a random variable with b-1 degrees of freedom. As , will converge to the constant μ from the law of large numbers. Also, from the Central Limit Theorem of statistics, the distribution function of will converge to that of a standard normal random variable. Thus, the distribution function of the ratio, Tb-1 (being a continuous mapping) will converge to that of a t random variable with b-1 degrees of freedom. The unknown scaling constant, σ, is cancelled out of the ratio.
5. Use the Limiting Probability Model for Inference: The limiting distribution of Tb-1 can be used for statistical inference and estimation.

The concept of standardization can be applied to an entire time series. The original series of observations is transformed into a standardized series of observations. We will hypothesize (and test) that the series is stationary. We also assume that there is some minimal amount of randomness in the process; however, we do not assume that the data is independent. The mathematical assumptions needed are given in a paper by Schruben (1983), where it is argued that many simulations on a computer will meet the imposed restrictions for applicability. Suppose we want to standardize the output from the ith run of a simulation, with i representing the run number. Let

denote m stationary but perhaps dependent observations in the ith output time series. We will standardize the sequence of cumulative means up to and including the kth observations, given by,

Similar steps to those in standardizing a scalar statistic are followed in standardizing a time series. The steps in standardization are as follows:

1. Center the Series: For run i, the sequence given by
will have a mean of zero if the series has a constant mean.
2. (a) Scale the Series Magnitude: The scaling constant for dependent sequences (independent of the run i) that we use is defined as
which is just the population variance in the special case of independent identically distributed data. Magnitude scaling is done by dividing by . The scaling constant is again unknown but will cancel out of our statistics as before.

There is one additional step required that was not necessary in the scalar standardization case. Different time series can be of different length, so we must also scale the length of the series. Thus, we have the additional step:

(b) Scale the Series Index: We will define the continuous index, . Our previous index is thus given by . We also add the starting point so that the standardized series is 0 at t = 0 and t = 1. The result is that all standardized time series have indices on the unit interval and start and end at zero. We now have what we will call a standardized time series given by
where is the greatest integer function.
3. Cancel the Scale Parameter: There are several functions that might be considered for the denominator of a ratio that cancels σ. We will consider here only one such function, the sum or limiting area under the function .
4. Apply Limit Theorems: The standardized series, , will converge in probability distribution to that of a Brownian Bridge stochastic process. Thus, the Brownian Bridge process plays the role in time series standardization that the normal random variable played in scalar standardization. An important feature of the standardized series, , is that it is constructed to be asymptotically independent of the sample mean, .
There are several functions of that will also be asymptotically distributed. The area, , will have a limiting normal distribution with zero mean and variance
Therefore,
will have a limiting distribution with one degree of freedom.
Now consider where each of b independent replications (or b batches of data) are standardized in the manner above. We can then add the resulting random variables, , for each replication or batch to obtain a random variable with b degrees of freedom.
5. Use the Limiting Probability Model for Inference: In a manner similar to the scalar case, the standardized (scalar) sample mean of all of the data is divided by the square root of over b to form a ratio where the unknown scale parameter, σ , cancels. For large values of m, the distribution of this ratio can be accurately modeled as having a t distribution with b degrees of freedom. The resulting (1-α)100% asymptotic confidence interval for the mean μ is
where is the "grand mean" of all of the data in all batches or replications. More complicated, but superior, confidence intervals can be obtained by weighting the standardized time series as in Goldsman and Schruben (1990). The SIGMA function STS{X} is equal to for the output time series of values of X; this function can be used with the other weightings given in the reference.
Also, each of the replication or batch means can be treated as a scalar random variable and standardized and squared, giving another random variable. Due to the independence of and the 's, these random variables can be added, giving a random variable with 2b-1 degrees of freedom. This can be considered as a "pooled" estimator of , which we will denote as Q. The same types of inferences can be made for the dependent simulation output series as were applicable in the independent data case. The resulting "t variate" is given by

Theoretical properties of confidence intervals formed using standardized time series are presented by Goldsman and Schruben (1984), which compares the standardized time series approach to conventional methods.

Standardized time series has been implemented in several simulation analysis packages, most notably at IBM (Heidelberger and Welch, 1983), Bell Labs (Nozari, 1985), and General Electric (Duersch and Schruben, 1986). These packages typically control initialization bias (see also Schruben, Singh, and Tierney, 1983, and Schruben, 1982) and run duration as well as produce confidence intervals. Other applications of standardized time series have been to selection and ranking problems (Goldsman, 1983) and simulation model validation (Chen and Sargent, 1984).

The asymptotic arguments above require that the batch size, m, become large as n is increased. The common method is to allow the batch size to grow as the sample size increases and keep the number of batches, b, fixed. Fixing b at 10 or 20 seems reasonable in most applications as long as the sample size is large (see Schmeiser, 1983).

The limiting probability model of a t random variable has the virtue that it is widely tabulated and has been studied extensively. There exist other limiting models that might be used, but none have been developed to the extent of the t model.