Trace Driven Simulations


In a trace-driven simulation, whenever a value for a random variable is needed, it is read from a data file. When practical, this input file contains actual historical records. In our carwash example, the trace might be a file of the intervals between successive car arrivals, recorded while watching the system. Sometimes only a portion of the input is trace driven. In a fire department simulation, the times and locations of calls might be read from a file of data from a dispatcher's log book, while other inputs, such as equipment repair status and travel times, might be generated as they are needed.

Trace data is simple to read into SIGMA using the DISK{} function. Recall that the DISK function has two arguments. The first argument is the full name of the data file (with drive and directory path if necessary); the second is an integer index telling which entry is to be read. When the index is zero, the file is read sequentially, wrapping around to start at the beginning again when the end of the file is reached.

We could place the values (separated by at least one space) of times between customer arrivals at our carwash in a data file called ARRIVAL.DAT. We then could use the function

DISK{ARRIVAL.DAT;0}

as the delay time on the self-scheduling edge for the ENTER vertex that generates successive customer arrivals. If ARRIVAL.DAT contains only five observations and looks like the following

.32    2.6
.78    4.3  .85

then the sequence read by DISK{ARRIVAL.DAT;0} would be

.32, 2.6, .78, 4.3, .85, .32 (wrapped around) 2.6, ...
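The sequential, wrap-around read described above can be sketched in Python. The function name disk and the assumption that a positive index selects a one-based entry are ours for illustration only; SIGMA's actual DISK implementation may differ in detail.

```python
import itertools

def disk(path, index):
    """Sketch of DISK{path;index}: read whitespace-separated numbers
    from a trace file.  With index 0, return an iterator that yields
    the values sequentially, wrapping around to the beginning of the
    file when the end is reached.  With a positive index, return that
    single entry directly (assumed one-based here)."""
    with open(path) as f:
        values = [float(tok) for tok in f.read().split()]
    if index == 0:
        return itertools.cycle(values)  # endless sequential, wrapped read
    return values[index - 1]            # direct lookup of one entry
```

With the five observations above in ARRIVAL.DAT, repeated calls to next() on disk("ARRIVAL.DAT", 0) would yield .32, 2.6, .78, 4.3, .85, and then wrap around to .32 again.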

There are some distinct advantages to having the values of random processes read from a data file. Foremost, there is less concern about the validity of trace input data than about artificially generated inputs. We never really know how individual input variables are actually distributed. Furthermore, it is quite difficult to capture the dependencies between different input processes or between successive values of the same input process. (See Modeling Dependent Input.)

When attempting to validate that a model accurately represents the behavior of a real system, there is probably no better test than to simulate previous system behavior using past input data. All discrepancies between the performance of the model and the system can then be attributed to model assumptions or errors in the simulation code. Assumptions and code errors are only dangerous if they are hidden. Assumptions can be evaluated as to their potential impact on decisions and the benefits they offer in model simplification. Known coding errors can be corrected. Accurate representation of the past is a reasonable minimal expectation. However, just because your model closely imitates last year's performance with last year's input data by no means makes the model correct or even useful. Driving a simulation with an artificial data trace rather than historical data is a useful technique for debugging the logic of a model. Specific sequences of otherwise random events can be forced to reoccur in a model that is being tested or enriched.

Many of the disadvantages of trace-driven simulations are more or less obvious; some are not. While historical data traces provide a valuable source of simulation input when developing or changing a model, traces are not a good general approach to driving a model that is being used for analysis. Trace-driven simulations either require storage space for data buffers or run very slowly because of the overhead of reading input files. Historical data detailed enough to drive a simulation is probably not available and would be time-consuming and expensive to collect. Actual data is also subject to errors in observation, or may be invalid merely because of the intrusion of the observer. Even if detailed data is readily available, the simulation still cannot be independently replicated or run for a longer period than the interval over which the data was collected. In using trace input, we give up two of the major advantages of simulation modeling: time compression and independent replication.

Using a historical data trace as input when considering alternative policies or designs is probably not valid. Most trace-driven simulations are closed systems; that is, the laws governing the input processes do not depend on the state of the simulation. The actual system, on the other hand, is most likely an open system that influences its environment. Although they are run all the time, trace-driven "what if" simulation experiments are usually not appropriate. Every statement about the effect of a change rests on the implicit assumption that the change has no influence on the environment in which the system operates. This is analogous to assuming that the system operates in an inelastic economy, where demand is not altered by supply, price, quality, and so on. What we most likely want to know is how sensitive the new system is to changes in the input, and it is difficult to do input sensitivity analysis with trace input.

Another disadvantage concerns rare but important events (e.g., a single, very long repair time at a service center). Since an unusual event is, by definition, unlikely to appear in any given historical data trace, we may never see its influence in our simulation runs. Perhaps worse, if such a rare event does happen to be in our trace, it will occur in every system we simulate. We might then choose an alternative that handles this unusual case well but is too expensive or performs poorly in more typical situations.
