Monte Carlo methods, also called simulation studies, are statistical experiments used to study the behavior of statistical methods or measures under controlled situations. Lately, sophisticated methodological and computing tools have increased considerably the efficiency in the simulation process considering “variance reduction” but in most of the cases these experiments are still subject to uncertainty because of their finite nature. It can be easily noticed when you run a simulation different times, most probably different results are obtained. The main topics that should be considered are Monte Carlo error, as it is a measure of uncertainty, and the number of replications needed.
A simulation study is composed of the repeated application of three steps;
- generating simulated data
- performing some statistical procedure
- recording the results.
Generally, after some finite number of times some summary statistics are computed, but these methods are limited by their finite nature and there exists sampling variability. Also, when different simulations are made we may notice some differences in results and this between-simulation variability is called Monte Carlo error. The reasoning behind these differences that occur across simulations depends on the setting(design) of the experiment and on the number of simulated data sets or replicates.
MONTE CARLO METHODS
Monte Carlo methods rely on the possibility of producing an endless flow of random variables for well-known and moreover unknown distributions. Generally, the simulation is based on the idea that we produce uniform random variables without any interest on how these uniform random variables are obtained because there is a function already implemented in R which allows us to generate uniform random variables. The function in R is “runif”. The uniform distribution provides the basic probabilistic representation of randomness on a computer generator and it is needed in order to generate all other distributions. All the methods used (including runif) produce pseudo-random numbers in that there is no randomness involved. Based on an initial value u0 of a uniform U (0, 1) sequence and a transformation D, the uniform generator produces a sequence (ui) = (Di(u0)) of values in (0, 1) but the outcome has the same statistical properties as an iid sequence. There are different methods used to generate random variables.
Some of these methods are: Inverse Transform Method, General Transformation Methods.
Inverse Transform Method
The idea behind Inverse Transform Method is that it allows us to transform any random variable into a uniform random variable and vice versa. The implementation behind this method is: if X has density “f” and cdf “F”, then we have the relation:
F(x) = ∫_(-∞)
and we set U = F(X) and solve for X. This is because:
P(U £ u) = P[F(X) £ F(x)] = P[F-1(F(X)) £ F-1(F(x))] = P(X £ x),
where we have assumed that F has an inverse. This assumption is explained better (Monte Carlo Statistical Methods (Robert and Casella, 2004), but holds for most continuous distributions.
General Transform Methods
General Transform methods are used when a distribution with density f is linked in a relatively simple way to another distribution that is easier to simulate. This relationship can be used to construct an algorithm to simulate variables from f. A simple explanation(example) for this method is: you first generate random variables from uniform distribution then by using the inverse transform method (also called probability integral transform) you obtain exponential random variables then you may implement the general transformation depending on their relation in order to obtain different distribution such as gamma, beta, chi-square. However, there are some limitations when we use these methods: for example, we can’t generate variables for a chi-square with odd degrees of freedom in order to use the transformation, also the efficiency for this generation need to be considered.
Other important cases to discuss are the discrete distributions. For this group we have an all-purpose algorithm in order to generate random variables. In order to generate X ∼ Pq , where Pq is supported by the integers, we can calculate once for all, assuming we can store the probabilities:
p0 = Pq(X £ 0), p1 = Pq(X £ 1), p2 = Pq(X £ 2), p3 = Pq(X £ 3), … ,
and then generate U ∼ U[0,1] and take :
X = k if pk-1 < U < pk.
In most of these specific algorithms are more efficient. Improvement can come from a judicious choice of the probabilities first computed. Basically, knowing where the values of a certain distribution fall in most of the cases, you can put some restrictions in your algorithm in order for it to be more efficient by selecting a certain interval where most of observations are expected to fall. A simple example: when we want to generate from a Poisson distribution (with λ=196) we expect most of our observations to be in the interval
λ ± 3√λ. So, the restriction would be to accept only the values in the interval (154, 238).