# Chapter 7 The Sampling Distribution

## 7.1 Student Learning Objective

In this section we integrate the concept of *data* that is extracted
from a sample with the concept of a *random variable*. The new element
that connects between these two concepts is the notion of *sampling
distribution*. The data we observe results from the specific sample that
was selected. The sampling distribution, in a similar way to random
variables, corresponds to all samples that could have been selected.
(Or, stated in a different tense, to the sample that will be selected
prior to the selection itself.) Summaries of the distribution of the
data, such as the sample mean and the sample standard deviation, become
random variables when considered in the context of the sampling
distribution. In this section we investigate the sampling distribution
of such data summaries. In particular, it is demonstrated that (for
large samples) the sampling distribution of the sample average may be
approximated by the Normal distribution. The mathematical theorem that
proves this approximation is called the *Central Limit Theory*. By the
end of this chapter, the student should be able to:

Comprehend the notion of sampling distribution and simulate the sampling distribution of the sample average.

Relate the expectation and standard deviation of a measurement to the expectation and standard deviation of the sample average.

Apply the Central Limit Theorem to the sample averages.

## 7.2 The Sampling Distribution

In Chapter 5 the concept of a random variable was
introduced. As part of the introduction we used an example that involved
the selection of a random person from the population and the measuring
of his/her height. Prior to the action of selection, the height of that
person is a *random variable*. It has the potential of obtaining any of
the heights that are present in the population, which is the *sample
space* of this example, with a distribution that reflects the relative
frequencies of each of the heights in the population: the
*probabilities* of the values. After the selection of the person and the
measuring of the height we get a particular value. This is the *observed
value* and is no longer a random variable. In this section we extend the
concept of a random variable and define the concept of *a random
sample*.

### 7.2.1 A Random Sample

The relation between the random sample and the data is similar to the
relation between a random variable and the observed value. The data is
the observed values of a sample taken from a population. The content of
the data is known. The random sample, similarly to a random variable, is
the data that *will* be selected when taking a sample, prior to the
selection itself. The content of the random sample is unknown, since the
sample has not yet been taken. Still, just like for the case of the
random variable, one is able to say what the possible evaluations of the
sample may be and, depending on the mechanism of selecting the sample,
what are the probabilities of the different potential evaluations. The
collection of all possible evaluations of the sample is the *sample
space of the random sample* and the probabilities of the different
evaluations produce the *distribution* of the random sample.

(Alternatively, if one prefers to speak in past tense, one can define the sample space of a random sample to be the evaluations of the sample that could have taken place, with the distribution of the random sample being the probabilities of these evaluations.)

A *statistic* is a function of the data. Example of statistics are the
average of the data, the sample variance and standard deviation, the
median of the data, etc. In each case a given formula is applied to the
data. In each type of statistic a different formula is applied.

The same formula that is applied to the observed data may, in principle,
be applied to random samples. Hence, for example, one may talk of the
sample average, which is the average of the elements in the data. The
average, considered in the context of the observed data, is a number and
its value is known. However, if we think of the average in the context
of a random sample then it becomes a random variable. Prior to the
selection of the actual sample we do not know what values it will
include. Hence, we cannot tell what the outcome of the average of the
values will be. However, due to the identification of all possible
evaluations that the sample can possess we may say in advance what is
the collection of values the sample average can have. This is the sample
space of the sample average. Moreover, from the sampling distribution of
the random sample one may identify the probability of each value of the
sample average, thus obtaining the *sampling distribution* of the sample
average.

The same line of argumentation applies to any statistic. Computed in the context of the observed data, the statistic is a known number that may, for example, be used to characterize the variation in the data. When thinking of a statistic in the context of a random sample it becomes a random variable. The distribution of the statistic is called the sampling distribution of the statistic. Consequently, we may talk of the sampling distribution of the median, the sample distribution of the sample variance, etc.

Random variables are also applied as models for uncertainty in future measurements in more abstract settings that need not involve a specific population. Specifically, we introduced the Binomial and Poisson random variables for settings that involve counting and the Uniform, Exponential, and Normal random variables for settings where the measurement is continuous.

The notion of a sampling distribution may be extended to a situation
where one is taking several measurements, each measurement taken
independently of the others. As a result one obtains a *sequence* of
measurements. We use the term “sample” to denote this sequence. The
distribution of this sequence is also called the sampling distribution.
If all the measurements in the sequence are Binomial then we call it a
*Binomial sample*. If all the measurements are Exponential we call it an
*Exponential sample* and so forth.

Again, one may apply a formula (such as the average) to the content of
the random sequence and produce a random variable. The term *sampling
distribution* describes again the distribution that the random variable
produced by the formula inherits from the sample.

In the next subsection we examine an example of a sample taken from a population. Subsequently, we discuss examples that involves a sequence of measurements from a theoretical model.

### 7.2.2 Sampling From a Population

Consider taking a sample from a population. Let us use again for the
illustration the file “`pop1.csv`

” like we did in
Chapter 4. The data frame produced from the file
contains the sex and hight of the 100,000 members of some imaginary
population. Recall that in Chapter 4 we applied the
function “`sample`

” to randomly sample the height of a single subject
from the population. Let us apply the same function again, but this time
in order to sample the heights of 100 subjects:

```
pop.1 <- read.csv("_data/pop1.csv")
X.samp <- sample(pop.1$height,100)
X.samp
```

```
## [1] 178 182 183 172 177 161 163 185 143 157 171 182 182 178 157 165 153
## [18] 178 164 168 180 166 192 182 172 164 163 182 165 141 157 158 188 168
## [35] 161 158 160 162 163 170 171 183 173 160 178 171 159 170 190 179 159
## [52] 173 160 174 179 172 176 181 171 186 155 165 175 191 169 179 166 184
## [69] 181 166 158 168 165 168 155 185 196 145 153 163 172 163 177 184 165
## [86] 156 140 202 162 157 176 176 171 166 185 171 184 173 174 162
```

In the first line of code we produce a data frame that contains the information on the entire population. In the second line we select a sample of size 100 from the population, and in the third line we present the content of the sample.

The first argument to the function “`sample`

” that selects the sample is
the sequence of length 100,000 with the list of heights of all the
members of the population. The second argument indicates the sample
size, 100 in this case. The outcome of the random selection is stored in
the object “`X.samp`

”, which is a sequence that contains 100 heights.

Typically, a researcher does not get to examine the entire population.
Instead, measurements on a sample from the population are made. In
relation to the imaginary setting we simulate in the example, the
typical situation is that the research does not have the complete list
of potential measurement evaluations, i.e. the complete list of 100,000
heights in “`pop.1$height`

”, but only a sample of measurements, namely
the list of 100 numbers that are stored in “`X.samp`

” and are presented
above. The role of statistics is to make inference on the parameters of
the unobserved population based on the information that is obtained from
the sample.

For example, we may be interested in estimating the mean value of the heights in the population. A reasonable proposal is to use the sample average to serve as an estimate:

`mean(X.samp)`

`## [1] 170.19`

In our artificial example we can actually compute the true population mean:

`mean(pop.1$height)`

`## [1] 170.035`

Hence, we may see that although the match between the estimated value and the actual value is not perfect still they are close enough.

The actual estimate that we have obtained resulted from the specific sample that was collected. Had we collected a different subset of 100 individuals we would have obtained different numerical value for the estimate. Consequently, one may wonder: Was it pure luck that we got such good estimates? How likely is it to get estimates that are close to the target parameter?

Notice that in realistic settings we do not know the actual value of the target population parameters. Nonetheless, we would still want to have at least a probabilistic assessment of the distance between our estimates and the parameters they try to estimate. The sampling distribution is the vehicle that may enable us to address these questions.

In order to illustrate the concept of the sampling distribution let us select another sample and compute its average:

```
X.samp <- sample(pop.1$height,100)
X.bar <- mean(X.samp)
X.bar
```

`## [1] 169.08`

and do it once more:

```
X.samp <- sample(pop.1$height,100)
X.bar <- mean(X.samp)
X.bar
```

`## [1] 170.89`

In each case we got a different value for the sample average. In the first of the last two iterations the result was more than 1 centimeter away from the population average, which is equal to 170.035, and in the second it was within the range of 1 centimeter. Can we say, prior to taking the sample, what is the probability of falling within 1 centimeter of the population mean?

Chapter 4 discussed the random variable that emerges by
randomly sampling a single number from the population presented by the
sequence “`pop.1$height`

”. The distribution of the random variable
resulted from the assignment of the probability 1/100,000 to each one of
the 100,000 possible outcomes. The same principle applies when we
randomly sample 100 individuals. Each possible outcome is a collection
of 100 numbers and each collection is assigned equal probability. The
resulting distribution is called *the sampling distribution*.

The distribution of the average of the sample emerges from this distribution: With each sample one may associate the average of that sample. The probability assigned to that average outcome is the probability of the sample. Hence, one may assess the probability of falling within 1 centimeter of the population mean using the sampling distribution. Each sample produces an average that either falls within the given range or not. The probability of the sample average falling within the given range is the proportion of samples for which this event happens among the entire collection of samples.

However, we face a technical difficulty when we attempt to assess the sampling distribution of the average and the probability of falling within 1 centimeter of the population mean. Examination of the distribution of a sample of a single individual is easy enough. The total number of outcomes, which is 100,000 in the given example, can be handled with no effort by the computer. However, when we consider samples of size 100 we get that the total number of ways to select 100 number out of 100,000 numbers is in the order of \(10^{342}\) (1 followed by 342 zeros) and cannot be handled by any computer. Thus, the probability cannot be computed.

As a compromise we will approximate the distribution by selecting a large number of samples, say 100,000, to represent the entire collection, and use the resulting distribution as an approximation of the sampling distribution. Indeed, the larger the number of samples that we create the more accurate the approximation of the distribution is. Still, taking 100,000 repeats should produce approximations which are good enough for our purposes.

Consider the sampling distribution of the sample average. We simulated
above a few examples of the average. Now we would like to simulate
100,000 such examples. We do this by creating first a sequence of the
length of the number of evaluations we seek (100,000) and then write a
small program that produces each time a new random sample of size 100
and assigns the value of the average of that sample to the appropriate
position in the sequence. Do first and explain later^{17}:

```
X.bar <- rep(0,10^5)
for(i in 1:10^5) {
X.samp <- sample(pop.1$height,100)
X.bar[i] <- mean(X.samp)
}
```

In the first line we produce a sequence of length 100,000 that contains
zeros. The function “`rep`

” creates a sequence that contains repeats of
its first argument a number of times that is specified by its second
argument. In this example, the numerical value 0 is repeated 100,000
times to produce a sequence of zeros of the length we seek.

The main part of the program is a “`for`

” loop. The argument of the
function “`for`

” takes the special form: “*index.name* `in`

*index.values*”, where *index.name* is the name of the running index and
*index.values* is the collection of values over which the running index
is evaluated. In each iteration of the loop the running index is
assigned a value from the collection and the expression that follows the
brackets of the “`for`

” function is evaluated with the given value of
the running index.

In the given example the collection of values is produced by the
expression “`1:n`

”. Recall that the expression “`1:n`

” produces the
collection of integers between `1`

and `n`

. Here, `n`

= 100,000. Hence,
in the given application the collection of values is a sequence that
contains the integers between 1 and 100,000. The running index is called
“`i`

”. the expression is evaluated 100,000 times, each time with a
different integer value for the running index “`i`

”.

The `R`

system treats a collection of expressions enclosed within curly
brackets as one entity. Therefore, in each iteration of the “`for`

”
loop, the lines that are within the curly brackets are evaluated. In the
first line a random sample of size 100 is produced and in the second
line the average of the sample is computed and stored in the \(i\)-th
position of the sequence “`X.bar`

”. Observe that the specific position
in the sequence is referred to by using square brackets.

The program changes the original components of the sequence, from 0 to
the average of a random sample, one by one. When the loop ends all
values are changed and the sequence “`X.bar`

” contains 100,000
evaluations of the sample average. The last line, which is outside the
curly brackets and is evaluated after the “`for`

” loop ends, produces an
histogram of the averages that were simulated. The histogram is
presented in the lower panel of Figure 7.1.

Compare the distribution of the sample average to the distribution of the heights in the population that was presented first in Figure 4.1 and is currently presented in the upper panel of Figure 7.1. Observe that both distributions are centered at about 170 centimeters. Notice, however, that the range of values of the sample average lies essentially between 166 and 174 centimeters, whereas the range of the distribution of heights themselves is between 127 and 217 centimeter. Broadly speaking, the sample average and the original measurement are centered around the same location but the sample average is less spread.

Specifically, let us compare the expectation and standard deviation of the sample average to the expectation and standard deviation of the original measurement:

`mean(pop.1$height)`

`## [1] 170.035`

`sd(pop.1$height)`

`## [1] 11.23205`

`mean(X.bar)`

`## [1] 170.0346`

`sd(X.bar)`

`## [1] 1.11987`

Observe that the expectation of the population and the expectation of the sample average, are practically the same, the standard deviation of the sample average is about 10 times smaller than the standard deviation of the population. This result is not accidental and actually reflects a general phenomena that will be seen below in other examples.

We may use the simulated sampling distribution in order to compute an approximation of the probability of the sample average falling within 1 centimeter of the population mean. Let us first compute the relevant probability and then explain the details of the computation:

`mean(abs(X.bar - mean(pop.1$height)) <= 1)`

`## [1] 0.62649`

Hence we get that the probability of the given event is about 62.6%.

The object “`X.bar`

” is a sequence of length 100,000 that contains the
simulated sample averages. This sequence represents the distribution of
the sample average. The expression
“`abs(X.bar - mean(pop.1$height)) <= 1`

” produces a sequence of logical
“`TRUE`

” or “`FALSE`

” values, depending on the value of the sample
average being less or more than one unit away from the population mean.
The application of the function “`mean`

” to the output of the last
expression results in the computation of the relative frequency of
`TRUE`

s, which corresponds to the probability of the event of interest.

**Example 7.1**A poll for the determination of the support in the population for a candidate was describe in Example 5.1. The proportion in the population of supporters was denoted by \(p\). A sample of size \(n=300\) was considered in order to estimate the size of \(p\). We identified that the distribution of \(X\), the number of supporters in the sample, is \(\mathrm{Binomial}(300,p)\). This distribution is the sampling distribution

^{18}of \(X\). One may use the proportion in the sample of supporters, the number of supporters in the sample divided by 300, as an estimate to the parameter \(p\). The sampling distribution of this quantity, \(X/300\), may be considered in order to assess the discrepancy between the estimate and the actual value of the parameter.

### 7.2.3 Theoretical Models

Sampling distribution can also be considered in the context of theoretical distribution models. For example, take a measurement \(X \sim \mathrm{Binomial}(10,0.5)\) from the Binomial distribution. Assume 64 independent measurements are produced with this distribution: \(X_1, X_2, \ldots, X_{64}\). The sample average in this case corresponds to the distribution of the random variable produced by averaging these 64 random variables:

\[\bar X = \frac{X_1 + X_2 + \cdots + X_{64}} {64} = \frac{1}{64}\sum_{i=1}^{64} X_i\;.\] Again, one may wonder what is the distribution of the sample average \(\bar X\) in this case?

We can approximate the distribution of the sample average by simulation.
The function “`rbinom`

” produces a random sample from the Binomial
distribution. The first argument to the function is the sample size,
which we take in this example to be equal to 64. The second and third
arguments are the parameters of the Binomial distribution, 10 and 0.5 in
this case. We can use this function in the simulation:

```
X.bar <- rep(0,10^5)
for(i in 1:10^5) {
X.samp <- rbinom(64,10,0.5)
X.bar[i] <- mean(X.samp)
}
```

Observe that in this code we created a sequence of length 100,000 with
evaluations of the sample average of 64 Binomial random variables. We
start with a sequence of zeros and in each iteration of the “`for`

” loop
a zero is replaced by the average of a random sample of 64 Binomial
random variables.

Examine the sampling distribution of the Binomial average:

`mean(X.bar)`

`## [1] 5.000031`

`sd(X.bar)`

`## [1] 0.198051`

The histogram of the sample average is presented in the lower panel of Figure 7.2. Compare it to the distribution of a single Binomial random variable that appears in the upper panel. Notice, once more, that the center of the two distributions coincide but the spread of the sample average is smaller. The sample space of a single Binomial random variable is composed of integers. The sample space of the average of 64 Binomial random variables, on the other hand, contains many more values and is closer to the sample space of a random variable with a continuous distribution.

Recall that the expectation of a \(\mathrm{Binomial}(10,0.5)\) random variable is \(\Expec(X) = 10 \cdot 0.5 = 5\) and the variance is \(\Var(X) = 10 \cdot 0.5 \cdot 0.5 = 2.5\) (thus, the standard deviation is \(\sqrt{2.5} = 1.581139\)). Observe that the expectation of the sample average that we got from the simulation is essentially equal to 5 and the standard deviation is 0.1982219.

One may prove mathematically that the expectation of the sample mean is equal to the theoretical expectation of its components:

\[\Expec(\bar X) = \Expec(X)\;.\] The results of the simulation for the expectation of the sample average are consistent with the mathematical statement. The mathematical theory of probability may also be used in order to prove that the variance of the sample average is equal to the variance of each of the components, divided by the sample size:

\[\Var(\bar X) = \Var(X)/n\;,\] here \(n\) is the number of observations in the sample. Specifically, in the Binomial example we get that \(\Var(\bar X) = 2.5/64\), since the variance of a Binomial component is 2.5 and there are 64 observations. Consequently, the standard deviation is \(\sqrt{2.5/64} = 0.1976424\), in agreement, more or less, with the results of the simulation (that produced 0.1982219 as the standard deviation).

Consider the problem of identifying the central interval that contains
95% of the distribution. In the Normal distribution we were able to use
the function “`qnorm`

” in order to compute the percentiles of the
theoretical distribution. A function that can be used for the same
purpose for simulated distribution is the function “`quantile`

”. The
first argument to this function is the sequence of simulated values of
the statistic, “`X.bar`

” in the current case. The second argument is a
number between 0 and 1, or a sequence of such numbers:

`quantile(X.bar,c(0.025,0.975))`

```
## 2.5% 97.5%
## 4.609375 5.390625
```

We used the sequence “`c(0.025,0.975)`

” as the input to the second
argument. As a result we obtained the output 4.609375, which is the
2.5%-percentile of the sampling distribution of the average, and
5.390625, which is the 97.5%-percentile of the sampling distribution of
the average.

Of interest is to compare these percentiles to the parallel percentiles of the Normal distribution with the same expectation and the same standard deviation as the average of the Binomials:

`qnorm(c(0.025,0.975),mean(X.bar),sd(X.bar))`

`## [1] 4.611859 5.388204`

Observe the similarity between the percentiles of the distribution of
the average and the percentiles of the Normal distribution. This
similarity is a reflection of the Normal approximation of the sampling
distribution of the average, which is formulated in the next section
under the title: *The Central Limit Theorem*.

**Example 7.2**The distribution of the number of events of radio active decay in a second was modeled in Example 5.3 according to the Poisson distribution. A quantity of interest is \(\lambda\), the expectation of that Poisson distribution. This quantity may be estimated by measuring the total number of decays over a period of time and dividing the outcome by the number of seconds in that period of time. Let \(n\) be this number of second. The procedure just described corresponds to taking the sample average of \(\mathrm{Poisson}(\lambda)\) observations for a sample of size \(n\). The expectation of the sample average is \(\lambda\) and the variance is \(\lambda/n\), leading to a standard deviation of size \(\sqrt{\lambda/n}\). The Central Limit Theorem states that the sampling distribution of this average corresponds, approximately, to the Normal distribution with this expectation and standard deviation.

## 7.3 Law of Large Numbers and Central Limit Theorem

The Law of Large Numbers and the Central Limit Theorem are mathematical theorems that describe the sampling distribution of the average for large samples.

### 7.3.1 The Law of Large Numbers

The Law of Large Numbers states that, as the sample size becomes larger, the sampling distribution of the sample average becomes more and more concentrated about the expectation.

Let us demonstrate the Law of Large Numbers in the context of the Uniform distribution. Let the distribution of the measurement \(X\) be \(\mathrm{Uniform}(3,7)\). Consider three different sample sizes \(n\): \(n=10\), \(n=100\), and \(n=1000\). Let us carry out a simulation similar to the simulations of the previous section. However, this time we run the simulation for the three sample sizes in parallel:

```
unif.10 <- rep(0,10^5)
unif.100 <- rep(0,10^5)
unif.1000 <- rep(0,10^5)
for(i in 1:10^5) {
X.samp.10 <- runif(10,3,7)
unif.10[i] <- mean(X.samp.10)
X.samp.100 <- runif(100,3,7)
unif.100[i] <- mean(X.samp.100)
X.samp.1000 <- runif(1000,3,7)
unif.1000[i] <- mean(X.samp.1000)
}
```

Observe that we have produced 3 sequences of length 100,000 each:
“`unif.10`

”, “`unif.100`

”, and “`unif.1000`

”. The first sequence is an
approximation of the sampling distribution of an average of 10
independent Uniform measurements, the second approximates the sampling
distribution of an average of 100 measurements and the third the
distribution of an average of 1000 measurements. The distribution of
single measurement in each of the examples is \(\mathrm{Uniform}(3,7)\).

Consider the expectation of sample average for the three sample sizes:

`mean(unif.10)`

`## [1] 5.000141`

`mean(unif.100)`

`## [1] 5.000465`

`mean(unif.1000)`

`## [1] 4.999981`

For all sample size the expectation of the sample average is equal to 5, which is the expectation of the \(\mathrm{Uniform}(3,7)\) distribution.

Recall that the variance of the \(\mathrm{Uniform}(a,b)\) distribution is \((b-a)^2/12\). Hence, the variance of the given Uniform distribution is \(\Var(X) = (7-3)^2/12 = 16/12 \approx 1.3333\). The variances of the sample averages are:

`var(unif.10)`

`## [1] 0.1329634`

`var(unif.100)`

`## [1] 0.01341528`

`var(unif.1000)`

`## [1] 0.001338488`

Notice that the variances decrease with the increase of the sample sizes. The decrease is according to the formula \(\Var(\bar X) = \Var(X)/n\).

The variance is a measure of the spread of the distribution about the expectation. The smaller the variance the more concentrated is the distribution around the expectation. Consequently, in agreement with the Law of Large Numbers, the larger the sample size the more concentrated is the sampling distribution of the sample average about the expectation.

### 7.3.2 The Central Limit Theorem (CLT)

The Law of Large Numbers states that the distribution of the sample average tends to be more concentrated as the sample size increases. The Central Limit Theorem (CLT in short) provides an approximation of this distribution.

The deviation between the sample average and the expectation of the measurement tend to decreases with the increase in sample size. In order to obtain a refined assessment of this deviation one needs to magnify it. The appropriate way to obtain the magnification is to consider the standardized sample average, in which the deviation of the sample average from its expectation is divided by the standard deviation of the sample average:

\[Z = \frac{\bar X - \Expec(\bar X)}{\sqrt{\Var(\bar X)}}\;.\]

Recall that the expectation of the sample average is equal to the expectation of a single random variable (\(\Expec(\bar X) = \Expec(X)\)) and that the variance of the sample average is equal to the variance of a single observation, divided by the sample size (\(\Var(\bar X) = \Var(X)/n\)). Consequently, one may rewrite the standardized sample average in the form:

\[Z = \frac{\bar X - \Expec(X)}{\sqrt{\Var(X)/n}}= \frac{\sqrt{n}(\bar X - \Expec(X))}{\sqrt{\Var(X)}}\;.\]
The second equality follows from placing in the numerator the square
root of \(n\) which *divides* the term in the denominator. Observe that
with the increase of the sample size the decreasing difference between
the average and the expectation is magnified by the square root of \(n\).

The Central Limit Theorem states that, with the increase in sample size, the sample average converges (after standardization) to the standard Normal distribution.

Let us examine the Central Normal Theorem in the context of the example
of the Uniform measurement. In Figure 7.3 you may find
the (approximated) density of the standardized average for the three
sample sizes based on the simulation that we carried out previously (as
*red*, *green*, and *blue* lines). Along side with these densities you
may also find the theoretical density of the standard Normal
distribution (as a *black* line). Observe that the four curves are
almost one on top of the other, proposing that the approximation of the
distribution of the average by the Normal distribution is good even for
a sample size as small as \(n=10\).

However, before jumping to the conclusion that the Central Limit Theorem applies to any sample size, let us consider another example. In this example we repeat the same simulation that we did with the Uniform distribution, but this time we take \(\mathrm{Exponential}(0.5)\) measurements instead:

```
exp.10 <- rep(0,10^5)
exp.100 <- rep(0,10^5)
exp.1000 <- rep(0,10^5)
for(i in 1:10^5) {
X.samp.10 <- rexp(10,0.5)
exp.10[i] <- mean(X.samp.10)
X.samp.100 <- rexp(100,0.5)
exp.100[i] <- mean(X.samp.100)
X.samp.1000 <- rexp(1000,0.5)
exp.1000[i] <- mean(X.samp.1000)
}
```

The expectation of an \(\mathrm{Exponential}(0.5)\) random variable is \(\Expec(X) = 1/\lambda = 1/0.5 = 2\) and the variance is \(\Var(X) = 1/\lambda^2 = 1/(0.5)^2 = 4\). Observe below that the expectations of the sample averages are equal to the expectation of the measurement and the variances of the sample averages follow the relation \(\Var(\bar X) = \Var (X)/n\):

`mean(exp.10)`

`## [1] 1.999011`

`mean(exp.100)`

`## [1] 2.000074`

`mean(exp.1000)`

`## [1] 2.000183`

So the expectations of the sample average are all equal to 2. For the variance we get:

`var(exp.10)`

`## [1] 0.3980615`

`var(exp.100)`

`## [1] 0.04012221`

`var(exp.1000)`

`## [1] 0.004005182`

Which is in agreement with the decrease proposed by the theory,

However, when one examines the densities of the sample averages in
Figure @reF(fig:SampDist9) one may see a clear distinction between the
sampling distribution of the average for a sample of size 10 and the
normal distribution (compare the *red* curve to the *black* curve. The
match between the *green* curve that corresponds to a sample of size
\(n=100\) and the *black* line is better, but not perfect. When the sample
size is as large as \(n=1000\) (the *blue* curve) then the agreement with
the normal curve is very good.

### 7.3.3 Applying the Central Limit Theorem

The conclusion of the Central Limit Theorem is that the sampling distribution of the sample average can be approximated by the Normal distribution, regardless what is the distribution of the original measurement, but provided that the sample size is large enough. This statement is very important, since it allows us, in the context of the sample average, to carry out probabilistic computations using the Normal distribution even if we do not know the actual distribution of the measurement. All we need to know for the computation are the expectation of the measurement, its variance (or standard deviation) and the sample size.

The theorem can be applied whenever probability computations associated with the sampling distribution of the average are required. The computation of the approximation is carried out by using the Normal distribution with the same expectation and the same standard deviation as the sample average.

An example of such computation was conducted in Subsection 7.2.3 where the central interval that contains 95% of the sampling distribution of a Binomial average was required. The 2.5%- and the 97.5%-percentiles of the Normal distribution with the same expectation and variance as the sample average produced boundaries for the interval. These boundaries were in good agreement with the boundaries produced by the simulation. More examples will be provided in the Solved Exercises of this chapter and the next one.

With all its usefulness, one should treat the Central Limit Theorem with a grain of salt. The approximation may be valid for large samples, but may be bad for samples that are not large enough. When the sample is small a careless application of the Central Limit Theorem may produce misleading conclusions.

## 7.4 Exercises

**Exercise 7.1 **The file “`pop2.csv`

” contains information associated
to the blood pressure of an imaginary population of size 100,000. The
file can be found on the internet
(http://pluto.huji.ac.il/~msby/StatThink/Datasets/pop2.csv). The
variables in this file are:

- id:
A numerical variable. A 7 digits number that serves as a unique identifier of the subject.

- sex:
A factor variable. The sex of each subject. The values are either “

`MALE`

” or “`FEMALE`

”.- age:
A numerical variable. The age of each subject.

- bmi:
A numerical variable. The body mass index of each subject.

- systolic:
A numerical variable. The systolic blood pressure of each subject.

- diastolic:
A numerical variable. The diastolic blood pressure of each subject.

- group:
A factor variable. The blood pressure category of each subject. The values are “

`NORMAL`

” both the systolic blood pressure is within its normal range (between 90 and 139) and the diastolic blood pressure is within its normal range (between 60 and 89). The value is “`HIGH`

” if either measurements of blood pressure are above their normal upper limits and it is “`LOW`

” if either measurements are below their normal lower limits.

Our goal in this question is to investigate the sampling distribution of
the sample average of the variable “`bmi`

”. We assume a sample of size
\(n=150\).

Compute the population average of the variable “

`bmi`

”.Compute the population standard deviation of the variable “

`bmi`

”.Compute the expectation of the sampling distribution for the sample average of the variable.

Compute the standard deviation of the sampling distribution for the sample average of the variable.

Identify, using simulations, the central region that contains 80% of the sampling distribution of the sample average.

- Identify, using the Central Limit Theorem, an approximation of the central region that contains 80% of the sampling distribution of the sample average.

**Exercise 7.2 **A subatomic particle hits a linear detector at random
locations. The length of the detector is 10 nm and the hits are
uniformly distributed. The location of 25 random hits, measured from a
specified endpoint of the interval, are marked and the average of the
location computed.

What is the expectation of the average location?

What is the standard deviation of the average location?

Use the Central Limit Theorem in order to approximate the probability the average location is in the left-most third of the linear detector.

- The central region that contains 99% of the distribution of the average is of the form \(5 \pm c\). Use the Central Limit Theorem in order to approximate the value of c.

## 7.5 Summary

### Glossary

- Random Sample:
The probabilistic model for the values of a measurements in the sample, before the measurement is taken.

- Sampling Distribution:
The distribution of a random sample.

- Sampling Distribution of a Statistic:
A statistic is a function of the data; i.e. a formula applied to the data. The statistic becomes a random variable when the formula is applied to a random sample. The distribution of this random variable, which is inherited from the distribution of the sample, is its sampling distribution.

- Sampling Distribution of the Sample Average:
The distribution of the sample average, considered as a random variable.

- The Law of Large Numbers:
A mathematical result regarding the sampling distribution of the sample average. States that the distribution of the average of measurements is highly concentrated in the vicinity of the expectation of a measurement when the sample size is large.

- The Central Limit Theorem:
A mathematical result regarding the sampling distribution of the sample average. States that the distribution of the average is approximately Normal when the sample size is large.

### Discussion in the Forum

Limit theorems in mathematics deal with the convergence of some property to a limit as some indexing parameter goes to infinity. The Law of Large Numbers and the Central Limit Theorem are examples of limit theorems. The property they consider is the sampling distribution of the sample average. The indexing parameter that goes to infinity is the sample size \(n\).

Some people say that the Law of Large Numbers and the Central Limit Theorem are useless for practical purposes. These theorems deal with a sample size that goes to infinity. However, all sample sizes one finds in reality are necessarily finite. What is your opinion?

When forming your answer to this question you may give an example of a situation from your own field of interest in which conclusions of an abstract mathematical theory are used in order to solve a practical problem. Identify the merits and weaknesses of the application of the mathematical theory.

For example, in making statistical inference one frequently needs to make statements regarding the sampling distribution of the sample average. For instant, one may want to identify the central region that contains 95% of the distribution. The Normal distribution is used in the computation. The justification is the Central Limit Theorem.

### Summary of Formulas

- Expectation of the sample average:
\(\Expec(\bar X) = \Expec(X)\)

- Variance of the sample average:
\(\Var(\bar X) = \Var(X)/n\)

Running this simulation, and similar simulations of the same nature that will be considered in the sequel, demands more of the computer’s resources than the examples that were considered up until now. Beware that running times may be long and, depending on the strength of your computer and your patience, too long. You may save time by running less iterations, replacing, say, “

`10^5`

” by “`10^4`

”. The results of the simulation will be less accurate, but will still be meaningful.↩Mathematically speaking, the Binomial distribution is only an approximation to the sampling distribution of \(X\). Actually, the Binomial is an exact description to the distribution only in the case where each subject has the chance be represented in the sample more than once. However, only when the size of the sample is comparable to the size of the population would the Binomial distribution fail to be an adequate approximation to the sampling distribution.↩