
## Estimation of Proportions Using Interval

If several samples of various sizes are taken, we can **compute the proportion of successes in each sample**. The samples give several different answers, one for each sample proportion. For this reason, sampling distributions are studied. If we ask which of these answers is exactly correct, the answer is probably none of them.

Interval estimation of a proportion is used to **describe an entire population using a range of reasonable values for the population proportion**. The interval is constructed so that it captures the true value of the population parameter with a stated level of confidence. This interval is known as the confidence interval. Instead of a single value, it provides a range of values with which to estimate the population proportion.

Confidence intervals give more information than point estimates. A point estimate is limited in its usefulness because it does not reveal the uncertainty associated with the estimate, i.e. it is not clear how far the sample mean may be from the population mean.

**Example**: Suppose we want to measure the proportion of undergraduates who permanently reside in southern Mexico. We take a random sample of 25 undergraduates and compute the sample proportion. Suppose 9 out of the 25 students chosen reside in southern Mexico. The sample proportion comes out as follows:

ƥ = 9/25

ƥ = 0.36

Due to the “luck of the draw” factor, we cannot expect the true population proportion to be exactly 0.36. To find an interval that captures the population proportion with 95% confidence, we first need the standard deviation of the sample proportion, which is calculated using the following formula:

√(ƥ(1-ƥ)/n) = √(0.36(1-0.36)/25) = 0.096
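The arithmetic for the southern Mexico example can be checked with a short Python sketch. The function name `proportion_standard_error` is illustrative only and does not come from any library.

```python
import math

def proportion_standard_error(successes: int, n: int) -> tuple[float, float]:
    """Return the sample proportion and its estimated standard deviation."""
    p_hat = successes / n                      # sample proportion
    se = math.sqrt(p_hat * (1 - p_hat) / n)    # sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se

# 9 of the 25 sampled students reside in southern Mexico
p_hat, se = proportion_standard_error(9, 25)
print(f"sample proportion = {p_hat:.2f}, standard deviation = {se:.3f}")
```

With only 25 students the standard deviation is close to 0.1, so any interval built around 0.36 will be fairly wide.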

## Creation of a Confidence Interval

There are **four basic steps involved in constructing a confidence interval** given as follows:

**The first step** involves the **identification of a sample statistic**. The sample statistic, e.g. the sample mean or the sample proportion, will be used to estimate the population parameter.

**The second step** involves the **selection of a confidence level**. The confidence level describes the uncertainty of the sampling method. Normally, a 90%, 95% or 99% confidence level is used.

**The third step** is to **calculate the margin of error**. The margin of error is calculated using the following equation:

**Margin of error = Critical value * Standard deviation of statistic**

**The fourth** and last step is to **specify the confidence interval**. The range of the confidence interval is defined by the following equation, and its uncertainty is expressed by the confidence level:

**Confidence interval = Sample statistic ± Margin of error**

A confidence interval for a proportion is calculated using the sampling distribution of the sample proportion ƥ.

**Example**: Suppose we have selected 100 people and asked them whether they believe in the process of evolution. We do not know the population proportion of people who believe in evolution. If 47 out of the 100 people say that they believe in evolution, then our sample proportion is ƥ = 0.47. In order to find the confidence interval, we require the sampling distribution of ƥ.

Here, the problem is that the sampling distribution depends on the population proportion, which is unknown, so the sample proportion ƥ is used in its place. First of all, we have to find the standard deviation of ƥ, using the same formula:

= √(ƥ(1-ƥ)/n)

= √(0.47(1-0.47)/100)

= 0.0499
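The four steps can be sketched in Python for the evolution survey. This is a minimal sketch under two assumptions that the example itself does not state: a 95% confidence level, and a critical value taken from the standard normal distribution. The function name `proportion_confidence_interval` is hypothetical.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n                                    # step 1: sample statistic
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # step 2: confidence level -> critical value
    se = math.sqrt(p_hat * (1 - p_hat) / n)                  # standard deviation of the statistic
    margin = z_star * se                                     # step 3: margin of error
    return p_hat - margin, p_hat + margin                    # step 4: statistic ± margin

# 47 of 100 respondents say they believe in evolution (95% level assumed)
low, high = proportion_confidence_interval(47, 100)
print(f"95% confidence interval: {low:.3f} to {high:.3f}")
```

Under these assumptions the interval runs from roughly 0.37 to 0.57, i.e. 0.47 plus or minus about two standard deviations of 0.0499.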

## Constructing the Interval

In order to construct an interval, we need to **follow the four steps** specified above.

**Example**: Suppose we need to estimate the average weight of an adult male in Baltimore, Maryland. We draw a random sample of 1,000 men from a total population of 1,000,000 men. Suppose the average weight of the men in the sample comes out at 180 pounds, and the standard deviation of the sample (s) comes out at 30 pounds.

We are required to find the confidence interval. We will calculate it using the four steps mentioned above.

- We have identified the sample statistic as the sample mean of 180 pounds.
- The confidence level selected for this case is 95%.
- The margin of error is calculated using the standard error of the mean as follows:

Standard error (SE) = s / sqrt (n) = 30 / sqrt (1000)

= 0.95

The critical value is a factor which is used to compute the margin of error here. The t score (t*) is calculated as follows:

Alpha (α): α = 1 – (confidence level / 100) = 0.05

Critical probability (p*): p* = 1 – α/2 = 1 – 0.05/2 = 0.975

Degrees of freedom (df): df = n – 1 = 1000 – 1 = 999

The critical value is the t statistic with 999 degrees of freedom and a cumulative probability of 0.975. Using a t distribution calculator, the critical value comes out as 1.96.

The margin of error is computed using the above given equation.

**Margin of error = Critical value * Standard deviation of statistic**

= 1.96 * 0.95

= 1.86

The confidence interval is specified by **sample statistic ± margin of error**, and the uncertainty is denoted by the confidence level. For the chosen confidence level of 95%, the confidence interval is 180 ± 1.86 pounds, i.e. from 178.14 to 181.86 pounds.
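The Baltimore calculation can be reproduced with the Python standard library. One assumption is made here: the standard library has no t distribution, so the sketch uses the standard normal critical value in place of the t score. With 999 degrees of freedom the two agree to two decimal places (both round to 1.96). The function name `mean_confidence_interval` is illustrative.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(mean, sd, n, confidence=0.95):
    """Confidence interval for a mean, using a normal critical value.

    Assumption: n is large, so the normal critical value is used
    in place of the t score with n - 1 degrees of freedom.
    """
    se = sd / math.sqrt(n)                                 # standard error of the mean
    crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 at 95%
    margin = crit * se                                     # margin of error
    return mean - margin, mean + margin

low, high = mean_confidence_interval(mean=180, sd=30, n=1000)
print(f"95% confidence interval: {low:.2f} to {high:.2f} pounds")
```

For small samples the t score is noticeably larger than the normal value, so this shortcut should not be used there.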

## Meaning of Confidence

The confidence level is the **long-run proportion of intervals, built by the same method from repeated samples, that actually contain the target quantity**. A 95% confidence interval does not mean that there is a 95% chance that the population proportion lies in one particular computed interval. The population proportion is a fixed quantity, so it is either inside a given interval or outside of it. Instead, a confidence level of 95% means that about 95% of all possible samples produce an interval that captures the true proportion.

It can also be stated like this: “We are 95% confident that the true proportion lies in the interval computed from our sample”. Here, the uncertainty is about whether the selected sample belongs to the 95% of samples whose intervals capture the true proportion or to the 5% whose intervals miss it.
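This reading of confidence can be illustrated with a small simulation. The sketch below assumes a hypothetical true proportion of 0.47 and samples of 100, draws many samples, builds a 95% interval from each, and counts how often the interval captures the true proportion. All names and parameter values are chosen for illustration.

```python
import math
import random
from statistics import NormalDist

def coverage_rate(true_p=0.47, n=100, trials=2000, confidence=0.95, seed=0):
    """Fraction of simulated intervals that contain the true proportion."""
    rng = random.Random(seed)                  # fixed seed for repeatability
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(trials):
        # draw one sample of size n and count the successes
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n
        margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= true_p <= p_hat + margin:
            hits += 1                          # this interval captured true_p
    return hits / trials

rate = coverage_rate()
print(f"share of intervals containing the true proportion: {rate:.3f}")
```

The share should land near 0.95, though not exactly on it, because the interval relies on a normal approximation and the simulation itself is random.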

## Margin of Error

The **margin of error** is the range of values below and above the sample statistic in a confidence interval. It represents the **amount of sampling error in a survey result**. The margin of error and the width of the confidence interval are directly related: a larger margin of error gives a wider interval. For a given sample, a wider interval is more likely to contain the true figure, but it pins that figure down less precisely.

The **margin of error is half of the width of the interval**. It is the distance the interval extends on each side of the sample proportion. For a given sample, a higher level of confidence requires a larger margin of error. **Confidence and precision are inversely related**: a higher level of confidence means less precision.

**Example**: A common example of the margin of error is a survey in which people are divided into two groups: one group prefers product A, whereas the other prefers product B. A single, global margin of error is then reported for the percentages of the whole population that like product A and product B.

When a statistic is a percentage, the maximum margin of error is the radius of the confidence interval for a reported percentage of 50%, because ƥ(1-ƥ) is largest at ƥ = 0.5. **Here, the margin of error is expressed as an absolute quantity, equal to the radius of the confidence interval for the statistic**. For example, if the reported value is 50% and the confidence interval radius is 5 percentage points, the margin of error is 5 percentage points.
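A short Python sketch shows why a reported percentage of 50% gives the maximum margin of error. The survey size of 400 and the function name `margin_of_error` are hypothetical, and the critical value 1.96 assumes a 95% confidence level.

```python
import math

def margin_of_error(p, n, z_star=1.96):
    """Margin of error for a proportion p in a sample of size n."""
    return z_star * math.sqrt(p * (1 - p) / n)

n = 400  # hypothetical survey size
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"reported share {p:.0%}: margin of error = {margin_of_error(p, n):.3f}")
```

The margin peaks at a reported share of 50% and shrinks symmetrically on either side, which is why surveys often quote the 50% figure as a single worst-case margin.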

## Critical Values

The **critical value** is a factor used to compute the margin of error. It is a point on the test distribution that is compared to the test statistic to determine whether to reject the null hypothesis.

It is the **value that separates the acceptance region from the rejection region, i.e. the values that are included in or excluded from an interval**. Under the standard normal model, the critical value is the Z value that bounds the central region. If the absolute value of your test statistic is greater than the critical value, you can declare statistical significance and reject the null hypothesis.

The critical value of the Z-score is used when the sampling distribution is normal or close to normal. To change the confidence level, we change the number of standard errors by which the interval extends away from the sample proportion. This number of standard errors is referred to as the critical value, and it is found from the Z-table as z*. There are **three main types of critical values: t scores, chi-square values and z scores**.

The Z-table gives z* = 1.96 for a 95% confidence interval and z* = 1.645 for a 90% confidence interval.

### Using the Normal distribution

The critical value can be **calculated using the normal distribution**. At a 95% confidence level, the critical values are the two cut-offs that enclose the central 95% of the distribution. Of the remaining 5%, 2.5% lies in each tail. Since P(Z ≥ z*) = 0.025, we have P(0 ≤ Z ≤ z*) = 0.4750, so we look up the probability 0.4750 in the Z-table. The value corresponding to 0.4750 is z* = 1.96. Hence, the cut-offs for a 95% confidence interval are -1.96 and 1.96.

[Figure: graph of the 90% confidence interval]

For the 90% confidence interval, the critical value is z* = 1.645, so the cut-offs are -1.645 and 1.645 on the two sides.
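Instead of reading a printed Z-table, the same critical values can be obtained from the inverse cumulative distribution function of the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is illustrative.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z* for a given confidence level."""
    alpha = 1 - confidence                      # total probability in both tails
    return NormalDist().inv_cdf(1 - alpha / 2)  # cut-off leaving alpha/2 in the upper tail

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {z_critical(level):.3f}")
```

The 90% and 95% results match the Z-table values of 1.645 and 1.96 quoted in the text.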

## The trade-off between Confidence and Precision

There is a close relationship between confidence and precision, but these terms are not direct complements of each other. The safest way to gain confidence that the interval includes the target value, without widening the interval, is to **choose a larger sample size**, which improves precision as well. High levels of confidence can be achieved with wider intervals, while narrower, more precise intervals permit less confidence; thus, for a fixed sample size, there is a trade-off between precision and confidence.

Most importantly, any statement of precision without a corresponding confidence level is termed incomplete and impossible to interpret.
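The trade-off can be made concrete by tabulating the margin of error at several confidence levels and sample sizes. The sketch below assumes a sample proportion of 0.5 and the normal approximation; the sample sizes and the function name `margin` are chosen only for illustration.

```python
import math
from statistics import NormalDist

def margin(confidence, n, p_hat=0.5):
    """Margin of error for a proportion at a given confidence level and sample size."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)

for n in (100, 400, 1600):
    row = ", ".join(f"{c:.0%}: ±{margin(c, n):.3f}" for c in (0.90, 0.95, 0.99))
    print(f"n = {n:>4} -> {row}")
```

Reading across a row, higher confidence costs a wider interval. Reading down a column, quadrupling the sample size halves the margin of error at the same confidence level.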

The margin of error is built from the same standard deviation formula: ME = z* × √(ƥ(1-ƥ)/n). To reach a desired margin of error, the sample size must therefore be large enough. Solving for n with the most conservative sample proportion, ƥ = 0.50, so that √(ƥ(1-ƥ)) = 0.5, gives the required sample size:

n = ((z* × 0.5)/ME)^2

Suppose a margin of error of 0.03 is required for a 95% confidence interval for the proportion of the whole population who believe in evolution. Assuming that no sample has been taken yet, the calculation is as follows:

z* = 1.96

ME = 0.03

n = ((1.96 × 0.5)/0.03)^2

n = 1,067.11

A fraction of a person cannot be sampled, so the figure is rounded up to the next whole number, i.e. 1,068 people from the whole population. Rounding up rather than down ensures that the margin of error does not exceed the target.
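The sample size calculation can be sketched in Python as follows. The function name `required_sample_size` is illustrative, and the default ƥ = 0.5 is the conservative choice used in the text.

```python
import math

def required_sample_size(margin_of_error, z_star=1.96, p=0.5):
    """Return the exact and the rounded-up sample size for a target margin of error."""
    n_exact = (z_star * math.sqrt(p * (1 - p)) / margin_of_error) ** 2
    return n_exact, math.ceil(n_exact)   # always round up to a whole person

n_exact, n_needed = required_sample_size(0.03)
print(f"exact n = {n_exact:.2f}, people to sample = {n_needed}")
```

`math.ceil` is used instead of ordinary rounding because a smaller sample would give a margin of error slightly above the target.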

**Absolute precision** is often applied when estimating quantities such as population proportions, which are expressed as percentages.

The standard error has the same physical units as the estimator; hence, absolute precision has the same physical units as the estimation target.

**Relative precision** is always unit-free and expressed as a percentage.

## Caution

The following points should be considered carefully when working with confidence intervals:

- The population proportion is a fixed quantity; it does not vary.
- Different samples give different intervals. The sample results do not match each other, so each sample produces a different interval.
- It is not possible to be certain about a parameter; one can only be confident to a specific extent.
- The complete process revolves around estimating the population proportion, not the sample proportion.
- Do not claim to know more about the results than the interval actually tells.
- All values within an interval are treated equally; the values near the center of the interval are not more plausible than the values near the edges.
- Researchers should beware of a margin of error that is too large to be useful. An interval running from 10% to 90% is not very useful.
- The possibility of biased sampling should be taken into account.
- Check that the trials are independent.