Because estimates are based on samples, statistics cannot be guaranteed to be true. How, then, can it be determined whether a certain claim about the mean of a population is valid? This is where the two main methods of inferential statistics come into play: confidence intervals and hypothesis testing.

Catch-Up and Review

Here is some recommended reading before getting started with this lesson: background to help understand probability.

Challenge

Has the Average Age of People Eating at a Restaurant Changed?

Mark's father runs a burger restaurant. The mean age of people who visit the restaurant is years old. Mark suspects that this situation has changed during the last year. To investigate whether his suspicions were true, he surveyed customers and found a sample mean of years with a standard deviation of years.

People in the burger restaurant

If he wants to test his results with a significance, help him complete the following questions.

a Select the tests of significance needed to make a hypothesis test.
b Consider the following graphs.
Four different critical regions
Which graph represents the critical region for this hypothesis test?
c What is the $z$-value of the sample mean? Round to two decimal places.
d Which statement is more likely true about the mean age of the population of people who eat at the burger restaurant?
Discussion

Using Samples to Make Conclusions About a Population

Inferential statistics uses data from a sample to draw conclusions or test hypotheses about a population. Conclusions made from a sample are almost never exact, but they can be thought of as the best guess or most probable answer. One of the main tasks of inferential statistics is to provide a confidence interval.

Concept

Maximum Error of Estimate

The maximum error of estimate, also known as the margin of error, is the maximum difference between the estimate of the population mean and its actual value. The maximum error of estimate is calculated using the following formula.

$E = z \cdot \dfrac{s}{\sqrt{n}}$

In this formula, $z$ represents the $z$-value of a certain confidence level, $s$ is the standard deviation of the sample, and $n$ is the sample size. From the formula, some conclusions can be made about the error of estimate.

  • Increasing the sample size while the standard deviation remains the same will result in a smaller margin of error.
  • Conversely, an increase in the standard deviation while the sample size remains the same will cause a bigger margin of error.
  • The greater the absolute value of $z$, which corresponds to a higher confidence level, the greater the margin of error.

The maximum error of estimate is added to and subtracted from the sample mean to find the bounds of a confidence interval.
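The three conclusions above can be checked numerically. The following is a minimal sketch, assuming the formula $E = z \cdot s / \sqrt{n}$; the function name `margin_of_error` and all numbers are hypothetical and chosen only for illustration.

```python
import math


def margin_of_error(z, s, n):
    """Maximum error of estimate E = z * s / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return z * s / math.sqrt(n)


# Hypothetical baseline values, not taken from the lesson.
base = margin_of_error(z=1.96, s=4.0, n=64)

# Larger sample, same standard deviation: smaller margin of error.
larger_sample = margin_of_error(z=1.96, s=4.0, n=256)

# Larger standard deviation, same sample size: bigger margin of error.
larger_spread = margin_of_error(z=1.96, s=8.0, n=64)

# Larger z-value (higher confidence level): bigger margin of error.
higher_confidence = margin_of_error(z=2.576, s=4.0, n=64)

print(base, larger_sample, larger_spread, higher_confidence)
```

Only one input changes in each comparison, so each result isolates the effect described in one bullet.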

Graphical Representation of the Maximum Error of Estimate in Confidence Intervals
Concept

Confidence Level

A statistic is rarely equal to the population parameter. Due to this uncertainty, estimations are commonly presented as a confidence interval. This is a range of values that the actual parameter is expected to fall within with some degree of certainty. A confidence interval is found by adding and subtracting the maximum error of estimate to and from a statistic such as the sample mean $\bar{x}.$
Confidence interval
The degree of certainty, or the confidence level, is usually presented as a percent value. It refers to the reliability of the analysis to produce accurate intervals. For example, if confidence intervals are produced using different samples of the same size with confidence, then out of intervals are expected to contain the actual mean.

Confidence Level and the Standard Normal Distribution

The confidence level matches the percentage of the area under the standard normal curve around the mean that is bounded by the $-z$ and $z$ values, as shown below.
Confidence level and confidence interval in a Standard Normal Distribution.
For a confidence interval, there is a probability of observing a value outside this area. Because the distribution is symmetric, half of this area will be on each tail of the distribution.
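The link between a confidence level and its $z$-value can be sketched in code. This is an illustrative example that uses `statistics.NormalDist` from the Python standard library; the helper name `z_for_confidence` and the confidence levels shown are assumptions made for the example.

```python
from statistics import NormalDist


def z_for_confidence(confidence):
    """Positive z-value bounding the central area `confidence` (e.g. 0.95)."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    # The area outside the interval is split evenly between the two tails.
    tail_area = (1 - confidence) / 2
    # inv_cdf returns the z-value with the given area to its left.
    return NormalDist().inv_cdf(1 - tail_area)


for level in (0.90, 0.95, 0.99):
    z = z_for_confidence(level)
    print(f"{level:.0%} confidence: -z = {-z:.3f}, z = {z:.3f}")
```

Because the distribution is symmetric, only the positive value is computed and the lower bound is its opposite.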

Confidence Interval for the Population Mean

A confidence interval for the population mean can be found by adding and subtracting the maximum error of estimate $E$ to and from the sample mean $\bar{x},$ which gives the interval $\bar{x} \pm E.$

It is worth noting that increasing the level of confidence results in a wider interval that is more likely to catch the true mean, but it will be less precise because it will cover a greater range of values. This means there is a trade-off between confidence and precision.
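The trade-off between confidence and precision can be seen by computing intervals at several confidence levels for the same sample. The sketch below assumes the interval $\bar{x} \pm z \cdot s / \sqrt{n}$; the function name `confidence_interval` and the sample values are hypothetical.

```python
import math
from statistics import NormalDist


def confidence_interval(x_bar, s, n, confidence):
    """Return (lower, upper) bounds of x_bar ± z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    error = z * s / math.sqrt(n)
    return x_bar - error, x_bar + error


# Hypothetical sample: mean 50, standard deviation 6, size 36.
widths = []
for level in (0.90, 0.95, 0.99):
    lower, upper = confidence_interval(50.0, 6.0, 36, level)
    widths.append(upper - lower)
    print(f"{level:.0%}: ({lower:.2f}, {upper:.2f}), width {upper - lower:.2f}")
```

Each step up in confidence level produces a wider, and therefore less precise, interval.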
Example

Burger Prep Time

Mark's father owns a burger restaurant. He wants to implement changes to improve the customer experience. Recently he found that in a sample of burgers, on average, a burger takes minutes to be cooked and given to the customer, with a standard deviation of minutes.

A delicious burger combo.
Use this information to calculate the maximum error of estimate with a confidence level. Round the answer to two decimal places.

Hint

Is the sample size greater than

Solution

Consider the formula for the maximum error of estimate.
$E = z \cdot \dfrac{s}{\sqrt{n}}$
In this formula, $z$ corresponds to the $z$-value of a particular confidence level, $s$ is the standard deviation of the sample, and $n$ is the sample size. To find the maximum error of estimate for this situation, the $z$-value will be determined first. Then the formula will be evaluated.

Finding the z-Value

Since the confidence level is this portion of the area around the mean will be covered in a standard normal distribution. The area in the distribution's tails that are not in the confidence interval will be each.

Because the distribution is symmetric, the $z$-values limiting this area are opposites, so only one value needs to be found. This value is the $z$-value of the upper or lower tail. One way to determine it is to use a graphing calculator. Push 2ND, then VARS, and choose the third option, invNorm(.

Graphic Calculator

Next, enter and push to get the value of the lower tail.

Graphic Calculator

The value is approximately and because of the symmetry of the distribution, this means that its additive inverse can be used to evaluate the formula.

Evaluating the Formula

It is worth noting that the formula for the maximum error can be used because the sample size is greater than Recall the standard deviation and Substitute these values into the formula to find the maximum error of estimate.
Therefore, the maximum error of estimate at a confidence level is about
Example

Amount of Soda Poured by a Dispensing Machine

The secret to the success of the burger restaurant is not only the flavor of the meat but also the soda included in the King's Combo. This soda follows a unique brewing process, and a soda dispensing machine fills the bottles that are later sold with the combos.

Soda machine

Mark wants to find the mean volume contained in the bottles that are filled by the dispensing machine. He took a sample of bottles of soda and measured their volumes. He found that the mean volume of the bottles is milliliters with a standard deviation of Which option corresponds to a confidence interval for the population mean of soda volume?

Hint

Begin by calculating the maximum error of estimate. Then add and subtract that from the sample mean to get the bounds of the confidence interval.

Solution

Determine a confidence interval for the population mean of soda volume in order to identify the right option. To do so, follow these steps.

  1. Identify the sample mean.
  2. Calculate the maximum error of estimate.

The mean volume for the sample consisting of sodas was milliliters. The maximum error of estimate will be calculated next.

Calculating the Maximum Error of Estimate

For a sample size greater than or equal to with standard deviation the maximum error of estimate at a confidence level can be calculated by the following formula.
$E = z \cdot \dfrac{s}{\sqrt{n}}$
In this formula, $z$ corresponds to the $z$-value of the confidence level. Since the confidence level is this portion of the area around the mean will be covered in a standard normal distribution. The area in the distribution's tails that is not in the confidence interval will be each.

This value is given by the value of the upper or lower tail. Because the distribution is symmetric, the values limiting this area will be opposites of each other, so only one needs to be found. In this case, a short version of the standard normal table can be used to locate the value of the lower tail, which in decimal form is

The table only contains two values close to Notice that the mean of these two values is close to
This means that the value of can be approximated by finding the mean of the two values, and
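The averaging step used with the table can be compared against a direct computation. The sketch below assumes a hypothetical lower-tail area of 0.05 and the two neighboring table entries $z = -1.65$ and $z = -1.64$; these numbers are illustrative and are not taken from the exercise.

```python
from statistics import NormalDist

std_normal = NormalDist()

# Hypothetical target area for the lower tail (an assumption for this sketch).
target_area = 0.05

# Two neighboring table entries whose areas bracket the target.
z_low, z_high = -1.65, -1.64
area_low = std_normal.cdf(z_low)
area_high = std_normal.cdf(z_high)

# Table approach: average the two z-values.
z_from_table = (z_low + z_high) / 2

# Direct approach: invert the cumulative distribution.
z_direct = std_normal.inv_cdf(target_area)

print(f"areas: {area_low:.4f} and {area_high:.4f}")
print(f"table estimate: {z_from_table:.4f}, direct value: {z_direct:.4f}")
```

The two approaches agree to about three decimal places, which is why the mean of the table values is an acceptable approximation.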
Since the values of the distribution are additive inverses, the positive value can be used to evaluate the formula and determine the maximum error of estimate. The formula can be used because the sample size is greater than Recall that the standard deviation is

Determining the Confidence Interval

The confidence interval for the population mean can now be calculated by adding and subtracting the maximum error of estimate to and from the sample mean.
Consider the positive and negative cases to determine the bounds of the confidence interval.
Therefore, with level of confidence, it can be said that the population mean is between and
Discussion

Testing an Estimation of a Population Parameter

While a confidence interval helps estimate the value of a population parameter like the mean, there is another inferential method that can help evaluate a specific claim about a population parameter. Before exploring this method, two statistical hypotheses about the population need to be identified. These are the null and alternative hypotheses.

Concept

Null Hypothesis vs. Alternative Hypothesis

The null hypothesis and alternative hypothesis are two mutually exclusive statements about the mean of a population. The null hypothesis, denoted by $H_0,$ is a statement of equality or non-strict inequality about the population mean that is accepted as true unless strong evidence is shown against it.

Null Hypothesis

Conversely, the alternative hypothesis, denoted by $H_a$ or $H_1,$ is a strict inequality statement that contradicts the null hypothesis. It is the complement of the null hypothesis and will be accepted if there is evidence in its favor.

Alternative Hypothesis

Notice that the initial claim made by the researcher is the one that sets the null and alternative hypotheses. If the claim can be written algebraically as a strict inequality, it will be part of the alternative hypothesis. Otherwise, it will be part of the null hypothesis.
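The rule for assigning a claim to one of the two hypotheses can be written as a small lookup. This is a sketch only; the function name `state_hypotheses`, the text symbols used for the relations, and the sample values are assumptions made for illustration.

```python
# Each relation paired with its complement.
COMPLEMENT = {"<": ">=", ">": "<=", "!=": "=", "=": "!=", "<=": ">", ">=": "<"}

# Strict inequalities belong to the alternative hypothesis.
STRICT = {"<", ">", "!="}


def state_hypotheses(claim_symbol, value):
    """Return H0, Ha, and which of them carries the claim."""
    if claim_symbol not in COMPLEMENT:
        raise ValueError(f"unknown relation: {claim_symbol}")
    other = COMPLEMENT[claim_symbol]
    if claim_symbol in STRICT:
        null_symbol, alt_symbol, claim_is = other, claim_symbol, "alternative"
    else:
        null_symbol, alt_symbol, claim_is = claim_symbol, other, "null"
    return {
        "H0": f"mu {null_symbol} {value}",
        "Ha": f"mu {alt_symbol} {value}",
        "claim": claim_is,
    }


# Hypothetical claims: "the mean is greater than 3.0" and "the mean equals 12".
print(state_hypotheses(">", 3.0))
print(state_hypotheses("=", 12))
```

A claim stated with a strict inequality ends up in the alternative hypothesis, and any other claim ends up in the null hypothesis.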

Example

Suppose a school administrator at East Junior High School thinks that the mean grade point average in is greater than Note that this claim can be written as the following inequality.
Because this is a strict inequality, the claim represents the alternative hypothesis, while the null hypothesis is
Example

Determining the Null and Alternative Hypotheses From a Claim

Another characteristic of the King's Combo at Mark's father's restaurant is that customers can choose between a cookie or a soft ice cream as part of their meal. They can also pay more to get a piece of cake.

A poster showing the King's Combo with the options that a customer can add to their order.
a Mark thinks that on a typical day, less than of customers choose the cookie over the ice cream. Which of the following options describe the alternative hypothesis and the null hypothesis of this situation?
b Suppose that Mark thinks that on average, of the customers on a typical day added a piece of cake to their combo. Which of the following options describe the null and the alternative hypothesis?

Hint

a Begin by identifying the claim. Then write the claim as an algebraic expression.
b The null hypothesis is a statement of equality or a non-strict inequality. The alternative hypothesis is a statement of strict inequality or compound inequality that is the complement of the null hypothesis.

Solution

a Begin by identifying the claim and writing it as an algebraic expression. In this case, Mark suspects that the proportion of people that prefer the cookie over the ice cream is less than This can be represented as the following strict inequality.
The claim is that the mean is less than Because it is a strict inequality, it is the alternative hypothesis. Conversely, the complement of this claim is that the mean is greater than or equal to This is the non-strict inequality that represents the null hypothesis.
Null Hypothesis: The mean is greater than or equal to
Alternative Hypothesis: The mean is less than (claim)
b The null and alternative hypotheses can be identified in this situation by following a similar procedure. In this case, Mark thinks that of customers add a piece of cake to their combo. This claim can be represented by the following equality.
Since this is a statement of equality, it is the null hypothesis. On the other hand, the alternative hypothesis is which is the complement of
Null Hypothesis: The mean is equal to (claim)
Alternative Hypothesis: The mean is not equal to
Discussion

Collecting the Tools to Make a Hypothesis Test

Once the null and alternative hypotheses have been correctly identified, they can be tested by performing a hypothesis test to see which statement is more likely true. Before the test can be performed, some information is needed.

Concept

Hypothesis Test

A hypothesis test is an inferential method that uses sample data to examine a claim about the mean of a population. Because the population mean is almost always unknown, it is common to be suspicious about the truthfulness of any assumption about its value. The following are typical claims about the mean of a population.

Typical Claims About the Mean
  • The mean is equal to a specific value.
  • The mean is greater than a specific value.
  • The mean is less than a specific value.

Before making a hypothesis test, two hypotheses need to be specified, the null hypothesis and the alternative hypothesis. These hypotheses must be mutually exclusive. The null hypothesis is assumed to be true. The hypothesis test puts the null hypothesis on trial to see if there is strong evidence against it. If so, the alternative hypothesis is accepted instead.

  • $H_0$: The hypothesis that is examined.
  • $H_a$: The hypothesis that is accepted if there is strong evidence to reject $H_0.$
Once the two hypotheses are set, sample data needs to be collected and analyzed to either reject or fail to reject the null hypothesis.
Concept

Significance Level

The significance level, denoted by $\alpha,$ is the probability of rejecting the null hypothesis when it is actually true, that is, of treating results that are due to chance as real evidence. It is set in advance when making a hypothesis test. The smaller the value, the stronger the evidence a sample must provide. These are typical values for the significance level.

Typical Significance Levels

In a standard normal distribution, the sample mean would fall around the center of the distribution if the null hypothesis were true. This means that a value in the tails of the distribution would be unusual if $H_0$ were true. The significance level tells how far from the center of the distribution the sample mean must lie in order to reject the null hypothesis $H_0$ and accept the alternative hypothesis $H_a.$

critical regions
Concept

Critical Region

The critical region, determined by the significance level $\alpha,$ is the set of values that will lead to rejecting the null hypothesis $H_0.$ In a standard normal distribution, this region is located in the tails of the distribution. The cutoff value of the region is a critical value given by the $z$-value of $\alpha.$ The tests of significance (left-tail, right-tail, or two-tail) determine whether there are one or two critical regions.

Critical Values
Significance Level Left-Tail Test
Two-Tail Test
Right-Tail Test
This table shows the typical significance levels and their corresponding critical values. It is worth noting that the area of the critical region(s) is equal to the significance level $\alpha.$
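Critical values like those in the table can be computed from the standard normal distribution. The following sketch assumes the commonly used significance levels 0.10, 0.05, and 0.01; the function name `critical_values` is hypothetical.

```python
from statistics import NormalDist


def critical_values(alpha, tail):
    """Critical z-value(s) for a left-, right-, or two-tail test."""
    std_normal = NormalDist()
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "two":
        # The significance level is split evenly between both tails.
        z = std_normal.inv_cdf(1 - alpha / 2)
        return (-z, z)
    raise ValueError("tail must be 'left', 'right', or 'two'")


for alpha in (0.10, 0.05, 0.01):
    left = critical_values(alpha, "left")[0]
    right = critical_values(alpha, "right")[0]
    two = critical_values(alpha, "two")[1]
    print(f"alpha={alpha:.2f}: left {left:.3f}, two ±{two:.3f}, right {right:.3f}")
```

In the two-tail case each critical region has area $\alpha / 2,$ so the total area of the critical regions still equals $\alpha.$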
Concept

Tests of Significance

In a hypothesis test, the region where the null hypothesis is rejected is known as the critical region. The location of this region depends on the significance level and the inequality symbol of the alternative hypothesis as determined by the tests of significance. The tests of significance can be divided into the left-tailed test, the two-tailed test, and the right-tailed test.

  • Left-tailed test: The alternative hypothesis suggests that the population mean is less than the value claimed by the null hypothesis.
  • Two-tailed test: The alternative hypothesis claims that the population mean is different from the value claimed by the null hypothesis.
  • Right-tailed test: The alternative hypothesis suggests that the population mean is greater than the value claimed by the null hypothesis.

The applet below shows how the critical regions vary depending on the tests of significance.

an applet showing the critical regions according to the tests of significance
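The way the inequality symbol of the alternative hypothesis selects the critical region can also be sketched in code. The names `TEST_FOR_SYMBOL` and `in_critical_region`, and the numbers used, are illustrative assumptions.

```python
from statistics import NormalDist

# Inequality symbol of the alternative hypothesis -> test of significance.
TEST_FOR_SYMBOL = {"<": "left", "!=": "two", ">": "right"}


def in_critical_region(z, alpha, alt_symbol):
    """True if the z-statistic falls in the critical region of the test."""
    tail = TEST_FOR_SYMBOL[alt_symbol]
    std_normal = NormalDist()
    if tail == "left":
        return z <= std_normal.inv_cdf(alpha)
    if tail == "right":
        return z >= std_normal.inv_cdf(1 - alpha)
    # Two-tailed: reject in either tail, each with area alpha / 2.
    return abs(z) >= std_normal.inv_cdf(1 - alpha / 2)


# The same hypothetical z-statistic leads to different outcomes per test.
for symbol in ("<", "!=", ">"):
    print(symbol, in_critical_region(-1.8, 0.05, symbol))
```

With these hypothetical numbers, only the left-tailed test places the statistic in the critical region.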
Method

Making a Hypothesis Test

When making a hypothesis test, begin by identifying the claim to set the null and alternative hypotheses. Then the critical regions and the critical values are determined based on the tests of significance. Finally, the null hypothesis is rejected if the statistic falls within the critical region. To illustrate this process, consider the following situation.

A company says that each of their packages of ham contains exactly slices.

Suppose that from a sample of packages, a mean of with a standard deviation of was calculated. Use a significance level to make a hypothesis test.
1. Identify the Claim and State the Null and Alternative Hypotheses
Identify the claim to see if it relates to the null or the alternative hypotheses. In this case, the company says that the packages contain exactly slices of ham. This is the same as stating that the population mean equals which can be written as follows.
Because this claim is a statement of equality, it should be related to the null hypothesis Conversely, the alternative hypothesis is which is the complement of
Null Hypothesis: The mean is equal to slices (claim).
Alternative Hypothesis: The mean is different from slices.
2. Determine the Critical Value(s) and Critical Region(s)

Because the sign of the alternative hypothesis is $\neq,$ a two-tailed test of significance will be conducted. This means that there are two critical regions whose cutoffs will be given by the $z$-value of the significance level $\alpha.$ The following are the critical values for the most common significance levels.

Critical Values
Significance Level Left-Tail Test
Right-Tail Test
Two-Tail Test

From the table, note that the critical values for a significance level are Now the critical regions and critical values can be labeled.

an applet showing the critical regions for a two tail test
3. Calculate the Statistic
The $z$-statistic, which is the $z$-value of the sample mean, can be calculated using the following formula.
$z = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$
In this formula, $\bar{x}$ is the sample mean, $\mu$ is the population mean, $s$ is the standard deviation of the sample, and $n$ is the sample size. For the given example, it is given that and
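The calculation of the statistic can be sketched as follows, assuming the formula $z = (\bar{x} - \mu) / (s / \sqrt{n}).$ The function name `z_statistic` and the input values are hypothetical and do not come from the ham example.

```python
import math


def z_statistic(x_bar, mu, s, n):
    """z-value of a sample mean: (x_bar - mu) / (s / sqrt(n))."""
    if n <= 0 or s <= 0:
        raise ValueError("s and n must be positive")
    standard_error = s / math.sqrt(n)
    return (x_bar - mu) / standard_error


# Hypothetical sample: mean 10.4 against a claimed mean of 10,
# with standard deviation 1.2 and 36 observations.
z = z_statistic(x_bar=10.4, mu=10.0, s=1.2, n=36)
print(f"z = {z:.2f}")
```

The denominator is the standard error, so the statistic measures how many standard errors the sample mean lies from the claimed population mean.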
4. Reject or Fail to Reject the Null Hypothesis

Next, verify if the statistic falls within the critical region. If so, reject the null hypothesis. To do so, plot the statistic jointly with the critical regions to see where it falls, outside or inside the critical region.

an applet showing the critical regions of a two tail test jointly with a z-statistic that falls within the critical region

Because the statistic falls within the critical region, the null hypothesis is rejected in this case.

5. Make a Conclusion About the Claim

Use the result of the previous step to make a conclusion about the initial claim.

A company says that each of their packages of ham contains exactly slices.

In this case, since the initial claim is related to the null hypothesis, it can be said that there is enough evidence to reject the claim that the packages of ham contain exactly slices.

It is worth noting that this test is sometimes referred to as a $z$-test because it uses the $z$-value and the $z$-statistic. However, this is not the only type of test that can be used when evaluating a claim.
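The steps of the method can be combined into one routine. This is a minimal sketch under the assumptions of the method above (a $z$-test with the sample standard deviation); the function name `z_test` and every number used are hypothetical and are not the values of the ham example.

```python
import math
from statistics import NormalDist


def z_test(x_bar, mu0, s, n, alpha, alt_symbol):
    """Return (z, reject) for a z-test of H0 about the mean mu0."""
    # Calculate the statistic.
    z = (x_bar - mu0) / (s / math.sqrt(n))

    # Determine the critical value(s) and check the critical region.
    std_normal = NormalDist()
    if alt_symbol == "<":
        reject = z <= std_normal.inv_cdf(alpha)
    elif alt_symbol == ">":
        reject = z >= std_normal.inv_cdf(1 - alpha)
    elif alt_symbol == "!=":
        reject = abs(z) >= std_normal.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("alt_symbol must be '<', '>', or '!='")
    return z, reject


# Hypothetical claim "the mean equals 10", so Ha is "mean != 10".
z, reject = z_test(x_bar=9.7, mu0=10.0, s=1.0, n=50, alpha=0.10, alt_symbol="!=")
verdict = "reject H0" if reject else "fail to reject H0"
print(f"z = {z:.3f}: {verdict}")
```

The final conclusion about the claim still depends on whether the claim was placed in the null or the alternative hypothesis, which the routine leaves to the reader.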

Extra

Using a Calculator to Find the Critical Values

The following situations need to be considered when calculating the critical values.

  • The $z$-values on the lower half of the distribution will be negative.
  • Conversely, the $z$-values on the upper half will be positive.
  • Because the standard normal distribution is symmetric, the $z$-value that cuts off a given percent in the lower tail is the opposite of the $z$-value that cuts off the same percent in the upper tail.

For the given example, each critical region will cover an area of $0.05.$ Therefore, the $z$-value for the left tail will be found first. To do so, push 2ND, then VARS, and choose the third option, invNorm(.

Now enter the desired value, which in this case is $0.05.$ Finally, push ENTER to get the result.

Calculator view showing the z-value for 0.05

The $z$-value for the left tail is about $-1.645,$ so the $z$-value for the right tail will be $1.645.$ A similar process is followed when performing a one-tail test.

Example

Evaluate the Weight of Chocolate Bars

While watching the Dinos and Dragons movie with his family, Mark decides to eat a bar of his favorite chocolate as a snack. After eating it, he feels slightly disappointed because the bar seemed a little smaller than the grams listed on the packages. He decides to investigate if the brand producing the chocolate bars lied about the weight of the chocolate bars.
Mark measuring the weight of the chocolate bars
To determine if what the package shows is true, Mark weighs a sample of chocolate bars and finds a sample mean of with a standard deviation of He wants to test the affirmation in the packages about the weight of chocolate bars with significance. Help him find the following information to draw a conclusion.
a Which test of significance does Mark need to use to make a hypothesis test based on his sample results?
b Investigate the following graphs.
Four different critical regions
Select the option that represents the critical region for this hypothesis test.
c What is the $z$-value of the sample mean? Round to three decimal places.
d After testing his results, which statement is most probably true about the mean weight of the population of chocolate bars?

Hint

a Listing the weight on the package is the same as stating that the mean weight of chocolate bars is
b The critical value is given by the $z$-value corresponding to half the significance level.
c Use the formula to calculate the $z$-statistic.
d Where does the $z$-statistic fall in the distribution?

Solution

a Begin by setting the null and alternative hypotheses. Then, based on the inequality sign of the alternative hypothesis, determine the test of significance. Note that the chocolate bar wrappers claim that the weight of each bar is This is the same as saying that the mean weight of the population of chocolate bars is
Because this is a statement of equality, it is related to the null hypothesis Additionally, the complement of this statement is that the mean is different than This statement corresponds to the alternative hypothesis
Null Hypothesis: The mean is equal to (claim)
Alternative Hypothesis: The mean is different from

Because the sign of the alternative hypothesis is $\neq,$ a two-tailed test of significance corresponds to this situation.

b In a two-tailed test, there are two critical regions whose critical values are given by the $z$-value of half the significance level. On a graphing calculator, push 2ND, then VARS, and choose the third option, invNorm(.
Graphic Calculator

Next, given that enter and push to get the result.

Graphic Calculator

This is the critical value corresponding to the critical region on the left of the standard normal distribution. Because the distribution is symmetric, the critical value for the upper tail will be the same but with the opposite sign. With this information, the critical regions can be set in the distribution.

The critical regions displayed in the standard normal distribution.

This corresponds to option A.

c Calculate the $z$-statistic using the following formula.
$z = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$
In this formula, $\bar{x}$ is the sample mean, $\mu$ is the population mean, $s$ is the standard deviation of the sample, and $n$ is the sample size. In this case, and
d Now, plot the $z$-statistic on the graph of the critical regions to see where it falls. If it lies in the critical region, reject the null hypothesis.
The critical regions with the z-statistic displayed in the standard normal distribution.

Note that the statistic falls outside the critical region. Therefore, the null hypothesis cannot be rejected. This means that there is not enough evidence to reject the claim about the weight of the chocolate bars. So, it is most likely true that the mean weight of the chocolate bars is

Example

Time Teens Spend Playing Sports

After enjoying the Dinos and Dragons movie with his family, Mark and his father start watching sports news. The newscaster reports that, on average, teens spend at most minutes a day playing sports. Mark wants to determine if what the news reported is accurate.

Sport News

Using a sample of teens, Mark calculates a mean of minutes and a standard deviation of minutes. Help Mark if he wants to test the news report with significance.

a Select the test of significance that Mark needs to make a hypothesis test about the results the news reported.
b Look at the following graphs.
Four different critical regions
Which of the given graphs shows the critical region(s) corresponding to this hypothesis test?
c Calculate the value corresponding to the upper of the distribution. Round the answer to three decimal places.
d After analyzing his sample results, what is more likely true about the mean time spent by teens playing sports?