Here are a few recommended readings before getting started with this lesson. Background to help understand Probability
Mark's father runs a burger restaurant. The mean age of people who visit the restaurant is 24.3 years old. Mark suspects that this situation has changed during the last year. To investigate whether his suspicions were true, he surveyed 65 customers and found a sample mean of 25.5 years with a standard deviation of 5 years.
He wants to test his results at a 10% significance level. Help him answer the following questions.
Inferential statistics uses data from a sample to draw conclusions or test hypotheses about a population. Conclusions made from a sample are almost never 100% accurate but can be thought of as the best guess or most probable answer. One of the main tasks of inferential statistics is to provide a confidence interval.
The maximum error of estimate, also known as the margin of error, is the maximum difference between the estimate of the population mean xˉ and its actual value. The maximum error of estimate E is calculated using the following formula.
$E = z \cdot \dfrac{s}{\sqrt{n}}, \quad n \geq 30$
In this formula, z represents the z-value of a certain confidence level, s is the standard deviation of the sample, and n is the sample size. From the formula, some conclusions can be made about the error of estimate.
The maximum error of estimate is added to and subtracted from the sample mean x̄ to find the bounds of a confidence interval.
A confidence interval for the population mean can be found by adding and subtracting the maximum error of estimate E to and from the sample mean xˉ.
$CI = \bar{x} \pm E$
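The two formulas above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample values in the usage lines are assumptions made up for this sketch, not part of the lesson.

```python
import math


def max_error_of_estimate(z: float, s: float, n: int) -> float:
    """Return E = z * s / sqrt(n); the lesson states the formula for n >= 30."""
    if n < 30:
        raise ValueError("the formula in this lesson assumes n >= 30")
    return z * s / math.sqrt(n)


def confidence_interval(x_bar: float, z: float, s: float, n: int) -> tuple:
    """Return the bounds (x_bar - E, x_bar + E) of the confidence interval."""
    e = max_error_of_estimate(z, s, n)
    return x_bar - e, x_bar + e


# Hypothetical values chosen only to exercise the functions.
low, high = confidence_interval(x_bar=100.0, z=1.96, s=15.0, n=100)
print(round(low, 2), round(high, 2))
```

Note how a larger sample size n shrinks E, while a larger z-value or standard deviation s widens the interval.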
Mark's father owns a burger restaurant. He wants to implement changes to improve the customer experience. Recently he found that in a sample of 36 burgers, on average, a burger takes 22 minutes to be cooked and given to the customer, with a standard deviation of 6.2 minutes.
Use this information to calculate the maximum error of estimate with a 90% confidence level. Round the answer to two decimal places.
Is the sample size greater than 30?
Since the confidence level c is 90%, the confidence interval covers the central 90% of the area under the standard normal curve around the mean μ. Each of the two tails outside the confidence interval therefore has an area of (100−90)/2 = 5%.
Because the distribution is symmetric, the z-values limiting this area are opposites, so only one value needs to be found. Additionally, this value is given by the z-value of the upper or lower tail. One way to determine this value is to use a graphing calculator. Push 2nd, then VARS, and choose the third option, invNorm(.
Next, enter 0.05 and push ENTER to get the z-value of the lower tail.
The z-value is approximately -1.645, and because of the symmetry of the distribution, this means that its additive inverse 1.645 can be used to evaluate the formula.
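For readers without a graphing calculator, the same lookup can be sketched with Python's standard library. Here `NormalDist().inv_cdf` plays the role of invNorm(; the variable names are assumptions for this sketch.

```python
from statistics import NormalDist

confidence_level = 0.90
tail_area = (1 - confidence_level) / 2  # area of each tail, 0.05

# inv_cdf returns the z-value with the given area to its left,
# which is what invNorm( computes on the calculator.
z_lower = NormalDist().inv_cdf(tail_area)
z_upper = -z_lower  # symmetry of the standard normal distribution

print(round(z_lower, 3), round(z_upper, 3))
```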
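Substitute z = 1.645, s = 6.2, and n = 36 into the formula for the maximum error of estimate and simplify.

$E = 1.645 \cdot \dfrac{6.2}{\sqrt{36}} = \dfrac{1.645 \cdot 6.2}{\sqrt{36}} = \dfrac{10.199}{6} = 1.699833\ldots \approx 1.70$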
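The arithmetic for the cooking-time example can be checked with a short script. The values come from the example (z = 1.645, s = 6.2, n = 36); the variable names are assumptions for this sketch.

```python
import math

z = 1.645  # z-value for a 90% confidence level
s = 6.2    # sample standard deviation in minutes
n = 36     # number of burgers in the sample

E = z * s / math.sqrt(n)  # maximum error of estimate
print(f"{E:.2f}")
```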
The secret to the success of the burger restaurant is not only the flavor of the meat but also the soda included in the King's Combo. This soda follows a unique brewing process, and a soda dispensing machine fills the bottles that are later sold with the combos.
Mark wants to find the mean volume contained in the bottles that are filled by the dispensing machine. He took a sample of 50 bottles of soda and measured their volumes. He found that the mean volume of the bottles is 330 milliliters with a standard deviation of 10 milliliters. Which option corresponds to a 99% confidence interval for the population mean μ of soda volume?
Begin by calculating the maximum error of estimate. Then add and subtract that from the sample mean to get the bounds of the confidence interval.
Determine a confidence interval for the population mean μ of soda volume in order to identify the right option. To do so, follow these steps.
The mean volume for the sample consisting of 50 sodas was 330 milliliters. The maximum error of estimate will be calculated next.
This value is given by the z-value of the upper or lower tail. Because the distribution is symmetric, the z-values limiting this area will be opposites of each other, so only one needs to be found. In this case, a short version of the standard normal table can be used to locate the z-value of the lower tail, whose area in decimal form is (1−0.99)/2 = 0.005.
z | .0 | .1 | .2 | .3 | .4 | .5 | .6 | .7 | .8 | .9 |
---|---|---|---|---|---|---|---|---|---|---|
-3 | .00135 | .00097 | .00069 | .00048 | .00034 | .00023 | .00016 | .00011 | .00007 | .00005 |
-2 | .02275 | .01786 | .01390 | .01072 | .00820 | .00621 | .00466 | .00347 | .00256 | .00187 |
-1 | .15866 | .13567 | .11507 | .09680 | .08076 | .06681 | .05480 | .04457 | .03593 | .02872 |
-0 | .50000 | .46017 | .42074 | .38209 | .34458 | .30854 | .27425 | .24196 | .21186 | .18406 |
0 | .50000 | .53983 | .57926 | .61791 | .65542 | .69146 | .72575 | .75804 | .78814 | .81594 |
1 | .84134 | .86433 | .88493 | .90320 | .91924 | .93319 | .94520 | .95543 | .96407 | .97128 |
2 | .97725 | .98214 | .98610 | .98928 | .99180 | .99379 | .99534 | .99653 | .99744 | .99813 |
3 | .99865 | .99903 | .99931 | .99952 | .99966 | .99977 | .99984 | .99989 | .99993 | .99995 |
Substitute values
$a \cdot \dfrac{b}{c} = \dfrac{a \cdot b}{c}$
Multiply
Use a calculator
Round to 2 decimal place(s)
CI = x̄ ± E | |
---|---|
x̄ − E | x̄ + E |
330−3.60 | 330+3.60 |
326.40 | 333.60 |
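As a cross-check, the interval can be recomputed with the z-value taken from the inverse normal function rather than read from the short table. This is a sketch with assumed variable names. Because the short table lists z only to one decimal place, the table-based bounds above are slightly narrower than the ones this script prints.

```python
import math
from statistics import NormalDist

x_bar, s, n = 330.0, 10.0, 50  # sample mean (ml), standard deviation (ml), sample size
confidence_level = 0.99

# z-value with an area of 0.005 in the upper tail, about 2.576
z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
E = z * s / math.sqrt(n)
low, high = x_bar - E, x_bar + E

print(f"z = {z:.3f}, E = {E:.2f}, CI = ({low:.2f}, {high:.2f})")
```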
While a confidence interval helps estimate the value of a population parameter like the mean, there is another inferential method that can help evaluate a specific claim about a population parameter. Before exploring this method, two statistical hypotheses about the population need to be identified. These are the null and alternative hypotheses.
The null hypothesis and alternative hypothesis are two mutually exclusive statements about the mean of a population. The null hypothesis, denoted by H0, is a statement of equality or non-strict inequality about the population mean that is accepted as true unless strong evidence is shown against it.
H0: Null Hypothesis
Conversely, the alternative hypothesis, denoted by Ha or H1, is a strict inequality statement that contradicts the null hypothesis. It is the complement of the null hypothesis and will be accepted if there is evidence in its favor.
Ha: Alternative Hypothesis
Notice that the initial claim made by the researcher is the one that sets the null and alternative hypotheses. If the claim can be written algebraically as a strict inequality, it will be part of the alternative hypothesis. Otherwise, it will be part of the null hypothesis.
Another characteristic of the King's Combo at Mark's father's restaurant is that customers can choose between a cookie or a soft ice cream as part of their meal. They can also pay $2 more to get a piece of cake.
Null Hypothesis | Alternative Hypothesis |
---|---|
The mean is greater than or equal to 0.60. H0: μ ≥ 0.60 | The mean is less than 0.60. (claim) Ha: μ < 0.60 |
Null Hypothesis | Alternative Hypothesis |
---|---|
The mean is equal to 0.50. (claim) H0: μ = 0.50 | The mean is not equal to 0.50. Ha: μ ≠ 0.50 |
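The rule for placing the claim can be sketched as a small helper: a strict inequality goes to the alternative hypothesis, anything else goes to the null hypothesis, and the other hypothesis is its complement. The function name and its return format are assumptions made for this illustration.

```python
# Each symbol mapped to the symbol of the complementary statement.
COMPLEMENT = {"=": "≠", "≠": "=", "<": "≥", "≥": "<", ">": "≤", "≤": ">"}
STRICT = {"≠", "<", ">"}  # strict inequalities belong to Ha


def set_hypotheses(claim_symbol: str, k: float) -> tuple:
    """Return (H0, Ha, which hypothesis holds the claim) for a claim 'μ <symbol> k'."""
    other = COMPLEMENT[claim_symbol]
    if claim_symbol in STRICT:
        null_symbol, alt_symbol, claim_in = other, claim_symbol, "Ha"
    else:
        null_symbol, alt_symbol, claim_in = claim_symbol, other, "H0"
    return f"H0: μ {null_symbol} {k:.2f}", f"Ha: μ {alt_symbol} {k:.2f}", claim_in


print(set_hypotheses("<", 0.60))  # claim is a strict inequality
print(set_hypotheses("=", 0.50))  # claim is an equality
```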
Once the null and alternative hypotheses have been correctly identified, they can be tested by performing a hypothesis test to see which statement is more likely true. Before the test can be performed, some information is needed.
A hypothesis test is an inferential method that uses sample data to examine a claim about the mean μ of a population. Because the population mean is almost always unknown, it is common to be suspicious about the truthfulness of any assumption about its value. The following are typical claims about the mean of a population.
Typical Claims About the Mean | ||
---|---|---|
The mean is equal to a specific value, μ=k. | The mean is greater than a specific value, μ>k. | The mean is less than a specific value, μ<k. |
Before making a hypothesis test, two hypotheses need to be specified, the null hypothesis and the alternative hypothesis. These hypotheses must be mutually exclusive. The null hypothesis H0 is assumed to be true. The hypothesis test puts the null hypothesis on trial to see if there is strong evidence against it. If so, the alternative hypothesis Ha is accepted instead.
The significance level α is the probability of rejecting the null hypothesis when it is actually true, and it is set in advance when making a hypothesis test. The smaller the α value, the stronger the sample evidence must be to reject the null hypothesis. These are typical values for the significance level.
Typical Significance Levels α | ||
---|---|---|
1% | 5% | 10% |
If the null hypothesis H0 were true, the standardized sample mean would fall near the center of the standard normal distribution, so a value in the tails of the distribution would be unusual. The significance level determines how far from the center the standardized sample mean must lie before the null hypothesis is rejected and the alternative hypothesis Ha is accepted.
The critical region, determined by the significance level α, is the set of values that will lead to rejecting the null hypothesis H0. In a standard normal distribution, this region is located in the tails of the distribution. The cutoff value of the region is a critical value given by the z-value of α. The tests of significance — left, right, or two-tail — determine whether there are one or two critical regions.
Critical Values | |||
---|---|---|---|
Significance Level | Left-Tail Test Ha: μ < k | Two-Tail Test Ha: μ ≠ k | Right-Tail Test Ha: μ > k |
α=1% | -2.326 | ±2.576 | 2.326 |
α=5% | -1.645 | ±1.960 | 1.645 |
α=10% | -1.282 | ±1.645 | 1.282 |
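The entries of this table can be reproduced from the inverse of the standard normal distribution. The sketch below uses Python's `statistics.NormalDist`; the function name and the `tail` labels are assumptions for this illustration.

```python
from statistics import NormalDist


def critical_values(alpha: float, tail: str) -> tuple:
    """Return the critical z-value(s), rounded to three decimals."""
    std = NormalDist()
    if tail == "left":   # Ha: μ < k, whole area α in the lower tail
        return (round(std.inv_cdf(alpha), 3),)
    if tail == "right":  # Ha: μ > k, whole area α in the upper tail
        return (round(std.inv_cdf(1 - alpha), 3),)
    if tail == "two":    # Ha: μ ≠ k, area α/2 in each tail
        z = round(std.inv_cdf(1 - alpha / 2), 3)
        return (-z, z)
    raise ValueError("tail must be 'left', 'right', or 'two'")


for alpha in (0.01, 0.05, 0.10):
    print(alpha, critical_values(alpha, "left"),
          critical_values(alpha, "two"), critical_values(alpha, "right"))
```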
In a hypothesis test, the region where the null hypothesis is rejected is known as the critical region. The location of this region depends on the significance level α and the inequality symbol of the alternative hypothesis as determined by the tests of significance. The tests of significance can be divided into the left-tailed test, the two-tailed test, and the right-tailed test.
When making a hypothesis test, begin by identifying the claim to set the null and alternative hypotheses. Then the critical regions and the critical values are determined based on the tests of significance. Finally, the null hypothesis is rejected if the z-statistic falls within the critical region. To illustrate this process, consider the following situation.
A company says that each of their packages of ham contains exactly 20 slices. |
Null Hypothesis H0 | Alternative Hypothesis Ha |
---|---|
The mean is equal to 20 slices (claim). H0: μ = 20 | The mean is different than 20 slices. Ha: μ ≠ 20 |
Because the sign of the alternative hypothesis is ≠, a two-tailed test of significance will be conducted. This means that there are two critical regions whose cutoffs will be given by the z-value of the significance level α. The following are the critical values for the most common α values.
Critical Values | |||
---|---|---|---|
Significance Level | Left-Tail Test Ha: μ < k | Right-Tail Test Ha: μ > k | Two-Tail Test Ha: μ ≠ k |
α=1% | -2.326 | 2.326 | ±2.576 |
α=5% | -1.645 | 1.645 | ±1.960 |
α=10% | -1.282 | 1.282 | ±1.645 |
From the table, note that the critical values for a 10% significance level are ±1.645. Now the critical regions and critical values can be labeled.
Substitute values
Subtract term
$\dfrac{a}{b/c} = \dfrac{a \cdot c}{b}$
Calculate root
Put minus sign in front of fraction
Multiply
Calculate quotient
Next, verify if the z-statistic falls within the critical region. If so, reject the null hypothesis. To do so, plot the z-statistic jointly with the critical regions to see where it falls, outside or inside the critical region.
Because the z-statistic falls within the critical region, the null hypothesis H0 is rejected in this case.
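The decision step can be sketched in code. The sketch assumes the usual z-statistic for a mean, z = (x̄ − μ0)/(s/√n), with the sample standard deviation standing in for the population value. The function names are made up for this illustration. The usage lines reuse the customer-age survey from the start of the lesson (μ0 = 24.3, x̄ = 25.5, s = 5, n = 65, α = 10%) and treat Mark's suspicion that the mean "has changed" as a two-tailed test, which is an interpretation rather than something the lesson states.

```python
import math
from statistics import NormalDist


def z_statistic(x_bar: float, mu_0: float, s: float, n: int) -> float:
    """Standardize the sample mean under the null hypothesis μ = mu_0."""
    return (x_bar - mu_0) / (s / math.sqrt(n))


def reject_null(z: float, alpha: float, tail: str) -> bool:
    """Return True when z falls inside the critical region."""
    std = NormalDist()
    if tail == "left":
        return z < std.inv_cdf(alpha)
    if tail == "right":
        return z > std.inv_cdf(1 - alpha)
    return abs(z) > std.inv_cdf(1 - alpha / 2)  # two-tailed test


z = z_statistic(x_bar=25.5, mu_0=24.3, s=5, n=65)
print(round(z, 3), reject_null(z, alpha=0.10, tail="two"))
```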
Use the result of the previous step to make a conclusion about the initial claim.
A company says that each of their packages of ham contains exactly 20 slices. |
In this case, since the initial claim is related to the null hypothesis, it can be said that there is enough evidence to reject the claim that the packages of ham contain exactly 20 slices.
The following situations need to be considered when calculating the critical values.
For the given example, each critical region will cover an area of 5%. Therefore, the z-value for the left 0.05 will be found first. To do so, push 2nd, then VARS, and choose the third option, invNorm(.
Now enter the desired value, which in this case is 0.05. Finally, push ENTER to get the result.
The z-value for the left tail is about -1.645, so the z-value for the right tail will be 1.645. A similar process is followed when performing a one-tail test.
Dinos and Dragons movie with his family, Mark decides to eat a bar of his favorite chocolate as a snack. After eating it, he feels slightly disappointed because the bar seemed a little smaller than the 150 grams listed on the package. He decides to investigate if the brand producing the chocolate bars lied about the weight of the chocolate bars.
Null Hypothesis H0 | Alternative Hypothesis Ha |
---|---|
The mean is equal to 150 g. (claim) H0: μ = 150 g | The mean is different than 150 g. Ha: μ ≠ 150 g |
Because the sign of the alternative hypothesis is ≠, a two-tailed test of significance corresponds to this situation.
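To find the critical values with a graphing calculator, push 2nd, then VARS, and choose the third option, invNorm(.
Next, given that 0.05/2 = 0.025, enter 0.025 and push ENTER to get the result.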
This is the critical value corresponding to the critical region on the left of the standard normal distribution. Because the distribution is symmetric, the critical value for the upper tail will be the same but with the opposite sign. With this information, the critical regions can be set in the distribution.
This corresponds to option A.
Substitute values
Subtract term
$\dfrac{a}{b/c} = \dfrac{a \cdot c}{b}$
Use a calculator
Round to 3 decimal place(s)
Note that the z-statistic falls outside the critical region. Therefore, the null hypothesis cannot be rejected. This means that there is not enough evidence to reject the claim about the weight of the chocolate bars. Failing to reject the claim does not prove it. The sample is simply consistent with a mean weight of 150 g.
After enjoying the Dinos and Dragons movie with his family, Mark and his father start watching sports news. The newscaster reports that, on average, teens spend at most 59 minutes a day playing sports. Mark wants to determine if what the news reported is accurate.
Using a sample of 35 teens, Mark calculates a mean of 62 minutes and a standard deviation of 6 minutes. Help Mark test the news report at a 5% significance level.
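One way to set up this test is sketched below. It assumes the claim "at most 59 minutes" is written as H0: μ ≤ 59, so the alternative is Ha: μ > 59 and the test is right-tailed. It also assumes the usual z-statistic for a mean. The variable names are made up for this sketch.

```python
import math
from statistics import NormalDist

mu_0 = 59     # claimed mean in minutes (H0: μ ≤ 59, claim)
x_bar = 62    # sample mean in minutes
s = 6         # sample standard deviation in minutes
n = 35        # sample size
alpha = 0.05  # significance level

z = (x_bar - mu_0) / (s / math.sqrt(n))     # z-statistic
critical = NormalDist().inv_cdf(1 - alpha)  # right-tail critical value
reject = z > critical                       # True if z lies in the critical region

print(f"z = {z:.3f}, critical value = {critical:.3f}, reject H0: {reject}")
```

Under these assumptions the z-statistic lies beyond the critical value, so the sample gives evidence against the reported claim.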