When collecting real-life data, there are many cases where most of the data values cluster close to the mean of the set. Consider, for example, men's shoe sizes.
Histogram symmetric about the mean shoe size, 9.

Most of the data is grouped close to the mean value, which is 9. Therefore, a man whose shoe size is close to 9 is more likely to be randomly selected than a man whose shoe size is far from 9. When a data set is distributed this way and the domain of the distribution is continuous rather than discrete, the data is said to be normally distributed. This lesson explores this distribution.

Challenge

Finding Probabilities of Data Normally Distributed

Kevin has a summer internship at a tech company in his town. The daily number of calls that the company receives is normally distributed with a mean of 2240 calls and a standard deviation of 150 calls. The graph represents the distribution of the data.

Normal curve, bell-shaped, symmetric about the mean. In the horizontal axis are placed the numbers 1790, 1940, 2090, 2240, 2390, 2540, and 2690.

Looking to make improvements in the company, Kevin's boss is interested in knowing the answers to the next couple of questions.

a What is the probability that more than calls are received on a random day?
b What is the probability that between and calls are received on a random day? Round the answer to two decimal places.
Discussion

Normal Distributions and the Empirical Rule

When dealing with probability distributions, one type stands out above the rest because it appears in many real-life scenarios, such as people's heights, shoe sizes, birth weights, average grades, IQ levels, and many other qualities. Because of this regularity, this type of distribution is called the normal distribution.

Concept

Normal Distribution

A normal distribution is a type of probability distribution where the mean, the median, and the mode are all equal to each other. The graph that represents a normal distribution is called a normal curve and it is a continuous, bell-shaped curve that is symmetric with respect to the mean of the data set.

A normal curve with mean 950 and standard deviation 25.

This type of distribution is the most common continuous probability distribution observed in real life. When a normal distribution has a mean of 0 and a standard deviation of 1, it is called a standard normal distribution.

Normal Curve with mean 0 and standard deviation 1
The total area under the normal curve is 1, or 100%. Because of this, the area under the normal curve in a certain interval represents the percentage of data within that interval or the probability of randomly selecting a value that belongs to that interval. The Empirical Rule can be used to determine the area under the normal curve at specific intervals. It is also worth noting that not all data sets are normally distributed. If the mean and median are not equal, then the data set is skewed.
Concept

Empirical Rule

In statistics, the Empirical Rule, also known as the 68-95-99.7 rule, is a shorthand used to remember the percentage of values that lie within certain intervals in a normal distribution. The rule states the following three facts.

  • About 68% of the values lie within one standard deviation of the mean.
  • About 95% of the values lie within two standard deviations of the mean.
  • About 99.7% of the values lie within three standard deviations of the mean.
These three facts can be confirmed by observing the area under the normal curve that corresponds to a normal distribution with mean $\mu$ and standard deviation $\sigma$.
empirical rule button graph
According to this rule, almost all the values observed lie within three standard deviations of the mean. For this reason, the rule is also called the three-sigma rule. It is worth noting that these facts were observed based on empirical evidence, which is why it is called the Empirical Rule.
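The three percentages of the Empirical Rule can also be checked numerically. The following is a minimal sketch that uses `NormalDist` from Python's standard library; the helper name `share_within` is made up for this illustration and is not part of the lesson.

```python
from statistics import NormalDist

def share_within(k, mu=0.0, sigma=1.0):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

# The rule rounds these areas to 68%, 95%, and 99.7%.
for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")
```

The result does not depend on the chosen mean and standard deviation, which is why the rule applies to every normal distribution.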
Example

Distribution of Heights

In his spare time, Kevin works with the Less Chat, More Talk campaign to encourage people to share with their loved ones in person instead of through screens. He wants to give away T-shirts with a cool logo outside a shopping mall to help spread this message.

Two T-shirts with the logo of the campaign

Kevin is in charge of preparing the men's T-shirts, but he does not know how many of each size he should order. To figure it out, he searched the City Hall website and found that the heights of the men in the city are normally distributed with a mean of 183 centimeters and a standard deviation of 5 centimeters. Along with this information, there was also a graph.

Normal curve with the axis labels
a What is the range of the heights that represent the middle 68% of the distribution? Write the answer as a strict compound inequality.
b What percent of the surveyed men are shorter than 173 centimeters?
c If men participated in the survey, how many of them are between 188 and 193 centimeters tall?

Hint

a Use the Empirical Rule to determine the corresponding percentage.
b Use the Empirical Rule.
c Use the Empirical Rule to find the percent. Then, multiply it by the total number of men surveyed.

Solution

a According to the Empirical Rule, the middle 68% of the data in a normal distribution falls in the range that starts one standard deviation to the left of the mean and ends one standard deviation to the right of the mean.
According to the website, the mean is $\mu = 183$ and the standard deviation is $\sigma = 5$. Therefore, $\mu - \sigma = 178$ and $\mu + \sigma = 188$.
Consequently, the middle 68% of the distribution represents the range of heights from 178 to 188 centimeters.
Normal Curve labels and middle range shaded
b Start by highlighting the corresponding interval in the graph.
Normal Curve with region below 173 shaded
Now, notice that 173 centimeters is 2 standard deviations to the left of the mean.
According to the Empirical Rule, 95% of the data fall between $\mu - 2\sigma$ and $\mu + 2\sigma$. It is known that the value of $\mu - 2\sigma$ is 173. Calculate the value of $\mu + 2\sigma$: $\mu + 2\sigma = 183 + 2(5) = 193$.
Therefore, 95% of the data fall between 173 and 193. This implies that 5% of the data fall outside this range.
Normal Curve. The 95% of data fall between 173 and 193. The remaining 5% fall below 173 and above 193.

Due to the symmetry of the normal curve, 2.5% of the data fall to the left of 173 and 2.5% of the data fall to the right of 193. Consequently, 2.5% of the men surveyed are shorter than 173 centimeters.
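The tail percentage from the Empirical Rule can be compared with the area computed directly from the normal curve. This is only a sketch: the mean of 183 and standard deviation of 5 are assumed here because they match the graph labels 173 and 193, and the variable names are illustrative.

```python
from statistics import NormalDist

# Assumed parameters, consistent with the graph labels 173 and 193.
heights = NormalDist(mu=183, sigma=5)

# Empirical Rule: 95% lies within two standard deviations,
# so half of the remaining 5% lies below 173.
empirical_tail = (1 - 0.95) / 2

# Area under the curve to the left of 173.
computed_tail = heights.cdf(173)

print(f"Empirical Rule: {empirical_tail:.3f}")
print(f"Computed area:  {computed_tail:.3f}")
```

The two numbers differ slightly because the Empirical Rule rounds the area within two standard deviations to 95%.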

c The graph below shows the percentages represented by each interval according to the Empirical Rule.
Normal Curve with all the percentages

According to the graph, 13.5% of the surveyed men are between 188 and 193 centimeters tall.

Normal Curve. Interval from 188 to 193 shaded. The percentage is 13.5%.
To find the number of men that belong to this range, multiply the corresponding percentage by the total number of men that participated in the survey.
Multiply
Therefore, of the men surveyed, about are between and centimeters tall.
Discussion

Drawing a Normal Curve

Given a normal distribution, its curve can be drawn by hand. For example, consider a normally distributed data set with mean $\mu = 10$ and standard deviation $\sigma = 2$.
Such a distribution can be drawn by following the next three steps.
1
Place the Mean on a Horizontal Axis

First, draw a horizontal axis and mark the mean of the data in the middle. In this case, the mean is 10.

An axis with a 10 in the middle
2
Find and Add More Labels

Find more labels to write on the axis such that each interval is one standard deviation long. In this case, the intervals must be 2 units long. To accomplish this, add and subtract multiples of the standard deviation to and from the mean.

Labels to the Left of the Mean: 10 - 1(2) = 8, 10 - 2(2) = 6, 10 - 3(2) = 4
Labels to the Right of the Mean: 10 + 1(2) = 12, 10 + 2(2) = 14, 10 + 3(2) = 16

Adding three labels to each side of the mean is enough.

An axis with the labels 4,6,8,10,12,14,16
3
Draw the Normal Curve

Lastly, draw a bell-shaped curve with its peak at the mean. Remember, the curve is symmetric with respect to the mean. In this case, the peak occurs at 10.

Normal curve
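The labeling step can be expressed as a short computation. The sketch below assumes a mean of 10 and a standard deviation of 2, which match the axis labels shown in the steps; the function name `axis_labels` is made up for this illustration.

```python
def axis_labels(mean, sd, k=3):
    """Return the labels mean - k*sd, ..., mean, ..., mean + k*sd."""
    return [mean + i * sd for i in range(-k, k + 1)]

print(axis_labels(10, 2))  # prints [4, 6, 8, 10, 12, 14, 16]
```

Three labels on each side are enough because almost all of the data lies within three standard deviations of the mean.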
Example

Graphing a Normal Distribution

While reading some statistics about the people in the city, Kevin was surprised to learn that the weights of newborns are also normally distributed. He found the following information given by the local hospital.

Mean weight: 7 lbs; One standard deviation below the mean: 6.3 lbs; One standard deviation above the mean: 7.7 lbs.
a Graph the normal distribution labeling all the intervals and percentages.
b What percent of the newborn babies weigh pounds or more?

Answer

a
Normal curve with the percentages of each interval labeled.
b

Hint

a Start by drawing the axis and placing the mean in the middle. Determine the standard deviation. Write labels so that the length of each interval is 1 standard deviation. Then, draw the normal curve and use the Empirical Rule to label the percents.
b Identify in the graph from Part A. Shade the region to the right of pounds and add the corresponding percentages.

Solution

a To graph a normal distribution, draw a horizontal axis and place the mean of the data in the middle. According to the given information, the mean weight is 7 pounds.
An axis with a 7 in the middle
The picture from the hospital shows that 6.3 pounds represents 1 standard deviation below the mean and that 7.7 pounds is 1 standard deviation above the mean. With this information, the standard deviation can be found: $\sigma = 7.7 - 7 = 0.7$.
The standard deviation of the weights of the babies is 0.7 pounds, so on the axis, write labels to the left and right of the mean such that each interval is 0.7 units long.
An axis with the labels 4.9, 5.6, 6.3, 7, 7.7, 8.4, 9.1

Next, draw the normal curve — a bell-shaped curve that is symmetric with respect to the mean, where it has its peak.

Normal curve

According to the Empirical Rule, the percentages below the curve are distributed as follows.

  • About 68% of the data fall between 6.3 and 7.7 pounds.
  • About 95% of the data fall between 5.6 and 8.4 pounds.
  • About 99.7% of the data fall between 4.9 and 9.1 pounds.
Normal curve with the percentages 68, 95, 99.7 labeled.

The percentages in every interval can be labeled by using the symmetry of the curve. This will complete the diagram of the distribution.

Normal curve with the percentages of each interval labeled.
b The percent of newborn babies that weigh pounds or more corresponds to the region below the normal curve that is to the right of Therefore, to calculate the percent of newborns that weigh pounds or more, highlight this part of the graph.
Normal curve with the percentages of each interval labeled.
The desired percentage is the sum of the individual percentages.
Consequently, of the newborn babies weigh pounds or more.
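The diagram from Part A and the sum used in Part B can be reproduced with a few lines of code. This is a sketch under the Empirical Rule only; the helper `percent_right_of` is a hypothetical name, and it takes the number of standard deviations above the mean rather than a weight in pounds.

```python
# Values taken from the hospital information in the example.
mean = 7.0
one_sd_above = 7.7

sd = round(one_sd_above - mean, 1)
labels = [round(mean + k * sd, 1) for k in range(-3, 4)]

# Empirical Rule: percent of data within 1, 2, and 3 standard deviations.
within = [68.0, 95.0, 99.7]

# Bands to the right of the mean, moving outward:
# 0 to 1, 1 to 2, 2 to 3, and beyond 3 standard deviations.
right_bands = [
    within[0] / 2,
    (within[1] - within[0]) / 2,
    (within[2] - within[1]) / 2,
    (100 - within[2]) / 2,
]

def percent_right_of(k):
    """Percent of data above mean + k*sd, for an integer k from -3 to 3."""
    if k >= 0:
        return sum(right_bands[k:])
    # By symmetry, the area left of mean - |k|*sd equals the area right of mean + |k|*sd.
    return 100 - sum(right_bands[-k:])

print(labels)
print([round(b, 2) for b in right_bands])
```

For instance, `percent_right_of(1)` adds the bands to the right of one standard deviation above the mean, in the same way the solution adds the individual percentages of the highlighted region.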
Discussion

The Standard Normal Table and z-Scores

The height of people is usually normally distributed. For example, the average height of a woman in the United States is about centimeters. Assuming a standard deviation of centimeters, the graph of this distribution looks as follows.

Normal Curve

The Empirical Rule is used to determine the percentage of data that falls between any two labels on the axis. However, what if the endpoints of the interval are different from the labels? For example, what is the percentage of women that are shorter than 166 centimeters?

Normal Curve. Region below 166 shaded.

To find such a percentage, the first step is converting the data value into its corresponding z-score.

Concept

z-Score

The z-score, also known as the z-value, represents the number of standard deviations that a given value $x$ is from the mean of a data set. The following formula can be used to convert any $x$-value into its corresponding z-score.

$$z = \frac{x - \mu}{\sigma}$$
Here, $\mu$ represents the mean and $\sigma$ the standard deviation of the distribution. The z-scores can be used to standardize a normal distribution. Then, for a random z-value of a standard normal distribution, the Standard Normal Table can be used to determine the corresponding area under the curve.
Using the previous formula, the value can be converted into its score. In this case, and
This means that the value is standard deviations to the right of Once the corresponding score is known, the area below the curve that is to the left of this value can be found using a standard normal table.
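The z-score formula translates directly into code. The sketch below is illustrative: the function name `z_score` is not from the lesson, and the sample values reuse the men's heights from the earlier example, assuming a mean of 183 and a standard deviation of 5.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("the standard deviation must be positive")
    return (x - mu) / sigma

# Assumed sample values: heights with mean 183 and standard deviation 5.
print(z_score(188, 183, 5))  # one standard deviation above the mean
print(z_score(173, 183, 5))  # two standard deviations below the mean
```

A positive result means the value lies to the right of the mean, and a negative result means it lies to the left.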
Method

Finding the Area to the Left of a z-Score

Consider a standard normal distribution and a randomly chosen score. The area below the normal curve that is to the left of this score can be calculated using a standard normal table. For example, consider

Standard Normal Distribution
The percentage of data that is less than or equal to can be determined following three steps.
1
Locate the Whole Part of the z-Score

In the left column of the standard normal table, locate the whole part of the score. Since is positive, look at the four bottom rows. Because the whole part of is shade the fifth row.

The probability that corresponds to a score for which the integer part is appears in the shaded row.

2
Locate the Decimal Part of the z-Score

In the top row of the standard normal table, locate the decimal part of the score. Here, the decimal part is Consequently, shade the seventh column.

3
Identify the Intersecting Cell
The shaded row and column intersect at Therefore, the percentage of data that is less than or equal to is This means that the probability that a value chosen at random is less than or equal to is
Keep in mind that the table used here works only for computing the probability that a data value is less than or equal to a specific z-score with only one decimal place. However, other standard normal tables give the probability of a value being greater than a specific z-score, and there are also tables containing z-scores with more than one decimal place.
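When a table is not at hand, the same lookup can be approximated in code. This sketch swaps the table for the cumulative distribution function of `NormalDist` and rounds the z-score to one decimal place to mimic the table described above; `area_left` is a made-up helper name, and 0.6 is only an illustrative score.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def area_left(z):
    """Area under the standard normal curve to the left of z (z rounded to one decimal)."""
    return STANDARD_NORMAL.cdf(round(z, 1))

print(f"{area_left(0.6):.4f}")
```

Rounding the input keeps the result comparable with a one-decimal table; dropping the rounding gives the area for the exact score.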

Extra

Finding Other Areas

Other areas can also be found using the same standard normal table.

Area Between Two z-Scores

To find the area below the normal curve and between two z-scores, subtract the area to the left of the smaller z-score from the area to the left of the greater z-score. For two z-scores $a < b$, this reads $P(a \le z \le b) = P(z \le b) - P(z \le a)$.

Area to the Right of a z-Score

The area to the right of a z-score is the complement of the area to the left of the same z-score.

Since the area under the normal curve represents a probability, by the Complement Rule, these two probabilities add up to 1.
Therefore, the area to the right of a z-score is the difference of 1 and the area to the left of the z-score: $P(z > a) = 1 - P(z \le a)$.
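Both of these areas follow from the area to the left of a z-score, so they can be sketched with two small helpers. The function names and the sample scores -0.6 and 0.6 are illustrative assumptions, and `NormalDist` stands in for the standard normal table.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def area_between(z_low, z_high):
    """Area between two z-scores: left area of the greater minus left area of the smaller."""
    return STANDARD_NORMAL.cdf(z_high) - STANDARD_NORMAL.cdf(z_low)

def area_right(z):
    """Area to the right of a z-score, by the Complement Rule."""
    return 1 - STANDARD_NORMAL.cdf(z)

print(f"{area_between(-0.6, 0.6):.4f}")
print(f"{area_right(0.6):.4f}")
```

Because the curve is symmetric, `area_right(0.6)` equals the area to the left of -0.6.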

According to the standard normal table, the probability that a randomly selected value is less than or equal to is Therefore, about of women are shorter than or equal to centimeters.

Nine women standing next to each other. Eight of them are shorter than 166cm.
Example

Commuting Times

Kevin has become a stats fan. He has recorded the time it takes him to commute to his internship over the past few days. He observes that the times are normally distributed with a mean of minutes and a standard deviation of minutes.

Landscape with a road and a city in the background

Find the following probabilities and write them in decimal form rounded to two decimal places.

a What is the probability that Kevin's commute tomorrow will take less than minutes?
b What is the probability that Kevin's commute will take between and minutes next Monday?
c Kevin starts work every day at One day he leaves his house at What is the probability that Kevin will be late for work this day?

Hint

a Draw the normal distribution curve. If is not one of the labels in the axis, then convert it into a score. Use a standard normal table to find the probability that a random value is less than the corresponding score.
b Convert each value to its corresponding z-score. To find the desired probability, subtract the probability that a random value is less than the smaller z-score from the probability that a random value is less than the larger z-score.
c How much time does Kevin have on this day to get from his house to work on time? Convert that time into a z-score. The probability of Kevin being late is equal to the probability of a random value being greater than that z-score.

Solution

a Start by drawing the normal distribution curve. According to Kevin, the mean time it takes him to get to work is minutes, so this value should be located in the middle of the axis. The standard deviation is minutes. The labels of the axis are found by adding and subtracting integer multiples of the standard deviation to and from the mean.
Normal Curve

The probability that Kevin spends less than minutes getting to work tomorrow is represented by the area below the curve that is to the left of

Normal Curve
Since is not a label on the axis, the Empirical Rule cannot be used. Therefore, to find the area, first convert into its corresponding score.
Evaluate right-hand side
Now, in the standard normal table, locate the row corresponding to and the column corresponding to

According to the table, the probability that tomorrow Kevin will spend less than minutes traveling to work is about

b In the graph from Part A it can be seen that neither nor are labels on the axis.
Normal Curve

Therefore, both values will need to be converted into their corresponding scores first. Recall that and

value Substitute Simplify
The shaded area represents the probability that a randomly selected value is greater than and smaller than In other words, the shaded area represents
Each of these probabilities can be found using the standard normal table.
Finally, to find the probability that Kevin will arrive within this time frame, calculate their difference and round the answer to two decimal places.
The probability that Kevin's commute will take between and minutes next Monday is about
c In order for Kevin to be on time, his commute cannot take more than minutes. In other words, he will be late for work if it takes more than minutes. This means that the probability that Kevin will be late for work that day is represented by the area below the normal curve that is to the right of
Normal Curve

Since is not a label on the axis, the Empirical Rule cannot be used. Therefore, scores must be used to find the area. In Part B it was determined the score that corresponds to is

Probability of Kevin Being Late Probability of Kevin Being on Time
Since the event of Kevin being late is the complement of the event of Kevin being on time, the sum of these probabilities is equal to
It was also determined in Part B that is Substitute this value into the equation above and solve for
The probability that Kevin will be late for work on that day is about
Example

Testing a Prototype

The company Kevin is interning with plans to release a new smartphone. He goes with the research team to a stadium with a prototype to let different people use the phone in order to determine what features and design people like.

A crowd going out from a stadium

After comparing and contrasting size preference with the ages of the participants, Kevin realizes that the data is normally distributed. Additionally, he notices that the middle 46% of participants prefer a larger phone.

A standard normal curve with the middle 46% shaded
a Find the z-scores that correspond to the limits of the ages of the middle 46% of people, those that prefer a larger phone. Write the limits from least to greatest, rounded to one decimal place.
b The mean age of the participants was 19 years old and the standard deviation was 5 years. With this information, determine the range of the ages that represent the middle 46% of the distribution. Write the answer as a strict compound inequality.

Hint

a Find the area that is to the left of the middle 46%. Use a standard normal table to find the z-score that produces that area. Because a normal distribution is symmetrical, the upper bound is the opposite of the lower bound.
b Convert the z-scores found in Part A into their original values.

Solution

a Let $z_1$ and $z_2$ be the lower and upper limits of the middle 46% of the data. To find the corresponding values, start by finding the percentage of data outside the middle area. To do so, subtract 46% from 100%, which gives 54%.
The standard normal curve with the middle 46% shaded. The outer areas are also shaded and represent 54% of the data.

Due to the symmetry of the normal curve, the area to the left of the lower limit $z_1$ is equal to the area to the right of the upper limit $z_2$. Therefore, each portion corresponds to 27% of the data. For the moment, focus on the area to the left of $z_1$.

The standard normal curve with the lower 27% shaded.

According to the last graph, the probability that a randomly chosen value is less than the lower limit $z_1$ is 0.27. In other words, $P(z < z_1) = 0.27$. Now, look for the z-value that produces a probability of about 0.27 on a standard normal table.

It is seen in the table that $z_1 \approx -0.6$. Again, due to symmetry, the upper limit $z_2$ is the opposite of $z_1$. Therefore, $z_2 \approx 0.6$.

Standard normal distribution with the middle 46% shaded. The limits of this area are -0.6 and 0.6.

Therefore, the limits of the middle 46% of the data are the z-scores $-0.6$ and $0.6$.

b In Part A it was determined that the limits of the middle 46% of the data are the z-scores $-0.6$ and $0.6$.
Standard normal distribution with the middle 46% shaded. The limits of this area are -0.6 and 0.6.
These z-scores can be converted back into their original age values in order to determine the range of ages that this area represents. To do so, start by rearranging the z-score formula to solve for $x$.
$$z = \frac{x - \mu}{\sigma} \quad\Longrightarrow\quad x = \mu + z\sigma$$
Kevin noted that the mean age is $\mu = 19$ and the standard deviation is $\sigma = 5$. To find the first value of $x$, substitute these values and $z = -0.6$ into the equation and simplify.
$$x = 19 + (-0.6)(5) = 16$$
The age corresponding to $z = -0.6$ is 16. Now substitute $z = 0.6$ to find the second age.
$$x = 19 + 0.6(5) = 22$$
The age corresponding to $z = 0.6$ is 22. Therefore, the middle 46% of the data corresponds to people between 16 and 22 years old.
The original normal distribution with the middle 46% of data shaded. The area goes from 16 to 22.
Based on Kevin's data, people between the ages of 16 and 22 prefer a larger phone.
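The whole solution can be retraced in code by reading the table in reverse. In this sketch, `NormalDist().inv_cdf` replaces the reverse table lookup, the mean of 19 and standard deviation of 5 are assumed because they match the shaded range from 16 to 22 in the last graph, and `to_value` is a hypothetical helper name.

```python
from statistics import NormalDist

middle = 0.46
left_tail = (1 - middle) / 2  # area to the left of the lower limit

# Reverse lookup: the z-score whose left area is the tail, rounded to one decimal.
z_low = round(NormalDist().inv_cdf(left_tail), 1)
z_high = -z_low  # symmetry of the normal curve

def to_value(z, mu, sigma):
    """Convert a z-score back to an original value: x = mu + z * sigma."""
    return mu + z * sigma

# Assumed parameters, consistent with the shaded range from 16 to 22.
mu, sigma = 19, 5
age_low = to_value(z_low, mu, sigma)
age_high = to_value(z_high, mu, sigma)

print(z_low, z_high)
print(age_low, age_high)
```

Rounding the z-score to one decimal place mirrors the precision of the table used in the solution.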
Discussion

Standardizing a Normal Distribution

One interesting property of a normal distribution is that it can take any value as its mean and any positive value as its standard deviation. Because of this, comparing two normally distributed data sets has to be done carefully. Otherwise, erroneous conclusions can be made.
Two normal curves. Their means and standard deviations can be set by moving a point
Additionally, the probability of an event happening is the area below the curve. Since there are infinitely many possible curves, the process for finding a certain probability changes between different normal distributions. However, through the use of z-scores, any normal distribution can be standardized, allowing the use of a standard normal table to find any probability.
Method

Standardization of Normal Distribution

Any normal distribution with mean $\mu$ and standard deviation $\sigma$ can be converted into a standard normal distribution. For example, consider a normal distribution with $\mu = 35$ and $\sigma = 1.22$. To standardize the distribution, all its values have to be converted into their corresponding z-scores.

Normal Curve with mean 35 and standard deviation 1.22
Since the domain is continuous, the conversion cannot be manually done for all the values. However, for illustrative purposes, it will be performed for the data set Two steps will be followed.
1
Subtract the Mean From Each Data Value

First, shift all the values so that the mean of the new set is 0. To do this, subtract the mean from each data value.

Notice that translating the values does not change the standard deviation. The standard deviation of the new data set is still 1.22.

Normal Curve with mean 0 and standard deviation 1.22

The initial data set has been converted into

2
Divide the Results by the Standard Deviation

To obtain a data set with a standard deviation of 1, divide the values obtained in the previous step by the standard deviation of the set.

Score

After the standardization, the new data set is Here, the mean is 0 and the standard deviation is 1.

Standard Normal Distribution

Notice that the resulting curve has a similar shape and distribution of data values as the original.

This process is called standardization and allows objective comparison of data sets that are normally distributed but have different means and standard deviations. Furthermore, it makes it possible to use the same table, the Standard Normal Table, to calculate probabilities for any normal distribution.
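The two steps of the method can be sketched as a short function. The mean 35 and standard deviation 1.22 come from the example distribution, while the sample values below are hypothetical and chosen only to illustrate the process; `standardize` is a made-up name.

```python
def standardize(values, mu, sigma):
    """Convert data values to z-scores in two steps: shift by the mean, then scale."""
    shifted = [x - mu for x in values]    # Step 1: the new mean is 0
    return [s / sigma for s in shifted]   # Step 2: the new standard deviation is 1

mu, sigma = 35, 1.22
sample = [33.78, 35, 36.22, 37.44]  # hypothetical values, not the lesson's data set

z_scores = [round(z, 2) for z in standardize(sample, mu, sigma)]
print(z_scores)
```

Each z-score tells how many standard deviations the original value lies from the mean, so values from distributions with different means and standard deviations become directly comparable.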

Extra

Graphic Illustration
The applet shows the changes of a normal distribution as it is standardized.
Standardizing a normal distribution with mean 1.7 and standard deviation 1.8
Example

Comparing Performances

Kevin's friend LaShay took the SAT and scored points on the math section. Kevin took the ACT and scored points in the math section.

Two Tests with their scores written in the top-right corner

Since these tests use different scales — the math section of the SAT scores points while the math section of the ACT scores points — they wonder who did better. They looked at the stats for each test to find out.

a Compared to their corresponding classmates, who stood out more, Kevin or LaShay?
b Kevin took the ACT with people, including himself. How many people scored higher than Kevin?
c What is the probability that a randomly chosen classmate of LaShay's has scored less than or equal to her on the SAT math section? Do not round the answer.
d The university where LaShay wants to study will accept only the top math scores. If LaShay took the SAT with people, including herself, will she be accepted?

Hint

a Use the provided mean and standard deviation to convert each test score to a z-score. The person with the higher positive z-score stood out the most in their class.
b Use the z-score found in Part A and the standard normal table to find the percentage of people who scored lower than Kevin. Then, apply the Complement Rule. Multiply the resulting percentage by the total number of people who took the test.
c Use the z-score found in Part A and the standard normal table.
d Use the probability found in Part C and the Complement Rule to determine how many people scored higher than LaShay. Are there more than people?

Solution

a Since the scores of both tests are normally distributed, to determine who did better, graph both normal distributions. According to the stats Kevin and LaShay found, the mean of the SAT is and the standard deviation is