Consider four different situations that involve a data set.
Situation 1 | A tech company recently launched a new smartphone model. They conduct a survey where customers rate their satisfaction with the phone on a scale of 1 to 10. |
---|---|
Situation 2 | A survey is conducted to gather data on the distribution of ages among participants in a community event. |
Situation 3 | A teacher wants to analyze the distribution of test scores in her class by finding the median and quartiles of the scores. |
Situation 4 | A study tracks the temperature variations over the course of a week in a particular city. |
Tadeo bought the latest edition of the scientific magazine Imagineer, which includes a lot of different graphs and math-related discussions on real-life topics. The first article discussed the results of a recent math competition among high school students.
Range: 8
Shape: Symmetric
Step 1 | Determine the center (median) by finding the middle data point. |
---|---|
Step 2 | Find the maximum and minimum values on the graph. Use these values to calculate the spread (range) of the data. |
Step 3 | Analyze the overall shape of the graph. Note any other interesting features it may have. |
Complete each step one at a time.
First, find the total number of data points in the set by counting all the dots in the given dot plot.
There are 15 data points, which suggests that 15 participants in the math competition earned a score. The median of the data set is the 8th data point because it divides the ordered set into two equal halves of 7 data points each. Locate this value on the dot plot.
The 8th data point is a 6, which means that the median, or the center, of the data set is 6. This measure indicates the middle value of all the scores earned by the participants.
The next step is to find the maximum and minimum values on the dot plot.
The final step is to analyze the overall shape of the graph.
Notice that the left and right halves of the dot plot are not exactly identical. However, since they still resemble each other very closely, the plot can be considered symmetric in shape.
Step 1 | Title the plot based on the problem. Draw a number line to begin the dot plot, being sure to use values that are appropriate for the data set. |
---|---|
Step 2 | Determine the frequency of each value. |
Step 3 | Place dots over each number on the number line that corresponds to the frequency for each value in the data set. |
Complete each step one at a time.
Title the dot plot Scores From Last Year. Notice that the scores range from 1 to 9, so the number line should run from 1 to 9.
The next step is to determine the frequency of each score earned by the participants from last year's math competition. Count how many times each score from 1 to 9 appears in the set and write that number in a table next to the score.
Score | Frequency |
---|---|
1 | 1 |
2 | 0 |
3 | 1 |
4 | 2 |
5 | 0 |
6 | 3 |
7 | 3 |
8 | 2 |
9 | 1 |
Lastly, place dots over each score from 1 to 9 equal to the number of times it appears in the data set. This is where the frequency table is helpful.
The dot plot for the given data set was successfully created.
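The stacking step can also be sketched in code. The following is a minimal illustration, not part of the original lesson: it takes the frequency table above and prints a text version of the dot plot. The function name `dot_plot` is made up for this example, and the simple column alignment assumes every value on the number line is a single digit.

```python
# Frequency table from the lesson: score -> number of participants.
frequency = {1: 1, 2: 0, 3: 1, 4: 2, 5: 0, 6: 3, 7: 3, 8: 2, 9: 1}

def dot_plot(freq):
    """Return a text dot plot: one column per value, one '*' per occurrence.

    Hypothetical helper; alignment assumes single-digit values.
    """
    values = sorted(freq)
    tallest = max(freq.values())
    lines = []
    # Build rows from the top of the tallest stack down to the number line.
    for level in range(tallest, 0, -1):
        row = ["*" if freq[v] >= level else " " for v in values]
        lines.append(" ".join(row).rstrip())
    # The last row is the number line itself.
    lines.append(" ".join(str(v) for v in values))
    return "\n".join(lines)

plot = dot_plot(frequency)
print(plot)
```

Each `*` stands for one participant, so the total number of symbols equals the number of scores in the data set.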
Dot plots are not the only type of frequency distribution.
A frequency distribution is a representation that displays the number of observations within a given interval or category. It is used to show the empirical or theoretical frequency of occurrence of each possible value in a data set, often recorded in a frequency table. Frequency distributions of categorical data are typically presented using a bar graph.
A special type of frequency distribution is the histogram.
A histogram is a graphical illustration of a frequency distribution of a data set that contains numerical data. Histograms have several defining characteristics.
The next step is to make a frequency table showing how many data points lie in each interval.
Interval | Data Points | Frequency |
---|---|---|
1−10 | 4, 8 | 2 |
11−20 | 11, 11, 13, 15, 17, 19 | 6 |
21−30 | 21, 25, 26 | 3 |
31−40 | 37 | 1 |
The histogram can be constructed by drawing a bar over each interval with a height corresponding to the frequency listed in the table.
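As a rough sketch of how the frequency table relates to the bars, the counting can be written in a few lines of Python. The data values and intervals are taken from the table above; the function name `frequency_table` and the text bars are illustrative choices, not something the lesson prescribes.

```python
# Data points and intervals as listed in the frequency table above.
data = [4, 8, 11, 11, 13, 15, 17, 19, 21, 25, 26, 37]
intervals = [(1, 10), (11, 20), (21, 30), (31, 40)]

def frequency_table(values, bins):
    """Count how many values fall in each inclusive interval (low, high)."""
    table = {}
    for low, high in bins:
        table[(low, high)] = sum(1 for v in values if low <= v <= high)
    return table

table = frequency_table(data, intervals)

# Print one text bar per interval; the bar length is the frequency.
for (low, high), count in table.items():
    print(f"{low}-{high}: {'#' * count} ({count})")
```

Because every data point belongs to exactly one interval, the frequencies add up to the size of the data set.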
Another article in Imagineer focused on the upcoming opening of a new screen room in the local theater Movieton. The theater has had two screens for the last 10 years. The bar graph in the article shows the average number of tickets sold on each day of the week throughout the year 2023.
Dependent Variable: Number of tickets sold
Most Ticket Sales: Saturday with 342 tickets
Step 1 | Identify the independent and dependent variables. |
---|---|
Step 2 | List the frequency in each bar. |
Step 3 | Interpret the data and describe the bar graph's shape. Use the interpretation to answer any questions about the data. |
Complete each step one at a time.
First, the independent and dependent variables need to be identified. The horizontal axis lists the days of the week, while the vertical axis represents the average number of tickets sold.
The article analyzes how many tickets are sold on average on different days of a week and which day has the most sales. This means that the independent variable is the day of the week and the dependent variable is the average number of tickets sold on that day.
Next, the frequency in each bar should be listed and interpreted. Use the values given in the bar graph to indicate the height of each bar. Remember, the vertical axis represents the average number of tickets sold on each day.
Day | Frequency |
---|---|
Monday | 64 |
Tuesday | 70 |
Wednesday | 62 |
Thursday | 137 |
Friday | 295 |
Saturday | 342 |
Sunday | 260 |
Lastly, interpret the data.
Fridays and Saturdays have the highest average numbers of tickets sold, 295 and 342 respectively. The most tickets tend to be sold on Saturdays, with an average of 342.
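The same reading of the bar graph can be checked with a short sketch. The values come from the table above; the variable names are chosen only for this illustration.

```python
# Average tickets sold per day, read from the bar graph.
tickets = {
    "Monday": 64,
    "Tuesday": 70,
    "Wednesday": 62,
    "Thursday": 137,
    "Friday": 295,
    "Saturday": 342,
    "Sunday": 260,
}

# The tallest and shortest bars correspond to the largest and smallest values.
busiest_day = max(tickets, key=tickets.get)
quietest_day = min(tickets, key=tickets.get)

print(busiest_day, tickets[busiest_day])
print(quietest_day, tickets[quietest_day])
```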
Step 1 | Choose the number of intervals. |
---|---|
Step 2 | Determine the size of the intervals. |
Step 3 | Make a frequency table. |
Step 4 | Draw the histogram. |
Complete each step one at a time.
The next step is to make a frequency table showing how many data points lie in each interval.
Interval | Data Points | Frequency |
---|---|---|
1−15 | 7, 9, 11, 13, 15 | 5 |
16−30 | 17, 18, 19, 20, 21, 22, 23, 24, 24, 24, 25, 26, 27, 28, 30 | 15 |
31−45 | 32, 35, 37, 39, 42, 45 | 6 |
46−60 | 49, 55, 60 | 3 |
61−75 | 70 | 1 |
The histogram can be constructed by drawing a bar over each interval with a height corresponding to the frequency listed in the table.
Some statistical displays show the spread of a data set instead of its frequency.
A box plot, or box and whisker plot, can be used to illustrate the distribution of a data set. A box plot has three parts.
If a data set has outliers, they are marked as separate points to the left and/or right of the whiskers. A box plot is a scaled figure and is usually presented above a number line. The set of numbers used to draw the box plot is called the five-number summary of the data set. Each of the five numbers is labeled below.
One type of statistical display that is often used for illustrating data sets is the box plot.
The first and third quartiles are marked as the left and right sides of the box plot.
The box plot can be completed by drawing a box between the first and third quartiles Q1 and Q3. Then, the left and right borders of the box are connected by horizontal segments to the minimum and maximum.
The box plot is now complete.
The next article in Imagineer focused on an analysis of air quality data collected from different urban and rural areas around the world. It included a box plot to visualize variations in pollution levels.
Maximum: 98
Median: 63.5
First Quartile: 39
Third Quartile: 83
Now consider the given box plot again.
Concept | Value | Meaning |
---|---|---|
Minimum | 24 | The lowest pollution score among the analyzed areas is 24 out of 100 points. |
Maximum | 98 | The highest pollution score among the analyzed areas is 98 out of 100 points. |
Median | 63.5 | Half of the analyzed areas have a pollution score of 63.5 or lower, and the other half have a score of 63.5 or higher. |
First Quartile | 39 | A quarter of the analyzed areas have a pollution score of 39 or lower. |
Third Quartile | 83 | A quarter of the analyzed areas have a pollution score of 83 or higher. |
Step 1 | Order the data set from least to greatest. Identify the minimum and maximum values. |
---|---|
Step 2 | Determine the median. |
Step 3 | Determine the first and third quartiles. |
Step 4 | Draw the box plot. |
Complete each step one at a time.
Next, mark the median with a vertical line segment inside the range above the number line. Remember that the line for the median falls inside the box.
The first and third quartiles are marked as the left and right sides of the box plot. The box plot can be completed by drawing a box between the quartiles and two horizontal segments between the left and right sides of the box and the minimum and maximum values.
The box plot is complete.
The distribution of a data set shows the arrangement of data values. Here are a few concepts that can be used to describe a distribution.
Concept | Definition |
---|---|
Cluster | Data values that are grouped closely together |
Gap | Numbers that have no data values |
Peak | The most frequently occurring values, or the mode |
Symmetry | How the left side of the distribution looks compared to the right side |
Outlier | A data value that does not seem to fit with the rest of the set |
Consider a distribution displayed with the following dot plot.
Since the data is evenly distributed between the left and right sides, it is a symmetric distribution. It has a cluster of several data values within the 5−9 interval. There are gaps at 4 and 10 because there are no data values at these points on the number line. The value 7 is a peak because it is the most frequently occurring value. There appears to be an outlier at 15.
There are different measures of center and spread available for describing a data distribution. For example, measures of center are mean and median, while measures of spread are interquartile range and mean absolute deviation. Consider the following diagram to determine which measures to use to describe a distribution.
Tadeo was especially interested in an article about Internet usage among teenagers. Curious to learn more, he decided to check out a website referenced in the article.
Choose the appropriate measures to describe the center and spread of the distribution. Describe the shape of the distribution.
Measure of Center: Mean
Measure of Spread: Mean absolute deviation
Measure of Center: Median
Measure of Spread: Interquartile range
The data is fairly evenly distributed between the left and right side, so it is a roughly symmetric distribution. The value 5 is a peak because it is the most frequently occurring value. The data values are clustered around 5.
Next, decide which measures of center and spread are the most appropriate based on the symmetry of the distribution. Remember which measures can be used in each situation.
Distribution | Measure of Center | Measure of Spread |
---|---|---|
Symmetric Distribution | Mean | Mean absolute deviation |
Non-Symmetric Distribution | Median | Interquartile range |
Since this distribution is symmetric, it is best to use the mean as the measure of center and the mean absolute deviation as the measure of spread.
Here, the left side of the data is different from the right side, so the distribution is not symmetric. There are two clusters of data values within the intervals 17−20 and 22−24, separated by a gap at 21. The peak of the data set is at 23.
The most appropriate measures of center and spread can be determined by looking at the symmetry of the distribution. When the distribution is not symmetric, as in this case, it is best to use the median as the measure of center and the interquartile range as the measure of spread.
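To make the two pairs of measures concrete, here is a small sketch that computes all four for the scores from last year's math competition tabulated earlier in the lesson. The helper names are invented for this example. The quartiles are found as medians of the lower and upper halves, leaving out the middle value when the count is odd; that is one common convention and an assumption here, since other conventions exist.

```python
from statistics import mean, median

# Last year's scores, expanded from the frequency table earlier in the lesson.
scores = [1, 3, 4, 4, 6, 6, 6, 7, 7, 7, 8, 8, 9]

def mean_absolute_deviation(values):
    """Average distance of the values from their mean."""
    center = mean(values)
    return mean([abs(v - center) for v in values])

def interquartile_range(values):
    """Q3 - Q1, with quartiles taken as medians of the two halves.

    Assumption: for an odd count, the middle value is left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]
    return median(upper) - median(lower)

# Pair suggested for symmetric distributions.
print(mean(scores), mean_absolute_deviation(scores))
# Pair suggested for non-symmetric distributions.
print(median(scores), interquartile_range(scores))
```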
Analyzing data sets can involve more than just considering the frequency of the data values or analyzing their distribution. There is a statistical display that shows the relationship between two values.
A line graph is used to show how a set of data changes with respect to another quantity, often a period of time. To make a line graph, a scale and intervals for the coordinate axes are chosen. The data points are then graphed and a line connecting the points drawn. Consider a table of values that represents the growth of a plant over several weeks.
Plant Growth | |||||
---|---|---|---|---|---|
Week | 1 | 2 | 3 | 4 | 5 |
Height (in.) | 1.5 | 2.3 | 4 | 6.2 | 8 |
The height data includes values from 1.5 to 8, so a scale from 0 to 10 inches with an interval of 1 inch is reasonable. The horizontal axis can represent time in weeks and the vertical axis can represent the plant height in inches. Now the points can be plotted on a coordinate plane and connected with a line.
Hour | Distance Traveled (mi) |
---|---|
1 | 70 |
2 | 135 |
3 | 203 |
4 | 278 |
5 | 348 |
The horizontal axis can represent time in hours and the vertical axis can represent the distance traveled in miles. The distance data includes values from 70 to 348, so a scale from 0 to 420 miles with an interval of 70 miles is reasonable. Now the points can be plotted on a coordinate plane and connected with line segments.
The line graph representing the distance traveled by Tadeo's family is complete.
Notice that the graph shows an upward slant of the line with a steady increase from 1 to 5 hours. To predict about how long the drive to Tadeo's grandparents will take, follow the trend of the line to extend the graph to a distance of 420 miles.
It can be predicted Tadeo's family will reach their destination after a total of about 6 hours of driving.
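The prediction can be approximated numerically. The sketch below is one possible way to follow the trend, not necessarily how the graph was extended in the lesson: it assumes the family keeps driving at the average rate between the first and last recorded points.

```python
# Table values from the lesson.
hours = [1, 2, 3, 4, 5]
miles = [70, 135, 203, 278, 348]

# Average rate of change (miles per hour) between the first and last points.
rate = (miles[-1] - miles[0]) / (hours[-1] - hours[0])

# Extend the trend from the last recorded point to the 420-mile destination.
target = 420
predicted_hours = hours[-1] + (target - miles[-1]) / rate

print(round(predicted_hours, 1))
```

Rounded to the nearest hour, this agrees with the estimate of about 6 hours read from the extended graph.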
This lesson has presented several statistical displays that can be used to represent a data set. To determine which display to choose, consider the following table.
Type of Display | Best Used to... |
---|---|
Bar Graph | … show values corresponding to specific categories |
Box Plot | … show measures of spread for a data set |
Dot Plot | … show how many times each value occurs in the set |
Histogram | … show the frequency of data divided into equal intervals |
Line Graph | … show change over a period of time or in respect to a different quantity |
Tadeo's grandparents are 83 and 85 years old. Over the years, they have told Tadeo many fascinating stories about their lives. He began to wonder how his grandparents got to live such long and wonderful lives. Later he started wondering about the life expectancy in different countries, so he did a little investigation.
Country | Life Expectancy |
---|---|
United States | 76.3 |
Japan | 84.5 |
Germany | 80.9 |
Brazil | 77.3 |
China | 78.2 |
India | 68.3 |
Australia | 83.3 |
South Africa | 62.4 |
Type of Display | Best Used to... |
---|---|
Bar Graph | … show values corresponding to specific categories |
Box Plot | … show measures of spread for a data set |
Dot Plot | … show how many times each value occurs in the set |
Histogram | … show the frequency of data divided into equal intervals |
Line Graph | … show change over a period of time or in respect to a different quantity |
The given data set lists countries and their corresponding life expectancies. After careful consideration of these types of displays, a bar graph looks like the best choice because it can show data that corresponds to specific categories — in this case, countries and life expectancies.
The countries can be marked on the horizontal axis and the life expectancy can be marked on the vertical axis. Next, draw bars with heights equal to the life expectancies.
Recall the four situations mentioned at the beginning of the lesson.
Situation 1 | A tech company recently launched a new smartphone model. They conduct a survey where customers rate their satisfaction with the phone on a scale of 1 to 10. |
---|---|
Situation 2 | A survey is conducted to gather data on the distribution of ages among participants in a community event. |
Situation 3 | A teacher wants to analyze the distribution of test scores in her class by finding the median and quartiles of the scores. |
Situation 4 | A study tracks the temperature variations over the course of a week in a particular city. |
The first situation involves a survey where a group of customers rate their satisfaction with the phone on a scale of 1 to 10. This situation can be visualized with a diagram where the horizontal axis shows ratings from 1 to 10 with dots placed over the numbers corresponding to the frequency of each given rating. This description fits a dot plot.
In the second situation, data on the distribution of ages of participants in a community event is collected. This data can be illustrated by showing the ages in different intervals on the horizontal axis and the number of people in the corresponding age interval on the vertical axis. This matches the description of a histogram.
The third situation involves analyzing the distribution of test scores in a class. The two remaining statistical displays are a box plot and a line graph. Recall what they are best used for.
Type of Display | Best Used to... |
---|---|
Box Plot | … show measures of spread for a data set |
Line Graph | … show change over a period of time or in respect to a different quantity |
Since the distribution of the test scores is to be illustrated, not the changes in scores over a period of time, a box plot seems like the best-fitting display for the situation.
The last situation involves tracking temperature variations over the course of a week. This data can be displayed with a line graph where the horizontal axis represents the days of the week and the vertical axis represents the temperature.
Let's try drawing the dot plot to make it easier to find the correct one. To draw a dot plot, follow these three steps.
Step 1 | Choose a title for the plot and draw a number line that encompasses all values in the data set. |
---|---|
Step 2 | Determine the frequency for each value in the data set. |
Step 3 | Place dots over each number on the number line to correspond to the frequency for each value. |
Let's complete the steps one at a time.
Let's start with the title of the plot. The given data set lists the numbers of songs in the students' favorite playlists, so we can name our dot plot Number of Songs in Playlist.
Next, we draw a number line that covers all the values in the set. Let's rearrange the set so that the values are in sequential order.
27, 36, 38, 38, 39, 39, 39, 40, 40,
40, 40, 42, 42, 42, 45, 47, 47, 50
The values range from 27 to 50, so we will draw a number line with the numbers from 27 to 50.
The next step is to determine the frequency of each number of songs in the playlists. Let's make a table that shows how often each value appears in the data set.
Number of Songs | Frequency |
---|---|
27 | 1 |
36 | 1 |
38 | 2 |
39 | 3 |
40 | 4 |
42 | 3 |
45 | 1 |
47 | 2 |
50 | 1 |
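The tally in the frequency table can be double-checked with a short sketch using Python's `collections.Counter`. The data values are the ordered set from the exercise; the text output is only a rough stand-in for the dot plot.

```python
from collections import Counter

# Number of songs in each student's favorite playlist, in order.
songs = [27, 36, 38, 38, 39, 39, 39, 40, 40,
         40, 40, 42, 42, 42, 45, 47, 47, 50]

# Counter maps each value to the number of times it appears.
frequency = Counter(songs)

# One row per value, with one '*' per occurrence.
for value in sorted(frequency):
    print(f"{value} | {'*' * frequency[value]}")
```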
Lastly, place the same number of dots over each number of songs as the number of times it appears in the data set. We can use the counts from the frequency table as a guide.
This plot matches the one in option C, so the correct answer is C.
A group of people was asked how many miles they travel each day to work. Here are the answers they gave.
Distance to Work | |||
---|---|---|---|
35 | 15 | 32 | 50 |
28 | 12 | 42 | 45 |
20 | 25 | 40 | 18 |
48 | 10 | 38 | 30 |
Which histogram fits the data?
Let's try drawing the histogram to make it easier to find the correct one. To draw a histogram, follow these four steps.
Step 1 | Choose the number of intervals. |
---|---|
Step 2 | Determine the size of the intervals. |
Step 3 | Make a frequency table. |
Step 4 | Draw the histogram. |
Let's complete each step one at a time.
The first step is determining how many intervals the histogram will have. Remember that each interval must be the same length and all data points must lie within the intervals. First, count the values in the data set.
Distance to Work | |||
---|---|---|---|
(1) 35 | (2) 15 | (3) 32 | (4) 50 |
(5) 28 | (6) 12 | (7) 42 | (8) 45 |
(9) 20 | (10) 25 | (11) 40 | (12) 18 |
(13) 48 | (14) 10 | (15) 38 | (16) 30 |
There are 16 data points. Now, to find a suitable number of intervals, we take the square root of the number of data points: $\sqrt{16} = 4$. Our histogram should have four intervals.
Next, we need to determine the size of the intervals. We can do this by identifying the least and greatest data values in the set. The least value is 10 and the greatest is 50. If we use four intervals with a width of 15, our histogram will cover all numbers from 1 to 60, which encompasses all the data points. The intervals of our histogram will be 1-15, 16-30, 31-45, and 46-60.
The next step is to make a frequency table showing how many data points belong to each interval.
Interval | Data Points | Frequency |
---|---|---|
1-15 | 10, 12, 15 | 3 |
16-30 | 18, 20, 25, 28, 30 | 5 |
31-45 | 32, 35, 38, 40, 42, 45 | 6 |
46-60 | 48, 50 | 2 |
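The choices made so far can be retraced in a short sketch. The square-root rule and the width of 15 starting at 1 are taken from the solution above; rounding the square root when the number of data points is not a perfect square is an assumption added for this example.

```python
import math

# Distances to work (miles) from the exercise.
distances = [35, 15, 32, 50, 28, 12, 42, 45,
             20, 25, 40, 18, 48, 10, 38, 30]

# Square-root rule for the number of intervals.
# Assumption: round to the nearest whole number for non-perfect squares.
num_intervals = round(math.sqrt(len(distances)))

# Width and starting value chosen in the solution above.
width = 15
start = 1

# Build inclusive intervals: 1-15, 16-30, and so on.
intervals = [
    (start + i * width, start + (i + 1) * width - 1)
    for i in range(num_intervals)
]

# Count the data points that fall in each interval.
counts = [sum(low <= d <= high for d in distances) for low, high in intervals]

for (low, high), count in zip(intervals, counts):
    print(f"{low}-{high}: {count}")
```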
Finally, we can create our histogram by drawing a bar over each interval with a height corresponding to the frequency in the table.
The histogram matches option B!
We can draw a box plot by following four steps.
Step 1 | Order the data set from least to greatest value. Find the minimum and maximum. |
---|---|
Step 2 | Determine the median. |
Step 3 | Determine the first and third quartiles. |
Step 4 | Use the five-number summary to draw the box plot. |
Let's complete each step one at a time.
We will start by ordering the given data set from least to greatest value to identify the minimum and maximum values.
25, 28, 30, 30, 35, 40, 42, 45, 47, 50, 52, 55, 58, 60
As we can see, the minimum is 25 and the maximum is 60.
To find the median of the data set, we first need to count how many values there are in the set. There are 14 values. The median is the value that lies in the middle of the ordered data set. Since there are 14 values in this set, the median is the mean of the numbers in the 7th and 8th positions, which are 42 and 45.
$\text{Median} = \dfrac{42 + 45}{2} = 43.5$
The median is 43.5.
The next step is to find the first and third quartiles of the data set. The median divides the set into two smaller sets with 7 values each.
Set 1: 25, 28, 30, 30, 35, 40, 42
Median: 43.5
Set 2: 45, 47, 50, 52, 55, 58, 60
The first quartile is the median of the first half of the data set, the half with the smaller values. For our set, it is the 4th value of Set 1, or 30. The third quartile is the median of the second half of the data set, the half with the greater values. This is the 4th value of Set 2, or 52. Therefore, the first quartile is Q1 = 30 and the third quartile is Q3 = 52.
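The values found in these steps can be verified with a minimal sketch. The function name `five_number_summary` is made up for this example, and the quartiles follow the same approach as the solution: each is the median of one half of the ordered data. For a data set with an odd number of values, this sketch leaves the middle value out of both halves, which is an assumption rather than something the exercise specifies.

```python
from statistics import median

# Data set from the exercise.
data = [25, 28, 30, 30, 35, 40, 42, 45, 47, 50, 52, 55, 58, 60]

def five_number_summary(values):
    """Return minimum, Q1, median, Q3, and maximum of the values.

    Quartiles are the medians of the lower and upper halves.
    Assumption: with an odd count, the middle value is in neither half.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower, upper = ordered[:half], ordered[-half:]
    return {
        "minimum": ordered[0],
        "q1": median(lower),
        "median": median(ordered),
        "q3": median(upper),
        "maximum": ordered[-1],
    }

summary = five_number_summary(data)
print(summary)
```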
To draw the box plot, we will use the five-number summary we have created.
Minimum = 25
Q1 = 30
Median = 43.5
Q3 = 52
Maximum = 60
First, mark the values of minimum and maximum with two vertical segments above a number line that is long enough to encompass all values in the data set. These lines will mark the range of the box plot.
Next, mark the median with another vertical line segment in the range above the number line. This line will fall inside the box.
The first and third quartiles will mark the left and right sides of the box — draw vertical lines at these values, then connect them with horizontal segments at the top and bottom to draw the box. The box plot can be completed by drawing two horizontal segments between the left and right sides of the box and the marks for minimum and maximum values.
Our box plot matches option D!
Ali's father really enjoys shallow water scuba diving. The table shows his depth in feet as time passed during his latest dive.
Time (min) | Depth Below Sea Level (ft) |
---|---|
5 | 10 |
10 | 16 |
15 | 21 |
20 | 28 |
25 | 35 |
30 | 30 |
35 | 38 |
40 | 42 |
Draw a line graph for the given data set. Which graph matches the data?
We need to make a line graph for the given situation. First, consider the given table of values.
Time (min) | Depth Below Sea Level (ft) |
---|---|
5 | 10 |
10 | 16 |
15 | 21 |
20 | 28 |
25 | 35 |
30 | 30 |
35 | 38 |
40 | 42 |
The depth data includes values from 10 to 42, so a scale from 0 to 50 feet with an interval of 4 feet is reasonable. The horizontal axis can represent the time in minutes and the vertical axis can represent the depth below sea level in feet. Now we can plot the points on a coordinate plane and connect them with segments.
We have drawn a line graph that represents Ali's father's latest shallow dive. This graph corresponds to option C.