Consider four different situations that involve a data set.
Situation 1 | A tech company recently launched a new smartphone model. They conduct a survey where customers rate their satisfaction with the phone on a scale of 1 to 10. |
---|---|
Situation 2 | A survey is conducted to gather data on the distribution of ages among participants in a community event. |
Situation 3 | A teacher wants to analyze the distribution of test scores in her class by finding the median and quartiles of the scores. |
Situation 4 | A study tracks the temperature variations over the course of a week in a particular city. |
Tadeo bought the latest edition of the scientific magazine Imagineer, which includes a lot of different graphs and math-related discussions on real-life topics. The first article discussed the results of a recent math competition among high school students.
Range: 8
Shape: Symmetric
Step 1 | Determine the center (median) by finding the middle data point. |
---|---|
Step 2 | Find the maximum and minimum values on the graph. Use these values to calculate the spread (range) of the data. |
Step 3 | Analyze the overall shape of the graph. Note any other interesting features it may have. |
Complete each step one at a time.
First, find the total number of data points in the set by counting all the dots in the given dot plot.
There are 15 data points, which suggests that there were 15 participants in the math competition that earned a score. The median of the data set is the 8th data point because it divides the set into two even halves. Locate this value on the dot plot.
The 8th data point is a 6, which means that the median, or the center, of the data set is 6. This measure indicates the middle value of all the scores earned by the participants.
The next step is to find the maximum and minimum values on the dot plot.
The final step is to analyze the overall shape of the graph.
Notice that the left and right halves of the dot plot are not exactly identical. However, since they still resemble each other very closely, the plot can be considered symmetric in shape.
Step 1 | Title the plot based on the problem. Draw a number line to begin the dot plot, being sure to use values that are appropriate for the data set. |
---|---|
Step 2 | Determine the frequency of each value. |
Step 3 | Place dots over each number on the number line that corresponds to the frequency for each value in the data set. |
Complete each step one at a time.
Title the plot Scores From Last Year. Notice that the scores range from 1 to 9, so the number line should cover at least these values.
The next step is to determine the frequency of each score earned by the participants from last year's math competition. Count how many times each score from 1 to 9 appears in the set and write that number in a table next to the score.
Score | Frequency |
---|---|
1 | 1 |
2 | 0 |
3 | 1 |
4 | 2 |
5 | 0 |
6 | 3 |
7 | 3 |
8 | 2 |
9 | 1 |
Lastly, place dots over each score from 1 to 9 equal to the number of times it appears in the data set. This is where the frequency table is helpful.
The dot plot for the given data set was successfully created.
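The counting step can also be sketched in Python. This is only an illustration: the list `scores` is reconstructed from the frequency table in this lesson, and `frequency_table` is a hypothetical helper name, not part of the lesson.

```python
from collections import Counter

# Scores reconstructed from the frequency table in this lesson.
scores = [1, 3, 4, 4, 6, 6, 6, 7, 7, 7, 8, 8, 9]

def frequency_table(values, low, high):
    """Return {value: count} for every integer from low to high, zeros included."""
    counts = Counter(values)
    return {v: counts[v] for v in range(low, high + 1)}

table = frequency_table(scores, 1, 9)

# Print a sideways dot plot: one "o" per occurrence of each score.
for score, count in table.items():
    print(f"{score} | {'o' * count}")
```

Scores that never occur, such as 2 and 5, still get a row with no dots, which is what makes gaps visible in a dot plot.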
Dot plots are not the only type of frequency distribution.
A frequency distribution is a representation that displays the number of observations within a given interval or category. It is used to show the empirical or theoretical frequency of occurrence of each possible value in a data set, often recorded in a frequency table. Frequency distributions of categorical data are typically presented using a bar graph.
A special type of frequency distribution is the histogram.
A histogram is a graphical illustration of a frequency distribution of a data set that contains numerical data. Histograms have several defining characteristics.
The next step is to make a frequency table showing how many data points lie in each interval.
Interval | Data Points | Frequency |
---|---|---|
1−10 | 4, 8 | 2 |
11−20 | 11, 11, 13, 15, 17, 19 | 6 |
21−30 | 21, 25, 26 | 3 |
31−40 | 37 | 1 |
The histogram can be constructed by drawing a bar over each interval with a height corresponding to the frequency listed in the table.
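As a rough sketch, the frequency table for a histogram can be produced by sorting each value into its interval. The data values are the ones listed in the table above; the function name `bin_counts` and its parameters are assumptions made for this example.

```python
# Data values as listed in the frequency table above.
data = [4, 8, 11, 11, 13, 15, 17, 19, 21, 25, 26, 37]

def bin_counts(values, start, width, n_bins):
    """Count values in n_bins consecutive integer intervals of equal width.

    The first interval runs from start to start + width - 1 (for example 1-10).
    Values outside all intervals are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        index = (v - start) // width
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts

counts = bin_counts(data, start=1, width=10, n_bins=4)

# Text version of the histogram: one "#" per data point in the interval.
for i, count in enumerate(counts):
    low = 1 + i * 10
    print(f"{low}-{low + 9}: {'#' * count}")
```

The same helper works for other interval sizes, such as the width of 15 used later in this lesson.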
Another article in Imagineer focused on the upcoming opening of a new screen room in the local theater Movieton. The theater has had two screens for the last 10 years. The bar graph in the article shows the average number of tickets sold on each day of the week throughout the year 2023.
Dependent Variable: Number of tickets sold
Most Ticket Sales: Saturday with 342 tickets
Step 1 | Identify the independent and dependent variables. |
---|---|
Step 2 | List the frequency in each bar. |
Step 3 | Interpret the data and describe the bar graph's shape. Use the interpretation to answer any questions about the data. |
Complete each step one at a time.
First, the independent and dependent variables need to be identified. The horizontal axis lists the days of the week, while the vertical axis represents the average number of tickets sold.
The article analyzes how many tickets are sold on average on different days of the week and which day has the most sales. This means that the independent variable is the day of the week and the dependent variable is the average number of tickets sold on that day.
Next, the frequency in each bar should be listed and interpreted. Use the values given in the bar graph to indicate the height of each bar. Remember, the vertical axis represents the average number of tickets sold on each day.
Day | Frequency |
---|---|
Monday | 64 |
Tuesday | 70 |
Wednesday | 62 |
Thursday | 137 |
Friday | 295 |
Saturday | 342 |
Sunday | 260 |
Lastly, interpret the data.
Fridays and Saturdays have the highest average ticket sales, 295 and 342 respectively. The greatest number of tickets tends to be sold on Saturdays, with an average of 342 tickets.
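The same reading of the bar graph can be checked with a short Python sketch. The dictionary `tickets` simply copies the values from the table above; the variable names are chosen for this example.

```python
# Average tickets sold per day, as read from the bar graph.
tickets = {
    "Monday": 64, "Tuesday": 70, "Wednesday": 62, "Thursday": 137,
    "Friday": 295, "Saturday": 342, "Sunday": 260,
}

# The key with the largest value is the day with the most sales.
best_day = max(tickets, key=tickets.get)
print(best_day, tickets[best_day])
```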
Step 1 | Choose the number of intervals. |
---|---|
Step 2 | Determine the size of the intervals. |
Step 3 | Make a frequency table. |
Step 4 | Draw the histogram. |
Complete each step one at a time.
The next step is to make a frequency table showing how many data points lie in each interval.
Interval | Data Points | Frequency |
---|---|---|
1−15 | 7, 9, 11, 13, 15 | 5 |
16−30 | 17, 18, 19, 20, 21, 22, 23, 24, 24, 24, 25, 26, 27, 28, 30 | 15 |
31−45 | 32, 35, 37, 39, 42, 45 | 6 |
46−60 | 49, 55, 60 | 3 |
61−75 | 70 | 1 |
The histogram can be constructed by drawing a bar over each interval with a height corresponding to the frequency listed in the table.
Some statistical displays show the spread of a data set instead of its frequency.
A box plot, or box and whisker plot, can be used to illustrate the distribution of a data set. A box plot has three parts.
If a data set has outliers, they are marked as separate points to the left and/or right of the whiskers. A box plot is a scaled figure and is usually presented above a number line. The set of numbers used to draw the box plot is called the five-number summary of the data set. Each of the five numbers is labeled below.
One type of statistical display that is often used for illustrating data sets is the box plot.
The first and third quartiles are marked as the left and right sides of the box plot.
The box plot can be completed by drawing a box between the first and third quartiles Q1 and Q3. Then, the left and right borders of the box are connected by horizontal segments to the minimum and maximum.
The box plot is now complete.
The next article in Imagineer focused on an analysis of air quality data collected from different urban and rural areas around the world. It included a box plot to visualize variations in pollution levels.
Maximum: 98
Median: 63.5
First Quartile: 39
Third Quartile: 83
Now consider the given box plot again.
Concept | Value | Meaning |
---|---|---|
Minimum | 24 | The lowest level of pollution in the analyzed areas earned 24 out of 100 points. |
Maximum | 98 | The highest level of pollution in the analyzed areas earned 98 out of 100 points. |
Median | 63.5 | Half of the analyzed areas have a pollution score of 63.5 or lower, and half have a score of 63.5 or higher. |
First Quartile | 39 | A quarter of the analyzed areas have a pollution score of 39 or lower. |
Third Quartile | 83 | A quarter of the analyzed areas have a pollution score of 83 or higher. |
Step 1 | Order the data set from least to greatest. Identify the minimum and maximum values. |
---|---|
Step 2 | Determine the median. |
Step 3 | Determine the first and third quartiles. |
Step 4 | Draw the box plot. |
Complete each step one at a time.
Next, mark the median with a vertical line segment inside the range above the number line. Remember that the line for the median falls inside the box.
The first and third quartiles are marked as the left and right sides of the box plot. The box plot can be completed by drawing a box between the quartiles and two horizontal segments between the left and right sides of the box and the minimum and maximum values.
The box plot is complete.
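The four steps can be sketched in Python as follows. This is a minimal illustration, not the only way to compute quartiles: it assumes the convention that the quartiles are the medians of the lower and upper halves of the ordered data, with the overall median left out of both halves when the number of values is odd. The function name `five_number_summary` is hypothetical, and the sample data is reconstructed from the score frequency table earlier in this lesson.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for two or more values.

    Quartiles are the medians of the lower and upper halves; for an odd
    number of values the overall median belongs to neither half.
    """
    ordered = sorted(values)          # Step 1: order the data
    half = len(ordered) // 2
    lower = ordered[:half]            # values below the median position
    upper = ordered[-half:]           # values above the median position
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

# Scores reconstructed from the frequency table earlier in the lesson.
scores = [1, 3, 4, 4, 6, 6, 6, 7, 7, 7, 8, 8, 9]
print(five_number_summary(scores))
```

The five numbers returned are exactly the values needed to draw the whiskers, the box, and the median line.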
The distribution of a data set shows the arrangement of data values. Here are a few concepts that can be used to describe a distribution.
Concept | Definition |
---|---|
Cluster | Data values that are grouped closely together |
Gap | Numbers that have no data values |
Peak | The most frequently occurring values, or the mode |
Symmetry | How the left side of the distribution looks compared to the right side |
Outlier | A data value that does not seem to fit with the rest of the set |
Consider a distribution displayed with the following dot plot.
Since the data is evenly distributed between the left and right sides, it is a symmetric distribution. It has a cluster of several data values within the 5−9 interval. There are gaps at 4 and 10 because there are no data values at these points on the number line. The value 7 is a peak because it is the most frequently occurring value. There appears to be an outlier at 15.
There are different measures of center and spread available for describing a data distribution. For example, mean and median are measures of center, while interquartile range and mean absolute deviation are measures of spread. Consider the following diagram to determine which measures to use to describe a distribution.
Tadeo was especially interested in an article about Internet usage among teenagers. Curious to learn more, he decided to check out a website referenced in the article.
Choose the appropriate measures to describe the center and spread of the distribution. Describe the shape of the distribution.
Measure of Center: Mean
Measure of Spread: Mean absolute deviation
Measure of Center: Median
Measure of Spread: Interquartile range
The data is fairly evenly distributed between the left and right side, so it is a roughly symmetric distribution. The value 5 is a peak because it is the most frequently occurring value. The data values are clustered around 5.
Next, decide which measures of center and spread are the most appropriate based on the symmetry of the distribution. Remember which measures can be used in each situation.
Distribution | Measure of Center | Measure of Spread |
---|---|---|
Symmetric Distribution | Mean | Mean absolute deviation |
Non-Symmetric Distribution | Median | Interquartile range |
Since this distribution is symmetric, it is best to use the mean as the measure of center and the mean absolute deviation as the measure of spread.
Here, the left side of the data is different from the right side, so the distribution is not symmetric. There are two clusters of data values within the intervals 17−20 and 22−24, separated by a gap at 21. The peak of the data set is at 23.
The most appropriate measures of center and spread can be determined by looking at the symmetry of the distribution. When the distribution is not symmetric, as in this case, it is best to use the median as the measure of center and the interquartile range as the measure of spread.
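For a symmetric distribution, the mean and the mean absolute deviation can be computed as in the sketch below. The data set `sample` is hypothetical and used only for illustration, and `mean_absolute_deviation` is a helper name introduced for this example.

```python
from statistics import mean

def mean_absolute_deviation(values):
    """Average distance of the values from their mean."""
    center = mean(values)
    return mean(abs(v - center) for v in values)

# Hypothetical symmetric data set, used only for illustration.
sample = [3, 4, 5, 5, 5, 6, 7]

print(mean(sample))                     # measure of center
print(mean_absolute_deviation(sample))  # measure of spread
```

For a distribution that is not symmetric, the median and the interquartile range would be used instead, because they are less affected by values far out on one side.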
Analyzing data sets can involve more than just considering the frequency of the data values or analyzing their distribution. There is a statistical display that shows the relationship between two values.
A line graph is used to show how a set of data changes with respect to another quantity, often a period of time. To make a line graph, a scale and intervals for the coordinate axes are chosen. The data points are then graphed and a line connecting the points drawn. Consider a table of values that represents the growth of a plant over several weeks.
Plant Growth | |||||
---|---|---|---|---|---|
Week | 1 | 2 | 3 | 4 | 5 |
Height (in.) | 1.5 | 2.3 | 4 | 6.2 | 8 |
The height data includes values from 1.5 to 8, so a scale from 0 to 10 inches with an interval of 1 inch is reasonable. The horizontal axis can represent time in weeks and the vertical axis can represent the plant height in inches. Now the points can be plotted on a coordinate plane and connected with a line.
Hour | Distance Traveled (mi) |
---|---|
1 | 70 |
2 | 135 |
3 | 203 |
4 | 278 |
5 | 348 |
The horizontal axis can represent time in hours and the vertical axis can represent the distance traveled in miles. The distance data includes values from 70 to 348, so a scale from 0 to 420 miles with an interval of 70 miles is reasonable. Now the points can be plotted on a coordinate plane and connected with line segments.
The line graph representing the distance traveled by Tadeo's family is complete.
Notice that the graph shows an upward slant of the line with a steady increase from 1 to 5 hours. To predict about how long the drive to Tadeo's grandparents will take, follow the trend of the line to extend the graph to a distance of 420 miles.
It can be predicted Tadeo's family will reach their destination after a total of about 6 hours of driving.
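Following the trend of the line can be approximated numerically. The sketch below assumes the family keeps driving at the average speed of the recorded hours, which is one simple way to extend the trend; the variable names are chosen for this example.

```python
hours = [1, 2, 3, 4, 5]
miles = [70, 135, 203, 278, 348]

# Average speed over the recorded part of the trip, in miles per hour.
speed = (miles[-1] - miles[0]) / (hours[-1] - hours[0])

# Extend the trend from the last recorded point to the 420-mile mark.
target = 420
predicted_hours = hours[-1] + (target - miles[-1]) / speed
print(round(predicted_hours, 1))
```

The result is close to 6 hours, which agrees with the estimate read from the extended graph.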
This lesson has presented several statistical displays that can be used to represent a data set. To determine which display to choose, consider the following table.
Type of Display | Best Used to... |
---|---|
Bar Graph | … show values corresponding to specific categories |
Box Plot | … show measures of spread for a data set |
Dot Plot | … show how many times each value occurs in the set |
Histogram | … show the frequency of data divided into equal intervals |
Line Graph | … show change over a period of time or with respect to a different quantity |
Tadeo's grandparents are 83 and 85 years old. Over the years, they have told Tadeo many fascinating stories about their lives. He began to wonder how his grandparents got to live such long and wonderful lives. Later he started wondering about the life expectancy in different countries, so he did a little investigation.
Country | Life Expectancy |
---|---|
United States | 76.3 |
Japan | 84.5 |
Germany | 80.9 |
Brazil | 77.3 |
China | 78.2 |
India | 68.3 |
Australia | 83.3 |
South Africa | 62.4 |
The given data set lists countries and their corresponding life expectancies. After careful consideration of these types of displays, a bar graph looks like the best choice because it can show data that corresponds to specific categories — in this case, countries and life expectancies.
The countries can be marked on the horizontal axis and the life expectancy can be marked on the vertical axis. Next, draw bars with heights equal to the life expectancies.
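A quick text version of the bar graph can be sketched in Python. Note that this sketch draws horizontal bars for convenience, whereas the description above uses vertical bars; the values are copied from the life expectancy table, and the scale of one `#` per 2 years is an arbitrary choice.

```python
life_expectancy = {
    "United States": 76.3, "Japan": 84.5, "Germany": 80.9, "Brazil": 77.3,
    "China": 78.2, "India": 68.3, "Australia": 83.3, "South Africa": 62.4,
}

# One "#" per 2 years (rounded), so longer bars mean higher life expectancy.
for country, years in life_expectancy.items():
    bar = "#" * round(years / 2)
    print(f"{country:<14}{bar} {years}")

# The categories with the tallest and shortest bars.
longest = max(life_expectancy, key=life_expectancy.get)
shortest = min(life_expectancy, key=life_expectancy.get)
print(longest, shortest)
```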
Recall the four situations mentioned at the beginning of the lesson.
Situation 1 | A tech company recently launched a new smartphone model. They conduct a survey where customers rate their satisfaction with the phone on a scale of 1 to 10. |
---|---|
Situation 2 | A survey is conducted to gather data on the distribution of ages among participants in a community event. |
Situation 3 | A teacher wants to analyze the distribution of test scores in her class by finding the median and quartiles of the scores. |
Situation 4 | A study tracks the temperature variations over the course of a week in a particular city. |
The first situation involves a survey where a group of customers rate their satisfaction with the phone on a scale of 1 to 10. This situation can be visualized with a diagram where the horizontal axis shows ratings from 1 to 10 with dots placed over the numbers corresponding to the frequency of each given rating. This description fits a dot plot.
In the second situation, data on the distribution of ages of participants in a community event is collected. This data can be illustrated by showing the ages in different intervals on the horizontal axis and the number of people in the corresponding age interval on the vertical axis. This matches the description of a histogram.
The third situation involves analyzing the distribution of test scores in a class. The two remaining statistical displays are a box plot and a line graph. Recall what they are best used for.
Type of Display | Best Used to... |
---|---|
Box Plot | … show measures of spread for a data set |
Line Graph | … show change over a period of time or with respect to a different quantity |
Since the distribution of the test scores is to be illustrated, not the changes in scores over a period of time, a box plot seems like the best-fitting display for the situation.
The last situation includes tracking temperature variations over the course of a week. This data can be demonstrated with a line graph where the horizontal axis represents the days of the week and the vertical axis represents the temperature.
Consider a dot plot showing the average number of hours that a group of surveyed people slept.
Let's begin by recalling the definition of the median of a data set.
Median: The median of a list of values is the value appearing at the center of a sorted version of the list, or the mean of the two central values if the list contains an even number of values.
First, use the given dot plot to write the ordered data set.
5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 9
There are 15 values in the data set. This means that the median is the middle, or 8th, data point. The 8th value in the ordered list is 6, so the median is 6. Now let's find the mode of the data set.
Mode: The mode of a data set is the number or numbers that occur most often in the set.
We can use the dot plot to identify these values in our data set. We simply need to identify which value has the most dots marked above it!
The value that occurs most often in our data set is 6. Therefore, the mode of the set is 6.
Let's find the range of the given data values.
Range: The range of a data set is the difference between the greatest and least data values in the set.
To find the range, we need to find the difference between the largest and the smallest values.
Ordered Data Set | 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 9 |
---|---|
Range | 9 - 5 = 4 |
Our data values are between 5 and 9, so the range of the set is 4. Now let's remember the definition of an outlier in a data set.
Outlier: An outlier is a data point that is significantly different from the other values in a data set. It can be significantly larger or significantly smaller than the others.
From the dot plot, we can see that no single value lies significantly far from the rest of the data set. This means that there are no outliers in the data set.
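These results can be double-checked with Python's built-in `statistics` module. The list `hours_slept` is the ordered data set read from the dot plot; the variable names are chosen for this sketch.

```python
from statistics import median, mode

# Ordered data set read from the dot plot.
hours_slept = [5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 9]

center = median(hours_slept)                  # middle (8th) value
most_common = mode(hours_slept)               # value with the most dots
spread = max(hours_slept) - min(hours_slept)  # greatest minus least

print(center, most_common, spread)
```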
Consider a histogram that illustrates the women's cycling time trials for the Olympics.
Let's start by describing the given histogram.
The height of each bar represents the number of cyclists that finished the race in that time interval. We can find the total number of participants in the trials by adding together the heights of the bars. 9 + 7 + 5 + 3 = 24 There are 24 cyclists who participated in the time trials. There is no bar for the time between 40 and 44 minutes, which means that no cyclists finished the race in 40 to 44 minutes. More cyclists finished the race in 45 to 49 minutes than in any other interval. To find the interval that has 5 cyclists, we will look for the bar that has a height of 5.
We can see that the third bar has a height of 5. The corresponding interval is 55 - 59 minutes.
We want to use the given histogram to find how many cyclists had a time of less than 60 minutes. Let's look at the intervals on the histogram and consider the bars that correspond to times of less than 60 minutes.
Let's add up the heights of the bars in the first four intervals to find the number of cyclists who finished the race in less than 60 minutes. 0 + 9 + 7 + 5 = 21 We found that 21 cyclists finished the trials in less than 60 minutes.
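Both sums can be written as a short Python sketch. The bar heights come from the solution above. The 50-54 interval is inferred from the order of the bars, and the 60-64 label for the last bar is an assumption, since the text gives only its height.

```python
# Bar heights keyed by (low, high) interval in minutes.
# The (60, 64) label is an assumption; only the height 3 is given in the text.
cyclists = {(40, 44): 0, (45, 49): 9, (50, 54): 7, (55, 59): 5, (60, 64): 3}

# Total number of cyclists: add the heights of all bars.
total = sum(cyclists.values())

# Cyclists under 60 minutes: add only the bars whose interval ends below 60.
under_60 = sum(count for (low, high), count in cyclists.items() if high < 60)

print(total, under_60)
```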
The number of calories in a serving of certain foods that Ali eats is displayed in the box plot.
We want to find the median of the given box plot. Recall that the median is represented by the vertical segment inside the box.
The vertical segment is right above 100, so the median is 100.
To identify the quartiles, we look at the beginning and the end of the box. The beginning is the first quartile and the end is the third quartile.
The first point is at 50, so the first quartile is 50. The third quartile looks to be halfway between 150 and 200, so it equals 175.
The interquartile range is the length of the box. We can find it by subtracting the first quartile from the third quartile. IQR = 175 - 50 ⇒ IQR = 125 The interquartile range is 125. Finally, let's find the range of the data set. To do so, we start by identifying the least and the greatest values on the box plot.
The lowest value is 25 and the greatest value is 375. Let's subtract the lowest value from the greatest to find the range. 375 - 25 = 350 Therefore, the range is 350.
Remember that an outlier is a data point that is significantly different from the other values in the data set.
There is one value that lies far apart from the rest of the data points and is marked with an asterisk. This means that there is an outlier at 375.
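The box plot marks the outlier visually. One common convention for checking an outlier numerically, which is not stated in this lesson and is used here only as an illustration, flags values that lie more than 1.5 times the interquartile range beyond the quartiles. The names `lower_fence`, `upper_fence`, and `is_outlier` are chosen for this sketch.

```python
# Quartiles read from the box plot.
q1, q3 = 50, 175
iqr = q3 - q1

# Common 1.5 * IQR fences; values outside them are flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

def is_outlier(value):
    """Return True if value lies outside the 1.5 * IQR fences."""
    return value < lower_fence or value > upper_fence

print(iqr, upper_fence, is_outlier(375))
```

Under this convention the value 375 lies above the upper fence, which agrees with the asterisk on the box plot, while the minimum of 25 is not flagged.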
Consider the line graph representing the world's tropical rainforests across the decades.
Let's analyze the given line graph to describe the trend in the world's rainforests.
Because the line graph is slanted downward as it goes from left to right, we can say that the area of tropical rainforests decreased each decade. At first, the area decreased slowly, but from 1980 the remaining rainforest area began to decrease dramatically. This means that the area of the world's rainforests decreased at a slower rate from 1950 to 1980 than from 1980 to 2020.
From Part A, we know that the area of the world's tropical rainforests decreases each decade. We can predict that the remaining rainforest area will continue to decrease from 2020 to 2030. Let's extend the line graph to estimate our prediction.
We can predict that in 2030 there will be about 200 million acres of rainforests.