12 Theory slides
9 Exercises - Grade E - A
Each lesson is meant to take 1-2 classroom sessions
Tadeo and Ramsha are foodies and love to explore the restaurants that pop up in their neighborhood. They have recorded data about the average price of the main dishes in each restaurant using a table of values.
Average Main Dish Price (Dollars) | ||||
---|---|---|---|---|
10.12 | 9.29 | 8.29 | 9.78 | 10.69 |
9.68 | 12.09 | 8.94 | 10.81 | 8.62 |
11.39 | 12.62 | 8.71 | 10.74 | 10.52 |
10.77 | 10.15 | 9.18 | 8.45 | 9.52 |
11.89 | 9.77 | 9.44 | 13.24 | 11.01 |
10.62 | 9.38 | 12.15 | 9.68 | 9.60 |
10.32 | 11.31 | 11.41 | 8.62 | 9.27 |
10.96 | 9.18 | 10.28 | 10.71 | 10.02 |
They would like to draw some conclusions from this data. However, they are not entirely sure how to proceed with this task. Find the following information to help these curious connoisseurs!
A frequency distribution, sometimes called a histogram distribution, is a representation that displays the number of observations within a given interval. It is used to show the empirical or theoretical frequency of occurrence of each possible value in a data set, often recorded in a frequency table. Frequency distributions of categorical data are typically presented using a bar graph.
In the case of numerical data, the graphical representation of a frequency distribution is called a histogram.
A symmetric frequency distribution is a distribution in which the data are evenly distributed around the mean, so bars at the same distance on either side of the middle are approximately the same height.
A skewed frequency distribution is a distribution in which the data is not spread evenly; rather, it is clustered at one end. In this case, the mean and the median are generally not equal, because the values in the long tail pull the mean toward that end. A skewed distribution is neither symmetric nor normal. In general, there are two types of skewed frequency distributions.
Skewed Distribution | Description |
---|---|
Skewed Left / Negatively Skewed | The distribution has a long left tail and the median is greater than the mean. |
Skewed Right / Positively Skewed | The distribution has a long right tail and the median is less than the mean. |
Tadeo and Ramsha are having fun learning about frequency distributions. Last weekend, two of the most exciting cricket games this season took place. Tadeo and Ramsha recorded the runs scored by the 22 players in each match. The data for Games 1 and 2 are shown in the table.
Game 1 | | | | Game 2 | | | |
---|---|---|---|---|---|---|---|
32 | 21 | 27 | 46 | 114 | 87 | 96 | 92 |
9 | 16 | 19 | 19 | 101 | 111 | 80 | 106 |
40 | 28 | 42 | 36 | 85 | 112 | 117 | 94 |
11 | 38 | 23 | 28 | 62 | 43 | 106 | 66 |
8 | 18 | 26 | 59 | 104 | 51 | 76 | 91 |
62 | 40 | | | 111 | 78 | | |
Cricket Runs Scored in Game 1 | |
---|---|
Number of Runs Scored | Frequency |
0−9 | 2 |
10−19 | 5 |
20−29 | 6 |
30−39 | 3 |
40−49 | 4 |
50−59 | 1 |
60−69 | 1 |
To draw the histogram, the horizontal axis will be the Number of Runs Scored and the vertical axis the Frequency. Then the bars will be drawn to represent the frequency of each interval.
Cricket Runs Scored in Game 2 | |
---|---|
Number of Runs Scored | Frequency |
40−49 | 1 |
50−59 | 1 |
60−69 | 2 |
70−79 | 2 |
80−89 | 3 |
90−99 | 4 |
100−109 | 4 |
110−119 | 5 |
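To check the two frequency tables against the raw scores, the counting can be sketched in Python. This is a minimal sketch, assuming whole-number runs and classes of width 10 that start at 0 for Game 1 and at 40 for Game 2, as in the tables; the function name `frequency_table` is hypothetical.

```python
def frequency_table(data, start, width, num_classes):
    """Count the values in each class [start + k*width, start + (k+1)*width)."""
    counts = [0] * num_classes
    for value in data:
        k = (value - start) // width  # index of the class that contains value
        if 0 <= k < num_classes:
            counts[k] += 1
    return counts


# Runs scored by the 22 players in each game, read row by row from the table
game1 = [32, 21, 27, 46, 9, 16, 19, 19, 40, 28, 42,
         36, 11, 38, 23, 28, 8, 18, 26, 59, 62, 40]
game2 = [114, 87, 96, 92, 101, 111, 80, 106, 85, 112, 117,
         94, 62, 43, 106, 66, 104, 51, 76, 91, 111, 78]

for name, data, start, n in [("Game 1", game1, 0, 7), ("Game 2", game2, 40, 8)]:
    print(name)
    for k, count in enumerate(frequency_table(data, start, 10, n)):
        low = start + 10 * k
        print(f"  {low}-{low + 9}: {count}")
```

The counts reproduce both tables. Read top to bottom, Game 1's largest classes are near the start of its range while Game 2's are near the end, which is the difference in shape the two histograms make visible.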
Tadeo and Ramsha are amazed by how data is presented everywhere and how knowing the distribution of the data helps to interpret that data. Besides cricket, they also love watching NFL games and are fans of Peyton Manning, the famous quarterback who retired shortly before turning 40. Now they want to analyze the retirement ages of NFL players by collecting some data.
Retirement Age of NFL Players | |
---|---|
Age | Frequency |
25−26 | 33 |
27−28 | 67 |
29−30 | 93 |
31−32 | 109 |
33−34 | 127 |
35−36 | 114 |
37−38 | 80 |
39−40 | 59 |
41−42 | 43 |
Based on this frequency table, Tadeo and Ramsha created a histogram in order to draw some conclusions about the data.
To do so, the vertical axis will be the Frequency and the horizontal axis the Age. Next, bars will be plotted to represent the frequency of data points falling in each interval.
It can be seen that the distribution is centered on the interval 33−34, where the tallest bar is, so the mean of the distribution lies in that interval. This suggests that a typical NFL player in this data set retires at around age 33 or 34.
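Because the table gives only class frequencies, the mean can only be estimated. One common approach, sketched below, assumes that every player in a class retired at the class midpoint (for example, 25.5 for the class 25−26, assuming ages are recorded as whole numbers). The helper names `grouped_mean` and `median_class` are hypothetical.

```python
# Retirement-age classes from the table: (lowest age, highest age, frequency)
classes = [(25, 26, 33), (27, 28, 67), (29, 30, 93), (31, 32, 109),
           (33, 34, 127), (35, 36, 114), (37, 38, 80), (39, 40, 59),
           (41, 42, 43)]


def grouped_mean(classes):
    """Estimate the mean by treating every value in a class as its midpoint."""
    total = sum(f for _, _, f in classes)
    weighted = sum((lo + hi) / 2 * f for lo, hi, f in classes)
    return weighted / total


def median_class(classes):
    """Return the class that contains the middle observation."""
    total = sum(f for _, _, f in classes)
    middle = (total + 1) / 2  # position of the median in the ordered data
    running = 0
    for lo, hi, f in classes:
        running += f
        if running >= middle:
            return (lo, hi)


print(round(grouped_mean(classes), 2))  # estimated mean retirement age
print(median_class(classes))            # class holding the median
```

The estimated mean is about 33.5 years, and the middle (363rd) of the 725 players falls in the 33−34 class, which agrees with the reading of the histogram.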
There are special distributions that are less common than skewed and symmetric distributions. These distributions may appear in situations such as an experiment where each outcome has the same probability or a sample taken from two separate populations. These are the uniform and bimodal distributions.
A uniform frequency distribution, sometimes called a flat distribution, is a type of distribution where all the bars are about the same height. This type of distribution arises in scenarios where all the possible outcomes are equally likely. A uniform distribution is also symmetric.
A bimodal distribution is a data distribution whose values cluster near two individual values or two intervals, separating the data into two clusters. This causes the histogram of the data to have two peaks. When the two clusters are roughly balanced, the mean and the median lie near the center of the distribution, between the two peaks.
A bimodal distribution suggests that the sample was likely drawn from two different populations. The term bimodal refers to the two peaks of the distribution; when the data are grouped into intervals, these peaks need not coincide with the mode of the data. It is worth mentioning that a bimodal distribution whose two sides are approximate mirror images of each other is also symmetric.
Consider a histogram that shows the attendance per hour at a local restaurant.
Color | Frequency |
---|---|
Red | 164 |
Blue | 168 |
Yellow | 168 |
Pink | 166 |
Green | 165 |
Orange | 169 |
Tadeo collected the other data set from a survey about the exam scores of their classmates.
Exam Score | Frequency |
---|---|
70−71 | 3 |
72−73 | 8 |
74−75 | 11 |
76−77 | 9 |
78−79 | 4 |
80−81 | 2 |
82−83 | 2 |
84−85 | 4 |
86−87 | 8 |
88−89 | 11 |
90−91 | 9 |
92−93 | 7 |
94−95 | 2 |
They now have two data sets and would like to display the data in histograms to analyze them.
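The visual checks used in this lesson (bars of about the same height, number of peaks) can be approximated in code. The sketch below is only a rough aid: the 5% tolerance for "about the same height" and the rule that a peak is a bar taller than both of its neighbors are assumptions, not standard definitions, and the helper names are hypothetical.

```python
def is_roughly_uniform(freqs, tolerance=0.05):
    """Bars count as 'about the same height' if they differ by at most
    `tolerance` times the tallest bar (an assumed threshold)."""
    return (max(freqs) - min(freqs)) <= tolerance * max(freqs)


def count_peaks(freqs):
    """Count bars strictly taller than both neighbors (a missing neighbor counts as 0)."""
    padded = [0] + list(freqs) + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )


colors = [164, 168, 168, 166, 165, 169]           # Red, Blue, Yellow, Pink, Green, Orange
exam = [3, 8, 11, 9, 4, 2, 2, 4, 8, 11, 9, 7, 2]  # classes 70-71 through 94-95

print("colors:", is_roughly_uniform(colors), count_peaks(colors))
print("exam:  ", is_roughly_uniform(exam), count_peaks(exam))
```

Check uniformity first: in nearly flat data such as the color counts, tiny differences create "peaks" that mean nothing. With small samples, a single bump can also register as an extra peak, so the count should always be compared against the histogram itself.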
How many peaks does the histogram have? Draw a vertical line through the middle of the distribution.
For the first data set, the horizontal axis will be the Color and the vertical axis the Frequency. Then draw the bars to represent the frequency of each color outcome.
For the second data set, the horizontal axis will be the Exam Score and the vertical axis will be the Frequency.
The histogram of the exam scores has two peaks. Additionally, these two peaks split the data into two clusters. This means that the data follows a bimodal distribution. Moreover, if a vertical line is drawn through the middle of the distribution, the data on the left is an approximate mirror image of the data on the right.
This means that the distribution is bimodal and symmetric. A possible explanation for this data is that it comes from two groups: one group of students who did not study for the exam (the first peak, on the left) and one group that did study for the exam (the second peak, on the right).
A box plot is another data display that shows the shape of a frequency distribution. The lengths of the whiskers and the position of the median within the box tell whether the distribution is skewed or symmetric.
During their fantastic journey exploring the restaurants in their neighborhood, Tadeo and Ramsha found a fabulous Italian restaurant. While eating their food, they observed the people who entered the restaurant.
The two are curious about the average age of people eating at this restaurant. Therefore, they decide to collect data on the ages of people who enter the restaurant during a typical day.
Ages of People Who Enter the Italian Restaurant on a Typical Day | |||||
---|---|---|---|---|---|
15 | 53 | 55 | 60 | 38 | 56 |
62 | 14 | 44 | 24 | 32 | 10 |
42 | 54 | 47 | 67 | 60 | 50 |
61 | 30 | 30 | 62 | 62 | 65 |
56 | 52 | 35 | 25 | 34 | 32 |
They now want to draw some conclusions from this data set by displaying it in a box plot.
Five Number Summary | |
---|---|
Minimum Value | 10 |
First Quartile | 32 |
Median | 48.5 |
Third Quartile | 60 |
Maximum Value | 67 |
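The five-number summary above can be reproduced with a short Python sketch. It assumes the quartile method in which Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving out the overall median when the number of values is odd; other quartile methods (including some library defaults) can give slightly different values. The function names are hypothetical.

```python
def median(sorted_values):
    """Median of an already sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(data):
    """Min, Q1, median, Q3, max using medians of the lower and upper halves."""
    values = sorted(data)
    n = len(values)
    lower = values[: n // 2]        # values below the median position
    upper = values[(n + 1) // 2 :]  # values above the median position
    return values[0], median(lower), median(values), median(upper), values[-1]


# Ages of the 30 people, read row by row from the table
ages = [15, 53, 55, 60, 38, 56, 62, 14, 44, 24, 32, 10, 42, 54, 47,
        67, 60, 50, 61, 30, 30, 62, 62, 65, 56, 52, 35, 25, 34, 32]

print(five_number_summary(ages))
```

The result matches the table: 10, 32, 48.5, 60, and 67.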
Now, draw a box from the first quartile to the third quartile. Then, draw a line through the median and the whiskers from the box to the minimum and maximum values.
Notice that this plot corresponds to option A.
Therefore, the correct option is the one stating that about 50% of the people who enter the Italian restaurant on a typical day are between 32 and 60 years old.
This gives the two correct statements about the girls' data set.
Appropriate Measures of Center | |
---|---|
Girls' Data Set | Boys' Data Set |
Mean | Median |
To identify who is more likely to spend more time on video games, compare these measures of center. Since the girls' distribution is symmetric, its mean is probably in the 6.1−8 interval, the center of the distribution.
Conversely, because the boys' distribution is skewed, its median lies to the right of the center, closer to the peaks of the distribution, so it is probably in the 8.1−10 interval.
Ramsha and Tadeo took a survey of their classmates about the number of hours spent playing video games on the weekends. These results made Ramsha more interested in finding other everyday situations with remarkable differences due to gender. She decided to search the web for similar examples.
During her investigation, Ramsha found a peculiar table showing the results of a survey comparing the amount of money that men and women usually spend on clothes per month.
Statistic | Women | Men |
---|---|---|
Survey Size | 100 | 100 |
Minimum | $18 | $8 |
Maximum | $60 | $28 |
1st Quartile | $30 | $14 |
Median | $34 | $18 |
3rd Quartile | $40 | $22 |
Mean | $36 | $18 |
Standard Deviation | $8 | $4 |
Next, draw the box for each plot using the first and third quartiles. Finally, draw a line through the median and the whiskers from the box to the minimum and maximum values of each data set.
Notice that this corresponds to the box-plot in option D.
Group | Standard Deviation | Interquartile Range |
---|---|---|
Women | $8 | 40−30=$10 |
Men | $4 | 22−14=$8 |
Both the standard deviation and the interquartile range are greater for women. This means that there is more variability in the amount of money spent by women.
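The variability comparison, along with a check of each box plot's shape, can be sketched from the summary table alone. The snippet assumes the values in the survey table are exact; it computes the interquartile range (Q3 − Q1), the range (maximum − minimum), and the length of each whisker. The helper name `describe` is hypothetical.

```python
# Summary statistics from the survey table (dollars spent on clothes per month)
summaries = {
    "Women": {"min": 18, "q1": 30, "median": 34, "q3": 40, "max": 60},
    "Men": {"min": 8, "q1": 14, "median": 18, "q3": 22, "max": 28},
}


def describe(s):
    """Return (IQR, range, left whisker length, right whisker length)."""
    iqr = s["q3"] - s["q1"]
    total_range = s["max"] - s["min"]
    left_whisker = s["q1"] - s["min"]
    right_whisker = s["max"] - s["q3"]
    return iqr, total_range, left_whisker, right_whisker


for group, s in summaries.items():
    iqr, total_range, left, right = describe(s)
    print(f"{group}: IQR = ${iqr}, range = ${total_range}, "
          f"whiskers = {left} (left) / {right} (right)")
```

The women's right whisker (20) is much longer than the left (12), suggesting a tail toward higher spending, which fits their mean ($36) being above their median ($34). The men's whiskers are equal and their mean equals their median, which suggests a roughly symmetric distribution.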
All the pieces needed to analyze data using histograms have been covered. This method of displaying data makes it easier to find the data distribution and determine the best measures of center and variation to describe the data set. Recall the data Tadeo and Ramsha recorded at the beginning of the lesson about the main dishes of the restaurants in their neighborhood.
Average Main Dish Price (Dollars) | ||||
---|---|---|---|---|
10.12 | 9.29 | 8.29 | 9.78 | 10.69 |
9.68 | 12.09 | 8.94 | 10.81 | 8.62 |
11.39 | 12.62 | 8.71 | 10.74 | 10.52 |
10.77 | 10.15 | 9.18 | 8.45 | 9.52 |
11.89 | 9.77 | 9.44 | 13.24 | 11.01 |
10.62 | 9.38 | 12.15 | 9.68 | 9.60 |
10.32 | 11.31 | 11.41 | 8.62 | 9.27 |
10.96 | 9.18 | 10.28 | 10.71 | 10.02 |
The two students wanted to draw some insights and conclusions based on this data. However, they were not entirely sure how to proceed with this task. Find the following information to help these curious connoisseurs!
Average Main Dish Price (Dollars) | |
---|---|
Price Range | Frequency |
8.00−8.99 | 6 |
9.00−9.99 | 12 |
10.00−10.99 | 13 |
11.00−11.99 | 5 |
12.00−12.99 | 3 |
13.00−13.99 | 1 |
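Counting the prices into $1 classes can be sketched by rounding each price down to a whole dollar: `math.floor` places a price such as 9.78 in the 9.00−9.99 class. This is a minimal sketch; the list below is the 40 prices read row by row from the table.

```python
import math

prices = [10.12, 9.29, 8.29, 9.78, 10.69, 9.68, 12.09, 8.94, 10.81, 8.62,
          11.39, 12.62, 8.71, 10.74, 10.52, 10.77, 10.15, 9.18, 8.45, 9.52,
          11.89, 9.77, 9.44, 13.24, 11.01, 10.62, 9.38, 12.15, 9.68, 9.60,
          10.32, 11.31, 11.41, 8.62, 9.27, 10.96, 9.18, 10.28, 10.71, 10.02]

counts = {}
for p in prices:
    dollars = math.floor(p)  # e.g. 9.78 -> 9, the class 9.00-9.99
    counts[dollars] = counts.get(dollars, 0) + 1

for dollars in sorted(counts):
    print(f"{dollars}.00-{dollars}.99: {counts[dollars]}")
```

The printed counts match the frequency table, and they add up to 40.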
To draw the histogram, the horizontal axis will be the Price Range and the vertical axis the Frequency. Then, draw the bars to represent the frequency of each interval.
Distribution | Measure of Center | Measure of Variation |
---|---|---|
Symmetric | Mean | Standard deviation |
Skewed | Median | Five-number summary |
Because the distribution of the prices is skewed right, with a tail extending toward the higher prices, the median and the five-number summary best describe the center and variation of the data, respectively.
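A quick way to see why the median is preferred here is to compute both measures of center. In a distribution skewed right, the long tail of higher prices pulls the mean above the median. The sketch below uses Python's standard `statistics` module on the same 40 prices.

```python
import statistics

prices = [10.12, 9.29, 8.29, 9.78, 10.69, 9.68, 12.09, 8.94, 10.81, 8.62,
          11.39, 12.62, 8.71, 10.74, 10.52, 10.77, 10.15, 9.18, 8.45, 9.52,
          11.89, 9.77, 9.44, 13.24, 11.01, 10.62, 9.38, 12.15, 9.68, 9.60,
          10.32, 11.31, 11.41, 8.62, 9.27, 10.96, 9.18, 10.28, 10.71, 10.02]

mean_price = statistics.mean(prices)
median_price = statistics.median(prices)  # average of the 20th and 21st values

print(f"mean = {mean_price:.4f}, median = {median_price:.4f}")
```

The mean (about $10.23) is above the median ($10.135), consistent with the right skew; the median is the value less affected by the few expensive restaurants in the tail.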
Find the distribution of each histogram.
We will begin by looking at the distribution and where its tail extends.
Notice that the tail extends to the left of the distribution and most of the data is on the right. This means that the distribution is skewed left.
We will draw a line in the middle of the distribution.
We can see that the line divides the distribution into two approximately mirror images. Therefore, the distribution is symmetric.
Now, let's look at the histogram.
In this case, the tail of the distribution extends to the right and most of the data is on the left. Therefore, the distribution is skewed right.
Look at the given histograms and determine their distributions.
The following histogram describes the arrival of flights per hour at an airport on a typical day.
The lifetime of a sample of 500 neon lamps was measured. The results are displayed in a histogram.
We will determine the shape of the given distribution. Begin by looking at the given histogram.
We can see that the histogram has two peaks. Moreover, notice that the distribution can be split into two clusters. Let's draw a vertical line around the middle of the distribution.
Given the two clusters and the two peaks, the data follows a bimodal distribution. Moreover, the data on the left of the halfway line is an approximate mirror image of the data on the right, which means that the distribution is also symmetric.
Following a similar procedure, let's look at the given histogram.
Notice that the bars look about the same height. Let's draw a horizontal line above and close to the highest bar.
Given that the bars are approximately the same height, the data follows a uniform distribution. Moreover, a uniform distribution is also symmetric.
In a frequency table, Emily recorded the data from a survey about the number of hours her neighbors in the apartment complex spend online in a week.
Hours Online | Frequency |
---|---|
0−2 | 4 |
3−5 | 8 |
6−8 | 13 |
9−11 | 16 |
12−14 | 20 |
15−17 | 24 |
18−20 | 30 |
21−23 | 55 |
24−26 | 70 |
In a similar survey, Emily recorded the number of times per month the neighbors order food through a delivery app.
Number of Orders | Frequency |
---|---|
0−3 | 80 |
4−7 | 50 |
8−11 | 25 |
12−15 | 8 |
16−19 | 15 |
20−23 | 9 |
24−27 | 2 |
We are given the data collected in a survey about the hours the neighbors spend online per week. By displaying the data in a histogram, its distribution can be found. To do so, the vertical axis will be the Frequency and the horizontal axis the Hours Online.
Next, bars will be plotted to represent the frequency of data points falling in each interval.
Notice that the tail of the histogram extends to the left and most of the data is on the right. This means that the data obtained in the survey is skewed left.
Following a similar procedure, we can determine the distribution of the data about the number of times people order food through a delivery app. In this situation, the vertical axis will be the Frequency and the horizontal axis the Number of Orders.
In this case, the tail of the distribution extends to the right and most of the data is on the left. Therefore, the data about the number of times people order food through a delivery app is skewed right.
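Without graphing software, a quick text histogram is enough to see where the tail lies. In this sketch the bars run horizontally, so the classes are listed from top to bottom: a distribution skewed left shows short bars at the top, and one skewed right shows short bars at the bottom. The bar width of 40 characters and the helper name `text_histogram` are arbitrary, hypothetical choices.

```python
def text_histogram(labels, freqs, width=40):
    """Return one text line per class; the tallest class gets `width` '#' characters."""
    tallest = max(freqs)
    lines = []
    for label, f in zip(labels, freqs):
        bar = "#" * round(f / tallest * width)  # bar length scaled to the tallest class
        lines.append(f"{label:>6} | {bar} {f}")
    return lines


hours_labels = ["0-2", "3-5", "6-8", "9-11", "12-14", "15-17", "18-20", "21-23", "24-26"]
hours_freqs = [4, 8, 13, 16, 20, 24, 30, 55, 70]
order_labels = ["0-3", "4-7", "8-11", "12-15", "16-19", "20-23", "24-27"]
order_freqs = [80, 50, 25, 8, 15, 9, 2]

print("Hours online per week")
print("\n".join(text_histogram(hours_labels, hours_freqs)))
print()
print("Food orders per month")
print("\n".join(text_histogram(order_labels, order_freqs)))
```

The hours data ends in its longest bars, with the short bars (the tail) first, while the order data starts with its longest bars and trails off, matching the skewed-left and skewed-right conclusions.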
Consider the number of runs scored by a softball team in 30 games.
Number of Runs | ||||
---|---|---|---|---|
8 | 5 | 7 | 12 | 2 |
4 | 4 | 4 | 10 | 7 |
11 | 9 | 2 | 10 | 6 |
7 | 1 | 4 | 6 | 16 |
17 | 10 | 14 | 6 | 6 |
12 | 16 | 4 | 14 | 6 |
We are given the number of runs scored by the softball team. We will use a box plot to display the data and identify its distribution. To do so, we will use a graphing calculator. First, let's enter the data into the calculator. Push STAT, choose Edit, and enter the values in the first column.
Now, to get the box plot, push 2nd and Y=, and choose one of the plots in the list. Make sure you turn the plot ON, set the Type to box-and-whiskers plot, and assign L1 as XList.
To graph the box plot, push GRAPH. Note that we may need to change the window size so that it spans the length of the box-and-whiskers plot. To do so, push WINDOW.
Notice that the right whisker is longer than the left one and the median is closer to the left side of the box than to the right side. This means that the data is skewed right.
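If a graphing calculator is not available, the same reasoning can be sketched in Python. The snippet assumes the median-of-halves quartile method (a calculator or library may use a different method and report slightly different quartiles) and compares the whisker lengths and the position of the median in the box; the helper names are hypothetical.

```python
def median(sorted_values):
    """Median of an already sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(data):
    """Min, Q1, median, Q3, max using medians of the lower and upper halves."""
    values = sorted(data)
    n = len(values)
    lower = values[: n // 2]        # values below the median position
    upper = values[(n + 1) // 2 :]  # values above the median position
    return values[0], median(lower), median(values), median(upper), values[-1]


# Runs scored in the 30 games, read row by row from the table
runs = [8, 5, 7, 12, 2, 4, 4, 4, 10, 7, 11, 9, 2, 10, 6,
        7, 1, 4, 6, 16, 17, 10, 14, 6, 6, 12, 16, 4, 14, 6]

low, q1, med, q3, high = five_number_summary(runs)
print("Five-number summary:", low, q1, med, q3, high)
print("Whiskers: left =", q1 - low, " right =", high - q3)
print("Median to Q1 =", med - q1, " median to Q3 =", q3 - med)
```

For these 30 games the summary is 1, 4, 7, 11, 17: the right whisker (6) is twice the left (3), and the median sits closer to Q1 than to Q3, matching the box plot.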