
Statistical displays are useful for identifying the key features of a data set. When given multiple data sets, a method for comparing them is required. This lesson will discuss how to compare multiple data sets when given appropriate statistical displays.
### Catch-Up and Review

**Here is a recommended reading before getting started with this lesson.**

Challenge

It is a Saturday morning, and Ali is lying on the couch watching a movie. Ali's mom calls from afar, "Hey! Get up and do something productive." Ali responds, "It is the weekend, and I should be free to dedicate time to my hobby: watching movies at home!"

Ali is certain that everyone in his class spends a lot of time on their hobbies during weekends. He wants to prove this to his mom and decides to conduct a survey of his classmates. He sends a text message to all of his classmates and uses his gathered data to draw the following box plot.

After seeing the box plot, Ali's mother agrees that it is common among his classmates to spend a lot of time on hobbies during weekends. She has a request, "Actually, Ali, please run another survey asking about time spent on hobbies during weekdays." Ali takes on the challenge and runs another survey. He then makes the following box plot.

Is there an easy way to compare the two box plots and draw a conclusion about the surveys?

Discussion

Inferences about two populations can be drawn by comparing their respective statistical displays. Double box plots and double dot plots are two useful displays for this purpose. A double box plot consists of two box plots aligned on the same number line.

Likewise, a double dot plot is made from two dot plots aligned on the same number line. For these, a second number line is drawn to differentiate the data sets.

Depending on whether the data sets are symmetric or not, different measures of center and variation should be used. The following table summarizes the most appropriate measures to compare two data sets.

| | Both Data Sets are Symmetric | Only One Data Set is Symmetric | Neither Data Set is Symmetric |
|---|---|---|---|
| Measure of Center | Mean | Median | Median |
| Measure of Variation | Mean Absolute Deviation | Interquartile Range | Interquartile Range |

Data Set A is symmetric, while Data Set B is not. This means that the appropriate measures to compare them are the median and the interquartile range.
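The table can also be read as a simple decision rule. The following minimal Python sketch encodes it; the function name `choose_measures` and its boolean parameters are illustrative assumptions, not part of the lesson.

```python
def choose_measures(first_is_symmetric: bool, second_is_symmetric: bool) -> tuple:
    """Return (measure of center, measure of variation) for comparing two data sets."""
    if first_is_symmetric and second_is_symmetric:
        # Both data sets are symmetric.
        return ("mean", "mean absolute deviation")
    # Only one data set, or neither data set, is symmetric.
    return ("median", "interquartile range")


# Data Set A is symmetric, while Data Set B is not.
print(choose_measures(True, False))  # ('median', 'interquartile range')
```

Deciding whether each data set is symmetric still has to be done by inspecting its display.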

Example

Like most students, Ali wakes up very early on weekdays so he can get ready for school. On the weekends, though, he loves sleeping in. After all, there are no classes on the weekends!

Ali wonders if his classmates have the same habit — he predicts that they do. To test this thought, Ali runs another survey asking his classmates at which hour they wake up during weekdays. He then makes the following box plot.

Following that question, he asks his classmates what time they wake up on weekends. He then creates another box plot.

a Which measure of center is more appropriate to compare both data sets?

- Median
- Mean

b Which measure of variation is more appropriate to compare both data sets?

- Mean Absolute Deviation
- Interquartile Range

c Is there enough information to find the mean of both data sets given just the box plots?

- Yes
- No

a Make a double box plot of the given data. Are both plots symmetric?

b Make a double box plot of the given data. Are both plots symmetric?

c Is the center of the box plot the median or the mean of the data set?

a Begin by making a double box plot out of the given box plots. This can be done by placing one on top of the other aligned on the same number line.

Notice that for each box plot, the median is in the middle of the box and both sides are mirror images of each other. This means that **both** box plots are **symmetric**. Therefore, the **mean** is the most appropriate measure to use when comparing the data sets, as shown in the table.

| | Both Data Sets are Symmetric | Only One Data Set is Symmetric | Neither Data Set is Symmetric |
|---|---|---|---|
| Measure of Center | Mean | Median | Median |
| Measure of Variation | Mean Absolute Deviation | Interquartile Range | Interquartile Range |
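The mirror-image check described above can be sketched in Python using the five values a box plot displays. The function name `looks_symmetric` and the numbers in the usage lines are hypothetical, since the lesson's box plots are not reproduced here. This only tests the shape of the box plot, which is what the lesson inspects.

```python
def looks_symmetric(minimum, q1, median, q3, maximum, tolerance=1e-9):
    """Check whether a box plot is a mirror image around its median."""
    # The two halves of the box must have the same width.
    box_balanced = abs((median - q1) - (q3 - median)) <= tolerance
    # The two whiskers must have the same length.
    whiskers_balanced = abs((q1 - minimum) - (maximum - q3)) <= tolerance
    return box_balanced and whiskers_balanced


# Hypothetical five-number summaries, for illustration only.
print(looks_symmetric(5, 6, 7, 8, 9))     # True
print(looks_symmetric(5, 6, 6.5, 8, 11))  # False
```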

b It was found in Part A that both data sets are symmetric, so the appropriate measure of variation is the **mean absolute deviation**.

| | Both Data Sets are Symmetric | Only One Data Set is Symmetric | Neither Data Set is Symmetric |
|---|---|---|---|
| Measure of Center | Mean | Median | Median |
| Measure of Variation | Mean Absolute Deviation | Interquartile Range | Interquartile Range |
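For reference, the mean absolute deviation is the average distance between each data value and the mean. The sketch below computes it in Python; the wake-up hours are hypothetical values chosen for illustration, because a box plot alone does not list the individual data points.

```python
from statistics import mean


def mean_absolute_deviation(values):
    """Average distance between each value and the mean of the data set."""
    center = mean(values)
    distances = [abs(value - center) for value in values]
    return mean(distances)


# Hypothetical wake-up hours, not taken from the lesson's survey.
wake_up_hours = [6, 7, 7, 8]
print(mean_absolute_deviation(wake_up_hours))  # 0.5
```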

Example

Ali has become even more curious about his classmates' schedules. He wonders if there is a difference between the times his classmates have dinner on weekdays compared to weekends. He runs a survey asking his classmates at which hour they have their dinner. He obtains the following dot plot.

What about weekends? He runs another survey and makes the following dot plot.

a Which measure of center is more appropriate to compare both data sets?

- Median
- Mean

b Which measure of spread is more appropriate to compare both data sets?

- Mean Absolute Deviation
- Interquartile Range

c Is there enough information to find the interquartile range of both data sets given just the dot plots?

- No
- Yes

a Make a double dot plot of the given data. Are both plots symmetric?

b Make a double dot plot of the given data. Are both plots symmetric?

c A dot plot represents every data point of a survey.

a Begin by making a double dot plot out of the given dot plots. This can be done by aligning both on top of each other with their respective number lines.

Note that the weekends dot plot is symmetric, but the weekdays dot plot is not. Since only one of the data sets is symmetric, the appropriate measure of center to compare the sets is the **median**.

| | Both Data Sets are Symmetric | Only One Data Set is Symmetric | Neither Data Set is Symmetric |
|---|---|---|---|
| Measure of Center | Mean | Median | Median |
| Measure of Variation | Mean Absolute Deviation | Interquartile Range | Interquartile Range |

b It was found in Part A that only one data set is symmetric, so the appropriate measure of variation is the **interquartile range**.

| | Both Data Sets are Symmetric | Only One Data Set is Symmetric | Neither Data Set is Symmetric |
|---|---|---|---|
| Measure of Center | Mean | Median | Median |
| Measure of Variation | Mean Absolute Deviation | Interquartile Range | Interquartile Range |

c A dot plot displays every data point of a survey. Since neither data set is very large, the interquartile ranges can be calculated by first finding the quartiles by hand. This means the answer is **Yes.**

It was mentioned that the data sets were small enough to find the quartiles by hand, but what would happen if they were too large?

While it is technically possible to find the quartiles of a data set with thousands of points by hand, it is not practical. To solve this problem, large surveys are typically run with the help of statistical software, which can calculate all the key measures of a data set in an instant.
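The lesson does not name a particular program. As one illustration, Python's standard `statistics` module can find the quartiles and the interquartile range. The dinner hours below are hypothetical values, and the helper name `interquartile_range` is an assumption made for this sketch.

```python
from statistics import median, quantiles


def interquartile_range(values):
    """Return Q3 - Q1 using the default 'exclusive' quartile method."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


# Hypothetical dinner hours (24-hour clock), not taken from the lesson's dot plots.
dinner_hours = [17, 18, 18, 19, 19, 19, 20, 20, 21]
print(median(dinner_hours))               # 19
print(interquartile_range(dinner_hours))  # 2.0
```

Software packages use slightly different conventions for locating quartiles, so a result may differ a little from one found by hand by splitting the data into halves. For this small data set both approaches give the same value.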

Pop Quiz

Two box plots are given in the following applet. Select the most appropriate measure to compare them.

Discussion

A pie chart is a circular chart used to represent the relative frequencies of a data set. It is also called a **circle chart.** These charts are divided into several slices — each representing a group of the whole data set. The following characteristics are typical of pie charts.

- Each slice is drawn using a different color to differentiate the groups.
- The central angle of each group, as well as its area, is proportional to its relative frequency.

A pie chart allows the visualization of each individual data group when compared to the whole. Alone, however, the chart does not give information about the frequency of each group.

Pie charts might also include the relative frequency of each group written as a percentage. It is also possible to include labels to represent each group with matching colors.

Discussion

A pie chart is an effective way of visualizing the proportions of different groups of data when compared to a whole. Pie charts are divided into slices, each of which represents a group and its relative frequency. The central angle of each slice can be found by multiplying each relative frequency by $360^\circ.$

$\text{Central Angle} = \text{Relative Frequency} \cdot 360^\circ$

As an example, consider a survey of a group of people's favorite ice cream flavors. Suppose that $20$ people took part in the survey. Eight people responded that they prefer chocolate, six prefer vanilla, and six prefer other flavors. The following steps can be used to create a pie chart representing this survey.
1

Make a Frequency Table

Begin by identifying each group of the survey. In this case, there are three different flavor groups: chocolate, vanilla, and other. These three groups have eight, six, and six people, respectively.

| Flavor | Frequency |
|---|---|
| Chocolate | $8$ |
| Vanilla | $6$ |
| Other | $6$ |

2

Find the Relative Frequency of Each Group

A total of $20$ people took part in the survey, so each frequency will be divided by this value to find the relative frequency of each group.

| Flavor | Frequency | Relative Frequency |
|---|---|---|
| Chocolate | $8$ | $\frac{8}{20} = 0.4$ |
| Vanilla | $6$ | $\frac{6}{20} = 0.3$ |
| Other | $6$ | $\frac{6}{20} = 0.3$ |
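The same division can be sketched in Python. The function name `relative_frequencies` is an illustrative assumption, and exact fractions are used so that the shares can be checked to add up to $1.$

```python
from fractions import Fraction


def relative_frequencies(frequencies):
    """Divide each frequency by the total number of responses."""
    total = sum(frequencies.values())
    return {group: Fraction(count, total) for group, count in frequencies.items()}


ice_cream = {"Chocolate": 8, "Vanilla": 6, "Other": 6}
shares = relative_frequencies(ice_cream)
for flavor, share in shares.items():
    print(flavor, float(share))  # Chocolate 0.4, Vanilla 0.3, Other 0.3

# The relative frequencies of all groups always add up to 1.
print(sum(shares.values()) == 1)  # True
```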

3

Find the Central Angle of Each Slice

To find the central angle of each slice, multiply the relative frequency of its group by $360^\circ.$

$\text{Central Angle} = \text{Relative Frequency} \cdot 360^\circ$

This needs to be done for every different group of the survey.

| Flavor | Frequency | Relative Frequency | Central Angle |
|---|---|---|---|
| Chocolate | $8$ | $0.4$ | $0.4 \cdot 360^\circ = 144^\circ$ |
| Vanilla | $6$ | $0.3$ | $0.3 \cdot 360^\circ = 108^\circ$ |
| Other | $6$ | $0.3$ | $0.3 \cdot 360^\circ = 108^\circ$ |
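As a quick check of the table, the central angles can be computed directly from the frequencies. In this minimal sketch, the function name `central_angles` is an assumption. Multiplying by $360$ before dividing by the total keeps the arithmetic exact for these values.

```python
def central_angles(frequencies):
    """Central angle in degrees of each slice: relative frequency times 360."""
    total = sum(frequencies.values())
    # count / total is the relative frequency; multiply first to avoid rounding.
    return {group: count * 360 / total for group, count in frequencies.items()}


ice_cream = {"Chocolate": 8, "Vanilla": 6, "Other": 6}
angles = central_angles(ice_cream)
print(angles)                # {'Chocolate': 144.0, 'Vanilla': 108.0, 'Other': 108.0}
print(sum(angles.values()))  # 360.0
```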

4

Draw the Slices

Begin by drawing a circle to represent the whole population of the survey.

Next, draw a radius to select a starting point. This can be any radius of the circle.

Align the protractor with the starting radius and mark the central angle corresponding to the first group, which is $144^\circ.$

Draw the radius that passes through the previous mark. The slice of the first group is now ready.

To draw the slice of the next group place the protractor at the end of the previous group and mark the next central angle.

Draw the radius that passes through this mark to obtain the second slice. Repeat this process until every slice is drawn.

Make sure that the sum of the central angles is equal to $360^\circ.$

$144^\circ + 108^\circ + 108^\circ = 360^\circ \checkmark$

Be aware that the central angles are not typically shown on the final diagram.

5

Add the Details to the Graph

Finally, color and label each slice of the graph.

Side labels and relative frequencies written as percentages might also be added in this step.

Example

Ali is really enjoying making surveys to learn about the habits of his classmates. Previously, he asked them how much time they spend doing their hobbies. Now he wants to know what those hobbies are.

| Hobby | Frequency |
|---|---|
| Watching Movies/Series | $12$ |
| Videogames | $9$ |
| Reading | $6$ |
| Sports | $3$ |

Add a relative frequency column to the given table. Find the central angle of each slice by multiplying $360^\circ$ and the relative frequency of each group.

A table with relative frequencies is required in order to make a pie chart. Add a relative frequency column to the given table to do so.

| Hobby | Frequency | Relative Frequency |
|---|---|---|
| Watching Movies/Series | $12$ | |
| Videogames | $9$ | |
| Reading | $6$ | |
| Sports | $3$ | |

Next, add the frequencies to find the total number of classmates who took part in the survey.

$12 + 9 + 6 + 3 = 30$

Divide each frequency by this total to find each relative frequency.

| Hobby | Frequency | Relative Frequency |
|---|---|---|
| Watching Movies/Series | $12$ | $\frac{12}{30} = 0.4$ |
| Videogames | $9$ | $\frac{9}{30} = 0.3$ |
| Reading | $6$ | $\frac{6}{30} = 0.2$ |
| Sports | $3$ | $\frac{3}{30} = 0.1$ |

Since there are four types of hobbies, the pie chart will be divided into four slices. To find the central angle of each slice, the relative frequency of each group will be multiplied by $360^\circ.$ Insert another column called central angle. This helps see the information in a clear and organized way.

| Hobby | Frequency | Relative Frequency | Central Angle |
|---|---|---|---|
| Watching Movies/Series | $12$ | $0.4$ | $0.4 \cdot 360^\circ = 144^\circ$ |
| Videogames | $9$ | $0.3$ | $0.3 \cdot 360^\circ = 108^\circ$ |
| Reading | $6$ | $0.2$ | $0.2 \cdot 360^\circ = 72^\circ$ |
| Sports | $3$ | $0.1$ | $0.1 \cdot 360^\circ = 36^\circ$ |

Now that every central angle has been found, proceed to draw the graph. Make a circle using a compass and select a starting radius.

Align the protractor with the starting radius and mark the central angle corresponding to the first group, which is $144^\circ.$

Draw a radius that passes through this mark. This slice corresponds to the people who like watching movies or series.

Repeat this process for each group. Make sure that the zero of the protractor is placed on the previously drawn radius.
Finally, add colors and label each slice of the chart.
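The lesson re-aligns the protractor with each newly drawn radius. As an alternative sketch, the position of every radius can instead be measured from the starting radius by adding up the central angles, which is convenient with a full-circle protractor or plotting software. The function name `slice_boundaries` is an assumption; the angles are the ones found above.

```python
from itertools import accumulate


def slice_boundaries(angles):
    """Return the (start, end) angle of each slice, measured from the starting radius."""
    ends = list(accumulate(angles))  # running totals of the central angles
    starts = [0] + ends[:-1]         # each slice starts where the previous one ends
    return list(zip(starts, ends))


# Central angles for movies/series, videogames, reading, and sports.
hobby_angles = [144, 108, 72, 36]
print(slice_boundaries(hobby_angles))
# [(0, 144), (144, 252), (252, 324), (324, 360)]
```

The last slice ends at $360$ degrees, which confirms that the slices fill the whole circle.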

Discussion

Pie charts are useful for comparing parts of the data that fall under the same group versus the whole data. The following table summarizes the main types of data displays and their best uses.

| Data Display | Best Uses |
|---|---|
| Bar Graph | Show the frequency of categorical data |
| Box Plot | Show the measures of variation of a data set |
| Pie Chart | Compare different groups of the data to the whole |
| Histogram | Divide numerical data into intervals and show their frequency |
| Line Graph | Show how data changes over a period of time |
| Dot Plot | Display the frequency of numerical data |

Example

Ali is so stoked to see that so many of his classmates share the same hobby of watching movies. He goes ahead and makes a follow-up survey asking his classmates what their favorite movie genre is. He is able to make the following frequency table.

| Favorite Genre | Frequency |
|---|---|
| Action | $10$ |
| Drama | $7$ |
| Thriller | $7$ |
| Comedy | $3$ |
| Horror | $2$ |
| Fantasy | $1$ |

Which of the following data displays cannot be used to visualize Ali's survey data? Select all that apply.

- Box Plot
- Pie Chart
- Bar Chart
- Dot Plot
- Histogram

Is Ali working with categorical or numerical data?

Begin by determining what type of data Ali is working with. Since movie genres are used as categories to describe movies, Ali deals with **categorical data**. Now examine the table, which shows data displays as well as the associated data types they are used with.

| Data Display | Type of Data |
|---|---|
| Box Plot | Numerical |
| Pie Chart | Categorical |
| Bar Chart | Categorical |
| Dot Plot | Numerical |
| Histogram | Numerical |

Among the data display options, *box plots*, *dot plots*, and *histograms* are designed for representing numerical data. This means that these three types of data displays **cannot** be used to visualize Ali's survey data.
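The table works as a lookup from each display to the type of data it represents. The sketch below stores it as a Python dictionary and filters it; the names `DISPLAY_DATA_TYPE` and `displays_for` are illustrative assumptions.

```python
# Each data display and the type of data it is designed for, as in the table.
DISPLAY_DATA_TYPE = {
    "Box Plot": "numerical",
    "Pie Chart": "categorical",
    "Bar Chart": "categorical",
    "Dot Plot": "numerical",
    "Histogram": "numerical",
}


def displays_for(data_type):
    """List the displays suited to the given type of data."""
    return [name for name, kind in DISPLAY_DATA_TYPE.items() if kind == data_type]


print(displays_for("categorical"))  # ['Pie Chart', 'Bar Chart']
print(displays_for("numerical"))    # ['Box Plot', 'Dot Plot', 'Histogram']
```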

Closure

Ali ran a survey asking about the amount of time his classmates spend on their hobbies. He created the following box plot concerning weekends.

He made a different box plot for time spent on hobbies during weekdays.

Recall how to compare the box plots. It can be done by drawing one of them on top of the other, aligned with the same number line.

Note that the box plot corresponding to weekends is **not** symmetric, while the box plot for weekdays is symmetric. In other words, only one of the data sets is symmetric. Considering the following table, the median and the interquartile range are the appropriate measures for comparing the data.

| | Both Data Sets are Symmetric | Only One Data Set is Symmetric | Neither Data Set is Symmetric |
|---|---|---|---|
| Measure of Center | Mean | Median | Median |
| Measure of Variation | Mean Absolute Deviation | Interquartile Range | Interquartile Range |

Remember that in box plots, the vertical line within the rectangular box represents the location of the median. The median for the weekdays is then $2.5,$ and the median for the weekends is $5.$

Ali and his classmates spend about $5$ hours on their hobbies during the weekend. This is twice as much as the $2.5$ hours they spend during weekdays. They can spend more time on their hobbies because they do not have to go to school on weekends.

The interquartile range for the weekdays is $1$ hour, and for the weekends it is $3.5$ hours.

There is less variation in the data for weekdays. This could suggest that most of his classmates spend a similar, shorter amount of time on hobbies during weekdays, perhaps because they have more scheduled activities. It is important to note that not everyone spends all their time on hobbies during the weekends. They might have other things to do!
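The comparison in this closure comes down to two small calculations, sketched here in Python. The medians and interquartile ranges are the values read from the box plots above; the variable names are assumptions.

```python
# Values in hours, read from the two box plots in the lesson.
weekdays = {"median": 2.5, "iqr": 1.0}
weekends = {"median": 5.0, "iqr": 3.5}

# How many times larger is the typical weekend value?
median_ratio = weekends["median"] / weekdays["median"]
# How much wider is the middle half of the weekend data?
iqr_difference = weekends["iqr"] - weekdays["iqr"]

print(median_ratio)    # 2.0
print(iqr_difference)  # 2.5
```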