It is a Saturday morning, and Ali is lying on the couch watching a movie. Ali's mom calls from afar, "Hey! Get up and do something productive." Ali responds, "It is the weekend, and I should be free to dedicate time to my hobby: watching movies at home!"
Ali is certain that everyone in his class spends a lot of time on their hobbies during weekends. He wants to prove this to his mom and decides to conduct a survey of his classmates. He sends a text message to all of his classmates and uses his gathered data to draw the following box plot.
After seeing the box plot, Ali's mother agrees that it is common among his classmates to spend a lot of time on hobbies during weekends. She has a request: "Actually, Ali, please run another survey asking about time spent on hobbies during weekdays." Ali takes on the challenge and runs another survey. He then makes the following box plot.
Is there an easy way to compare the two box plots and draw a conclusion about the surveys?
Inferences about two populations can be drawn by comparing their respective statistical displays. Two convenient tools for this are the double box plot and the double dot plot. A double box plot consists of two box plots aligned on the same number line.
Likewise, a double dot plot is made from two dot plots aligned on the same number line. For these, a second number line is drawn to differentiate the data sets.
Depending on whether the data sets are symmetric or not, different measures of center and variation should be used. The following table summarizes the most appropriate measures to compare two data sets.
Measure | Both Data Sets are Symmetric | Only One Data Set is Symmetric | Neither Data Set is Symmetric |
---|---|---|---|
Measure of Center | Mean | Median | Median |
Measure of Variation | Mean Absolute Deviation | Interquartile Range | Interquartile Range |
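The rule in the table can be written as a small decision function. The following is a minimal sketch in Python; the function name `choose_measures` and its boolean parameters are assumptions made for illustration and do not come from the lesson.

```python
def choose_measures(first_is_symmetric: bool, second_is_symmetric: bool) -> tuple[str, str]:
    """Return (measure of center, measure of variation) for comparing two data sets."""
    # The mean and mean absolute deviation are used only when BOTH sets are symmetric.
    if first_is_symmetric and second_is_symmetric:
        return ("mean", "mean absolute deviation")
    # If one or both sets are not symmetric, use the median and interquartile range.
    return ("median", "interquartile range")


print(choose_measures(True, True))
print(choose_measures(True, False))
print(choose_measures(False, False))
```

Deciding whether a display is symmetric is still a judgment made by looking at the plot; the function only encodes what to do once that decision has been made.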
Like most students, Ali wakes up very early on weekdays so he can get ready for school. On the weekends, though, he loves sleeping in. After all, there are no classes on the weekends!
Ali wonders if his classmates have the same habit — he predicts that they do. To test this thought, Ali runs another survey asking his classmates at which hour they wake up during weekdays. He then makes the following box plot.
Following that question, he asks his classmates what time they wake up on weekends. He then creates another box plot.
Notice that for each box plot, the median is in the middle of the box and both sides are mirror images of each other. This means that both box plots are symmetric. Therefore, the mean and the mean absolute deviation are the most appropriate measures to use when comparing the data sets, as shown in the table.
Measure | Both Data Sets are Symmetric | Only One Data Set is Symmetric | Neither Data Set is Symmetric |
---|---|---|---|
Measure of Center | Mean | Median | Median |
Measure of Variation | Mean Absolute Deviation | Interquartile Range | Interquartile Range |
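When both data sets are symmetric, the mean and the mean absolute deviation can be computed as in the sketch below. The wake-up hours are hypothetical values chosen to be symmetric; they are not the results of Ali's survey, and the helper name `mean_absolute_deviation` is an assumption.

```python
from statistics import mean


def mean_absolute_deviation(values):
    """Average distance between each value and the mean of the data set."""
    center = mean(values)
    return mean([abs(v - center) for v in values])


# Hypothetical wake-up hours (not Ali's actual survey data).
weekday_hours = [6.0, 6.5, 7.0, 7.0, 7.0, 7.5, 8.0]
weekend_hours = [8.0, 9.0, 10.0, 10.0, 10.0, 11.0, 12.0]

for label, hours in (("weekdays", weekday_hours), ("weekends", weekend_hours)):
    print(label, "mean:", mean(hours), "MAD:", round(mean_absolute_deviation(hours), 2))
```

In this made-up example the weekend mean is later and the weekend mean absolute deviation is larger, which would indicate later and more spread-out wake-up times on weekends.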
Ali has become even more curious about his classmates' schedules. He wonders if there is a difference between the times his classmates have dinner on weekdays compared to weekends. He runs a survey asking his classmates at which hour they have their dinner on weekdays. He obtains the following dot plot.
What about weekends? He runs another survey and makes the following dot plot.
Note that the dot plot for weekends is symmetric, but the dot plot for weekdays is not. Since only one of the data sets is symmetric, the appropriate measure of center for comparing the sets is the median, and the appropriate measure of variation is the interquartile range.
Measure | Both Data Sets are Symmetric | Only One Data Set is Symmetric | Neither Data Set is Symmetric |
---|---|---|---|
Measure of Center | Mean | Median | Median |
Measure of Variation | Mean Absolute Deviation | Interquartile Range | Interquartile Range |
It was mentioned that the data sets were small enough to find the quartiles by hand, but what would happen if they were too large?
While it is technically possible to find the quartiles of a data set with thousands of points by hand, it is not practical. Instead, large surveys are typically run with the help of statistical software, which can calculate all the key measures of a data set in an instant.
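As a rough illustration of what such software does, the sketch below finds the quartiles by taking the median of the lower half and the median of the upper half of the sorted data. The function name `quartiles` and the sample values are assumptions made for this example. Statistical packages may use slightly different conventions for quartiles, so their results can differ a little from this method.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method.

    Assumes the data set has at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    # For an odd number of values, the middle value belongs to neither half.
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


# Hypothetical survey answers in hours (not Ali's actual data).
hours = [5, 1, 8, 2, 4, 6, 2, 5, 3]
q1, q2, q3 = quartiles(hours)
print("Q1:", q1, "Median:", q2, "Q3:", q3, "IQR:", q3 - q1)
```

The same function works unchanged for a list with thousands of entries, which is the situation where doing the calculation by hand stops being practical.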
A pie chart is a circular chart used to represent the relative frequencies of a data set. It is also called a circle chart. These charts are divided into several slices — each representing a group of the whole data set. The following characteristics are typical of pie charts.
A pie chart allows the visualization of each individual data group when compared to the whole. Alone, however, the chart does not give information about the frequency of each group.
Pie charts might also include the relative frequency of each group written as a percentage. It is also possible to include labels to represent each group with matching colors.
Begin by identifying each group of the survey. In this case, there are three different flavor groups: chocolate, vanilla, and other. These three groups have eight, six, and six people, respectively.
Flavor | Frequency |
---|---|
Chocolate | 8 |
Vanilla | 6 |
Other | 6 |
A total of 20 people took part in the survey, so each frequency is divided by 20 to find the relative frequency of each group.
Flavor | Frequency | Relative Frequency |
---|---|---|
Chocolate | 8 | $\frac{8}{20} = 0.4$ |
Vanilla | 6 | $\frac{6}{20} = 0.3$ |
Other | 6 | $\frac{6}{20} = 0.3$ |
Flavor | Frequency | Relative Frequency | Central Angle |
---|---|---|---|
Chocolate | 8 | 0.4 | $0.4 \cdot 360^\circ = 144^\circ$ |
Vanilla | 6 | 0.3 | $0.3 \cdot 360^\circ = 108^\circ$ |
Other | 6 | 0.3 | $0.3 \cdot 360^\circ = 108^\circ$ |
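The two table columns can also be computed with a few lines of code. This is a minimal sketch using the ice cream frequencies from the table; the function name `central_angles` is an assumption made for illustration.

```python
def central_angles(frequencies):
    """Map each group to the central angle (in degrees) of its pie chart slice."""
    total = sum(frequencies.values())
    # count / total is the relative frequency. Multiplying by 360 before
    # dividing gives the same angle while avoiding small rounding errors.
    return {group: count * 360 / total for group, count in frequencies.items()}


ice_cream = {"Chocolate": 8, "Vanilla": 6, "Other": 6}
angles = central_angles(ice_cream)
print(angles)
print("Sum of angles:", sum(angles.values()))
```

Printing the sum is a quick check that the slices fill the whole circle.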
Begin by drawing a circle to represent the whole population of the survey.
Next, draw a radius to select a starting point. This can be any radius of the circle.
Align the protractor with the starting radius and mark the central angle corresponding to the first group, which is $144^\circ$.
Draw the radius that passes through the previous mark. The slice of the first group is now ready.
To draw the slice of the next group, place the protractor at the end of the previous group and mark the next central angle.
Draw the radius that passes through this mark to obtain the second slice. Repeat this process until every slice is drawn.
Make sure that the sum of the central angles is equal to $360^\circ$.
Finally, color and label each slice of the graph.
Side labels and relative frequencies written as percentages might also be added in this step.
Ali is really enjoying making surveys to learn about the habits of his classmates. Previously, he asked them how much time they spend doing their hobbies. Now he wants to know what those hobbies are.
Hobby | Frequency |
---|---|
Watching Movies/Series | 12 |
Videogames | 9 |
Reading | 6 |
Sports | 3 |
Add a relative frequency column to the given table. Find the central angle of each slice by multiplying $360^\circ$ by the relative frequency of each group.
A table with relative frequencies is required in order to make a pie chart. Add a relative frequency column to the given table to do so.
Hobby | Frequency | Relative Frequency |
---|---|---|
Watching Movies/Series | 12 | |
Videogames | 9 | |
Reading | 6 | |
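Sports | 3 | |
A total of $12 + 9 + 6 + 3 = 30$ classmates answered the survey, so each frequency is divided by 30 to find the relative frequency of each group.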
Hobby | Frequency | Relative Frequency |
---|---|---|
Watching Movies/Series | 12 | $\frac{12}{30} = 0.4$ |
Videogames | 9 | $\frac{9}{30} = 0.3$ |
Reading | 6 | $\frac{6}{30} = 0.2$ |
Sports | 3 | $\frac{3}{30} = 0.1$ |
Since there are four types of hobbies, the pie chart will be divided into four slices. To find the central angle of each slice, the relative frequency of each group is multiplied by $360^\circ$. Insert another column called "Central Angle" to keep the information clear and organized.
Hobby | Frequency | Relative Frequency | Central Angle |
---|---|---|---|
Watching Movies/Series | 12 | 0.4 | $0.4 \cdot 360^\circ = 144^\circ$ |
Videogames | 9 | 0.3 | $0.3 \cdot 360^\circ = 108^\circ$ |
Reading | 6 | 0.2 | $0.2 \cdot 360^\circ = 72^\circ$ |
Sports | 3 | 0.1 | $0.1 \cdot 360^\circ = 36^\circ$ |
Now that every central angle has been found, proceed to draw the graph. Make a circle using a compass and select a starting radius.
Align the protractor with the starting radius and mark the central angle corresponding to the first group, which is $144^\circ$.
Draw a radius that passes through this mark. This slice corresponds to the people who like watching movies or series.
Repeat this process for each group. Make sure that the zero of the protractor is placed on the previously drawn radius.
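The drawing steps place the zero of the protractor on the previously drawn radius each time. An equivalent way to check the finished chart is to measure every radius from the starting radius instead, using running totals of the frequencies. The sketch below does this for the hobby data; the function name `slice_boundaries` is an assumption made for illustration.

```python
def slice_boundaries(frequencies):
    """Map each group to the (start, end) angles of its slice, in degrees,
    measured from the starting radius."""
    total = sum(frequencies.values())
    boundaries = {}
    running = 0  # Running total of the frequencies drawn so far.
    for group, count in frequencies.items():
        start = running * 360 / total
        running += count
        boundaries[group] = (start, running * 360 / total)
    return boundaries


hobbies = {"Watching Movies/Series": 12, "Videogames": 9, "Reading": 6, "Sports": 3}
for group, (start, end) in slice_boundaries(hobbies).items():
    print(f"{group}: from {start} to {end} degrees")
```

The last slice should end at exactly 360 degrees, which confirms that the central angles add up to a full circle.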
Pie charts are useful for comparing each group of the data to the whole data set. The following table summarizes the main types of data displays and their best uses.
Data Display | Best Uses |
---|---|
Bar Graph | Show the frequency of categorical data |
Box Plot | Show the measures of variation of a data set |
Pie Chart | Compare different groups of the data to the whole |
Histogram | Divide numerical data into intervals and show their frequency |
Line Graph | Show how data changes over a period of time |
Dot Plot | Display the frequency of numerical data |
Ali is thrilled to see that so many of his classmates share his hobby of watching movies. He goes ahead and makes a follow-up survey asking his classmates what their favorite movie genre is. He is able to make the following frequency table.
Favorite Genre | Frequency |
---|---|
Action | 10 |
Drama | 7 |
Thriller | 7 |
Comedy | 3 |
Horror | 2 |
Fantasy | 1 |
Is Ali working with categorical or numerical data?
Begin by identifying what type of data Ali is working with. Since movie genres are categories used to describe movies, Ali is dealing with categorical data. Now examine the table, which shows data displays and the data types they are used with.
Data Display | Type of Data |
---|---|
Box Plot | Numerical |
Pie Chart | Categorical |
Bar Chart | Categorical |
Dot Plot | Numerical |
Histogram | Numerical |
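Among the data display options, box plots, dot plots, and histograms are designed for representing numerical data. This means that these three types of data displays cannot be used to visualize Ali's survey data. The displays in the table that remain suitable for his categorical data are the pie chart and the bar chart.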
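The same elimination can be expressed as a lookup over the table. This is only an illustrative sketch; the names `DISPLAY_DATA_TYPES` and `displays_for` are assumptions, and the dictionary simply copies the table above.

```python
# Data displays and the type of data they are used with, copied from the table.
DISPLAY_DATA_TYPES = {
    "Box Plot": "Numerical",
    "Pie Chart": "Categorical",
    "Bar Chart": "Categorical",
    "Dot Plot": "Numerical",
    "Histogram": "Numerical",
}


def displays_for(data_type):
    """Return the displays from the table that suit the given type of data."""
    return [name for name, kind in DISPLAY_DATA_TYPES.items() if kind == data_type]


print(displays_for("Categorical"))
print(displays_for("Numerical"))
```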
Ali ran a survey asking about the amount of time his classmates spend on their hobbies. He created the following box plot concerning weekends.
He made a different box plot for time spent on hobbies during weekdays.
Recall how to compare the box plots. It can be done by drawing one of them on top of the other, aligned with the same number line.
Note that the box plot corresponding to weekends is not symmetric, while the box plot for weekdays is symmetric. In other words, only one of the data sets is symmetric. Considering the following table, the median and the interquartile range are the appropriate measures for comparing the data.
Measure | Both Data Sets are Symmetric | Only One Data Set is Symmetric | Neither Data Set is Symmetric |
---|---|---|---|
Measure of Center | Mean | Median | Median |
Measure of Variation | Mean Absolute Deviation | Interquartile Range | Interquartile Range |
Remember that in box plots, the vertical line within the rectangular box represents the location of the median. The median is therefore 2.5 hours for weekdays and 5 hours for weekends.
The median time Ali and his classmates spend on their hobbies is 5 hours on weekends, which is twice the 2.5-hour median for weekdays. A likely reason is that they do not have to go to school on weekends.
The interquartile range for the weekdays is 1 hour, and for the weekends it is 3.5 hours.
There is less variation in the weekday data, which means the times his classmates spend on hobbies are more alike on weekdays than on weekends. This could mean they have more scheduled activities during the week. It is important to note that not everyone spends all their free time on hobbies during the weekends. They might have other things to do!
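The comparison of the two surveys comes down to a little arithmetic on the values read from the box plots. The sketch below repeats it in code; the variable names are assumptions made for illustration, while the medians and interquartile ranges are the ones stated above.

```python
# Values read from the two box plots, in hours.
weekdays = {"median": 2.5, "iqr": 1.0}
weekends = {"median": 5.0, "iqr": 3.5}

# How many times larger the weekend median is than the weekday median.
median_ratio = weekends["median"] / weekdays["median"]

# How much wider the middle half of the weekend data is.
iqr_difference = weekends["iqr"] - weekdays["iqr"]

print("Weekend median is", median_ratio, "times the weekday median")
print("Weekend IQR is", iqr_difference, "hours larger than the weekday IQR")
```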