
Describing Distributions of Data Sets

Depending on how a data set is distributed, its histogram can take on different shapes. There are two main types.

Concept

Symmetric Distribution

If the data is distributed somewhat evenly around the mean, the bars on each side of the middle bar will have roughly the same height. This is called a symmetric distribution.

In a symmetric histogram, the mean is typically found in the tallest bar.

Concept

Skewed Distribution

When the data is not evenly distributed around the mean, the histogram becomes skewed.

For skewed histograms, the mean is usually not found in the tallest bar.
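The relationship between the mean and the tallest bar can be checked numerically. The following is a minimal sketch, assuming two small made-up data sets and a bar width of 1; the helper names `tallest_bar` and `mean_in_tallest_bar` are hypothetical and chosen only for this illustration.

```python
from collections import Counter
from statistics import mean


def tallest_bar(values, width=1):
    """Return (left_edge, right_edge) of the histogram bar holding the most values."""
    # Assign each value to the bar whose left edge is the nearest lower multiple of `width`.
    counts = Counter((v // width) * width for v in values)
    left = max(counts, key=counts.get)
    return left, left + width


def mean_in_tallest_bar(values, width=1):
    """Check whether the mean falls inside the tallest bar."""
    left, right = tallest_bar(values, width)
    return left <= mean(values) < right


# Hypothetical data sets, invented for illustration only.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
skewed = [1, 1, 1, 1, 2, 2, 3, 5, 9]

print(mean_in_tallest_bar(symmetric))  # mean is 3, tallest bar is [1 wide, starting at 3]
print(mean_in_tallest_bar(skewed))     # mean is about 2.78, tallest bar starts at 1
```

For the symmetric data the mean lands in the tallest bar, while for the skewed data the long right tail pulls the mean out of it.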
Theory

Choosing Statistics Based on the Distribution

Based on the distribution of a data set, different types of statistics are more suitable than others.

Theory

Symmetric distribution

For a symmetric distribution, all values are spread out somewhat evenly around the center. Since both the mean and standard deviation take into account the actual values of all data points, these statistics are usually more suitable when describing this distribution.
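As a small illustration, the mean and standard deviation of a symmetric data set can be computed with Python's standard `statistics` module. The values below are made up for this sketch, and the population standard deviation (`pstdev`) is an assumption; a sample standard deviation could be used instead.

```python
from statistics import mean, pstdev

# Hypothetical symmetric data set, invented for illustration only.
values = [18, 19, 19, 20, 20, 20, 21, 21, 22]

center = mean(values)    # every data point contributes to the sum
spread = pstdev(values)  # every data point contributes a squared deviation

print(center)
print(round(spread, 2))
```

Because the values are balanced around the center, the mean sits in the middle of the data and the standard deviation gives a fair sense of the typical distance from it.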

Theory

Skewed distribution

When a distribution is skewed, there can be outliers. A small number of unevenly distributed extreme values can significantly affect both the mean and the standard deviation, giving a misleading picture of the data set. The median and IQR depend on the positions of the data points in the ordered data set rather than on the actual values of all points, so these statistics are usually preferable for skewed distributions. Outliers have little to no effect on them.
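The effect of a single extreme value can be seen in a short experiment. This is a sketch under stated assumptions: the data is made up, the helper `iqr` is hypothetical, and quartiles are computed with the "inclusive" method of `statistics.quantiles`, which is one of several common conventions.

```python
from statistics import mean, median, pstdev, quantiles


def iqr(values):
    """Interquartile range using the 'inclusive' quartile convention (an assumption)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Hypothetical data: the second list replaces the largest value with an outlier.
base = [18, 19, 19, 20, 20, 20, 21, 21, 22]
with_outlier = base[:-1] + [40]

for name, data in (("base", base), ("with outlier", with_outlier)):
    print(name, mean(data), round(pstdev(data), 2), median(data), iqr(data))
```

In this example the outlier raises the mean and inflates the standard deviation several times over, while the median and IQR stay the same.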
Exercise

Two ketchup companies investigate the weight of their -ounce bottles. Below are the results for both companies.

Analyze the results for both companies and determine which statistics are more suitable.

Solution
Example

Shapes and statistics

The general shape of the first histogram is symmetric, since the bars on opposite sides of the middle bar are roughly the same height. Therefore, Red Queen should use the mean and standard deviation to describe their results. The results for Red Rising, however, are skewed to the right. Therefore, Red Rising should use the median and interquartile range.

Example

The results

Since the histogram for Red Queen is symmetric, we can assume they fill their bottles more evenly than Red Rising. However, Red Rising's bottles generally contain more ketchup. Red Rising should probably consider adopting more consistent filling practices.
