
Describing Distributions of Data Sets


Histogram Distribution

Depending on how a data set is distributed, its histogram can take on different shapes. There are two main types.

Symmetric Distribution

If the data is distributed somewhat evenly around the mean, the bars on each side of the middle bar will have roughly the same height. This is called a symmetric distribution.

In a symmetric histogram, the mean is typically found in the tallest bar.

Skewed Distribution

When the data is not evenly distributed around the mean, the histogram is skewed: the bars trail off further on one side than on the other.

For skewed histograms, the mean is usually not found in the tallest bar.
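The statements about where the mean falls can be checked with a short sketch. The data values, the bar width of 1, and the helper names `tallest_bin` and `mean_in_tallest_bin` are assumptions made for illustration and are not part of the lesson.

```python
from collections import Counter
from statistics import mean


def tallest_bin(data, width):
    """Return (start, end) of the histogram bar holding the most data points."""
    # Each value is assigned to the bar that starts at a multiple of `width`.
    counts = Counter((x // width) * width for x in data)
    start = max(counts, key=counts.get)
    return start, start + width


def mean_in_tallest_bin(data, width):
    """Check whether the mean lies inside the tallest bar."""
    start, end = tallest_bin(data, width)
    return start <= mean(data) < end


# Made-up values: one roughly symmetric set, one with a long right tail.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
skewed = [1, 1, 1, 1, 2, 2, 3, 5, 9]

print(mean_in_tallest_bin(symmetric, 1))
print(mean_in_tallest_bin(skewed, 1))
```

For the symmetric set, the mean lands in the tallest bar. For the skewed set, the long tail pulls the mean away from it.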


Choosing Statistics Based on the Distribution

Based on the distribution of a data set, different types of statistics are more suitable than others.


Symmetric distribution

For a symmetric distribution, all values are spread out somewhat evenly around the center. Since both the mean and standard deviation take into account the actual values of all data points, these statistics are usually more suitable when describing this distribution.


Skewed distribution

When a distribution is skewed, there can be outliers. A small number of extreme values on one side can change both the mean and the standard deviation significantly, giving a misleading picture of the data set. The median and the interquartile range (IQR) depend on the position of the data points in the ordered list rather than on the actual value of every point, so these statistics are usually preferable for skewed distributions. Outliers have little or no effect on them.
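A small sketch can make the effect of an outlier concrete. The numbers are invented for illustration, and the IQR is computed with the default method of `statistics.quantiles`; other quartile conventions can give slightly different values.

```python
from statistics import mean, median, quantiles, stdev


def iqr(data):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = quantiles(data, n=4)
    return q3 - q1


# Invented data: the second list replaces the largest value with an outlier.
base = [10, 11, 12, 13, 14, 15, 16]
with_outlier = [10, 11, 12, 13, 14, 15, 100]

for name, data in [("base", base), ("with outlier", with_outlier)]:
    print(name)
    print("  mean:", mean(data), " standard deviation:", round(stdev(data), 2))
    print("  median:", median(data), " IQR:", iqr(data))
```

In this example the mean and standard deviation change considerably once the outlier is introduced, while the median and IQR stay the same.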

Two ketchup companies investigate the weight of their 64-ounce bottles. Below are the results for both companies.

Analyze the results for both companies and determine which statistics are more suitable.



Shapes and statistics

The general shape of the first histogram is symmetric, since the bars on opposite sides of the middle bar are roughly the same height. Therefore, Red Queen should use the mean and standard deviation to describe their results. The results for Red Rising, however, are skewed to the right. Therefore, Red Rising should use the median and interquartile range.
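The rule applied here can be sketched as a small helper. The function name `describe`, the shape labels, and the sample weights are hypothetical; the companies' actual measurements are not given in the text, so the numbers below only illustrate the procedure.

```python
from statistics import mean, median, quantiles, stdev


def describe(data, shape):
    """Pick a measure of center and spread based on the histogram's shape.

    `shape` is read off the histogram by the person analyzing the data.
    """
    if shape == "symmetric":
        return {"center": mean(data), "spread": stdev(data),
                "statistics": "mean and standard deviation"}
    if shape == "skewed":
        q1, _, q3 = quantiles(data, n=4)
        return {"center": median(data), "spread": q3 - q1,
                "statistics": "median and IQR"}
    raise ValueError("shape must be 'symmetric' or 'skewed'")


# Hypothetical bottle weights in ounces, not the companies' real results.
symmetric_sample = [63.0, 63.5, 64.0, 64.5, 65.0]
right_skewed_sample = [64.0, 64.0, 64.5, 65.0, 65.5, 67.0, 69.0]

print(describe(symmetric_sample, "symmetric"))
print(describe(right_skewed_sample, "skewed"))
```

The shape is passed in explicitly because, as in the solution above, it is judged from the histogram rather than computed.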


The results

Since the histogram for Red Queen is symmetric, we can assume they fill their bottles more consistently than Red Rising. However, the bottles from Red Rising tend to contain more ketchup. Red Rising should probably consider adopting more consistent practices.