Insights into Comparing Data Sets: Histograms and Standard Deviation

	Range	Standard Deviation
City A	55-36=19	6.7
City B	66-18=48	16.4

	Low	Mean	High	Standard Deviation
APDN	$2.52	$6.89	$15.21	$2.24
DSS	$4.04	$6.90	$10.89	$1.69

	Mean Length to Height Ratio
Abramis bjorkna	2.55
Leuciscus rutilus	3.75
Osmerus eperlanus	5.95
Esox lucius	6.33

	Length to Height Ratio (Images)
Pasuri	361/120 ≈ 3.01
Särki	393/106 ≈ 3.71
Hauki	358/59 ≈ 6.07
Norssi	358/55 ≈ 6.51

	Latin Name (Data Set)	Finnish Name (Images)
Longer Fishes	Osmerus eperlanus and Esox lucius	Hauki and Norssi
Taller Fishes	Abramis bjorkna and Leuciscus rutilus	Pasuri and Särki

Latin Name	Finnish Name	English Name
Abramis bjorkna	Pasuri	Bream
Leuciscus rutilus	Särki	Roach
Osmerus eperlanus	Norssi	Smelt
Esox lucius	Hauki	Pike

A manager can choose between two machines producing gadgets. To help the manager decide which to choose, each machine has run for 30 days producing gadgets. The following box plot represents the data of what was produced.

In general, which machine produces the most gadgets per day?

Which machine has the greatest IQR?

Which machine has the smallest standard deviation?

If we look at the box plots, we can see that Machine 2 has a median that is above the median of Machine 1.

This means that, in general, Machine 2 produced more gadgets per day. Be aware that it does not mean that Machine 2 produces more gadgets every single day. The median just allows comparing the estimated numbers for the whole 30-day period.

The interquartile range (IQR) is the difference between Q_3 and Q_1.

From the diagram, we see that the interquartile range is greater for Machine 2.
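The median and IQR comparison can be sketched in code. The daily counts below are hypothetical, since the lesson provides only the box plots, and `quartiles` is an illustrative helper name. It assumes the median-of-halves convention for quartiles, which may differ slightly from other quartile methods.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    # With an odd count, the middle value is left out of both halves.
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

# Hypothetical daily gadget counts (NOT the lesson's data).
machine_1 = [40, 42, 43, 44, 45, 46, 48]
machine_2 = [38, 44, 47, 50, 52, 55, 60]

q1_a, median_a, q3_a = quartiles(machine_1)
q1_b, median_b, q3_b = quartiles(machine_2)

# The IQR is the difference between Q_3 and Q_1.
iqr_1 = q3_a - q1_a
iqr_2 = q3_b - q1_b

print("Machine 1: median", median_a, "IQR", iqr_1)
print("Machine 2: median", median_b, "IQR", iqr_2)
```

With these made-up counts, Machine 2 has both the greater median and the greater IQR, which mirrors what the box plots show.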

Standard deviation measures how spread out the observations are from the mean. Observing the given diagram, we see that the box plot for Machine 1 is more compact. However, this does not necessarily mean that the standard deviation is smaller. We would have to know the individual observations to determine the standard deviation. Therefore, which machine has the smallest standard deviation cannot be determined from the box plots alone.

The box plots show how well two equally sized classes did on an English test with a maximum score of 15 points.

Which of the following statements are true?

A. Class 2 did better on the test.
B. Class 1 has the smaller standard deviation.
C. Class 2 has the greater range.
D. Class 1 has the greater median.

Let's go through the statements one at a time.

Statement A

Examining the box plot, we see that Class 2 has one person who scored 13 out of 15 points, which was the best score out of both classes. However, this does not mean that Class 2 did better overall. To determine this, we would have to compare the mean score, because it takes all observations into account. Therefore, we cannot be certain that this statement is true.

Statement B

The standard deviation tells us how spread out the observations are from the mean. If we look at the box plot, we see that Class 1 is more compact. However, this does not necessarily mean that the standard deviation is smaller. We would have to know the individual observations to determine the standard deviation. Therefore, we cannot say whether this statement is true or not.

Consider the following example scores.

Class 1	Class 2
2,2,2,2,2,2	0,2,2,2,2,2,2
5,5,5,5,5,5,5	3,3,3,3,3,3
6,7,7,7,7,7,7,7	3,3,3,3,3,3,3
10,10,10,10,10,10	7,7,7,7,7,7,13

These are possible scores for the given box plots. Here the standard deviation of Class 1, which is about 2.76, is greater than the standard deviation of Class 2, which is about 2.64. Even though the box plot for Class 1 is more compact, the standard deviation is greater for Class 1.
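The two standard deviations quoted for the example scores can be checked with a short script. This is a minimal sketch that assumes the population standard deviation is intended, which is what Python's `statistics.pstdev` computes. The variable names are illustrative.

```python
from statistics import pstdev

# Example scores from the lesson, 27 students per class.
class_1 = [2] * 6 + [5] * 7 + [6] + [7] * 7 + [10] * 6
class_2 = [0] + [2] * 6 + [3] * 13 + [7] * 6 + [13]

# Population standard deviation (divide by n, not n - 1).
sd_1 = pstdev(class_1)
sd_2 = pstdev(class_2)

print("Class 1:", round(sd_1, 2))
print("Class 2:", round(sd_2, 2))
```

The two extreme scores in Class 2 (0 and 13) stretch its box plot, but most of its scores sit close to the mean, so its standard deviation ends up smaller.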

Statement C

The range is the difference between the maximum value and the minimum value. From the diagram, we see that Class 2 has a lower minimum and a greater maximum compared to Class 1. Therefore, it is true that Class 2 has the greater range.

Statement D

The median is the vertical bar inside the box of a box plot.

As we can see, Class 1 has the greater median. Therefore, this statement is true.
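Statements C and D can also be checked numerically against the example scores listed under Statement B. This is only a sketch on that one possible data set, since the actual scores behind the box plots are not given. The variable names are illustrative.

```python
from statistics import median

# Example scores from Statement B (one possible data set for the box plots).
class_1 = [2] * 6 + [5] * 7 + [6] + [7] * 7 + [10] * 6
class_2 = [0] + [2] * 6 + [3] * 13 + [7] * 6 + [13]

# Range: maximum value minus minimum value.
range_1 = max(class_1) - min(class_1)
range_2 = max(class_2) - min(class_2)

median_1 = median(class_1)
median_2 = median(class_2)

print("Range:", range_1, "vs", range_2)
print("Median:", median_1, "vs", median_2)
```

For these scores, Class 2 has the greater range and Class 1 has the greater median, in line with Statements C and D.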

During the last year, Diego and Emily have been running 10-kilometer races every week. They both applied for the track and field team this year. The school coach asks the assistant coach to present the runners' results with two box plots.

Two box plots showing the running time of Emily and Diego

Based on the box plots, which runner is likely the best choice for the track and field team?

Practice makes perfect. Over time, as with everything, we tend to get better if we practice. Examining the box plots, it would appear that Diego is doing better overall as he is a more consistent runner. However, Emily has a personal best of 41 minutes while Diego's personal best is 42 minutes.

Very likely, they achieved these personal bests during their most recent runs. Therefore, Emily is likely the best candidate for the track and field team.

In a factory with 2000 employees, management would like to lower production time. The mean today is 37 minutes per unit. To lower the time, management would like to improve working conditions for their employees.

Workers asking for either more breaks or shorter working days

Half of the employees, who work the morning shift, are given extra breaks during the day. The other half, who work the day shift, get to go home earlier. Two months later, management measures the average production time again for the two teams during twelve consecutive days.

Shift	Minutes/unit
Morning	28, 33, 28, 30, 27, 29, 28, 33, 26, 28, 29, 30
Day	21, 29, 24, 31, 25, 30, 26, 22, 25, 27, 22, 23

What is the mean of the team who had the best efficiency increase? Round the answer to one decimal place.

What is the standard deviation of the team that had the most consistent results? Round the answer to one decimal place.

What seems to be the best strategy? Give people more breaks or shorten the working day?

Let's calculate the mean production time for each shift.

Morning Shift

To find the mean using our graphing calculator, we must first enter the values into lists. To do this, we press STAT and choose Edit. Then, we enter the values in the first two columns.

Next, we press STAT and scroll right until we reach CALC. There we choose the first option, 1-Var Stats. By default, the calculator will use List 1 when calculating these statistics, so we just have to press ENTER until we get a result.

The new mean for the morning shift is about 29.1 minutes per unit.

Day Shift

To calculate the mean for the day shift, remember that we already entered the data into List 2. Press STAT, scroll right to CALC, and choose 1-Var Stats. Having chosen 1-Var Stats, let's switch from L_1 to L_2 by pressing 2nd and 2. Then, we continue pressing ENTER until we get a result.

The day shift team has a mean of 25.4 minutes per unit.
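The same means can be reproduced without a graphing calculator. The sketch below uses Python's standard `statistics.mean` on the twelve measurements for each shift. The variable names are illustrative.

```python
from statistics import mean

# Minutes per unit over twelve consecutive days, from the lesson's table.
morning = [28, 33, 28, 30, 27, 29, 28, 33, 26, 28, 29, 30]
day = [21, 29, 24, 31, 25, 30, 26, 22, 25, 27, 22, 23]

mean_morning = mean(morning)
mean_day = mean(day)

print("Morning shift:", round(mean_morning, 1))
print("Day shift:", round(mean_day, 1))
```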

Conclusion

As we can see, both strategies managed to lower the mean production time. However, shortening the working day produced the lower mean. This means the day shift had the best efficiency increase, with a mean of 25.4 minutes per unit.

Based on the context, the smaller the standard deviation, the more consistent the production times. Let's compare the standard deviations of both shifts. Notice that the 1-Var Stats output from the previous part already includes the standard deviation, denoted σ_x. Let's show those summaries side by side and mark the standard deviations.

As we can see, the morning shift has the smaller standard deviation, about 2.1 minutes per unit. Therefore, it had the more consistent results.
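As a cross-check, the standard deviations can be computed in code. The sketch below assumes that the calculator's σ_x is the population standard deviation, which corresponds to `statistics.pstdev`. The sample standard deviation (`statistics.stdev`) is shown only for comparison and gives slightly larger values.

```python
from statistics import pstdev, stdev

# Minutes per unit over twelve consecutive days, from the lesson's table.
morning = [28, 33, 28, 30, 27, 29, 28, 33, 26, 28, 29, 30]
day = [21, 29, 24, 31, 25, 30, 26, 22, 25, 27, 22, 23]

# Population standard deviation (assumed to match the calculator's sigma_x).
sigma_morning = pstdev(morning)
sigma_day = pstdev(day)

# Sample standard deviation, for comparison only.
s_morning = stdev(morning)
s_day = stdev(day)

print("Morning shift:", round(sigma_morning, 1), "(sample:", round(s_morning, 1), ")")
print("Day shift:", round(sigma_day, 1), "(sample:", round(s_day, 1), ")")
```

Either way, the morning shift has the smaller spread, so the conclusion about consistency does not depend on which of the two formulas is used.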

Notice that the standard deviation describes the spread of the values around the mean. However, it is the mean that tells us how efficient the team is on average.

	Mean	Standard Deviation	Strategy
Morning Shift	29.1	2.1	Give additional breaks
Day Shift	25.4	3.1	Shorten the working day

Therefore, management would get better results if they let their employees go home earlier.

Diego loves a particular type of candy. However, some bags he finishes very fast and others take longer. He wonders why this is. Each bag is supposed to contain 40 pieces of candy and he does not eat very fast.

After buying fifteen bags, Diego counted the number of candies they each contained and obtained the following data.

38, 45, 31, 43, 53
32, 39, 37, 34, 45
38, 42, 45, 33, 41

Calculate the standard deviation for the number of candies in the fifteen bags. Round to one decimal.

On the company's website, Diego complained about this and the company promised to get better. Half a year later, Diego bought fifteen new bags and counted the number of candies.

40, 40, 42, 42, 41
41, 39, 42, 44, 40
39, 42, 41, 40, 41

What is the new standard deviation? Round to one decimal.

Based on Diego's surveys, has the company improved?

To find the standard deviation we will use a graphing calculator. First, we enter the values into a list. To do this we press STAT, choose Edit, and enter the values in the first column.

Next, we press STAT and choose CALC. There we select the first option, 1-Var Stats. The calculator will automatically use List 1 when calculating the statistics. Finally, we press ENTER until we get a result.

The standard deviation is about 5.8. This is why there is such a big difference in the number of candies per bag.

To calculate the standard deviation for the 15 new bags, we begin by entering these values in the first column.

Finally, we go to STAT and choose CALC. There we select the first option, 1-Var Stats, and we get the standard deviation.

The new standard deviation is about 1.3.

Since the standard deviation of the second data set is much lower, the company is now more consistent and has improved.
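Both results can be verified with a few lines of code. This is a minimal sketch that assumes the population standard deviation (`statistics.pstdev`) is the value read from the calculator. The variable names are illustrative.

```python
from statistics import pstdev

# Candies per bag in Diego's first fifteen bags.
first_bags = [38, 45, 31, 43, 53, 32, 39, 37, 34, 45, 38, 42, 45, 33, 41]
# Candies per bag in the fifteen bags bought half a year later.
new_bags = [40, 40, 42, 42, 41, 41, 39, 42, 44, 40, 39, 42, 41, 40, 41]

sd_first = pstdev(first_bags)
sd_new = pstdev(new_bags)

print("First purchase:", round(sd_first, 1))
print("Second purchase:", round(sd_new, 1))
```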
