
Data comes in many forms. For example, the observations collected in a survey can yield different types of data. If the data can be described using numbers, it is called *numerical data*. If it is better to describe the variables by their characteristics, it is called *categorical data*. This lesson focuses on the latter and on how to distinguish the two.
### Catch-Up and Review

**Here are a few recommended readings before getting started with this lesson.**

Challenge

It is Friday afternoon, and Tiffaniqua just finished finals week at school. It is time to relax! As she is leaving school, she thinks of a nice bath with a bath bomb. Just thinking of how the water changes as the bath bomb dissolves already puts Tiffaniqua in a state of relaxation!

Her aunt happens to own a beauty store, so she goes there to look for a bath bomb. While she is there, Tiffaniqua is amazed that each product comes in several scents. Her aunt knows that Tiffaniqua loves statistics, so she shows her a table with the number of each product, in each scent, sold last month.

a According to the table, which scent of bath bomb is the most popular? Choose from Lavender, Citrus, Vanilla, Rose, or Jasmine.

b In general, which is the most popular scent when taking into account all the products sold in the store? Choose from Lavender, Citrus, Vanilla, Rose, or Jasmine.

Discussion

Categorical data, also called qualitative data, is data that can be split into groups. Categorical data belongs to one or more categories that have a fixed number of possible outcomes or values. Human blood groups are one example of categorical data.

This classification is based on whether certain antigens are present or absent on the surface of red blood cells. A person's blood type can be categorized into one of these groups.

Example

The beauty store has plenty to offer, and Tiffaniqua wonders how her aunt manages such a wide variety of products. Auntie has a database where she stores information about each individual product for sale. For example, consider one particular bottle of rose hand lotion.

Auntie stores in her database the type of product (hand lotion), the scent (rose), the content volume ($10.1$ fl. oz.), and the price ($\$5$). Which of these variables are categorical? Choose all that apply: Type of Product, Scent, Content Volume, Price.

What variables are **not** described using numbers?

The store sells many types of products in different scents. Consider the given bottle of hand lotion.

There are four variables present in the given picture: the type of product, the scent, the content volume, and the price. Notice that the **type of product** is hand lotion and the **scent** is rose. These variables are described using words rather than numbers, which means that they correspond to categorical data.

On the other hand, the content volume is given in fluid ounces, and the price in dollars. Since these variables are described using numbers, they correspond to numerical data.
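The rule used in this example (words mean categorical data, numbers mean numerical data) can be sketched in Python. This is only a minimal sketch under stated assumptions: the record layout and field names such as `content_volume_fl_oz` are hypothetical, and a value's Python type is used as a stand-in for "described using numbers".

```python
def classify_variables(record):
    """Label each variable of a record as 'categorical' or 'numerical'.

    Assumption: a value stored as a number counts as numerical data,
    and a value stored as words counts as categorical data.
    """
    kinds = {}
    for name, value in record.items():
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            kinds[name] = "numerical"
        else:
            kinds[name] = "categorical"
    return kinds


# The bottle of rose hand lotion from the example (hypothetical field names)
rose_lotion = {
    "type_of_product": "hand lotion",
    "scent": "rose",
    "content_volume_fl_oz": 10.1,
    "price_usd": 5,
}

for variable, kind in classify_variables(rose_lotion).items():
    print(variable, "->", kind)
```

The type check is just a stand-in for the question the example asks, namely whether a variable is described using numbers.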

Discussion

Exploring the proportion of occurrences of a group, value, or set of values in a data set provides valuable insights. This is called the relative frequency, which is given by the ratio of an observed category or value's frequency to the total number of observations in a data set.

$\text{Relative frequency} = \dfrac{\text{Frequency}}{\text{Number of observations}}$

For example, suppose a survey asks the students in a classroom about their favorite color. The answers form categorical data.

Color | Frequency |
---|---|
Blue | $9$ |
Red | $7$ |
Green | $5$ |
Yellow | $4$ |
Purple | $3$ |
Other | $2$ |

Knowing that there are $30$ students in the classroom, the relative frequency of each category can be found by dividing the frequency of each category by $30.$
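As a sketch, the same division can be carried out in Python. The helper name `relative_frequencies` is hypothetical and not part of any library; it simply divides each frequency by the total number of observations.

```python
def relative_frequencies(freqs):
    """Divide each category's frequency by the total number of observations."""
    total = sum(freqs.values())
    return {category: count / total for category, count in freqs.items()}


# Favorite colors of the 30 students in the survey
favorite_colors = {"Blue": 9, "Red": 7, "Green": 5, "Yellow": 4, "Purple": 3, "Other": 2}

rel = relative_frequencies(favorite_colors)
for color, r in rel.items():
    # Two decimal places, then a whole-number percentage
    print(f"{color:<7} {r:.2f} {r:.0%}")
```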

Color | Frequency | Relative Frequency |
---|---|---|
Blue | $9$ | $\frac{9}{30}=0.3$ |
Red | $7$ | $\frac{7}{30}\approx 0.23$ |
Green | $5$ | $\frac{5}{30}\approx 0.17$ |
Yellow | $4$ | $\frac{4}{30}\approx 0.13$ |
Purple | $3$ | $\frac{3}{30}=0.1$ |
Other | $2$ | $\frac{2}{30}\approx 0.07$ |

Relative frequencies are typically written as percentages.

Color | Frequency | Relative Frequency |
---|---|---|
Blue | $9$ | $30\%$ |
Red | $7$ | $23\%$ |
Green | $5$ | $17\%$ |
Yellow | $4$ | $13\%$ |
Purple | $3$ | $10\%$ |
Other | $2$ | $7\%$ |

Example

Amazingly, Tiffaniqua's auntie makes the candles by hand herself! She buys a huge block of soy wax and uses it to make candles in different scents.

To give each candle its characteristic scent, she needs to add essential oils. It is vital that she knows in advance how much of each oil to buy; otherwise, she may end up with too much. Consider her sales sheet from the previous month.

Scent | Number of Candles Sold |
---|---|
Lavender | $15$ |
Citrus | $5$ |
Vanilla | $30$ |
Rose | $20$ |
Jasmine | $10$ |

Pair each scent (Lavender, Citrus, Vanilla, Rose, Jasmine) with the relative frequency of its candle sales, written as a percentage: $18.75\%$, $6.25\%$, $37.5\%$, $25\%$, or $12.5\%$.

Begin by finding how many candles were sold. Divide each frequency by the total number of candles sold. Write each relative frequency as a percentage.

In order to find the relative frequency of the sales of each different scent of candle, begin by finding how many candles were sold last month. To do so, add the frequencies of each scent.

$15+5+30+20+10=80 $

This means that $80$ candles were sold last month. Next, find the relative frequency by dividing the respective frequency by this total. Since $15$ lavender candles were sold, dividing $15$ by $80$ yields the relative frequency of lavender candles.
$\dfrac{15}{80}=0.1875$

Next, write this number as a percentage so that it can be compared with the others.

$0.1875=18.75\%$

Do the same for the rest of the scents.

Scent | Number of Candles Sold | Division | Relative Frequency |
---|---|---|---|
Lavender | $15$ | $\frac{15}{80}$ | $0.1875=18.75\%$ |
Citrus | $5$ | $\frac{5}{80}$ | $0.0625=6.25\%$ |
Vanilla | $30$ | $\frac{30}{80}$ | $0.375=37.5\%$ |
Rose | $20$ | $\frac{20}{80}$ | $0.25=25\%$ |
Jasmine | $10$ | $\frac{10}{80}$ | $0.125=12.5\%$ |

Knowing this, each scent can be paired with its relative frequency.

Scent | Relative Frequency |
---|---|
Lavender | $18.75\%$ |
Citrus | $6.25\%$ |
Vanilla | $37.5\%$ |
Rose | $25\%$ |
Jasmine | $12.5\%$ |

Tiffaniqua's auntie can now use these relative frequency percentages to prioritize how much of each oil to purchase!
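A short Python sketch of the candle computation, which also orders the scents from most to least sold. Sorting by relative frequency is only an assumed way of "prioritizing"; the lesson does not say how auntie decides the exact amounts of oil to buy.

```python
# Candles sold last month, by scent
candles_sold = {"Lavender": 15, "Citrus": 5, "Vanilla": 30, "Rose": 20, "Jasmine": 10}

total = sum(candles_sold.values())  # 80 candles in total

# Relative frequency of each scent, written as a percentage
percentages = {scent: 100 * count / total for scent, count in candles_sold.items()}

# Assumed prioritization rule: the most-sold scent comes first
priority = sorted(percentages, key=percentages.get, reverse=True)

for scent in priority:
    print(f"{scent:<9}{percentages[scent]:>6}%")
```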

Discussion

A two-way frequency table, also known as a **two-way table**, displays categorical data that can be grouped by two categories. One of the categories is represented in the rows of the table, the other in the columns. For example, the table below shows the results of a survey where $100$ participants were asked if they have a driver's license and if they own a car.

Here, the two categories are *car* and *driver's license*, and both have the possible responses *yes* and *no*. The numbers in the table are called joint frequencies. Two-way frequency tables also often include the totals of the rows and columns, which are called marginal frequencies.

The sum of the marginal frequencies in the Total row equals the sum of the marginal frequencies in the Total column, and both equal the sum of all joint frequencies, which in this case is $100.$ This value is called the grand total.
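Since the driver's license table itself is not reproduced here, the sketch below uses the joint frequencies that result from completing the hat-preference table worked out later in this lesson ($6$ and $12$ for males, $15$ and $20$ for females). The function name `marginal_frequencies` is hypothetical; it adds up rows and columns to get the marginal frequencies and the grand total.

```python
def marginal_frequencies(joint):
    """Return the row totals, column totals, and grand total of a two-way
    table given as {row: {column: joint frequency}}."""
    row_totals = {row: sum(cells.values()) for row, cells in joint.items()}
    columns = list(next(iter(joint.values())))
    col_totals = {col: sum(cells[col] for cells in joint.values()) for col in columns}
    grand_total = sum(row_totals.values())
    # Sanity check: adding the row totals or the column totals
    # must give the same grand total
    if grand_total != sum(col_totals.values()):
        raise ValueError("row and column totals disagree")
    return row_totals, col_totals, grand_total


# Joint frequencies of the completed hat-preference table
# (rows: gender, columns: hat preference)
joint = {
    "Male": {"Top hat": 6, "Beret": 12},
    "Female": {"Top hat": 15, "Beret": 20},
}
rows, cols, grand = marginal_frequencies(joint)
print(rows)   # {'Male': 18, 'Female': 35}
print(cols)   # {'Top hat': 21, 'Beret': 32}
print(grand)  # 53
```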

Discussion

Organizing data in a two-way frequency table makes the data easier to visualize, analyze, and present. To draw a two-way frequency table, follow these three steps.

- Determine the categories.
- Fill the table with the given data.
- Determine whether any frequencies are missing. If so, find them.

Suppose that $53$ people took part in an online survey, where they were asked whether they prefer top hats or berets. Out of the $18$ males that participated, $12$ prefer berets. Also, $15$ of the females chose top hats as their preference. The steps listed above will now be used to analyze and present the data.

1

Determine the Categories

First, the two categories of the table must be determined, after which the table can be drawn without frequencies. Here, the participants gave their hat preference and their gender, which are the two categories. Hat preference can be further divided into top hat and beret, and gender into female and male.

The total row and total column are included to write the marginal frequencies.

2

Fill the Table With Given Data

The given joint and marginal frequencies can now be added to the table.

3

Find Any Missing Frequencies

Using the given frequencies, more information can potentially be found by reasoning. For instance, because $12$ out of the $18$ males prefer berets, the number of males who prefer top hats is equal to the difference between these two values.

$18−12=6 $

Therefore, there are $6$ males who prefer top hats. Since there are $15$ females who prefer top hats, the number of participants who prefer this type of hat is the sum of these two values.
$6+15=21 $
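The reasoning used in this step (completing any row or column that has exactly one missing frequency) can be repeated until nothing more can be found. Below is a minimal Python sketch of that idea; `fill_two_way_table` is a hypothetical name, and the table is assumed to be a list of rows whose last row and last column hold the totals, with `None` marking a missing frequency.

```python
def fill_two_way_table(table):
    """Fill the missing cells (None) of a two-way table in place.

    The last row holds the column totals and the last column holds the row
    totals, so the bottom-right cell is the grand total. Any row or column
    with exactly one missing cell is completed; this repeats until no
    further cell can be found.
    """
    n_rows, n_cols = len(table), len(table[0])
    # Every row and every column is a "line" whose last cell is its total
    lines = [[(r, c) for c in range(n_cols)] for r in range(n_rows)]
    lines += [[(r, c) for r in range(n_rows)] for c in range(n_cols)]
    changed = True
    while changed:
        changed = False
        for line in lines:
            missing = [(r, c) for r, c in line if table[r][c] is None]
            if len(missing) != 1:
                continue
            r, c = missing[0]
            *parts, (tr, tc) = line
            known = sum(table[i][j] for i, j in parts if (i, j) != (r, c))
            if (r, c) == (tr, tc):
                table[r][c] = known  # the missing cell is the total
            else:
                table[r][c] = table[tr][tc] - known  # total minus known parts
            changed = True
    return table


# Rows: Male, Female, Total. Columns: Top hat, Beret, Total.
hats = [
    [None, 12, 18],
    [15, None, None],
    [None, None, 53],
]
for row in fill_two_way_table(hats):
    print(row)
# [6, 12, 18]
# [15, 20, 35]
# [21, 32, 53]
```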

It has been found that $21$ participants prefer top hats. Continuing with this reasoning, the entire table can be completed.

Example

The hand lotion sold at auntie's store comes in two sizes: large and small. She buys the lotion in bulk and then fills the bottles herself, one by one. Auntie needs to restock the vanilla and rose hand lotions, and Tiffaniqua will help!

Out of a total of $25$ vanilla bottles, they filled $18$ small ones. They also filled $12$ large rose bottles. In total, $60$ bottles were filled.

a How many rose bottles in total were filled?


b How many small rose bottles were filled?


c How many large bottles in total were filled?


a Make a two-way frequency table with the given information. Include the known marginal frequency and the grand total. Use the information in the table to find the marginal frequency corresponding to rose bottles.

b Make a two-way frequency table with the given information. Include the known marginal frequency and the grand total. Use the answer from Part A to find the joint frequency corresponding to small rose bottles.

c Make a two-way frequency table with the given information. Include the known marginal frequencies and the grand total. Use the information in the table to find the marginal frequency corresponding to large bottles.

a Begin by making a two-way table. There are two variables in this case: the scent and the bottle size.

Next, add a row and a column to include the marginal frequencies, which correspond to the totals of each individual category.

It is given that out of $25$ vanilla bottles, they filled $18$ small ones. This means that the marginal frequency of vanilla bottles is $25$ and that the joint frequency of small vanilla bottles is $18.$ They also filled $12$ large rose bottles, which is also a joint frequency. The grand total is $60$ bottles. Add all this information to the table.

There is now enough information to fill the table. To find how many rose bottles there are, note that $25$ of the $60$ bottles are vanilla, so subtract $25$ from $60.$

$60-25=35$

There are $35$ rose bottles.

b It was found in Part A that there are $35$ rose bottles, $12$ of which are large. Find the number of small rose bottles by subtracting $12$ from $35.$

$35−12=23 $

This means they filled $23$ small rose bottles.

c Finally, to find how many large bottles were filled, begin by noting that there are $18+23=41$ small bottles. This means that the number of large bottles can be found by subtracting $41$ from $60.$

$60−41=19 $

Although it is not necessary, the number of large vanilla bottles can also be found by subtracting $12$ from $19.$

$19-12=7$

With this information, the table can now be completed.
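The arithmetic of Parts a to c can be checked with a few lines of Python. This is only a sketch that mirrors the order of the steps above; the variable names are illustrative.

```python
# Given information
grand_total = 60     # bottles filled in total
vanilla_total = 25   # vanilla bottles
small_vanilla = 18   # small vanilla bottles
large_rose = 12      # large rose bottles

rose_total = grand_total - vanilla_total   # Part a
small_rose = rose_total - large_rose       # Part b
small_total = small_vanilla + small_rose
large_total = grand_total - small_total    # Part c
large_vanilla = large_total - large_rose   # optional extra cell

print(rose_total, small_rose, large_total, large_vanilla)  # 35 23 19 7

# Check: every row and column of the completed table adds up correctly
assert small_vanilla + large_vanilla == vanilla_total
assert small_rose + large_rose == rose_total
assert vanilla_total + rose_total == small_total + large_total == grand_total
```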
Discussion

In a two-way frequency table, a joint relative frequency is the ratio of a joint frequency to the grand total. Similarly, a marginal relative frequency is the ratio of a marginal frequency to the grand total. Consider the following example of a two-way table.

Here, the grand total is $100.$ Dividing the joint and marginal frequencies by $100$ gives the *joint* and *marginal* relative frequencies.

Relative frequencies can also be calculated by rows or by columns. When they are calculated by rows, each joint frequency is divided by the marginal frequency of its row.

On the other hand, when they are calculated by columns, each joint frequency is divided by the marginal frequency of its column.
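The three ways of dividing can be compared side by side in Python. As an assumption for illustration, the sketch reuses the completed lotion table from the previous example (vanilla: $18$ small, $7$ large; rose: $23$ small, $12$ large) rather than the survey table above. `Fraction` keeps the values exact.

```python
from fractions import Fraction

# Completed lotion table from the previous example (rows: scent, columns: size)
joint = {
    "Vanilla": {"Small": 18, "Large": 7},
    "Rose": {"Small": 23, "Large": 12},
}
sizes = ["Small", "Large"]

row_totals = {scent: sum(cells.values()) for scent, cells in joint.items()}
col_totals = {size: sum(joint[scent][size] for scent in joint) for size in sizes}
grand_total = sum(row_totals.values())

# Joint relative frequencies: divide by the grand total
by_total = {s: {z: Fraction(joint[s][z], grand_total) for z in sizes} for s in joint}
# By rows: divide by the marginal frequency of the cell's row
by_rows = {s: {z: Fraction(joint[s][z], row_totals[s]) for z in sizes} for s in joint}
# By columns: divide by the marginal frequency of the cell's column
by_cols = {s: {z: Fraction(joint[s][z], col_totals[z]) for z in sizes} for s in joint}

for label, table in [("by grand total", by_total), ("by rows", by_rows), ("by columns", by_cols)]:
    print(label)
    for scent, cells in table.items():
        print(" ", scent, {size: f"{float(v):.1%}" for size, v in cells.items()})
```

In the table made by rows, each row adds up to $1$; in the table made by columns, each column adds up to $1$.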

Note that when a relative frequencies table is made by rows, the values in the columns

Example

After filling the bottles, auntie had some vanilla and rose lotion left over. She remembered that there was some leftover wax as well, so she decided to give Tiffaniqua free samples to share with her friends. The following two-way table summarizes the items auntie gave to Tiffaniqua.

a What percentage of the samples are rose candles?


b What percentage of the samples are vanilla scented?


c What percentage of the samples are hand lotion bottles?


a Add all the frequencies to find the grand total. Find the relative frequency of rose candles by dividing the number of rose candles by the grand total.

b Add the frequencies of the vanilla scented products to find the marginal frequency of the vanilla scent. Divide it by the grand total to find its corresponding marginal relative frequency.

c Add the frequencies of the hand lotions to find the marginal frequency of the hand lotion product. Divide it by the grand total to find its corresponding marginal relative frequency.

a Begin by adding all the frequencies in the table to find the grand total. This corresponds to the number of items auntie gave to Tiffaniqua.

$6+3+4+7=20 $

Auntie gave Tiffaniqua a total of $20$ items. To find what percentage of the samples are rose candles, divide the number of rose candles by the grand total.

$\dfrac{7}{20}=0.35$

This corresponds to the joint relative frequency of rose candles. This can now be written as a percentage.
$0.35=35\%$

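b The percentage of vanilla scented samples corresponds to the marginal relative frequency of the vanilla items. Begin by adding the marginal frequencies to the given two-way table. The marginal frequency of the vanilla scented items is $10,$ so divide it by the grand total, $20.$

$\dfrac{10}{20}=0.5$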

Do not forget to write it as a percentage.
$0.5=50\%$

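Tiffaniqua found that $50\%$ of the samples have a vanilla scent.

c Consider the two-way table with marginal frequencies obtained in Part B. The marginal frequency of the hand lotion bottles is $9,$ so divide it by the grand total.

$\dfrac{9}{20}=0.45$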

The marginal relative frequency is $0.45,$ or $45\%.$ This means that $45\%$ of the samples are hand lotion bottles.
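The three answers can be reproduced in Python as a quick check. The table image is not reproduced here, so the placement of each count is an assumption inferred from the worked solution: the counts are $6,$ $3,$ $4,$ and $7,$ the hand lotions add up to $9,$ the vanilla items add up to $10,$ and there are $7$ rose candles.

```python
# Sample counts; the placement of each count is inferred from the worked
# solution (hand lotion total 9, vanilla total 10, rose candles 7)
samples = {
    "Hand lotion": {"Vanilla": 6, "Rose": 3},
    "Candle": {"Vanilla": 4, "Rose": 7},
}

grand_total = sum(sum(row.values()) for row in samples.values())  # 20 items


def as_percent(count):
    """Relative frequency with respect to the grand total, as a percentage."""
    return 100 * count / grand_total


rose_candles = as_percent(samples["Candle"]["Rose"])                   # joint
vanilla = as_percent(sum(row["Vanilla"] for row in samples.values()))  # marginal
hand_lotion = as_percent(sum(samples["Hand lotion"].values()))         # marginal

print(rose_candles, vanilla, hand_lotion)  # 35.0 50.0 45.0
```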
Closure

Consider the two-way table given at the start of the lesson.

To determine which scent of bath bomb is the most popular, focus on the bath bomb row and look for the bath bomb scent with the greatest joint frequency.

In order to determine which scent is the most popular in general, the marginal frequencies need to be added to the table.

From here, it can be seen that the vanilla scent has the greatest marginal frequency.

This means that the most popular bath bomb scent is lavender, but the most popular scent in general is vanilla. Vanilla might be an all-time favorite, but it seems that a lavender bath is the best bet when it comes to relaxation!