Data can be collected in many ways, such as through surveys, and the observations yield different types of data. If the data can be described using numbers, it is called numerical data. If the data is better described by characteristics or labels, it is called categorical data. This lesson will focus on the latter and on how to distinguish between the two.

Challenge

A Cozy Beauty Store

It is Friday afternoon, and Tiffaniqua just finished finals week at school. It is time to relax! As she is leaving school, she thinks of a nice bath with a bath bomb. Just thinking of how the water changes as the bath bomb dissolves already places Tiffaniqua in a state of relaxation!
Her aunt happens to own a beauty store, so she goes there to look for a bath bomb. While she is there, Tiffaniqua is amazed by how each product comes in several scents. Her aunt knows that Tiffaniqua loves statistics, so she shows her a table with information on how many of their different products were sold in the last month.
According to the table, which scent of bath bomb is the most popular?
In general, which is the most popular scent when taking into account all the products sold in the store?
Discussion

Categorical Data

Categorical data, also called qualitative data, is data that can be split into groups. Categorical data belongs to one or more categories that have a fixed number of possible outcomes or values. Human blood groups are one example of categorical data.

This classification is based on whether certain antigens are present or absent on the surface of red blood cells. A person's blood type can be categorized into one of these groups.
Example

Is it Categorical or Numerical?

The beauty store has plenty to offer, and Tiffaniqua wonders how her aunt manages such a wide variety of products. Auntie has a database where she stores information about each individual product for sale. For example, consider one particular bottle of rose hand lotion.

Auntie stores in her database the type of product (hand lotion), the scent (rose), the content volume (in fluid ounces), and the price (in dollars). Which of these variables are categorical?

Hint

What variables are not described using numbers?

Solution

The store sells different types of products in different scents. Consider the given bottle of hand lotion.

There are four variables present in the given picture. These are the type of product, the scent, the content volume, and the price. Notice that the type of product is hand lotion, and the scent is rose. These variables are described using words, rather than numbers. This means that these variables correspond to categorical data.

On the other hand, the content volume is given in fluid ounces, and the price in dollars. Since these variables are described using numbers, they correspond to numerical data.
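
The same distinction can be sketched in code. The snippet below is a minimal illustration rather than part of the lesson: the record, its field names, and its volume and price values are hypothetical, and it assumes the simple rule that a variable stored as a number is numerical while a variable stored as text is categorical.

```python
# Hypothetical database record for one bottle. The field names and the
# volume and price values are made up for illustration.
product = {
    "type": "hand lotion",
    "scent": "rose",
    "volume_fl_oz": 8.0,
    "price_usd": 12.5,
}


def classify(value):
    """Label a value as numerical if it is a number, categorical otherwise."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return "numerical"
    return "categorical"


kinds = {name: classify(value) for name, value in product.items()}
for name, kind in kinds.items():
    print(f"{name}: {kind}")
```

This rule is only a heuristic. A zip code or a jersey number is written with digits but still behaves like a category, so the final decision depends on what the variable means.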

Discussion

Relative Frequency

Exploring the proportion of occurrences of a group, value, or set of values in a data set provides valuable insights. This is called the relative frequency, which is given by the ratio of an observed category or value's frequency to the total number of observations in a data set.

Relative frequency = (frequency of a category or value) / (total number of observations)

For example, suppose a classroom survey collects categorical data about the students' favorite colors.

Color Frequency
Blue
Red
Green
Yellow
Purple
Other

Given the total number of students in the classroom, the relative frequency of each category can be found by dividing the frequency of that category by this total.

Color Frequency Relative Frequency
Blue
Red
Green
Yellow
Purple
Other

Relative frequencies are typically written as percentages.

Color Frequency Relative Frequency
Blue
Red
Green
Yellow
Purple
Other
From the table, it can be seen that blue is the most popular color, as it has the highest relative frequency. Notice that the relative frequencies add up to 100%.
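
The calculation can also be sketched in a few lines of Python. The counts below are hypothetical and are not the values from the lesson's table, and the function name `relative_frequencies` is chosen only for this illustration.

```python
# Hypothetical survey counts, not the values from the lesson's table.
frequencies = {
    "Blue": 8, "Red": 5, "Green": 4, "Yellow": 3, "Purple": 3, "Other": 2,
}


def relative_frequencies(counts):
    """Return each category's share of the total as a fraction of 1."""
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


shares = relative_frequencies(frequencies)
for color, share in shares.items():
    # The ".0%" format multiplies by 100 and appends a percent sign.
    print(f"{color}: {share:.0%}")

# The category with the highest relative frequency is the most popular.
most_popular = max(shares, key=shares.get)
print("Most popular:", most_popular)
```

Dividing by the total keeps categories comparable even when two surveys have different numbers of participants, which is why relative frequencies are often preferred over raw counts.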
Example

Relaxing Artisan Candles

Impressively, Tiffaniqua's auntie makes the candles by hand! She buys a huge block of soy wax and uses it to make different scented candles.

To give each candle its characteristic scent, she needs to add essential oils. It is vital that she knows in advance how much of each oil to buy. Otherwise, she ends up with too much. Consider her previous month's sales sheet.

Scent Number of Candles Sold
Lavender
Citrus
Vanilla
Rose
Jasmine
Match each scent with its corresponding relative frequency.

Hint

Begin by finding how many candles were sold. Divide each frequency by the total number of candles sold. Write each relative frequency as a percentage.

Solution

In order to find the relative frequency of the sales of each different scent of candle, begin by finding how many candles were sold last month. To do so, add the frequencies of each scent.
The sum is the total number of candles sold last month. Next, find the relative frequency of each scent by dividing its frequency by this total. For example, dividing the number of lavender candles sold by the total yields the relative frequency of lavender candles.
Next, this number will be written as a percentage in order to compare it to the others.
Do the same for the rest of the scents.
Scent Number of Candles Sold Divide Relative Frequency
Lavender
Citrus
Vanilla
Rose
Jasmine

Knowing this, each scent can be paired with its relative frequency.

Scent Relative Frequency
Lavender
Citrus
Vanilla
Rose
Jasmine

Tiffaniqua's auntie can now use these relative frequency percentages to prioritize how much of each oil to purchase!

Discussion

Two-Way Frequency Table

A two-way frequency table, also known as a two-way table, displays categorical data that can be grouped into two categories. One of the categories is represented in the rows of the table, the other in the columns. For example, the table below shows the results of a survey where participants were asked if they have a driver's license and if they own a car.

Two-way table
Here, the two categories are car and driver's license. Both have possible responses of yes and no. The numbers in the table are called joint frequencies. Two-way frequency tables often also include the totals of the rows and columns, which are called marginal frequencies.
two-way table
The entries in the Total row and the entries in the Total column each add up to the same number, which also equals the sum of all joint frequencies. This number is called the grand total. A joint frequency shows how many people gave a particular pair of responses, such as having a driver's license and owning a car. A marginal frequency shows how many people gave one particular response, such as not having a car. The rest of the numbers in the table can be interpreted in the same way.
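
As a minimal sketch, the joint frequencies, marginal frequencies, and grand total can be tallied from raw survey responses. The responses below are hypothetical and do not reproduce the table in the lesson.

```python
from collections import Counter

# Hypothetical survey responses as (has_license, owns_car) pairs.
responses = [
    ("yes", "yes"), ("yes", "yes"), ("yes", "no"),
    ("no", "no"), ("no", "no"), ("yes", "yes"), ("no", "yes"),
]

# Joint frequencies: one count per pair of answers.
joint = Counter(responses)

# Marginal frequencies: totals for each answer to a single question.
license_totals = Counter(has_license for has_license, _ in responses)
car_totals = Counter(owns_car for _, owns_car in responses)

# The grand total is the sum of all joint frequencies.
grand_total = sum(joint.values())

print("License and car:", joint[("yes", "yes")])
print("No car:", car_totals["no"])
print("Grand total:", grand_total)
```

Both sets of marginal frequencies add up to the grand total, which is a quick consistency check for any two-way table.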
Discussion

Drawing a Two-Way Frequency Table

Organizing data in a two-way frequency table can help with visualization, which in turn makes it easier to analyze and present the data. To draw a two-way frequency table, three steps must be followed.

  1. Determine the categories.
  2. Fill the table with the given data.
  3. Determine if there are any missing frequencies. If so, find those.

Suppose that people took part in an online survey, where they were asked whether they prefer top hats or berets. The known information is the total number of participants, the number of males who participated, the number of those males who prefer berets, and the number of females who chose top hats. The steps listed above will now be used to analyze and present the data.

1. Determine the Categories

First, the two categories of the table must be determined, after which the table can be drawn without frequencies. Here, the participants gave their hat preference and their gender, which are the two categories. Hat preference can be further divided into top hat and beret, and gender into female and male.

two-way table

The total row and total column are included to write the marginal frequencies.

2. Fill the Table With Given Data

The given joint and marginal frequencies can now be added to the table.

two-way table
3. Find Any Missing Frequencies
Using the given frequencies, more information can often be found by reasoning. For instance, because both the number of males and the number of males who prefer berets are known, the number of males who prefer top hats is equal to the difference between these two values.
Since the number of females who prefer top hats is also known, the number of participants who prefer this type of hat is the sum of the males and females who prefer top hats.
Continuing with this reasoning, the entire table can be completed.
two-way table
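
The reasoning in Step 3 can be sketched as a small function. This is an illustration under stated assumptions: the function name `complete_table` and the numbers passed to it are hypothetical and are not the survey values from the lesson.

```python
def complete_table(grand_total, male_total, male_beret, female_top_hat):
    """Fill in a 2x2 two-way table from four known frequencies."""
    # Males who prefer top hats: all males minus males who prefer berets.
    male_top_hat = male_total - male_beret
    # Females: all participants minus males.
    female_total = grand_total - male_total
    # Females who prefer berets: all females minus those who prefer top hats.
    female_beret = female_total - female_top_hat
    return {
        "male": {
            "top hat": male_top_hat, "beret": male_beret, "total": male_total,
        },
        "female": {
            "top hat": female_top_hat, "beret": female_beret,
            "total": female_total,
        },
        "total": {
            "top hat": male_top_hat + female_top_hat,
            "beret": male_beret + female_beret,
            "total": grand_total,
        },
    }


# Hypothetical survey numbers, chosen only to make the arithmetic visible.
table = complete_table(
    grand_total=100, male_total=45, male_beret=20, female_top_hat=30
)
for row_name, row in table.items():
    print(row_name, row)
```

Each missing value comes from a single subtraction or addition, so a 2x2 table with a known grand total can be completed from one marginal frequency and two well-placed joint frequencies.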
Example

Filling the Hand Lotion Bottles

The hand lotion sold at auntie's store comes in two sizes: large and small. She actually buys the lotion in bulk. Then, she pours it into bottles of each size, one by one. Auntie needs to restock on vanilla and rose hand lotions, and Tiffaniqua will help!

Out of a total of vanilla bottles, they filled small ones. They also filled large rose bottles. In total, bottles were filled.

a How many rose bottles in total were filled?
b How many small rose bottles were filled?
c How many large bottles in total were filled?

Hint

a Make a two-way frequency table with the given information. Include the known marginal frequency and the grand total. Use the information in the table to find the marginal frequency corresponding to rose bottles.
b Make a two-way frequency table with the given information. Include the known marginal frequency and the grand total. Use the answer from Part A to find the joint frequency corresponding to small rose bottles.
c Make a two-way frequency table with the given information. Include the known marginal frequencies and the grand total. Use the information in the table to find the marginal frequency corresponding to large bottles.

Solution

a Begin by making a two-way table. There are two variables in this case: the scent and the bottle size.

Next, add a row and a column to include the marginal frequencies, which correspond to the totals of each individual category.

The total number of vanilla bottles is given, which is the marginal frequency of vanilla bottles. The number of small vanilla bottles that Tiffaniqua filled is also given, which is a joint frequency. The number of large rose bottles that auntie filled is another joint frequency. The total number of bottles filled is the grand total. Add all this information to the table.

There is now enough information to fill in the table. To find how many rose bottles there are, subtract the number of vanilla bottles from the grand total.
There are rose bottles.
b The total number of rose bottles was found in Part A, and the number of large rose bottles is given. Find the number of small rose bottles by subtracting the number of large rose bottles from the total number of rose bottles.
This means they filled small rose bottles.
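c To find how many large bottles were filled, begin by adding the small vanilla bottles and the small rose bottles to get the total number of small bottles. The number of large bottles can then be found by subtracting the number of small bottles from the grand total.
Finally, while not necessary, find how many large vanilla bottles there are by subtracting the number of small vanilla bottles from the total number of vanilla bottles.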
With this information, the table can now be completed.
Discussion

Joint and Marginal Relative Frequencies

In a two-way frequency table, a joint relative frequency is the ratio of a joint frequency to the grand total. Similarly, a marginal relative frequency is the ratio of a marginal frequency to the grand total. Consider the following example of a two-way table.

two-way table

Here, the joint and marginal frequencies can be divided by the grand total to obtain the joint and marginal relative frequencies.

two-way table

Relative frequencies can also be calculated by rows or by columns. When the table is made by rows, each joint frequency is divided by the marginal frequency of its corresponding row.

On the other hand, if the table is to be made by columns, each joint frequency needs to be divided by the marginal frequency of its respective column.

Note that when a relative frequency table is made by rows, the values in each column do not generally add up to 1. Similarly, the values in each row of a relative frequency table made by columns do not generally add up to 1.
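
The three ways of computing relative frequencies can be compared side by side. The following is a minimal sketch with hypothetical joint frequencies. The function names are chosen for this illustration only.

```python
# Hypothetical joint frequencies (rows: driver's license, columns: car).
joint = {
    "license": {"car": 30, "no car": 10},
    "no license": {"car": 20, "no car": 40},
}


def relative_to_total(table):
    """Divide every joint frequency by the grand total."""
    grand_total = sum(sum(row.values()) for row in table.values())
    return {r: {c: v / grand_total for c, v in row.items()}
            for r, row in table.items()}


def relative_by_rows(table):
    """Divide every joint frequency by the total of its row."""
    return {r: {c: v / sum(row.values()) for c, v in row.items()}
            for r, row in table.items()}


def relative_by_columns(table):
    """Divide every joint frequency by the total of its column."""
    column_totals = {}
    for row in table.values():
        for c, v in row.items():
            column_totals[c] = column_totals.get(c, 0) + v
    return {r: {c: v / column_totals[c] for c, v in row.items()}
            for r, row in table.items()}


overall = relative_to_total(joint)
by_rows = relative_by_rows(joint)
by_columns = relative_by_columns(joint)

print("By rows:", by_rows)
print("By columns:", by_columns)
```

In the row version each row adds up to 1, while in the column version each column adds up to 1, which matches the note above about the other direction not adding up.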
Example

Advertising with Free Samples

After filling up the bottles, auntie was left with some leftover vanilla and rose lotion. She recalled that there was some leftover wax as well. She decides to give free samples to Tiffaniqua to share with friends. The following two-way table summarizes the items auntie gave to Tiffaniqua.

a What percentage of the samples are rose candles?
b What percentage of the samples are vanilla scented?
c What percentage of the samples are hand lotion bottles?

Hint

a Add all the frequencies to find the grand total. Find the relative frequency of rose candles by dividing the number of rose candles by the grand total.
b Add the frequencies of the vanilla scented products to find the marginal frequency of the vanilla scent. Divide it by the grand total to find its corresponding marginal relative frequency.
c Add the frequencies of the hand lotions to find the marginal frequency of the hand lotion product. Divide it by the grand total to find its corresponding marginal relative frequency.

Solution

a Begin by adding all the frequencies in the table to find the grand total. This corresponds to the number of items auntie gave to Tiffaniqua.
To find what percentage of the samples are rose candles, divide the frequency of rose candles by the grand total.
This corresponds to the joint relative frequency of rose candles. This can now be written as a percentage.
b The percentage of vanilla scented samples corresponds to the marginal relative frequency of the vanilla items. Begin by adding the marginal frequencies to the given two-way table.
The marginal frequency for vanilla is the total number of vanilla samples given to Tiffaniqua. Find the marginal relative frequency by dividing this number by the grand total.
Do not forget to write it as a percentage.
Tiffaniqua found that of the samples have a vanilla scent.
c Consider the two-way table with marginal frequencies obtained in Part B.
The marginal frequency corresponding to hand lotion bottles is Find the marginal relative frequency by dividing this number by the grand total.
The marginal relative frequency is or This means that of the samples are hand lotion bottles.
Closure

The Most Relaxing Scent

Consider the two-way table given at the start of the lesson.

To determine which scent of bath bomb is the most popular, focus on the bath bomb row. Look for the bath bomb scent with the greatest joint frequency.

In order to determine which scent is the most popular in general, the marginal frequencies need to be added to the table.

From here, it can be seen that the vanilla scent has the greatest marginal frequency.

This means that the most popular bath bomb is lavender, but the most popular scent in general is vanilla. Vanilla might be an all-time favorite, but it seems that a lavender bath is the best bet when it comes to relaxation!
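
The two comparisons can be sketched in code as well. The sales figures below are hypothetical, chosen only so that the outcome matches the conclusion above. They are not the values from the store's table.

```python
# Hypothetical monthly sales (rows: products, columns: scents).
sales = {
    "bath bomb": {"lavender": 40, "vanilla": 25, "rose": 15},
    "hand lotion": {"lavender": 10, "vanilla": 35, "rose": 20},
    "candle": {"lavender": 15, "vanilla": 30, "rose": 25},
}

# Most popular bath bomb scent: greatest joint frequency in one row.
bath_bomb_row = sales["bath bomb"]
top_bath_bomb_scent = max(bath_bomb_row, key=bath_bomb_row.get)

# Most popular scent overall: greatest marginal frequency per scent.
scent_totals = {}
for row in sales.values():
    for scent, count in row.items():
        scent_totals[scent] = scent_totals.get(scent, 0) + count
top_scent_overall = max(scent_totals, key=scent_totals.get)

print("Top bath bomb scent:", top_bath_bomb_scent)
print("Top scent overall:", top_scent_overall)
```

The first question only needs a single row of joint frequencies, while the second one needs the marginal frequencies, so the two answers can differ.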