Categorical data, also called qualitative data, is data that can be split into groups. Categorical data belongs to one or more categories that have a fixed number of possible outcomes or values. Human blood groups are one example of categorical data.
Blood groups are classified by whether certain antigens are present or absent on the surface of red blood cells, so a person's blood type can be categorized into one of a fixed set of groups, such as A, B, AB, or O.
The beauty store has plenty to offer, and Tiffaniqua wonders how her aunt manages such a wide variety of products. Auntie has a database where she stores information about each individual product for sale. For example, consider one particular bottle of rose hand lotion.
Auntie stores the following information in her database: the type of product (hand lotion), the scent (rose), the content volume (10.1 fl. oz.), and the price ($5). Which of these variables are categorical? Which variables are not described using numbers?
The store sells many types of products in a variety of scents. Consider the given bottle of hand lotion.
There are four variables describing this bottle: the type of product, the scent, the content volume, and the price. Notice that the type of product is hand lotion and the scent is rose. These variables are described using words rather than numbers, which means that they correspond to categorical data.
On the other hand, the content volume is given in fluid ounces, and the price in dollars. Since these variables are described using numbers, they correspond to numerical data.
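As a rough illustration of this words-versus-numbers test, the sketch below stores the lotion's four variables in a Python dictionary and sorts them by the type of their value. The field names (`product_type`, `scent`, `volume_fl_oz`, `price_usd`) and the `classify` helper are hypothetical, and checking the Python type is only a simplified stand-in for the reasoning described above.

```python
# Hypothetical record for the bottle of rose hand lotion; field names are illustrative.
lotion = {
    "product_type": "hand lotion",
    "scent": "rose",
    "volume_fl_oz": 10.1,
    "price_usd": 5,
}

def classify(record):
    """Split fields into categorical (described by words) and numerical (described by numbers)."""
    categorical = [name for name, value in record.items() if isinstance(value, str)]
    numerical = [name for name, value in record.items() if isinstance(value, (int, float))]
    return categorical, numerical

categorical, numerical = classify(lotion)
print("Categorical:", categorical)
print("Numerical:", numerical)
```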
Exploring the proportion of occurrences of a group, value, or set of values in a data set provides valuable insights. This is called the relative frequency, which is given by the ratio of an observed category or value's frequency to the total number of observations in a data set.
$$\text{Relative frequency} = \frac{\text{Frequency}}{\text{Number of observations}}$$
For example, suppose a classroom survey collects categorical data about the students' favorite colors.
Color | Frequency |
---|---|
Blue | 9 |
Red | 7 |
Green | 5 |
Yellow | 4 |
Purple | 3 |
Other | 2 |
Knowing that there are 30 students in the classroom, the relative frequency of each category can be found by dividing the frequency of each category by 30.
Color | Frequency | Relative Frequency |
---|---|---|
Blue | 9 | 9/30 = 0.3 |
Red | 7 | 7/30 ≈ 0.23 |
Green | 5 | 5/30 ≈ 0.17 |
Yellow | 4 | 4/30 ≈ 0.13 |
Purple | 3 | 3/30 = 0.1 |
Other | 2 | 2/30 ≈ 0.07 |
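The same calculation can be sketched in Python. This is a minimal illustration, assuming the frequencies are stored in a dictionary keyed by color; the `relative_frequencies` helper is a hypothetical name that simply divides each frequency by the total number of observations.

```python
# Favorite-color survey from the table above.
frequencies = {"Blue": 9, "Red": 7, "Green": 5, "Yellow": 4, "Purple": 3, "Other": 2}

def relative_frequencies(freqs):
    """Divide each category's frequency by the total number of observations."""
    total = sum(freqs.values())  # 30 students for this survey
    return {category: count / total for category, count in freqs.items()}

rel = relative_frequencies(frequencies)
for color, value in rel.items():
    print(f"{color}: {value:.2f}")  # rounded to two decimal places for display
```

Because every student falls into exactly one category, the relative frequencies add up to 1, up to floating-point rounding.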
Relative frequencies are often written as percentages, which are obtained by multiplying each relative frequency by 100.
Color | Frequency | Relative Frequency |
---|---|---|
Blue | 9 | 30% |
Red | 7 | 23% |
Green | 5 | 17% |
Yellow | 4 | 13% |
Purple | 3 | 10% |
Other | 2 | 7% |
Remarkably, Tiffaniqua's auntie makes the candles by hand herself! She buys a huge block of soy wax and uses it to make candles in different scents.
To give each candle its characteristic scent, she adds essential oils. It is vital that she knows in advance how much of each oil to buy; otherwise, she may end up with too much of some oils. Consider her sales sheet from the previous month.
Scent | Number of Candles Sold |
---|---|
Lavender | 15 |
Citrus | 5 |
Vanilla | 30 |
Rose | 20 |
Jasmine | 10 |
Begin by finding how many candles were sold in total: 15 + 5 + 30 + 20 + 10 = 80. Then divide each frequency by this total and write each relative frequency as a percentage.
Scent | Number of Candles Sold | Divide | Relative Frequency |
---|---|---|---|
Lavender | 15 | 15/80 | 0.1875 = 18.75% |
Citrus | 5 | 5/80 | 0.0625 = 6.25% |
Vanilla | 30 | 30/80 | 0.375 = 37.5% |
Rose | 20 | 20/80 | 0.25 = 25% |
Jasmine | 10 | 10/80 | 0.125 = 12.5% |
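These steps can be sketched in Python as follows, assuming the sales sheet is stored as a dictionary mapping each scent to the number of candles sold; the variable names are illustrative only.

```python
candles_sold = {"Lavender": 15, "Citrus": 5, "Vanilla": 30, "Rose": 20, "Jasmine": 10}

# Step 1: find the total number of candles sold.
total = sum(candles_sold.values())

# Steps 2 and 3: divide each frequency by the total and express it as a percentage.
percentages = {scent: 100 * count / total for scent, count in candles_sold.items()}

print(f"Total candles sold: {total}")
for scent, pct in percentages.items():
    print(f"{scent}: {pct:g}%")  # :g drops trailing zeros, e.g. 25.0 -> 25
```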
Knowing this, each scent can be paired with its relative frequency.
Scent | Relative Frequency |
---|---|
Lavender | 18.75% |
Citrus | 6.25% |
Vanilla | 37.5% |
Rose | 25% |
Jasmine | 12.5% |
Tiffaniqua's auntie can now use these relative frequency percentages to prioritize how much of each oil to purchase!
A two-way frequency table, also known as a two-way table, displays categorical data that can be grouped into two categories. One of the categories is represented in the rows of the table, the other in the columns. For example, the table below shows the results of a survey where 100 participants were asked if they have a driver's license and if they own a car.
Here, the two categories are car and driver's license. Both have possible responses of yes and no. The numbers inside the table, where a row and a column meet, are called joint frequencies. Two-way frequency tables also often include the totals of the rows and columns, which are called marginal frequencies.
The value where the Total row and the Total column meet, which in this case is 100, equals the sum of all joint frequencies. This is called the grand total. A joint frequency of 43 shows that 43 people have a driver's license and own a car. A marginal frequency of 53 shows that 53 people do not have a car. The remaining numbers in the table can be interpreted in the same way.
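Organizing data in a two-way frequency table makes it easier to visualize, analyze, and present the data. To draw a two-way frequency table, three steps are followed: first, determine the two categories and draw the table without frequencies; second, add a total row and a total column for the marginal frequencies; third, fill in the given joint and marginal frequencies.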
Suppose that 53 people took part in an online survey in which they were asked whether they prefer top hats or berets. Out of the 18 males who participated, 12 prefer berets. Also, 15 of the females chose top hats. The steps above will now be used to organize and present the data.
First, the two categories of the table must be determined, after which the table can be drawn without frequencies. Here, the participants gave their hat preference and their gender, which are the two categories. Hat preference can be further divided into top hat and beret, and gender into female and male.
Next, a total row and a total column are added to hold the marginal frequencies.
The given joint and marginal frequencies can now be added to the table.
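The given numbers do not fill every cell, but the rest follow by subtraction and addition within rows and columns. The sketch below assumes gender is placed in the rows and hat preference in the columns, which is only one possible layout, and all variable names are hypothetical.

```python
# Numbers stated in the survey description.
grand_total = 53
male_total = 18
male_beret = 12
female_top_hat = 15

# Each missing cell follows from "part + part = total" in a row or column.
female_total = grand_total - male_total        # 53 - 18
male_top_hat = male_total - male_beret         # 18 - 12
female_beret = female_total - female_top_hat   # 35 - 15
top_hat_total = male_top_hat + female_top_hat  # 6 + 15
beret_total = male_beret + female_beret        # 12 + 20

table = {
    "Male":   {"Top hat": male_top_hat,   "Beret": male_beret,   "Total": male_total},
    "Female": {"Top hat": female_top_hat, "Beret": female_beret, "Total": female_total},
    "Total":  {"Top hat": top_hat_total,  "Beret": beret_total,  "Total": grand_total},
}

print(f"{'':8}{'Top hat':>9}{'Beret':>7}{'Total':>7}")
for row, cells in table.items():
    print(f"{row:8}{cells['Top hat']:>9}{cells['Beret']:>7}{cells['Total']:>7}")

# Sanity check: the column totals must also add up to the grand total.
assert top_hat_total + beret_total == grand_total
```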
The hand lotion sold at Auntie's store comes in two sizes: large and small. She buys the lotion in bulk and then fills the bottles one by one. Auntie needs to restock the vanilla and rose hand lotions, and Tiffaniqua will help!
Out of a total of 25 vanilla bottles, they filled 18 small ones. They also filled 12 large rose bottles. In total, 60 bottles were filled.
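Begin by identifying the two categories, scent (vanilla or rose) and bottle size (large or small), and drawing the table. Next, add a row and a column to include the marginal frequencies, which correspond to the totals of each individual category.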
It is given that out of 25 vanilla bottles, they filled 18 small ones. This means that the marginal frequency of vanilla bottles is 25 and that the joint frequency of small vanilla bottles is 18. They also filled 12 large rose bottles, which is another joint frequency. The grand total corresponds to 60 bottles. Add all this information to the table.
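There is now enough information to complete the table. To find how many rose bottles were filled, note that 25 of the 60 bottles are vanilla, so subtract 25 from 60 to get 35 rose bottles. The remaining cells follow in the same way: 25 − 18 = 7 large vanilla bottles, 35 − 12 = 23 small rose bottles, 18 + 23 = 41 small bottles in total, and 7 + 12 = 19 large bottles in total.

In a two-way frequency table, a joint relative frequency is the ratio of a joint frequency to the grand total. Similarly, a marginal relative frequency is the ratio of a marginal frequency to the grand total. Consider the following example of a two-way table.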
Here, the grand total is 100. The joint and marginal frequencies can now be divided by 100 to obtain the joint and marginal relative frequencies.
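The same idea can be sketched in Python using the hand lotion bottles from the previous example. This sketch assumes scents as rows and sizes as columns; the numbers 18, 12, 25, and 60 come from the lotion example, while 7 and 23 are derived from them by subtraction. All names are illustrative.

```python
# Joint frequencies for the lotion bottles (assumed layout: scent x size).
vanilla_total = 25
grand_total = 60
joint = {
    ("Vanilla", "Small"): 18,
    ("Vanilla", "Large"): vanilla_total - 18,               # 25 - 18 = 7
    ("Rose", "Small"): (grand_total - vanilla_total) - 12,  # 35 - 12 = 23
    ("Rose", "Large"): 12,
}

scents = ["Vanilla", "Rose"]
sizes = ["Small", "Large"]

# Marginal frequencies: the total of each row (scent) and each column (size).
marginal = {s: sum(joint[(s, z)] for z in sizes) for s in scents}
marginal.update({z: sum(joint[(s, z)] for s in scents) for z in sizes})

# Joint and marginal relative frequencies: divide by the grand total.
joint_rel = {cell: count / grand_total for cell, count in joint.items()}
marginal_rel = {name: count / grand_total for name, count in marginal.items()}

for (scent, size), value in joint_rel.items():
    print(f"{size} {scent}: {value:.1%}")
for name, value in marginal_rel.items():
    print(f"Total {name}: {value:.1%}")
```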
Relative frequencies can also be calculated by rows or by columns. When they are calculated by rows, each joint frequency is divided by the marginal frequency of its row.
On the other hand, if the table is made by columns, each joint frequency is divided by the marginal frequency of its column.
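Both approaches can be sketched in Python with the completed hand lotion table, assuming scents in the rows and sizes in the columns. The helpers `by_rows` and `by_columns` are hypothetical names, and the joint frequencies 7 and 23 are derived by subtraction (25 − 18 and 35 − 12) from the numbers given in the lotion example.

```python
# Completed lotion table (assumed layout: scents as rows, sizes as columns).
table = {
    "Vanilla": {"Small": 18, "Large": 7},
    "Rose": {"Small": 23, "Large": 12},
}

def by_rows(t):
    """Divide each joint frequency by the marginal frequency of its row."""
    result = {}
    for row, cells in t.items():
        row_total = sum(cells.values())
        result[row] = {col: count / row_total for col, count in cells.items()}
    return result

def by_columns(t):
    """Divide each joint frequency by the marginal frequency of its column."""
    columns = list(next(iter(t.values())).keys())
    col_totals = {c: sum(t[r][c] for r in t) for c in columns}
    return {r: {c: t[r][c] / col_totals[c] for c in columns} for r in t}

rows = by_rows(table)
cols = by_columns(table)

for label, rel in (("By rows", rows), ("By columns", cols)):
    print(label)
    for row, cells in rel.items():
        print("  ", row, {c: f"{v:.0%}" for c, v in cells.items()})
```

In the row version each row adds up to 1 (100%), while in the column version each column does.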
Note that when a relative frequency table is made by rows, each row adds up to 100%, but the values in a column do not necessarily add up to 100%. Similarly, when a relative frequency table is made by columns, each column adds up to 100%, but the values in a row do not necessarily add up to 100%.

After filling the bottles, Auntie had some vanilla and rose lotion left over. She recalled that there was some leftover wax as well, so she decided to give free samples to Tiffaniqua to share with her friends. The following two-way table summarizes the items Auntie gave to Tiffaniqua.
Consider the two-way table given at the start of the lesson.
To determine which scent of bath bomb is the most popular, focus on the bath bomb row and look for the scent with the greatest joint frequency.
In order to determine which scent is the most popular in general, the marginal frequencies need to be added to the table.
From here, it can be seen that the vanilla scent has the greatest marginal frequency.
This means that the most popular bath bomb is lavender, but the most popular scent in general is vanilla. Vanilla might be an all-time favorite, but it seems that a lavender bath is the best bet when it comes to relaxation!