A frequency table is a type of table that is used to present categorical data from some kind of measurement. It lists the possible values or outcomes of the category, and how many times each value or outcome has been measured. The results of asking a group of ten people if they prefer cats, dogs, quokkas, or turtles can be presented as follows.
Preference | Frequency |
---|---|
Cats | $4$ |
Dogs | $4$ |
Quokkas | $1$ |
Turtles | $1$ |
For a school project, Petrus asked some of his classmates how many pets they've had in total in their life. Unsorted, his results were $3, 5, 0, 2, 0, 7, 1, 1, 0, 0, 0, 12, 4,$ and $3.$ Dividing his data into the groups $0,$ $1−2,$ $3−5,$ and $6+,$ how should his frequency table look?
To begin, we can set up the table. The right column should show frequency, so the left column is "number of pets." Let's add the groups to the left column.
Number of pets | Frequency |
---|---|
$0$ | |
$1−2$ | |
$3−5$ | |
$6+$ | |
We can now look at the data and count how many times an answer was given in each group. For instance, five of the classmates answered $0,$ and three gave an answer of $1$ or $2.$ Filling in the entire table like this gives us the desired frequency table.
Number of pets | Frequency |
---|---|
$0$ | $5$ |
$1−2$ | $3$ |
$3−5$ | $4$ |
$6+$ | $2$ |
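The counting step can also be sketched in code. The following is a minimal Python illustration, not part of the original exercise; the helper name `group_label` is an assumption introduced here, and the group boundaries are the ones given in the problem.

```python
from collections import Counter

# Petrus's unsorted survey answers, copied from the text.
answers = [3, 5, 0, 2, 0, 7, 1, 1, 0, 0, 0, 12, 4, 3]

def group_label(pets):
    """Map a number of pets to one of the four groups used in the table.
    (Hypothetical helper, named here only for illustration.)"""
    if pets == 0:
        return "0"
    if pets <= 2:
        return "1-2"
    if pets <= 5:
        return "3-5"
    return "6+"

# Count how many answers fall in each group.
frequency = Counter(group_label(a) for a in answers)

for label in ["0", "1-2", "3-5", "6+"]:
    print(f"{label:>4} | {frequency[label]}")
```

A quick check is that the frequencies add up to the number of answers, here $14.$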
When categorical data belongs to two categories, such as if people are asked whether they own a car and whether they have a driver's license, it can be presented in a two-way frequency table. One of the categories is represented by the rows of the table, and the other by the columns. The above survey, with $100$ participants, could result in the following answers.
Car | Driver's license: Yes | Driver's license: No |
---|---|---|
Yes | $43$ | $4$ |
No | $24$ | $29$ |
The two categories are then "car" and "driver's license," both with the possible answers "yes" and "no." The entries in the table are called joint frequencies. Often, two-way frequency tables include the totals of the rows and columns. These totals are called marginal frequencies. The marginal frequencies in the "total" row and those in the "total" column each add up to the sum of all joint frequencies, $100$ in this case.
Car | Driver's license: Yes | Driver's license: No | Total |
---|---|---|---|
Yes | $43$ | $4$ | $47$ |
No | $24$ | $29$ | $53$ |
Total | $67$ | $33$ | $100$ |
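As a rough sketch of how the marginal frequencies follow from the joint frequencies, the row and column sums can be computed in Python. The nested-list layout and the variable names are assumptions chosen for this illustration.

```python
# Joint frequencies from the car / driver's license survey.
# Rows: car yes, car no. Columns: license yes, license no.
joint = [
    [43, 4],
    [24, 29],
]

# Marginal frequencies are the sums along each row and each column.
row_totals = [sum(row) for row in joint]            # car yes, car no
column_totals = [sum(col) for col in zip(*joint)]   # license yes, license no
grand_total = sum(row_totals)

# Both sets of marginal frequencies must add up to the same grand total.
assert grand_total == sum(column_totals)

print(row_totals, column_totals, grand_total)
```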
Organizing data in a two-way frequency table can help with visualization, which in turn makes it easier to analyze and present the data. Consider the following survey.
$53$ people took part in an online survey, where they got to choose their preferred hat, top hat or beret. Out of the $18$ males that participated, twelve of them prefer a beret. Fifteen of the females chose top hat as their preference.
First, determine the two categories of the table and draw it without frequencies. Here, the participants gave their hat preference and their gender, which are then the two categories. Hat preference can be further divided into top hat and beret, and gender into female and male. This gives the following table.
Gender | Hat preference: Top hat | Hat preference: Beret | Total |
---|---|---|---|
Male | | | |
Female | | | |
Total | | | |
The "total" row and column are included to make room for the marginal frequencies.
The given joint and marginal frequencies can now be added to the table.
Gender | Hat preference: Top hat | Hat preference: Beret | Total |
---|---|---|---|
Male | | $12$ | $18$ |
Female | $15$ | | |
Total | | | $53$ |
Using the given frequencies, more information can potentially be found by reasoning. For instance, $12$ out of the $18$ males prefer berets, which means that $18−12=6$ males prefer top hats. Thus, there are $6$ males and $15$ females who prefer top hats, making a total of $6+15=21$ participants that prefer top hats. Continuing this reasoning, the entire table can be completed.
Gender | Hat preference: Top hat | Hat preference: Beret | Total |
---|---|---|---|
Male | $6$ | $12$ | $18$ |
Female | $15$ | $20$ | $35$ |
Total | $21$ | $32$ | $53$ |
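The full chain of reasoning, including the steps not written out above, can be sketched as plain arithmetic in Python. The variable names are chosen here for illustration; only the four starting values come from the survey description.

```python
# Frequencies stated in the survey description.
total = 53
males = 18
males_beret = 12
females_top_hat = 15

# Each missing cell follows from a row or column with one unknown entry.
males_top_hat = males - males_beret            # 18 - 12 = 6
top_hat = males_top_hat + females_top_hat      # 6 + 15 = 21
females = total - males                        # 53 - 18 = 35
females_beret = females - females_top_hat      # 35 - 15 = 20
beret = males_beret + females_beret            # 12 + 20 = 32

# Sanity check: the column totals must add up to the number of participants.
assert top_hat + beret == total
```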
A joint relative frequency is the ratio of a joint frequency and the total number of values or observations. Similarly, a marginal relative frequency is the ratio of a marginal frequency and the total. For the example above, the joint and marginal relative frequencies are found by dividing the frequencies by $53,$ the number of participants.
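Gender | Hat preference: Top hat | Hat preference: Beret | Total |
---|---|---|---|
Male | $\frac{6}{53} \approx 0.11$ | $\frac{12}{53} \approx 0.23$ | $\frac{18}{53} \approx 0.34$ |
Female | $\frac{15}{53} \approx 0.28$ | $\frac{20}{53} \approx 0.38$ | $\frac{35}{53} \approx 0.66$ |
Total | $\frac{21}{53} \approx 0.40$ | $\frac{32}{53} \approx 0.60$ | $1$ |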
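The division by the number of participants can be sketched in Python as follows. This is one possible illustration, assuming the joint frequencies are stored in a dictionary keyed by (gender, hat) pairs; `marginal_relative` is a hypothetical helper named only for this example. Exact fractions are used so that rounding happens only when printing.

```python
from fractions import Fraction

# Joint frequencies from the completed hat-preference table.
joint = {
    ("male", "top hat"): 6,
    ("male", "beret"): 12,
    ("female", "top hat"): 15,
    ("female", "beret"): 20,
}
total = sum(joint.values())  # number of participants

# Joint relative frequency: each joint frequency divided by the total.
joint_relative = {cell: Fraction(count, total) for cell, count in joint.items()}

def marginal_relative(label):
    """Add up the joint relative frequencies of every cell carrying the label."""
    return sum(f for cell, f in joint_relative.items() if label in cell)

for label in ["male", "female", "top hat", "beret"]:
    value = marginal_relative(label)
    print(f"{label}: {value} = {float(value):.2f}")
```

Summing all joint relative frequencies gives $1,$ which matches the bottom-right cell of the table.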
A conditional relative frequency is the ratio of a joint frequency and either of its corresponding two marginal frequencies. Alternatively, it can be calculated using relative joint and marginal frequencies. As an example, the following data will be used.
Car | Driver's license: Yes | Driver's license: No | Total |
---|---|---|---|
Yes | $43$ | $4$ | $47$ |
No | $24$ | $29$ | $53$ |
Total | $67$ | $33$ | $100$ |
Using the column totals, the left column of joint frequencies should be divided by $67,$ and the right column by $33.$ Since the column totals are used, the conditional relative frequencies in each column will add up to $1.$
Car | Driver's license: Yes | Driver's license: No |
---|---|---|
Yes | $\frac{43}{67} \approx 0.64$ | $\frac{4}{33} \approx 0.12$ |
No | $\frac{24}{67} \approx 0.36$ | $\frac{29}{33} \approx 0.88$ |
In the table above, it can be seen that among people with a driver's license, owning a car is common, while among those without a license, owning a car is uncommon. Thus, it can be reasoned that there is an association between having a driver's license and owning a car. Finding the conditional relative frequencies using the row totals instead gives a slightly different result.
Car | Driver's license: Yes | Driver's license: No |
---|---|---|
Yes | $\frac{43}{47} \approx 0.91$ | $\frac{4}{47} \approx 0.09$ |
No | $\frac{24}{53} \approx 0.45$ | $\frac{29}{53} \approx 0.55$ |
Here, it is shown that among car owners, almost everyone has a driver's license, but among those without a car, roughly half have one. The pattern is less pronounced, but it still shows that car ownership tends to go together with having a driver's license, which supports the association. In some cases, it is clear that the answers in one category might depend on the other category, such as in the following example.
Age | Bed time: Before 9:30 p.m. | Bed time: After 9:30 p.m. |
---|---|---|
10-12 | $17$ | $4$ |
13-15 | $23$ | $27$ |
16-18 | $2$ | $17$ |
A person's bed time might be dependent on their age, but their age is not dependent on their bed time. Because of this, it is recommended to use the age totals when finding the conditional relative frequencies. This gives the distribution of bed time given a certain age span, which will clearly show any association.
Age | Bed time: Before 9:30 p.m. | Bed time: After 9:30 p.m. |
---|---|---|
10-12 | $\frac{17}{21} \approx 0.81$ | $\frac{4}{21} \approx 0.19$ |
13-15 | $\frac{23}{50} = 0.46$ | $\frac{27}{50} = 0.54$ |
16-18 | $\frac{2}{19} \approx 0.11$ | $\frac{17}{19} \approx 0.89$ |
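Both ways of conditioning, on the row totals or on the column totals, can be sketched with one small Python function. The function name `conditional_relative` and its `by` parameter are assumptions made for this illustration, and the tables are stored as nested lists of joint frequencies without totals.

```python
def conditional_relative(joint, by="row"):
    """Divide each joint frequency by its row total or its column total."""
    if by == "row":
        return [[cell / sum(row) for cell in row] for row in joint]
    if by == "column":
        column_totals = [sum(col) for col in zip(*joint)]
        return [[cell / t for cell, t in zip(row, column_totals)] for row in joint]
    raise ValueError("by must be 'row' or 'column'")

# Bed-time survey: rows are the age groups 10-12, 13-15, 16-18;
# columns are the earlier and the later bed time.
bed_time = [[17, 4], [23, 27], [2, 17]]

# Age is the explanatory category, so condition on the age (row) totals.
for row in conditional_relative(bed_time, by="row"):
    print([round(x, 2) for x in row])
```

With `by="row"` every row adds up to $1,$ and with `by="column"` every column does, which is a convenient way to check which totals were used.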
Eugenia is passionate about two things in particular, hot air balloons and forks. Lately, she's run an online survey, where people answer if they have ever flown a hot air balloon and how many forks they have, urging all her friends to share the link to it. She's now finally made a post of the results:
"Thank you, all $1105$ participants. More than I predicted, $75$ of you, have flown in a hot air balloon. Out of these $75,$ $44$ have between eleven and twenty forks, and $22$ have between six and ten forks. In total, $312$ people have between six and ten forks, and $583$ people have never flown in a hot air balloon and have between eleven and twenty forks."
Help her visualize the data by drawing a two-way frequency table including all joint and marginal frequencies. Then, draw a two-way table with joint relative and marginal relative frequencies. Finally, find and use the conditional relative frequencies to determine if there are any apparent associations in the data.
To begin, we'll establish the different categories for this data set. Based on Eugenia's questions, we can sort the data into two categories: "hot air balloon" and "forks." Next, we'll draw a two-way frequency table that organizes Eugenia's results.
Forks | Hot air balloon: Yes | Hot air balloon: No | Total |
---|---|---|---|
0-5 | | | |
6-10 | 22 | | 312 |
11-20 | 44 | 583 | |
Total | 75 | | 1105 |
Notice that the "Yes" column, the "11-20" row, the "Total" row, and the "6-10" row each have only one cell missing. Thus, we can complete each by reasoning.
Forks | Hot air balloon: Yes | Hot air balloon: No | Total |
---|---|---|---|
0-5 | $9$ | | |
6-10 | 22 | $290$ | 312 |
11-20 | 44 | 583 | $627$ |
Total | 75 | $1030$ | 1105 |
The remaining two cells can be found by reasoning in the same way. First, we'll find the number of people who have not ridden in a hot air balloon and own 0-5 forks, then we'll find the remaining total.
Forks | Hot air balloon: Yes | Hot air balloon: No | Total |
---|---|---|---|
0-5 | 9 | $157$ | $166$ |
6-10 | 22 | 290 | 312 |
11-20 | 44 | 583 | 627 |
Total | 75 | 1030 | 1105 |
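The strategy used here, repeatedly finding a row or column with exactly one missing cell, can be sketched as a small Python routine. This is only an illustration under stated assumptions: the function name `complete_table` is hypothetical, missing cells are marked with `None`, and the last row and last column are assumed to hold the totals.

```python
def complete_table(table):
    """Fill in None cells of a two-way table whose last row and last
    column hold the totals, one single-gap row or column at a time."""
    table = [row[:] for row in table]  # work on a copy
    n_rows, n_cols = len(table), len(table[0])

    def lines():
        # Yield every row and every column as a list of (r, c) positions.
        for r in range(n_rows):
            yield [(r, c) for c in range(n_cols)]
        for c in range(n_cols):
            yield [(r, c) for r in range(n_rows)]

    progress = True
    while progress:
        progress = False
        for line in lines():
            missing = [(r, c) for r, c in line if table[r][c] is None]
            if len(missing) != 1:
                continue
            r, c = missing[0]
            *parts, total_pos = line
            if (r, c) == total_pos:
                # The total is missing: add up the other cells.
                table[r][c] = sum(table[i][j] for i, j in parts)
            else:
                # A part is missing: total minus the known parts.
                known = sum(table[i][j] for i, j in parts if (i, j) != (r, c))
                table[r][c] = table[total_pos[0]][total_pos[1]] - known
            progress = True
    return table

# Eugenia's data as stated in her post; None marks an unknown cell.
eugenia = [
    [None, None, None],   # 0-5 forks
    [22,   None, 312],    # 6-10 forks
    [44,   583,  None],   # 11-20 forks
    [75,   None, 1105],   # Total
]
for row in complete_table(eugenia):
    print(row)
```

The loop stops as soon as a full pass fills in nothing, so a table with too little information is returned with its remaining `None` cells rather than guessed values.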
Now that we have a complete two-way table, we can see the joint and marginal frequencies for Eugenia's data. To find the joint relative and marginal relative frequencies, we'll divide each frequency by the total number of participants, $1105.$
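Forks | Hot air balloon: Yes | Hot air balloon: No | Total |
---|---|---|---|
0-5 | $\frac{9}{1105} \approx 0.01$ | $\frac{157}{1105} \approx 0.14$ | $\frac{166}{1105} \approx 0.15$ |
6-10 | $\frac{22}{1105} \approx 0.02$ | $\frac{290}{1105} \approx 0.26$ | $\frac{312}{1105} \approx 0.28$ |
11-20 | $\frac{44}{1105} \approx 0.04$ | $\frac{583}{1105} \approx 0.53$ | $\frac{627}{1105} \approx 0.57$ |
Total | $\frac{75}{1105} \approx 0.07$ | $\frac{1030}{1105} \approx 0.93$ | $1$ |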
From the relative frequencies above, we can notice trends in Eugenia's data. For instance, only $7\%$ of participants have ridden in a hot air balloon, and $57\%$ own between $11$ and $20$ forks. Lastly, we can calculate the conditional relative frequencies using either the row or the column totals. Here, we'll arbitrarily use the column totals.
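Forks | Hot air balloon: Yes | Hot air balloon: No |
---|---|---|
0-5 | $\frac{9}{75} = 0.12$ | $\frac{157}{1030} \approx 0.15$ |
6-10 | $\frac{22}{75} \approx 0.29$ | $\frac{290}{1030} \approx 0.28$ |
11-20 | $\frac{44}{75} \approx 0.59$ | $\frac{583}{1030} \approx 0.57$ |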
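For both groups of people, those who have and those who have not ridden in a hot air balloon, few have between $0$ and $5$ forks, while more than half have between $11$ and $20$ forks. Since the two distributions are nearly identical, there is no apparent association between having ridden in a hot air balloon and the number of forks owned.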
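One informal way to compare the two columns is to look at the largest difference between their conditional relative frequencies. The Python sketch below does this for Eugenia's data. The measure `largest_gap` is an assumption introduced here purely as a descriptive aid; it is not a statistical test, and the text does not define any cutoff for what counts as an association.

```python
# Joint frequencies from Eugenia's completed table.
# Rows: 0-5, 6-10, 11-20 forks. Columns: flown (yes), not flown (no).
joint = [[9, 157], [22, 290], [44, 583]]

column_totals = [sum(col) for col in zip(*joint)]  # 75 and 1030

# Distribution of fork counts within each column.
conditional = [[cell / t for cell, t in zip(row, column_totals)] for row in joint]

# Largest gap between the "yes" and "no" share of any fork group.
largest_gap = max(abs(yes - no) for yes, no in conditional)
print(round(largest_gap, 3))
```

For comparison, the same calculation on the car and driver's license table gives a gap of roughly $0.52$ between $\frac{43}{67}$ and $\frac{4}{33},$ far larger than the gap of about $0.03$ found here.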