Consider two events $A$ and $B.$ The probability that $A$ or $B$ will occur is the probability of the union of $A$ and $B,$ and can be found using the Addition Rule of Probability.
$P(A \text{ or }B) = P(A \cup B)$
For two mutually exclusive events $A$ and $B,$ the probability that $A$ or $B$ occurs in one trial is the sum of the individual probabilities of the events.
$P(A \text{ or }B) = P(A)+P(B)$
For example, consider rolling a standard six-sided die. Let $A$ be the event that a $3$ is rolled, and $B$ be the event that a $4$ is rolled. The probability of $A$ or $B$ can be found by adding the individual probabilities. $P(3 \text{ or }4) = P(3)+P(4) = \frac{1}{6} + \frac{1}{6}\\ \Downarrow\\ P(3 \text{ or }4)=\frac{2}{6}=\frac{1}{3}$
The formula above can be generalized to events that are not necessarily mutually exclusive. If the events overlap, the probability of the common outcomes is counted twice in $P(A)+P(B),$ so it must be subtracted once.
$P(A\text{ or }B)=P(A)+P(B)-P(A\text{ and }B)$
For example, consider rolling a standard six-sided die. Let $A$ be the event that an even number is rolled, and $B$ be the event that a prime number is rolled.
Event | Outcome(s) | Probability |
---|---|---|
even | $2,$ $4,$ $6$ | $P(A)=\dfrac{3}{6}=\dfrac{1}{2}$ |
prime | $2,$ $3,$ $5$ | $P(B)=\dfrac{3}{6}=\dfrac{1}{2}$ |
even and prime | $2$ | $P(A\text{ and }B)=\dfrac{1}{6}$ |
Using the formula gives the probability that the result of the roll is even or prime. $P(A\text{ or }B)=P(A)+P(B)-P(A\text{ and }B)\\ \Downarrow\\ P(\text{even or prime})=\frac{1}{2}+\frac{1}{2}-\frac{1}{6}=\frac{5}{6}$
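
The calculation above can be double-checked by listing the outcomes of the die. The following is a minimal Python sketch, not part of the original lesson; the helper name `prob` is hypothetical and assumes all six outcomes are equally likely.

```python
from fractions import Fraction

# Sample space for one roll of a standard six-sided die.
outcomes = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}    # event A
prime = {2, 3, 5}   # event B


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))


# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
by_rule = prob(even) + prob(prime) - prob(even & prime)

# Direct count of the outcomes in the union of A and B.
by_count = prob(even | prime)

print(by_rule, by_count)  # prints: 5/6 5/6
```

Both approaches give the same value because subtracting $P(A\text{ and }B)$ removes the double counting of the outcome $2.$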
The result $\frac{5}{6}$ can be verified by noticing that five of the six outcomes are even or prime: $2,$ $3,$ $4,$ $5,$ and $6.$
Two events, $A$ and $B,$ are independent if and only if the probability that both events occur is equal to the product of the individual probabilities.
$P(A\text{ and }B)=P(A)\cdot P(B)$
If a coin is flipped two times, the outcome of the first flip does not affect the outcome of the second flip. For example, suppose the first flip is heads. This does not affect the likelihood that the second flip is also heads. $P(\text{H and H})=P(\text{H})\cdot P(\text{H})$ By showing that the two sides of this equation are equal, it can be concluded that the events are independent. To find the probability of flipping heads twice, a tree diagram can be drawn.
The number of possible outcomes when flipping two coins is $4.$ Only $1$ of them, two heads, is favorable. $P(\text{H and H})=\frac{1}{4}$ Next, consider the two flips separately. The probability of flipping heads on a single flip is $P(\text{H})=\frac{1}{2}.$ $P(\text{H})\cdot P(\text{H}) = \frac{1}{2}\cdot\frac{1}{2}=\frac{1}{4}$
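
The four outcomes of the two flips can also be listed programmatically. The sketch below is only an illustration under the assumption of a fair coin; the helper `prob` and the variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes of two flips: HH, HT, TH, TT.
flips = list(product("HT", repeat=2))


def prob(condition):
    """Share of the equally likely outcomes that satisfy the condition."""
    favorable = [f for f in flips if condition(f)]
    return Fraction(len(favorable), len(flips))


p_both = prob(lambda f: f == ("H", "H"))   # P(H and H)
p_first = prob(lambda f: f[0] == "H")      # P(H) on the first flip
p_second = prob(lambda f: f[1] == "H")     # P(H) on the second flip

print(p_both, p_first * p_second)  # prints: 1/4 1/4
```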
The probability is the same for both expressions, $\frac{1}{4}.$ Therefore, because the rule is satisfied, the events are independent.
Two events are said to be dependent when the occurrence of one affects the probability of the other. For example, consider drawing two marbles from a bowl, one at a time.
The probability of first picking a green marble can be calculated by dividing the number of favorable outcomes by the number of possible outcomes. There is $1$ green marble and there are $3$ marbles in total. $P(\text{green})=\dfrac{1}{3}$ Suppose that the first marble is replaced before the second draw. After the replacement, there is $1$ purple marble among $3$ marbles in total. $P(\text{purple})= \frac{1}{3}$ The combined probability of picking a green marble first and a purple marble second can be calculated using the Multiplication Rule of Probability. $P(\text{green then purple}) = \dfrac{1}{3} \cdot \dfrac{1}{3} = \dfrac{1}{9}$ Because the first marble is put back, the first draw does not affect the second, so these events are independent rather than dependent.
Suppose instead that, after the green marble is picked, it is not replaced in the bowl.
This affects the probability of picking a purple marble on the second draw. Now, there still is $1$ purple marble but, instead of $3,$ there are $2$ total marbles. $P(\text{purple})=\dfrac{1}{2}$ With this information, the probability of picking green and then purple can be calculated. $P(\text{green then purple}) = \dfrac{1}{3} \cdot \dfrac{1}{2} = \dfrac{1}{6}$
As can be seen, these events are dependent because the occurrence of the first affects the probability of the second.
The conditional probability of an event $B$ is the probability that $B$ will occur given that another event $A$ has already occurred. The probability of $B$ given $A$ is written $P(B|A).$ It can be calculated by dividing the probability of $A$ and $B$ by the probability of $A,$ provided that $P(A)\neq 0.$
$P(B|A)=\dfrac{P(A\text{ and }B)}{P(A)}$
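
As a rough check of this definition, the marble draws without replacement can be enumerated. This is a sketch under stated assumptions: the text only mentions one green and one purple marble among three, so the third marble is given the placeholder name `"other"`, and the helper `prob` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations

# Assumption: the third marble's color is not stated; "other" is a placeholder.
bowl = ["green", "purple", "other"]

# Ordered pairs of draws without replacement, all equally likely.
draws = list(permutations(bowl, 2))


def prob(condition):
    """Share of the equally likely ordered draws that satisfy the condition."""
    return Fraction(sum(1 for d in draws if condition(d)), len(draws))


p_a = prob(lambda d: d[0] == "green")                 # P(A): green first
p_a_and_b = prob(lambda d: d == ("green", "purple"))  # P(A and B)
p_b_given_a = p_a_and_b / p_a                         # P(B|A)

print(p_a, p_a_and_b, p_b_given_a)  # prints: 1/3 1/6 1/2
```

The value of $P(B|A)$ agrees with the probability of drawing purple from the two marbles left after green is removed.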
Consider two events $A$ and $B.$ The probability that $A$ and $B$ will occur is the probability of the intersection of $A$ and $B,$ and can be calculated using the Multiplication Rule of Probability.
$P(A \text{ and }B) = P(A \cap B)$
For two independent events $A$ and $B,$ the probability that $A$ and $B$ occur is the product of the individual probabilities.
$P(A\text{ and }B)=P(A)\cdot P(B)$
For example, when rolling two dice, the probability of rolling two even numbers can be calculated using the individual probabilities. Note that there are $3$ outcomes that are even: $2,$ $4,$ and $6.$ $P(\text{even})=\frac{3}{6}=\frac{1}{2}$ The probability of rolling two even numbers can be calculated using the formula. $P(\text{both are even}) = \frac{1}{2} \cdot \frac{1}{2} =\frac{1}{4}$
For events that are not necessarily independent, the conditional probability formula can be rearranged into a product form.
$P(A\text{ and }B)=P(A)\cdot P(B|A)$
For example, consider a box with four red and six blue marbles. In an experiment two marbles are drawn randomly from the box. The formula can be used to find the probability that both marbles are red. Let $A$ and $B$ be the events that the first and second marbles are red.
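Initially, $4$ of the $10$ marbles are red, so $P(A)=\frac{4}{10}=\frac{2}{5}.$ If the first marble is red and is not put back, $3$ of the remaining $9$ marbles are red, so $P(B|A)=\frac{3}{9}=\frac{1}{3}.$ The formula then gives the probability that both marbles are red.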
$P(A\text{ and }B)=P(A)\cdot P(B|A)\\ \Downarrow\\ P(\text{both red})=\frac{2}{5}\cdot \frac{1}{3}=\frac{2}{15}$
When categorical data belongs to two categories, such as if people are asked whether they own a car and whether they have a driver's license, it can be presented in a two-way frequency table. One of the categories is represented by the rows of the table, and the other by the columns. The above survey, with $100$ participants, could result in the following answers.
Car | Driver's license: Yes | Driver's license: No |
---|---|---|
Yes | $43$ | $4$ |
No | $24$ | $29$ |
The two categories are then "car" and "driver's license," both with the possible answers "yes" and "no." The entries in the table are called joint frequencies. Often, two-way frequency tables include the totals of the rows and columns. These totals are called marginal frequencies. The marginal frequencies in the "total" row and those in the "total" column each add up to the sum of all joint frequencies, $100$ in this case.
Car | Driver's license: Yes | Driver's license: No | Total |
---|---|---|---|
Yes | $43$ | $4$ | $47$ |
No | $24$ | $29$ | $53$ |
Total | $67$ | $33$ | $100$ |
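
The marginal frequencies are just row and column sums of the joint frequencies. A minimal Python sketch of that bookkeeping follows; the dictionary layout and the key names are illustrative choices, not something prescribed by the text.

```python
# Joint frequencies: rows are "Car", columns are "Driver's license".
joint = {
    "car: yes": {"license: yes": 43, "license: no": 4},
    "car: no": {"license: yes": 24, "license: no": 29},
}

# Marginal frequencies for the rows.
row_totals = {row: sum(cells.values()) for row, cells in joint.items()}

# Marginal frequencies for the columns.
col_totals = {}
for cells in joint.values():
    for col, count in cells.items():
        col_totals[col] = col_totals.get(col, 0) + count

# Both sets of marginal frequencies add up to the number of participants.
grand_total = sum(row_totals.values())

print(row_totals)
print(col_totals)
print(grand_total)
```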
Organizing data in a two-way frequency table can help with visualization, which in turn makes it easier to analyze and present the data. Consider the following survey.
$53$ people took part in an online survey, where they got to choose their preferred hat, top hat or beret. Out of the $18$ males that participated, twelve of them prefer a beret. Fifteen of the females chose top hat as their preference.
First, determine the two categories of the table and draw it without frequencies. Here, the participants gave their hat preference and their gender, which are then the two categories. Hat preference can be further divided into top hat and beret, and gender into female and male. This gives the following table.
Gender | Hat preference: Top hat | Hat preference: Beret | Total |
---|---|---|---|
Male | | | |
Female | | | |
Total | | | |
The "total" row and column are included to make room for the marginal frequencies.
The given joint and marginal frequencies can now be added to the table.
Gender | Hat preference: Top hat | Hat preference: Beret | Total |
---|---|---|---|
Male | | $12$ | $18$ |
Female | $15$ | | |
Total | | | $53$ |
Using the given frequencies, more information can be found by reasoning. For instance, $12$ out of the $18$ males prefer berets, which means that $18 - 12 = 6$ males prefer top hats. Thus, there are $6$ males and $15$ females who prefer top hats, making a total of $6 + 15 = 21$ participants who prefer top hats. Continuing this reasoning, the entire table can be completed.
Gender | Hat preference: Top hat | Hat preference: Beret | Total |
---|---|---|---|
Male | $6$ | $12$ | $18$ |
Female | $15$ | $20$ | $35$ |
Total | $21$ | $32$ | $53$ |
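
The reasoning used to fill in the table can be written out step by step. The sketch below starts from the four given numbers and derives the rest; the variable names are illustrative only.

```python
# Frequencies given in the survey description.
total = 53
males = 18
male_beret = 12
female_top_hat = 15

# Each missing entry follows from a row or column total.
male_top_hat = males - male_beret               # 18 - 12 = 6
females = total - males                         # 53 - 18 = 35
female_beret = females - female_top_hat         # 35 - 15 = 20
top_hat_total = male_top_hat + female_top_hat   # 6 + 15 = 21
beret_total = male_beret + female_beret         # 12 + 20 = 32

# Consistency check: the column totals must add up to all participants.
assert top_hat_total + beret_total == total

print(male_top_hat, female_beret, females, top_hat_total, beret_total)
```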
A joint relative frequency is the ratio of a joint frequency and the total number of values or observations. Similarly, a marginal relative frequency is the ratio of a marginal frequency and the total. For the example above, the joint and marginal relative frequencies are found by dividing the frequencies by $53,$ the number of participants.
Gender | Hat preference: Top hat | Hat preference: Beret | Total |
---|---|---|---|
Male | $\dfrac{6}{53} \approx 0.11$ | $\dfrac{12}{53} \approx 0.23$ | $\dfrac{18}{53} \approx 0.34$ |
Female | $\dfrac{15}{53} \approx 0.28$ | $\dfrac{20}{53} \approx 0.38$ | $\dfrac{35}{53} \approx 0.66$ |
Total | $\dfrac{21}{53} \approx 0.40$ | $\dfrac{32}{53} \approx 0.60$ | $1$ |
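
Dividing every frequency by the number of participants can be sketched in a few lines of Python. Exact fractions are used here so that the rounding only happens when printing; the dictionary layout and names are assumptions made for this illustration.

```python
from fractions import Fraction

# Joint frequencies from the completed hat preference table.
joint = {
    ("male", "top hat"): 6,
    ("male", "beret"): 12,
    ("female", "top hat"): 15,
    ("female", "beret"): 20,
}
total = sum(joint.values())  # number of participants

# Joint relative frequencies: joint frequency divided by the total.
joint_rel = {cell: Fraction(count, total) for cell, count in joint.items()}

# Marginal relative frequencies: sums over each row and each column.
gender_rel = {}
hat_rel = {}
for (gender, hat), rel in joint_rel.items():
    gender_rel[gender] = gender_rel.get(gender, 0) + rel
    hat_rel[hat] = hat_rel.get(hat, 0) + rel

for cell, rel in joint_rel.items():
    print(cell, rel, round(float(rel), 2))
```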
A conditional relative frequency is the ratio of a joint frequency and either of its corresponding two marginal frequencies. Alternatively, it can be calculated using relative joint and marginal frequencies. As an example, the following data will be used.
Car | Driver's license: Yes | Driver's license: No | Total |
---|---|---|---|
Yes | $43$ | $4$ | $47$ |
No | $24$ | $29$ | $53$ |
Total | $67$ | $33$ | $100$ |
Using the column totals, the joint frequencies in the left column are divided by $67,$ and those in the right column by $33.$ Since the column totals are used, the conditional relative frequencies in each column add up to $1.$
Car | Driver's license: Yes | Driver's license: No |
---|---|---|
Yes | $\dfrac{43}{67} \approx 0.64$ | $\dfrac{4}{33} \approx 0.12$ |
No | $\dfrac{24}{67} \approx 0.36$ | $\dfrac{29}{33} \approx 0.88$ |
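
The column-based calculation can be sketched as follows. The code divides each joint frequency by the total of its own column; the data layout and names are illustrative assumptions.

```python
# Joint frequencies: rows are "Car", columns are "Driver's license".
joint = {
    "car: yes": {"license: yes": 43, "license: no": 4},
    "car: no": {"license: yes": 24, "license: no": 29},
}
columns = ["license: yes", "license: no"]

# Column totals (marginal frequencies) serve as the denominators.
col_totals = {c: sum(row[c] for row in joint.values()) for c in columns}

# Conditional relative frequencies based on the columns.
conditional = {
    r: {c: joint[r][c] / col_totals[c] for c in columns} for r in joint
}

for r in joint:
    print(r, [round(conditional[r][c], 2) for c in columns])
```

Dividing by the row totals $47$ and $53$ instead would give the conditional relative frequencies based on the rows, which generally differ from these.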
From the two-way relative frequency table below, find the conditional relative frequencies based on the columns. Then, find the probability that a vegetarian has a pet.
Pet | Vegetarian: Yes | Vegetarian: No | Total |
---|---|---|---|
Yes | $0.456$ | $0.154$ | $0.61$ |
No | $0.123$ | $0.267$ | $0.39$ |
Total | $0.579$ | $0.421$ | $1$ |
To begin, recall that conditional relative frequencies can be calculated by dividing the joint relative frequencies by the marginal relative frequencies. Since the calculation should be based on the columns, the column totals for "Vegetarian," $0.579$ and $0.421,$ are used as the denominators.
Pet | Vegetarian: Yes | Vegetarian: No |
---|---|---|
Yes | $\dfrac{0.456}{0.579}\approx0.79$ | $\dfrac{0.154}{0.421}\approx0.37$ |
No | $\dfrac{0.123}{0.579}\approx0.21$ | $\dfrac{0.267}{0.421}\approx0.63$ |
Total | $0.579$ | $0.421$ |
Note that the sum of the conditional relative frequencies in each column is equal to $1.$ Now, to find the probability that a vegetarian has a pet, we look at the column for people who answered "Yes" on "Vegetarian".
Pet | Vegetarian: Yes | Vegetarian: No |
---|---|---|
Yes | ${\color{#0000FF}{0.79}}$ | $0.37$ |
No | ${\color{#0000FF}{0.21}}$ | $0.63$ |
Total | ${\color{#0000FF}{1}}$ | $1$ |
In that column, $79\,\%$ said "Yes" to having a pet and $21\,\%$ said "no". Thus, the probability that a vegetarian has a pet is $79\,\%.$
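
The same answer follows directly from the conditional probability formula, with the joint relative frequency in the numerator and the marginal relative frequency of the condition in the denominator. The short sketch below uses hypothetical variable names and also shows, for contrast, what happens when the condition is swapped.

```python
# Relative frequencies taken from the table in the example.
p_pet_and_veg = 0.456   # joint: has a pet and is vegetarian
p_veg = 0.579           # marginal: is vegetarian
p_pet = 0.61            # marginal: has a pet

# P(pet | vegetarian): condition on the "Vegetarian: Yes" column.
p_pet_given_veg = p_pet_and_veg / p_veg

# P(vegetarian | pet): condition on the "Pet: Yes" row instead.
p_veg_given_pet = p_pet_and_veg / p_pet

print(round(p_pet_given_veg, 2))  # prints: 0.79
print(round(p_veg_given_pet, 2))  # prints: 0.75
```

The two results differ, which is why it matters whether the conditional relative frequencies are based on the columns or on the rows.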