Consider four different situations, each involving a data set.
Situation $1$ | A company wants to compare the monthly sales performance of its Product A over the course of a year. |
---|---|
Situation $2$ | A survey is conducted to gather data on the distribution of ages among participants in a community event. |
Situation $3$ | An analysis is conducted to compare the distribution of exam scores in a class. |
Situation $4$ | A study tracks the temperature variations over the course of a week in a particular city. |
Tadeo bought a new scientific magazine, Imagineer, which includes a lot of different graphs and math-related discussions on real-life topics. The first article discussed the results of a recent math competition among high-school students.
The article illustrated the grades obtained by the participants on a scale from $0$ to $10.$ Interpret the dot plot.
Range: $8$
Step $1$ | Determine the center (median) by finding the middle data point. |
---|---|
Step $2$ | Find the maximum and minimum values on the graph. Use these values to calculate the spread (range) of the data. |
Step $3$ | Analyze the overall shape of the graph. Note any other features of interest on the graph. |
Complete each step one at a time.
First, the total number of data points should be found. Begin by counting all the dots in the given dot plot.
There are $15$ data points. This means that $15$ participants in the math competition obtained a grade. The median of the data set is, therefore, the $8$th data point, as it divides the ordered set into two halves of $7$ data points each. Find its value on the dot plot.
The $8$th data point is $6,$ which means that the median, or the center of the data set, is $6.$ This measure indicates the middle value of all the grades obtained by the participants.
Now, find the maximum and minimum values on the dot plot.
The minimum grade obtained by a participant is $2$ and the maximum grade is $10.$ To find the range, calculate the difference between these values. The range is $10 - 2 = 8.$
The next step is to analyze the overall shape of the graph.
The overall shape of this graph resembles the bell shape of a normal distribution, meaning the grades are approximately normally distributed and the plot is roughly symmetric.
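The arithmetic in Steps $1$ and $2$ can be checked with a short Python sketch. It uses only the summary values read from the dot plot ($15$ data points, minimum $2,$ maximum $10$). The helper name `median_position` is hypothetical, and the sketch assumes an odd number of data points so that a single middle data point exists.

```python
def median_position(n):
    """Return the 1-based position of the median in an ordered data set.

    Assumes n is odd, so that exactly one middle data point exists.
    """
    if n % 2 == 0:
        raise ValueError("this sketch only handles an odd number of data points")
    return (n + 1) // 2


# Values read from the dot plot in the article.
number_of_points = 15
minimum_grade = 2
maximum_grade = 10

position = median_position(number_of_points)
data_range = maximum_grade - minimum_grade

print(position)    # 8, so the 8th data point is the median
print(data_range)  # 8
```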
Step $1$ | Draw a horizontal line to begin the dot plot. Title the dot plot based on the problem, and label the line with the categories or numbers. When labeling the line with numbers, the numbers must be in consecutive order. |
---|---|
Step $2$ | Determine the frequency for each piece of data provided in the problem. |
Step $3$ | Place dots over each category or number on the horizontal line that corresponds to the frequency for each piece of data as depicted in the table. |
Complete each step one at a time.
Grades From Last Year.
The next step is to determine the frequency of each grade obtained by the participants from last year's math competition. To do so, count how many times each grade from $1$ to $9$ appears and write that number in a table next to the grade.
Grade | Frequency |
---|---|
$1$ | $1$ |
$2$ | $0$ |
$3$ | $1$ |
$4$ | $2$ |
$5$ | $0$ |
$6$ | $3$ |
$7$ | $3$ |
$8$ | $2$ |
$9$ | $1$ |
Lastly, place dots over each grade from $1$ to $9$ the number of times it appears in the data set. Use the values from the frequency table.
This completes the dot plot for the given data set.
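As a rough illustration of Step $3,$ the frequency table above can be turned into a text dot plot. This is only a sketch: the function name `dot_plot_rows` and the use of the letter `o` as a dot are choices made here, not part of the lesson.

```python
# Frequency table from the lesson: grade -> number of participants.
frequencies = {1: 1, 2: 0, 3: 1, 4: 2, 5: 0, 6: 3, 7: 3, 8: 2, 9: 1}


def dot_plot_rows(freq):
    """Build the rows of a text dot plot, from the top row down to the number line."""
    grades = sorted(freq)
    tallest = max(freq.values())
    rows = []
    for level in range(tallest, 0, -1):
        # A dot sits above a grade when its frequency reaches this level.
        rows.append(" ".join("o" if freq[g] >= level else " " for g in grades))
    rows.append(" ".join(str(g) for g in grades))  # the labeled number line
    return rows


for row in dot_plot_rows(frequencies):
    print(row)
```

The last row printed is the number line, and each column of `o` characters above a grade has as many dots as that grade's frequency.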
A frequency distribution, sometimes called a histogram distribution, is a representation that displays the number of observations within a given interval. It is used to show the empirical or theoretical frequency of occurrence of each possible value in a data set, often recorded in a frequency table. Frequency distributions of categorical data are typically presented using a bar graph.
The most common types of distributions are symmetric frequency distributions and skewed frequency distributions.
A histogram is a graphical illustration of a frequency distribution of a data set that contains numerical data. Histograms have several defining characteristics.
The next step is to make a frequency table showing how many data points lie in each interval.
Interval | Data Points | Frequency |
---|---|---|
$1−10$ | $4,$ $8$ | $2$ |
$11−20$ | $11,$ $11,$ $13,$ $15,$ $17,$ $19$ | $6$ |
$21−30$ | $21,$ $25,$ $26$ | $3$ |
$31−40$ | $37$ | $1$ |
From the frequency table, the histogram can be constructed by drawing a bar over each interval with a height corresponding to the found frequency.
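The counting behind the frequency table can be sketched in Python. The data points are the ones listed in the table; the helper `interval_of` is a hypothetical name, and it assumes positive whole-number data with intervals that start at $1.$

```python
from collections import Counter

# Data points listed in the frequency table above.
data = [4, 8, 11, 11, 13, 15, 17, 19, 21, 25, 26, 37]
interval_size = 10  # each interval covers 10 whole numbers: 1-10, 11-20, ...


def interval_of(value, size):
    """Return the (low, high) interval containing value, with intervals starting at 1."""
    low = ((value - 1) // size) * size + 1
    return (low, low + size - 1)


# Count how many data points fall into each interval.
frequency_table = Counter(interval_of(x, interval_size) for x in data)

for (low, high), count in sorted(frequency_table.items()):
    print(f"{low}-{high}: {count}")
```

Each printed count is the height of the bar drawn over that interval.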
Another article in Imagineer focuses on the upcoming opening of a new screen room in the local cinema Movieton.
The theater has had $2$ screens for the last $10$ years. The bar graph in the article illustrates the distribution of ticket sales for a fiscal week in the year $2023.$
Interpret the bar graph by describing its shape, center, and any extreme values if they exist. Use the bar graph to determine what day tends to have the most ticket sales, and what the average amount of ticket sales is on that day.
Dependent Variable: The number of tickets sold
Distribution: Left-Skewed
Step $1$ | Identify the independent and dependent variable. |
---|---|
Step $2$ | List the frequency in each bin. |
Step $3$ | Interpret the data and describe the bar graph's shape. Use the interpretation to answer any questions about the data. |
Complete each step one at a time.
First, the independent and dependent variables need to be identified. The horizontal axis lists the days of the week, while the vertical axis represents the number of tickets sold.
The article analyzes how many tickets are sold on different days of the week and which day has the most sales. This means that the independent variable is the day of the week and the dependent variable is the number of tickets sold on each day.
Next, the frequency in each bin should be listed and interpreted. Use the values given in the bar graph to indicate the height of each bin. Remember that the vertical axis represents the number of tickets sold on each day.
Day | Frequency |
---|---|
Monday | $64$ tickets were sold. |
Tuesday | $70$ tickets were sold. |
Wednesday | $62$ tickets were sold. |
Thursday | $137$ tickets were sold. |
Friday | $295$ tickets were sold. |
Saturday | $342$ tickets were sold. |
Sunday | $260$ tickets were sold. |
Lastly, interpret the data and describe the bar graph's shape. The bar graph shows that the distribution of ticket sales is left-skewed.
Friday and Saturday are the days with the greatest number of tickets sold, $295$ and $342,$ respectively. The largest number of tickets tends to be sold on Saturday, when $342$ tickets are sold.
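The comparison of the bins can be double-checked with a few lines of Python. This is a minimal sketch that uses the values from the frequency table; the variable names are chosen here for illustration.

```python
# Tickets sold per day, as listed in the frequency table.
ticket_sales = {
    "Monday": 64,
    "Tuesday": 70,
    "Wednesday": 62,
    "Thursday": 137,
    "Friday": 295,
    "Saturday": 342,
    "Sunday": 260,
}

# The day whose bar is tallest.
busiest_day = max(ticket_sales, key=ticket_sales.get)
print(busiest_day, ticket_sales[busiest_day])  # Saturday 342

# Days ordered from most to fewest tickets sold.
ranked = sorted(ticket_sales, key=ticket_sales.get, reverse=True)
print(ranked[:2])  # ['Saturday', 'Friday']
```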
Step $1$ | Choose the number of intervals |
---|---|
Step $2$ | Determine the size of the intervals |
Step $3$ | Make a frequency table |
Step $4$ | Draw a histogram |
Complete each step one at a time.
The next step is to make a frequency table showing how many data points lie in each interval.
Interval | Data Points | Frequency |
---|---|---|
$1−20$ | $7,$ $9,$ $11,$ $13,$ $15,$ $17,$ $18,$ $19,$ $20$ | $9$ |
$21−40$ | $21,$ $22,$ $23,$ $24,$ $24,$ $24,$ $25,$ $26,$ $27,$ $28,$ $30,$ $32,$ $35,$ $37,$ $39$ | $15$ |
$41−60$ | $42,$ $45,$ $49,$ $55,$ $60$ | $5$ |
$61−80$ | $70$ | $1$ |
From the frequency table, the histogram can be constructed by drawing a bar over each interval with a height corresponding to the found frequency.
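To illustrate Step $4,$ the frequencies can be drawn as text bars. This is only a sketch: the bars are horizontal rather than vertical, and the function name `histogram_lines` and the `#` symbol are choices made here.

```python
# Frequency table from the lesson: (interval, frequency).
frequency_table = [("1-20", 9), ("21-40", 15), ("41-60", 5), ("61-80", 1)]


def histogram_lines(table, symbol="#"):
    """Return one text bar per interval; the bar length equals the frequency."""
    label_width = max(len(label) for label, _ in table)
    return [f"{label:>{label_width}} | {symbol * count}" for label, count in table]


for line in histogram_lines(frequency_table):
    print(line)
```

The longest bar belongs to the interval $21-40,$ which contains $15$ of the $30$ data points.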
A box plot or box and whisker plot can be used to illustrate the distribution of a data set. A box plot has three parts.
A box plot is a scaled figure, usually presented above a number line. The set of numbers used to draw the box plot is called the five-number summary of the data set. Each of the five numbers is labeled accordingly.
A box plot provides a visual illustration of the distribution of a data set. Each segment of the chart contains one quarter, or $25\%,$ of the data, and the center $50\%$ of the data lies inside the box. The further apart the segments are, the greater the spread is for that quarter of the data.
The first and third quartiles are marked as the left and right sides of the box plot. The box plot can be completed by drawing a box between the quartiles.
The next article in Imagineer focused on an analysis of air quality data collected from different urban and rural areas around the world. It included a box plot to visualize variations in pollution levels.
The box plot presented in the article is based on scores from $0$ to $100,$ where $100$ is the greatest level of pollution. Experts assigned these scores to different areas according to various factors of air pollution. Interpret the box plot.
Maximum: $98$
Median: $63.5$
First Quartile: $39$
Now, consider the given box plot.
By comparing the general box plot with this one, the minimum, maximum, median, and first and third quartiles can be determined.
Concept | Value | Meaning |
---|---|---|
Minimum | $24$ | The lowest level of pollution in the analyzed areas is $24$ out of $100.$ |
Maximum | $98$ | The greatest level of pollution in the analyzed areas is $98$ out of $100.$ |
Median | $63.5$ | Half of the analyzed areas have a pollution level of $63.5$ or less, and half have a level of $63.5$ or more. |
First Quartile | $39$ | $25\%$ of the analyzed areas have a pollution level of $39$ or less. |
Third Quartile | $83$ | $25\%$ of the analyzed areas have a pollution level of $83$ or more. |
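Since each segment of a box plot holds a quarter of the data, the width of each segment indicates how spread out that quarter is. The following Python sketch computes these widths from the five values in the table. The width of the box is commonly called the interquartile range, a term the article itself does not use.

```python
# Five-number summary read from the box plot in the article.
minimum, first_quartile, median, third_quartile, maximum = 24, 39, 63.5, 83, 98

summary = [minimum, first_quartile, median, third_quartile, maximum]

# Width of each of the four segments (left whisker, two box halves, right whisker).
segment_widths = [high - low for low, high in zip(summary, summary[1:])]
print(segment_widths)  # [15, 24.5, 19.5, 15]

# Width of the box, which holds the center half of the data.
box_width = third_quartile - first_quartile
print(box_width)  # 44

# Overall spread of the data.
data_range = maximum - minimum
print(data_range)  # 74
```

The widest segment lies between the first quartile and the median, so that quarter of the pollution scores is the most spread out.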
Step $1$ | Order the data set from least to greatest value. Find the minimum and maximum. |
---|---|
Step $2$ | Determine the median. |
Step $3$ | Determine the first and third quartiles. |
Step $4$ | Draw a box plot. |
Complete each step one at a time.
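Steps $1$ to $3$ can be sketched in Python as follows. Two assumptions are made here. First, the quartiles are taken as the medians of the lower and upper halves of the ordered data, leaving out the middle value when the number of data points is odd; other quartile conventions exist and can give slightly different values. Second, the sample data is borrowed from the first histogram example in this lesson purely for illustration. The function name `five_number_summary` is hypothetical.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum).

    Quartiles are the medians of the lower and upper halves of the ordered
    data; the middle value is left out of both halves when the count is odd.
    """
    if len(values) < 2:
        raise ValueError("at least two data points are needed")
    ordered = sorted(values)              # Step 1: order the data
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return (
        ordered[0],
        median(lower),                    # Step 3: first quartile
        median(ordered),                  # Step 2: median
        median(upper),                    # Step 3: third quartile
        ordered[-1],
    )


# Illustrative data, reused from the first histogram example.
sample = [4, 8, 11, 11, 13, 15, 17, 19, 21, 25, 26, 37]
print(five_number_summary(sample))
```

The five returned values are the ones marked on the number line in Step $4$ before the box and whiskers are drawn.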