Most of the data is clustered around the mean value. Therefore, a randomly selected man is more likely to wear a shoe size close to the mean than one far from it. When a data set is distributed this way and the domain of the distribution is continuous, not discrete, the data is said to be normally distributed. This lesson explores this distribution.
Kevin has a summer internship at a tech company in his town. The daily number of calls that the company receives is normally distributed with a mean of calls and a standard deviation of calls. The graph represents the distribution of the data.
Looking to make improvements at the company, Kevin's boss wants to know the answers to the following questions.
When dealing with probability distributions, one type stands out above the rest because it appears in many real-life scenarios, such as people's heights, shoe sizes, birth weights, average grades, IQ scores, and many other quantities. Because it occurs so regularly, this type of distribution is called the normal distribution.
A normal distribution is a type of probability distribution where the mean, the median, and the mode are all equal to each other. The graph that represents a normal distribution is called a normal curve and it is a continuous, bell-shaped curve that is symmetric with respect to the mean of the data set.
This is the most common continuous probability distribution observed in real life. When a normal distribution has a mean of $0$ and a standard deviation of $1,$ it is called a standard normal distribution.
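The properties of the normal curve described above can be checked numerically. The sketch below is an illustration, not part of the lesson's method: it uses Python's standard-library `statistics.NormalDist` with a hypothetical mean of 50 and standard deviation of 10 to confirm that the curve is symmetric about the mean, peaks at the mean (mode), and splits the area in half at the mean (median).

```python
from statistics import NormalDist

# Hypothetical parameters, chosen only for illustration.
mu, sigma = 50.0, 10.0
dist = NormalDist(mu, sigma)

# Symmetry: the curve has the same height at equal distances from the mean.
left_height = dist.pdf(mu - 7)
right_height = dist.pdf(mu + 7)

# Mode: the curve is higher at the mean than at nearby points.
peak = dist.pdf(mu)
peaks_at_mean = peak > dist.pdf(mu - 0.1) and peak > dist.pdf(mu + 0.1)

# Median: half of the area under the curve lies to the left of the mean.
area_left_of_mean = dist.cdf(mu)

print(abs(left_height - right_height) < 1e-12, peaks_at_mean, area_left_of_mean)

# NormalDist() with no arguments is the standard normal distribution.
standard = NormalDist()
print(standard.mean, standard.stdev)
```

The first line printed is `True True 0.5`, and the standard normal distribution reports a mean of `0.0` and a standard deviation of `1.0`.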
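In statistics, the Empirical Rule, also known as the $\bm{68\textbf{–}95\textbf{–}99.7}$ rule, is a shorthand for remembering the percentage of values that lie within certain intervals of a normal distribution. The rule states the following three facts.

- About 68% of the data lie within one standard deviation of the mean.
- About 95% of the data lie within two standard deviations of the mean.
- About 99.7% of the data lie within three standard deviations of the mean.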
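The percentages in the Empirical Rule are rounded versions of exact areas under the standard normal curve. As a sketch, Python's standard-library `statistics.NormalDist` can compute the area within $k$ standard deviations of the mean as the area to the left of $k$ minus the area to the left of $-k$:

```python
from statistics import NormalDist

standard = NormalDist()  # mean 0, standard deviation 1

# Fraction of the data within k standard deviations of the mean, P(-k <= Z <= k).
within = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}

for k, fraction in within.items():
    print(f"within {k} SD: {fraction:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```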
In his spare time, Kevin works with the Less Chat, More Talk campaign to encourage people to share with their loved ones in person instead of through screens. He wants to give away T-shirts with a cool logo outside a shopping mall to help spread this message.
Kevin is in charge of preparing the men's T-shirts, but he does not know how many of each size he should order. To figure it out, he searched the City Hall website and he found that the heights of the men in the city are normally distributed with a mean of centimeters and a standard deviation of centimeters. Along with this information, there was also a graph.
Due to the symmetry of the normal curve, 50% of the data fall to the left of the mean and 50% fall to the right of it. Consequently, 50% of the men surveyed are shorter than the mean height.
According to the graph, of the surveyed men are between and centimeters tall.
First, draw a horizontal axis and mark the mean of the data in the middle. In this case, the mean is
Find more labels to write on the axis such that each interval is one standard deviation long. In this case, the intervals must be units long. To accomplish this, add and subtract multiples of the standard deviation to and from the mean.
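| Labels to the Left of the Mean | Labels to the Right of the Mean |
|---|---|
| mean − 1 standard deviation | mean + 1 standard deviation |
| mean − 2 standard deviations | mean + 2 standard deviations |
| mean − 3 standard deviations | mean + 3 standard deviations |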
Three labels on each side of the mean are enough, because about 99.7% of the data lie within three standard deviations of the mean.
Lastly, draw a bell-shaped curve with its peak at the mean. Remember, the curve is symmetric with respect to the mean. In this case, the peak occurs at
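The labeling step amounts to adding and subtracting multiples of the standard deviation. A minimal Python sketch, assuming a hypothetical mean of 175 cm and standard deviation of 7 cm (the lesson's actual figures are not reproduced here):

```python
# Hypothetical values for illustration only.
mean, sd = 175.0, 7.0

# Three labels on each side of the mean: mean - 3*sd, ..., mean + 3*sd.
labels = [mean + k * sd for k in range(-3, 4)]
print(labels)  # [154.0, 161.0, 168.0, 175.0, 182.0, 189.0, 196.0]
```

The middle label is the mean, where the curve has its peak.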
While reading some statistics about the people in the city, Kevin was surprised to learn that the weights of newborns are also normally distributed. He found the following information given by the local hospital.
Next, draw the normal curve: a bell-shaped curve that peaks at the mean and is symmetric with respect to it.
According to the Empirical Rule, about 68% of the area below the curve lies within one standard deviation of the mean, about 95% lies within two, and about 99.7% lies within three.
The percentages in every interval can be labeled by using the symmetry of the curve. This will complete the diagram of the distribution.
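The symmetry argument can be written out as arithmetic. Each band on one side of the mean holds half of the difference between two consecutive nested percentages. The Python sketch below uses only the rounded Empirical Rule figures:

```python
# Empirical Rule: percentage of data within 1, 2, and 3 standard deviations.
within = {1: 68.0, 2: 95.0, 3: 99.7}

# By symmetry, each band on one side of the mean holds half of the
# difference between consecutive nested percentages.
mean_to_1 = within[1] / 2                # mean to 1 SD
one_to_2 = (within[2] - within[1]) / 2   # 1 SD to 2 SD
two_to_3 = (within[3] - within[2]) / 2   # 2 SD to 3 SD
beyond_3 = (100.0 - within[3]) / 2       # beyond 3 SD

# Bands from left to right across the axis.
bands = [beyond_3, two_to_3, one_to_2, mean_to_1,
         mean_to_1, one_to_2, two_to_3, beyond_3]
print([round(b, 2) for b in bands])
# [0.15, 2.35, 13.5, 34.0, 34.0, 13.5, 2.35, 0.15]
```

The eight bands add up to 100%, as they should.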
People's heights are usually approximately normally distributed. For example, the average height of a woman in the United States is about centimeters. Assuming a standard deviation of centimeters, the graph of this distribution looks as follows.
The Empirical Rule gives the percentage of data that falls between any two labels on the axis. However, what if the endpoints of the interval are not labels? For example, what percentage of women are shorter than centimeters?
To find such a percentage, the first step is to convert the data value into its corresponding z-score.
The z-score, also known as the z-value, represents the number of standard deviations that a given value lies from the mean of a data set. The following formula converts any value $x$ into its corresponding z-score, where $\mu$ is the mean and $\sigma$ is the standard deviation.

$$z = \frac{x - \mu}{\sigma}$$
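The z-score formula $z = \frac{x - \mu}{\sigma}$ translates directly into code. Below is a minimal Python sketch. The function name `z_score` and the example numbers are hypothetical.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Return how many standard deviations x lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

# Hypothetical distribution with mean 162 and standard deviation 6.
print(z_score(171, 162, 6))  # 1.5  (above the mean)
print(z_score(150, 162, 6))  # -2.0 (below the mean)
```

A positive z-score means the value lies above the mean, and a negative one means it lies below.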
Consider a standard normal distribution and a randomly chosen z-score. The area below the normal curve to the left of this z-score can be found using a standard normal table. For example, consider
In the left column of the standard normal table, locate the integer part of the z-score. Since is positive, look at the four bottom rows. Because the integer part of is shade the fifth row.
The probability that corresponds to a z-score whose integer part is appears in the shaded row.
In the top row of the standard normal table, locate the decimal part of the z-score. Here, the decimal part is Consequently, shade the seventh column.
Other areas can also be found using the same standard normal table.
To find the area below the normal curve between two z-scores, subtract the area to the left of the smaller z-score from the area to the left of the greater z-score.
The area to the right of a z-score is the complement of the area to its left; that is, it equals $1$ minus the area to the left of that z-score.
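A standard normal table lists values of the cumulative distribution function. The three lookups described above can therefore be sketched with Python's `statistics.NormalDist().cdf` in place of the table. The helper names below are hypothetical. Printed tables are rounded to four decimal places, so a table may differ from these results in the last digit.

```python
from statistics import NormalDist

standard = NormalDist()  # standard normal distribution

def area_left(z: float) -> float:
    """Area under the curve to the left of z (what the table lists)."""
    return standard.cdf(z)

def area_between(z1: float, z2: float) -> float:
    """Left area of the greater z-score minus left area of the smaller one."""
    low, high = min(z1, z2), max(z1, z2)
    return area_left(high) - area_left(low)

def area_right(z: float) -> float:
    """Area to the right of z: the complement of the area to the left."""
    return 1 - area_left(z)

print(round(area_left(1.0), 4))           # 0.8413
print(round(area_between(-1.0, 1.0), 4))  # 0.6827
print(round(area_right(2.0), 4))          # 0.0228
```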
According to the standard normal table, the probability that a randomly selected value is less than or equal to is Therefore, about of women are shorter than or equal to centimeters.
Kevin has become a stats fan. He has recorded the time it takes him to commute to his internship over the past few days. He observes that the times are normally distributed with a mean of minutes and a standard deviation of minutes.
Find the following probabilities and write them in decimal form rounded to two decimal places.
The probability that Kevin spends less than minutes getting to work tomorrow is represented by the area below the curve that is to the left of
According to the table, the probability that tomorrow Kevin will spend less than minutes traveling to work is about
Therefore, both values must first be converted into their corresponding z-scores, using the mean and standard deviation of Kevin's commute times.
| x-value | Substitute into $z = \frac{x - \mu}{\sigma}$ | Simplify |
|---|---|---|
Since is not a label on the axis, the Empirical Rule cannot be used. Therefore, z-scores must be used to find the area. In Part B, it was determined that the z-score corresponding to is
| Probability of Kevin Being Late | Probability of Kevin Being on Time |
|---|---|
The company Kevin is interning with plans to release a new smartphone. He goes with the research team to a stadium with a prototype so that different people can try the phone, which helps the team determine which features and design people like.
After comparing and contrasting size preference with the ages of the participants, Kevin realizes that the data is normally distributed. Additionally, he notices that the middle of participants prefer a larger phone.
Due to the symmetry of the normal curve, the area to the left of is equal to the area to the right of Therefore, each portion corresponds to of the data. For the moment, focus on the area to the left of
According to the last graph, the probability that a randomly chosen z-value is less than is In other words, Now, look for the z-value that produces a probability of on a standard normal table.
It is seen in the table that Again, due to symmetry, is the opposite of Therefore,
Therefore, the limits of the middle of the data are and
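Searching the body of the table for a probability, then reading off its z-score, reverses the earlier lookups. Python's `statistics.NormalDist.inv_cdf` performs this reverse step directly. The sketch below finds the z-limits that enclose the middle portion of a distribution and converts them back to original units with $x = \mu + z\sigma$. The 95% figure, the mean of 30, and the standard deviation of 8 are hypothetical. The 95% case was chosen because its limits, $\pm 1.96$, are well known.

```python
from statistics import NormalDist

standard = NormalDist()

def middle_limits(fraction: float):
    """Return the z-scores that enclose the middle `fraction` of the data."""
    tail = (1 - fraction) / 2        # by symmetry, each tail holds half of the rest
    z_low = standard.inv_cdf(tail)   # area to the left of z_low equals the tail
    return z_low, -z_low             # the upper limit is the opposite of the lower one

z_low, z_high = middle_limits(0.95)
print(round(z_low, 2), round(z_high, 2))  # -1.96 1.96

# Convert back to original units with x = mean + z * sd (hypothetical values).
mean, sd = 30.0, 8.0
x_low, x_high = mean + z_low * sd, mean + z_high * sd
print(round(x_low, 2), round(x_high, 2))  # 14.32 45.68
```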
Any normal distribution with mean $\mu$ and standard deviation $\sigma$ can be converted into a standard normal distribution. For example, consider a normal distribution with and To standardize the distribution, all of its values have to be converted into their corresponding z-scores.
First, shift all the values so that the mean of the new set is $0$. To do this, subtract the mean from each data value.
Notice that translating the values does not change the standard deviation, so the new data set has the same standard deviation as the original one.
The initial data set has been converted into
To obtain a data set with a standard deviation of $1$, divide the values obtained in the previous step by the standard deviation of the set.
| Score | ||
|---|---|---|
After the standardization, the new data set is Here, the mean is $0$ and the standard deviation is $1$.
Notice that the resulting curve has the same shape as the original; standardizing only moves its center to $0$ and rescales its spread.
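The two standardization steps can be sketched in Python with the standard-library `statistics` module. The data set below is hypothetical. The sketch also assumes that the population standard deviation (`pstdev`) is the intended one.

```python
from statistics import mean, pstdev

data = [4.0, 8.0, 6.0, 5.0, 7.0]  # hypothetical data set

mu = mean(data)
sigma = pstdev(data)  # population standard deviation

# Step 1: subtract the mean. New mean is 0; standard deviation is unchanged.
shifted = [x - mu for x in data]

# Step 2: divide by the standard deviation. New standard deviation is 1.
standardized = [x / sigma for x in shifted]

print(mean(shifted), pstdev(shifted))
print(mean(standardized), pstdev(standardized))
```

The shifted set keeps the original standard deviation of about 1.414. The standardized set has a mean of 0 and a standard deviation of 1, up to floating-point rounding.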
Kevin's friend LaShay took the SAT and scored points on the math section. Kevin took the ACT and scored points in the math section.
Since these tests use different scales (the math section of the SAT is scored out of 800 points, while the math section of the ACT is scored out of 36 points), they wonder who did better. They looked at the stats for each test to find out.
Next, Kevin's and LaShay's scores will be placed on the horizontal axis of the normal curve for their corresponding test. The score that lies farther to the right of the mean, measured in standard deviations, shows who stood out more compared to their class.
| Student | Score | Mean | Standard Deviation | z-score |
|---|---|---|---|---|
| LaShay | | | | |
| Kevin | | | | |
LaShay's z-score is greater than Kevin's z-score. This means that her score lies more standard deviations to the right of the mean than his does. Consequently, LaShay excelled more in her class than Kevin did in his.
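Comparing scores from different tests comes down to comparing z-scores. The Python sketch below uses hypothetical test statistics, not the lesson's actual figures. The last line assumes normally distributed scores and uses `statistics.NormalDist` to estimate the fraction of each class that scored at or below each student.

```python
from statistics import NormalDist

def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

# Hypothetical test statistics (not the lesson's actual figures).
lashay_z = z_score(700, 520, 120)  # SAT math: score, mean, standard deviation
kevin_z = z_score(27, 21, 5)       # ACT math: score, mean, standard deviation

print(lashay_z, kevin_z)  # 1.5 1.2
print("LaShay" if lashay_z > kevin_z else "Kevin", "stood out more")

# Assuming normal scores: fraction of each class scoring at or below each student.
print(round(NormalDist().cdf(lashay_z), 4), round(NormalDist().cdf(kevin_z), 4))
```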
As before, the Empirical Rule is not helpful because LaShay's score is not a whole number of standard deviations away from the mean. Therefore, the area will be found using z-scores. The z-score corresponding to LaShay's score was found in Part A, so the required area is the area to the left of that z-score, which can be found on the standard normal table.
The probability that a randomly chosen classmate of LaShay's has scored less than or equal to her is
In the challenge presented at the beginning, it was said that Kevin has a summer internship at a tech company. The daily number of calls the company receives is normally distributed with a mean of calls and a standard deviation of calls. The corresponding normal curve is represented in the following graph.
To make improvements at the company, Kevin's boss wants to know the answers to the following questions.
Notice that is exactly two standard deviations above the mean. Therefore, the required area can be found by using the Empirical Rule. This rule tells the percentage of data that fall within certain intervals. The graph below shows the distribution divided into labeled intervals according to the Empirical Rule.
| x-value | z-score | |
|---|---|---|
Graphing calculators can graph a normal distribution and find the area under the curve between specific limits. The area can be found using either the values of the original normal distribution or their z-scores.
In Part A, the value is two standard deviations to the right of the mean, so its z-score is $2$. To find the area below the curve to the right of $z = 2$, follow these three steps on the calculator.
[Calculator screenshots: selecting ShadeNorm(, entering the limits, and pressing DRAW]
Consequently, is about
In the third step, set the values corresponding to the original distribution and press DRAW.
As seen, the result obtained is the same as before. Finally, for Part B, keep the window settings and only update the lower and upper limits.
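The same areas can also be computed without a graphing calculator. The sketch below is a Python analog of the calculator workflow, not the calculator's own function. It uses `statistics.NormalDist` with hypothetical parameters (mean 200, standard deviation 25) and first finds the area to the right of a value two standard deviations above the mean in two ways: with the original distribution and with the z-score. It then finds the area between two hypothetical limits.

```python
from statistics import NormalDist

# Hypothetical parameters; the lesson's actual values are not reproduced here.
mu, sigma = 200.0, 25.0
x = mu + 2 * sigma  # two standard deviations above the mean

# Area to the right of x using the original distribution...
right_original = 1 - NormalDist(mu, sigma).cdf(x)

# ...and using the z-score with the standard normal distribution.
z = (x - mu) / sigma
right_standard = 1 - NormalDist().cdf(z)

print(round(right_original, 4), round(right_standard, 4))  # 0.0228 0.0228

# Area between a lower and an upper limit (hypothetical limits).
lower, upper = 180.0, 240.0
dist = NormalDist(mu, sigma)
between = dist.cdf(upper) - dist.cdf(lower)
print(round(between, 4))
```

Both approaches give about 0.0228. The Empirical Rule's rounded 95% figure puts this tail at 2.5%, so the exact area and the rule differ slightly.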