Mark's father runs a burger restaurant. The mean age of people who visit the restaurant is years old. Mark suspects that this situation has changed during the last year. To investigate whether his suspicions were true, he surveyed customers and found a sample mean of years with a standard deviation of years.
If he wants to test his results with a significance, help him complete the following questions.
Inferential statistics uses data from a sample to draw conclusions or test hypotheses about a population. Conclusions drawn from a sample are almost never exact, but they can be thought of as the best guess or most probable answer. One of the main tasks of inferential statistics is to provide a confidence interval.
The maximum error of estimate, also known as the margin of error, is the maximum difference between the estimate of the population mean and its actual value. The maximum error of estimate is calculated using the following formula.

$E = z \cdot \dfrac{s}{\sqrt{n}}$

In this formula, $z$ represents the $z$-value of a certain confidence level, $s$ is the standard deviation of the sample, and $n$ is the sample size. From the formula, some conclusions can be made about the error of estimate.
The maximum error of estimate is added to and subtracted from the estimation mean to find the bounds of a confidence interval.
A confidence interval for the population mean can be found by adding and subtracting the maximum error of estimate to and from the sample mean.
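The calculation described above can be sketched in a few lines of Python. This is a minimal sketch, assuming the large-sample formula $E = z \cdot s/\sqrt{n}$ and using the standard library's `NormalDist` to look up the $z$-value. The function names and the numbers in the usage lines are illustrative assumptions, not values from the lesson.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(confidence: float, s: float, n: int) -> float:
    """Maximum error of estimate, E = z * s / sqrt(n) (assumed formula)."""
    # z-value that leaves (1 - confidence) / 2 of the area in the upper tail
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * s / sqrt(n)


def confidence_interval(mean: float, confidence: float, s: float, n: int):
    """Bounds of the interval: sample mean minus and plus the margin of error."""
    e = margin_of_error(confidence, s, n)
    return mean - e, mean + e


# Illustrative numbers only; they are not taken from the lesson.
low, high = confidence_interval(mean=50.0, confidence=0.95, s=10.0, n=100)
print(f"95% CI: ({low:.2f}, {high:.2f})")  # 95% CI: (48.04, 51.96)
```

Note that a larger sample size shrinks the interval, while a higher confidence level widens it.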
Mark's father owns a burger restaurant. He wants to implement changes to improve the customer experience. Recently he found that in a sample of burgers, on average, a burger takes minutes to be cooked and given to the customer, with a standard deviation of minutes.
Is the sample size greater than
Since the confidence level is this portion of the area around the mean will be covered in a standard normal distribution. The area in the distribution's tails that are not in the confidence interval will be each.
Because the distribution is symmetric, the values limiting this area are opposites, so only one value needs to be found. Additionally, this value is given by the $z$-value of the upper or lower tail. One way to determine this value is to use a graphing calculator. Open the distribution menu and choose the third option, invNorm(.
Next, enter the area of the lower tail and confirm to get the $z$-value of the lower tail.
The value is approximately and because of the symmetry of the distribution, this means that its additive inverse can be used to evaluate the formula.
To evaluate the formula, substitute the values, multiply, calculate the root, use a calculator, and round to the required number of decimal places.
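The invNorm( lookup used in this example can also be reproduced without a graphing calculator. The sketch below assumes Python's standard `statistics.NormalDist`, whose `inv_cdf` method returns the $z$-value with a given area to its left. The helper name `tail_z_values` and the $90\%$ confidence level are illustrative assumptions.

```python
from statistics import NormalDist


def tail_z_values(confidence: float):
    """Return the z-values bounding the central `confidence` area."""
    # Area left in each tail once the central region is removed
    tail_area = (1 - confidence) / 2
    # Equivalent of entering the tail area into invNorm(
    z_lower = NormalDist().inv_cdf(tail_area)
    # By symmetry, the upper value is the additive inverse of the lower one
    return z_lower, -z_lower


# Illustrative confidence level; not taken from the lesson.
z_lower, z_upper = tail_z_values(0.90)
print(round(z_lower, 3), round(z_upper, 3))  # -1.645 1.645
```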
The secret to the success of the burger restaurant is not only the flavor of the meat but also the soda included in the King's Combo. This soda follows a unique brewing process, and a soda dispensing machine fills the bottles that are later sold with the combos.
Mark wants to find the mean volume contained in the bottles that are filled by the dispensing machine. He took a sample of bottles of soda and measured their volumes. He found that the mean volume of the bottles is milliliters with a standard deviation of Which option corresponds to a confidence interval for the population mean of soda volume?

Begin by calculating the maximum error of estimate. Then add and subtract that from the sample mean to get the bounds of the confidence interval.
Determine a confidence interval for the population mean of soda volume in order to identify the right option. To do so, follow these steps.
The mean volume for the sample consisting of sodas was milliliters. The maximum error of estimate will be calculated next.
This value is given by the value of the upper or lower tail. Because the distribution is symmetric, the values limiting this area will be opposites of each other, so only one needs to be found. In this case, a short version of the standard normal table can be used to locate the value of the lower tail, which in decimal form is
While a confidence interval helps estimate the value of a population parameter like the mean, there is another inferential method that can help evaluate a specific claim about a population parameter. Before exploring this method, two statistical hypotheses about the population need to be identified. These are the null and alternative hypotheses.
The null hypothesis and alternative hypothesis are two mutually exclusive statements about the mean of a population. The null hypothesis, denoted by $H_0$, is a statement of equality or non-strict inequality about the population mean that is accepted as true unless strong evidence is shown against it.
Null Hypothesis
Conversely, the alternative hypothesis, denoted by $H_a$ or $H_1$, is a strict inequality statement that contradicts the null hypothesis. It is the complement of the null hypothesis and will be accepted if there is evidence in its favor.
Alternative Hypothesis
Notice that the initial claim made by the researcher is the one that sets the null and alternative hypotheses. If the claim can be written algebraically as a strict inequality, it will be part of the alternative hypothesis. Otherwise, it will be part of the null hypothesis.
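The rule for placing the claim can be written as a small lookup. The following is only a sketch: the function name `hypotheses` and the ASCII spellings of the relational symbols (`!=`, `<=`, `>=`) are assumptions made for illustration.

```python
# Each relational symbol paired with its complement
COMPLEMENT = {"=": "!=", "!=": "=", "<=": ">", ">": "<=", ">=": "<", "<": ">="}
# Strict inequalities belong to the alternative hypothesis
STRICT = {"!=", "<", ">"}


def hypotheses(claim_symbol: str) -> dict:
    """Place a claim about the mean in H0 or Ha based on its symbol."""
    other = COMPLEMENT[claim_symbol]
    if claim_symbol in STRICT:
        # A strict inequality claim becomes the alternative hypothesis
        return {"H0": other, "Ha": claim_symbol, "claim": "Ha"}
    # Equality or non-strict inequality claims become the null hypothesis
    return {"H0": claim_symbol, "Ha": other, "claim": "H0"}


print(hypotheses("<"))  # {'H0': '>=', 'Ha': '<', 'claim': 'Ha'}
print(hypotheses("="))  # {'H0': '=', 'Ha': '!=', 'claim': 'H0'}
```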
Another characteristic of the King's Combo at Mark's father's restaurant is that customers can choose between a cookie or a soft ice cream as part of their meal. They can also pay more to get a piece of cake.
| Null Hypothesis | Alternative Hypothesis |
|---|---|
| The mean is greater than or equal to | The mean is less than (claim) |

| Null Hypothesis | Alternative Hypothesis |
|---|---|
| The mean is equal to (claim) | The mean is not equal to |

Once the null and alternative hypotheses have been correctly identified, they can be tested by performing a hypothesis test to see which statement is more likely true. Before the test can be performed, some information is needed.
A hypothesis test is an inferential method that uses sample data to examine a claim about the mean of a population. Because the population mean is almost always unknown, it is common to be suspicious about the truthfulness of any assumption about its value. The following are typical claims about the mean of a population.
| Typical Claims About the Mean | ||
|---|---|---|
| The mean is equal to a specific value, | The mean is greater than a specific value, | The mean is less than a specific value, |
Before making a hypothesis test, two hypotheses need to be specified, the null hypothesis and the alternative hypothesis. These hypotheses must be mutually exclusive. The null hypothesis is assumed to be true. The hypothesis test puts the null hypothesis on trial to see if there is strong evidence against it. If so, the alternative hypothesis is accepted instead.
The significance level, commonly denoted by $\alpha$, is the probability of rejecting the null hypothesis when it is actually true. It is set in advance when making a hypothesis test. The smaller the value, the stronger the sample evidence needed to reject the null hypothesis. These are typical values for the significance level.
| Typical Significance Levels | ||
|---|---|---|
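
One way to build intuition for the significance level is to simulate many samples from a population where the null hypothesis is true and count how often a two-tailed test would still reject it. The sketch below is an illustration under stated assumptions: the population values (`mu0 = 100`, standard deviation `15`), the sample size, and the large-sample statistic $z = (\bar{x} - \mu_0)/(s/\sqrt{n})$ are all chosen for the example and do not come from the lesson.

```python
import random
from math import sqrt
from statistics import NormalDist


def false_rejection_rate(alpha: float, trials: int = 1000, n: int = 50, seed: int = 0) -> float:
    """Share of samples that reject a TRUE null hypothesis (two-tailed z-test)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    mu0, sigma = 100.0, 15.0  # hypothetical population; H0: mean = mu0 holds
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        # Sample standard deviation (n - 1 in the denominator)
        s = sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
        z = (x_bar - mu0) / (s / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


rate = false_rejection_rate(alpha=0.05)
print(f"Rejected a true null hypothesis in {rate:.1%} of samples")
```

The printed rate should land near the chosen significance level, which is why a smaller level makes a false rejection less likely.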
In a standard normal distribution, the sample mean would fall around the center of the distribution if the null hypothesis were true. This means that a value in the tails of the distribution would be unusual if $H_0$ were true. The significance level tells how far from the center of the distribution the sample mean must lie in order to reject the null hypothesis $H_0$ and accept the alternative hypothesis $H_a$.
The critical region, determined by the significance level $\alpha$, is the set of values that will lead to rejecting the null hypothesis $H_0$. In a standard normal distribution, this region is located in the tails of the distribution. The cutoff value of the region is a critical value given by the $z$-value of $\alpha$. The test of significance (left-tail, right-tail, or two-tail) determines whether there are one or two critical regions.
| Critical Values | | | |
|---|---|---|---|
| Significance Level | Left-Tail Test | Two-Tail Test | Right-Tail Test |
In a hypothesis test, the region where the null hypothesis is rejected is known as the critical region. The location of this region depends on the significance level and the inequality symbol of the alternative hypothesis as determined by the tests of significance. The tests of significance can be divided into the left-tailed test, the two-tailed test, and the right-tailed test.
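The critical values for each test of significance can be computed directly from the standard normal distribution. The sketch below assumes Python's `statistics.NormalDist`. The function name `critical_values` and the three significance levels in the loop are assumptions, chosen because they are commonly used.

```python
from statistics import NormalDist


def critical_values(alpha: float) -> dict:
    """Critical z-values for left-, right-, and two-tailed tests."""
    z = NormalDist().inv_cdf
    return {
        "left": z(alpha),                          # all of alpha in the left tail
        "right": z(1 - alpha),                     # all of alpha in the right tail
        "two": (z(alpha / 2), z(1 - alpha / 2)),   # alpha split between both tails
    }


# Commonly used significance levels, assumed for illustration
for alpha in (0.10, 0.05, 0.01):
    cv = critical_values(alpha)
    print(f"{alpha:.2f}: left {cv['left']:.3f}, right {cv['right']:.3f}, "
          f"two-tail {cv['two'][0]:.3f} and {cv['two'][1]:.3f}")
```

For example, at a significance level of $0.05$ the two-tailed critical values are about $-1.960$ and $1.960$, while the one-tailed values are about $-1.645$ or $1.645$.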
When making a hypothesis test, begin by identifying the claim to set the null and alternative hypotheses. Then the critical regions and the critical values are determined based on the tests of significance. Finally, the null hypothesis is rejected if the statistic falls within the critical region. To illustrate this process, consider the following situation.
| A company says that each of their packages of ham contains exactly slices. |
| Null Hypothesis $\bm{H_0}$ | Alternative Hypothesis $\bm{H_a}$ |
|---|---|
| The mean is equal to slices (claim). | The mean is different than slices. |

Because the sign of the alternative hypothesis is $\neq$, a two-tailed test of significance will be conducted. This means that there are two critical regions whose cutoffs will be given by the $z$-value of the significance level. The following are the critical values for the most common significance levels.
| Critical Values | | | |
|---|---|---|---|
| Significance Level | Left-Tail Test | Right-Tail Test | Two-Tail Test |
From the table, note that the critical values for a significance level are Now the critical regions and critical values can be labeled.
To compute the statistic, substitute the values, subtract, calculate the root, put the minus sign in front of the fraction, multiply, and calculate the quotient.
Next, verify if the statistic falls within the critical region. If so, reject the null hypothesis. To do so, plot the statistic jointly with the critical regions to see where it falls, outside or inside the critical region.
Because the statistic falls within the critical region, the null hypothesis is rejected in this case.
Use the result of the previous step to make a conclusion about the initial claim.
| A company says that each of their packages of ham contains exactly slices. |
In this case, since the initial claim is related to the null hypothesis, it can be said that there is enough evidence to reject the claim that the packages of ham contain exactly slices.
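The whole procedure (set the hypotheses, find the critical values, compute the statistic, and decide) can be sketched as one function. This is a minimal sketch, assuming the usual large-sample statistic $z = (\bar{x} - \mu_0)/(s/\sqrt{n})$. The function name `z_test`, its parameters, and the numbers in the usage lines are illustrative assumptions and are not the values of the ham example.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean: float, s: float, n: int, mu0: float,
           alpha: float, tail: str = "two"):
    """Return the z statistic and whether H0 is rejected."""
    # Test statistic: distance of the sample mean from mu0 in standard errors
    z = (sample_mean - mu0) / (s / sqrt(n))
    inv = NormalDist().inv_cdf
    if tail == "left":
        reject = z < inv(alpha)
    elif tail == "right":
        reject = z > inv(1 - alpha)
    elif tail == "two":
        reject = abs(z) > inv(1 - alpha / 2)
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return z, reject


# Illustrative numbers only; they are not taken from the lesson.
z, reject = z_test(sample_mean=9.8, s=0.5, n=64, mu0=10.0, alpha=0.05)
print(f"z = {z:.2f}, reject H0: {reject}")  # z = -3.20, reject H0: True
```

Here the statistic lies beyond the left critical value of about $-1.96$, so it falls in the critical region and the null hypothesis is rejected.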
The following situations need to be considered when calculating the critical values.
For the given example, the area given by the significance level is split evenly between the two critical regions. The $z$-value for the left tail will be found first. To do so, open the distribution menu of a graphing calculator and choose the third option, invNorm(.
Now enter the desired value, which in this case is the area of the left critical region. Finally, confirm to get the result.
The result is the $z$-value for the left tail, so the $z$-value for the right tail is its opposite. A similar process is followed when performing a one-tail test.
While watching the Dinos and Dragons movie with his family, Mark decides to eat a bar of his favorite chocolate as a snack. After eating it, he feels slightly disappointed because the bar seemed a little smaller than the grams listed on the packages. He decides to investigate if the brand producing the chocolate bars lied about the weight of the chocolate bars.
To determine if what the package shows is true, Mark weighs a sample of chocolate bars and finds a sample mean of with a standard deviation of He wants to test the affirmation in the packages about the weight of chocolate bars with significance. Help him find the following information to draw a conclusion.
| Null Hypothesis $\bm{H_0}$ | Alternative Hypothesis $\bm{H_a}$ |
|---|---|
| The mean is equal to (claim) | The mean is different than |

Because the sign of the alternative hypothesis is $\neq$, a two-tailed test of significance corresponds to this situation.
To find the critical values, use the invNorm( function of a graphing calculator.
Next, given that the test is two-tailed, enter half of the significance level, which is the area of the lower tail, and confirm to get the result.
This is the critical value corresponding to the critical region on the left of the standard normal distribution. Because the distribution is symmetric, the critical value for the upper tail will be the same but with the opposite sign. With this information, the critical regions can be set in the distribution.
This corresponds to option A.
Note that the statistic falls outside the critical region. Therefore, the null hypothesis cannot be rejected. This means that there is not enough evidence to reject the claim about the weight of the chocolate bars. The sample is consistent with the weight listed on the packages, although this does not prove that the listed weight is exact.
After enjoying the Dinos and Dragons movie with his family, Mark and his father start watching sports news. The newscaster reports that, on average, teens spend at most minutes a day playing sports. Mark wants to determine if what the news reported is accurate.
Using a sample of teens, Mark calculates a mean of minutes and a standard deviation of minutes. Help Mark if he wants to test the news report with significance.
| Null Hypothesis $\bm{H_0}$ | Alternative Hypothesis $\bm{H_a}$ |
|---|---|
| The mean is less than or equal to minutes. (claim) | The mean is greater than minutes. |

Because the sign of the alternative hypothesis is $>$, a right-tailed test of significance applies to this situation.
To find the critical value, use the invNorm( function of a graphing calculator.
Because the upper tail of the distribution is desired, the value to be entered into the calculator is given by $1-\alpha$, the area to the left of the critical value. Next, enter this value and confirm to get the result.
The critical value is about This value will limit the critical region that will be located in the right tail of the distribution.
Therefore, this corresponds to option D.
Since the statistic falls in the critical region, the null hypothesis should be rejected. Additionally, because the initial claim is related to the null hypothesis, there is enough evidence to reject the news report. The data suggest that the mean time teens spend playing sports is greater than the reported value.
This lesson reviewed the importance of samples when it comes to estimating population parameters. However, due to the margin of error in estimations, inferential methods are helpful when stating how much confidence can be placed in a specific estimate or when testing a particular claim about the population mean.
| Inferential Methods | |
|---|---|
| Confidence Interval | Hypothesis Test |
| Estimates a population parameter as a range of values | Tests a claim about the mean of a population |
Now the challenge presented earlier about the average age of people at the burger restaurant can be solved.
The mean age of people who eat at Mark's father's burger restaurant used to be Mark suspects that this has changed, so he surveyed a sample of customers. He found a sample mean of years with a standard deviation of years. If he wants to conduct a test with significance, help him through the hypothesis test.
| Null Hypothesis $\bm{H_0}$ | Alternative Hypothesis $\bm{H_a}$ |
|---|---|
| The mean is equal to (claim) | The mean is different than years. |

Because the sign of the alternative hypothesis is $\neq$, a two-tailed test of significance will be needed in this case.
To find the critical values, use the invNorm( function of a graphing calculator.
Because the test is two-tailed, each critical region will contain half of the area given by the significance level. Next, enter this value and confirm to get the result.
This is the critical value corresponding to the critical region on the left of the standard normal distribution. Moreover, because the distribution is symmetric, the critical value for the upper tail will be the same but with the opposite sign. With this information, the critical regions can be set in the distribution.
This corresponds to option B.
Because the statistic falls in the critical region, the null hypothesis should be rejected. Given that the initial claim is related to the null hypothesis, there is strong evidence to reject the claim that the mean age of customers at the restaurant has stayed the same. This means that the mean age has most likely changed from its previous value.