Mathematics Higher Level Topic 7 Option:
Statistics and Probability
for the IB Diploma
Paul Fannon, Vesna Kadelburg, Ben Woolley and Stephen Ward
cambridge university press Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi, Mexico City Cambridge University Press The Edinburgh Building, Cambridge CB2 8RU, UK www.cambridge.org Information on this title: www.cambridge.org/9781107682269
© Cambridge University Press 2013
This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.
First published 2013
Printed in Poland by Opolgraf
A catalogue record for this publication is available from the British Library
ISBN 978-1-107-68226-9 Paperback
Cover image: Thinkstock
Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate. Information regarding prices, travel timetables and other factual information given in this work is correct at the time of first printing but Cambridge University Press does not guarantee the accuracy of such information thereafter.
notice to teachers Worksheets and copies of them remain in the copyright of Cambridge University Press and such copies may not be distributed or used in any way outside the purchasing institution.
Contents
How to use this book
Acknowledgements
Introduction
1 Combining random variables
1A Adding and multiplying all the data by a constant
1B Adding independent random variables
1C Expectation and variance of the sample mean and sample sum
1D Linear combinations of normal variables
1E The distribution of sums and averages of samples
2 More about statistical distributions
2A Geometric distribution
2B Negative binomial distribution
2C Probability generating functions
2D Using probability generating functions to find the distribution of the sum of discrete random variables
3 Cumulative distribution functions
3A Finding the cumulative probability function
3B Distributions of functions of a continuous random variable
4 Unbiased estimators and confidence intervals
4A Unbiased estimates of the mean and variance
4B Theory of unbiased estimators
4C Confidence interval for the population mean
4D The t-distribution
4E Confidence interval for a mean with unknown variance
5 Hypothesis testing
5A The principle of hypothesis testing
5B Hypothesis testing for a mean with known variance
5C Hypothesis testing for a mean with unknown variance
5D Paired samples
5E Errors in hypothesis testing
6 Bivariate distributions
6A Introduction to discrete bivariate distributions
6B Covariance and correlation
6C Linear regression
7 Summary and mixed examination practice
Answers
Appendix: Calculator skills sheets
A Finding probabilities in the t-distribution (Casio, Texas)
B Finding t-scores given probabilities (Casio, Texas)
C Confidence interval for the mean with unknown variance (from data) (Casio, Texas)
D Confidence interval for the mean with unknown variance (from stats) (Casio, Texas)
E Hypothesis test for the mean with unknown variance (from data) (Casio, Texas)
F Hypothesis test for the mean with unknown variance (from stats) (Casio, Texas)
G Confidence interval for the mean with known variance (from data) (Casio, Texas)
H Confidence interval for the mean with known variance (from stats) (Casio, Texas)
I Hypothesis test for the mean with known variance (from stats) (Casio, Texas)
J Finding the correlation coefficient and the equation of the regression line (Casio, Texas)
Glossary
Index
How to use this book
Structure of the book
This book covers all the material for Topic 7 (Statistics and Probability Option) of the Higher Level Mathematics syllabus for the International Baccalaureate course. It assumes familiarity with the core Higher Level material (Syllabus Topics 1 to 6), in particular Topic 5 (Core Statistics and Probability) and Topic 6 (Core Calculus). We have tried to include in the main text only the material that will be examinable. There are many interesting applications and ideas that go beyond the syllabus and we have tried to highlight some of these in the 'From another perspective' and 'Research explorer' boxes.
The book is split into seven chapters. Chapter 1 deals with combinations of random variables and requires familiarity with Binomial, Poisson and Normal distributions; we recommend that it is covered first. Chapters 2 and 3 extend your knowledge of random variables and probability distributions, and use differentiation and integration; they can be studied in either order. Chapters 4 and 5 develop the main theme of this Option: using samples to make inferences about a population. They require understanding of the material from chapter 1. Chapter 6, on bivariate distributions, is largely independent of the others, although it requires understanding of the concept of a hypothesis test. Chapter 7 contains a summary of all the topics and further examination practice, with many of the questions mixing several topics, a favourite trick in IB examinations.
Each chapter starts with a list of learning objectives to give you an idea about what the chapter contains. There is also an introductory problem, at the start of the topic, that illustrates what you will be able to do after you have completed the topic. You should not expect to be able to solve the problem, but you may want to think about possible strategies and what sort of new facts and methods would help you. The solution to the introductory problem is provided at the end of the topic, at the start of chapter 7.
Key point boxes
The most important ideas and formulae are emphasised in the KEY POINT boxes. When the formulae are given in the Formula booklet, there will be an icon; if this icon is not present, then the formulae are not in the Formula booklet and you may need to learn them or at least know how to derive them.
Worked examples
Each worked example is split into two columns. On the right is what you should write down. Sometimes the example might include more detail than you strictly need, but it is designed to give you an idea of what is required to score full method marks in examinations. However, mathematics is about much more than examinations and remembering methods. So, on the left of the worked examples are notes that describe the thought processes and suggest which route you should use to tackle the question. We hope that these will help you with any exercise questions that differ from the worked examples. It is very deliberate that some of the questions require you to do more than repeat the methods in the worked examples. Mathematics is about thinking!
Signposts
There are several boxes that appear throughout the book.
Theory of knowledge issues
Every lesson is a Theory of knowledge lesson, but sometimes the links may not be obvious. Mathematics is frequently used as an example of certainty and truth, but this is often not the case. In these boxes we will try to highlight some of the weaknesses and ambiguities in mathematics as well as showing how mathematics links to other areas of knowledge.
From another perspective
The International Baccalaureate® encourages looking at things in different ways. As well as highlighting some international differences between mathematicians, these boxes also look at other perspectives on the mathematics we are covering: historical, pragmatic and cultural.
Research explorer
As part of your course, you will be asked to write a report on a mathematical topic of your choice. It is sometimes difficult to know which topics are suitable as a basis for such reports, and so we have tried to show where a topic can act as a jumping-off point for further work. This can also give you ideas for an Extended essay. There is a lot of great mathematics out there!
Exam hint
Although we would encourage you to think of mathematics as more than just learning in order to pass an examination, there are some common errors it is useful for you to be aware of. If there is a common pitfall we will try to highlight it in these boxes. We also point out where graphical calculators can be used effectively to simplify a question or speed up your work, often referring to the relevant calculator skills sheet in the back of the book.
Fast forward / rewind
Mathematics is all about making links. You might be interested to see how something you have just learned will be used elsewhere in the course, or you may need to go back and remind yourself of a previous topic. These boxes indicate connections with other sections of the book to help you find your way around.
How to use the questions
Calculator icon
You will be allowed to use a graphical calculator in the final examination paper for this Option. Some questions can be done in a particularly clever way by using one of the graphical calculator functions, or cannot be realistically done without. These questions are marked with a calculator symbol.
The colour-coding
The questions are colour-coded to distinguish between the levels. Black questions are drill questions. They help you practise the methods described in the book, but they are usually not structured like the questions in the examination. This does not mean they are easy; some of them are quite tough. Each differently numbered drill question tests a different skill. Lettered subparts of a question are of increasing difficulty. Within each lettered part there may be multiple roman-numeral parts ((i), (ii), ...), all of which are of a similar difficulty. Unless you want to do lots of practice we would recommend that you only do one roman-numeral part and then check your answer. If you have made a mistake then you may want to think about what went wrong before you try any more. Otherwise move on to the next lettered part.
Green questions are examination-style questions which should be accessible to students on the path to getting a grade 3 or 4. Blue questions are harder examination-style questions. If you are aiming for a grade 5 or 6 you should be able to make significant progress through most of these. Red questions are at the very top end of difficulty in the examinations. If you can do these then you are likely to be on course for a grade 7. Gold questions are a type that are not set in the examination, but are designed to provoke thinking and discussion in order to help you to a better understanding of a particular concept. At the end of each chapter you will see longer questions typical of the second section of International Baccalaureate® examinations. These follow the same colour-coding scheme. Of course, these are just guidelines. If you are aiming for a grade 6, do not be surprised if you find a green question you cannot do. People are never equally good at all areas of the syllabus. Equally, if you can do all the red questions that does not guarantee you will get a grade 7; after all, in the examination you have to deal with time pressure and examination stress! These questions are graded relative to our experience of the final examination, so when you first start the course you will find all the questions relatively hard, but by the end of the course they should seem more straightforward. Do not get intimidated!
We hope you find the Statistics and Probability Option an interesting and enriching course. You might also find it quite challenging, but do not get intimidated; frequently, topics only make sense after lots of revision and practice. Persevere and you will succeed.
The author team.
Acknowledgements
The authors and publishers are grateful for the permissions granted to reproduce materials in either the original or adapted form. While every effort has been made, it has not always been possible to identify the sources of all the materials used, or to trace all copyright holders. If any omissions are brought to our notice, we will be happy to include the appropriate acknowledgements on reprinting. IB exam questions © International Baccalaureate Organization. We gratefully acknowledge permission to reproduce International Baccalaureate Organization intellectual property. Cover image: Thinkstock Diagrams are created by Ben Woolley. TI83 fonts are reproduced on the calculator skills sheets with permission of Texas Instruments Incorporated. Casio fonts are reproduced on the calculator skills sheets with permission of Casio Electronics Company Ltd (to access downloadable Casio resources go to www.casio.co.uk/education and http://edu.casio.com/dll).
Introduction
In this Option you will learn:
• how to combine information from more than one random variable
• how to predict the distribution of the mean of a sample
• about more distributions used to model common situations
• about the probability generating function: an algebraic tool for combining probability distributions
• about the cumulative distribution: the probability of a variable being less than a particular value
• how to estimate information about the population from a sample
• about hypothesis testing: how to decide if new information is significant
• how to make predictions based upon data.
Introductory problem
A school claims that their average International Baccalaureate (IB) score is 34 points. In a sample of four students the scores are 31, 31, 30 and 35 points. Does this suggest that the school was exaggerating?
As part of the core syllabus, you should have used statistics to find information about a population using a sample, and you should have used probability to predict the average and standard deviation of a given distribution. In this topic we extend both of these ideas to answer a very important question: does any new information gathered show a significant change, or could it just have happened by chance? The statistics option is examined in a separate, one-hour paper. There will be approximately five extended-response questions based mainly upon the material covered in the statistics option, although any aspect of the core may also be included.
1 Combining random variables
In this chapter you will learn:
• how multiplying all of your data by a constant or adding a constant changes the mean and the variance
• how adding or multiplying together two independent random variables changes the mean and the variance
• how we can apply these ideas to making predictions about the average or the sum of a sample
• about the distribution of linear combinations of normal variables
• about the distribution of the sum or average of lots of observations from any distribution.
If you know the average height of a brick, then it is fairly easy to guess the average height of two bricks, or the average height of half of a brick. What is less obvious is the variation of these heights.
Even if we can predict the mean and the variance of this random variable this is not enough to find the probability of it taking a particular value. To do this, we also need to know the distribution of the random variable. There are some special cases where it is possible to find the distribution of the random variable, but in most cases we meet the enormous significance of the normal distribution; if the sample is large enough, the sample average will (nearly) always follow a normal distribution.
1A Adding and multiplying all the data by a constant
The average height of the students in a class is 1.75 m and their standard deviation is 0.1 m. If they all then stood on their 0.5 m tall chairs then the new average height would be 2.25 m, but the range, and any other measure of variability, would not change, and so the standard deviation would still be 0.1 m. If we add a constant to all the variables in a distribution, we add the same constant on to the expectation, but the variance does not change:
E(X + c) = E(X) + c
Var(X + c) = Var(X)
If, instead, each student were given a magical growing potion that doubled their heights, the new average height would be 3.5 m, and in this case the range (and any similar measure of variability) would also double, so the new standard deviation would be 0.2 m. This means that their variance would change from 0.01 m² to 0.04 m².
If we multiply a random variable by a constant, we multiply the expectation by the constant and multiply the variance by the constant squared:
E(aX) = aE(X)
Var(aX) = a²Var(X)
These ideas can be combined together:
Key Point 1.1
E(aX + c) = aE(X) + c
Var(aX + c) = a²Var(X)
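The rules in Key point 1.1 can be checked numerically. The following Python sketch is not part of the original text: the distribution of X and the constants a and c are made up purely for illustration. It builds the distribution of aX + c directly and compares its mean and variance with the values the key point predicts.

```python
from fractions import Fraction

def mean(dist):
    """Expectation of a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Variance computed as E(X^2) - [E(X)]^2."""
    m = mean(dist)
    return sum(x * x * p for x, p in dist.items()) - m * m

def transform(dist, a, c):
    """Distribution of aX + c (a must be non-zero so the values stay distinct)."""
    return {a * x + c: p for x, p in dist.items()}

# Hypothetical distribution and constants, chosen only for illustration.
X = {1: Fraction(1, 10), 2: Fraction(1, 2), 3: Fraction(1, 5), 4: Fraction(1, 5)}
a, c = -3, 7
Y = transform(X, a, c)

print(mean(Y), a * mean(X) + c)           # direct value against aE(X) + c
print(variance(Y), a ** 2 * variance(X))  # direct value against a^2 Var(X)
```

Exact fractions are used so that the two sides can be compared for equality rather than to within rounding error. Note that a is negative here and the variance is still positive.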
Exam hint
It is important to know that this only works for the structure aX + c, which is called a linear function. So, for example, E(X²) cannot be simplified to [E(X)]², and E(√X) is not equivalent to √E(X).
Worked example 1.1
A piece of pipe with average length 80cm and standard deviation 2 cm is cut from a 100 cm length of water pipe. The leftover piece is used as a short pipe. Find the mean and standard deviation of the length of the short pipe.
Define your variables
L = crv 'length of long pipe'
S = crv 'length of short pipe'
Write an equation to connect the variables
S = 100 − L
Apply expectation algebra
E(S) = E(100 − L) = 100 − E(L) = 100 − 80 = 20
So the mean of S is 20 cm.
Var(S) = Var(100 − L) = (−1)² Var(L) = Var(L) = 4 cm²
So the standard deviation of S is also 2 cm.
Exam hint
Even if the coefficients are negative, you will always get a positive variance (since square numbers are always positive). If you find you have a negative variance, something has gone wrong!
The result regarding E(aX + c) stated in Key point 1.1 is a special case of a more general result about the expectation of a function of a random variable:
Key Point 1.2
For a discrete random variable:
E(g(X)) = ∑_i g(xi) pi
For a continuous random variable with probability density function f(x):
E(g(X)) = ∫ g(x) f(x) dx
Worked example 1.2
The continuous random variable X has probability density e^x for 0 < x < ln 2. The random variable Y is related to X by the function Y = e^(−2X). Find E(Y).
Use the formula for the expectation of a function of a variable
E(e^(−2X)) = ∫ e^(−2x) × e^x dx, integrated from 0 to ln 2
Use the laws of exponents
= ∫ e^(−x) dx, integrated from 0 to ln 2
= [−e^(−x)] evaluated from 0 to ln 2
= −e^(−ln 2) − (−e^0)
= −1/2 + 1
= 1/2
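The integral in Worked example 1.2 can be checked numerically. This sketch is an addition, not part of the original text. It assumes the density is e^x on 0 < x < ln 2 and that Y = e^(−2X), which is the reading consistent with the final answer of 1/2, and it approximates the integral in Key point 1.2 with a simple midpoint rule. The function name `expectation` and the step count are illustrative choices.

```python
import math

def expectation(g, f, lower, upper, steps=100_000):
    """Approximate E(g(X)) = integral of g(x) f(x) dx using the midpoint rule."""
    h = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * h
        total += g(x) * f(x)
    return total * h

pdf = math.exp          # assumed density f(x) = e^x on 0 < x < ln 2
upper = math.log(2)

total_probability = expectation(lambda x: 1.0, pdf, 0.0, upper)
e_y = expectation(lambda x: math.exp(-2 * x), pdf, 0.0, upper)

print(round(total_probability, 6))  # the density should integrate to 1
print(round(e_y, 6))                # should agree with the exact value 1/2
```

Checking first that the density integrates to 1 is a useful habit: it confirms that the limits and the density have been entered correctly before the expectation itself is trusted.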
Exercise 1A
1. If E(X) = 4 find:
(a) (i) E(3X) (ii) E(6X)
(b) (i) E(X/2) (ii) E(3X/4)
(c) (i) E(−X) (ii) E(−4X)
(d) (i) E(X + 5) (ii) E(X − 3)
(e) (i) E(5 − 2X) (ii) E(3X + 1)
2. If Var(X) = 6 find:
(a) (i) Var(3X) (ii) Var(6X)
(b) (i) Var(X/2) (ii) Var(3X/4)
(c) (i) Var(−X) (ii) Var(−4X)
(d) (i) Var(X + 5) (ii) Var(X − 3)
(e) (i) Var(5 − 2X) (ii) Var(3X + 1)
3. The probability density function of the continuous random variable Z is kz for 1 < z ≤ 3.
(a) Find the value of k.
(b) Find E(Z).
(c) Find E(6Z + 5).
(d) Find the exact value of E(1/(1 + Z²)). [10 marks]
1B Adding independent random variables
A tennis racquet is made by adding together two components: the handle and the head. If both components have their own distribution of length and they are combined together randomly then we have formed a new random variable: the length of the racquet. It is not surprising that the average length of the whole racquet is the sum of the average lengths of the parts, but with a little thought we can reason that the standard deviation will be less than the sum of the standard deviations of the parts. To get either extremely long or extremely short tennis racquets we must have extremes in the same direction for both the handle and the head. This is not very likely. It is more likely that either both are close to average, or an extreme value is paired with an average value, or an extreme value in one direction is balanced by an extreme value in the other.
Key Point 1.3
Linear Combinations
E(a1X1 ± a2X2) = a1E(X1) ± a2E(X2)
Var(a1X1 ± a2X2) = a1²Var(X1) + a2²Var(X2)
The result for variance is only true if X1 and X2 are independent.
There is a similar result for the product of two independent random variables:
Key Point 1.4
If X and Y are independent random variables then:
E(XY ) = E(X)E(Y )
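A small exact calculation shows why independence matters in Key point 1.4. The sketch below is an illustrative addition: it assumes X and Y are scores on fair six-sided dice, which is not an example from the text. In the first case the two rolls are separate, so each pair of outcomes has probability P(x)P(y). In the second case Y is the same roll as X, so the variables are completely dependent and XY = X².

```python
from fractions import Fraction
from itertools import product

# Fair six-sided die: an assumption made only for this illustration.
die = {k: Fraction(1, 6) for k in range(1, 7)}
e_x = sum(x * p for x, p in die.items())

# Independent case: X and Y are separate rolls, so P(x, y) = P(x)P(y).
e_xy_independent = sum(
    x * y * px * py for (x, px), (y, py) in product(die.items(), repeat=2)
)

# Dependent case: Y is the same roll as X, so XY = X^2.
e_xy_dependent = sum(x * x * p for x, p in die.items())

print(e_xy_independent == e_x * e_x)  # True
print(e_xy_dependent == e_x * e_x)    # False
```

In the dependent case E(XY) is E(X²), which is larger than [E(X)]² whenever X has non-zero variance.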
We could write the whole of statistics only using standard deviation, without referring to variance at all, where σ(aX + bY + c) = √(a²σ²(X) + b²σ²(Y)). However, as you can see, the concept of standard deviation squared occurs very naturally. Is this a sufficient justification for the concept of variance?
It is not immediately obvious that if Var(X + Y) = Var(X) + Var(Y) then the standard deviation of (X + Y) will always be less than the standard deviation of X plus the standard deviation of Y. This is an example of one of many interesting inequalities in statistics. Another is that E(X²) ≥ [E(X)]², which ensures that variance is never negative. If you are interested in proving these types of inequalities you might like to look at the Cauchy-Schwarz inequality.
Exam hint
Notice in particular that, if X and Y are independent:
Var(X − Y) = (1)² × Var(X) + (−1)² × Var(Y) = Var(X) + Var(Y)
The result extends to more than two variables.
Worked example 1.3
The mean thickness of the base of a burger bun is 1.4 cm with variance 0.02 cm2. The mean thickness of a burger is 3.0 cm with variance 0.14 cm2. The mean thickness of the top of the burger bun is 2.2 cm with variance 0.2 cm2. Find the mean and standard deviation of the total height of the whole burger and bun, assuming that the thickness of each part is independent.
Define your variables
X = crv 'Thickness of base'
Y = crv 'Thickness of burger'
Z = crv 'Thickness of top'
T = crv 'Total thickness'
Write an equation to connect the variables
T = X + Y + Z
Apply expectation algebra
E(T) = E(X + Y + Z) = E(X) + E(Y) + E(Z) = 6.6 cm
So the mean of T is 6.6 cm.
Var(T) = Var(X + Y + Z) = Var(X) + Var(Y) + Var(Z) = 0.36 cm²
So the standard deviation of T is 0.6 cm.
X and Y have to be independent (see Key point 1.3) but this does not mean that they have to be drawn from different populations. They could be two different observations of the same population, for example the heights of two different people added together. This is a different variable from the height of one person doubled. We will use a subscript to emphasise when there are repeated observations from the same population:
X1 + X2 means adding together two different observations of X
2X means observing X once and doubling the result.
The expectation of both of these combinations is the same, 2E(X), but the variance is different. From Key point 1.3:
Var (X1 + X2 ) = Var (X1) + Var (X2 ) = 2Var(X)
From Key point 1.1:
Var(2X) = 4Var(X)
So the variability of a single observation doubled is greater than the variability of two independent observations added together. This is consistent with the earlier argument about the possibility of independent observations cancelling out extreme values.
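The difference between 2X and X1 + X2 can be seen by working out both distributions exactly. The sketch below is an addition to the text and assumes, purely for illustration, that X is the score on a fair six-sided die.

```python
from fractions import Fraction
from itertools import product

def mean(dist):
    """Expectation of a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Variance as the expected squared distance from the mean."""
    m = mean(dist)
    return sum((x - m) ** 2 * p for x, p in dist.items())

# X is the score on a fair six-sided die (an illustrative assumption).
die = {k: Fraction(1, 6) for k in range(1, 7)}

# 2X: observe X once and double the result.
doubled = {2 * x: p for x, p in die.items()}

# X1 + X2: add together two independent observations of X.
summed = {}
for (x1, p1), (x2, p2) in product(die.items(), repeat=2):
    summed[x1 + x2] = summed.get(x1 + x2, 0) + p1 * p2

print(mean(doubled), mean(summed))          # the same expectation, 2E(X)
print(variance(doubled), variance(summed))  # 4Var(X) against 2Var(X)
```

The doubled score can only take the even values 2, 4, ..., 12, each with equal probability, whereas the sum of two rolls is concentrated around 7. That concentration is the cancelling of extreme values described above.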
Worked example 1.4
In an office, the mean mass of the men is 84 kg and standard deviation is 11 kg. The mean mass of women in the office is 64 kg and the standard deviation is 6 kg. The women think that if four of them are picked at random their total mass will be less than three times the mass of a randomly selected man. Find the mean and standard deviation of the difference between the sum of four women's masses and three times the mass of a man, assuming that all these people are chosen independently.
Define your variables
X = crv 'Mass of a man'
Y = crv 'Mass of a woman'
D = crv 'Difference between the mass of 4 women and 3 lots of 1 man'
Write an equation to connect the variables
D = Y1 + Y2 + Y3 + Y4 − 3X
Apply expectation algebra
E(D) = E(Y1) + E(Y2) + E(Y3) + E(Y4) − 3E(X) = 4 kg
Var(D) = Var(Y1) + Var(Y2) + Var(Y3) + Var(Y4) + (−3)² × Var(X) = 1233 kg²
So the standard deviation of D is 35.1 kg.
Finding the mean and variance of D is not very useful unless you also know the distribution of D. In Sections 1D and 1E you will see that this can be done in certain circumstances. We can then go on to calculate probabilities of different values of D.
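The expectation algebra in Worked example 1.4 follows a fixed pattern, so it can be written as a short routine. This Python sketch is an addition to the text; the function name `combine` and the (coefficient, mean, variance) layout are illustrative choices, and the routine is only valid when all the variables are independent.

```python
import math

def combine(terms):
    """Mean and variance of a sum of a_i * X_i for independent X_i.

    terms is a list of (coefficient, mean, variance) triples.
    """
    mean = sum(a * m for a, m, _ in terms)
    var = sum(a ** 2 * v for a, _, v in terms)
    return mean, var

woman = (1, 64, 6 ** 2)   # one independent observation of a woman's mass
man = (-3, 84, 11 ** 2)   # a single man's mass, multiplied by -3

# Four separate women (Y1 + Y2 + Y3 + Y4) and one man counted three times (3X).
mean_d, var_d = combine([woman] * 4 + [man])
print(mean_d, var_d, round(math.sqrt(var_d), 1))
```

The four women appear as four separate terms with coefficient 1, while the man appears once with coefficient −3. Swapping these round (one woman with coefficient 4) would give a different variance, which is exactly the distinction between 4Y and Y1 + Y2 + Y3 + Y4.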
Exercise 1B
1. Let X and Y be two independent variables with E(X) = 1, Var(X) = 2, E(Y) = 4 and Var(Y) = 4. Find the expectation and variance of:
(a) (i) X − Y (ii) X + Y
(b) (i) 3X + 2Y (ii) 2X − 4Y
(c) (i) (X − 3Y + 1)/5 (ii) (X + 2Y − 2)/3
Denote by Xi, Yi independent observations of X and Y.
(d) (i) X1 + X2 + X3 (ii) Y1 + Y2
(e) (i) X1 − X2 − 2Y (ii) 3X − (Y1 + Y2 − Y3)
2. If X is the random variable 'mass of a gerbil', explain the difference between 2X and X1 + X2.
3. Let X and Y be two independent variables with E(X) = 4, Var(X) = 2, E(Y) = 1 and Var(Y) = 6. Find:
(a) E(3X)
(b) Var(3X)
(c) E(3X − Y + 1)
(d) Var(3X − Y + 1) [6 marks]
4. The average mass of a man in an office is 85 kg with standard deviation 12 kg. The average mass of a woman in the office is 68 kg with standard deviation 8 kg. The empty lift has a mass of 500 kg. What is the expectation and standard deviation of the total mass of the lift when 3 women and 4 men are inside? [6 marks]
5. A weighted die has mean outcome 4 with standard deviation 1. Brian rolls the die once and doubles the outcome. Camilla rolls the die twice and adds her results together. What is the expected mean and standard deviation of the difference between their scores? [7 marks]
6. Exam scores at a large school have mean 62 and standard deviation 28. Two students are selected at random. Find the expected mean and standard deviation of the difference between their exam scores. [6 marks]
7. Adrian cycles to school with a mean time of 20 minutes and a standard deviation of 5 minutes. Pamela walks to school with a mean time of 30 minutes and a standard deviation of 2 minutes. They each calculate the total time it takes them to get to school over a five-day week. What is the expected mean and standard deviation of the difference in the total weekly journey times, assuming journey times are independent? [7 marks]
8. In this question the discrete random variable X has the following probability distribution:
x          1     2     3     4
P(X = x)   0.1   0.5   0.2   k
(a) Find the value of k.
(b) Find the expectation and variance of X.
(c) The random variable Y is given by Y = 6 X . Find the expectation and the variance of Y.
(d) Find E(XY) and explain why the formula E(XY) = E(X)E(Y) is not applicable to these two variables.
(e) The discrete random variable Z has the following distribution, independent of X:
z          1     2
P(Z = z)   p     1 − p
If E(XZ) = 35/8 find the value of p. [14 marks]
1C Expectation and variance of the sample mean and sample sum
When calculating the mean of a sample of size n of the variable X we have to add up n independent observations of X then divide by n. We give this sample mean the symbol X̄ and it is itself a random variable (as it might change each time it is observed).
This is a linear combination of independent observations of X, so we can apply the rules of the previous section to get the following very important results:
Key Point 1.5
E(X̄) = E(X)
Var(X̄) = Var(X)/n
The first of these results seems very obvious; the average of a sample is, on average, the average of the original variable, but you will see in chapter 4 that this is not the case for all sample statistics.
The second result demonstrates why means are so important; their standard deviation (which can be thought of as a measure of the error caused by randomness) is smaller than the standard deviation of a single observation. This proves mathematically what you probably already knew instinctively, that finding an average of several results produces a more reliable outcome than just looking at one result.
The result actually goes further than that; it contains what economists call 'the law of diminishing returns'. The standard deviation of the mean is proportional to 1/√n, so going from a sample of 1 to a sample of 20 has a much bigger impact than going from a sample of 101 to a sample of 120.
Worked example 1.5
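Prove that if X̄ is the average of n independent observations of X then Var(X̄) = Var(X)/n.
Write X̄ in terms of Xi
X̄ = (X1 + X2 + … + Xn)/n
Apply expectation algebra
Var(X̄) = (1/n²) Var(X1 + X2 + … + Xn)
= (1/n²) (Var(X1) + Var(X2) + … + Var(Xn))
Since X1, X2, … are all observations of X
= (1/n²) (nVar(X))
= Var(X)/n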
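The result of Worked example 1.5 can also be checked by brute force for a small case. The sketch below is an addition to the text. It lists every possible sample of size n from a made-up discrete distribution (the values and probabilities are assumptions for illustration), builds the exact distribution of the sample mean, and compares its variance with Var(X)/n.

```python
from fractions import Fraction
from itertools import product

def mean(dist):
    """Expectation of a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Variance as the expected squared distance from the mean."""
    m = mean(dist)
    return sum((x - m) ** 2 * p for x, p in dist.items())

def sample_mean_distribution(dist, n):
    """Exact distribution of the mean of n independent observations."""
    result = {}
    for sample in product(dist.items(), repeat=n):
        value = Fraction(sum(x for x, _ in sample), n)
        prob = 1
        for _, p in sample:
            prob *= p  # independence: multiply the individual probabilities
        result[value] = result.get(value, 0) + prob
    return result

# Hypothetical distribution, chosen only for illustration.
X = {1: Fraction(1, 2), 2: Fraction(1, 3), 4: Fraction(1, 6)}

for n in (1, 2, 3, 4):
    xbar = sample_mean_distribution(X, n)
    print(n, mean(xbar), variance(xbar), variance(X) / n)
```

The number of samples grows as 3ⁿ for this distribution, so the enumeration is only practical for small n; the algebraic proof covers every n at once.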
We can apply similar ideas to the sample sum.
Key Point 1.6
For the sample sum:
E(∑_{i=1}^{n} Xi) = nE(X)
and
Var(∑_{i=1}^{n} Xi) = nVar(X)
Exercise 1C
1. A sample is obtained from n independent observations of a random variable X. Find the expected value and the variance of the sample mean in the following situations:
(a) (i) E(X) = 5, Var (X) = 1.2, n = 7 (ii) E(X) = 6, Var (X) = 2.5, n = 12
(b) (i) E(X) = 4.7, Var (X) = 0.8, n = 20 (ii) E(X) = 15.1, Var (X) = 0.7, n = 15
(c) (i) X ~ N(12, 3²), n = 10 (ii) X ~ N(8, 0.6²), n = 14
(d) (i) X ~ N(21, 6.25), n = 7 (ii) X ~ N(14, 0.64), n = 15
(e) (i) X ~ B(6, 0.5), n = 10 (ii) X ~ B(12, 0.3), n = 8
(f) (i) X ~ Po(6.5), n = 20 (ii) X ~ Po(8.2), n = 15
2. Find the expected value and the variance of the total of the samples from the previous question.
3. Eggs are packed in boxes of 12. The mass of the box is 50 g. The mass of one egg has mean 12.4 g and standard deviation 1.2 g. Find the mean and the standard deviation of the mass of a box of eggs. [4 marks]
4. A machine produces chocolate bars so that the mean mass of a bar is 102 g and the standard deviation is 8.6 g. As a part of the quality control process, a sample of 20 chocolate bars is taken and the mean mass is calculated. Find the expectation and variance of the sample mean of these 20 chocolate bars. [5 marks]
5. Prove that Var(∑_{i=1}^{n} Xi) = nVar(X). [4 marks]
6. The standard deviation of the mean mass of a sample of 2 aubergines is 20 g smaller than the standard deviation in the mass of a single aubergine. Find the standard deviation of the mass of an aubergine. [5 marks]
7. A random variable X takes values 0 and 1 with probability 1/4 and 3/4, respectively.
(a) Calculate E(X) and Var(X).
A sample of three observations of X is taken.
(b) List all possible samples of size 3 and calculate the mean of each.
(c) Hence complete the probability distribution table for the sample mean, X̄.
x̄            0      1/3    2/3    1
P(X̄ = x̄)    1/64
(d) Show that E(X̄) = E(X) and Var(X̄) = Var(X)/3. [14 marks]
8. A laptop manufacturer believes that the battery life of the computers follows a normal distribution with mean 4.8 hours and variance 1.7 hours². They wish to take a sample to estimate the mean battery life. If the standard deviation of the sample mean is to be less than 0.3 hours, what is the minimum sample size needed? [5 marks]
9. When the sample size is increased by 80, the standard deviation of the sample mean decreases to a third of its original size. Find the original sample size. [4 marks]
1D Linear combinations of normal variables
Although the proof is beyond the scope of this course, it turns out that any linear combination of independent normal variables will also follow a normal distribution. We can use the methods of Section 1C to find out the parameters of this distribution.
Key Point 1.7
If X and Y are independent random variables following a normal distribution and Z = aX + bY + c then Z also follows a normal distribution.
Worked example 1.6
If X ~ N(12, 15), Y ~ N(1, 18) and Z = X + 2Y + 3 find P(Z > 20).
Use expectation algebra
E(Z) = E(X) + 2 × E(Y) + 3 = 17
Var(Z) = Var(X) + 2² × Var(Y) = 87
State distribution of Z
Z ~ N(17, 87)
P(Z > 20) = 0.374 (from GDC)
Worked example 1.7
If X ~ N(15, 12²) and four independent observations of X are made, find $P(\bar{X} < 14)$.
Express $\bar{X}$ in terms of observations of X:
$\bar{X} = \frac{X_1 + X_2 + X_3 + X_4}{4}$
Use expectation algebra:
$E(\bar{X}) = E(X) = 15$
$\mathrm{Var}(\bar{X}) = \frac{\mathrm{Var}(X)}{4} = 36$
State the distribution of $\bar{X}$:
$\bar{X} \sim N(15, 36)$
$P(\bar{X} < 14) = 0.434$ (from GDC)
The Poisson distribution is scalable. If the number of butterflies on a flower in 10 minutes follows a Poisson distribution with mean (expectation) m, then the number of butterflies on a flower in 20 minutes follows a Poisson distribution with mean 2m, and so on. We can interpret this as meaning that the sum of two independent Poisson variables is also Poisson. However, this only applies to sums of Poisson variables, not to differences, multiples or other linear combinations.
Exercise 1D
1. If X ~ N(12, 16) and Y ~ N(8, 25), find:
(a) (i) P(X − Y > 2)
(ii) P(X + Y < 24)
(b) (i) P(3X + 2Y > 50) (ii) P(2X − 3Y > 2)
(c) (i) P(X > 2Y)
(ii) P(2X < 3Y)
(d) (i) P(X > 2Y − 2)
(ii) P(3X + 1 < 5Y)
(e) (i) P(X1 + X2 > 2X3 + 1) (ii) P(X1 + Y1 + Y2 < X2 + 12)
(f) (i) $P(\bar{X} > 13)$ where $\bar{X}$ is the average of 12 observations of X
(ii) $P(\bar{Y} < 6)$ where $\bar{Y}$ is the average of 9 observations of Y
Exam hint: Make sure you do not confuse the standard deviation and the variance!
2. An airline has found that the mass of their passengers follows a normal distribution with mean 82.2 kg and variance 10.7 kg². The mass of their hand luggage follows a normal distribution with mean 9.1 kg and variance 5.6 kg².
(a) State the distribution of the total mass of a passenger and their hand luggage and find any necessary parameters.
(b) What is the probability that the total mass of a passenger
and their luggage exceeds 100 kg?
[5 marks]

3. Evidence suggests that the times Aaron takes to run 100 m are normally distributed with mean 13.1 s and standard deviation 0.4 s. The times Bashir takes to run 100 m are normally distributed with mean 12.8 s and standard deviation 0.6 s.
(a) Find the mean and standard deviation of the difference (Aaron − Bashir) between Aaron's and Bashir's times.
(b) Find the probability that Aaron finishes a 100 m race before Bashir.
(c) What is the probability that Bashir beats Aaron by more
than 1 second?
[7 marks]
4. A machine produces metal rods so that their length follows a normal distribution with mean 65 cm and variance 0.03 cm². The rods are checked in batches of six, and a batch is rejected if the mean length is less than 64.8 cm or more than 65.3 cm.
(a) Find the mean and the variance of the mean of a random sample of six rods.
(b) Hence find the probability that a batch is rejected. [5 marks]
5. The lengths of pipes produced by a machine are normally distributed with mean 40 cm and standard deviation 3 cm.
(a) What is the probability that a randomly chosen pipe has a length of 42 cm or more?
(b) What is the probability that the average length of a randomly
chosen set of 10 pipes of this type is 42 cm or more?

[6 marks]
6. The masses, X kg, of male birds of a certain species are normally distributed with mean 4.6 kg and standard deviation 0.25 kg. The masses, Y kg, of female birds of this species are normally distributed with mean 2.5 kg and standard deviation 0.2 kg.
(a) Find the mean and variance of 2Y − X.
(b) Find the probability that the mass of a randomly chosen male bird is more than twice the mass of a randomly chosen female bird.
(c) Find the probability that the total mass of three male birds
and 4 female birds (chosen independently) exceeds 25 kg.

[11 marks]
7. A shop sells apples and pears. The masses, in grams, of the apples may be assumed to have a N(180, 12²) distribution and the masses of the pears, in grams, may be assumed to have a N(100, 10²) distribution.
(a) Find the probability that the mass of a randomly chosen apple is more than double the mass of a randomly chosen pear.
(b) A shopper buys 2 apples and a pear. Find the probability
that the total mass is greater than 500 g.
[10 marks]
8. The length of a cornsnake is normally distributed with mean 1.2 m. The probability that a randomly selected sample of 5 cornsnakes has an average length above 1.4 m is 5%. Find the standard deviation of the length of a cornsnake.

[6 marks]
9. (a) In a test, boys have scores which follow the distribution N(50, 25). Girls' scores follow N(60, 16). What is the probability that a randomly chosen boy and a randomly chosen girl differ in score by less than 5?
(b) What is the probability that a randomly chosen boy scores less
than three quarters of the mark of a randomly chosen girl?

[10 marks]
10. The daily rainfall in Algebraville follows a normal distribution with mean μ mm and standard deviation σ mm. The rainfall each day is independent of the rainfall on other days.
On a randomly chosen day, there is a probability of 0.1 that the rainfall is greater than 8 mm.
In a randomly chosen 7-day week, there is a probability of 0.05 that the mean daily rainfall is less than 7 mm.
Find the value of μ and of σ.
[7 marks]
11. Anu uses public transport to go to school each morning. The time she waits each morning for the transport is normally distributed with a mean of 12 minutes and a standard deviation of 4 minutes.
(a) On a specific morning, what is the probability that Anu waits more than 20 minutes?
(b) During a particular week (Monday to Friday), what is the probability that
(i) her total morning waiting time does not exceed 70 minutes?
(ii) she waits less than 10 minutes on exactly 2 mornings of the week?
(iii) her average morning waiting time is more than 10 minutes?
(c) Given that the total morning waiting time for the first four days is 50 minutes, find the probability that the average for the week is over 12 minutes.
(d) Given that Anu's average morning waiting time in a week is over 14 minutes, find the probability that it is less than 15 minutes.
[20 marks]
1E The distribution of sums and averages of samples
In this section we shall look at how to find the distribution of the sample mean or the sample total, even if we do not know the original distribution.
The graph alongside shows 1000 observations of the roll of a fair die. It seems to follow a uniform distribution quite well, as we would expect.
[Figure: frequency f against score x, for x from 1 to 6]
However, if we look at the sum of 2 dice 1000 times the distribution looks quite different.
[Figure: frequency f against total x, for x from 1 to 12]
[Figure: frequency f against total x, for x from 20 to 120]
The sum of 20 dice is starting to form a more familiar shape. The sum seems to form a normal distribution. This is more than a coincidence. If we sum enough independent observations of any random variable with a finite mean and variance, the result will approximately follow a normal distribution. This result is called the Central Limit Theorem or CLT. We generally take 30 to be a sufficiently large sample size to apply the CLT.
There are many other distributions which have a similar shape, such as the Cauchy distribution. To show that these sums form a normal distribution we need to use moment generating functions, which are well beyond this course.
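The dice experiment described here is easy to repeat by simulation. The sketch below is a minimal Python illustration, not the procedure used to draw the book's graphs: the function name `dice_sums`, the seed and the number of trials are all assumptions. It compares the simulated mean and variance of the total of 20 dice with nµ and nσ², using µ = 3.5 and σ² = 35/12 for a single fair die.

```python
import random
from statistics import mean, pvariance


def dice_sums(n_dice, n_trials, seed=0):
    """Simulate n_trials totals, each the sum of n_dice fair dice.

    Hypothetical helper; the fixed seed only makes the run repeatable.
    """
    rng = random.Random(seed)
    return [sum(rng.randint(1, 6) for _ in range(n_dice))
            for _ in range(n_trials)]


die_mean = 3.5      # E(X) for one fair die
die_var = 35 / 12   # Var(X) for one fair die

sums = dice_sums(n_dice=20, n_trials=10_000)
print("simulated mean:", mean(sums), "theory:", 20 * die_mean)
print("simulated variance:", pvariance(sums), "theory:", 20 * die_var)
```

Plotting a histogram of `sums` (with any plotting tool) should show the bell shape; the simulated mean and variance will differ slightly from the theoretical values because the sample is finite.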
As we saw in Section 1D, if a variable is normally distributed then a multiple of that variable will also be normally distributed. Since $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ it follows that the mean of a sufficiently large sample is also normally distributed. Using Key point 1.5 where $E(\bar{X}) = E(X)$ and $\mathrm{Var}(\bar{X}) = \frac{\mathrm{Var}(X)}{n}$, we can predict which normal distribution is being followed:
Key Point 1.8
Central Limit Theorem
For any distribution if E(X) = µ, Var(X) = σ² and n ≥ 30, then the approximate distributions of the sum and the mean are given by:
$\sum_{i=1}^{n} X_i \sim N(n\mu, n\sigma^2)$
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
Worked example 1.8
Esme eats an average of 1900 kcal each day with a standard deviation of 400 kcal. What is the probability that in a 31-day month she eats more than 2000 kcal per day on average?
Check conditions for CLT are met:
Since we are finding an average over 31 days we can use the CLT.
State distribution of the mean:
$\bar{X} \sim N\left(1900, \frac{400^2}{31}\right)$
Calculate the probability:
$P(\bar{X} > 2000) = 0.0820$ (3SF from GDC)
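The three steps of this example (check the sample size, state the distribution of the mean, calculate the probability) can be sketched in Python. This is an illustration under stated assumptions: the function name `prob_sample_mean_exceeds` and the `min_n` parameter are hypothetical, and the threshold of 30 is simply the rule of thumb quoted in this section.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_exceeds(mu, sigma, n, threshold, min_n=30):
    """Approximate P(sample mean > threshold) using the CLT.

    Hypothetical helper: refuses to answer when n is below the
    rule-of-thumb sample size, since the original distribution is unknown.
    """
    if n < min_n:
        raise ValueError("sample too small to rely on the CLT")
    # The sample mean is approximately N(mu, sigma^2 / n).
    sample_mean = NormalDist(mu=mu, sigma=sigma / sqrt(n))
    return 1 - sample_mean.cdf(threshold)


p = prob_sample_mean_exceeds(mu=1900, sigma=400, n=31, threshold=2000)
print(f"{p:.4f}")
```

The result should agree with the GDC value of about 0.082 quoted above.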
Exercise 1E
1. The random variable X has mean 80 and standard deviation 20. State where possible the approximate distribution of:
(a) (i) $\bar{X}$ if the sample has size 12.
(ii) $\bar{X}$ if the sample has size 3.
(b) (i) $\bar{X}$ if the average is taken from 100 observations.
(ii) $\bar{X}$ if the average is taken from 400 observations.
(c) (i) $\sum_{i=1}^{50} X_i$
(ii) $\sum_{i=1}^{150} X_i$
2. The random variable Y has mean 200 and standard deviation 25. A sample of size n is found. Find, where possible, the probability that:
(a) (i) $P(\bar{Y} < 198)$ if n = 100 (ii) $P(\bar{Y} < 198)$ if n = 200
(b) (i) $P(\bar{Y} < 190)$ if n = 2 (ii) $P(\bar{Y} < 190)$ if n = 3
(c) (i) $P(|\bar{Y} - 195| > 10)$ if n = 100
(ii) $P(|\bar{Y} - 201| > 3)$ if n = 400
(d) (i) $P\left(\sum_{i=1}^{50} Y_i > 10\,500\right)$
(ii) $P\left(\sum_{i=1}^{150} Y_i \le 29\,500\right)$
3. Random variable X has mean 12 and standard deviation 3.5.
A sample of 40 independent observations of X is taken. Use
the Central Limit Theorem to calculate the probability that the
mean of the sample is between 13 and 14.
[5 marks]
4. The weight of a pomegranate, in grams, has mean 145 and
variance 96. A crate is filled with 70 pomegranates. What is the
probability that the total weight of the pomegranates in the crate
is less than 10 kg?
[5 marks]
5. Given that X ~ Po(6), find the probability that the mean of 35 independent observations of X is greater than 7. [6 marks]

6. The average mass of a sheet of A4 paper is 5 g and the standard deviation of the masses is 0.08 g.
(a) Find the mean and standard deviation of the mass of a ream of 500 sheets of A4 paper.
(b) Find the probability that the mass of a ream of 500 sheets is within 5 g of the expected mass.
(c) Explain how you have used the Central Limit Theorem in
your answer.
[7 marks]
7. The times Markus takes to answer a multiple choice question are normally distributed with mean 1.5 minutes and standard deviation 0.6 minutes. He has one hour to complete a test consisting of 35 questions.
(a) Assuming the questions are independent, find the probability that Markus does not complete the test in time.
(b) Explain why you did not need to use the Central Limit
Theorem in your answer to part (a).
[6 marks]
8. A random variable has mean 15 and standard deviation 4. A large number of independent observations of the random variable is taken. Find the minimum sample size so that the probability that the sample mean is more than 16 is less than 0.05. [8 marks]
Summary
• When adding and multiplying all the data by a constant:
the expectation of variables generally behaves as you would expect:
$E(aX + c) = aE(X) + c$
$E(a_1X_1 \pm a_2X_2) = a_1E(X_1) \pm a_2E(X_2)$
the variance is more subtle:
$\mathrm{Var}(aX + c) = a^2\mathrm{Var}(X)$
$\mathrm{Var}(a_1X_1 \pm a_2X_2) = a_1^2\mathrm{Var}(X_1) + a_2^2\mathrm{Var}(X_2)$
when $X_1$ and $X_2$ are independent.
• A more general result about the expectation of a function of a discrete random variable is: $E(g(X)) = \sum_i g(x_i)p_i$. For a continuous random variable with probability density function f(x): $E(g(X)) = \int g(x)f(x)\,dx$.
• For the sum of independent random variables:
$E(a_1X_1 \pm a_2X_2) = a_1E(X_1) \pm a_2E(X_2)$
$\mathrm{Var}(a_1X_1 \pm a_2X_2) = a_1^2\mathrm{Var}(X_1) + a_2^2\mathrm{Var}(X_2)$; note that Var(X − Y) = Var(X) + Var(Y).
• For the product of two independent variables: E(XY) = E(X)E(Y).
• For a sample of n observations of a random variable X, the sample mean $\bar{X}$ is a random variable with mean $E(\bar{X}) = E(X)$ and variance $\mathrm{Var}(\bar{X}) = \frac{\mathrm{Var}(X)}{n}$.
• For the sample sum, $E\left(\sum_{i=1}^{n} X_i\right) = nE(X)$ and $\mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) = n\,\mathrm{Var}(X)$.
When we combine different variables we do not normally know the resulting distribution. However there are two important exceptions:
1. A linear combination of independent normal variables also follows a normal distribution. If X and Y are independent random variables following a normal distribution and Z = aX + bY + c then Z also follows a normal distribution.
2. The sum or mean of a large sample of observations of a variable approximately follows a normal distribution, irrespective of the original distribution; this is called the Central Limit Theorem. For any distribution if E(X) = µ, Var(X) = σ² and n ≥ 30 then the approximate distributions are given by:
$\sum_{i=1}^{n} X_i \sim N(n\mu, n\sigma^2)$
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
Mixed examination practice 1
This chapter does not usually have its own examination questions, so the examples below are parts of longer examination questions.
1. X is a random variable with mean μ and variance σ². Y is a random variable with mean m and variance s². Find in terms of μ, σ, m and s:
(a) E(X − 2Y)
(b) Var(X − 2Y)
(c) Var(4X)
(d) Var(X1 + X2 + X3 + X4) where Xi is the ith observation of X.

[4 marks]
2. The heights of trees in a forest have mean 16 m and variance 60 m2. A sample of 35 trees is measured.
(a) Find the mean and variance of the average height of the trees in the sample.
(b) Use the Central Limit Theorem to find the probability that the average
height of the trees in the sample is less than 12 m.
[5 marks]
3. The number of cars arriving at a car park in a five minute interval follows a Poisson distribution with mean 7, and the number of motorbikes follows a Poisson distribution with mean 2. Find the probability that exactly 10 vehicles arrive at the car park in a particular five minute interval.
[4 marks]
4. The number of announcements posted by a head teacher in a day follows a
normal distribution with mean 4 and standard deviation 2. Find the mean and
standard deviation of the total number of announcements she posts in a
five-day week.
[3 marks]
5. The masses of men in a factory are known to be normally distributed with
mean 80 kg and standard deviation 6 kg. There is an elevator with a maximum
recommended load of 600 kg. With 7 men in the elevator, calculate the probability
that their combined weight exceeds the maximum recommended load.

[5 marks]
6. Davina makes bracelets using purple and yellow beads. Each bracelet consists of
seven randomly selected purple beads and four randomly selected yellow beads.
The diameters of the beads are normally distributed with standard deviation
0.4 cm. The average diameter of a purple bead is 1.5 cm and the average
diameter of a yellow bead is 2.1 cm. Find the probability that the length of the
bracelet is less than 18 cm.
[7 marks]

7. The masses of the parents at a primary school are normally distributed with mean 78 kg and variance 30 kg², and the masses of the children are normally distributed with mean 33 kg and variance 62 kg². Let the random variable P represent the combined mass of two randomly chosen parents and the random variable C the combined mass of four randomly chosen children.
(a) Find the mean and variance of C − P.
(b) Find the probability that four children have a mass of more than two
parents.
[6 marks]
8. X is a random variable with mean μ and variance σ². Prove that the expectation of the mean of three observations of X is μ but the standard deviation of this mean is $\frac{\sigma}{\sqrt{3}}$.
[7 marks]
9. An animal scientist is investigating the lengths of a particular type of fish. It is
known that the lengths have standard deviation 4.6 cm. She wishes to take a
sample to estimate the mean length. She requires that the standard deviation
of the sample mean is smaller than 1, and that the standard deviation of the
total length of the sample is less than 22. What is the smallest sample size she
could take?
[6 marks]
10. The marks in a Mathematics test are known to follow a normal distribution with mean 63 and variance 64. The marks in an English test follow a normal distribution with mean 61 and variance 71.
(a) Find the probability that a randomly chosen mark in English is higher than a randomly chosen Mathematics mark.
(b) Find the probability that the mean of 12 English marks is higher than the
mean of 12 Mathematics marks.
[9 marks]
11. The masses of loaves of bread have mean 802 g and standard deviation σ. The
probability that a box containing 40 loaves of bread has mass under 32 kg is
0.146. Find the value of σ.
[7 marks]
In this chapter you will learn about:
• the probability distribution describing the number of trials until a success occurs: the geometric distribution
• the probability distribution describing the number of trials until a fixed number of successes occur: the negative binomial distribution
• an algebraic function which can help us to combine probability distributions: the probability generating function.
Exam hint: You can find geometric probabilities and cumulative probabilities on your calculator.
2 More about statistical distributions
When we meet a random situation that we wish to model we could return to the ideas of random variables covered in the core syllabus and write out a list of all possible outcomes and their probabilities. However, as with the Binomial and Poisson distributions, it is often easier to simply recognise a situation and apply a known distribution to it. In this chapter we shall meet two new distributions which can be used to model more situations, and then meet a technique which can be used to combine distributions together.
2A Geometric distribution
If there is a series of independent trials with only two possible outcomes and an unchanging probability of success, then the geometric distribution models the number of trials x up to and including the first success. It only depends upon p, the probability of a success. If X follows a geometric distribution we use the notation X ~ Geo(p).
To calculate the probability of X taking any particular value, x, we use the fact that there must be x − 1 consecutive failures (each with probability q = 1 − p) followed by a single success. This gives the following probability mass function.
Key Point 2.1
If X ~ Geo(p) then $P(X = x) = pq^{x-1}$ for x = 1, 2, 3, …
It is useful to apply similar ideas to get a result for P(X > x). For this situation to occur we must have started with x consecutive failures, therefore $P(X > x) = q^x$.
You are not required to know the derivation of the expectation and variance of the geometric distribution; you only need to use the results, which are:
Key Point 2.2
If X ~ Geo(p) then:
$E(X) = \frac{1}{p}$ and $\mathrm{Var}(X) = \frac{q}{p^2}$
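The geometric formulas can be checked numerically. The sketch below is a minimal Python illustration; the function names are hypothetical and the value p = 1/6 (rolling a particular score on a fair die) is chosen only as an example. The mean and variance are approximated by summing over a long but finite range of x, which is an assumption that the neglected tail is negligible.

```python
def geometric_pmf(x, p):
    """P(X = x) for X ~ Geo(p): x - 1 failures followed by one success."""
    return p * (1 - p) ** (x - 1)


def geometric_tail(x, p):
    """P(X > x) for X ~ Geo(p): the first x trials are all failures."""
    return (1 - p) ** x


p = 1 / 6
q = 1 - p

# Truncated sums standing in for the infinite series.
xs = range(1, 2000)
approx_mean = sum(x * geometric_pmf(x, p) for x in xs)
approx_var = sum(x * x * geometric_pmf(x, p) for x in xs) - approx_mean**2

print(geometric_pmf(5, p), geometric_tail(5, p))
print(approx_mean, 1 / p)      # compare with E(X) = 1/p
print(approx_var, q / p**2)    # compare with Var(X) = q/p^2
```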
Worked example 2.1
(a) A normal six-sided die is rolled. What is the probability that the first 3 occurs (i) on the fifth throw? (ii) after the fifth throw?
(b) What is the expected number of throws it will take until a 3 occurs?
Define variables:
(a) (i) X = Number of throws until the first 3
Identify the distribution:
$X \sim \mathrm{Geo}\left(\frac{1}{6}\right)$
Apply the formula for P(X = x):
$P(X = 5) = \frac{1}{6} \times \left(\frac{5}{6}\right)^4 = 0.0804$ (3SF)
Apply the formula for P(X > x):
(ii) $P(X > 5) = \left(\frac{5}{6}\right)^5 = 0.402$ (3SF)
Apply the formula for E(X):
(b) $E(X) = \frac{1}{1/6} = 6$
Exercise 2A
1. Find the following probabilities:
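(a) (i) $P(X = 5)$ if $X \sim \mathrm{Geo}\left(\frac{1}{3}\right)$
(ii) $P(X = 7)$ if $X \sim \mathrm{Geo}\left(\frac{1}{10}\right)$
(b) (i) $P(X \le 5)$ if $X \sim \mathrm{Geo}\left(\frac{1}{4}\right)$
(ii) $P(X < 4)$ if $X \sim \mathrm{Geo}\left(\frac{2}{3}\right)$
(c) (i) $P(X > 10)$ if $X \sim \mathrm{Geo}\left(\frac{1}{6}\right)$
(ii) P(X ≥ 20) if X ~ Geo(0.06)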
(d) (i) The first boy born in a hospital on a given day is the 4th baby born (assuming no multiple births).
(ii) A prize contained in 1 in 5 crisp packets is first won with the 8th crisp packet.
2. Find the expected mean and standard deviation of:
(a) (i) $\mathrm{Geo}\left(\frac{1}{3}\right)$
(ii) Geo(0.15)
(b) (i) The number of attempts to hit a target with an arrow
(there is a 1 in 12 chance of hitting the target on any
given attempt).
(ii) The number of times a die must be rolled up to and including the first time a multiple of 3 is rolled.
3. The probability of passing a driving test on any given attempt is 0.4 and the attempts are independent of each other.
(a) Find the probability that you pass the driving test on your third attempt.
(b) Find the expected average number of attempts needed to
pass the driving test.
[5 marks]
In some countries the name geometric distribution refers to a distribution that models the number of failures before the first success. This is not the convention used in the IB, but it does demonstrate that mathematics is not an absolutely universal language.
4. There are 12 green and 8 yellow balls in a bag. One ball is drawn from the bag and replaced. This is repeated until a yellow ball is drawn.
(a) Find the expected mean and variance of the number of balls drawn.
(b) Find the probability that the number of balls drawn is at most one standard deviation from the mean. [7 marks]
5. If X ~ Geo(p), prove that $\sum_{i=1}^{\infty} P(X = i) = 1$.
[4 marks]
6. If X ~ Geo(p), find the mode of X.
[3 marks]
7. If T ~ Geo(p) and P(T = 4) = 0.0189, find the value of p.

[3 marks]
8. Y ~ Geo(p) and the variance of Y is 3 times the mean of Y.
Find the value of p.
[3 marks]
9. (a) If $X \sim \mathrm{Geo}\left(\frac{3}{4}\right)$, find the smallest value of x such that $P(X = x) < 10^{-6}$.
(b) Find the smallest value of x such that $P(X > x) < 10^{-6}$.

[5 marks]
10. Prove that the standard deviation of a variable following a geometric distribution is always less than its mean. [5 marks]
2B Negative binomial distribution
The negative binomial distribution is an extension of the geometric distribution. It models the number of trials up to and including the rth success. If X follows a negative binomial distribution, we write this as X ~ NB(r, p), where p is the probability of success for each trial.
For X to take a particular value, x, there must be r − 1 successes in the first x − 1 trials followed by a success on the xth trial. But the probability of r − 1 successes in x − 1 trials can be found using the binomial distribution, and then we multiply this result by p. This gives the following probability mass function.
Key Point 2.3
If X ~ NB(r, p), then
$P(X = x) = \binom{x-1}{r-1} p^r q^{x-r}$ for x = r, r + 1, …
Expectation algebra can be used to link the mean and variance of the negative binomial distribution to the mean and variance of the geometric distribution. See Exercise 2B, question 8.
The results are:
Key Point 2.4
If X ~ NB(r, p) then:
$E(X) = \frac{r}{p}$ and $\mathrm{Var}(X) = \frac{rq}{p^2}$
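The negative binomial formulas can be checked in the same numerical way. The following is a minimal Python sketch; the function name `negative_binomial_pmf` is hypothetical, and the values r = 3 and p = 2/11 are illustrative. The mean and variance are approximated by truncated sums, on the assumption that the neglected tail is negligible.

```python
from math import comb


def negative_binomial_pmf(x, r, p):
    """P(X = x) for X ~ NB(r, p), where X counts trials up to the r-th success."""
    if x < r:
        return 0.0  # at least r trials are needed for r successes
    # r - 1 successes somewhere in the first x - 1 trials, then a success.
    return comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r)


r, p = 3, 2 / 11
q = 1 - p

xs = range(r, 3000)
total = sum(negative_binomial_pmf(x, r, p) for x in xs)
approx_mean = sum(x * negative_binomial_pmf(x, r, p) for x in xs)
approx_var = sum(x * x * negative_binomial_pmf(x, r, p) for x in xs) - approx_mean**2

print(total)                      # should be very close to 1
print(approx_mean, r / p)         # compare with E(X) = r/p
print(approx_var, r * q / p**2)   # compare with Var(X) = rq/p^2
```

Setting r = 1 reduces the formula to the geometric probability mass function, which is a useful consistency check.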
Why is the negative binomial distribution given this name? The answer has to do with the extension of the binomial expansion into negative powers.
Worked example 2.2
A voucher is placed in $\frac{2}{11}$ of all cereal boxes of a particular brand. Three of these vouchers can be exchanged for a toy.
(a) Find the probability that exactly 8 boxes of this cereal need to be opened to get enough vouchers for a toy.
(b) Find the expected number of boxes which need to be opened to get enough vouchers for a toy.
Define variables:
(a) X = Number of boxes opened until 3 vouchers found
Identify the distribution:
$X \sim \mathrm{NB}\left(3, \frac{2}{11}\right)$
Apply the formula for P(X = x):
$P(X = 8) = \binom{8-1}{3-1}\left(\frac{2}{11}\right)^3\left(\frac{9}{11}\right)^5 = 0.0463$ (3SF)
Apply the formula for E(X):
(b) $E(X) = \frac{3}{2/11} = 16.5$
Exercise 2B
1. Find the probabilities:
(a) (i) P(X = 3) if X ~ NB(2, 0.8)
(ii) P(X = 7) if X ~ NB(3, 0.3)
(b) (i) $P(X = 3)$ if $X \sim \mathrm{NB}\left(5, \frac{9}{10}\right)$
(ii) $P(X = 4)$ if $X \sim \mathrm{NB}\left(7, \frac{2}{3}\right)$
(c) (i) $P(X \le 4)$ if $X \sim \mathrm{NB}\left(3, \frac{4}{5}\right)$
(ii) $P(X > 5)$ if $X \sim \mathrm{NB}\left(3, \frac{1}{2}\right)$
(d) (i) Seeing your 3rd six on the 10th roll of a die.
(ii) Getting your 5th head on the 9th flip of a coin.
(e) (i) Taking fewer than 6 attempts to roll your second one on a die.
(ii) Taking more than 5 attempts to pick your second heart from a standard suit of cards (with cards being replaced).
2. Find the expected mean and standard deviation of:
(a) (i) NB(2, 0.8)
(ii) X ~ NB(3, 0.3)
(b) (i) $\mathrm{NB}\left(n, \frac{1}{n}\right)$
(ii) $\mathrm{NB}\left(2n + 1, \frac{1}{2n}\right)$
(c) (i) The number of rolls required to roll 3 sixes on a
standard die.
(ii) The number of tosses required to get 5 tails using a fair coin.
3. A magazine publisher promotes his magazine by putting a
concert ticket at random in one out of every five magazines.
If you need 4 tickets to take friends to the concert, what is the
probability that you will find your last ticket when you buy the
20th magazine?
[4 marks]
4. In a party game, players need to either sing or draw. 30 pieces
of paper are placed in a hat, with an equal number of sing and
draw instructions. Players take turns to take an instruction at
random and then return it to the hat. Find the probability that
the fifth person to sing is the tenth player.
[4 marks]
5. Given that X ~ NB(4, 0.4): (a) State the mean and variance of X. (b) Find the mode of X.
[5 marks]
6. A discrete random variable X follows the distribution NB(r, p). If
the sum of the mean and the variance of X is 10, find and simplify
an expression for r in terms of p.
[4 marks]
7. In a casino game Ruben rolls a die and whenever a one or a six is rolled he receives a token. The game ends when Ruben has received y tokens; he then receives $x, where x is the number of rolls he has made.
(a) The probability of the game ending on the sixth roll is $\frac{40}{729}$. Find the value of y.
(b) The casino wishes to make an average profit of $3 per game. How much should it charge to play the game?
(c) What is the standard deviation in the casino's profit per game?
[7 marks]
8. Let $X_1, X_2, \ldots, X_r$ be independent random variables each having a geometric distribution with probability of success p.
Let $Y = \sum_{i=1}^{r} X_i$.
(a) Explain why the random variable Y has a negative binomial distribution.
(b) Hence prove that the variance of the negative binomial distribution NB(r, p) is $\frac{rq}{p^2}$.
[6 marks]
2C Probability generating functions
We have found formulae for the mean and variance resulting from adding independent variables. However it is also useful to calculate probabilities of the sum of independent variables. For discrete variables we can use a technique called a Probability Generating Function, which links probability with algebra and calculus.
Suppose that X and Y are discrete random variables which can only take non-negative integer values. If the variable Z is their sum then we can work out the probability that Z = 2. This could happen in three different ways: (X = 0 and Y = 2) or (X = 1 and Y = 1) or (X = 2 and Y = 0).
If X and Y are independent we can then write that:
P(Z = 2) = P(X = 0)P(Y = 2) + P(X = 1)P(Y = 1) + P(X = 2)P(Y = 0)
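The three-term sum above is one instance of a general pattern: add up the products of probabilities over every pair of values with the required total. A minimal Python sketch follows; the function name `prob_sum_equals` and both probability tables are made up purely for illustration, and the two variables are assumed to be independent.

```python
def prob_sum_equals(pmf_first, pmf_second, z):
    """P(first + second = z) for independent discrete variables.

    Each pmf is a dict mapping a value to its probability (hypothetical data).
    """
    return sum(prob * pmf_second.get(z - value, 0.0)
               for value, prob in pmf_first.items())


# Illustrative distributions on the values 0, 1 and 2.
pmf_first = {0: 0.5, 1: 0.3, 2: 0.2}
pmf_second = {0: 0.1, 1: 0.6, 2: 0.3}

# 0.5 * 0.3 + 0.3 * 0.6 + 0.2 * 0.1
p = prob_sum_equals(pmf_first, pmf_second, 2)
print(p)
```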
This may remind you of a situation where you multiply two long polynomials together, for example:
$(a_0 + a_1 t + a_2 t^2)(b_0 + b_1 t + b_2 t^2)$
We can write the coefficient of $t^2$ in the result as $a_0 b_2 + a_1 b_1 + a_2 b_0$. This suggests that there may be some benefit in writing discrete probability distributions as polynomials with the coefficient of $t^x$ being the probability of the random variable taking the value x.
Key Point 2.5
The probability generating function of the discrete random variable X is given by:
$G(t) = \sum_x P(X = x)\,t^x$
In this expression t has no real-world meaning. It is a dummy variable used to keep track of the value X is taking.
An alternative definition, which we shall only use for some proofs involving generating functions, is:
Key Point 2.6
The probability generating function of the discrete random variable X is also given by:
$G(t) = E(t^X)$
This is a direct result of the expectation of a function of a variable (Key point 1.2).
Worked example 2.3
Write down the probability generating function for the distribution below.
x | 2 | 3 | 5 | 6 | 7
P(X = x) | 0.1 | 0.4 | 0.3 | 0.15 | 0.05
$G(t) = 0.1t^2 + 0.4t^3 + 0.3t^5 + 0.15t^6 + 0.05t^7$
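A generating function of this kind can be built directly from a probability table. The sketch below is a minimal Python illustration using the table from this example; the function name `make_pgf` is hypothetical. Evaluating the result at t = 1 simply adds up the probabilities, so it is a quick check that the table is a valid distribution.

```python
def make_pgf(pmf):
    """Return G(t) = sum of P(X = x) * t**x for a pmf given as {x: probability}."""
    def pgf(t):
        return sum(prob * t**x for x, prob in pmf.items())
    return pgf


# Probability table from the worked example.
pmf = {2: 0.1, 3: 0.4, 5: 0.3, 6: 0.15, 7: 0.05}
G = make_pgf(pmf)

print(G(1))    # the probabilities sum to 1 (up to rounding error)
print(G(0.5))  # G can be evaluated at any other t as well
```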
Although these generating functions have several interesting features, as polynomials they are a very complicated way of writing a probability distribution. They become much more powerful when we can rewrite the polynomial in a simpler way.
Worked example 2.4
Find and simplify an expression for the probability generating function of the random variable X where $X \sim B\left(4, \frac{1}{3}\right)$.
$P(X = x) = \binom{4}{x}\left(\frac{2}{3}\right)^{4-x}\left(\frac{1}{3}\right)^{x}$
$G(t) = \left(\frac{2}{3}\right)^4 + 4\left(\frac{2}{3}\right)^3\left(\frac{1}{3}\right)t + 6\left(\frac{2}{3}\right)^2\left(\frac{1}{3}\right)^2 t^2 + 4\left(\frac{2}{3}\right)\left(\frac{1}{3}\right)^3 t^3 + \left(\frac{1}{3}\right)^4 t^4$
$= \left(\frac{2}{3}\right)^4 + 4\left(\frac{2}{3}\right)^3\left(\frac{1}{3}t\right) + 6\left(\frac{2}{3}\right)^2\left(\frac{1}{3}t\right)^2 + 4\left(\frac{2}{3}\right)\left(\frac{1}{3}t\right)^3 + \left(\frac{1}{3}t\right)^4$
This can be recognised as a binomial expansion:
$= \left(\frac{2}{3} + \frac{1}{3}t\right)^4$
Most of the important properties of the probability generating function come from its polynomial form, but in most applications we will try to use it in some other form.
The first property comes from considering G(1):
$G(1) = P(X = 0) + P(X = 1) \times 1 + P(X = 2) \times 1^2 + \ldots$
But this is just the sum of the probabilities of any value of X occurring, which is one.
Key Point 2.7
For any probability generating function:
G(1) = 1
The second property comes from considering the derivative of G(t) with respect to t:
$G'(t) = P(X = 0) \times 0 + P(X = 1) \times 1 + P(X = 2) \times 2t + P(X = 3) \times 3t^2 + \ldots$
Using the same method as above, by setting t = 1 we get a known expression:
$G'(1) = P(X = 0) \times 0 + P(X = 1) \times 1 + P(X = 2) \times 2 + P(X = 3) \times 3 + \ldots$
Key Point 2.8
$G'(1) = E(X)$
If we differentiate the definition of a probability generating function twice and then set t = 1 we get:
$G''(t) = \sum x(x-1)t^{x-2}P(X = x)$
$G''(1) = \sum x(x-1)P(X = x)$
$= \sum (x^2 - x)P(X = x)$
$= \sum x^2 P(X = x) - \sum xP(X = x)$
$= E(X^2) - E(X)$
Therefore, since $\mathrm{Var}(X) = E(X^2) - [E(X)]^2$:
Key Point 2.9
$\mathrm{Var}(X) = G''(1) + G'(1) - (G'(1))^2$
Worked example 2.5
If $G(t) = e^{3t-3}$ find E(X) and Var(X).
Find $G'(t)$ and $G''(t)$:
$G'(t) = 3e^{3t-3}$
$G''(t) = 9e^{3t-3}$
Use formula for expectation:
$E(X) = G'(1) = 3$
Use formula for variance:
$\mathrm{Var}(X) = G''(1) + G'(1) - (G'(1))^2 = 9 + 3 - 3^2 = 3$
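The mean and variance formulas can be sanity-checked numerically. The sketch below is a Python illustration that swaps in central finite differences for the exact derivatives used in the worked example, so it only gives approximate values; the function name `mean_and_variance_from_pgf` and the step size h are assumptions.

```python
from math import exp


def pgf(t):
    """The generating function from the worked example, G(t) = e^(3t - 3)."""
    return exp(3 * t - 3)


def mean_and_variance_from_pgf(g, h=1e-4):
    """Approximate E(X) and Var(X) from a generating function g.

    Uses central finite differences at t = 1 (a numerical stand-in for
    differentiating by hand, so the results are approximate).
    """
    first = (g(1 + h) - g(1 - h)) / (2 * h)            # approximates G'(1)
    second = (g(1 + h) - 2 * g(1) + g(1 - h)) / h**2   # approximates G''(1)
    return first, second + first - first**2


mean, variance = mean_and_variance_from_pgf(pgf)
print(mean, variance)  # both should be close to 3
```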
As well as finding the expectation and the variance we can use probability generating functions to find probabilities. We want to isolate just one coefficient in the polynomial and we can do this by differentiating until the coefficient we want is a constant term, and then setting t = 0. For example:
$G(t) = P(X = 0) + P(X = 1)t + P(X = 2)t^2 + \ldots$ therefore $G(0) = P(X = 0)$.
$G'(t) = P(X = 1) + P(X = 2) \times 2t + P(X = 3) \times 3t^2 + \ldots$ therefore $G'(0) = P(X = 1)$.
$G''(t) = P(X = 2) \times 2 + P(X = 3) \times 6t + P(X = 4) \times 12t^2 + \ldots$ therefore $G''(0) = 2P(X = 2)$.
In general we find the following probability mass function.
Key Point 2.10
$P(X = n) = \frac{1}{n!}G^{(n)}(0)$
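For a generating function that is a polynomial, this recipe can be carried out mechanically. The following minimal Python sketch stores G(t) as a list of coefficients, differentiates it n times, sets t = 0 and divides by n!. The function names and the example polynomial 0.3 + 0.4t + 0.3t² are illustrative assumptions.

```python
from math import factorial


def differentiate(coeffs):
    """Differentiate a polynomial given as [c0, c1, c2, ...] for c0 + c1*t + c2*t^2 + ..."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def probability_from_pgf(coeffs, n):
    """P(X = n): the n-th derivative of G at t = 0, divided by n!."""
    derived = list(coeffs)
    for _ in range(n):
        derived = differentiate(derived)
    # At t = 0 only the constant term survives; an empty list means G^(n) is zero.
    value_at_zero = derived[0] if derived else 0.0
    return value_at_zero / factorial(n)


coeffs = [0.3, 0.4, 0.3]  # G(t) = 0.3 + 0.4t + 0.3t^2
print([probability_from_pgf(coeffs, n) for n in range(4)])
```

The recovered values are just the original coefficients, which confirms that the 1/n! factor exactly cancels the factor produced by repeated differentiation.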
Exercise 2C
1. Find the probability generating function for each of the following distributions:
(a) X
1 2 3
P(X = x) 0.5 0.2 0.1
45 6 0 0.05 0.05
(b)
x | 1 | 2 | 3 | 4
P(X = x) | 0.3 | 0.3 | 0.3 | 0.1
2. For each of the following probability generating functions find P(X = 1):
(a) (i) $G(t) = 0.6 + 0.4t$ (ii) $G(t) = 0.3 + 0.4t + 0.3t^2$
(b) (i) $G(t) = 0.1t^2 + 0.9$ (ii) $G(t) = t$
(c) (i) $G(t) = \frac{(1+t)^2}{4}$ (ii) $G(t) = \frac{(1+t)^3}{8}$
(d) (i) $G(t) = e^{5t-5}$ (ii) $G(t) = \frac{0.4t}{1 - 0.6t}$
3. A discrete random variable can take any value in ℕ. It has a probability generating function of $G(t) = e^{a(t-1)}$. Find the mean and the variance in terms of a.
[5 marks]
4. A discrete random variable Y can take the values 2, 3, 4, … and
1t2
has
a
probability
generating
function
G (t )
=
9 1
2t 9
2 
.
(a) Find the probability that Y = 2.
(b) Find the expectation and variance of Y.
[4 marks]
5. Prove that if X ~ B(n, p) then the probability generating function of X is $(q + pt)^n$ where q = 1 − p.
[4 marks]
6. A discrete random variable X has a probability generating
function G(t ) = Ae(t2).
(a) Find the value of A.
(b) Find E(X).
(c) Find Var(X).
[7 marks]
7. A random variable X has a probability generating function G(t). Show that the probability that X takes an even value is $\frac{1}{2}(1 + G(-1))$.

[4 marks]
8. A random variable X has probability generating function $G(t) = \frac{k-1}{k-t}$.
(a) Prove by induction that $\frac{d^n}{dt^n}G(t) = \frac{n!(k-1)}{(k-t)^{n+1}}$.
(b) Hence or otherwise find the probability distribution of X in terms of k.
(c) Find the expectation and variance of X in terms of k.

[14 marks]
9. A discrete random variable X has probability generating function $G_X(t)$. If $Y = aX + b$ show that the probability generating function of Y is given by $G_Y(t) = t^b G_X(t^a)$.
Hence prove that $E(Y) = aE(X) + b$ and that $Var(Y) = a^2 Var(X)$.
[13 marks]
2D Using probability generating functions to find the distribution of the sum of discrete random variables
Each of the discrete distributions you already know has a probability generating function:
Key Point 2.11
Distribution    Probability generating function
B(n, p)         $(q + pt)^n$
Geo(p)          $\frac{pt}{1 - qt}$
Po(λ)           $e^{\lambda(t - 1)}$
We now return to the original purpose of probability generating functions: finding the probability distribution of the sum of independent random variables.
When we have two distinct generating functions for the random variables X and Y we shall label them as $G_X(t)$ and $G_Y(t)$. We can find the probability generating function of Z = X + Y by using the definition of probability generating functions given in Key point 2.6:
$G_Z(t) = E(t^Z) = E(t^{X+Y}) = E(t^X \times t^Y) = E(t^X) \times E(t^Y) = G_X(t) \times G_Y(t)$
In the penultimate step we used Key point 1.4 which requires that X and Y are independent. We can therefore state that:
Key Point 2.12
If Z = X + Y where X and Y are independent:
$G_Z(t) = G_X(t) \times G_Y(t)$
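For variables with finitely many values the generating functions are polynomials, so the product rule amounts to multiplying two polynomials. The sketch below assumes each generating function is stored as a list of coefficients; `multiply_pgfs` and the two fair dice are illustrative choices rather than anything taken from the text.

```python
from fractions import Fraction

def multiply_pgfs(g_x, g_y):
    """Multiply two polynomial generating functions given as coefficient lists."""
    product = [Fraction(0)] * (len(g_x) + len(g_y) - 1)
    for i, a in enumerate(g_x):
        for j, b in enumerate(g_y):
            # The t**i term times the t**j term contributes to t**(i + j).
            product[i + j] += a * b
    return product

# Hypothetical example: the score on a fair die has G(t) = (t + t^2 + ... + t^6) / 6
die = [Fraction(0)] + [Fraction(1, 6)] * 6
total = multiply_pgfs(die, die)  # generating function of the sum of two independent dice

print(total[7])    # coefficient of t^7, i.e. P(total = 7)
print(sum(total))  # G(1), which should equal 1
```

The coefficient of t^n in the product is P(Z = n), so the whole distribution of the sum can be read off in one pass.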
Worked example 2.6
Find the probability generating function of the negative binomial distribution.
The negative binomial distribution is the sum of r geometric distributions:
If X ~ NB(r, p) and $Y_i$ ~ Geo(p) then $X = \sum_{i=1}^{r} Y_i$
Therefore the probability generating function is the product of the generating functions of r geometric distributions:
$G(t) = \left(\frac{pt}{1 - qt}\right)^r$
Exercise 2D
1. A football team gets three points for a win, one point for a draw and no points for a loss.
St Atistics football team win 40% of their matches, draw 30% and lose the rest. X is the number of points they receive from one game.
(a) Find the probability generating function for X.
(b) St Atistics play ten matches in their season. The results of their matches are independent. Find the probability generating function of Y, their total number of points.
(c) Find the expected number of points at the end of the
season.
[7 marks]
2. Prove using generating functions that if X and Y are independent
random variables then E(X + Y ) = E(X) + E(Y ). 
[5 marks]
3. Prove that the sum of two Poisson variables also follows a
Poisson distribution.
[4 marks]
4. If X ~ B(n, p) and Y ~ B(m, p) prove that X + Y also follows a
binomial distribution and state its parameters.
[5 marks]
5. A textbook contains short questions, worth one mark each, and long questions worth four marks each. 30% of questions are short questions. Let M be the number of marks for answering one question correctly.
(a) Find the probability generating function for M. Caroline selects eight questions at random, and answers them all correctly. Let T be her total number of marks.
(b) Write down the probability generating function for T.
(c) Show that she cannot score exactly 15 marks. [12 marks]
Summary
• In this chapter we have met two new distributions:
The number of trials until the first success (geometric).
The number of trials until a specified number of successes (negative binomial).
• For both of these distributions we found the probability mass function and a formula for the expectation and variance, all of which are in the Formula booklet:
For the geometric distribution, if X ~ Geo(p) then:
$P(X = x) = pq^{x-1}$ for x = 1, 2, 3, …
$E(X) = \frac{1}{p}$ and $Var(X) = \frac{q}{p^2}$
For the negative binomial distribution, if X ~ NB(r, p) then:
$P(X = x) = \binom{x - 1}{r - 1} p^r q^{x - r}$ for x = r, r + 1, …
(q is the probability of a failure, so $q = 1 - p$)
$E(X) = \frac{r}{p}$ and $Var(X) = \frac{rq}{p^2}$
• We met a new technique for writing probability distributions called a probability generating function. The probability generating function of the discrete random variable X is given by
$G(t) = \sum_x P(X = x)t^x$ and $G(t) = E(t^X)$.
For any probability generating function G(1) = 1.
• We saw how probability generating functions can be used to find the expectation and variance of a random variable: $G'(1) = E(X)$ and $Var(X) = G''(1) + G'(1) - (G'(1))^2$.
• We saw that each of the discrete distributions we already know has a probability generating function:
Distribution    Probability generating function
B(n, p)         $(q + pt)^n$
Geo(p)          $\frac{pt}{1 - qt}$
Po(λ)           $e^{\lambda(t - 1)}$
• We saw how probability generating functions can be used to find the probability mass function of a sum of independent random variables:
$P(X = n) = \frac{1}{n!} G^{(n)}(0)$
If Z = X + Y where X and Y are independent, $G_Z(t) = G_X(t) \times G_Y(t)$.
Mixed examination practice 2
You might want to remind yourself of the binomial, Poisson and normal distributions before reading on.

1. A bag contains a large number of coloured pens, $\frac{1}{3}$ of which are red. Find the probability that:
(a) I have to select exactly 3 pens before I get a red one.
(b) I have to select at least 3 pens before I get a red one.
[6 marks]
2. The probability generating function of the discrete random variable X is given
by G(t ) = k(1+ 2t + t2).
(a) Find the value of k.
(b) Find the mode of X.
[4 marks]
3. Sweets are sold in packets of 20. The probability that a sweet is a fizzy cola bottle is 0.2.
(a) Find the probability that a pack of sweets contains exactly 5 fizzy cola bottles.
(b) Find the probability that I have to buy 10 packets of sweets before I get 4
with exactly 5 fizzy cola bottles.
[8 marks]
4. The masses of apples are normally distributed with mean 136 g and standard deviation 27 g.
(a) Find the probability that an apple has a mass of more than 150 g.
(b) Find the probability that in a pack of six apples at least two have a mass of more than 150 g.
(c) What is the expected number of apples I need to buy before I get two which
have a mass of more than 150 g?
[7 marks]
5. Given that X ~ Geo(p) and that P(X ≤ 10) = 0.175, find the value of p.

[4 marks]
6. (a) Show that if the random variable X has a probability generating function G(t) then the probability of X taking an odd value is $\frac{1}{2}(1 - G(-1))$.
(b) X is a random variable such that X ~ B(10, 0.2). Write down the probability generating function of X.
(c) Y is a random variable such that Y ~ B(12, 0.25). If Z = X + Y, write down the probability generating function of Z.
(d) Find the probability that Z is even.
[12 marks]
 7. A fair six-sided die is rolled repeatedly.
(a) Find the probability that 5 sixes are obtained from 20 rolls.
(b) Find the probability that the 5th six is obtained on the 20th roll.
(c) Given that the 2nd six is obtained on the 6th roll, find the probability that 5 sixes are obtained from 20 rolls.
(d) Given that 5 sixes are obtained from 20 rolls, find the probability that the 2nd six was rolled on the 6th roll.

[12 marks]
8. Ian has joined a new social networking site. In order to join a particular group he needs to get nine invitations. The probability that he receives an invitation on any given day is 0.8, independently of other days (he never gets more than one invitation in a day).
(a) What is the expected number of days Ian has to wait before he can join the group?
(b) Find the probability that Ian will first be able to join the group on the 14th day.
(c) Given that after 10 days he has had 8 invitations, find the probability that he will first be able to join the group on the 14th day.
(d) Ian joins the group as soon as he receives his 9th invitation. Given that Ian
has joined the group on the 14th day, find the probability that he received
his first invitation on the first day.
[12 marks]
9. The number of letters Naomi receives in a week follows a Poisson distribution with mean 5.
(a) Find the probability that in a particular week she receives more than an average number of letters.
(b) What is the expected number of weeks she has to wait before she receives more than an average number of letters in a week?
(c) Naomi wants to know how long she needs to wait until she has received more than an average number of letters 5 times.
(i) Find the probability that she has to wait exactly 12 weeks.
(ii)What is the most likely number of weeks she has to wait? [9 marks]
10. Random variable X has distribution NB(r, p).
(a) Show that $\frac{P(X = x + 1)}{P(X = x)} = \frac{x(1 - p)}{x - r + 1}$.
(b) (i) Show that $P(X = x + 1) > P(X = x)$ when $x < \frac{r - 1}{p}$ and $P(X = x + 1) < P(X = x)$ when $x > \frac{r - 1}{p}$.
(ii) Deduce that X is bimodal only if $\frac{r - 1}{p}$ is an integer.
(c) X ~ NB(9, p) has modes 12 and 13. Find the value of p.
[11 marks]
11. A discrete random variable X has probability generating function $G(t) = kte^t$.
(a) Find the value of k.
(b) Prove by induction that $G^{(n)}(t) = ke^t(n + t)$.
(c) Hence find P(X = 7).
[12 marks]
In this chapter you will learn:
how to convert between the probability mass function, P(X = x), and the cumulative distribution function, P(X ≤ x)
how to convert between the probability density function and the cumulative distribution function
how to use the cumulative distribution function to find the median and quartiles
how to use the cumulative distribution function to find the distribution of the function of a random variable.
3 Cumulative distribution functions
Cumulative distribution functions give the probability of a random variable being less than or equal to a particular value. They allow us to find quickly the probability that a discrete variable lies in a given range of values. In the past these functions were the only way of tabulating probabilities for continuous random variables, but today we can use our graphical display calculators (GDC) to do this for us. However, cumulative distribution functions are still a very important tool for working with continuous variables because they connect directly to probabilities, unlike the probability density function.
3A Finding the cumulative probability function
For a discrete variable with probability mass function P(X = x), the cumulative probability function is found by adding up all of the probabilities of values less than or equal to the given value. Frequently the cumulative distribution function will only be defined over a finite domain. At the bottom end of the domain and below it must take the value 0 and at the top end of the domain and above it must take the value 1.
Key Point 3.1
For a discrete distribution $P(X \le x) = \sum_{i=-\infty}^{x} p_i$
There is a similar result for a continuous variable. If the probability density function is f(x) we usually write the cumulative distribution function as F(x).
Key Point 3.2
For a continuous distribution $P(X \le x) = F(x) = \int_{-\infty}^{x} f(t)\,dt$
Since integration can be undone by differentiation, we can find the probability density function from F(x):
Key Point 3.3
$f(x) = \frac{d}{dx} F(x)$
Worked example 3.1
Find the cumulative distribution function (cdf) of a continuous random variable X, which has a probability density function $f(x) = e^x$ for 0 < x < ln 2.
State F(x) when x is below and above the range in which f(x) is defined:
If $x \le 0$: F(x) = 0
If $x \ge \ln 2$: F(x) = 1
Use integration to find the cdf. The lower limit is zero since the pdf is zero below this point:
If 0 < x < ln 2:
$F(x) = \int_0^x e^t\,dt = \left[e^t\right]_0^x = e^x - 1$
Once we have the cumulative distribution function we can use it to find the median, quartiles and any other percentiles, since the pth percentile is defined as the value x such that P(X ≤ x) = p%. We can write this as $F(x) = \frac{p}{100}$.
Worked example 3.2
The continuous random variable X has a cumulative distribution function:
$F(x) = \begin{cases} 0 & x \le 0 \\ x^2 & 0 < x < 1 \\ 1 & x \ge 1 \end{cases}$
(a) Find the probability density function of X.
(b) Find the lower quartile of X.
pdf is derivative of cdf:
(a) $f(x) = \frac{d}{dx} F(x) = 2x$ if 0 < x < 1 and zero otherwise
Lower quartile is 25th percentile:
(b) At the lower quartile:
$F(x) = 0.25$
$\Rightarrow x^2 = 0.25$
$\Rightarrow x = \pm 0.5$
Decide which solution to choose:
f(x) is non-zero only if 0 < x < 1 therefore x = 0.5
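When the equation F(x) = p/100 cannot be solved by hand, a percentile can be located numerically. The following is a rough sketch using interval bisection, which relies only on the cdf being non-decreasing; the function name `percentile`, its tolerance argument and the search interval are assumptions made for illustration.

```python
def percentile(cdf, p, low, high, tolerance=1e-10):
    """Find x with cdf(x) = p / 100 by bisection on the interval [low, high]."""
    target = p / 100
    while high - low > tolerance:
        mid = (low + high) / 2
        if cdf(mid) < target:
            low = mid   # the percentile lies to the right of mid
        else:
            high = mid  # the percentile lies at or to the left of mid
    return (low + high) / 2

def cdf(x):
    """The cdf from the worked example: F(x) = x^2 on 0 < x < 1."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    return x ** 2

lower_quartile = percentile(cdf, 25, 0.0, 1.0)
median = percentile(cdf, 50, 0.0, 1.0)
print(lower_quartile, median)
```

Restricting the search to the interval where the pdf is non-zero plays the same role as rejecting the negative root in the algebraic solution.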
All of these techniques may be applied to a function which is defined piecewise.
Worked example 3.3
The continuous random variable W is defined by the probability density function f(w):
$f(w) = \begin{cases} \frac{w^2}{27} & 0 \le w \le 3 \\ \frac{7}{12} - \frac{w}{12} & 3 \le w \le k \end{cases}$
(a) Sketch the probability density function.
(b) Find the value of k.
(c) Find E(W).
(d) Find the median of W.
(e) Find the mode of W.
(a) [Sketch of f(w) against w, with k marked on the w-axis]
(b) The area under the curved section is
$\int_0^3 \frac{w^2}{27}\,dw = \left[\frac{w^3}{81}\right]_0^3 = \frac{1}{3}$
The remaining area is $\frac{2}{3}$ so
$\int_3^k \left(\frac{7}{12} - \frac{w}{12}\right) dw = \frac{2}{3}$
$\left[\frac{7w}{12} - \frac{w^2}{24}\right]_3^k = \frac{2}{3}$
$\left(\frac{7k}{12} - \frac{k^2}{24}\right) - \left(\frac{21}{12} - \frac{9}{24}\right) = \frac{2}{3}$
$\frac{7k}{12} - \frac{k^2}{24} = \frac{49}{24}$
$0 = k^2 - 14k + 49 = (k - 7)^2$
So k = 7.
Use the formula for E(W) split up over the different domains:
(c) $E(W) = \int_0^3 w \times \frac{w^2}{27}\,dw + \int_3^7 w \times \left(\frac{7}{12} - \frac{w}{12}\right) dw$
Use GDC to evaluate the definite integrals:
$= \frac{3}{4} + \frac{26}{9} = \frac{131}{36} \approx 3.64$
The median is the point where $P(W < m) = \frac{1}{2}$. The area under the curved section of the pdf is $\frac{1}{3}$ so the median must lie in the second section:
(d) We need $\int_0^m f(w)\,dw = \frac{1}{2}$
$\frac{1}{3} + \int_3^m \left(\frac{7}{12} - \frac{w}{12}\right) dw = \frac{1}{2}$
$\left[\frac{7w}{12} - \frac{w^2}{24}\right]_3^m = \frac{1}{6}$
$\left(\frac{7m}{12} - \frac{m^2}{24}\right) - \left(\frac{21}{12} - \frac{9}{24}\right) = \frac{1}{6}$
$0 = m^2 - 14m + 37$
Using the quadratic equation:
$m = \frac{14 \pm \sqrt{196 - 148}}{2} = 7 \pm 2\sqrt{3}$
But the median lies between 3 and 7 so the median is $7 - 2\sqrt{3} \approx 3.54$
The mode corresponds to the highest point on the graph:
(e) From the sketch, the mode is when W = 3.
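The answers in this worked example can be checked numerically, in the same spirit as evaluating the integrals on a GDC. This is only a sketch: it assumes k = 7 from part (b), uses a hand-written Simpson's rule, and all function names are illustrative.

```python
from math import sqrt

K = 7.0  # value of k found in part (b)

def pdf(w):
    """Piecewise pdf of W from the worked example, with k = 7."""
    if 0 <= w <= 3:
        return w ** 2 / 27
    if 3 < w <= K:
        return 7 / 12 - w / 12
    return 0.0

def simpson(func, a, b, steps=200):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = func(a) + func(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def integral_to(upper, weight=lambda w: 1.0):
    """Integrate weight(w) * pdf(w) from 0 to upper, splitting at the join w = 3."""
    integrand = lambda w: weight(w) * pdf(w)
    if upper <= 3:
        return simpson(integrand, 0.0, upper)
    return simpson(integrand, 0.0, 3.0) + simpson(integrand, 3.0, upper)

total_area = integral_to(K)                # should be 1 if k = 7 is correct
mean = integral_to(K, weight=lambda w: w)  # E(W)

# Bisection for the median on the second section, 3 < m < 7.
low, high = 3.0, K
while high - low > 1e-10:
    mid = (low + high) / 2
    if integral_to(mid) < 0.5:
        low = mid
    else:
        high = mid
median = (low + high) / 2

print(total_area, mean, median)
```

Splitting each integral at w = 3 mirrors the way the working treats the two sections of the pdf separately.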
Exercise 3A
1. Find the cumulative distribution function for each of the following distributions:
(a) (i) $P(X = x) = \frac{1}{5}$ for x = 1, 2, 3, 4, 5
(ii) $P(X = x) = \frac{1}{10}$ for x = 1, 2, 3, …, 10
(b) (i) $P(X = x) = \frac{1}{4}$ for x = 3, 4, 5, 6
(ii) $P(X = x) = \frac{1}{10}$ for x = 0, 0.1, 0.2, …, 0.9
2. Find the cumulative distribution function for each of the following probability density functions, and hence find the median of the distribution:
(a) (i)
f
(x)
=
2 0
2x
0< x <1 otherwise
(ii) f (x) = 1x6 0
(b) (i)
f
(x
)
=
 
sin
x
0
(ii)
f
(x
)
=
 
x
1 ln10
0
2<x<6
otherwise 0<x< π
2 otherwise
1 < x < 10
otherwise
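3. For each of the following continuous cumulative probability functions, find the probability density function and the median:
(a) (i) $F(x) = \begin{cases} 0 & x < 1 \\ x - 1 & 1 \le x < 2 \\ 1 & x \ge 2 \end{cases}$
(ii) $F(x) = \begin{cases} 0 & x < 0 \\ 3x & 0 \le x < \frac{1}{3} \\ 1 & x \ge \frac{1}{3} \end{cases}$
(b) (i) $F(x) = \begin{cases} 0 & x < 1 \\ x^2 - x & 1 \le x < \frac{1 + \sqrt{5}}{2} \\ 1 & x \ge \frac{1 + \sqrt{5}}{2} \end{cases}$
(ii) $F(x) = \begin{cases} 0 & x < 0 \\ \sin x & 0 \le x < \frac{\pi}{2} \\ 1 & x \ge \frac{\pi}{2} \end{cases}$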
4. A discrete random variable has the cumulative distribution function $P(X \le x) = \frac{x(x + 1)(2x + 1)}{1224}$ for x = 1, 2, 3, …, n.
(a) Find P(X = 3).
(b) Find n.
[5 marks]
5. Find the exact value of the 80th percentile of the continuous random variable Y which has pdf $f(y) = \frac{1}{y}$ for 1 < y < e.
[4 marks]
6. (a) If $P(X = x) = \frac{x}{10}$ for x = 1, 2, 3, 4 find P(X ≤ x).
(b) Find the median of X.
[5 marks]

7. (a) If $P(Y = y) = \frac{y}{22}$ for y = 4, 5, 6, 7 find P(Y ≤ y).
(b) Find the mode of Y.
[4 marks]
8. A continuous variable X has cumulative distribution function:
$F(x) = \begin{cases} 0 & x < 0 \\ e^{2x} - 1 & 0 \le x < k \\ 1 & x \ge k \end{cases}$
(a) Find the value of k.
(b) Find the probability density function of X.
(c) Find the median of the distribution.
[6 marks]
9. $P(X \le x) = \frac{x^3}{1000}$ for x = 1, 2, 3, …, n.
(a) Find the value of n.
(b) Find the probability mass function of X.
[4 marks]
10. The continuous random variable X has the probability density function:
$f(x) = \begin{cases} ax^3 & 0 \le x < 1 \\ \frac{a}{x} & 1 \le x < 2 \\ 0 & \text{otherwise} \end{cases}$
(a) Find the value of the parameter a.
(b) Find the expectation of X.
(c) Find the cumulative distribution function of X.
(d) Find the median of X.
(e) Find the lower quartile of X.
(f) What is the probability that X lies between the median and the lower quartile?
[25 marks]
3B Distributions of functions of a continuous random variable
Using this discrete distribution
x           −1      0       1
P(X = x)    3/11    1/11    7/11
you can find the distribution of a random variable Y which is related to X by the formula $Y = X^2 + 3$ by simply listing all the possible values of Y and their probabilities (remembering that y = 4 when x = 1 or −1):
y           3       4
P(Y = y)    1/11    10/11
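The listing procedure for a discrete variable is mechanical enough to automate. A minimal sketch follows, assuming the probability mass function is stored as a dictionary; `transform_distribution` is an illustrative name, not a library function.

```python
from collections import defaultdict
from fractions import Fraction

def transform_distribution(pmf, g):
    """Return the pmf of Y = g(X), adding the probabilities of x-values that share a y-value."""
    result = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for x, p in pmf.items():
        result[g(x)] += p
    return dict(result)

# The distribution of X from the table above.
pmf_x = {-1: Fraction(3, 11), 0: Fraction(1, 11), 1: Fraction(7, 11)}
pmf_y = transform_distribution(pmf_x, lambda x: x ** 2 + 3)
print(pmf_y)
```

The probabilities for x = −1 and x = 1 are merged because both give y = 4, exactly as in the table for Y.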
There are many situations where we would like to do the same thing with a continuous random variable but this is much more difficult as we cannot access probabilities directly using the probability density function. We must use the cumulative function instead and then differentiate it to find the probability density function.
Worked example 3.4
X is the continuous random variable (crv) 'length of the side of a square' and X has pdf $f(x) = \frac{1}{2}$ for 1 < x < 3. Find the probability density function of Y, the area of the square.
We need to relate F(x) to G(y):
The cdf of X is $F(x) = \frac{x}{2} - \frac{1}{2}$, 1 < x < 3
The cdf of Y is G(y)
Use the fact that $Y = X^2$:
$G(y) = P(Y < y) = P(X^2 < y)$
Solve the inequality:
$= P(-\sqrt{y} < X < \sqrt{y})$
Write in terms of cumulative probabilities:
$= P(X < \sqrt{y}) - P(X < -\sqrt{y})$
Write in terms of the cdf of X:
$= F(\sqrt{y}) - F(-\sqrt{y})$
Remember that F(x) = 0 when x < 1:
$= \frac{\sqrt{y}}{2} - \frac{1}{2} - 0$
Consider the domain of F(x):
This is only true if $1 < \sqrt{y} < 3$, i.e. 1 < y < 9
pdf is the derivative of cdf:
$g(y) = \frac{d}{dy}\left(\frac{\sqrt{y}}{2} - \frac{1}{2}\right) = \frac{1}{4\sqrt{y}}$, 1 < y < 9
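The result of this worked example can be sanity-checked numerically by building the cdf of Y from the cdf of X and differentiating it with a central difference. This is an illustrative sketch; the function names and the step size h are assumptions.

```python
from math import sqrt

def cdf_x(x):
    """F(x) for the side length: uniform on 1 < x < 3."""
    if x <= 1:
        return 0.0
    if x >= 3:
        return 1.0
    return (x - 1) / 2

def cdf_y(y):
    """G(y) = P(X^2 < y) = F(sqrt(y)) - F(-sqrt(y))."""
    if y <= 0:
        return 0.0
    root = sqrt(y)
    return cdf_x(root) - cdf_x(-root)

def numerical_pdf_y(y, h=1e-6):
    """Approximate g(y) = dG/dy with a central difference."""
    return (cdf_y(y + h) - cdf_y(y - h)) / (2 * h)

for y in (2.0, 4.0, 8.0):
    # The two printed values on each line should agree closely.
    print(y, numerical_pdf_y(y), 1 / (4 * sqrt(y)))
```

The comparison is only meaningful for 1 < y < 9, the domain found in the working; outside it the cdf of Y is constant and the numerical derivative is zero.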
exam hint
This manipulation is challenging. Thankfully, it has only rarely appeared on examination questions.
The general method for finding the probability density function is:
Key Point 3.4
If X has cdf F(x) for a < x < b and Y = g(X) (where g is a 1-to-1 function) then the probability density function of Y, h(y), is found by:
• Relating H(y) to $F(g^{-1}(y))$ by rearranging the inequality in $P(Y \le y) = P(g(X) \le y)$.
• Differentiating H(y) with respect to y.
• Writing the domain of h(y) by solving the inequality $a < g^{-1}(y) < b$.
Exercise 3B
1. X is a continuous random variable with pdf $f(x) = \frac{4}{x^5}$ for x > 1. If $Y = \frac{1}{X^2}$, show that Y has pdf g(y) = 2y, 0 < y < 1.
[7 marks]
2. The volume (V) of a spherical soap bubble follows a continuous uniform distribution: $f(v) = \frac{1}{10}$ for $v \in (0, 10)$.
(a) Find the cumulative distribution function of V.
(b) Hence find the probability density function of R, the radius of the bubble.
[6 marks]
3. X is a continuous random variable with pdf $f(x) = \frac{3}{26}x^2$, 1 < x < 3.
(a) Find the cumulative distribution function of X.
(b) If $Y = \frac{1}{X}$, find the probability that $Y > \frac{3}{4}$.
(c) Find the probability density function of Y.
[13 marks]
4. X is a continuous random variable with pdf f(x) = 1, 0 < x < 1.
Three independent observations of X are made. Find the probability density function of Y where $Y = \max(X_1, X_2, X_3)$.
[4 marks]
Summary
• The cumulative distribution function gives the probability of the random variable taking a value less than or equal to x.
• For a discrete distribution with probability mass function P(X = x):
$P(X \le x) = \sum_{i=-\infty}^{x} p_i$
• For a continuous distribution with pdf f(x):
$P(X \le x) = F(x) = \int_{-\infty}^{x} f(t)\,dt$ and $f(x) = \frac{d}{dx} F(x)$
• The main uses of cumulative distribution functions are finding percentiles of a distribution and converting from a distribution of one continuous variable to a distribution of a function of that variable.
• If X has cdf F(x) for a < x < b and Y = g(X) (where g is a 1-to-1 function) then the probability density function of Y, h(y), is found by:
1. Relating H(y) to $F(g^{-1}(y))$ by rearranging the inequality in $P(Y \le y) = P(g(X) \le y)$.
2. Differentiating H(y) with respect to y.
3. Writing the domain of h(y) by solving the inequality $a < g^{-1}(y) < b$.
Mixed examination practice 3
1. The continuous random variable Y has probability density $g(y) = ky(1 - y)$, 0 < y < 1.
Find the cumulative distribution function of Y.
[6 marks]
2. The continuous random variable X has the probability density function $f(x) = x + k - 5$, 5 ≤ x ≤ 6.
(a) Find the cumulative distribution function of X.
(b) Find the exact value of the median of X.
[9 marks]
3. If the continuous random variable X has pdf $f(x) = \frac{3}{4}(1 - x^2)$, −1 < x < 1 find the interquartile range of X.
[7 marks]
4. The continuous random variable X has cdf $F(x) = \frac{1}{8}(8x - x^2 - 7)$, 1 < x < 3.
Find the probability that in four observations of X more than two observations take a value of less than two.
[5 marks]
5. The continuous random variable X has cdf $F(x) = cx^3$, a < x < b. The median of X is $\sqrt[3]{4}$. Find the values of a, b and c.
[6 marks]
6. The number of beta particles emitted from a radioactive substance is modelled by a Poisson distribution with a mean of 3 emissions per second. X is the discrete random variable Number of emissions in n seconds.
(a) Write down the probability distribution of X.
(b) Find an expression for the probability that there are no emissions in a period of n seconds.
(c) Y is the continuous random variable Time until first emission. Using your answer to (b) find the probability density function of Y.
(d) Find P(0.5 < Y < 1).
[10 marks]
In this chapter you will learn:
about finding a single value to estimate a population parameter
about estimating an interval in which a population parameter lies, called a confidence interval
how to find the confidence interval for the mean when the true variance is known
how to find the confidence interval for the mean when the true variance is unknown.
Are there other areas of knowledge in which we have to balance usefulness against truth?
We shall look more at the theory of unbiased estimators in Section 4B.
4 Unbiased estimators and confidence intervals
In the statistics sections of the core syllabus, you should have looked exclusively at finding statistics of samples. However, we are often interested in using the sample to infer the parameters for the entire population. Unfortunately, the sample statistic does not always give us the best estimate of the population parameter. Even if we find the best single number to estimate the population parameter it is unlikely to be exactly correct. There are some situations where it is more useful to have a range of values in which we are reasonably certain the population parameter lies. This is called a confidence interval.
4A Unbiased estimates of the mean and variance
Generally the true mean of the whole population is given the symbol μ and the true standard deviation is given the symbol σ. We can only use our sample mean $\bar{x}$ to estimate the population mean μ. Although we do not know how inaccurate this might be, we do know that it is equally likely to be an underestimate or an overestimate. The expected value of the sample mean is the population mean. We say that the sample mean is an unbiased estimator of the population mean.
Unfortunately, things are more complicated for the variance. The variance of a sample, $s_n^2$, is a biased estimator of σ². This means that the sample variance tends to get the population variance wrong in one particular direction. To illustrate how this happens, we can look at a slightly simpler measure of spread: the range. A sample can never have a larger range than the whole population, but it has a smaller range whenever it does not include both the largest and smallest value in the population. The range of a sample can therefore be expected to underestimate the range of the population. A similar idea applies to variance: $s_n^2$ tends to underestimate σ².
Fortunately (using some quite complex maths) there is a value we can calculate from the sample which gives an unbiased estimate for the variance. It is given the symbol $s_{n-1}^2$.
Key Point 4.1
$s_{n-1}^2 = \frac{n}{n - 1} s_n^2$ is an unbiased estimator of σ².
Unfortunately this does not mean that $s_{n-1}$ is an unbiased estimate of σ, but it is often a very good approximation.
Exam hint
Make sure you always know whether you are being asked to find $s_n$ or $s_{n-1}$, and how to select the correct option on your calculator.
See Mixed examination practice question 4 at the end of this chapter for a demonstration of this problem.
Worked example 4.1
The IQ values of ten 12-year-old boys are summarised below:
$\sum x = 1062$, $\sum x^2 = 114\,664$.
Find the mean and standard deviation of this sample. Assuming this is a representative sample of the whole population of 12-year-old boys, estimate the mean and standard deviation of the whole population.
Use the formulae for $\bar{x}$ and $s_n$:
n = 10
$\bar{x} = \frac{1062}{10} = 106.2$
$s_n = \sqrt{\frac{114664}{10} - 106.2^2} = 13.7$ (3SF)
Use $s_{n-1} = \sqrt{\frac{n}{n - 1}}\, s_n$:
$s_{n-1} = \sqrt{\frac{10}{9}} \times 13.7 = 14.5$ (3SF)
For the whole population we can estimate the mean as 106.2 and the standard deviation as 14.5
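The same calculation can be reproduced from the summary statistics alone. The sketch below is one possible way to do it; `sample_statistics` is a made-up helper name and not a calculator or library function.

```python
from math import sqrt

def sample_statistics(n, sum_x, sum_x_squared):
    """Return the sample mean, s_n and s_(n-1) from n, the sum of x and the sum of x^2."""
    mean = sum_x / n
    variance_n = sum_x_squared / n - mean ** 2     # s_n squared
    variance_n_minus_1 = n / (n - 1) * variance_n  # unbiased estimate of sigma squared
    return mean, sqrt(variance_n), sqrt(variance_n_minus_1)

mean, s_n, s_n_minus_1 = sample_statistics(10, 1062, 114664)
print(round(mean, 1), round(s_n, 1), round(s_n_minus_1, 1))
```

Note that the factor n/(n − 1) is applied to the variance, so the standard deviation is scaled by its square root.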
Exercise 4A
1. A random sample drawn from a large population contains the following data:
19.3, 16.2, 14.1, 17.3, 18.2.
Calculate an unbiased estimate of:
(a) The population mean.
(b) The population variance. 
[4 marks]
2. A machine fills tins with beans. A sample of six tins is taken at random.
The tins contain the following amounts (in grams) of beans:
126, 130, 137, 128, 135, 133.
Find:
(a) The sample standard deviation.
(b) An unbiased estimate of the population variance from which this sample is taken.
[4 marks]
3. Vitamin F tablets are produced by a machine. The amounts of vitamin F in 30 tablets chosen at random are shown in the table.
Mass (mg)    49.6   49.7   49.8   49.9   50.0   50.1   50.2   50.3
Frequency    1      3      4      6      8      4      3      1
Find unbiased estimates of:
(a) The mean of the population from which this sample is taken.
(b) The variance of the population from which this sample is taken.
[5 marks]
4. A sample of 75 lightbulbs was tested to see how long they last. The results were:
Time (hours)       Number of lightbulbs (frequency)
0 ≤ t < 100        2
100 ≤ t < 200      4
200 ≤ t < 300      8
300 ≤ t < 400      9
400 ≤ t < 500      12
500 ≤ t < 600      16
600 ≤ t < 700      9
700 ≤ t < 800      8
800 ≤ t < 900      6
900 ≤ t < 1000     1
Estimate:
(a) The sample standard deviation.
(b) An unbiased estimate of the variance of the population from which this sample is taken.
[5 marks]
5. A pupil cycles to school. She records the time taken on each of 10 randomly chosen days. She finds that $\sum x_i = 180$ and $\sum x_i^2 = 68580$ where $x_i$ denotes the time, in minutes, taken on the ith day.
Calculate an unbiased estimate of:
(a) The mean time taken to cycle to school.
(b) The variance of the time taken to cycle to school. [6 marks]
6. The standard deviation of a sample is $\frac{4\sqrt{3}}{7}$ of the square root of the unbiased estimate of the population variance. How many objects are in the sample?
[4 marks]
4B Theory of unbiased estimators
We can find estimators of quantities other than the mean and the variance. To do this we need a general definition of an unbiased estimator. Key Point 4.2
If a population has a parameter a then the sample statistic Â is an unbiased estimator of a if E(Â) = a.
We can interpret this to mean that if samples are taken many times and the sample statistic is calculated each time, the average of these values tends towards the true value of the population parameter.
Worked example 4.2
Prove that the sample mean is an unbiased estimate of the population mean.
Define the sample mean as a random variable:
$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}$
where each $X_i$ represents the ith independent observation of X.
Apply expectation algebra:
$E(\bar{X}) = \frac{1}{n}\left(E(X_1) + E(X_2) + \dots + E(X_n)\right)$
Use the fact that E(X) = μ, the population mean:
$E(\bar{X}) = \frac{1}{n}(n\mu) = \mu$
If we have an idea what the estimator might be, we can test it by finding the expectation of that expression. It is often a good idea to first try finding the expectation of the variable and then see if there is an obvious link.
Worked example 4.3
X is a continuous random variable with probability density function $f(x) = \frac{1}{k}$, 1 < x < k + 1.
Find an unbiased estimator for k.
Start by trying E(X):
$E(X) = \int_1^{k+1} \frac{x}{k}\,dx = \left[\frac{x^2}{2k}\right]_1^{k+1} = \frac{(k + 1)^2}{2k} - \frac{1^2}{2k} = \frac{k}{2} + 1$
This is close to what we need. We can use expectation algebra to find the required expression:
$E(2X) = k + 2$
$\therefore E(2X - 2) = k$
So 2X − 2 is an unbiased estimator of k
You may be asked to demonstrate that the sample statistic forms a biased estimate for a particular distribution.
Worked example 4.4
A distribution is equally likely to take the values 1 or 3.
(a) Show that the variance of this distribution is 1.
(b) List the four equally likely outcomes when a sample of size two is taken from this population.
(c) Find the expected value of $S_2^2$ (sample variance for samples of size two) and comment on your result.
(a) $E(X) = 1 \times \frac{1}{2} + 3 \times \frac{1}{2} = 2$
$E(X^2) = 1^2 \times \frac{1}{2} + 3^2 \times \frac{1}{2} = 5$
$Var(X) = 5 - 2^2 = 1$
(b) Outcomes could be 1,1 or 1,3 or 3,1 or 3,3
For each sample of size two, we need to find its variance and its probability, and then find the expected value of the variances:
(c)
Sample    Probability    $\bar{x}$    $\overline{x^2}$    $S_n^2$
1,1       1/4            1            1                   0
1,3       1/4            2            5                   1
3,1       1/4            2            5                   1
3,3       1/4            3            9                   0
$E(S_2^2) = 0 \times \frac{1}{4} + 1 \times \frac{1}{4} + 1 \times \frac{1}{4} + 0 \times \frac{1}{4} = \frac{1}{2}$
This is not the same as the population variance, so $S_2^2$ is a biased estimator of σ²
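The enumeration in this worked example can be repeated by computer, which also makes it easy to compare the biased and unbiased versions of the sample variance. The following is a small sketch under the assumption that every ordered sample is equally likely; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

def variance_n(sample):
    """Sample variance with divisor n (the statistic S_n^2)."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((x - mean) ** 2 for x in sample) / n

def variance_n_minus_1(sample):
    """Sample variance rescaled by n / (n - 1)."""
    n = len(sample)
    return Fraction(n, n - 1) * variance_n(sample)

def expected_statistic(values, size, statistic):
    """Average a statistic over all equally likely ordered samples of the given size."""
    samples = list(product(values, repeat=size))
    return sum(statistic(s) for s in samples) / len(samples)

biased = expected_statistic([1, 3], 2, variance_n)
unbiased = expected_statistic([1, 3], 2, variance_n_minus_1)
print(biased, unbiased)
```

The first value reproduces E(S_2^2) from the table, while the second matches the population variance found in part (a), in line with Key Point 4.1.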
There may be more than one unbiased estimator of a population parameter. One important way to distinguish between them is efficiency. This is measured by the variance of the unbiased estimator. The smaller the variance, the more efficient the estimator is.
Worked example 4.5
(a) Show that for all values of c the statistic $cX_1 + (1 - c)X_2$ forms an unbiased estimate of the population mean of X.
(b) Find the value of c that maximises the efficiency of this estimator.
An estimator is unbiased if its expected value equals the population mean of X:
(a) $E(cX_1 + (1 - c)X_2) = cE(X_1) + (1 - c)E(X_2)$
$= c\mu + (1 - c)\mu$
$= c\mu + \mu - c\mu = \mu$
Therefore $cX_1 + (1 - c)X_2$ forms an unbiased estimator of μ for all values of c.
The most efficient estimator has the smallest variance:
(b) $Var(cX_1 + (1 - c)X_2) = c^2 Var(X_1) + (1 - c)^2 Var(X_2)$
$= c^2\sigma^2 + (1 - 2c + c^2)\sigma^2$
$= 2\sigma^2c^2 - 2\sigma^2c + \sigma^2$
This is minimised when $\frac{d}{dc}(2\sigma^2c^2 - 2\sigma^2c + \sigma^2) = 0$
$\Rightarrow 4\sigma^2c - 2\sigma^2 = 0$
$c = \frac{1}{2}$ if $\sigma^2 \ne 0$
So the most efficient estimator is when $c = \frac{1}{2}$
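As a quick numerical cross-check of part (b), the variance can be tabulated for a grid of values of c instead of differentiating. This sketch assumes independent observations with a common variance σ², so that only the factor c² + (1 − c)² matters; the grid of hundredths is an arbitrary choice.

```python
from fractions import Fraction

def variance_factor(c):
    """Var(c*X1 + (1 - c)*X2) divided by sigma^2, for independent X1 and X2."""
    return c ** 2 + (1 - c) ** 2

# Try c = 0, 0.01, 0.02, ..., 1 and keep the value giving the smallest variance.
candidates = [Fraction(i, 100) for i in range(101)]
best_c = min(candidates, key=variance_factor)
print(best_c, variance_factor(best_c))
```

A grid search only confirms the answer to within the grid spacing; the calculus argument above is what shows the minimum is exactly at one half.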
Exercise 4B
1. A bag contains 5 blue marbles and 3 red marbles. Two marbles are selected at random without replacement.
(a) Find the sampling distribution of P, the proportion of the sample which is blue.
(b) Show that P is an unbiased estimator of the population proportion of blue marbles.
[7 marks]
2. The continuous random variable X has probability distribution $f(x) = \frac{3x^2}{k^3}$, 0 < x < k.
(a) Find E(X).
(b) Hence find an unbiased estimator for k.
(c) A single observation of X is 7. Use your estimator to suggest a value for k.
[5 marks]
3. The random variable X can take values 1, 2 or 3.
(a) List all possible samples of size two.
(b) Show that the maximum of the sample forms a biased estimate of the maximum of the population.
(c) An unbiased estimator for the population maximum can be written in the form k × max, where max is the maximum of a sample of size two. Write down the value of k. [9 marks]
4. $X_1$, $X_2$ and $X_3$ are three independent observations of the random variable X which has mean μ and variance σ².
(a) Show that both $A = \frac{X_1 + 2X_2 + X_3}{4}$ and $B = \frac{X_1 + 2X_2 + 3X_3}{6}$ are unbiased estimators of μ.
(b) Show that A is a more efficient estimator than B. [7 marks]
5. Two independent random samples of observations containing $n_1$ and $n_2$ values respectively are made of a random variable, X, which has mean μ and variance σ². The means of the samples are denoted by $\bar{X}_1$ and $\bar{X}_2$.
(a) Show that $c\bar{X}_1 + (1 - c)\bar{X}_2$ is an unbiased estimator of μ.
(b) Show that the most efficient estimator of this form is $\frac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2}$.
[9 marks]
6. A biased coin has a probability p that it gives a tail when it is tossed. The random variable T is the number of tosses up to and including the second tail.
(a) State the distribution of T.
(b) Show that $P(T = t) = (t - 1)(1 - p)^{t-2} p^2$ for t ≥ 2.
(c) Hence show that $\frac{1}{T - 1}$ is an unbiased estimator of p.
[8 marks]
7. Two independent observations $X_1$ and $X_2$ are made of a continuous random variable with probability density function $f(x) = \frac{1}{k}$, 0 ≤ x ≤ k.
(a) Show that $X_1 + X_2$ forms an unbiased estimator of k.
(b) Find the cumulative distribution of X.
(c) Hence find the probability that both $X_1$ and $X_2$ are less than m where 0 ≤ m ≤ k.
(d) Find the probability distribution of M, the larger of $X_1$ and $X_2$.
(e) Show that $\frac{3}{2}M$ is an unbiased estimator of k.
(f) Find with justification which is the more efficient estimator of k: $X_1 + X_2$ or $\frac{3}{2}M$.
[21 marks]
4C Confidence interval for the population mean
A point estimate is a single value calculated from the sample and used to estimate a population parameter. However, this can be misleading as it does not give any idea of how certain we are of the value. We want to find an interval which has a specified probability of including the true population value of the parameter we are interested in. This interval is called a confidence interval and the specified probability is called the confidence level. All of the confidence intervals in the IB are symmetrical, meaning that the point estimate is at the centre of the interval. For example, given the data 1, 1, 3, 5, 5, 6 we can find the sample mean as 3.5. However, it is very unlikely that the mean of the population this sample was drawn from is exactly 3.5. We shall see in Section 4E a method that allows us to say with 95% confidence that the population mean is somewhere between 1.22 and 5.78.
We are first going to look at creating confidence intervals for the population mean μ when the population variance σ² is known. This is not a very realistic situation, but it is useful to develop the theory.
Suppose we are estimating μ using a sample statistic $\bar X$. As long as the random variable is normally distributed or the sample size is large enough for the central limit theorem to apply, we know that $\bar X \sim N\left(\mu, \frac{\sigma^2}{n}\right)$. We can use our knowledge of the normal distribution to find, in terms of μ and σ, a region symmetrical about μ which has a 95% probability of containing $\bar x$.
[Figure: normal curve of $\bar x$ centred on μ, with the central 95% lying between the lower bound and the upper bound and 2.5% in each tail.]
Is P(3 < X ) referring to a probability about X or a probability about 3?
Exam hint Your calculator can find confidence intervals from raw data or from summary statistics. See Calculator skills sheets C, D, G, and H.
Exam hint The Formula booklet does not tell you how to find z.
Using the method from the core syllabus we can find the Z-score of the upper bound. Using the symmetry of the situation we find that 2.5% of the distribution is above the upper bound, so the Z-score is $\Phi^{-1}(0.975) = 1.96$ (3SF). We can say that:
$P(-1.96 < Z < 1.96) = 0.95$
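Converting to a statement about $\bar x$, μ and σ:
$P\left(-1.96 < \frac{\bar x - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$
Rearranging to focus on μ:
$P\left(\bar x - \frac{1.96\sigma}{\sqrt{n}} < \mu < \bar x + \frac{1.96\sigma}{\sqrt{n}}\right) = 0.95$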
This looks like it is a statement about the probability of μ, but in our derivation we treated μ as a constant so it is meaningless to talk about a probability of μ. This statement is still concerned with the probability distribution of $\bar X$.
So our 95% confidence interval for μ based upon an observation of the sample mean is:
$\left[\bar x - 1.96\frac{\sigma}{\sqrt{n}},\ \bar x + 1.96\frac{\sigma}{\sqrt{n}}\right]$
We can say that 95% of such confidence intervals contain μ, rather than that the probability of μ being in the confidence interval is 95%.
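The "95% of such intervals contain μ" reading can be checked by simulation. The sketch below is only an illustration: the function name `coverage`, the population values μ = 50 and σ = 10, the sample size 25 and the random seed are all assumptions chosen for the demonstration, not values from the text.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, level=0.95, trials=5000, seed=1):
    """Fraction of known-variance confidence intervals that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        # Draw a fresh sample and build the interval around its mean.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        if x_bar - half_width < mu < x_bar + half_width:
            hits += 1
    return hits / trials

# Hypothetical population: mu = 50, sigma = 10, samples of size 25.
print(coverage(mu=50, sigma=10, n=25))
```

In each trial μ stays fixed and the interval moves with the sample mean, so the printed fraction is a statement about the intervals, not about μ. It should come out close to 0.95.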
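We can generalise this method to other confidence levels. To find a c% confidence interval we can find the critical Z-value geometrically by thinking about the graph.
[Figure: normal curve showing the central c% as two regions of c/2 % either side of the mean, with 50% of the distribution below the mean and the critical Z-value marked.]
From this diagram we can see that the critical Z-value is the one where there is a probability of $0.5 + \frac{1}{2} \cdot \frac{c}{100}$ being below it.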
Key Point 4.3
When the variance is known a c% confidence interval for μ is:
$\bar x - z\frac{\sigma}{\sqrt{n}} < \mu < \bar x + z\frac{\sigma}{\sqrt{n}}$
where $z = \Phi^{-1}\left(0.5 + \frac{1}{2} \cdot \frac{c}{100}\right)$
Worked example 4.6
The mass of fish in a pond is known to have standard deviation 150 g. The average mass of 96 fish from the pond is found to be 806 g.
(a) Find a 90% confidence interval for the average mass of all the fish in the pond.
(b) State, with a reason, whether or not you used the central limit theorem in your previous answer.
Find the Z-score associated with a 90% confidence interval
(a) With 90% confidence we need $z = \Phi^{-1}(0.95) = 1.64$
So the confidence interval is $806 \pm 1.64 \times \frac{150}{\sqrt{96}}$, which is [780.9, 831.1]
(b) We did need to use the central limit theorem as we were not told that the mass of fish is normally distributed.
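As a cross-check, the known-variance interval can be computed directly with Python's standard library. This is a minimal sketch; the helper name `z_interval` is an assumption made for illustration, and `NormalDist().inv_cdf` plays the role of $\Phi^{-1}$.

```python
from statistics import NormalDist

def z_interval(x_bar, sigma, n, level):
    """Confidence interval for the mean when the population sd is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # critical Z-value
    half_width = z * sigma / n ** 0.5
    return x_bar - half_width, x_bar + half_width

# Fish example: x_bar = 806 g, sigma = 150 g, n = 96, 90% confidence.
lower, upper = z_interval(806, 150, 96, 0.90)
print(round(lower, 1), round(upper, 1))
```

Because this uses the unrounded critical value (about 1.645) rather than 1.64, the end points can differ from the worked answer by about 0.1.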
You do not need to know the centre of the interval to find the width of the confidence interval.
Key Point 4.4
The width of a confidence interval is $2z\frac{\sigma}{\sqrt{n}}$.
Worked example 4.7
The results in a test are known to be normally distributed with a standard deviation of 20. How many people need to be tested to find an 80% confidence interval with a width of less than 5?
Find the Z-score associated with an 80% confidence interval
With 80% confidence we need $z = \Phi^{-1}(0.9) = 1.28$
Set up an inequality
$2 \times 1.28 \times \frac{20}{\sqrt{n}} < 5$
$\Rightarrow \frac{2 \times 1.28 \times 20}{5} < \sqrt{n}$
$\Rightarrow 104.9 < n$
So at least 105 people need to be tested.
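The same inequality can be solved numerically. The sketch below assumes a hypothetical helper `min_sample_size` and rearranges the width condition $2z\sigma/\sqrt{n} < w$ into $n > (2z\sigma/w)^2$.

```python
import math
from statistics import NormalDist

def min_sample_size(sigma, level, max_width):
    """Smallest n for which the interval width 2*z*sigma/sqrt(n) is below max_width."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # width < max_width  <=>  n > (2 * z * sigma / max_width) ** 2
    bound = (2 * z * sigma / max_width) ** 2
    return math.floor(bound) + 1  # smallest integer strictly above the bound

print(min_sample_size(sigma=20, level=0.80, max_width=5))
```

With the unrounded critical value z ≈ 1.2816 the bound is about 105.1, so this sketch returns 106 rather than 105. The difference comes entirely from rounding z to 1.28 before squaring, which shows how sensitive this kind of calculation is to early rounding.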
Exercise 4C
1. Find z for the following confidence levels:
(a) 80%
(b) 99%
2. Which of the following statements are true for 90% confidence intervals of the mean?
(a) There is a probability of 90% that the true mean is within the interval.
(b) If you were to repeat the sampling process 100 times, 90 of the intervals would contain the true mean.
(c) Once the interval has been created there is a 90% chance that the next sample mean will be within the interval.
(d) On average 90% of intervals created in this way contain the true mean.
(e) 90% of sample means will fall within this interval.
3. For a given sample, which will be larger: an 80% confidence interval for the mean or a 90% confidence interval for the mean?
4. Give an example of a statistic for which the confidence interval would not be symmetric about the sample statistic.
5. Find the required confidence interval for the population mean for the following summarised data. You may assume that the data are taken from a normal distribution with known variance.
(a) (i) $\bar x = 20$, $\sigma^2 = 14$, n = 8, 95% confidence interval
(ii) $\bar x = 42.1$, $\sigma^2 = 18.4$, n = 20, 80% confidence interval
(b) (i) $\bar x = 350$, σ = 105, n = 15, 90% confidence interval
(ii) $\bar x = 1.8$, σ = 14, n = 6, 99% confidence interval
(a) (i) (ii)
(b) (i) (ii)
(c) (i) (ii)
(d) (i) (ii)
(e) (i) (ii)
x 58.6 0.178
8 0.4
σ
8.2 0.01 4 1.2 18 25
12 0.01
6. Fill in the missing values in the table. You may assume that the data are taken from a normal distribution with known variance.
n Confidence level Lower bound of Upper bound of
interval
interval
4
90
12
80
4
900
95
88
100
75
400
90
39.44 30.30 115.59 1097.3 0.601 15.967
44.56 30.50 124.41 1102.7 8.601 16.033
14 80
0.539
0.403
7. The blood oxygen levels of an individual (measured in percent) are known to be normally distributed with a standard deviation of 3%. Based upon six readings Niamh finds that her blood oxygen levels are on average 88.2%. Find a 95% confidence interval for Niamh's true blood oxygen level. [5 marks]
8. The birth weight of male babies in a hospital is known to be normally distributed with variance 2 kg². Find a 90% confidence interval for the average birth weight, if a random sample of ten male babies has an average weight of 3.8 kg. [6 marks]
9. When a scientist measures the concentration of a solution, the measurement obtained may be assumed to be a normally distributed random variable with standard deviation 0.2.
(a) He makes 18 independent measurements of the concentration of a particular solution and correctly calculates the confidence interval for the true value as [43.908, 44.092]. Determine the confidence level of this interval.
(b) He is now given a different solution and is asked to determine a 90% confidence interval for its concentration. The confidence interval is required to have a width less than 0.05. Find the minimum number of measurements required. [8 marks]
10. A supermarket wishes to estimate the average amount single men spend on their shopping each week. It is known that the amount spent has a normal distribution with standard deviation €22.40. What is the smallest sample required so that the margin of error (the difference between the centre of the interval and the boundary) for an 80% confidence interval is less than €10? [5 marks]
11. The masses of bananas are investigated. The masses of a random sample of 100 of these bananas were measured and the average was found to be 168 g. From experience, it is known that the mass of a banana has variance 200 g².
(a) Find a 95% confidence interval for μ.
(b) State, with a reason, whether or not your answer requires the assumption that the masses are normally distributed. [6 marks]
12. A physicist wishes to find a confidence interval for the mean voltage of some batteries. He therefore randomly selects n batteries and measures their voltages. Based on his results, he obtains the 90% confidence interval [8.884 V, 8.916 V]. The voltages of batteries are known to be normally distributed with a standard deviation of 0.1 V.
(a) Find the value of n.
(b) Assuming that the same confidence interval had been obtained from measuring 49 batteries, what would be its level of confidence? [8 marks]
You will also need the T-score for hypothesis testing: see Section 5C.
4D The t-distribution
In the previous section, we based calculations on the assumption that the population variance was known, even though its mean was not. In reality we commonly need to estimate the population variance from the sample. In our calculations, we then need to use a new distribution instead of the normal distribution. It is called the t-distribution.
If the random variable X follows a normal distribution so that $X \sim N(\mu, \sigma^2)$, or if the CLT applies, the Z-score for the mean follows a standardised normal distribution:
$Z = \frac{\bar X_n - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
The parameters μ and σ may be unknown, but they are constant; they are the same every time a sample of X is taken.
When the true population standard deviation is unknown, we replace it with our best estimate: $s_{n-1}$. We then get the T-score:
$T = \frac{\bar X_n - \mu}{s_{n-1}/\sqrt{n}}$
The T-score is not normally distributed. The proof of this is beyond the scope of the course, but we can use intuition to suggest how it might be related to the normal distribution:
• The most probable value of T will be zero. As |T| increases, the probability decreases; so it is roughly the same shape as the normal distribution.
• If n is very large, our estimate of the population standard deviation should be very good, so T will be very close to a normal distribution.
• If n is very small, our estimate of the population standard deviation may not be very accurate. The probability of getting a Z-score above 3 or below −3 is very small indeed. However, if $s_{n-1}$ is smaller than σ it is possible that T is artificially increased relative to Z. This means that the probability of getting an extreme value of T (|T| > 3) is significant.
From this we can conclude that T follows a different distribution depending upon the value of n. This distribution is called the t-distribution and it depends only upon the value of n.
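The intuition above about extreme values can be illustrated with a small simulation. This is a sketch under stated assumptions: the function name `tail_fractions`, the standard normal population, the sample size n = 4 and the seed are all chosen for illustration only.

```python
import random
from statistics import fmean, stdev

def tail_fractions(n, trials=20000, seed=2, cutoff=3.0):
    """Estimate P(|Z| > cutoff) and P(|T| > cutoff) for samples of size n."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # assumed population parameters for the demonstration
    z_extreme = t_extreme = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        z = (x_bar - mu) / (sigma / n ** 0.5)          # uses the true sigma
        t = (x_bar - mu) / (stdev(sample) / n ** 0.5)  # uses s_{n-1} instead
        z_extreme += abs(z) > cutoff
        t_extreme += abs(t) > cutoff
    return z_extreme / trials, t_extreme / trials

z_frac, t_frac = tail_fractions(n=4)
print(z_frac, t_frac)
```

For a sample this small, the T-score should exceed 3 in absolute value far more often than the Z-score does, which is the heavier-tailed behaviour described in the third bullet.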
Key Point 4.5
$T = \frac{\bar X_n - \mu}{s_{n-1}/\sqrt{n}} \sim t_{n-1}$
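The actual formula for the probability density of $t_\nu$ is
$f(x) = \frac{(\nu - 1)(\nu - 3) \cdots 5 \times 3}{2\sqrt{\nu}\,(\nu - 2)(\nu - 4) \cdots 4 \times 2}\left(1 + \frac{x^2}{\nu}\right)^{-\frac{1}{2}(\nu + 1)}$ if ν is even and
$f(x) = \frac{(\nu - 1)(\nu - 3) \cdots 4 \times 2}{\pi\sqrt{\nu}\,(\nu - 2)(\nu - 4) \cdots 5 \times 3}\left(1 + \frac{x^2}{\nu}\right)^{-\frac{1}{2}(\nu + 1)}$ if ν is odd.
This relates to something called the gamma function and is not on the syllabus!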
The suffix is n − 1 because that describes the number of degrees of freedom once $s_{n-1}$ has been estimated. It is also given the symbol ν. It is nearly always one less than the total number of data items: ν = n − 1. The only exception to this is when testing for correlation in Section 6B.
The shapes of these distributions are shown.
[Figure: three panels comparing the normal distribution with $t_1$, $t_6$ and $t_{30}$.]
If n is large (n > 30) we have noted that the t-distribution is approximately the same as the normal distribution, but when n is small the t-distribution is distinct from the normal distribution. We still need to have $\bar X_n$ following a normal distribution, but with small n we can no longer apply the CLT. Therefore the t-distribution applies to a small sample mean only if the original distribution of X is normal.
There are two types of calculation you need to be able to do with the t-distribution:
•  Find the probability that T lies in a certain range.
• Given the cumulative probability P(T ≤ t), find the boundary value t.
If P(T ≤ t ) = p% then t is called the pth percentage point of
the distribution.
Exam hint
We can use a graphical calculator to find probabilities associated with the t-distribution.
See Calculator skills sheets A and B.
Worked example 4.8
Find the probability that −1 < T < 3 if n = 5.
ν = n − 1 = 4
P(−1 < T < 3) = 0.793 (3SF from GDC)
Worked example 4.9
If n = 8, find the value of t such that P(T < t) = 0.95.
ν = n − 1 = 7
95th percentage point of $t_7$ is 1.90 (3SF from GDC)
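Where no graphical calculator is to hand, both kinds of calculation can be approximated numerically. The sketch below is not how a GDC works internally; it is an illustrative standard-library approach that integrates the t density (written with the gamma function) by Simpson's rule and inverts the result by bisection. The names `t_pdf`, `t_cdf` and `t_quantile` are assumptions.

```python
import math

def t_pdf(x, nu):
    """Density of the t-distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

def t_cdf(x, nu, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, nu) + t_pdf(a, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, nu, lo=-100.0, hi=100.0):
    """Value t with P(T <= t) = p, found by bisection on the cdf."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# P(-1 < T < 3) with nu = 4, and the 95th percentage point with nu = 7.
print(round(t_cdf(3, 4) - t_cdf(-1, 4), 3))
print(round(t_quantile(0.95, 7), 3))
```

The first value agrees with the 0.793 quoted for ν = 4, and the second rounds to the 1.90 quoted for ν = 7.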
Exercise 4D
1. In each situation below, $T \sim t_\nu$. (Remember that ν = n − 1.) Find the following probabilities:
(a) (i) P(2 < T < 3) if n = 5
(ii) P(−1 < T < 1) if n = 8
(b) (i) P(T ≥ 5.1) if ν = 4
(ii) P(T ≥ 1.8) if ν = 6
(c) (i) P(T < 2.4) if n = 12
(ii) P(T < 0.2) if n = 16
(d) (i) P(|T| < 1.9) if n = 20
(ii) P(|T| > 2.6) if n = 17
2. How does P(2 < T < 3) change as n increases?
3. In each situation below, T ~ tν. Find the values of t:
(a) (i) P(T < t ) = 0.8 if n = 13 (ii) P(T < t ) = 0.15 if n = 9
(b) (i) P(T > t ) = 0.75 if n = 10 (ii) P(T > t ) = 0.3 if n = 20
(c) (i) P(|T| < t) = 0.6 if n = 14
(ii) P(|T| < t) = 0.4 if n = 11
4. If T ~ t7 solve the equation:
P(T > t ) + P(T > 0) + P(T > t ) + P(T > 2t ) = 1.75
[6 marks]
4E Confidence interval for a mean with unknown variance
When finding an estimate for the population mean we do not know the true population standard deviation; we estimate it from the sample. This means that the statistic $\frac{\bar x - \mu}{s_{n-1}/\sqrt{n}}$ does not follow the normal distribution, but rather the t-distribution (as long as X follows a normal distribution). Following a similar analysis to the one in Section 4C we get:
Key Point 4.6
When the variance is not known, the c% confidence interval for the population mean is given by:
$\bar x - t\frac{s_{n-1}}{\sqrt{n}} < \mu < \bar x + t\frac{s_{n-1}}{\sqrt{n}}$
where t is chosen so that $P(T_\nu < t) = 0.5 + \frac{1}{2} \cdot \frac{c}{100}$.
Exam hint The Formula booklet does not tell you how to find t.
Worked example 4.10
The sample {4, 4, 7, 9, 11} is drawn from a normal distribution. Find the 90% confidence interval for the mean of the population.
Find sample mean and unbiased estimate of σ
From GDC: $\bar x = 7$, $s_{n-1} \approx 3.08$
Find the number of degrees of freedom
ν = n − 1 = 4
Find the t-score associated with a 90% confidence interval when ν = 4
95th percentage point of $t_4$ is 2.132 (from GDC)
Apply formula
$7 - 2.132 \times \frac{3.08}{\sqrt{5}} < \mu < 7 + 2.132 \times \frac{3.08}{\sqrt{5}}$
∴ 4.06 < μ < 9.94 (3SF)
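The arithmetic in this example can be reproduced from the raw data. In the sketch below the helper name `t_interval` is an assumption, and the critical value 2.132 is simply taken from the worked example rather than computed.

```python
from statistics import fmean, stdev

def t_interval(data, t_crit):
    """Confidence interval for the mean using s_{n-1} and a supplied t critical value."""
    n = len(data)
    x_bar = fmean(data)
    s = stdev(data)  # statistics.stdev divides by n - 1, giving s_{n-1}
    half_width = t_crit * s / n ** 0.5
    return x_bar - half_width, x_bar + half_width

# 90% interval for the sample {4, 4, 7, 9, 11}; 2.132 is the 95th percentage point of t_4.
lower, upper = t_interval([4, 4, 7, 9, 11], t_crit=2.132)
print(round(lower, 2), round(upper, 2))  # 4.06 9.94
```

Passing the critical value in as an argument keeps the sketch short; it has to be looked up separately for each confidence level and number of degrees of freedom.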
We are often interested in the difference between two situations, such as "Are people more awake in the morning or afternoon?" or "Were the results better in the French or the Spanish examinations?" If we study two different groups to look at this, we risk any observed difference being due to differences between the groups rather than differences caused by the factor being studied. One way to avoid this is to use data which are naturally paired: the same person in the morning and afternoon, or the same person in the French and Spanish examinations. If we do this we can then simply look at the difference between the paired data and treat this as a single variable.
Worked example 4.11
Six people were asked to estimate the length of a line and the angle at a point. The percentage error in the two measurements was recorded, and it was assumed that the results followed a normal distribution. Find an 80% confidence interval for the average difference between the accuracy of estimating angles and lengths.
Person            A    B    C    D    E    F
Error in length   17   12   9    14   8    6
Error in angle    12   12   15   19   12   8
Define variables
Let d = error in angle − error in length
Person   A    B    C    D    E    F
d        −5   0    6    5    4    2
Find sample mean and unbiased estimate of σ
$\bar d = 2$, $s_{n-1} = 4.05$ (from GDC)
Find the number of degrees of freedom
ν = n − 1 = 5
Find the t-score associated with an 80% confidence interval when ν = 5
90th percentage point of $t_5$ is 1.476
Apply formula
$2 - 1.476 \times \frac{4.05}{\sqrt{6}} < \mu < 2 + 1.476 \times \frac{4.05}{\sqrt{6}}$
−0.440 < μ < 4.44
Exam hint Your calculator can find confidence intervals associated with both normal and t-distributions. In your answer, you need to make it clear which distribution and which data you are using. In the above example, you would need to show the table, the values of $\bar d$ and $s_{n-1}$, state that you are using the t-distribution with ν = 5 and then write down the confidence interval.
Exercise 4E
1. Find the required confidence interval for the population mean for the following data, some of which have been summarised. You may assume that the data are taken from a normal distribution.
(a) (i) $\bar x = 14.1$, $s_{n-1} = 3.4$, n = 15, 85% confidence interval
(ii) $\bar x = 191$, $s_{n-1} = 12.4$, n = 100, 80% confidence interval
(b) (i) $\bar x = 18$, $s_n = 2.7$, n = 10, 95% confidence interval
(ii) $\bar x = 0.04$, $s_n = 0.01$, n = 4, 75% confidence interval
(c) (i) $\sum_{i=1}^{15} x_i = 32$, $\sum_{i=1}^{15} x_i^2 = 1200$, 75% confidence interval
(ii) $\sum_{i=1}^{20} x_i = 18$, $\sum_{i=1}^{20} x_i^2 = 650$, 90% confidence interval
(d) (i) x = {1, 1, 3, 5, 12, 20}, 95% confidence interval
(ii) x = {150, 210, 130, 96, 209}, 90% confidence interval
2. Find the required confidence intervals for the average difference (after − before) for the data below, given that the data are normally distributed.
(a) 95% confidence interval
Subject   A     B     C     D     E
Before    16    20    20    16    12
After     18    24    18    16    16
(b) 99% confidence interval
Subject   A     B     C     D     E     F
Before    4.2   6.5   9.2   8.1   6.6   7.1
After     5.3   5.5   8.3   9.0   6.1   7.0
3. The times taken for a group of children to complete a race are recorded:
t (minutes)    Number of children
8 ≤ t < 12     9
12 ≤ t < 14    18
14 ≤ t < 16    16
16 ≤ t < 20    20
Assuming that these children are drawn from a random sample of all children, calculate:
(a) An unbiased estimate of the mean time taken by a child in the race.
(b) An unbiased estimate of the variance of the time taken.
(c) A 90% confidence interval for the mean time taken. [7 marks]
4. Four pupils took a Spanish test before and after a trip to Mexico. Their scores are shown in the table.
          Before trip   After trip
Amir      12            15
Barbara   9             12
Chris     16            17
Delroy    18            18
Find a 90% confidence interval for the average increase in scores after the trip. [4 marks]
5. A garden contains many rose bushes. A random sample of eight bushes is taken and the heights in centimetres were measured and the data were summarised as:
$\sum x = 943$, $\sum x^2 = 113005$
(a) State an assumption that is necessary to find a confidence interval for the mean height of rose bushes.
(b) Find the sample mean.
(c) Find an unbiased estimate for the population standard deviation.
(d) Find an 80% confidence interval for the mean height of rose bushes in the garden. [9 marks]
6. The mass of four steaks (in grams) before and after being cooked for one minute is measured.
Steak            A     B     C     D
Before cooking   148   167   160   142
After cooking    124   135   134   x
A confidence interval for the mean mass loss was found to include values from 21.5 g to 31.0 g.
(a) Find the value of x.
(b) Find the confidence level of this interval. [10 marks]
7. A sample of 3 randomly selected students are found to have a variance of 1.44 hours² in the amount of time they watch television each weekday. Based upon this sample the confidence interval for the mean time a student spends watching television is calculated as [3.66, 7.54].
(a) Find the mean time spent watching television.
(b) Find the confidence level of the interval. 
[8 marks]
8. The random variable X is normally distributed with mean μ. A random sample of 16 observations is taken on X, and it is found that:
$\sum_{i=1}^{16} (x_i - \bar x)^2 = 984.15$
A confidence interval [40.88, 46.72] is calculated for this sample. Find the confidence level for this interval. [8 marks]
9. The lifetime of a printer cartridge, measured in pages, is believed to be approximately normally distributed. The lifetimes of 5 randomly chosen print cartridges were measured and the results were:
120, 480, 370, 650, x
A confidence interval for the mean was found to be [218, 510].
(a) Find the value of x .
(b) What is the confidence level of this interval?  [8 marks]
10. The temperature of a block of wood 3 minutes after being lifted out of liquid nitrogen is measured and then the experiment is repeated. The results are 1.2 °C and 4.8 °C.
(a) Assuming that the temperatures are normally distributed find a 95% confidence interval for the mean temperature of a block of wood 3 minutes after being lifted out of liquid nitrogen.
(b) A different block of wood is subjected to the same experiment and the results are 0 °C and x °C where x > 0. Prove that the two confidence intervals overlap for all values of x. [12 marks]
11. In a random sample of three pupils, xi is the mark of the ith pupil in a test on volcanoes and yi is the mark of the ith pupil in a test on glaciers. All three pupils sit both tests.
(a) Show that $\overline{y - x}$ is always the same as $\bar y - \bar x$.
(b) Give an example to show that the variance of y − x is not necessarily the same as the difference between the variance of y and the variance of x.
(c) It is believed that the difference between the results in these two tests follows a normal distribution with variance 16 marks. If the mean mark of the volcano test was 23 and the mean mark for the glacier test was 30, find a 95% confidence interval for the improvement in marks from the volcano test to the glacier test. [10 marks]
Summary
• An unbiased estimator of a population parameter has an expectation equal to the population parameter: if a is a parameter of a population then the sample statistic  is an unbiased estimator of a if E(Â) = a. This means that if samples are taken many times and the sample statistic calculated each time, the average of these values tends towards the true population statistic.
• The sample mean ($\bar X$) is an unbiased estimator of the population mean μ.
• The sample variance ($s_n^2$) is a biased estimator of the population variance (σ²), but the value $s_{n-1}^2 = \frac{n}{n-1} s_n^2$ is an unbiased estimate.
• $s_{n-1}$ is not an unbiased estimator of the standard deviation, but it is often a very good approximation.
• There may be more than one unbiased estimator of a population parameter. The efficiency of an unbiased estimator is measured by its variance; the smaller the variance, the more efficient the estimator is.
• If X follows a normal distribution with mean μ and unknown variance, and if a random sample of n independent observations of X is taken, then it is useful to calculate the T-score: $T = \frac{\bar X_n - \mu}{s_{n-1}/\sqrt{n}}$. This follows a $t_{n-1}$ distribution.
• Rather than estimating a population parameter using a single number (a point estimate), we can provide an interval (called the confidence interval) that has a specified probability (called the confidence level) of including the true population value of the statistic we are interested in:
- The width of a confidence interval is $2z\frac{\sigma}{\sqrt{n}}$.
- If the true population variance is known and the sample mean follows a normal distribution then the c% confidence interval takes the form $\bar x - z\frac{\sigma}{\sqrt{n}} < \mu < \bar x + z\frac{\sigma}{\sqrt{n}}$, where $z = \Phi^{-1}\left(0.5 + \frac{1}{2} \cdot \frac{c}{100}\right)$.
- If the true population variance is unknown and the population follows a normal distribution then the c% confidence interval takes the form $\bar x - t\frac{s_{n-1}}{\sqrt{n}} < \mu < \bar x + t\frac{s_{n-1}}{\sqrt{n}}$, where t is chosen so that $P(T_\nu < t) = 0.5 + \frac{1}{2} \cdot \frac{c}{100}$.
Mixed examination practice 4
1. The mass of a sample of 10 eggs is recorded and the results in grams are:
62, 57, 84, 92, 77, 68, 59, 80, 81, 72
Assuming that these masses form a random sample from a normal population, calculate:
(a) Unbiased estimates of the mean and variance of this population.
(b) A 90% confidence interval for the mean. 
[6 marks]
2. From experience we know that the variance in the increase between marks in a beginning of year test and an end of year test is 64. A random sample of four students was selected and the results in the two tests were recorded.
                    Alma   Brenda   Ciaron   Dominique
Beginning of year   98     62       88       82
End of year         124    92       120      116
Find a 95% confidence interval for the mean increase in marks from the beginning of year to the end of year. [5 marks]
3. The time (t) taken for a mechanic to replace a set of brake pads on a car is recorded. In a week she changes 14 tyres and $\sum t = 308$ minutes and $\sum t^2 = 7672$ minutes². Assuming that the times are normally distributed, calculate a 98% confidence interval for the mean time taken for the mechanic to replace a set of brake pads. [7 marks]
4. A distribution is equally likely to take the values 1 or 4. Show that $s_{n-1}$ forms a biased estimator of σ. [8 marks]
5. The random variable X is normally distributed with mean μ and standard deviation 2.5. A random sample of 25 observations of X gave the result $\sum x = 315$.
(a) Find a 90% confidence interval for μ.
(b) It is believed that P(X ≤ 14) = 0.55. Determine whether or not this is consistent with your confidence interval for μ. [12 marks]
(© IB Organization 2006)
6. The proportion of fish in a lake which are below a certain size can be estimated by catching a random sample of the fish. The random variable $X_1$ is the number of fish in a sample of size $n_1$ which are below the specified size.
(a) Show that $P_1 = \frac{X_1}{n_1}$ is an unbiased estimator of p.
(b) Find the variance of $P_1$.
A further sample of size $n_2$ is taken and the random variable $X_2$ is the number of undersized fish in this sample. Define $P_2 = \frac{X_2}{n_2}$.
(c) Show that $P_T = \frac{1}{2}(P_1 + P_2)$ is also an unbiased estimator of p.
(d) For what values of $\frac{n_1}{n_2}$ is $P_T$ a more efficient estimator than either of $P_1$ or $P_2$? [15 marks]
7. A discrete random variable, X, takes values 0, 1, 2 with probabilities 1 − 2α, α, α respectively, where α is an unknown constant, $0 \le \alpha \le \frac{1}{2}$. A random sample of n observations is made of X. Two estimators are proposed for α. The first is $\frac{1}{3}\bar X$, and the second is $\frac{1}{2}Y$ where Y is the proportion of observations in the sample which are not equal to 0.
(a) Show that $\frac{1}{3}\bar X$ and $\frac{1}{2}Y$ are both unbiased estimators of α.
(b) Show that $\frac{1}{2}Y$ is the more efficient estimator. [13 marks]
5 Hypothesis testing
If you toss a coin 100 times and get 50 heads, you cannot say that the coin was biased; equally if 52 or 56 heads were observed, you would still not be suspicious. However, if there were 90 heads you would probably conclude that the coin was biased. So how many heads would be enough to decide that the coin is really biased? This type of question occurs frequently in real situations: a result may not be exactly what you would expect, but with random variation, results rarely are. You have to decide if the evidence is significant enough to change from the default position; this is called a hypothesis test.
In this chapter you will learn:
• how to find out if a mean has changed significantly when the variance is known (a Z-test)
• how to find out if a mean has changed significantly when the variance has been estimated (a t-test)
• about the types of error associated with these decisions.
5A The principle of hypothesis testing
The basic principle of hypothesis testing is "innocent until proven guilty beyond reasonable doubt". We start from a fallback position which we will accept if there is no significant evidence against it; this is called the null hypothesis, H0. We will compare this against our suspicion of how things might be; this is called the alternative hypothesis, H1.
There are two philosophies for using data to make decisions. Hypothesis testing is one approach but there is increasing support for another method called Bayesian statistics.
Worked example 5.1
The labels on cans of soup claim that a can contains 350 ml of soup. A consumer believes that on average, they contain less than 350 ml. State the null and alternative hypotheses.
Define variables
µ = mean amount of soup in a tin
Decide which is the conservative position
H0 : µ = 350
Decide in which direction suspicion lies
H1 : µ < 350
Generally the null hypothesis is written as an equality while the alternative hypothesis is written as an inequality. If the alternative hypothesis is only looking for a change in one direction ( > or < ) it is called a one-tailed test. If the alternative hypothesis is looking for a change in either direction ( ≠ ) it is called a two-tailed test.
Exam hint This is the method used by your GDC. See Calculator skills sheets E, F and I.
The main importance of this method is that it can be used to calculate the probabilities of different types of error. See Section 5E.
We must now come up with a way of deciding whether or not the information gathered is significant. In the example of tossing 100 coins it is possible that a fair coin comes down heads 90 times by chance. Based upon this outcome we cannot say with certainty that the coin is biased. However, we can say that this outcome is extremely unlikely while the coin is fair. Before performing the hypothesis test you must decide exactly how unlikely an outcome must be to reject H0 ; this is called the significance level.
We can now outline the general procedure for hypothesis testing.
Key Point 5.1
1. Write down H0 and H1.
2. Decide on the significance level.
3. Decide what statistic you are going to calculate, called the test statistic.
4. Find the distribution of this statistic assuming that H0 is true.
5. Calculate the test statistic from the sample.
6. Decide whether the test statistic is sufficiently unlikely.
7. Determine the outcome of the test and interpret it in the context of the question.
The hardest stage in this process is usually stage 6. This can be done in one of two ways, the p-value or the critical region, both of which have their advantages:
The p-value method involves finding the probability of the observed test statistic, or more extreme, occurring when H0 is true. So for example, if you were testing against µ > 100 and you observed a mean of 110 you would find the probability of the mean being greater than or equal to 110 rather than just 110. If you were testing against µ ≠ 100 and you observed a mean of 110 you would find the probability of the mean being equal to or above 110 or equal to or below 90 (as this is the same distance away from the mean in the opposite direction). If this p-value is less than the significance level we reject H0.
The critical region method finds all the values the test statistic could take so that H0 is rejected: all the values which have a p-value less than the significance level. The values which result in H0 being rejected form the critical or rejection regions and they have a total probability equal to the significance level. All other values lie in the acceptance region. The boundary between the two regions is called the critical value.
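The critical region idea can be made concrete for a single observation from a normal distribution. The sketch below is illustrative only: the helper name is an assumption, and so is the standard deviation σ = 5, since the text's example of testing against μ ≠ 100 with an observation of 110 does not state one.

```python
from statistics import NormalDist

def two_tailed_critical_values(mu0, sigma, alpha):
    """Boundaries of the rejection regions for H0: mu = mu0 against H1: mu != mu0."""
    dist = NormalDist(mu0, sigma)  # distribution of the test statistic under H0
    # Each tail carries half of the significance level.
    return dist.inv_cdf(alpha / 2), dist.inv_cdf(1 - alpha / 2)

# Assumed values: H0 mean 100, sigma 5 (hypothetical), 5% significance level.
lower, upper = two_tailed_critical_values(mu0=100, sigma=5, alpha=0.05)
print(round(lower, 1), round(upper, 1))  # 90.2 109.8

observed = 110
print(observed < lower or observed > upper)  # True: the observation lies in the critical region
```

Under these assumed values an observation of 110 falls just inside the upper rejection region, so H0 would be rejected at the 5% level.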
Once we have collected data we can look at what region it falls in and decide what conclusion to make. There are two standard conclusions:
1. reject H0
2. do not reject H0.
In this first example we use the p-value method.
In section 5E we see that we cannot really accept H0 as we do not know how likely we are to be drawing a false conclusion. We are controlling the probability of falsely rejecting H0 and our conclusions should reflect this.
Worked example 5.2
It is believed that the normal level of testosterone in blood is normally distributed with mean 24 nmol/l and standard deviation 6 nmol/l. Following a race a sprinter gives a sample with 34 nmol/l. Is this sufficiently different (at 5% significance) to suggest that the sprinter's sample is being drawn from a population with a different level of blood testosterone?
Define variables:
X = crv level of blood testosterone in a sprinter
X ~ N(µ, 6²)
Decide which is the conservative position:
H0 : µ = 24
Decide in which direction suspicion lies:
H1 : µ ≠ 24. Therefore a two-tailed test.
State distribution of X under H0:
Under H0, X ~ N(24, 6²)
Find the p-value, remembering that it includes everything further away from the mean than 34 in the direction of H1:
p-value = P(X ≥ 34) + P(X ≤ 14)
= 0.0478 + 0.0478 = 0.0956
Draw conclusion:
This p-value is greater than 0.05, so we do not reject H0. There is not sufficient evidence to suggest that the level is different.
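The p-value calculation above can be checked without a GDC. The following is a minimal sketch in Python, assuming a single observation from a normal distribution with known standard deviation; the function name `p_value` and its `alternative` argument are illustrative choices, not IB notation.

```python
from statistics import NormalDist

def p_value(observed, mu0, sigma, alternative):
    """p-value for one observation, with X ~ N(mu0, sigma^2) under H0.

    alternative: "greater" (H1: mu > mu0), "less" (H1: mu < mu0)
    or "two-sided" (H1: mu != mu0).
    """
    null = NormalDist(mu0, sigma)
    lower = null.cdf(observed)       # P(X <= observed)
    upper = 1 - lower                # P(X >= observed)
    if alternative == "greater":
        return upper
    if alternative == "less":
        return lower
    # Two-sided: the normal curve is symmetric, so double the smaller tail.
    return 2 * min(lower, upper)

# Worked example 5.2: observed 34, H0: mu = 24, sigma = 6, two-tailed.
p = p_value(34, mu0=24, sigma=6, alternative="two-sided")
print(round(p, 4))
print("reject H0" if p < 0.05 else "do not reject H0")
```

The same function covers one-tailed tests by passing `"greater"` or `"less"`.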
Exam hint
Notice that the hypotheses concern the underlying population parameter µ. You do not need to define conventional terms like µ (being the population mean) since it is within the IB's list of accepted notation.
We can also apply the p-value method to one-tailed tests.
Worked example 5.3
According to a geography textbook the average volume of raindrops globally is normally distributed with variance 0.01 ml² and mean 0.4 ml. Misha believes that the volume of raindrops in Brazil is significantly larger than the global average. He measures the volume of a raindrop and finds that it is 0.6 ml. Test at the 5% significance level whether or not his suspicion is correct.
Define variables:
X = crv volume of a raindrop in ml
X ~ N(µ, 0.01)
Decide which is the conservative position:
H0 : µ = 0.4
Decide in which direction suspicion lies:
H1 : µ > 0.4. Therefore a one-tailed test.
State distribution of X under H0:
Under H0, X ~ N(0.4, 0.01)
Find the p-value, remembering that it includes everything further away from the mean than 0.6 in the direction of H1:
p-value = P(X ≥ 0.6)
= 0.0228
Draw conclusion:
This p-value is less than 0.05, so we reject H0. There is evidence that Misha's suspicion is correct.
You may prefer to use the critical region method, and some questions may require you to use it.
Worked example 5.4
A machine produces screws which have a mean length of 6 cm and a standard deviation of 0.2 cm. The controls are knocked and it is believed that the mean length may have changed while the standard deviation stays the same. A single screw is measured. Find the critical region at the 5% significance level.
Define variables:
X = crv length of a screw
X ~ N(µ, 0.2²)
Decide which is the conservative position:
H0 : µ = 6
Decide in which direction suspicion lies:
H1 : µ ≠ 6. Therefore a two-tailed test.
State distribution of X under H0:
Under H0, X ~ N(6, 0.2²)
Find the x-values for the critical region. For a two-tailed test with 5% significance level, the probability in each tail is 2.5%:
[Diagram: normal curve centred at x = 6 with 2.5% shaded in each tail, below a and above b.]
P(X < a) = 0.025 ⇒ a = 5.61
P(X > b) = 0.025 ⇒ b = 6.39
State the critical region:
The critical region is X < 5.61 or X > 6.39 (3SF)
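The critical values in this example come from the inverse normal distribution. A minimal Python sketch, assuming a single observation and a known standard deviation (the function name is a hypothetical one chosen for illustration):

```python
from statistics import NormalDist

def critical_values_two_tailed(mu0, sigma, alpha):
    """Return (a, b) with P(X < a) = P(X > b) = alpha / 2 under H0."""
    null = NormalDist(mu0, sigma)
    a = null.inv_cdf(alpha / 2)        # lower critical value
    b = null.inv_cdf(1 - alpha / 2)    # upper critical value
    return a, b

# Worked example 5.4: H0: mu = 6, sigma = 0.2, 5% significance.
a, b = critical_values_two_tailed(6, 0.2, 0.05)
print(f"critical region: X < {a:.3g} or X > {b:.3g}")
```

Any observed length below `a` or above `b` would lead to H0 being rejected.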
You may have noticed that the method in Worked example 5.4 was very similar to the method used in confidence intervals. It is indeed the case that the boundaries for a c% confidence interval correspond to the critical values for a two-tailed hypothesis test at (100 − c)% significance. Unfortunately, we cannot apply the methods from confidence intervals to one-tailed tests. However, we can still use the critical region method.
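One way to see the correspondence is to write it out for a single observation x from N(µ, σ²) with σ known. In this sketch, z is notation introduced here for the standard normal value that leaves (100 − c)/2 % in each tail.

```latex
\begin{align*}
\text{do not reject } H_0 : \mu = \mu_0
  &\iff \mu_0 - z\sigma \le x \le \mu_0 + z\sigma \\
  &\iff x - z\sigma \le \mu_0 \le x + z\sigma \\
  &\iff \mu_0 \text{ lies in the } c\% \text{ confidence interval } [\,x - z\sigma,\; x + z\sigma\,].
\end{align*}
```

Both the acceptance region and the confidence interval have half-width zσ; one is centred on µ0 and the other on the observed value.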
Worked example 5.5
The reaction time in catching a falling rod is believed to be normally distributed with mean 0.9 seconds and standard deviation 0.2 seconds. Xinyi believes that her reaction times are faster than this.
(a) Find the critical region at the 5% significance level to test Xinyi's claim.
(b) In a test Xinyi catches the rod after 0.6 seconds. State the conclusion to your hypothesis test.
Define variables:
(a) X = crv reaction time
X ~ N(µ, 0.2²)
Decide which is the conservative position:
H0 : µ = 0.9
Decide in which direction suspicion lies:
H1 : µ < 0.9. Therefore a one-tailed test.
State distribution of X under H0:
Under H0, X ~ N(0.9, 0.2²)
Find the x-value associated with 5% significance and a one-tailed test:
[Diagram: normal curve centred at x = 0.9 with the lower tail below a shaded.]
P(X < a) = 0.05
⇒ a = 0.571 (3SF, from GDC)
The critical region is X < 0.571
If the observed value is in the critical region, there is evidence to reject H0:
(b) The observed value falls into the acceptance region, therefore do not reject H0; there is no significant evidence for Xinyi's claim.
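The one-tailed version needs only a single critical value, at the end of the distribution that would support H1. A short Python sketch of both parts, under the same assumptions as the worked example (variable names are illustrative):

```python
from statistics import NormalDist

# Worked example 5.5: H0: mu = 0.9, H1: mu < 0.9, sigma = 0.2.
null = NormalDist(mu=0.9, sigma=0.2)
alpha = 0.05

# H1 points to small values, so all 5% goes in the lower tail.
critical_value = null.inv_cdf(alpha)

observed = 0.6
in_critical_region = observed < critical_value

print(f"critical region: X < {critical_value:.3g}")
print("reject H0" if in_critical_region else "do not reject H0")
```

For an alternative of the form µ > µ0 the critical value would instead be `null.inv_cdf(1 - alpha)`, with rejection for observations above it.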
Exam hint
If you are not sure which end to label as the rejection region in a one-tailed test, think about what values would encourage you to accept the alternative hypothesis.
Exercise 5A
1. Write null and alternative hypotheses for each of the following situations:
(a) (i) The average IQ in a school (μ) over a long period of time has been 102. It is thought that changing the menu in the cafeteria might have an effect upon the average IQ.
(ii) It is claimed that the average size of photos created by a camera (μ) is 1.2 Mb. A computer scientist believes that this figure is inaccurate.
(b) (i) A consumer believes that steaks sold in portions of 250 g are on average underweight.
(ii) A careers adviser believes that the average extra amount earned by people with a degree is more than the $150 000 figure he has been told at a seminar.
(c) (i) The mean breaking tension of a brake cable (μT) does not normally exceed 3000 N. A new brand claims that it regularly does exceed this value.
(ii) The average time taken to match a fingerprint (μt) is normally more than 28 minutes. A new computer program claims to be able to do better.
(d) (i) The fraction (p) of toffees in a box of chocolates is advertised as being 1/3, but Jason thinks that it is more than this.
(ii) The standard deviation (σ) of measurements of the temperature of meat is thought to have decreased from its previous value of 0.5 °C.
2. If it is observed that x = 10, find the p-value for each of the following hypotheses, and hence decide the outcome of the hypothesis test at the 5% significance level.
(a) (i) X ~ N(µ, 1/4); H0 : µ = 10.8; H1 : µ ≠ 10.8
(ii) X ~ N(µ, 5); H0 : µ = 15; H1 : µ ≠ 15
(b) (i) X ~ N(µ, 7); H0 : µ = 4; H1 : µ > 4
(ii) X ~ N(µ, 400); H0 : µ = 40; H1 : µ < 40
3. Find the acceptance region for each of the following hypothesis tests when a single value is observed.
(a) (i) X ~ N(µ, 5²); H0 : µ = 2; H1 : µ ≠ 2; 5% significance
(ii) X ~ N(µ, 12²); H0 : µ = 16; H1 : µ ≠ 16; 5% significance
(b) (i) X ~ N(µ, 5²); H0 : µ = 2; H1 : µ ≠ 2; 1% significance
(ii) X ~ N(µ, 12²); H0 : µ = 16; H1 : µ ≠ 16; 10% significance
(c) (i) X ~ N(µ, 5²); H0 : µ = 2; H1 : µ > 2; 5% significance
(ii) X ~ N(µ, 12²); H0 : µ = 16; H1 : µ > 16; 5% significance
(d) (i) X ~ N(µ, 5²); H0 : µ = 2; H1 : µ < 2; 5% significance
(ii) X ~ N(µ, 12²); H0 : µ = 16; H1 : µ < 16; 5% significance
(e) (i) X ~ N(µ, 16); H0 : µ = 5; H1 : µ > 5; 1% significance
(ii) X ~ N(µ, 100); H0 : µ = 18; H1 : µ < 18; 10% significance
4. The null hypothesis µ = 30 is tested and a value X = 35 is observed. Will it have a greater p-value if the alternative hypothesis is µ ≠ 30 or µ > 30 ?
5. A commuter conducts a study and he claims that the average
time taken for a train to complete a journey (t) is above
32 minutes. The correct null hypothesis for this is µt = 32. What further information would you need before you could
write down the alternative hypothesis?
5B Hypothesis testing for a mean with known variance
One very common and useful parameter whose sample distribution is often known is the mean. If the random variable is normally distributed, or if the mean is taken from a sample large enough to use the CLT, then the sample mean follows a normal distribution. More specifically, if E(X) = µ and Var(X) = σ² and either of the above conditions is satisfied then $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$. As long as σ is known we could either use $\bar{x}$ or the Z-score as the test statistic. This is called a Z-test.
If we are given an observed value of $\bar{X}$ we can use it as the test statistic and find the p-value.
Exam hint
If you have the information for the null hypothesis and the observed value of the test statistic then you can use your GDC to perform a Z-test. It will return a p-value which you then need to compare to the significance level. You should state the test statistic and its distribution, as this shows that you have used the correct test. See Calculator skills sheet I.
Worked example 5.6
Standard light bulbs have an average lifetime of 800 hours and a standard deviation of 100 hours. A manufacturer of low energy light bulbs claims that their bulbs' lifetimes have the same standard deviation but that they last longer. A sample of 50 low energy light bulbs has an average lifetime of 829.4 hours. Test the manufacturer's claim at the 5% significance level.
Define variables:
X = crv lifetime of a bulb
X ~ N(µ, 100²)
State hypotheses:
H0 : µ = 800
H1 : µ > 800
State the test statistic and its distribution:
$\bar{X} \sim N\left(800, \frac{100^2}{50}\right)$
Use the calculator to find the p-value:
p-value = $P(\bar{X} \ge 829.4)$
= 0.0188 (3SF, from GDC)
Compare to significance level and conclude:
0.0188 < 0.05
Therefore reject H0; there is evidence to support the manufacturer's claim.
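The GDC's Z-test can be mirrored in a few lines. This is a minimal sketch, assuming a known population standard deviation and an upper-tailed alternative; `z_test_upper` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def z_test_upper(sample_mean, mu0, sigma, n):
    """Z-score and p-value for H0: mu = mu0 against H1: mu > mu0."""
    standard_error = sigma / sqrt(n)           # sd of the sample mean
    z = (sample_mean - mu0) / standard_error   # Z-score of the observed mean
    p = 1 - NormalDist().cdf(z)                # P(Z >= z) under H0
    return z, p

# Worked example 5.6: 50 bulbs with mean lifetime 829.4 hours.
z, p = z_test_upper(829.4, mu0=800, sigma=100, n=50)
print(f"z = {z:.3f}, p-value = {p:.4f}")
print("reject H0" if p < 0.05 else "do not reject H0")
```

Using the Z-score or the sample mean itself as the test statistic gives the same p-value, since one is a linear rescaling of the other.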
We can also find the critical region for a Z-test by using the inverse normal distribution.
Worked example 5.7
The temperature of a water bath is normally distributed with a mean of 60 °C and a standard deviation of 1 °C. After being serviced it is assumed that the standard deviation is unchanged. The temperature is measured on 5 independent occasions and a test is performed at the 5% significance level to see if the temperature has changed from 60 °C. What range of mean temperatures would result in accepting that the temperature has changed?
Define variables:
X = crv temperature of water bath
X ~ N(µ, 1)
State hypotheses:
H0 : µ = 60
H1 : µ ≠ 60
State test statistic and its distribution:
$\bar{X} \sim N\left(60, \frac{1}{5}\right)$
Use inverse normal distribution to find the critical values of $\bar{X}$ for the two-tailed region:
[Diagram: normal curve centred at x = 60 with 2.5% shaded in each tail, below a and above b.]
$P(\bar{X} < a) = 0.025 \Rightarrow a = 59.1$
$P(\bar{X} > b) = 0.025 \Leftrightarrow P(\bar{X} < b) = 0.975$
$\Rightarrow b = 60.9$
Write down the rejection region:
$\therefore \bar{X} < 59.1$ or $\bar{X} > 60.9$
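The only change from the single-observation case is that the standard deviation of the sample mean, σ/√n, replaces σ. A hedged Python sketch (the function name is illustrative):

```python
from math import sqrt
from statistics import NormalDist

def rejection_bounds_for_mean(mu0, sigma, n, alpha):
    """Two-tailed critical values (a, b) for the sample mean under H0."""
    sampling = NormalDist(mu0, sigma / sqrt(n))   # distribution of the mean
    return sampling.inv_cdf(alpha / 2), sampling.inv_cdf(1 - alpha / 2)

# Worked example 5.7: H0: mu = 60, sigma = 1, n = 5, 5% significance.
a, b = rejection_bounds_for_mean(60, 1, 5, 0.05)
print(f"reject H0 if the sample mean is below {a:.3g} or above {b:.3g}")
```

Increasing n narrows the interval between `a` and `b`, so smaller departures from 60 °C become detectable.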
Exercise 5B
1. In each of the following situations it is believed that X ~ N(µ, 100). Find the acceptance region in each of the following cases:
(a) (i) H0 : µ = 60; H1 : µ ≠ 60; 5% significance; n = 16
(ii) H0 : µ = 120; H1 : µ ≠ 120; 10% significance; n = 30
(b) (i) H0 : µ = 80; H1 : µ > 80; 1% significance; n = 18
(ii) H0 : µ = 750; H1 : µ > 750; 2% significance; n = 45
(c) (i) H0 : µ = 80.4; H1 : µ < 80.4; 10% significance; n = 120
(ii) H0 : µ = 93; H1 : µ < 93; 5% significance; n = 400
2. In each of the following situations it is believed that X ~ N(µ, 400). Find the p-value of the observed sample mean. Hence decide the result of the test if it is conducted at the 5% significance level:
(a) (i) H0 : µ = 85; H1 : µ ≠ 85; n = 16; $\bar{x}$ = 95
(ii) H0 : µ = 144; H1 : µ ≠ 144; n = 40; $\bar{x}$ = 150
(b) (i) H0 : µ = 85; H1 : µ > 85; n = 16; $\bar{x}$ = 95
(ii) H0 : µ = 144; H1 : µ > 144; n = 40; $\bar{x}$ = 150
(c) (i) H0 : µ = 265; H1 : µ < 265; n = 14; $\bar{x}$ = 256.8
(ii) H0 : µ = 377; H1 : µ < 377; n = 100; $\bar{x}$ = 374.9
(d) (i) H0 : µ = 95; H1 : µ < 95; n = 12; $\bar{x}$ = 96.4
(ii) H0 : µ = 184; H1 : µ > 184; n = 50; $\bar{x}$ = 183.2
3. The average height of 18-year-olds in England is 168.8 cm and the standard deviation is 12 cm. Caroline believes that the students in her class are taller than average. To test her belief she measures the heights of 16 students in her class.
(a) State the hypotheses for Caroline's test.
We can assume that the heights follow a normal distribution and that the standard deviation of heights in Caroline's class is the same as the standard deviation for the whole population.
The students in Caroline's class have an average height of 171.4 cm.
(b) Test Caroline's belief at the 5% level of significance.
[6 marks]
4. All students in a large school were given a typing test and it was found that the times taken to type one page of text are normally distributed with mean 10.3 minutes and standard deviation 3.7 minutes. The students were given a month-long typing course and then a random sample of 20 students were asked to take the typing test again. The mean time was 9.2 minutes and we can assume that the standard deviation is unchanged. Test at 10% significance level whether there is evidence that the time the students took to type a page of text had decreased.
[6 marks]
5. The mean score in Mathematics Higher Level is 4.73 with a standard deviation of 1.21. In a particular school the mean of 50 students is 4.81.
(a) Assuming that the standard deviation is the same as the whole population, test at the 5% significance level whether the school is producing better results than the international average.
(b) Does part (a) need the central limit theorem? Justify your answer.
[8 marks]
6. A farmer knows from experience that the average height of apple trees is 2.7 m with standard deviation 0.7 m. He buys a new orchard and wants to test whether the average height of apple trees is different. He assumes that the standard deviation of heights is still 0.7 m.
(a) State the hypotheses he should use for his test.
The farmer measures the heights of 45 trees and finds their average.
(b) Find the critical region for the test at 10% level of significance.
(c) If the average height of the 45 trees is 2.3 m state the conclusion of the hypothesis test.
[9 marks]
7. A doctor has a large number of patients starting a new diet in order to lose weight. Before the diet, the weight of the patients was normally distributed with mean 82.4 kg and standard deviation 7.9 kg. The doctor assumes that the diet does not change the standard deviation of the weights. After the patients have been on the diet for a while, the doctor takes a sample of 40 patients and finds their average weight.
(a) The doctor believes that the average weight of the patients has decreased following the diet. He wishes to test his belief at the 5% level of significance. Find the critical region for this test.
(b) Did you use the central limit theorem in your answer to part (a)? Justify your answer.
(c) The average weight of the 40 patients after the diet was 78.4 kg. State the conclusion of the test.
[11 marks]
8. The school canteen sells coffee in cups claiming to contain 250 ml. It is known that the amount of coffee in a cup is normally distributed with standard deviation 6 ml. Adam believes that on average the cups contain less coffee than claimed. He wishes to test his belief at 5% significance level.
(a) Adam measures the amount of coffee in 10 randomly chosen cups and finds the average to be 248 ml. Can he conclude that the average amount of coffee in a cup is less than 250 ml?
(b) Adam decides to collect a larger sample. He finds the average to be 248 ml again, but this time this is sufficient evidence to conclude at the 1% significance level that the average amount of coffee in a cup is less than 250 ml. What is the minimum sample size he must have used?
[12 marks]
5C Hypothesis testing for a mean with unknown variance
In the more realistic situation where we do not know the true population variance, we must use the T-score as our test statistic, knowing that it follows a $t_{n-1}$ distribution. This is called a t-test.
See Section 4D to remind yourself about the t-distribution.
Exam hint
Your calculator can perform a t-test. You should state the test statistic, its distribution and the p-value from your calculator. If the mean and standard deviation of the sample were not given in the question you should state those too; the calculator will find them in the process of performing the t-test. See Calculator skills sheets E and F.
Worked example 5.8
The label of a pre-packaged steak claims that it has a mass of 250 g. A random sample of 6 steaks is taken and their masses are:
240 g, 256 g, 244 g, 239 g, 245 g, 251 g
Test at the 10% significance level whether the label's claim is accurate, stating any assumptions you need to make.
Define variables:
X = mass of a steak in g
We assume that X ~ N(µ, σ²)
State hypotheses:
H0 : µ = 250
H1 : µ ≠ 250
State test statistic and its distribution: use t-test since variance is unknown:
$T = \frac{\bar{X} - 250}{s_{n-1}/\sqrt{6}} \sim t_5$
Find sample statistics:
From GDC: $\bar{x} = 245.8$, $s_{n-1} = 6.55$
Find the p-value (use calculator):
p-value = 0.177 (3SF, from GDC)
Compare to significance level and conclude:
0.177 > 0.1
Therefore do not reject H0; there is insufficient evidence to show that the label's claim is inaccurate.
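For readers without a GDC to hand, the t-test can be sketched using only the Python standard library. The tail probability is approximated here by Simpson's rule applied to the t density, which is a choice made for this illustration and not how a calculator necessarily works; the helper names are hypothetical.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev

def t_pdf(t, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0, using Simpson's rule on [0, t] (steps is even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3   # half the area lies above 0 by symmetry

masses = [240, 256, 244, 239, 245, 251]
n = len(masses)
x_bar = mean(masses)
s = stdev(masses)                          # unbiased estimate s_{n-1}
t_stat = (x_bar - 250) / (s / sqrt(n))     # T-score under H0: mu = 250
p = 2 * t_upper_tail(abs(t_stat), df=n - 1)  # two-tailed p-value

print(f"T = {t_stat:.3f}, p-value = {p:.2f}")
print("reject H0" if p < 0.10 else "do not reject H0")
```

Working from the unrounded data this gives a p-value of about 0.18; a value quoted to 3SF can differ slightly depending on how the summary statistics were rounded. In either case the p-value exceeds 0.1, so H0 is not rejected.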
Exercise 5C
1. In each of the following situations it is believed that X is normally distributed.
Find the p-value of the observed sample mean. Hence decide the result of the test if it is conducted at the 5% significance level:
(a) (i) H0 : µ = 85; H1 : µ ≠ 85; n = 30; $\bar{x}$ = 92; $s_{n-1}$ = 12
(ii) H0 : µ = 122; H1 : µ ≠ 122; n = 16; $\bar{x}$ = 117; $s_{n-1}$ = 8.6
(b) (i) H0 : µ = 62; H1 : µ > 62; n = 6; $\bar{x}$ = 65; $s_n$ = 32.1
(ii) H0 : µ = 83.4; H1 : µ < 83.4; n = 8; $\bar{x}$ = 72; $s_n$ = 30.7
(c) (i) H0 : µ = 14.7; H1 : µ ≠ 14.7
Data: 14.7, 14.4, 14.1, 14.2, 15.0, 14.6
(ii) H0 : µ = 79.4; H1 : µ < 79.4
Data: 86.4, 79.5, 80.1, 69.9, 75.5
2. John believes that the average time taken for his computer to start is 90 seconds. To test his belief, he records the times (in seconds) taken for the computer to start:
84, 98, 79, 75, 91, 81, 86, 94, 72, 78
(a) State suitable hypotheses.
(b) Test John's belief at the 10% significance level.
(c) Justify your choice of test, including any assumptions required.
[8 marks]
3. Michel regularly buys 150 g packets of tea. He has noticed recently that he gets more cups of tea than usual out of one packet, and suspects that the packets contain more than 150 g on average. He weighs eight packets and finds that their mean mass is 153 g and the standard deviation of their masses is 4.2 g.
(a) Find the unbiased estimate of the standard deviation of the masses based on Michel's sample.
(b) Assuming that the masses are normally distributed, test Michel's suspicion at 5% level of significance.
[7 marks]
4. The age when 20 babies in a nursery first start to crawl is recorded. The sample has mean 7.1 months and standard deviation 1.2 months. A parenting book claims that the average age for babies crawling is 8 months. Test at the 5% level whether babies in the nursery crawl significantly earlier than average, assuming that the distribution of crawling ages is normal.
[7 marks]
5. Ayesha thinks that cleaning the kettle will decrease the amount of time it takes to boil (t). She knows that the average boiling time before cleaning is 48 seconds. After cleaning she boils the kettle 6 times and summarises the results as:
∑t = 283, ∑t² = 13369
(a) State suitable hypotheses.
(b) Test Ayesha's ideas at the 10% significance level.
[8 marks]
6. A national survey of athletics clubs found that the mean time for a 17-year-old athlete to run 100 m is 12.7 s. A coach believes that athletes in his club are faster than average. To test his belief he collects the times for 60 athletes from his club and summarises the results in the following table:
Time (s)     Frequency
11.3–11.7    2
11.7–11.9    3
11.9–12.1    5
12.1–12.3    9
12.3–12.5    12
12.5–12.7    12
12.7–12.9    9
12.9–13.1    5
13.1–13.5    3
(a) Estimate the mean time for the athletes in the club.
(b) Find an unbiased estimate of the population standard deviation based on this sample.
(c) Test the coach's belief at the 5% level of significance.
(d) Explain how you have used the central limit theorem in your answer to part (c).
[10 marks]
7. The lengths of bananas are found to follow a normal distribution with mean 26 cm. Roland has recently changed banana supplier and wants to test whether their mean length is different. He takes a random sample of 12 bananas and obtains the following summary statistics:
∑x = 270, ∑x² = 6740
(a) State suitable hypotheses for Roland's test.
(b) Test at 10% significance level whether the data support the hypothesis that the mean length of Roland's bananas is different from 23 cm.
(c) Roland's assistant Sabiya suggests that they should test whether the mean length of bananas from the new supplier is less than 23 cm.
(i) State suitable hypotheses for Sabiya's test.
(ii) Find the outcome of Sabiya's test.
[11 marks]
8. Tins of soup claim to contain 300 ml of soup. Aki wants to test if this is an accurate claim. She samples n tins of soup and finds that they have a mean of 303 ml and an unbiased estimate of the population standard deviation of 2 ml.
(a) State appropriate null and alternative hypotheses.
(b) For what values of n will Aki reject the null hypothesis at the 5% significance level?
[6 marks]
5D Paired samples
We often need to ask if a particular factor has a measurable influence. As we saw in Section 4E there are issues associated with studying two different groups so we look for paired samples. If the data come in pairs we can then create another random variable, the difference between the pair, and apply the methods of Sections 5B and 5C to test if the average difference is zero.
We must decide whether to apply a t-test or a Z-test by checking if the population standard deviation is estimated from the data or given.
Worked example 5.9
The masses of rabbits after a long period of eating only grass are compared to the masses of the same rabbits after a period of eating only Ra-Bites pet food. It may be assumed that their masses are normally distributed. The makers of Ra-Bites claim that rabbits will get heavier if they eat their food instead of grass. Test this claim at the 10% significance level.
Rabbit                     A     B     C     D
Mass on grass diet / kg    2.8   2.6   3.1   3.4
Mass on Ra-Bites / kg      3.0   2.8   3.1   3.2
Define variables:
d = mass on Ra-Bites − mass on grass
d ~ N(µd, σ²)
State hypotheses:
H0 : µd = 0
H1 : µd > 0
State test statistic and its distribution: use t-test as the variance is unknown:
$T = \frac{\bar{d} - 0}{s_{n-1}/\sqrt{4}} \sim t_3$
Find the differences:
Rabbit   A     B     C     D
d        0.2   0.2   0     −0.2
Find sample statistics:
From GDC: $\bar{d} = 0.05$, $s_{n-1} = 0.191$
Find the p-value using calculator:
p-value = 0.319 (3SF, from GDC)
Compare to significance level and conclude:
0.319 > 0.1
Therefore do not reject H0; there is insufficient evidence for the makers' claim.
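The paired calculation reduces to a one-sample t-test on the differences. The sketch below works in Python and, as an alternative to the p-value, compares the T-score with a critical value. The figure 1.638 is the upper 10% point of the t-distribution with 3 degrees of freedom taken from standard tables, and is hard-coded here as an assumption of the sketch.

```python
from math import sqrt
from statistics import mean, stdev

grass = [2.8, 2.6, 3.1, 3.4]      # masses on grass diet / kg
ra_bites = [3.0, 2.8, 3.1, 3.2]   # masses on Ra-Bites / kg

# Differences for each rabbit: Ra-Bites minus grass.
d = [after - before for before, after in zip(grass, ra_bites)]
n = len(d)
d_bar = mean(d)
s = stdev(d)                           # unbiased estimate s_{n-1}
t_stat = (d_bar - 0) / (s / sqrt(n))   # T-score under H0: mu_d = 0

T_CRITICAL = 1.638   # upper 10% point of t with 3 df (from tables)
reject = t_stat > T_CRITICAL           # one-tailed test, H1: mu_d > 0

print(f"mean difference = {d_bar:.2f}, s = {s:.3f}, T = {t_stat:.3f}")
print("reject H0" if reject else "do not reject H0")
```

The T-score of about 0.52 is well below the critical value, which agrees with the p-value conclusion in the worked example.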
Exercise 5D
Exam hint cabmatthehlleiecteYswauntohsleoauuuyeftsrpocneedoarmititnfhtthfoeeeetunorcssttewsirepsneaoecaytenerotfessdout.rrm
1. Test the stated hypotheses at 5% significance. The difference, d, is defined as after − before. You may assume that the data are normally distributed.
(a) H0 : µd = 0; H1 : µd > 0
Subject   A    B    C    D    E
Before    16   20   20   16   12
After     18   24   18   16   16
(b) H0 : µd = 0; H1 : µd ≠ 0
Subject   A     B     C     D     E     F
Before    4.2   6.5   9.2   8.1   6.6   7.1
After     5.3   5.5   8.3   9.0   6.1   7.0
2. A tennis coach wants to determine whether a new racquet improves the speed of his pupils' serves (faster serves are considered better). He tests a group of 9 children to discover their service speed with their current racquet and with the new racquet. The results are shown in the table below.
Child                        A    B    C    D    E    F    G    H    I
Speed with current racquet   58   68   49   71   80   57   46   57   66
Speed with new racquet       72   81   52   59   75   72   48   62   70
(a) State appropriate null and alternative hypotheses.
(b) Test at the 5% significance level whether or not the new racquets increase the service speed, justifying your choice of test.
[8 marks]
3. Reading speed of 12-year-olds is measured at different times of the day. It is known that the differences between the reading speed in the morning and in the evening follow a normal distribution with standard deviation of 80 words per minute. Eight 12-year-old pupils from a particular school are tested and their reading speeds in the morning and in the evening are recorded.
Pupil     A     B     C     D     E     F     G     H
Morning   572   421   348   612   364   817   228   350
Evening   421   482   302   687   403   817   220   341
Test at 5% significance level whether there is evidence that, for the pupils in this school, the reading speeds in the morning and in the evening are different. State your hypotheses clearly.
[8 marks]
4. It is believed that the second harvest of apples from trees is smaller than the first. Ten trees are sampled and the number of apples in the first and second harvests are recorded.
Tree                       A    B    C    D    E    F    G    H    I    J
Apples in first harvest    80   72   45   73   68   53   64   48   81   70
Apples in second harvest   75   74   40   67   60   55   58   36   89   60
Stating the null and alternative hypotheses, carry out an appropriate test at the 5% significance level to decide if the claim is justified.
[8 marks]
5. Doctor Tosco claims to have found a diet that will reduce a person's weight, on average, by 5 kg in a month. Doctor Crocci claims that the average weight loss is less than this. Ten people use this diet for a month. Their weights before and after are shown below:
Person               A      B      C      D      E      F      G      H      I      J
Weight before (kg)   82.6   78.8   83.1   69.9   74.2   79.5   80.3   76.2   77.8   84.1
Weight after (kg)    75.8   74.1   79.2   65.6   72.2   73.6   76.7   72.9   75.0   79.9
(a) State suitable hypotheses to test the doctors' claims.
(b) Use an appropriate test to analyse these data. State your conclusion at:
(i) the 1% significance level
(ii) the 10% significance level.
(c) What assumption do you have to make about the data?
[9 marks]
(© IB Organization 2006)
6. It is known that marks on a Mathematics test follow a normal distribution with standard deviation 12 and marks on an English test follow a normal distribution with standard deviation 9.5.
Random variables M and E are defined as follows:
M = mark on Mathematics test
E = mark on English test
Define D = E − M.
(a) State the distribution of D and find its standard deviation.
Marika believes that students at her college get higher marks on average in the English test. The marks of seven students from the college are shown in the table:
Student   P    Q    R    S    T    U    V
Maths     72   61   45   98   82   53   58
English   72   55   55   97   95   72   61
(b) State suitable null and alternative hypotheses to test whether Marika's belief is justified.
(c) State your conclusion at the 5% level of significance.
[9 marks]
5E Errors in hypothesis testing
The acceptable conclusions to a hypothesis test are:
1. Sufficient evidence to reject H0 at the significance level.
2. Insufficient evidence to reject H0 at the significance level.
It is always possible that these conclusions are wrong. If the first conclusion is wrong, that is we have rejected H0 while it was true, it is called a type I error. If the second conclusion is wrong, that is we have failed to reject H0 when we should have done, it is called a type II error.
We cannot eliminate these errors, but we can find the probability that they occur. For a type I error to occur the test statistic must fall within the rejection region while H0 was true. But we set up the rejection (critical) region to fix this probability as the significance level.
Key Point 5.2
When a test statistic follows a continuous distribution the probability of a type I error is equal to the significance level.
If the test statistic follows a discrete distribution, we may not be able to find a critical region so that its probability is exactly equal to the required significance level. In that case the probability of a type I error will be less than or equal to the stated significance level.
Key Point 5.3
P(type I error) = P(rejecting H0 | H0 is true)
Worked example 5.10
X ~ Geo(p) and the following hypotheses are tested:
H0 : p = 1/3
H1 : p < 1/3
The null hypothesis is rejected if X ≥ 7. Find the probability of a type I error in this test.
Use definition of type I error:
$P\left(\text{rejecting } H_0 \mid p = \tfrac{1}{3}\right) = P(X \ge 7)$ where $X \sim \mathrm{Geo}\left(\tfrac{1}{3}\right)$
$= \left(\tfrac{2}{3}\right)^6$
= 8.78%
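The result uses the fact that X ≥ 7 means the first six trials all fail. A small Python sketch, assuming the geometric distribution counts the number of trials up to and including the first success, computes this and cross-checks it against the probability function:

```python
def geometric_tail(p, k):
    """P(X >= k) for X ~ Geo(p): the first k - 1 trials must all fail."""
    return (1 - p) ** (k - 1)

p0 = 1 / 3                       # value of p under H0
type_i = geometric_tail(p0, 7)   # P(reject H0 | H0 true) = P(X >= 7)

# Cross-check: 1 - P(X <= 6), summing P(X = x) = (1 - p)^(x - 1) * p.
check = 1 - sum((1 - p0) ** (x - 1) * p0 for x in range(1, 7))

print(f"P(type I error) = {type_i:.4f}")
```

Because X is discrete, the available type I error probabilities jump between values as the rejection boundary moves, so an exact 5% or 10% level is generally not attainable.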
If the true mean is anything other than that suggested by the null hypothesis and we have not rejected the null hypothesis then we have made a type II error. The probability of a type II error depends upon the true value the population mean takes. Suppose we are testing the null hypothesis μ = 120 with a standard deviation of 10. If the true mean were 160 we would expect to be able to detect this very easily. If the true mean were 120.4 we might have greater difficulty distinguishing this from 120. If we knew the true mean then we could find the probability of an observation of this distribution falling in the acceptance region for H0. In the diagrams below, the red regions are where the conclusion is correct, and the blue regions represent errors.
[Figure: four normal curves plotted against x, each divided into a central acceptance region with a rejection region in each tail. Panel 1, H0 is true (µ = 120): 2.5% in each rejection region (5% type I error) and 95% correctly not rejecting H0. Panels 2 to 4, H0 is untrue with a small difference (µ = 120.4), a medium difference (µ = 130) and a large difference (µ = 160): the area in the acceptance region is a type II error and the area in the rejection regions is correctly rejecting H0.]
Key Point 5.4
P(type II error) = P(not rejecting H0 | a specific alternative to H0)
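To see how strongly the type II error probability depends on the true mean, the three alternatives discussed above can be compared numerically. This Python sketch assumes a single observation, a two-tailed test at the 5% level and the standard deviation of 10 from the text; the function name is illustrative.

```python
from statistics import NormalDist

def type_ii_probability(true_mu, mu0=120, sigma=10, alpha=0.05):
    """P(observation falls in the acceptance region | true mean is true_mu)."""
    null = NormalDist(mu0, sigma)
    lower = null.inv_cdf(alpha / 2)       # acceptance region is set under H0
    upper = null.inv_cdf(1 - alpha / 2)
    truth = NormalDist(true_mu, sigma)    # actual distribution of X
    return truth.cdf(upper) - truth.cdf(lower)

for mu in (120.4, 130, 160):
    print(f"true mean {mu}: P(type II error) = {type_ii_probability(mu):.3f}")
```

Under these assumptions a true mean of 120.4 is missed about 95% of the time, whereas a true mean of 160 is missed only about 2% of the time.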
Worked example 5.11
Internet speeds to a particular house are normally distributed with a standard deviation of 0.4 Mbps. The internet provider claims that the average speed of an internet connection has increased above its long term value of 9 Mbps. A sample is taken on 6 occasions and a hypothesis test is conducted at the 1% significance level. Find the probability of a type II error if the true average speed is 9.6 Mbps.
Define variables:
X = crv speed of internet connection
X ~ N(µ, 0.4²)
State hypotheses:
H0 : µ = 9
H1 : µ > 9
State test statistic and its distribution (assuming H0 is true):
$\bar{X} \sim N\left(9, \frac{0.4^2}{6}\right)$
Decide range of $\bar{X}$ which falls into the one-tailed acceptance region:
[Diagram: normal curve centred at x = 9 with the acceptance region below a.]
$P(\bar{X} < a) = 0.99 \Rightarrow a = 9.38$
State the acceptance region:
So we do not reject H0 if $\bar{X} < 9.38$
Use the definition of type II error:
$P(\text{type II error}) = P(\bar{X} < 9.38 \mid \mu = 9.6)$
$= P(\bar{X} < 9.38)$ where $\bar{X} \sim N\left(9.6, \frac{0.4^2}{6}\right)$
= 8.94% (3SF from GDC)
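The two stages of this calculation, fixing the acceptance region under H0 and then finding its probability under the true mean, can be sketched in Python as follows (variable names are illustrative):

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 9, 0.4, 6, 0.01
true_mu = 9.6

standard_error = sigma / sqrt(n)   # sd of the sample mean

# Stage 1: boundary of the acceptance region, found assuming H0 is true.
critical = NormalDist(mu0, standard_error).inv_cdf(1 - alpha)

# Stage 2: probability of landing in that region when mu is really 9.6.
beta = NormalDist(true_mu, standard_error).cdf(critical)

print(f"do not reject H0 if the sample mean is below {critical:.3f}")
print(f"P(type II error) = {beta:.3f}")
```

This gives a boundary of about 9.38 and a type II error probability of roughly 0.089; the third significant figure is sensitive to how the intermediate values are rounded.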