Statistics Coursework Report

Statistics report for coursework based on hypothesis testing
Module: Statistics for business I (MA20228)

Academic year: 2022/2023
University of Bath


Problem 1

The first thing I did for problem 1 was creating my sample using the Data Analysis: Random Number Generation tool. I used the following criteria:

As the sample size (n) is 45, that is the number of random numbers I requested. I took the distribution, the probability of success (p) and the number of trials from Population B.

For part a: This is a one sample, two tailed hypothesis test with a known variance.

We know the variance because the variance of a binomial distribution is npq. The population n is 200 and p is 0.6, making q 0.4 (1 − 0.6), which gives 48 for our variance and 6.93 as our standard deviation. We can also use the binomial rule for the mean, np, giving us a population mean of 120.
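As a quick cross-check outside Excel, the binomial mean, variance and standard deviation can be sketched in Python. The function name `binomial_moments` is only an illustrative choice. The inputs are the Population B values quoted above.

```python
import math

def binomial_moments(n, p):
    """Return (mean, variance, standard deviation) of a Binomial(n, p) variable."""
    q = 1 - p
    mean = n * p                # np
    variance = n * p * q        # npq
    return mean, variance, math.sqrt(variance)

# Population B as described in the report: n = 200 trials, p = 0.6
mean, variance, sd = binomial_moments(200, 0.6)
print(mean, variance, round(sd, 2))
```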

We would expect our population mean/variance and our sample mean/variance to be similar due to central limit theorem, and this holds true. I calculated the sample mean using the ‘average’ function in Excel, the sample standard deviation (s) using the ‘st.dev’ function, and the sample variance ($s^2$) by squaring the sample standard deviation.

As the number of trials in the population is 200, if the coin was fair there would be a 0.5 chance of heads or tails. As this is a binomial experiment, we can calculate the hypothesised mean as np, where n is 200 and p is 0.5, giving us an answer of 100. Therefore, my hypotheses are:

$H_0: \mu = 100$

$H_1: \mu \neq 100$

This is a two tailed test because we aren’t stating whether the coin is biased towards heads or tails, just that it is biased. The significance level I will be using is 0.01 (1%).

For this question I can use the Z-Test. This is justified by the sample size being greater than 30 and the fact that we know the population variance.

The formula for finding a z-score from a sample is: $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$. My formula is:

Using norm.s.inv we can find the critical value for this test. Norm.s.inv gives the inverse of the standard normal distribution (z distribution) and returns the critical value if I correctly input the information (Cosmosweb, 2022). As it is two tailed, the alpha of 0.01 must be divided by 2 in the function to give the critical value. Our test statistic of 0 is less than the critical value, meaning that we cannot reject the null hypothesis: there is not enough evidence to say the coin is biased.
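The test above can be sketched in Python as a rough cross-check. `NormalDist().inv_cdf` from the standard library plays the role of Excel's norm.s.inv. The sample mean used here (`example_mean = 100.5`) is a hypothetical placeholder, not the value from my spreadsheet. The function names are illustrative only.

```python
import math
from statistics import NormalDist

def z_statistic(sample_mean, mu0, sigma, n):
    """z-score of a sample mean when the population standard deviation is known."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

def two_tailed_critical_value(alpha):
    """Positive critical value for a two-tailed z-test (alpha is split across both tails)."""
    return NormalDist().inv_cdf(1 - alpha / 2)

sigma = math.sqrt(48)          # population standard deviation from npq = 48
z_crit = two_tailed_critical_value(0.01)

example_mean = 100.5           # hypothetical sample mean, for illustration only
z = z_statistic(example_mean, mu0=100, sigma=sigma, n=45)

reject_h0 = abs(z) > z_crit
print(round(z_crit, 3), round(z, 3), reject_h0)
```

With these inputs the null hypothesis is rejected only if the absolute z-score exceeds the critical value.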

For part b: I calculated the p-value with 2 different methods. For method 1 I rounded the z-score to 2 d.p. and then by using the z tables, I found that 0 is equal to 0. I then calculated 1 – zvalue, and because it is 2 tailed, I had to multiply it by 2 so it includes both tails of the normal distribution curve, giving an answer of 0. For method 2 I used the Excel function Norm.s. As it is 2 tailed, I multiplied it by 2 at the start and input the z value, which gave me an answer of 0. The two methods gave different answers, but they are extremely close together, giving me confidence that they are the correct calculations. The difference can be explained by the fact that I had to round my z-score in method 1 so that I could find it on the z-tables, whereas method 2 uses the unrounded number. Despite this difference, I have reached the same conclusion: as the p-values are higher than the significance level of 0.01, the null hypothesis can’t be rejected.
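The two methods can be compared with a minimal Python sketch. It assumes a hypothetical z-score (`z_exact = 0.4841`), since this only illustrates how rounding to 2 d.p. shifts the p-value slightly.

```python
from statistics import NormalDist

def two_tailed_p_value(z):
    """Two-tailed p-value: twice the area beyond |z| under the standard normal curve."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

z_exact = 0.4841                                     # hypothetical z-score, for illustration only
p_tables = two_tailed_p_value(round(z_exact, 2))     # method 1: z rounded to 2 d.p., as on z-tables
p_exact = two_tailed_p_value(z_exact)                # method 2: unrounded z

print(round(p_tables, 4), round(p_exact, 4))
```

Both values are compared with the significance level in the same way: the null hypothesis is rejected only when the p-value is below it.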

For part c: A type 2 error occurs when we don’t reject (we accept) the null hypothesis when the null hypothesis was actually false and we should have rejected it. If the true probability is 0.6, we know from its binomial distribution that the true mean is np, therefore 200 × 0.6, which is 120.

We are testing P(Type 2 error) = P(didn’t reject H0 | H0 is false) in other words P(accepted H0 | should have rejected H0). In numbers:.

I calculated the new z value. As we now know the true population mean, the denominator of the equation stayed the same. I received an answer of -19. My next step would be to look at the z-table to see which probability corresponds to -19, however this number doesn’t show on the z-table, implying an error in my calculations. If I were to look on a z-table, the probability would be: P(z<-19).
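One common textbook way to get P(Type 2 error) is to find the range of sample means for which H0 is not rejected, then ask how likely a sample mean is to land in that range when the true mean is 120. The sketch below assumes the population values quoted in part a (σ² = 48, n = 45, α = 0.01). The function name is illustrative, and this is not necessarily the method used in my spreadsheet.

```python
import math
from statistics import NormalDist

def type_two_error(mu0, mu_true, sigma, n, alpha):
    """P(fail to reject H0 | true mean is mu_true) for a two-tailed z-test."""
    se = sigma / math.sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lower = mu0 - z_crit * se      # lower edge of the non-rejection region
    upper = mu0 + z_crit * se      # upper edge of the non-rejection region
    sampling = NormalDist(mu_true, se)   # distribution of the sample mean under the true mean
    return sampling.cdf(upper) - sampling.cdf(lower)

beta = type_two_error(mu0=100, mu_true=120, sigma=math.sqrt(48), n=45, alpha=0.01)
print(beta)
```

A z value as extreme as -19 lies far beyond the range printed on z-tables, where the tail area is effectively zero, so under these assumptions the probability of a type 2 error comes out as practically 0.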

Problem 2

For part a: I firstly generated the 44 samples with the alloy in column A, using the data from Population A. To generate F, I initially needed to generate 44 samples from populations D and E – and then to get the sample from population F, I added the values in each row from D and E, and took away 50 from each one.

For part b: In this problem, we have a two-sample hypothesis test with an unknown variance. We know the variance from Population A, as it has a standard deviation of 6, however we don’t know the variance from population F as it is a mixture of 2 Populations (D and E).

This is a two tailed test because we are looking if the difference of means is either equal to 80, or not equal to 80 – meaning it could be greater or lesser than 80kg. The two samples’ sizes involved are both 44, and both samples are independent and are from two distinct populations (before alloy addition and after alloy addition).

As the variance is unknown, we will use the sample variances to calculate the pooled variance estimate ($s_p^2$): $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
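The pooled variance and the resulting test statistic can be sketched in Python. All numbers below (sample variances of 36 and 64, sample means of 582 and 500) are hypothetical placeholders. Only the sample sizes of 44 and the hypothesised difference of 80 come from the problem.

```python
import math

def pooled_variance(var1, n1, var2, n2):
    """Pooled variance estimate from two sample variances."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

def pooled_t_statistic(mean1, mean2, hypothesised_diff, var1, n1, var2, n2):
    """Test statistic for a difference of means using the pooled variance."""
    sp2 = pooled_variance(var1, n1, var2, n2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2 - hypothesised_diff) / se

# Hypothetical sample summaries, for illustration only
sp2 = pooled_variance(36, 44, 64, 44)
t = pooled_t_statistic(582, 500, 80, 36, 44, 64, 44)
print(sp2, round(t, 4))
```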

Therefore, we can conclude that the sampling distribution of the sample proportion is normally distributed as the interval values are between 0 and 1.

For part c: This is a one sample hypothesis test involving proportion. As the proportion is above 30 (33%), central limit theorem tells us to use the Z-Test. Our sample proportion is

So: .

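The standard deviation for this proportion is $\sigma_{\hat{p}} = \sqrt{\frac{p_0(1 - p_0)}{n}}$, where $p_0$ is the hypothesised proportion.

The z-score is calculated: $z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}}$, where $\hat{p}$ is the sample proportion.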

I calculated the critical value using norm.s.inv. As it is a two tailed test I used $z_{\alpha/2}$. My critical value is 1. Therefore, as -1 is greater than -1 and less than +1, we cannot reject the null hypothesis, and hypothesise that the mean of companies operating in the UK has a proportional mean of 0.

For part d: To calculate the p-value I used 2 methods, to double check myself. The first method was using norm.s and multiplying the answer by 2, as it’s a 2 tailed test. The other method was using the z-tables. Both methods gave extremely close answers. The difference is due to having to round to 2 d.p. in method 2, making method 1 more accurate. However, as the p-value figure is above 1, I know there has been a mistake in my previous calculations. The p-value is a probability, so it can only be between 0 and 1.
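A minimal Python sketch of a one-sample proportion test follows, using hypothetical inputs (`p0 = 0.4`, 33 successes out of 100) because the real figures are not reproduced here. It also shows one possible way a "p-value" above 1 can arise: doubling the area on the wrong side of a negative z-score. Whether that is what happened in my spreadsheet is only an assumption.

```python
import math
from statistics import NormalDist

def one_proportion_z(p_hat, p0, n):
    """z-score for a sample proportion against a hypothesised proportion p0."""
    se = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se

def two_tailed_p_value(z):
    """Two-tailed p-value; using -|z| keeps the result between 0 and 1."""
    return 2 * NormalDist().cdf(-abs(z))

# Hypothetical inputs, for illustration only
p0, n, successes = 0.4, 100, 33
z = one_proportion_z(successes / n, p0, n)
p_value = two_tailed_p_value(z)

# Doubling the upper-tail area of a negative z gives a value above 1
wrong_tail = 2 * (1 - NormalDist().cdf(z))
print(round(z, 4), round(p_value, 4), round(wrong_tail, 4))
```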

For part e: As previously stated, type 2 errors occur when we don’t reject the null hypothesis when the null hypothesis was actually false.

We are testing P(Type 2 error) = P(didn’t reject H0 | H0 is false) in other words P(accepted H0 | should have rejected H0). In numbers:.

I calculated the new z value, as we now know the true population mean, the denominator of the equation stayed the same. I received an answer of -0. Therefore P(T2error) = P(z<-0). The z table shows that -0 is equal to 0. Therefore, P(z<-0) = 0.

Problem 4

For part a: The null hypothesis ($H_0$) is that the average capacity of a water storage tank is equal to 4400 litres. In contrast, the alternate hypothesis ($H_1$) is that the average capacity of a water storage tank is not equal to 4400 litres. Written as:

$H_0: \mu = 4400$

$H_1: \mu \neq 4400$

I will be using the z score due to the large sample size (105), meaning that the distribution is close to normal. The formula for finding a z-score is $z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$

This is a two tailed test as we aren’t stating whether we are looking for above, or below 4400 litres. I found the critical value ($z_{\alpha/2}$) using the norm.s.inv function, inputting the alpha value and dividing it by 2 as it is a two tailed test. The critical value is 1.96. Therefore, as our z-score of 2 is greater than our critical value of 1.96, we can reject the null hypothesis and accept the alternate hypothesis. This means that the average capacity of a water storage tank isn’t equal to 4400 litres at a 5% significance level.

For part b: For this test I used one-way ANOVA. I couldn’t use hypothesis testing with a t or z score as it would have needed multiple pairwise tests, which would increase the chance of a family-wise error. In this question, the response variable is the capacity of the water tank, and the explanatory variable/treatment is the irregular shape of plastic. There are 3 groups (plastic a, plastic b and plastic c), meaning that g = 3.

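My two hypotheses are:

$H_0: \mu_a = \mu_b = \mu_c$ (the three plastics have the same mean capacity)

$H_1$: at least one plastic has a different mean capacity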

I started by calculating the sample means for each group using the average function in excel, and then I worked out the overall sample mean by finding the average of the 3 individual sample means. I noticed that these means are very close together, so we will be expecting a low Mean Square Between (MSB).

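I calculated the sum of squares between (SSB) using the formula: $SSB = \sum_{i=1}^{g} n_i(\bar{x}_i - \bar{x})^2$

Then I calculated the MSB with the formula: $MSB = \frac{SSB}{g - 1}$

Then, using the sample variances, I calculated the sum of squares due to error (SSE) using this formula: $SSE = \sum_{i=1}^{g} (n_i - 1)s_i^2$

Then I calculated the mean square error: $MSE = \frac{SSE}{N - g}$

My F score is: $F = \frac{MSB}{MSE}$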

There are 2 degrees of freedom (DoF). The first is calculated as g − 1, so 3 − 1 = 2. The second is calculated as N − g, so 105 − 3 = 102. Therefore, my distribution is F(x;2,102). I found the critical value using F.Inv, inputting the 0 significance level and the 2 degrees of freedom. The critical value is 3.

Therefore, it can be concluded that as the f-score of 0 is less than the critical value of 3, we cannot reject the null hypothesis and we hypothesise that the 3 different types of plastic have the same capacity.
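The one-way ANOVA steps above (SSB, MSB, SSE, MSE, F and the two degrees of freedom) can be sketched from group summaries in Python. The group means, variances and the equal split of 35 tanks per plastic are hypothetical placeholders. Only g = 3 and N = 105 come from the problem.

```python
def one_way_anova_from_summary(means, variances, sizes):
    """One-way ANOVA F-score from per-group means, sample variances and sizes."""
    g = len(means)
    N = sum(sizes)
    # Weighted overall mean (equals the plain average of means when sizes are equal)
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / N
    ssb = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    sse = sum((n - 1) * v for n, v in zip(sizes, variances))
    msb = ssb / (g - 1)
    mse = sse / (N - g)
    return {"SSB": ssb, "MSB": msb, "SSE": sse, "MSE": mse,
            "F": msb / mse, "df": (g - 1, N - g)}

# Hypothetical summaries for plastics a, b and c, for illustration only
result = one_way_anova_from_summary(
    means=[4400, 4410, 4405],
    variances=[900, 900, 900],
    sizes=[35, 35, 35],
)
print(result)
```

The F-score is then compared with the critical value of the F distribution for the two degrees of freedom, which is looked up separately (for example with Excel or an F table).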

Therefore, as all my z-scores were 0, we can conclude that we cannot reject the null hypothesis, and accept that the new containers have an equal capacity to the expected capacity.

Problem 5

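For part a: As I am using one-way ANOVA, I will be testing this question with the F distribution, where F~(x; $d_1$, $d_2$). My two hypotheses are:

$H_0: \mu_1 = \mu_2 = \mu_3$ (the mean miles travelled on a full tank is the same for every model)

$H_1$: at least one model has a different mean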

Firstly I calculated the sample means, the overall sample mean and the sample variances. I calculated these using the average, and stdev function in excel. I noticed that the sample means are spread out quite a lot – suggesting a difference between them.

I calculated the sum of squares between (SSB) using the formula: $SSB = \sum_{i=1}^{g} n_i(\bar{x}_i - \bar{x})^2$

This is shown in the table titled SSB.

Using the SSB, I then calculated the mean square between (MSB) with the formula: $MSB = \frac{SSB}{g - 1}$

Where g = 3. The MSB answer is very large, which confirms that the sample means are spread out a lot.

I then calculated the sum of squares due to error (SSE), using the sample variances and this formula: $SSE = \sum_{i=1}^{g} (n_i - 1)s_i^2$

Then, the final step before finding the F-score was to calculate the MSE, with the following formula: $MSE = \frac{SSE}{N - g}$

My resulting f-score was 14. To find the critical values I needed to calculate the 2 degrees of freedom, which can be shown in the DoF table. My F distribution is F~(x;2,15). I found the critical value using the f.inv function in excel, with the 0 significance level. I also confirmed this critical level of 3 by looking at the F table for 0.

I then computed the one-way ANOVA table, to double check my findings. The format of the table is:

| Source | SS | df | MS | F | F critical |
| Between groups | SSB | g-1 | MSB | F | C |
| Error | SSE | N-g | MSE | | |
| Total | SST | N-1 | | | |

To conclude, as the f-score of 14 is greater than the critical value of 3, we can reject the null hypothesis. This means, as expected from the high MSB, there is a difference in the means and the model of car does impact the miles it can travel on a full tank.

For part b: I am using two-way ANOVA without replication for this question, where g (rows) = 6 and h (columns/blocks) = 3. In Excel, I have filled out the table below. I used a significance level of 0% which equates to 3.

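| Source | SS | df | MS | F | F critical |
| Rows | SSR | g-1 | MSR = SSR/(g-1) | MSR/MSE | CR |
| Column | SSC | h-1 | MSC = SSC/(h-1) | MSC/MSE | CC |
| Error | SSE | (g-1)(h-1) | MSE = SSE/((g-1)(h-1)) | | |
| Total | SST | gh-1 | | | |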

As both test statistic values are greater than the critical values, we can reject the null hypothesis and claim there is a difference, and the model of the car does impact the miles it can travel on a full tank.

For part c: The sum of squares due to error (SSE) I found in part a was 19739, whereas in part b it was only 143, a very large difference of 19596. In part a, SSE is the sum over the groups of each sample variance multiplied by (sample size – 1). The reason for this large difference is not clear to me, however I do know I should be looking at how the two values are calculated.
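One possible way to explore the gap is to compute both error terms on the same table. The sketch below uses made-up mileage figures (6 rows by 3 columns) and assumes the one-way groups in part a are the columns of the part b table. Under that assumption, the one-way SSE equals the two-way SSE plus the row sum of squares (SSR), so the difference between the two error terms would be the variation explained by the rows.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)

def one_way_sse(table):
    """SSE for one-way ANOVA, treating each column of the table as a group."""
    columns = list(zip(*table))
    return sum((x - mean(col)) ** 2 for col in columns for x in col)

def two_way_anova_without_replication(table):
    """Sums of squares and F-scores for a g x h table with one value per cell."""
    g, h = len(table), len(table[0])
    grand = mean(x for row in table for x in row)
    row_means = [mean(row) for row in table]
    col_means = [mean(col) for col in zip(*table)]
    ssr = h * sum((m - grand) ** 2 for m in row_means)
    ssc = g * sum((m - grand) ** 2 for m in col_means)
    sst = sum((x - grand) ** 2 for row in table for x in row)
    sse = sst - ssr - ssc
    mse = sse / ((g - 1) * (h - 1))
    return {"SSR": ssr, "SSC": ssc, "SSE": sse, "SST": sst,
            "F_rows": (ssr / (g - 1)) / mse,
            "F_columns": (ssc / (h - 1)) / mse}

# Hypothetical miles-per-tank data: 6 rows, 3 columns (car models)
miles = [
    [300, 320, 340],
    [350, 372, 388],
    [400, 421, 441],
    [450, 469, 492],
    [500, 522, 539],
    [550, 571, 590],
]
result = two_way_anova_without_replication(miles)
print(one_way_sse(miles), result["SSR"] + result["SSE"])
```

The two printed numbers agree, which shows the identity for this made-up table. Whether it accounts for the 19596 difference in my own results depends on the assumption above.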
