Assessment Task 2 QUESTIONS
MATH1309/MATH2142
TOTAL = 155 marks
Q1. DATA SET WBC (Marks = 60 marks)
In an imaging experiment, 50 white blood cells (WBCs) from a non-diseased patient group and 50
white blood cells from a diseased patient group were analysed.
For each WBC, the following characteristics of the cell image were measured:
• eccen Cell eccentricity
• arn Cell area
• perin Perimeter of the cell
• soln Solidity of the cell
• ext Extent of the cell
• diam Diameter of the cell
Figure 1: Images of WBCs on the RHS.
Part A:
The aim of the study on WBCs is to test whether the two groups have identical population mean
vectors.
NOTE: Show your SAS code and relevant formulae, outputs, and interpretation.
a) Find the group specific mean vectors, and the variance-covariance and the correlation
matrices. (5 marks)
b) State your null and alternative hypotheses, and show the mathematical formulae for the
appropriate test statistic, the critical value, and the associated p-value. (5 marks)
c) Carry out appropriate multivariate procedures to determine whether the WBCs differ
across the diseased and non-diseased populations. Show your SAS code and relevant
formulae, outputs, and interpretation. (15 marks)
d) What underlying assumptions are involved in this test procedure above? (5 marks)
e) Now test which of the WBC characteristics differ significantly between the diseased and
non-diseased cells. Use the pooled variance of the relevant variables. (5 marks)
f) Obtain the 95% simultaneous confidence intervals for the differences and state your alpha
value. Create a table of the resultant confidence intervals. (5 marks)
g) Obtain the analogous Bonferroni 95% Confidence Intervals of the differences. Create a table
of the resultant confidence intervals. (5 marks)
h) On which, if any, of the WBC measures do the diseased and non-diseased cells differ
significantly, based on your answers in f)-g)? (5 marks)
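For reference, the quantities asked for in Part A b), f) and g) can be written out as below. This is a sketch of the standard two-sample Hotelling's T-squared setup, assuming equal population covariance matrices in the two groups; the symbols (n_1, n_2, p, S_1, S_2, s_ii,pooled) are notation introduced here, not taken from the task sheet.

```latex
% Assumption: both groups share a common covariance matrix.
% n_1, n_2 = group sample sizes, p = number of WBC characteristics.
\[
S_{\text{pooled}} = \frac{(n_1 - 1) S_1 + (n_2 - 1) S_2}{n_1 + n_2 - 2}
\]
% Test statistic and its null distribution for H_0: \mu_1 = \mu_2
\[
T^2 = \frac{n_1 n_2}{n_1 + n_2}
      (\bar{x}_1 - \bar{x}_2)^{\top} S_{\text{pooled}}^{-1} (\bar{x}_1 - \bar{x}_2),
\qquad
\frac{n_1 + n_2 - p - 1}{p\,(n_1 + n_2 - 2)}\, T^2 \sim F_{p,\; n_1 + n_2 - p - 1}
\]
% Simultaneous 100(1 - \alpha)% interval for the i-th mean difference
\[
(\bar{x}_{1i} - \bar{x}_{2i}) \pm
\sqrt{\frac{p\,(n_1 + n_2 - 2)}{n_1 + n_2 - p - 1}\, F_{p,\; n_1 + n_2 - p - 1}(\alpha)}\;
\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right) s_{ii,\text{pooled}}}
\]
% Bonferroni interval for the i-th mean difference (p intervals in total)
\[
(\bar{x}_{1i} - \bar{x}_{2i}) \pm
t_{n_1 + n_2 - 2}\!\left(\frac{\alpha}{2p}\right)
\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right) s_{ii,\text{pooled}}}
\]
```

With n_1 = n_2 = 50 and p = 6 as described above, the reference distributions would be F with (6, 93) degrees of freedom and t with 98 degrees of freedom.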
Part B:
NOTE: Show your SAS code and relevant formulae, outputs, and interpretation.
a) For each group, plot the pairwise 90% prediction ellipses using PROC CORR for the pair of
variables most significantly and negatively correlated and the pair of variables most
significantly positively correlated. (5 marks)
b) Produce plots to test for multivariate normality of your data for each group. Is the data
normal? (5 marks)
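One common way to assess multivariate normality is a chi-square Q-Q plot of ordered squared Mahalanobis distances. The sketch below is in Python rather than SAS and only illustrates the computation, assuming just two variables so that the chi-square quantile has a closed form; the function names are hypothetical, and for all six WBC variables a chi-square quantile with 6 degrees of freedom would be needed instead.

```python
import math

def squared_mahalanobis_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Unbiased sample covariance entries (divisor n - 1).
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det == 0:
        raise ValueError("sample covariance matrix is singular")
    d2 = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # (dx, dy) S^{-1} (dx, dy)' using the closed-form 2x2 inverse.
        d2.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return d2

def chi2_quantile_df2(prob):
    """Quantile of the chi-square distribution with 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - prob)

def qq_pairs(points):
    """(theoretical quantile, ordered squared distance) pairs for a Q-Q plot."""
    d2 = sorted(squared_mahalanobis_2d(points))
    n = len(d2)
    return [(chi2_quantile_df2((j - 0.5) / n), d)
            for j, d in enumerate(d2, start=1)]
```

If the data are roughly bivariate normal, the pairs returned by `qq_pairs` should fall close to a straight line through the origin with slope 1.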
NOTE: In all parts of the question ensure you show your SAS code and relevant formulae, outputs,
and interpretation.
Q2. DATA SET WBC (Marks = 70 marks)
For the data set analysed in Question 1 on WBCs:
For the diseased group:
a) Perform a principal component analysis (PCA). Show your full SAS code and all SAS output (5
marks)
b) Give (write out) the formulation of the first 3 principal components Prin j, j=1, …, 3 (PC1,
PC2, PC3). (3 marks)
c) Find the variance and the cumulative proportion explained by each of the full suite of
principal components. (3 marks)
d) Create the Principal Component Pattern Profile plot and interpret all the Principal
Components. Justify your answers carefully according to your Principal Component Pattern
Profile plot. (8 marks)
e) How many principal components (PCs) would you retain based on the scree plot? Justify
your answer. (2 marks)
f) Perform formal statistical tests to ascertain the optimal number of principal components to
retain. HINT: Test the significance of the “larger” components, that is, the components
corresponding to the larger eigenvalues. (4 marks)
g) Construct the 95% CI for λ1. Show your formula and working along with the result. (2 marks)
h) Construct the 95% CI for λ2. Show your formula and working along with the result. (2 marks)
i) Which variables contribute the most to PC2? (1 mark)
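Assuming the intervals in g) and h) refer to the first two population eigenvalues, a widely used large-sample interval is lambda_hat / (1 ± z(alpha/2) * sqrt(2/n)). The Python sketch below (not SAS, and with a hypothetical function name) shows the arithmetic. It relies on multivariate normality, distinct eigenvalues and a PCA of the covariance matrix; it does not carry over directly to a PCA of the correlation matrix.

```python
import math
from statistics import NormalDist

def eigenvalue_ci(lam_hat, n, level=0.95):
    """Large-sample CI for a population eigenvalue of the covariance matrix.

    lam_hat : sample eigenvalue (variance of the principal component)
    n       : number of observations used in the PCA
    level   : confidence level, e.g. 0.95
    """
    # Upper alpha/2 quantile of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    half = z * math.sqrt(2.0 / n)
    if half >= 1.0:
        raise ValueError("n is too small for the large-sample interval")
    return lam_hat / (1.0 + half), lam_hat / (1.0 - half)
```

For example, `eigenvalue_ci(1.0, 50)` uses an illustrative eigenvalue of 1.0 (not a value from the WBC data) with the per-group sample size of 50.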
For the non-diseased group:
a) Perform a principal component analysis (PCA). Show your full SAS code and all output (5
marks)
b) Give (write out) the formulation of the first 3 principal components Prin j, j=1, …, 3 (PC1,
PC2, PC3). (3 marks)
c) Find the variance and the cumulative proportion explained by each of the full suite of principal
components. (3 marks)
d) Create the Principal Component Pattern Profile plot and interpret all the Principal
Components. Justify your answers carefully according to your Principal Component Pattern
Profile plot. (8 marks)
e) How many principal components (PCs) would you retain based on the scree plot? Justify
your answer. (2 marks)
f) Perform formal statistical tests to ascertain the optimal number of principal components to
retain. HINT: Test the significance of the “larger” components, that is, the components
corresponding to the larger eigenvalues. (4 marks)
g) Construct the 95% CI for λ1. Show your formula and working along with the result. (2 marks)
h) Construct the 95% CI for λ2. Show your formula and working along with the result. (2 marks)
i) Which variables contribute the most to PC2? (1 mark)
j) Make comments about the differences and similarities between the PC analytic results
based on the diseased and non-diseased PC pattern profiles and the first 2 PCs found. (10
marks)
NOTE: In all parts of the question ensure you show your SAS code and relevant formulae, working
out, outputs, and write your conclusions and interpretation carefully.
Q3. DATA SET TWIN (Marks = 25 marks)
The personality traits (TCIs) of a sample of identical twins, as discussed in a psychometric case
study, were investigated.
A total of 30 twin pairs were questioned. The following questions were put to the twins.
• X1: What is the level of Novelty Seeking (NS) you observe in your twin?
• X2: What level of Novelty Seeking (NS) does your twin see in you?
• X3: What is the level of Harm Avoidance (HA) you observe in your twin?
• X4: What level of Harm Avoidance (HA) does your twin see in you?
Responses were recorded on a five-point scale with the following rank values:
1. None of the trait in question
2. Very low level of the trait in question
3. Some level of the trait in question
4. A great deal of the trait in question
5. Huge level of the trait in question.
The aim of the study was to ascertain whether the twins accurately perceive/rank the NS and HA
levels of their twin.
a) Perform the appropriate Hotelling’s T-squared test. Show your formula, SAS code, SAS
output, the hypotheses being tested, the test statistic and the p-value. (10 marks)
b) Provide via SAS the sample means and variances of the differences in responses between
the twins (5 marks).
c) Does the first twin accurately perceive the level of NS or HA of the second twin? Justify your
conclusion. (10 marks)
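One way to set up this analysis is as a paired-comparison Hotelling's T-squared test on the within-pair differences. The Python sketch below (not SAS) assumes the differences are D1 = X1 - X2 for NS and D2 = X3 - X4 for HA, which is an interpretation of the design rather than something stated above. The function name and returned keys are hypothetical. It returns the means and variances of the differences, T-squared, and the F statistic, which is referred to an F distribution with (p, n - p) degrees of freedom under H0: mean difference vector = 0.

```python
def paired_hotelling_t2(rows):
    """Paired Hotelling's T-squared for two difference variables.

    rows : list of (x1, x2, x3, x4) responses, one tuple per twin pair.
    Assumed differences: d1 = x1 - x2 (NS), d2 = x3 - x4 (HA).
    """
    n, p = len(rows), 2
    d = [(x1 - x2, x3 - x4) for x1, x2, x3, x4 in rows]
    # Sample means of the differences.
    m1 = sum(a for a, _ in d) / n
    m2 = sum(b for _, b in d) / n
    # Unbiased sample variances and covariance of the differences.
    s11 = sum((a - m1) ** 2 for a, _ in d) / (n - 1)
    s22 = sum((b - m2) ** 2 for _, b in d) / (n - 1)
    s12 = sum((a - m1) * (b - m2) for a, b in d) / (n - 1)
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("covariance matrix of the differences is singular")
    # T^2 = n * dbar' S_d^{-1} dbar, using the closed-form 2x2 inverse.
    t2 = n * (s22 * m1 * m1 - 2 * s12 * m1 * m2 + s11 * m2 * m2) / det
    # (n - p) / (p (n - 1)) * T^2 follows F(p, n - p) under H0.
    f_stat = (n - p) / (p * (n - 1)) * t2
    return {"mean": (m1, m2), "var": (s11, s22), "t2": t2,
            "f": f_stat, "df": (p, n - p)}
```

With the 30 twin pairs described above, the F statistic would have (2, 28) degrees of freedom.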
NOTE: In all parts of the question ensure you show your SAS code or IML code and relevant
formulae, outputs, working and interpretation. Write your conclusions out carefully.