Correlation, Regression, and Probability

March 12, 2023

ACTIVITY 2 WORKSHEET
Activity #2: Correlation, Regression, and Probability
Please read the instructions carefully and complete the following 38 online questions within
the 60-minute time limit. You have 3 attempts available in this activity; the highest score
counts.
Problem-solving strategies:
1. Download the Activity 2 Worksheet.
2. Locate the data file you saved in Activity 1.
3. Work on the activity questions offline.
4. Round answers to 4 decimal places where possible.
5. After completing the activity questions offline, fill in the question answers online
within the time limit.
6. Check your score on each question and redo any missed questions in the second or third
attempt.
7. REMEMBER! You must complete the questions online! Do NOT submit the activity
question file.
Please note that if you are using the data file saved from the previous activity, you are
ready to answer the questions. However, if you download the original data file (Original
Data.xlsx), you need to remove the outliers by following the procedures in the previous
activity.
The correct data file used in this activity should include 187 observations with 7 variables.
In this activity, you are going to learn how two variables can be related to each other
and how one variable can be used to predict the outcomes of another. Since correlations
can be computed only for numerical variables, you need to convert the nominal variables
into numbers.
To do this, use the “Evaluate Formula” function under the “Insert” pull-down menu.
Copy and paste the following formula into the dialogue box and click “Evaluate”.
if([Game]="Yes",1,2)
This formula creates a new variable (after the last variable) that codes “Yes” as 1 and “No”
as 2.
Copy and paste the following formula into the dialogue box and click “Evaluate”.
if([Computer]="Windows",1,2)
This formula creates a new variable (after the last variable) that codes “Windows” as 1 and
“Mac” as 2.
Copy and paste the following formula into the dialogue box and click “Evaluate”.
if([Gender]="Female",1,2)
This formula creates a new variable (after the last variable) that codes “Female” as 1 and
“Male” as 2.
Please “Rename” the new variables (using the pull-down menu at each variable name) as
“Game1”, “Computer1”, and “Gender1”, respectively.
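The three `if(...)` formulas above can be mirrored in Python for readers who want to check the coding outside CrunchIt. This is a minimal sketch, assuming each record is a dictionary keyed by the original column names (Game, Computer, Gender); the helper names `recode` and `add_coded_columns` and the sample records are hypothetical. Like the CrunchIt formula, any value other than the first level is coded 2.

```python
def recode(value: str, first_level: str) -> int:
    """Mirror if([Var]="first_level",1,2): 1 for first_level, otherwise 2."""
    return 1 if value == first_level else 2


def add_coded_columns(rows: list[dict]) -> list[dict]:
    """Append Game1, Computer1, and Gender1 to every record (assumed record layout)."""
    for row in rows:
        row["Game1"] = recode(row["Game"], "Yes")              # Yes -> 1, No -> 2
        row["Computer1"] = recode(row["Computer"], "Windows")  # Windows -> 1, Mac -> 2
        row["Gender1"] = recode(row["Gender"], "Female")       # Female -> 1, Male -> 2
    return rows


# Hypothetical records for illustration only (not from the worksheet data).
sample = [
    {"Game": "Yes", "Computer": "Mac", "Gender": "Female"},
    {"Game": "No", "Computer": "Windows", "Gender": "Male"},
]
for row in add_coded_columns(sample):
    print(row["Game1"], row["Computer1"], row["Gender1"])
```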
The next task is to create scatterplots for pairs of these variables. A scatterplot uses
Cartesian coordinates to show the relationship between two numerical variables. Use the
“Scatterplot” function under the “Graphics” pull-down menu. Enter the appropriate X (the
independent, or predictor, variable) and Y (the dependent, or response, variable), and then
click “Calculate”.
(Because the new variables “Game1”, “Computer1”, and “Gender1” have only two levels,
scatterplots using them would be nearly useless, so those scatterplots are skipped.) The
remaining 4 variables can be paired up to produce 6 scatterplots.
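The pair count is the number of ways to choose 2 of the 4 remaining variables, C(4, 2) = 6; the same count for the six variables used in the correlation step gives C(6, 2) = 15, matching the 15 answers in Q1. The sketch below enumerates the pairs with Python's standard `itertools.combinations` and `math.comb`. It assumes the four numerical columns are named Online, Study, GPA1, and GPA2, which is inferred from the worksheet rather than stated in it.

```python
from itertools import combinations
from math import comb

# Assumed column names for the four numerical variables.
numeric_vars = ["Online", "Study", "GPA1", "GPA2"]

# Each unordered pair of distinct variables yields one scatterplot.
pairs = list(combinations(numeric_vars, 2))
for x_var, y_var in pairs:
    print(f"{x_var} vs {y_var}")

# Pair counts: 4 variables -> 6 scatterplots, 6 variables -> 15 correlations.
print(len(pairs), comb(4, 2), comb(6, 2))
```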
Although showing the scatterplots is not required in this activity, they are very useful for
detecting outliers.
However, a scatterplot by itself shows limited information because it does not include any
“statistics”. To evaluate the strength and direction of the relationship for each variable
pair, find the correlation coefficients for the pairs of interest using the “Correlation”
function under the “Statistics” pull-down menu. Select a total of six variables and then
click “Calculate”. (GPA 2 is not included because it depends on GPA 1.)
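The “Correlation” function reports a correlation coefficient r for each pair. As a sketch of what it computes, assuming CrunchIt uses the standard Pearson formula, r is the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations. The function name `pearson_r` and the toy values are hypothetical.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation: Sxy / sqrt(Sxx * Syy)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy values for illustration only: y rises with x, so r is positive.
print(round(pearson_r([1, 2, 3, 4], [2, 4, 5, 8]), 4))
```

For the 1/2-coded variables (Game1, Computer1, Gender1), the sign of r depends on the coding: a positive r with GPA1 means the level coded 2 tends to have the higher GPA1 values.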
Q1: Find the correlation coefficients (r) of each variable pair. (Round answers to 4
decimal places.) (Includes 15 answers. Next question is Q16.)

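          Game      Computer  Gender    Online    Study
Computer  ______
Gender    ______    ______
Online    ______    ______    ______
Study     ______    ______    ______    ______
GPA1      ______    ______    ______    ______    ______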

With a sample size between 150 and 300, we need a correlation coefficient that is greater
than 0.1593 in absolute value to claim that the relationship between two variables is
significant at an alpha level of 0.05. (We will discuss the alpha level in later chapters.)
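A cutoff like 0.1593 is commonly obtained from a two-tailed critical t value by solving t = r·sqrt(df)/sqrt(1 - r²) for r, which gives r_crit = t_crit / sqrt(t_crit² + df) with df = n - 2. The worksheet does not say how its cutoff was derived, so this is an assumption: the sketch below uses df = 150 and t_crit = 1.9759 (the two-tailed alpha = 0.05 value from a standard t table), which reproduces 0.1593. Because r_crit shrinks as df grows, a cutoff taken near the bottom of the 150 to 300 range is conservative for the 187 observations used here.

```python
from math import sqrt


def critical_r(t_crit: float, df: int) -> float:
    """Smallest |r| significant at the given two-tailed critical t, with df = n - 2.

    Obtained by solving t = r * sqrt(df) / sqrt(1 - r**2) for r.
    """
    return t_crit / sqrt(t_crit ** 2 + df)


# Assumed inputs: df = 150 and its two-tailed alpha = 0.05 critical t from a t table.
T_CRIT_DF150 = 1.9759
print(round(critical_r(T_CRIT_DF150, 150), 4))  # 0.1593
```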

Q16: With this standard, please identify those variable pairs which are significant and
positively related.
Q17: With this standard, please identify those variable pairs which are significant and
negatively related.
Q18: With this standard, please identify those variable pairs which are not related.
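Q16 to Q18 sort the 15 coefficients by sign and by whether |r| exceeds the cutoff. A minimal Python sketch of that rule follows; the function name `classify_pairs` and the placeholder pair names A, B, C are hypothetical. A coefficient exactly equal to the cutoff is treated as not significant because the cutoff is stated as a strict "greater than".

```python
CUTOFF = 0.1593  # worksheet cutoff for n between 150 and 300, alpha = 0.05


def classify_pairs(r_values: dict[tuple[str, str], float],
                   cutoff: float = CUTOFF) -> dict[str, list[tuple[str, str]]]:
    """Group variable pairs into significant positive, significant negative, or not related."""
    groups: dict[str, list[tuple[str, str]]] = {"positive": [], "negative": [], "not related": []}
    for pair, r in r_values.items():
        if r > cutoff:
            groups["positive"].append(pair)
        elif r < -cutoff:
            groups["negative"].append(pair)
        else:
            groups["not related"].append(pair)
    return groups


# Placeholder pairs and coefficients for illustration only (not worksheet results).
print(classify_pairs({("A", "B"): 0.30, ("A", "C"): -0.25, ("B", "C"): 0.10}))
```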

Since some significant relationships were found, regression models can be used to describe
them. We are going to test two regression models.
In CrunchIt, select Statistics > Regression > Simple Linear and set up the proper independent
and dependent variables. Use “display numeric results” to see the model details and
“display Fitted Plot” to see the scatterplot with the fitted line.
The first model uses Study as the independent (X, or predictor) variable to predict GPA1
(the dependent, response, or Y variable).
The second model uses Online as the independent (X, or predictor) variable to predict GPA1
(the dependent, response, or Y variable).
Please make sure that you select the Dependent and Independent variables correctly.
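As a sketch of the quantities behind Q19 through Q28, assuming CrunchIt's Simple Linear regression uses the standard least-squares formulas: the slope is Sxy/Sxx, the y-intercept is the mean of y minus the slope times the mean of x, and the proportion of variation explained equals r². The function names are hypothetical; pass Study (or Online) as `x` and GPA1 as `y`.

```python
def fit_simple_linear(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept, r_squared)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # equals r squared in simple regression
    return slope, intercept, r_squared


def predict(slope: float, intercept: float, x_new: float) -> float:
    """Predicted response at x_new on the fitted line."""
    return intercept + slope * x_new


# Toy data lying exactly on y = 1 + 2x, for illustration only.
slope, intercept, r2 = fit_simple_linear([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept, r2, predict(slope, intercept, 20))
```

For Q22 and Q23 (or Q27 and Q28), the prediction is the fitted intercept plus the fitted slope times 20 or 15 hours.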

Q19: What is the slope of the first model? (Round answer to 4 decimal places.)
Q20: What is the y-intercept of the first model? (Round answer to 4 decimal places.)
Q21: What proportion of the variation in “GPA 1” can be explained by the first model?
(Hint: it is the square of r.) (Round answer to 4 decimal places.)
Q22: If a student spent 20 hours studying per week, what is the predicted GPA1 of
this student? (Round answer to 4 decimal places.)
Q23: If a student spent 15 hours studying per week, what is the predicted GPA1 of
this student? (Round answer to 4 decimal places.)
Q24: What is the slope of the second model? (Round answer to 4 decimal places.)
Q25: What is the y-intercept of the second model? (Round answer to 4 decimal places.)
Q26: What proportion of the variation in “GPA 1” can be explained by the second model?
(Hint: it is the square of r.) (Round answer to 4 decimal places.)
Q27: If a student spent 20 hours surfing online per week, what is the predicted GPA1
of this student? (Round answer to 4 decimal places.)
Q28: If a student spent 15 hours surfing online per week, what is the predicted GPA1
of this student? (Round answer to 4 decimal places.)
Recall the functions used in ACT 1 and answer the following pairs of questions.

Q29: What is the sample mean of GPA1 for male students? (Round answer to 4
decimal places.)
Q30: What is the sample mean of GPA1 for female students? (Round answer to 4
decimal places.)

Q31: What is the proportion of male students who play online games? (Round answer
to 4 decimal places.)
Q32: What is the proportion of female students who play online games? (Round
answer to 4 decimal places.)

Q33: What is the sample mean of GPA1 for students who play online games? (Round
answer to 4 decimal places.)
Q34: What is the sample mean of GPA1 for students who do not play online games?
(Round answer to 4 decimal places.)

Q35: Among online game players, what proportion use Windows computers? (Round
answer to 4 decimal places.)
Q36: Among students who do not play online games, what proportion use Windows
computers? (Round answer to 4 decimal places.)
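Q29 through Q36 each compare a mean or a proportion across two groups. In CrunchIt these come from the ACT 1 functions; the Python sketch below shows the same arithmetic, assuming records are dictionaries keyed by the original column names (Gender, Game, Computer, GPA1) with the text levels used in the recoding formulas. The helper names and the toy records are hypothetical.

```python
def group_mean(rows: list[dict], value_col: str, group_col: str, level: str) -> float:
    """Mean of value_col among records whose group_col equals level."""
    values = [row[value_col] for row in rows if row[group_col] == level]
    if not values:
        raise ValueError(f"no records with {group_col} == {level!r}")
    return sum(values) / len(values)


def group_proportion(rows: list[dict], outcome_col: str, outcome_level: str,
                     group_col: str, group_level: str) -> float:
    """Share of the group_col == group_level subset whose outcome_col equals outcome_level."""
    group = [row for row in rows if row[group_col] == group_level]
    if not group:
        raise ValueError(f"no records with {group_col} == {group_level!r}")
    hits = sum(1 for row in group if row[outcome_col] == outcome_level)
    return hits / len(group)


# Toy records for illustration only; they are not the worksheet data.
toy = [
    {"Gender": "Male", "Game": "Yes", "Computer": "Windows", "GPA1": 3.0},
    {"Gender": "Male", "Game": "No", "Computer": "Mac", "GPA1": 3.4},
    {"Gender": "Female", "Game": "Yes", "Computer": "Mac", "GPA1": 3.6},
]
print(round(group_mean(toy, "GPA1", "Gender", "Male"), 4))                    # Q29-style
print(round(group_proportion(toy, "Game", "Yes", "Gender", "Male"), 4))       # Q31-style
print(round(group_proportion(toy, "Computer", "Windows", "Game", "Yes"), 4))  # Q35-style
```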
Q37: The sample mean of GPA1 is 3.1775 after rounding. How big is the chance that the
mean GPA of all CMU students is equal to 3.1775?
a. Very big
b. 0
c. Cannot be determined!
Q38: According to this sample, do you think that CMU students have a mean GPA
greater than 3.15?
a. Yes
b. No
c. Cannot be determined!
Do you think the difference within each pair of questions above is large enough to make you
believe that it is significant, or is the difference merely due to sampling error?
To find out, we need to wait until ACT 3 is done.