Module 6 Homework Assignment.
The data for this assignment come from the 2014-15 NBA regular season. The dataset contains the following variables for each shot that LeBron James took during the season:
Shot Result – shot was made or missed
Location – game was played at home or away
Shot Number – number of the observed shot based on the order in which each shot was taken during a game
Period – quarter the shot was taken in
Shot Clock – number of seconds remaining on the shot clock when the shot was taken (24 second shot clock per possession)
Dribbles – number of dribbles by the player before the shot was taken
Touch Time – number of seconds the player held the ball before shooting
Shot Dist – distance in feet between the basket and where the player shot the ball
Pts Type – whether the shot was a 2 point or 3 point attempt
Close Def Dist – the distance in feet that the closest defender was from the player when the shot was taken
Some of these variables are necessarily correlated. For example, Dribbles and Touch Time are correlated because taking more dribbles before a shot leads to a longer touch time. Likewise, Shot Dist and Pts Type should be highly correlated because 3 point attempts are always taken from behind the 3 point line, so they are generally farther from the basket than 2 point attempts. For the assignment, please do the following:
Create dummy variables to recode “Location” and “W”
Display the correlation matrix and comment about which variables have high correlations
Conduct a factor analysis of the numeric and dummy variables to confirm the correlations among these variables. Determine the appropriate number of factors and give each one a descriptive name. Save the factors into your dataset.
Use those factors to build a logistic regression model that predicts Shot Result. Write out the final logistic regression model, interpret at least one odds ratio, and find the sensitivity/specificity using the ROC curve.
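The arithmetic behind several of the steps above can be sketched in plain Python. This is only an illustration, not a required method. The toy rows, the "H"/"A" coding of Location, the predicted probabilities, and all function names are assumptions made up for the example. The factor analysis and the model fit themselves would normally be run in a statistics package and are not shown.

```python
import math

# Hypothetical toy rows; the real shot log has many more rows and columns.
# The "H"/"A" coding of Location is an assumption for this example.
shots = [
    {"location": "H", "dribbles": 0, "touch_time": 0.8, "made": 1},
    {"location": "A", "dribbles": 2, "touch_time": 2.9, "made": 0},
    {"location": "H", "dribbles": 5, "touch_time": 6.1, "made": 0},
    {"location": "A", "dribbles": 1, "touch_time": 1.7, "made": 1},
    {"location": "H", "dribbles": 8, "touch_time": 9.4, "made": 1},
    {"location": "A", "dribbles": 3, "touch_time": 3.5, "made": 0},
]


def dummy(value, positive_label):
    """Recode a two-level categorical value as 1/0."""
    return 1 if value == positive_label else 0


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def odds_ratio(coefficient):
    """Odds ratio for a one-unit increase in a logistic regression predictor."""
    return math.exp(coefficient)


def sensitivity_specificity(actual, predicted_prob, cutoff=0.5):
    """Classify at the given cutoff and return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for y, p in zip(actual, predicted_prob):
        predicted = 1 if p >= cutoff else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1 and predicted == 0:
            fn += 1
        elif y == 0 and predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


# Dummy variable: 1 for a home game, 0 for an away game.
home = [dummy(s["location"], "H") for s in shots]

# One entry of the correlation matrix: Dribbles vs. Touch Time.
r = pearson([s["dribbles"] for s in shots], [s["touch_time"] for s in shots])

# Made-up predicted probabilities standing in for a fitted model's output.
made = [s["made"] for s in shots]
probs = [0.7, 0.4, 0.6, 0.8, 0.3, 0.2]
sens, spec = sensitivity_specificity(made, probs, cutoff=0.5)
```

Each point on an ROC curve corresponds to one cutoff, so calling sensitivity_specificity with different cutoff values traces out the trade-off between the two measures. An odds ratio above 1 means the odds of a made shot rise as that factor score increases, and a value below 1 means they fall.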