MARKETING ANALYTICS MCS 3500 DUE ON FEBRUARY 27TH BY MIDNIGHT
Assignment 2, Winter 2023 (100)
Data: boston housing.csv. Use a random sample of only 300 of the roughly 500 house prices.
For all three parts, use log(medv) as the outcome variable (see Data Description).
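The sampling and log-transform step can be sketched as follows. This is only an illustration in Python using the standard library; the assignment itself expects R Markdown output. The helper names (load_rows, sample_with_log_outcome), the seed value, and the assumption that the CSV header spells the outcome column as medv are all hypothetical.

```python
import csv
import math
import random


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def sample_with_log_outcome(rows, n=300, seed=2023):
    """Draw n rows without replacement and add a log_medv field to each.

    The seed is an arbitrary choice; fixing it makes the sample reproducible.
    The column name 'medv' is an assumption about the CSV header.
    """
    rng = random.Random(seed)
    sampled = rng.sample(rows, n)
    return [dict(row, log_medv=math.log(float(row["medv"]))) for row in sampled]
```

Fixing the seed means the same 300 houses are used in all three parts, so results are comparable across models.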
Part 1: Multiple Linear Regression (50)
A. Identify the most important variables for predicting house prices using the best subset algorithm with three
model selection criteria (RSS = residual sum of squares; adjr^2 = adjusted R^2; and BIC = Bayesian information
criterion). 20
B. Provide descriptive statistics for the important variables identified in Part 1 A. 10
C. Run a multiple linear regression model with the variables identified in Part 1 A. Write the regression equation
and interpret any five regression coefficients. 20
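The best subset idea in Part 1 A can be sketched as below: for each model size, fit every combination of predictors, keep the one with the smallest RSS, and then compare sizes using adjusted R^2 and BIC. This is a minimal Python illustration, not the R workflow the assignment asks for. The function names are hypothetical, and the BIC is written as n*log(RSS/n) + (k+1)*log(n), one common form for Gaussian linear models; software may report BIC shifted by a constant, which does not change the ranking of models.

```python
import itertools
import math


def ols_rss(X, y):
    """Residual sum of squares from least squares of y on the columns of X.

    X must already contain an intercept column. Solves the normal equations
    by Gaussian elimination; perfectly collinear columns raise ZeroDivisionError.
    """
    n, p = len(X), len(X[0])
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    c = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    M = [A[r] + [c[r]] for r in range(p)]  # augmented matrix [X'X | X'y]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))  # partial pivoting
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for k in range(col, p + 1):
                M[r][k] -= f * M[col][k]
    beta = [0.0] * p
    for r in reversed(range(p)):  # back substitution
        beta[r] = (M[r][p] - sum(M[r][k] * beta[k] for k in range(r + 1, p))) / M[r][r]
    return sum((y[i] - sum(X[i][j] * beta[j] for j in range(p))) ** 2 for i in range(n))


def best_subsets(columns, y):
    """columns maps predictor name -> list of values; y is the outcome list.

    Returns one entry per model size: the lowest-RSS subset of that size,
    together with its RSS, adjusted R^2, and BIC.
    """
    n = len(y)
    ybar = sum(y) / n
    tss = sum((v - ybar) ** 2 for v in y)
    results = []
    for k in range(1, len(columns) + 1):
        best = None
        for subset in itertools.combinations(columns, k):
            X = [[1.0] + [columns[name][i] for name in subset] for i in range(n)]
            rss = ols_rss(X, y)
            if best is None or rss < best[1]:
                best = (subset, rss)
        subset, rss = best
        adjr2 = 1 - (rss / (n - k - 1)) / (tss / (n - 1))
        bic = n * math.log(rss / n) + (k + 1) * math.log(n)
        results.append({"size": k, "vars": subset, "rss": rss, "adjr2": adjr2, "bic": bic})
    return results
```

RSS can only fall as variables are added, so it identifies the best model within each size, while adjusted R^2 (higher is better) and BIC (lower is better) are used to choose among sizes. For the interpretation in Part 1 C, note that because the outcome is log(medv), a coefficient b on a predictor corresponds to a change of about 100*(exp(b) - 1) percent in medv for a one-unit increase in that predictor, holding the others fixed.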
Part 2: Multiple Linear Regression with Quadratic Effects (30)
D. Run a multiple regression analysis with the variables identified in Part 1, adding the quadratic effect of the
average number of rooms (rm), and provide average marginal effects (ame) with comments. 10
E. Plot and interpret the predictions and average marginal effects of the average number of rooms (rm). 20
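As a worked illustration of what the average marginal effect of rm means in a model with a quadratic term, consider the sketch below. The symbols are assumptions introduced here: beta_1 and beta_2 are the coefficients on rm and rm^2, the x_ij are the other selected predictors with coefficients gamma_j, and n is the sample size.

```latex
\begin{align*}
\log(\mathrm{medv}_i) &= \beta_0 + \beta_1\,\mathrm{rm}_i + \beta_2\,\mathrm{rm}_i^2
    + \sum_j \gamma_j x_{ij} + \varepsilon_i \\
\frac{\partial \log(\mathrm{medv}_i)}{\partial\,\mathrm{rm}_i}
    &= \beta_1 + 2\beta_2\,\mathrm{rm}_i \\
\mathrm{AME}_{\mathrm{rm}}
    &= \frac{1}{n}\sum_{i=1}^{n}\left(\beta_1 + 2\beta_2\,\mathrm{rm}_i\right)
     = \beta_1 + 2\beta_2\,\overline{\mathrm{rm}}
\end{align*}
```

The marginal effect therefore changes with rm: it is not a single slope but a line in rm, and the AME is that line evaluated at the sample mean of rm. A plot of the marginal effect against rm shows directly where an extra room is associated with a larger or smaller change in log(medv).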
Part 3: Multivariate Adaptive Regression Splines (MARS) (20)
F. Run MARS using all variables in the housing data to investigate non-linear response patterns. Comment
on any findings that show non-linear patterns (if they exist). 15
G. Are the variables identified by MARS the same as or different from those identified in Part 1 A? Please
comment. 5
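MARS captures non-linear patterns by building the model from hinge functions of the form max(0, x - c) and max(0, c - x), where c is a knot chosen from the data. The sketch below, in Python for illustration only, shows how a pair of hinges produces a response whose slope changes at the knot. It does not fit a MARS model; the function names, the knot value, and the coefficients in the usage note are hypothetical.

```python
def hinge(x, knot, direction=1):
    """MARS basis function.

    direction=+1 gives max(0, x - knot); direction=-1 gives max(0, knot - x).
    """
    return max(0.0, direction * (x - knot))


def piecewise_prediction(x, intercept, knot, coef_left, coef_right):
    """Prediction from a mirrored pair of hinges at one knot.

    Left of the knot only coef_left * (knot - x) is active; right of the
    knot only coef_right * (x - knot) is active, so the slope changes there.
    """
    return intercept + coef_left * hinge(x, knot, -1) + coef_right * hinge(x, knot, 1)
```

For example, piecewise_prediction(x, 3.0, 6, 0.1, 0.4) has a slope of -0.1 for x below the hypothetical knot at 6 and a slope of 0.4 above it. A fitted MARS model that retains hinge terms for a variable is therefore evidence of a non-linear response in that variable.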
Format: Generate the program output as a Word document (using the knit function in R Markdown) and add your
additional responses to the same document. Your assignment should include the code, output, and plots that are
relevant to the assignment questions. Drop non-relevant output as required.
Data Description
The Boston Housing Dataset
The Boston Housing Dataset is derived from information collected by the U.S. Census Service concerning housing in
the area of Boston, MA. The following describes the dataset columns:
• CRIM – per capita crime rate by town
• ZN – proportion of residential land zoned for lots over 25,000 sq.ft.
• INDUS – proportion of non-retail business acres per town.
• CHAS – Charles River dummy variable (1 if tract bounds river; 0 otherwise)
• NOX – nitric oxides concentration (parts per 10 million)
• RM – average number of rooms per dwelling
• AGE – proportion of owner-occupied units built prior to 1940
• DIS – weighted distances to five Boston employment centres
• RAD – index of accessibility to radial highways
• TAX – full-value property-tax rate per $10,000
• PTRATIO – pupil-teacher ratio by town
• B – 1000(Bk – 0.63)^2 where Bk is the proportion of blacks by town
• LSTAT – % lower status of the population
• MEDV – Median value of owner-occupied homes in $1000’s