Assignment #2
Due date: February 28, 2023

Instructions: You must provide your own unique solution. You may work with others, but each of you is responsible for submitting your own problem set solution. Marks are listed for each question. Submit your solution through SafeAssign. Ideally you will submit your RMarkdown file output, preferably in PDF format, but Word or HTML are acceptable. Blackboard won’t accept HTML files, so if submitting an HTML file, first zip it and submit the zipped version.

For this assignment you will use three labour force surveys, from June 1977, June 1997 and June 2022. They are amalgamated and saved for you in the datafile lfs3.rds, in the Bb data folder. All variables required will be referenced below.

1. Data cleaning and preparation. [5 marks]
   We need to create two variables and adjust wages:
   a. Create a variable capturing part-time vs. full-time work status. The datafile contains a variable ftptmain. Recode this variable from its current four categories into two. Code it as 1 for part-time and 0 for full-time. To confirm you have coded it correctly, generate a two-way table of the original variable and the new variable.
   b. Create a numeric version of age. The datafile contains an age variable age_12, coded as a factor variable with 12 levels. Create a numeric version of age_12 using the as_numeric() function. Note that the variable age_12 consists of five-year age groupings from 15-19 through 65-69, and then there is a catch-all category for 70 and over. Drop the category for 70 and over so that the numeric variable captures only five-year age groupings. Converting the factor into a numeric variable with equal spacing gives a linear representation of age, so we can use polynomial functions of age. To confirm you have coded it correctly, you could generate a basic two-way table of the original age variable and the numeric age variable, but that table will be large. Instead, report its diagonal elements (function diag()). If you have done it correctly, all off-diagonal elements of this table will be zeroes, so the diagonal elements will show the correct number of observations.
   c. Adjust wages for inflation by multiplying the wage variable hrlyearn by the ratio of the CPIs from June 2022 and June 1997, a ratio of 152.9/90.5. Multiplying the wages in 1997 by this ratio converts them into the same base year as the 2022 data. Report the mean, median, maximum and minimum wages for 1997 (adjusted) and for 2022. Note: wages were not collected until 1997, so all earlier labour force surveys, including 1977, do not include wages.

2. Basic data descriptions of datafile lfs3.rds. [10 marks]
   Provide tables of counts of the following, and provide a brief paragraph explaining what you find for:
   a. education (educ4) by sex by year
   b. industry (ind) by sex by year
   c. part-time/full-time status by sex and year
   d. wage rate (hrlyearn) by sex by year
   e. wage rate by age (original or numeric version) by year

3. Basic model of wage. [15 marks]
   Start with the following model:
      wage = f(age, education, sex, part-time status, year)
   making sure to use the numeric version of age generated in 1.b above. Use educ4 for education; part-time status is the variable generated in question 1.
   a. Run a regression on the basic wage model above. Report your regression results using the command stargazer(). Fully explain your regression results.
   b. Run the basic regression above again, but run the regression separately by year for both years 1997 and 2022 (which means you will drop the variable year from each regression). Report both regression results using stargazer(). Fully explain your regression results. Compare the results you get when running separately by year with those from the previous part, where you constrained the coefficient estimates to be the same across both years.

4. Wage variation by age: modeling age as a polynomial term. [20 marks]
   a. Run the following regression:
         wage = f(age, age^2, education, sex, part-time status, year)
      Report and discuss the regression results.
   b. Compare the fit of the linear model (3.a) and the quadratic model (4.a) by comparing the residual plots. Do you see any differences?
   c. Add a cubic term for age to the regression above (4.a), run, report, discuss, and note any differences from the quadratic model.
   d. Add a quartic (fourth-order) term for age to the regression above (4.c), run, report, discuss, and note any differences from the quadratic or cubic models.
   e. Rerun the above two regressions with cubic and quartic age terms (4.c and 4.d), but now use a scaled version of the numeric age variable. Create the new scaled age variable, ages, defined as [formula not legible in the source]. Report results and comment, and compare to the unscaled estimates.
   f. Estimate the basic regression from question 3.a but use a fractional polynomial model on age. Compare the regression results to the quadratic, cubic and quartic models (4.a, 4.c, and 4.d).
   g. To put all the models to a more practical test, generate four sets of predicted wages (emmeans) by age using the quadratic, cubic, quartic and fractional polynomial models of age. Comment on the results, and comment on how much explanatory power is added by the higher-order polynomial terms.

5. Allow for wage variation by age. [15 marks]
   Estimate the age-quadratic model of wages from 4.a above, but interact age (both age and age^2) with year to see if the wage response by age differs over the two periods.
   a. Report regression results and provide a brief interpretation. Compare these results to those of the quadratic model in 4.a.
   b. Generate predicted wages (emmeans) for each age category for both years.
   c. Summarize your findings.

6. Allow for wage variation by sex and education. [20 marks]
   a. Modify the age-quadratic model in Question 4.a by interacting sex and education.
      i. Generate predicted wages (emmeans) for each level of education by sex and plot.
      ii. Test the differences in predicted wages (emmeans) using contrast()
          A. by education
          B. by sex
      iii. Summarize the results of part a.
   b. Modify the model in part a. above to allow for a three-way interaction among sex, education and year.
      i. Generate predicted wages varying all three variables. Note this will generate 2 x 4 x 2 = 16 predicted values. Present your results and explain what you see.
      ii. Use contrast() to find the changes in wages from 1997 to 2022 for females and males by level of education. This section has a lot of moving parts, so plan ahead and test methods before deciding how to present the effects.
      iii. Summarize your findings in this section.

7. Test for industry wage differentials. [15 marks]
   a. Add the industry categorical variable (ind) to the age-quadratic model of part 4.a.
      i. Report regression results and discuss.
      ii. Generate predicted wages by industry. Plot results.
      iii. Use contrast() to identify the industry wage differentials.
      iv. Summarize findings.
   b. Modify the model in part 7.a above by interacting sex and ind. Run separate regressions by year. You will need to create two subsets of your dataframe, one for 1997 and one for 2022.
      i. Report regression results and discuss.
      ii. Use contrast() to identify the industry sex wage differentials. Do this separately by year.
      iii. Summarize findings.
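
Illustration for Question 1 (toy data only). The checks described in Question 1 can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a solution: the data are simulated, the data frame name toy, the column names status, pt, age_num and wage_adj, and the category labels ("FT", "PT-a", "15-19", "70+", and so on) are all hypothetical and will differ from those in the course datafile, and base R's as.numeric() is assumed here in place of as_numeric().

```r
# Toy data only: names and labels are made up, not those in the course file.
set.seed(1)
n <- 200
age_levels <- c(paste(seq(15, 65, by = 5), seq(19, 69, by = 5), sep = "-"), "70+")
toy <- data.frame(
  status   = factor(sample(c("FT", "PT-a", "PT-b", "PT-c"), n, replace = TRUE)),
  age_12   = factor(sample(age_levels, n, replace = TRUE), levels = age_levels),
  year     = sample(c(1997, 2022), n, replace = TRUE),
  hrlyearn = round(runif(n, 7, 60), 2)
)

# 1.a: collapse four categories into 1 = part-time, 0 = full-time.
toy$pt <- ifelse(toy$status == "FT", 0, 1)
check_pt <- table(toy$status, toy$pt)  # each row should load on one column only
print(check_pt)

# 1.b: drop the open-ended 70+ group, then use the level codes 1..11.
toy <- toy[toy$age_12 != "70+", ]
toy$age_12 <- factor(toy$age_12, levels = setdiff(age_levels, "70+"))
toy$age_num <- as.numeric(toy$age_12)
check_age <- table(toy$age_12, factor(toy$age_num, levels = 1:11))
print(diag(check_age))                 # counts per five-year age group
# If every off-diagonal cell is zero, the diagonal sums to the row count.
print(sum(diag(check_age)) == nrow(toy))

# 1.c: express 1997 wages in June 2022 dollars using the CPI ratio.
cpi_ratio <- 152.9 / 90.5
toy$wage_adj <- ifelse(toy$year == 1997, toy$hrlyearn * cpi_ratio, toy$hrlyearn)
print(tapply(toy$wage_adj, toy$year, summary))  # min, median, mean, max by year
```

Comparing the sum of the diagonal with the number of rows is a compact way to confirm that no observation falls off the diagonal without printing the full table.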