
Name: Email ID: @psu.edu
Worked with these other students:
ECON306 Problem Set 3
INSTRUCTIONS: Solve the following questions to the best of your ability. If you do not
know how to solve any of these questions, ask me before the due date. I will work with
you if you are having trouble solving these.
To receive full credit for this assignment, the problem set needs to be submitted to Canvas
as a single PDF document containing 1) your Stata log file, converted to PDF, and 2) any
written explanations and answers. These components need to be attached together in
that order. Late submissions will NOT be accepted. DO NOT email! No assignments will be
accepted via email.
First of all, for this problem set, you will have to submit the Stata log file. Stata can
record your session into a file called a log file but does not start a log automatically; you must
tell Stata to record your session. By default, the resulting log file contains what you type
and what Stata produces in response, recorded in a format called Stata Markup and Control
Language (SMCL). The file can be printed or converted to plain text for incorporation
into documents you create with your word processor. You can find more information here:
https://www.stata.com/manuals13/u15.pdf.
At the beginning of your Stata .do file, write the command log using PSX, replace
(or use a different file name). At the very end of your .do file, include log close and
then, on a new line, translate PSX.smcl PSX.pdf. The translate command converts your Stata
SMCL log file directly into a PDF file; you can then use Adobe Acrobat to merge your PDF
files together. You will need to turn in this log file to receive full credit for this assignment.
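The logging steps above can be sketched as a .do file skeleton. This is a minimal sketch, not a required template: the log name PSX comes from the instructions, while the opening capture log close line is an optional addition (an assumption, not part of the instructions) that closes any log left open by an earlier run.

```stata
* Minimal .do file skeleton for logging (PSX is the example log name from the instructions)

* Optional: close any log that an earlier, interrupted run left open
capture log close

* Start recording; this writes PSX.smcl in the current working directory
log using PSX, replace

* ... your commands and comments for each problem go here ...

* Stop recording, then convert the SMCL log to PDF
log close
translate PSX.smcl PSX.pdf, replace
```

The replace option on translate overwrites an existing PSX.pdf, which is convenient if you rerun the whole .do file.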
Please generate the log file only after you have completed all of your code and it
runs smoothly without any errors. That way, your log file will not contain commands
that fail to produce results or that produce duplicate results. Please do your
best to include comments in your code (using the * sign in your Stata .do file) and to mark
the solutions to the different problems as clearly as possible. Otherwise, the graders
might have to penalize you if they cannot follow your work, and then I will have to re-grade
your work and the whole process becomes highly inefficient.
More Fun with Earnings and Height
Using the same data as the EarningsHeight.dta from problem set 2, we will perform multiple
regression with dummy variables included. We will also explore 2 cases of omitted variable
bias.
1 Actual OVB
We have seen previously that taller people earn more money. One potential explanation for this
is that there is an omitted variable! What if height is correlated with an omitted factor that
affects earnings? Perhaps cognitive ability is the omitted factor. How, you ask? Imagine that
poor nutrition and harmful environmental factors in utero and in early childhood negatively
affect both cognitive and physical development. Cognitive ability affects earnings later in
life but is not captured in the model.
a) Would the coefficient on height from $\text{earnings}_i = \beta_0 + \beta_1 \text{height}_i + u_i$ be biased upward
or downward? Why?
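As a reminder of the general result that part a) relies on, the standard omitted variable bias formula can be written as follows. This is a sketch under assumptions: the variable $\text{ability}_i$, the coefficient $\beta_2$, and the error term $e_i$ are notation introduced here for illustration and do not appear in the data set.

```latex
% Assumed "long" model that includes the omitted factor:
%   earnings_i = beta_0 + beta_1 height_i + beta_2 ability_i + e_i
% so the error in the short regression is u_i = beta_2 ability_i + e_i.
\[
\hat{\beta}_1 \;\xrightarrow{\;p\;}\; \beta_1
  + \beta_2 \, \frac{\operatorname{Cov}(\text{height}_i,\ \text{ability}_i)}
                    {\operatorname{Var}(\text{height}_i)}
\]
```

The direction of the bias is the sign of the second term, which depends on the sign of $\beta_2$ and the sign of the covariance between height and the omitted factor.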
b) While we do not have a direct measure of cognitive ability in the data set, we do have a
variable that measures education (educ) and that is likely correlated with cognitive ability
because in our society, people with higher cognitive ability are more likely to attend school
longer. If education perfectly captures cognitive ability, then this would eliminate the
omitted variable bias. If education is highly correlated with cognitive ability, then this
will reduce the size of the bias from the omitted cognitive ability. Run the regression in
part a) then comment on how the results change when you add educ to the specification.
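One way to line up the two regressions for comparison is sketched below. The variable name educ is given in the problem; earnings and height are assumed to be the variable names in EarningsHeight.dta, and the stored-estimate names short and long are arbitrary labels chosen here.

```stata
* Load the data (assumes EarningsHeight.dta is in the working directory)
use "EarningsHeight.dta", clear

* Part a): regression of earnings on height only
regress earnings height
estimates store short

* Part b): add education to the specification
regress earnings height educ
estimates store long

* Show both sets of coefficients and standard errors side by side
estimates table short long, b(%9.3f) se(%9.3f) stats(N r2)
```

The side-by-side table makes it easier to see how the coefficient on height changes once educ is included.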
2 Region Dummies
Generate dummy variables for region. There are a few ways you can do this in Stata. The
most obvious, brute-force way is to do something like:
generate northeast = 0
replace northeast = 1 if region==1
…and so on for all regions
Another option, which works better when 1) there are a lot of categories and 2)
they’re labeled from the outset, is to let Stata automatically generate all the dummies for
you. Try:
tabulate region, generate(dummy_region)
You’ll notice that the numbers 1-4 are appended to the variable name. You will want to
rename these variables for clarity. For example:
rename dummy_region1 northeast
and so on for all the regions.
3 Regression with Region Dummies
Add the dummy variables for region to the specification from part 1b), using South as
the reference group.
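A minimal sketch of this regression follows. The problem only states that region==1 is the Northeast; the remaining coding (2 = Midwest, 3 = South, 4 = West) and the names midwest, south, and west are assumptions made here, so check them against the value labels in your data with tabulate region before relying on them.

```stata
* Assumes EarningsHeight.dta is in the working directory
use "EarningsHeight.dta", clear

* Create one dummy per region and give each a readable name
* (coding of regions 2-4 is an assumption; verify with: tabulate region)
tabulate region, generate(dummy_region)
rename dummy_region1 northeast
rename dummy_region2 midwest
rename dummy_region3 south
rename dummy_region4 west

* South is the reference group, so its dummy is left out of the regression
regress earnings height educ northeast midwest west
```

Leaving out exactly one dummy avoids perfect multicollinearity with the constant, and each remaining region coefficient is then measured relative to the omitted group.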
4 Comparing Regions
a) In which region of the country are earnings the lowest? Explain.
b) How much more can someone expect to earn in the Northeast compared to
the Midwest? Show your work.
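If you want to check a hand calculation for part b), Stata's lincom command reports a linear combination of coefficients from the most recent regression. The sketch below assumes the dummies northeast, midwest, and west already exist under those (hypothetical) names and that South is the omitted group.

```stata
* Assumes the region dummies were created earlier in the same session
regress earnings height educ northeast midwest west

* Difference between the Northeast and Midwest coefficients,
* with its standard error and confidence interval
lincom northeast - midwest
```

This is only a check; the question still asks you to show the calculation yourself.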
5 Lots of Variables
Create dummy variables for race. Also create a dummy variable called “male”, then run
a regression with dummies for region, race, and sex. This time, use Northeast as the
reference group for region, other as the reference group for race, and female as the reference
group for sex.
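The steps for this problem might be sketched as follows. Several things here are assumptions rather than facts from the problem: the variable names race and sex, the race coding (1 = non-Hispanic white, 2 = non-Hispanic black, 3 = Hispanic, 4 = other), the sex coding (1 = male), the dummy names, and the choice to keep height and educ in the specification. Verify the codings with tabulate race and tabulate sex first.

```stata
* Assumes the region dummies (northeast, midwest, south, west) already exist

* Race dummies (coding is an assumption; verify with: tabulate race)
tabulate race, generate(dummy_race)
rename dummy_race1 white
rename dummy_race2 black
rename dummy_race3 hispanic
rename dummy_race4 other

* Male dummy (assumes sex == 1 means male; stays missing where sex is missing)
generate male = (sex == 1) if !missing(sex)

* Reference groups are the omitted dummies: northeast, other, and female
regress earnings height educ midwest south west white black hispanic male
```

Each reference group is set simply by leaving that group's dummy out of the variable list.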
6 Artificial OVB
Non-Hispanic black earners are more likely to be found in the South region. Comment on
the bias this would cause if you did not have race data to include in the model.
7 Differential Effect of Education for Males
Beginning with the regression in part 5, check if an additional year of education has a different
effect on earnings for males.
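One common way to let the effect of education differ by sex is to add an interaction term between male and educ. The sketch below assumes the variables from the previous problem exist under the hypothetical names used here; male_educ is a name chosen for illustration.

```stata
* Interaction term: equals educ for males and 0 for females
generate male_educ = male * educ

* Same specification as before, plus the interaction
regress earnings height educ male male_educ midwest south west white black hispanic

* The coefficient on male_educ is the additional effect of one more year
* of education for males; its t statistic tests whether that differs from 0
```

The same model can be written with Stata's factor-variable notation (c.educ##i.male), but creating the interaction by hand keeps the coefficient names easy to read in the log.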
8 F test in Stata
You should have 3 coefficients that are not statistically significant. Perform an F test in Stata
where all 3 of those coefficients are set to 0. Interpret your results.
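In Stata, a joint test that several coefficients equal zero can be run with the test command immediately after the regression. In the sketch below, the three variable names are placeholders only; they are not a claim about which coefficients turn out to be insignificant in your results.

```stata
* Run right after the regression whose coefficients you are testing.
* Replace the three placeholder names with your own insignificant variables.
test midwest west hispanic
```

The output reports the F statistic with its numerator and denominator degrees of freedom, along with Prob > F, the p-value for the joint null hypothesis.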
9 F test by hand
Perform the F-test by hand. There are a couple of different ways to calculate
your F-statistic, but you only need to do it one way.
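For reference, two equivalent textbook forms of the homoskedasticity-only F-statistic are given below. The notation is an assumption introduced here: $q$ is the number of restrictions, $n$ the number of observations, $k$ the number of regressors in the unrestricted model, and the subscripts $r$ and $u$ mark the restricted and unrestricted regressions.

```latex
% Sum-of-squared-residuals form
\[
F = \frac{(SSR_{r} - SSR_{u}) / q}{SSR_{u} / (n - k - 1)}
\]

% Equivalent R-squared form (valid when both regressions
% have the same dependent variable)
\[
F = \frac{(R^2_{u} - R^2_{r}) / q}{(1 - R^2_{u}) / (n - k - 1)}
\]
```

Both forms assume homoskedastic errors, so they should match the result of Stata's test command only when the regression was run without the robust option.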
10 Save Data
Save these data separately from the EarningsHeight.dta that you used for Problem Set 2.
Name it EarningsHeight_PS3.dta.
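A minimal sketch of the save step, assuming the intended file name is EarningsHeight_PS3.dta (with an underscore) and that you want the file in the current working directory:

```stata
* Save the modified data under a new name so EarningsHeight.dta is untouched
save "EarningsHeight_PS3.dta", replace
```

The replace option lets you rerun the .do file without an error once the file already exists.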