Nonlinear econometrics for finance - Australia Assignments

Nonlinear econometrics for finance
HOMEWORK 1
(Review of linear econometrics and review of
methods)
Problem 1 (Linear econometrics). (60 points) Household finance is a
growing field in finance. Rising health costs are not just impacting households’ finances; they are affecting an array of decisions, including the decision
to change (or retire from) an occupation which provides favorable health insurance subsidies. For a cross section of individuals, the file “insurance.csv”
provides the following information:
• age: age of primary beneficiary of health insurance
• sex: gender of primary beneficiary of health insurance
• bmi: this is a measure of a person’s weight relative to height. It is
defined as $\mathrm{bmi} = \mathrm{kg}/\mathrm{m}^2$, where kg is the person’s weight in kilograms and $\mathrm{m}^2$ is the
person’s height measured in squared meters. A bmi between 18.5 and
24.9 is considered healthy. More would be considered “overweight”.
• children: number of children covered by health insurance
• smoker: whether the primary beneficiary is a smoker or not
• region: the primary beneficiary’s residential area in the US (northeast,
southeast, northwest, southwest)
• charges: medical costs billed to health insurance.
Given this information, you need to perform linear regression in Python to
understand the drivers of medical costs.
(1) (3 points) Generate a histogram of the medical costs and compute descriptive statistics (mean, median, standard deviation, minimum, maximum). Is the distribution symmetric? Why or why not, in your view?
(2) (3 points) Take a logarithmic transformation of the medical costs. Plot
the histogram of the log-costs. What do you notice now? How would
you explain the change?
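The effect of the log transformation in questions (1) and (2) can be previewed on simulated data. The sketch below is only an illustration under stated assumptions: the `charges` values are drawn from a lognormal distribution with arbitrary parameters (9.0 and 0.9) and are not the contents of insurance.csv, and the helper names `describe` and `skewness` are hypothetical. It computes the descriptive statistics and a sample skewness coefficient before and after taking logs; plotting the histograms themselves is left out.

```python
import math
import random
import statistics

def skewness(values):
    """Sample skewness: third central moment over the cubed standard deviation."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)

def describe(values):
    """Mean, median, standard deviation, minimum and maximum of a sample."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }

rng = random.Random(0)
# Simulated stand-in for the `charges` column (assumption: NOT the real data).
charges = [rng.lognormvariate(9.0, 0.9) for _ in range(5000)]
log_charges = [math.log(c) for c in charges]

stats_raw = describe(charges)
skew_raw = skewness(charges)
skew_log = skewness(log_charges)

print(stats_raw)
print("skewness of charges:    ", round(skew_raw, 3))
print("skewness of log-charges:", round(skew_log, 3))
```

A mean well above the median and a large positive skewness point to a long right tail; a skewness near zero after the transformation points to a roughly symmetric distribution.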
Begin by excluding all categorical variables (sex, smoker and region).
(3) (4 points) Run a regression of the log-costs on the non-categorical explanatory variables:
$$\log(\mathrm{cost}_i) = \theta_0 + \theta_1 \mathrm{age}_i + \theta_2 \mathrm{bmi}_i + \theta_3 \mathrm{children}_i + \varepsilon_i,$$
where $\varepsilon_i$ is an error term.
(4) (3 points) Give an economic interpretation of the estimated coefficients
in the regression above. What does the model say about the determinants of medical costs?
(5) (4 points) We want to test whether the coefficient $\theta_2$ for bmi is statistically significant. Test the hypothesis using the relevant test statistic.
Does bmi have more or less explanatory power than age?
(6) (3 points) We want to test whether the coefficient $\theta_2$ for bmi is statistically significant. Test the hypothesis using the relevant p-value.
(7) (5 points) Test the single linear restriction $\theta_1 = 3\theta_2$ using the relevant
test statistic.
(8) (3 points) Test the single linear restriction $\theta_1 = 3\theta_2$ using the relevant
p-value.
(9) (5 points) Test the multiple linear restriction $\theta_1 = 0.04$ and $\theta_2 = 0$
using the relevant test statistic.
(10) (3 points) Test the multiple linear restriction $\theta_1 = 0.04$ and $\theta_2 = 0$
using the relevant p-value.
(11) (4 points) Using the estimated model, predict medical costs for a
50-year-old person with bmi = 36 and 4 children. Is the prediction lower or
higher than the mean of the distribution of the medical costs? (Recall
that the regression gives you a prediction for the log of the medical
costs (say, log(y)), not for the medical costs (say, y). Hence, after you
find the prediction for the log of the medical costs, you need to apply
a transformation to find a prediction for the medical costs themselves.
Hint: if log(y) is normal, y is lognormal. What is E(y) for a lognormal
random variable?)
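The hint in question (11) relies on a standard property of the lognormal distribution: if log(y) is normal with mean m and variance v, then E(y) = exp(m + v/2), which is larger than exp(m). The sketch below checks this by simulation. The values of `m` and `v` are hypothetical placeholders, not estimates from insurance.csv, and the function name `lognormal_mean` is an assumption made for illustration.

```python
import math
import random
import statistics

def lognormal_mean(m, v):
    """E(y) when log(y) is normal with mean m and variance v."""
    return math.exp(m + v / 2.0)

# Hypothetical mean and variance of log(y) (assumption: not taken from the data).
m, v = 9.0, 0.5

rng = random.Random(1)
draws = [math.exp(rng.gauss(m, math.sqrt(v))) for _ in range(200_000)]

naive = math.exp(m)              # exponentiating the mean of log(y) only
exact = lognormal_mean(m, v)     # lognormal mean with the variance correction
simulated = statistics.fmean(draws)

print("exp(m):        ", round(naive, 1))
print("exp(m + v/2):  ", round(exact, 1))
print("simulated mean:", round(simulated, 1))
```

The simulated mean should sit close to exp(m + v/2) rather than exp(m), which is why exponentiating the predicted log-cost alone tends to understate the expected cost.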
Now, take the categorical variables into account using dummy variables
(https://en.wikipedia.org/wiki/Dummy_variable_(statistics)).
(12) (3 points) How much more (or less) do males spend relative to females
(controlling for all other variables)?
(13) (3 points) How much more (or less) do smokers spend relative to
non-smokers (controlling for all other variables)?
(14) (3 points) In which region are medical costs highest (controlling for all
other variables)?
(15) (3 points) What is the difference in medical costs between the northeast
and the southwest (controlling for all other variables)?
(16) (4 points) Are the coefficients associated with the dummies individually
statistically significant?
(17) (4 points) Using your model, predict medical costs for a 50-year-old
male smoker with bmi = 36 who lives in the southwest and has 4
children.
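The mechanics behind Problem 1 (dummy coding, OLS estimation, t-statistics and a test of linear restrictions) can be sketched with NumPy alone. Everything below runs on simulated data: the sample, the "true" coefficients in `theta_true`, the choice of baseline categories (female, non-smoker, northeast) and the helper `wald_statistic` are assumptions made for illustration, not the course's reference solution. The joint test uses the asymptotic chi-square distribution of the Wald statistic rather than an exact F distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Simulated stand-in for insurance.csv (assumption: NOT the real data).
age = rng.integers(18, 65, size=n).astype(float)
bmi = rng.normal(30.0, 6.0, size=n)
children = rng.integers(0, 6, size=n).astype(float)
smoker = rng.random(n) < 0.2
male = rng.random(n) < 0.5
region = rng.choice(["northeast", "northwest", "southeast", "southwest"], size=n)

# Dummy coding: one category per variable is omitted as the baseline
# (female, non-smoker, northeast), so the intercept absorbs it.
columns = ["const", "age", "bmi", "children", "male", "smoker",
           "northwest", "southeast", "southwest"]
X = np.column_stack([
    np.ones(n), age, bmi, children,
    male.astype(float), smoker.astype(float),
    (region == "northwest").astype(float),
    (region == "southeast").astype(float),
    (region == "southwest").astype(float),
])

# Hypothetical "true" coefficients, used only to generate the log-costs.
theta_true = np.array([7.0, 0.04, 0.01, 0.10, -0.05, 1.50, -0.05, -0.10, -0.15])
log_cost = X @ theta_true + rng.normal(0.0, 0.4, size=n)

# OLS estimates, residual variance and classical standard errors.
theta_hat, *_ = np.linalg.lstsq(X, log_cost, rcond=None)
resid = log_cost - X @ theta_hat
k = X.shape[1]
sigma2_hat = resid @ resid / (n - k)
cov_hat = sigma2_hat * np.linalg.inv(X.T @ X)
t_stats = theta_hat / np.sqrt(np.diag(cov_hat))

def wald_statistic(R, r):
    """Wald statistic for H0: R theta = r (asymptotically chi-square, rows(R) d.o.f.)."""
    diff = R @ theta_hat - r
    return float(diff @ np.linalg.solve(R @ cov_hat @ R.T, diff))

# Joint restriction: coefficient on age = 0.04 and coefficient on bmi = 0.
R = np.zeros((2, k))
R[0, columns.index("age")] = 1.0
R[1, columns.index("bmi")] = 1.0
r = np.array([0.04, 0.0])
wald = wald_statistic(R, r)
p_value = float(np.exp(-wald / 2.0))  # chi-square(2) survival function

for name, b, t in zip(columns, theta_hat, t_stats):
    print(f"{name:>10s} {b: .4f}  t = {t: .2f}")
print("Wald:", round(wald, 2), "p-value:", p_value)
```

A single restriction such as $\theta_1 = 3\theta_2$ fits the same function with a one-row `R` (1 on age, -3 on bmi) and `r = 0`; with one restriction the chi-square p-value is `math.erfc(math.sqrt(wald / 2))`. Because the dependent variable is in logs, a dummy coefficient b is an approximate proportional difference in costs, and exp(b) - 1 is the exact one.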
Problem 2 (Review of methods). (40 points) Assume an iid sample
$\{x_1, x_2, \ldots, x_T\}$ from some distribution with expected value $\mu$ and variance $\sigma^2$.
A natural estimator for the true variance (i.e., $\sigma^2$) of the random variable
which generates the data is the sample variance, namely $s_x^2 = \frac{1}{T}\sum_{t=1}^{T}(x_t - \bar{X})^2$,
where $\bar{X}$ denotes the sample mean, i.e., $\bar{X} = \frac{1}{T}\sum_{t=1}^{T} x_t$.
First, let us focus on the finite-T (or finite-sample) properties of $s_x^2$:
(1) (6 points) Show that the sample variance $s_x^2$ is biased for the true
variance $\sigma^2$.
(2) (3 points) How would you correct the bias?
(3) (3 points) What is the bias of the infeasible variance estimator
$s_{x,\mathrm{inf}}^2 = \frac{1}{T}\sum_{t=1}^{T}(x_t - \mu)^2$? Why am I calling this estimator infeasible?
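The finite-sample questions above can be explored numerically before attempting the proofs. The sketch below is a Monte Carlo illustration under stated assumptions: observations are drawn from a Uniform(0, 1) distribution (mean 0.5, variance 1/12), the sample size T = 5 is deliberately small, and the function name `variance_estimators` is hypothetical. It averages three estimators across many replications: the 1/T sample variance, a 1/(T-1) version, and the version that uses the true mean.

```python
import random
import statistics

def variance_estimators(sample, mu):
    """Return (1/T estimator, 1/(T-1) estimator, estimator using the true mean mu)."""
    T = len(sample)
    xbar = sum(sample) / T
    ss = sum((x - xbar) ** 2 for x in sample)
    with_true_mean = sum((x - mu) ** 2 for x in sample) / T
    return ss / T, ss / (T - 1), with_true_mean

rng = random.Random(2)
T, reps = 5, 100_000
mu, sigma2 = 0.5, 1.0 / 12.0  # mean and variance of Uniform(0, 1)

results = [variance_estimators([rng.random() for _ in range(T)], mu)
           for _ in range(reps)]
avg_one_over_T, avg_one_over_Tm1, avg_true_mean = (
    statistics.fmean(col) for col in zip(*results)
)

print("true variance:          ", round(sigma2, 5))
print("average 1/T estimator:  ", round(avg_one_over_T, 5))
print("average 1/(T-1) version:", round(avg_one_over_Tm1, 5))
print("average with true mean: ", round(avg_true_mean, 5))
```

Comparing each average with the true variance gives a numerical check on the analytical answers; the simulation does not replace the derivations the questions ask for.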
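Now, let us turn to the large-T (or infinite-sample or asymptotic) properties
of $s_x^2$. Write the following:
$$
\begin{aligned}
s_x^2 &= \frac{1}{T}\sum_{t=1}^{T}(x_t - \bar{X})^2 \\
&= \frac{1}{T}\sum_{t=1}^{T}\big((x_t - \mu) - (\bar{X} - \mu)\big)^2 \\
&= \underbrace{\frac{1}{T}\sum_{t=1}^{T}(x_t - \mu)^2}_{(a)} - \underbrace{2(\bar{X} - \mu)\frac{1}{T}\sum_{t=1}^{T}(x_t - \mu)}_{(b)} + \underbrace{(\bar{X} - \mu)^2}_{(c)} \qquad (1)
\end{aligned}
$$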
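Now, subtract $\sigma^2$ from the left-hand side and from the right-hand side of Eq.
(1) and standardize by $\sqrt{T}$ to obtain:
$$
\sqrt{T}(s_x^2 - \sigma^2) = \underbrace{\frac{\sum_{t=1}^{T}\big((x_t - \mu)^2 - \sigma^2\big)}{\sqrt{T}}}_{(a^*)} - \underbrace{2(\bar{X} - \mu)\frac{1}{\sqrt{T}}\sum_{t=1}^{T}(x_t - \mu)}_{(b^*)} + \underbrace{\sqrt{T}(\bar{X} - \mu)^2}_{(c^*)} \qquad (2)
$$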
(4) (6 points) Show that $s_x^2$ is consistent for $\sigma^2$ by applying the LLN to
(a), (b) and (c) in Eq. (1).
(5) (6 points) Show that $\sqrt{T}(s_x^2 - \sigma^2)$ is asymptotically normal by applying the LLN, the CLT and Slutsky’s theorem to $(a^*)$, $(b^*)$ and $(c^*)$
in Eq. (2).
Notice that consistency is a statement about sample averages, like $s_x^2$, converging (as $T \to \infty$) to expected values. Asymptotic normality is a statement
about demeaned (by $\sigma^2$, in our example) and standardized (by $\sqrt{T}$, in
our example) sample averages, like $\sqrt{T}(s_x^2 - \sigma^2)$, converging (as $T \to \infty$) to
a mean-zero normal distribution.
(6) (8 points) Use my sample Python codes from Lecture 1 to write code
that shows consistency of $s_x^2$. You should draw your observations from
a random variable which is neither exponential nor normal.
(7) (8 points) Use my sample Python codes from Lecture 1 to write code
that shows asymptotic normality of $\sqrt{T}(s_x^2 - \sigma^2)$. You should draw
your observations from a random variable which is neither exponential
nor normal.
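One possible shape for the simulations in questions (6) and (7) is sketched below. It does not reproduce the Lecture 1 codes, which are not shown here; it is a standalone illustration that assumes Uniform(0, 1) draws (variance 1/12, fourth central moment 1/80), and the name `sample_variance` is hypothetical. The benchmark variance `asy_var` is the variance of $(x_t - \mu)^2$, used here as an assumption about the limiting distribution that the simulation can be compared against.

```python
import math
import random
import statistics

def sample_variance(sample):
    """Sample variance with the 1/T normalization used in the text."""
    T = len(sample)
    xbar = sum(sample) / T
    return sum((x - xbar) ** 2 for x in sample) / T

rng = random.Random(3)
sigma2 = 1.0 / 12.0           # variance of Uniform(0, 1)
mu4 = 1.0 / 80.0              # fourth central moment of Uniform(0, 1)
asy_var = mu4 - sigma2 ** 2   # variance of (x - mu)^2, the assumed benchmark

# Consistency: the estimation error should tend to shrink as T grows.
errors = {}
for T in (10, 1_000, 100_000):
    estimate = sample_variance([rng.random() for _ in range(T)])
    errors[T] = abs(estimate - sigma2)
    print(f"T = {T:>7d}  s2 = {estimate:.5f}  |error| = {errors[T]:.5f}")

# Asymptotic normality: distribution of sqrt(T) * (s2 - sigma2) across replications.
T, reps = 500, 4_000
z = [math.sqrt(T) * (sample_variance([rng.random() for _ in range(T)]) - sigma2)
     for _ in range(reps)]
z_mean = statistics.fmean(z)
z_var = statistics.pvariance(z)
# Share of replications inside the central 95% band of N(0, asy_var).
share_within = sum(abs(v) <= 1.96 * math.sqrt(asy_var) for v in z) / reps

print("mean of z:", round(z_mean, 5))
print("variance of z:", round(z_var, 6), "benchmark:", round(asy_var, 6))
print("share within 1.96 sd:", round(share_within, 3))
```

A histogram of `z` with the N(0, `asy_var`) density overlaid would make the comparison visual; any other distribution with finite fourth moment could replace the uniform draws.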