Data Science assignment - Australia Assignments

This is a Data Science assignment consisting mostly of math-related questions. Please show all calculations for each question. No coding is required for these questions.

We would like to know if the age of a child is related to the number of cavities he or she has. The data are shown below. If there is a significant relationship, predict the number of cavities for a child of 11.

(20 points)

Age of child x	6	8	9	10	12	14
No. of cavities y	2	1	3	4	6	5
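The hand calculation for this question can be cross-checked with a short script. The sketch below computes the Pearson correlation coefficient, a t statistic for testing whether the correlation is zero, and the least-squares line. The 0.05 significance level is an assumption (the question does not state one); at that level the two-tailed critical t value for 4 degrees of freedom is 2.776. The variable names are illustrative only.

```python
import math

ages = [6, 8, 9, 10, 12, 14]       # x: age of child
cavities = [2, 1, 3, 4, 6, 5]      # y: number of cavities

n = len(ages)
sum_x, sum_y = sum(ages), sum(cavities)
sum_xy = sum(x * y for x, y in zip(ages, cavities))
sum_x2 = sum(x * x for x in ages)
sum_y2 = sum(y * y for y in cavities)

# Building blocks shared by the correlation and regression formulas
sxy = n * sum_xy - sum_x * sum_y
sxx = n * sum_x2 - sum_x ** 2
syy = n * sum_y2 - sum_y ** 2

# Pearson correlation coefficient
r = sxy / math.sqrt(sxx * syy)

# t statistic for H0: no linear relationship, with n - 2 degrees of freedom
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))

# Least-squares line y' = a + b*x
b = sxy / sxx
a = (sum_y - b * sum_x) / n

# Prediction for an 11-year-old (only meaningful if the test is significant)
predicted_at_11 = a + b * 11

print(f"r = {r:.4f}, t = {t_stat:.3f}")
print(f"y' = {a:.3f} + {b:.3f}x, prediction at x = 11: {predicted_at_11:.2f}")
```

If the printed t statistic exceeds the assumed critical value, the relationship is treated as significant and the prediction at x = 11 can be reported.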

Assume we gathered the following random sample, where the independent variable (x) represents the number of hours a student studies, and the dependent variable (y) represents the student's exam score. Is there a correlation between the two variables, and if so, how strong is it?

(20 points)

Hours of study (x)	Exam score (y)
6	40
10	50
18	100
15	80
12	65
16	90
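As a check on the hand calculation, the correlation coefficient for this table can be computed with a small helper. This is a minimal sketch; the function name `pearson_r` is hypothetical, and it uses the standard computational formula for the Pearson coefficient.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient via the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

hours = [6, 10, 18, 15, 12, 16]       # x: hours of study
scores = [40, 50, 100, 80, 65, 90]    # y: exam score

r_study = pearson_r(hours, scores)
print(f"r = {r_study:.4f}")
```

A value of r close to +1 indicates a strong positive linear relationship; a value close to 0 indicates little or no linear relationship.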

The average age of a vehicle registered in the United States is 8 years, or 96 months. Assume that the standard deviation is 16 months. If a random sample of 36 vehicles is selected, find the probability that the mean of their ages is between 90 and 100 months. (10 points)

Hint: use the normal distribution and z-scores.
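The hint can be followed step by step: compute the standard error of the mean, convert both bounds to z-scores, and take the difference of the standard normal cumulative probabilities. The sketch below does this with Python's standard library as a check on a z-table lookup; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu = 96        # population mean age in months
sigma = 16     # population standard deviation in months
n = 36         # sample size

# Standard error of the sample mean
standard_error = sigma / math.sqrt(n)

# z-scores for the two bounds on the sample mean
z_low = (90 - mu) / standard_error
z_high = (100 - mu) / standard_error

# P(90 < sample mean < 100) from the standard normal CDF
std_normal = NormalDist()
probability = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(f"z_low = {z_low:.2f}, z_high = {z_high:.2f}, P = {probability:.4f}")
```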

Assume we gathered the following random sample. Each column represents the weekly sales of one of two stores. We would like to decide which store (A or B) can predict its weekly sales with more certainty. (20 points)

Store A	Store B
2000	2500
4500	6500
3000	2000
1500	5000
6000	1200
4200	7000
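One way to read "more certainty" is lower relative variability, which can be measured with the coefficient of variation (sample standard deviation divided by the mean). That interpretation is an assumption, not something the question states. Under it, a hand calculation could be checked with the following sketch:

```python
from statistics import mean, stdev

store_a = [2000, 4500, 3000, 1500, 6000, 4200]
store_b = [2500, 6500, 2000, 5000, 1200, 7000]

def coefficient_of_variation(values):
    """Sample standard deviation relative to the mean (assumed measure of certainty)."""
    return stdev(values) / mean(values)

cv_a = coefficient_of_variation(store_a)
cv_b = coefficient_of_variation(store_b)

print(f"Store A: mean = {mean(store_a):.2f}, s = {stdev(store_a):.2f}, CV = {cv_a:.4f}")
print(f"Store B: mean = {mean(store_b):.2f}, s = {stdev(store_b):.2f}, CV = {cv_b:.4f}")
```

The store with the smaller coefficient of variation has sales that are less spread out relative to its average, so its weekly sales are the more predictable of the two under this measure.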

Assume we gathered the following random sample. We are trying to predict the body fat % of a person based on his/her weight in kg.

(30 points)

[Data table missing from the source: weight (kg) and body fat (%)]

Find the best-fit line for the data above.

Find the R-squared value.

Find the F value of the best-fit line.

Why does your best-fit line predict better than the line given by the following equation?

Y = 0.5x + 3
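Because the data table for this question is not available here, the sketch below contains no data. It only shows how the requested quantities relate to each other for simple linear regression. The function name `fit_summary` and its return keys are hypothetical. The comparison with Y = 0.5x + 3 rests on the fact that the least-squares line minimizes the sum of squared errors (SSE), so no other line can have a smaller SSE on the same data.

```python
def fit_summary(x, y, ref_slope=0.5, ref_intercept=3.0):
    """Least-squares line, R-squared, F value, and SSE of a reference line.

    Assumes at least three points and a fit that is not perfect
    (SSE > 0); otherwise the F value is undefined.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)      # total sum of squares

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Sum of squared errors of the fitted line and of the reference line
    sse_fit = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sse_ref = sum((yi - (ref_intercept + ref_slope * xi)) ** 2 for xi, yi in zip(x, y))

    ssr = sst - sse_fit                            # regression sum of squares
    r_squared = ssr / sst
    f_value = ssr / (sse_fit / (n - 2))            # 1 and n - 2 degrees of freedom

    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": r_squared,
        "f_value": f_value,
        "sse_fit": sse_fit,
        "sse_reference": sse_ref,
    }
```

Calling `fit_summary(weights, body_fat)` with the two columns of the assignment's table would return the fitted coefficients together with both SSE values, so the comparison can be read off directly.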

Build a decision tree classifier based on the following dataset. There are three independent variables (a1, a2, a3) that will help with the prediction, and the ‘Classification’ column is the dependent variable. (40 points)

[Data table missing from the source: columns a1, a2, a3, and Classification]
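The question does not say which splitting criterion to use. Assuming an entropy-based (ID3-style) approach, the attribute chosen at each node is the one with the highest information gain. The sketch below shows that calculation only. It contains no data, since the table is not available here, and the function names and the row format (one dictionary per record) are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )

def information_gain(rows, attribute, target="Classification"):
    """Reduction in entropy of `target` after splitting `rows` on `attribute`.

    Each row is assumed to be a dict mapping column names to values.
    """
    labels = [row[target] for row in rows]

    # Group the target labels by the value of the chosen attribute
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])

    # Weighted average entropy of the subsets produced by the split
    weighted_entropy = sum(
        len(group) / len(rows) * entropy(group) for group in groups.values()
    )
    return entropy(labels) - weighted_entropy
```

To grow the tree by hand, compute the information gain of a1, a2, and a3 on the full table, split on the attribute with the largest gain, and repeat on each resulting subset until every subset contains a single class.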

Consider the following confusion matrix:

(10 points)

	Predicted Yes	Predicted No
Actual Yes	95	5
Actual No	5	45

Calculate the sensitivity, precision, and accuracy from the confusion matrix.

Define (give the values of) type I and type II errors in the given confusion matrix and explain the difference between the two.
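The metrics can be checked with a few lines of arithmetic. The sketch below assumes "Yes" is the positive class, so a type I error is a false positive (actual No, predicted Yes) and a type II error is a false negative (actual Yes, predicted No). The variable names are illustrative.

```python
# Cells of the confusion matrix, with "Yes" assumed to be the positive class
true_positive = 95    # actual Yes, predicted Yes
false_negative = 5    # actual Yes, predicted No  (type II error)
false_positive = 5    # actual No,  predicted Yes (type I error)
true_negative = 45    # actual No,  predicted No

total = true_positive + false_negative + false_positive + true_negative

# Sensitivity (recall): share of actual Yes cases that were predicted Yes
sensitivity = true_positive / (true_positive + false_negative)

# Precision: share of predicted Yes cases that were actually Yes
precision = true_positive / (true_positive + false_positive)

# Accuracy: share of all cases that were predicted correctly
accuracy = (true_positive + true_negative) / total

print(f"sensitivity = {sensitivity:.4f}")
print(f"precision   = {precision:.4f}")
print(f"accuracy    = {accuracy:.4f}")
print(f"type I errors = {false_positive}, type II errors = {false_negative}")
```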