ENG335 Machine Learning Assignment

October 16, 2023

Question 1
Download the Lyft Inc dataset from Kaggle (https://www.kaggle.com/datasets/dermisfit/lyft-inc-dataset). Understand the dataset by performing exploratory analysis. Prepare a new dataset by excluding the “date and day” and “year” attributes. You also need to drop TWO (02) more attributes; if you do not exclude them, you will get a perfect (ideal) estimator. Design a linear regression model to estimate the bike demand using only the FOUR (04) best attributes from the newly constructed dataset. Discuss your results and the relevant metrics. If you include all the features of the new dataset, does that give a better model? Would you use the model that employs all the features to predict the bike demand?
(20 marks)
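The comparison asked for in Question 1 could be set up roughly as follows. This is only a sketch under stated assumptions: it runs on synthetic data from `make_regression` rather than the Kaggle file, it uses scikit-learn's univariate F-test (`SelectKBest` with `f_regression`) as one possible definition of the "best" four attributes, and the names `evaluate`, `four_feature_model` and `all_feature_model` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the prepared dataset (NOT the Kaggle data):
# 8 numeric features, 4 of which actually drive the target.
X, y = make_regression(
    n_samples=500, n_features=8, n_informative=4, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def evaluate(model):
    """Fit on the training split and report R^2 and RMSE on the test split."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
    return {"r2": float(r2_score(y_test, pred)), "rmse": rmse}


# Model A: keep the 4 features with the highest univariate F-score.
# Selection happens inside the pipeline, so it only sees training data.
four_feature_model = make_pipeline(
    SelectKBest(score_func=f_regression, k=4), LinearRegression()
)
# Model B: ordinary least squares on every feature.
all_feature_model = LinearRegression()

scores_four = evaluate(four_feature_model)
scores_all = evaluate(all_feature_model)

print("4 best features:", scores_four)
print("all features:   ", scores_all)
```

Reporting both models on the same held-out split makes it easier to argue whether the extra features buy a real improvement or only a marginal one.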

Question 2
Load the Wrestling World Tournament dataset from Kaggle (https://www.kaggle.com/datasets/julienjta/wrestling-world-tournament). The objective is to predict the gender of the wrestler from the other parameters. Perform exploratory data analysis. Analyze the features, drop those that are not appropriate, and suitably encode the categorical features. Design a simple neural network classifier with ONE (01) hidden layer. Construct a Naïve Bayes classifier for the same problem. Adjust the parameters of the neural network so that it performs as well as or better than the Naïve Bayes classifier.
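One way to organise the Question 2 comparison is sketched below. It is an illustration under assumptions, not the expected solution: the data is synthetic (`make_classification`), so the encoding of categorical features is not shown; Gaussian Naïve Bayes is assumed as the Naïve Bayes variant; and the hidden-layer width is the only neural network parameter tuned. The candidate widths and variable names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary-classification stand-in (NOT the Kaggle data).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Baseline: Gaussian Naive Bayes.
nb = GaussianNB().fit(X_train, y_train)
nb_acc = nb.score(X_test, y_test)


def make_mlp(width):
    """One-hidden-layer network; inputs are standardised before training."""
    return make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(width,), max_iter=1000, random_state=0),
    )


# Choose the hidden-layer width by cross-validation on the training split
# only, so the test split stays untouched until the final comparison.
candidate_widths = (4, 16, 64)
cv_scores = {
    width: cross_val_score(make_mlp(width), X_train, y_train, cv=3).mean()
    for width in candidate_widths
}
best_width = max(cv_scores, key=cv_scores.get)

mlp = make_mlp(best_width).fit(X_train, y_train)
mlp_acc = mlp.score(X_test, y_test)

print(f"Naive Bayes test accuracy: {nb_acc:.3f}")
print(f"MLP (width={best_width}) test accuracy: {mlp_acc:.3f}")
```

If the network still trails the baseline, the same loop can be widened to other settings such as `alpha`, `learning_rate_init` or `activation`.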

Question 3
Download the Sloan Digital Sky Survey DR16 dataset available on Kaggle (https://www.kaggle.com/datasets/muhakabartay/sloan-digital-sky-survey-dr16). Prepare the dataset by dropping the features ['objid', 'run', 'rerun', 'camcol', 'plate', 'field', 'mjd', 'fiberid', 'specobjid', 'redshift'] and perform exploratory data analysis. Propose optimal values for the depth and number of trees in the random forest.
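A cross-validated grid search is one common way to propose values for the tree depth and the number of trees. The sketch below assumes a multi-class classification target and uses synthetic data in place of the Kaggle file; the grid values are arbitrary starting points, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic three-class stand-in (NOT the Kaggle data).
X, y = make_classification(
    n_samples=600,
    n_features=7,
    n_informative=5,
    n_redundant=0,
    n_classes=3,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical search grid: max_depth=None lets each tree grow fully.
param_grid = {
    "max_depth": [3, 6, None],
    "n_estimators": [25, 50, 100],
}

# 3-fold cross-validation over every (depth, trees) combination;
# the best combination is refitted on the whole training split.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

best_params = search.best_params_
test_acc = search.score(X_test, y_test)

print("best parameters:", best_params)
print(f"cross-validated accuracy: {search.best_score_:.3f}")
print(f"held-out test accuracy:   {test_acc:.3f}")
```

Inspecting `search.cv_results_` alongside the single best setting helps show whether accuracy plateaus beyond a certain depth or number of trees, which is usually the more defensible basis for an "optimal" choice.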