SOST30062 Data Science Modelling – Final report
Semester 2, 2022/23

Administrative arrangements

The final report contributes 60% of your total mark for SOST30062 Data Science Modelling.

Please submit your work using the link to the TurnItIn anti-plagiarism service on the course Blackboard. The submission link will be placed on the “Assessment / Assignments, tasks” page of Blackboard, and you will be notified of its exact location before submission opens. You can also find detailed instructions there on how to upload your assignment to TurnItIn.

The report is due by 2pm (UK time) on Tuesday, 23rd of May, 2023.

Your report should not be longer than 2,000 words. (Figure/table captions, bibliography, and appendices do NOT count towards the word limit.)

You are advised to keep a copy of the work you hand in.

Your attention is drawn to the sections regarding late submissions, mitigating circumstances and plagiarism in the Course Outline (available on the “Essential Information / Course outline” page in Blackboard).

If you have any questions concerning this assignment, you should email Tanja at: [email protected].

Description of the task

We provide you with a large social science dataset, which you will need to analyse in your report. You can download the dataset and additional description from the “Assessment / Assignments, tasks” page of the course Blackboard.

Using the dataset, you are asked to
1. select one variable, Y, to be explained by the other variables, the Xs, in the dataset;
2. motivate and formulate a research question about Y and the Xs (we present many examples of possible research questions throughout the course);
3. apply an unsupervised learning technique (lecture week 9) to explore the dataset, such as PCA or clustering techniques (Method I – exploration);
4. choose a supervised learning technique (lecture weeks 3-5 and 8) that is appropriate for Y and answers your research question, such as regression models, splines, LDA, tree-based methods (Method II – inference);
5. apply the method selected in the last step in a suitable advanced analytic approach (lecture weeks 6-7), such as subset selection, ridge regression, cross-validation, bootstrap, LASSO (Method III – the “twist”);
6. write a report on the above steps, producing no more than one descriptive figure or table and one inferential figure or table that sum up your results.

Overview of the task

Structure of the report

We suggest the following section structure for your report (you may choose to structure your report differently):
1. Introduction: briefly describe the empirical context and present the dataset
2. Research question: motivate and formulate your research question
3. Methods: briefly present your method choices (Methods I-III) and the steps of your analysis
4. Results: present and interpret your results, with two key tables/figures
5. Conclusion: briefly discuss what the results imply for your research question, discuss one or two key limitations of your analysis
6. Appendices: add any important additional figures/tables, include R code (so that your analyses can be reproduced)

Marking criteria

20% – Data presentation and research question (sections 1-2 from the above structure)
• Do you present the context of the data (e.g. topic) and the dataset (e.g. types of variables, number of observations you use) clearly?
• Do you formulate your research question clearly?
• Do you explain why you think the research question is interesting and relevant?
• Can you answer your research question using the available variables?

20% – Method choices (section 3)
• Are the chosen methods appropriate for your Y and X variables?
• Do you motivate their use (explain why they are appropriate for Y and the Xs)?
• Do you explain the steps of your analysis clearly?
• Can you answer your research question with your planned analysis?

20% – Data analysis (section 4, appendix R code)
• Is your application of the methods to the data technically correct (e.g. are the variables transformed if needed and used in the correct role in the analysis)?
• Do you use the R packages and functions most appropriate for your analysis?
• Do you provide reproducible R code in the appendix?

20% – Interpretation of results (sections 4-5)
• Are the interpretations of your results correct and clearly explained?
• Do you discuss what the results imply for your research question?
• Is your conclusion about your question correct in light of the results?
• Do you discuss one or two important limitations of your approach in answering the research question?

20% – Visual presentation of results (section 4)
• Are the tables and figures readable and clear (e.g. they are clearly annotated, do not have overlapping labels)?
• Are they appropriate representations of your findings?
• Do you interpret them correctly in the text?

Additional notes on marking
• We will value original and well-motivated research questions.
• We will also reward thoughtful ideas about the limitations of your analysis (and how one could possibly overcome them).
• Though your R code will not be assessed, we appreciate it if you provide clear and understandable code in the appendix.
• As the marking weight of your two main figures/tables is quite high (20%), we will evaluate these critically.
• We will provide you with good examples and further guidelines for the different elements of the report throughout the semester (for example, about writing clean R code and making readable figures).
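
Illustrative sketch of the three-method workflow

The three analysis steps in the task description (Method I – exploration, Method II – inference, Method III – the “twist”) could be laid out in an appendix script roughly as follows. This is only a minimal sketch under stated assumptions, not a model answer: the built-in mtcars data stands in for the course dataset, the choice of mpg as Y and of four Xs is hypothetical, and PCA, linear regression and 5-fold cross-validation are just one admissible combination of the listed techniques. Only base R functions (prcomp, lm, reformulate, predict) are used.

```r
# Stand-in data (assumption): built-in mtcars replaces the course dataset.
dat     <- mtcars
y_name  <- "mpg"                           # hypothetical choice of Y
x_names <- c("wt", "hp", "disp", "qsec")   # hypothetical choice of Xs

# Method I (exploration): PCA on the centred and scaled Xs.
pca <- prcomp(dat[, x_names], center = TRUE, scale. = TRUE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(cumsum(var_explained), 3))     # cumulative share of variance

# Method II (inference): linear regression of Y on the Xs.
model_formula <- reformulate(x_names, response = y_name)
fit <- lm(model_formula, data = dat)
print(summary(fit)$coefficients)

# Method III (the "twist"): 5-fold cross-validation of the same model.
set.seed(1)                                # fixed seed for reproducibility
k    <- 5
fold <- sample(rep(seq_len(k), length.out = nrow(dat)))
cv_mse <- sapply(seq_len(k), function(i) {
  train <- dat[fold != i, ]
  test  <- dat[fold == i, ]
  m <- lm(model_formula, data = train)
  mean((test[[y_name]] - predict(m, newdata = test))^2)
})
cv_rmse <- sqrt(mean(cv_mse))              # out-of-sample prediction error
print(cv_rmse)
```

Setting the seed before assigning folds keeps the cross-validation split, and therefore the reported error, identical across runs, which is what makes the appendix code reproducible.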