Technological advances in science, business, and engineering have led to an explosion of data in every aspect of our lives. Understanding the information in this data is crucial because it enables informed decision-making in many fields, including market intelligence and science. DATA2002 is an intermediate unit in statistics and data science that focuses on developing data analysis skills for a variety of problems and data types. In this unit you will learn how to ingest, combine, and summarize data from a variety of data models commonly encountered in data science projects, and you will strengthen your programming skills through hands-on experience with statistical programming languages. You’ll also be exposed to the concepts of statistical machine learning and develop the skills to analyze various types of data to answer scientific questions. From this unit you will gain the knowledge and skills to take on the data analysis challenges that arise from everyday problems.
Learning Outcomes
- LO1. Formulate domain- and context-specific questions and determine an appropriate statistical analysis
- LO2. Extract and combine data from multiple data sources
- LO3. Construct, interpret and compare numerical and graphical summaries of different data types, including large and/or complex data sets
- LO4. Demonstrate familiarity with the use of a software version control system
- LO5. Identify, justify, and implement appropriate parametric or nonparametric statistical tests
- LO6. Develop, evaluate, and interpret appropriate linear models to describe relationships among multiple factors
- LO7. Perform statistical machine learning with a given classifier and create a cross-validation scheme to calculate predictive accuracy
- LO8. Create reproducible reports to communicate results using programming languages