Diploma in Data Science

April 24, 2023

QUALIFI ASSESSMENT DOCUMENT

Qualification: Qualifi Level 7 Diploma in Data Science

Qualification No (RQF): J/618/4970

Unit Name: Exploratory Data Analysis

Unit Reference: DS01

No of Credits: 20 Credits

Introduction

Prior to attempting this coursework assignment, learners must familiarise themselves with the following policies:

Centre Specification (can be found at https://qualifi.net/qualifi-level-7-diploma-data-science/)

Qualifi Quality Assurance Standards

Qualifi Quality Policy Statement

Plagiarism and Collusion

In submitting the assignment, learners must complete a statement of authenticity confirming that the work submitted for all tasks is their own. The statement should also include the word count.

Your accredited study centre will direct you to the appropriate software that checks the level of similarity. Qualifi recommends the use of https://www.turnitin.com as a part of the assessment.

Plagiarism and collusion are treated very seriously. Plagiarism involves presenting work, excerpts, ideas or passages of another author without appropriate referencing and attribution.

Collusion occurs when two or more learners submit work which is so alike in ideas, content, wording and/or structure that the similarity goes beyond what might have been mere coincidence.

Please familiarise yourself with Qualifi’s Malpractice and Maladministration policy, where you can find further information.

Referencing

A professional approach to work is expected from all learners. Learners must therefore identify and acknowledge ALL sources/methodologies/applications used.

The learner must use an appropriate referencing system to achieve this. Marks are not awarded for the use of English; however, the learner must express ideas clearly and ensure that appropriate terminology is used to convey accuracy in meaning.

Qualifi recommends using the Harvard style of referencing throughout your work.

Appendices

You may include appendices to support your work; however, appendices must only contain additional supporting information, and must be clearly referenced in your assignment.

You may also include tables, graphs, diagrams, Gantt charts and flowcharts that support the main report; these should be incorporated into the back of the assignment report that is submitted.

Any published secondary information, such as annual reports and company literature, should be referenced in the main text of the assignment in accordance with the Harvard style of referencing, and referenced at the end of the assignment.

Confidentiality

Where a Learner is using organisational information that deals with sensitive material or issues, they must seek advice and permission from that organisation about its inclusion.

Where confidentiality is an issue, Learners are advised to anonymise their assignment report so that it cannot be attributed to that particular organisation.

Word Count Policy

Learners must comply with the required word count, within a margin of +10%. These rules exclude the index, headings, tables, images, footnotes, appendices and information contained within references and bibliographies.

When an assessment task requires learners to produce presentation slides with supporting notes, the word count applies to the supporting notes only.

Submission of Assignments

All work must be submitted on the due date as per the Centre’s advice.

All work must be submitted in a single electronic document (.doc file), or via Turnitin, where applicable.

This should go to the tutor and Centre Manager/Programme Director, plus one hard copy posted to the Centre Manager (if required).

Marking and grades

Qualifi uses a standard marking rubric for all assignments, and you can find the details at the end of this document.

Unless stated elsewhere, Learners must answer all questions in this document.

Assignment Question

Task 1.1

Handle and manage multiple datasets within R and Python environments.

1.1 Work smoothly in R and Python development environments.

1.2 Import and export data sets and create data frames within R and Python in accordance with instructions.

1.3 Sort, merge, aggregate and append data sets in accordance with instructions.

Assessment Criteria

Learning the UI and basic rules of programming in both R and Python

Create and import external datasets in R and Python

Export R data frames into external flat files

Data Management in R and Python (Sort, merge, aggregate and subset)

Task 1.2

Use measures of central tendency to summarize data and assess both the symmetry and variation in the data.

1.1 Differentiate between variable types and measurement scales.

1.2 Calculate the most appropriate (mean, median or mode etc.) measure of central tendency based on variable type.

1.3 Compare variation in two datasets using the coefficient of variation.

1.4 Assess symmetry of data using measures of skewness.

Assessment Criteria

Introduction to basic concepts of Statistics, such as measures of central tendency, variation, skewness, kurtosis
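
As an illustration of criteria 1.3 and 1.4, the sketch below computes a coefficient of variation and a moment-based skewness using only Python's standard `statistics` module. The function names and the two small datasets are hypothetical and chosen for illustration; the skewness formula used is the population moment coefficient, which is one of several definitions in common use.

```python
import statistics


def coefficient_of_variation(values):
    """Return the sample standard deviation as a percentage of the mean."""
    mean = statistics.mean(values)
    return statistics.stdev(values) / mean * 100


def skewness(values):
    """Return the moment coefficient of skewness (population form)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    n = len(values)
    return sum((x - mean) ** 3 for x in values) / (n * sd ** 3)


# Hypothetical monthly sales for two branches (illustrative data only).
branch_a = [20, 22, 19, 21, 23]
branch_b = [10, 30, 15, 40, 10]

print(coefficient_of_variation(branch_a))  # relative spread of branch A
print(coefficient_of_variation(branch_b))  # relative spread of branch B
print(skewness(branch_b))                  # a positive value indicates a longer right tail
```

Because the coefficient of variation is unit-free, it allows the spread of two datasets with different means or units to be compared directly.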

Task 1.3

Present and summarise distributions of data and the relationships between variables graphically.

1.1 Select the most appropriate graph to present the data.

1.2 Assess distribution using Box-Plot and Histogram.

1.3 Visualize bivariate relationships using scatter-plots.

1.4 Present time-series data using motion charts.

Assessment Criteria

Frequency tables, crosstabs and bivariate correlation analysis

Data visualization: what and why? Grammar of graphics, handling data for visualization

Commonly used charts and graphs using ggplot2 package in R and matplotlib in Python

Advanced graphics in R and Python

Data Management in R and Python (Sort, merge, aggregate and subset)

Task 1.4

Evaluate standard discrete and standard continuous distributions.

1.1 Analyse the statistical distribution of a discrete random variable.

1.2 Calculate probabilities using R for Binomial and Poisson distributions.

1.3 Fit Binomial and Poisson distributions to observed data.

1.4 Evaluate the properties of Normal and Log Normal distributions.

1.5 Calculate probabilities using R for Normal and Log Normal distributions.

1.6 Fit Normal, Log Normal and Exponential distributions to observed data.

1.7 Evaluate the concept of sampling distribution (t, F and Chi Square).

Assessment Criteria

Concept of random variables and statistical distribution

Discrete vs. Continuous Random Variables

t tests (one sample, independent samples, paired sample)

Standard discrete distributions-Bernoulli, Binomial and Poisson

Using R to calculate probabilities

Fitting of discrete distributions to observed data

Standard continuous distributions-Normal, Log Normal, Exponential

Introduction to sampling distributions
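
The task asks for these probabilities to be calculated in R (where `dbinom`, `dpois` and `pnorm` serve this purpose). As a rough cross-check, the same quantities can be sketched in Python using only the standard library. The function names and the parameter values below are hypothetical and for illustration only.

```python
import math
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Illustrative parameter values (assumptions, not from the brief).
print(binomial_pmf(2, 4, 0.5))       # exactly two successes in four fair trials
print(poisson_pmf(0, 2.0))           # no events when two are expected on average
print(NormalDist(100, 15).cdf(115))  # P(X <= 115) for X ~ Normal(100, 15)
```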

Task 1.5

Formulate research hypotheses and perform hypothesis testing.

1.1 Write R and Python programmes that evaluate appropriate hypothesis tests.

1.2 Draw statistical inference using output in R.

1.3 Translate research problems into statistical hypotheses.

1.4 Assess the most appropriate statistical test for a hypothesis.

Assessment Criteria

Statistical Hypothesis Testing-concepts and terminology

Parameter, test statistics, level of significance, power, critical region

Parametric vs. non-Parametric Tests

Z tests for proportions (single and independent samples)

Non-parametric tests (Mann-Whitney U, Wilcoxon’s signed rank)

Tests for Normality, Q-Q plot
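
One of the listed tests, the single-sample Z test for a proportion, can be sketched as follows. This is a minimal illustration using the normal approximation and a two-sided alternative; the function name and the sample figures are hypothetical.

```python
import math
from statistics import NormalDist


def one_sample_proportion_z_test(successes, n, p0):
    """Two-sided Z test of H0: p = p0 using the normal approximation."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical survey: 60 of 100 respondents prefer a product; H0: p = 0.5.
z, p_value = one_sample_proportion_z_test(60, 100, 0.5)
print(z, p_value)
```

If the p-value falls below the chosen level of significance, the null hypothesis is rejected in favour of the two-sided alternative.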

Task 2.1

Analyse the concept of analysis of variance (ANOVA) and select an appropriate ANOVA or ANCOVA model.

2.1 Define variable, factor and level for a given research problem.

2.2 Evaluate the sources of variation, explained variation and unexplained variation.

2.3 Define a linear model for ANOVA/ANCOVA.

2.4 Confirm the validity of assumptions based on definitions and analysis of variation.

2.5 Perform analysis using R and Python programs to confirm validity of assumptions.

2.6 Draw inferences from statistical analysis of the research problem.

Assessment Criteria

What is analysis of variance?

Definitions: Variable, factor, levels

One Way Analysis of Variance

Two Way Analysis of Variance (including interaction effects)

Multi Way Analysis of Variance

Analysis of Covariance

Kruskal-Wallis Test

Friedman Test

Task 2.2

Carry out global and individual testing of parameters used in defining predictive models.

2.1 Evaluate dependent variables and predictors.

2.2 Develop linear models using the lm function in R and the .ols function in Python.

2.3 Interpret signs and values of estimated regression coefficients.

2.4 Interpret output of global testing using F distributions.

2.5 Identify significant and insignificant variables.

Assessment Criteria

Concept of random variables and statistical distribution

Concept of a statistical model

Estimation of model parameters using the Least Squares Method

Interpreting regression coefficients

Assessing the goodness of fit of a model

Global hypothesis testing using F distribution

Individual testing using t distributions
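
To make the least squares idea concrete, the sketch below estimates a one-predictor linear model by hand in plain Python. It is not the `lm` or `ols` function named in the task, only an illustration of what those functions estimate; the function name and the data are hypothetical.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope, r_squared) estimated by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    # R squared = 1 - unexplained variation / total variation.
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_residual / ss_total
    return intercept, slope, r_squared


# Hypothetical data: advertising spend (x) and sales (y).
spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]
intercept, slope, r_squared = fit_simple_linear_regression(spend, sales)
print(intercept, slope, r_squared)
```

The sign of the slope gives the direction of the relationship, and its value is the expected change in the dependent variable per unit change in the predictor.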

Task 2.3

Validate assumptions in multiple linear regression.

2.1 Resolve multicollinearity problems.

2.2 Revise a model after resolving the problem.

2.3 Assess the performance of the ridge regression model.

2.4 Perform residual analysis – graphically & using statistical tests to analyse results.

2.5 Resolve problems of non-normality of errors and heteroscedasticity.

Assessment Criteria

Concept of Multicollinearity

Calculating Variance Inflation Factors

Resolving problem by dropping variables

Ridge regression method

Stepwise regression as a strategy

Residual analysis

Shapiro-Wilk test, K-S test and Q-Q plot for residuals

White’s test and Breusch-Pagan Test

Partitioning data using the caret package

Task 2.4

Validate models via data partitioning, out of sample testing and cross-validation.

2.1 Develop models and implement them on testing data in accordance with the specification.

2.2 Evaluate the stability of the models using k-fold cross validation.

2.3 Evaluate influential observations using Cook’s distance and hat matrix.

Assessment Criteria

Model development on training data

Model validation on testing data using R squared and RMSE

Concept of k-fold cross validation

Performing k-fold cross validation using the caret package

Identifying influential observations
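
The course material refers to the caret package in R for partitioning and cross-validation. The mechanics of k-fold splitting and of RMSE can also be sketched in plain Python, as below. The function names are hypothetical, and the folds are contiguous and unshuffled, which is a simplification; in practice the rows are usually shuffled first.

```python
import math


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


for train, test in k_fold_indices(10, 3):
    print(len(train), len(test))
```

Each observation appears in exactly one test fold, so a model fitted k times is scored on every observation once; comparing RMSE across folds gives a sense of how stable the model is.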

Task 2.5

Develop models using binary logistic regression and assess their performance.

2.1 Evaluate when to use Binary Logistic Regression correctly.

2.2 Develop realistic models using functions in R and Python.

2.3 Interpret output of global testing using the Likelihood Ratio Test (LRT) in order to assess the results.

2.4 Perform out of sample validation that tests predictive quality of the model.

Assessment Criteria

Model definition and parameter estimation

Estimation of model parameters using MLE

Interpreting regression coefficients and odds ratio

Assessing goodness of fit of the model

Global hypothesis testing using LRT distribution

Individual testing using Wald’s test

Task 3.1

Develop applications of multinomial logistic regression and ordinal logistic regression.

3.1 Select method for modelling categorical variable.

3.2 Develop models for nominal and ordinal scaled dependent variable in R and Python correctly.

Assessment Criteria

Classification table

ROC curve

K-S Statistic

Multinomial and Ordinal Logistic Regression – model building and parameter estimation

Interpretation of regression coefficients

Classification table and deviance test

Task 3.2

Develop generalised linear models and carry out survival analysis and Cox regression.

3.1 Evaluate the concept of generalised linear models.

3.2 Apply the Poisson regression model and negative binomial regression to count data correctly.

3.3 Model a ‘time to event’ variable using Cox regression.

Assessment Criteria

Concept of GLM and link function and .GLM

Poisson Regression

Negative Binomial Regression

Survival Analysis Introduction

Cox Regression

Task 3.3

Assess the concepts and uses of time series analysis and test for stationarity in time series data.

3.1 Create time series objects in R and Python correctly, including decomposing time series and assessing different components.

3.2 Assess whether a time series is stationary.

3.3 Transform non-stationary time series data into stationary time series data.

Assessment Criteria

Components of time series

Seasonal decomposition

Trend analysis

Auto-correlogram

Partial auto-correlogram

Dickey-Fuller test

Converting non-stationary time series data into stationary time series data
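
Differencing is one common way to convert a trending series into a stationary one; the brief does not prescribe a method, so treating it as the approach here is an assumption. A minimal sketch, with a hypothetical function name and made-up data:

```python
def difference(series, lag=1):
    """Return the lag-differenced series: y[t] - y[t - lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


# Hypothetical series with a curved upward trend.
trend = [1, 3, 6, 10, 15]
first = difference(trend)    # [2, 3, 4, 5]: still trending
second = difference(first)   # [1, 1, 1]: constant after two differences
print(first, second)
```

The number of differences needed before the series looks stationary corresponds to the d term of an ARIMA model.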

Task 3.4

Validate ARIMA (Auto Regressive Integrated Moving Average) models and use estimation.

3.1 Identify p, d and q of ARIMA model using ACF (auto-correlation function) and a PACF (partial auto-correlation function) to describe how well values are related.

3.2 Develop ARIMA models using R and Python and evaluate whether errors follow the white noise process.

3.3 Finalize the model and forecast n-period ahead to make accurate predictions.

Assessment Criteria

Concepts of AR, MA and ARIMA models

Model identification using ACF and PACF

Parameter estimation

Residual analysis (testing for white noise process)

Selection of optimal model

Task 3.5

Implement panel data regression methods.

3.1 Evaluate the concept of panel data regression.

3.2 Analyse the features of panel data.

3.3 Build panel data regression models in a range of contexts.

3.4 Evaluate the difference between fixed effect and random effect models.

Assessment Criteria

What is Panel data?

Need for different models for Panel data

Panel data regression methods

Task 4.1

Define Principal Component Analysis (PCA) and its derivations and assess their application.

4.1 Evaluate the need for data reduction.

4.2 Perform principal component analysis and develop scoring models using R and Python to minimise data loss and improve interpretability of data.

4.3 Resolve multi-collinearity using Principal Component Regression.

Assessment Criteria

Concept of Data reduction

Definition of first, second, …, pth principal component

Deriving principal component using Eigenvectors

Deciding optimum number of principal components

Developing scoring models using PCA

Principal component regression

Task 4.2

Understand hierarchical and non-hierarchical cluster analysis and assess their outputs.

4.1 Perform data reduction and derive interpretable factors and use factor scores to interpret the data set.

4.2 Obtain a brand perception map using multi-dimensional scaling.

Assessment Criteria

Orthogonal factor model

Estimation of loading matrix

Interpreting factor solution

Deciding optimum number of factors

Using factor scores for further analysis

Factor rotation

Concept of MDS

Variable reduction using MDS

Task 4.3

Evaluate the concept of panel data regression and implement panel data methods.

4.1 Evaluate the need for cluster analysis.

4.2 Obtain clusters using suitable methods.

4.3 Interpret cluster solutions and analyse the use of clusters for business strategies.

Assessment Criteria

Concept of cluster analysis

Hierarchical cluster analysis methods (linkage methods)

Using dendrogram to estimate optimum number of clusters

k-means clustering methods

Using k-means runs function in R and Python to find optimum number of k

Task 4.4

Appraise classification methods including Naïve Bayes and the support vector machine algorithm.

4.1 Evaluate different methods of classification and the performance of classifiers.

4.2 Design optimum classification rules to achieve minimum error rates.

Assessment Criteria

Bayes theorem and its applications

Constructing classifier using Naïve Bayes method

Concept of Hyperplane

Support vector machine algorithm

Comparison with Binary Logistic Regression

Task 4.5

Apply decision tree and random forest algorithms to classification and regression problems.

4.1 Use decision trees for classification and regression problems in comparison with classical methodologies.

4.2 Analyse concepts of bootstrapping and bagging.

4.3 Apply the random forest method in a range of business and social contexts.

Assessment Criteria

Basics of Decision Tree

Concept of CART

CHAID algorithm

ctree function in R

Bootstrapping and bagging

Random forest algorithm

Task 5.1

Analyse Market Baskets and apply neural networks to classification problems.

5.1 Analyse transactions data for possible associations and derive baskets of associated products.

5.2 Apply neural networks to a classification problem in domains such as speech recognition, image recognition and document categorisation.

Assessment Criteria

Definitions of support, confidence and lift

Apriori algorithm for market basket analysis

Neural networks for classification problems
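
The three association measures named in the criteria can be sketched directly from their standard definitions. The function names and the four transactions below are hypothetical and for illustration only; this is not an implementation of the Apriori algorithm itself, only of the measures it relies on.

```python
def support(transactions, itemset):
    """Share of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence relative to the baseline support of the consequent."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


# Hypothetical baskets (illustrative data only).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(baskets, {"bread", "milk"}))
print(confidence(baskets, {"bread"}, {"milk"}))
print(lift(baskets, {"bread"}, {"milk"}))
```

A lift above 1 suggests the two item sets occur together more often than independence would predict; a lift below 1 suggests the opposite.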

Task 5.2

Perform text mining on social media data.

5.1 Appraise the concepts and techniques used in text mining.

5.2 Analyse unstructured data and perform sentiment analysis of Twitter data to identify the positive, negative or neutral tone of the text.

Assessment Criteria

What is text mining?

Term Document Matrix

Word cloud

Establishing connection with Twitter using twitteR package and Tweepy in Python

Task 5.3

Develop web pages using the SHINY package.

5.1 Build interpretable dashboards using the SHINY package.

5.2 Host standalone applications on a web page to present the results of data analysis.

Assessment Criteria

Introduction to SHINY

Introduction to R Markdown

Build dashboards

Host standalone apps on a webpage or embed them in R Markdown documents or build dashboards.

Task 5.4

Apply the Hadoop framework in Big Data Analytics.

5.1 Evaluate core concepts of Hadoop.

5.2 Appraise applications of Big Data Analytics in various industries.

5.3 Evaluate the use of the HADOOP platform for performing Big Data Analytics.

Assessment Criteria

What is Big Data?

Features of Big Data (Volume, Velocity and Variety)

Big Data in different industries (Healthcare, Telecom, etc.)

HADOOP architecture

Introduction to R HADOOP package

Task 5.5

Evaluate the fundamental concepts of artificial intelligence.

5.1 Build a simple AI model using common machine learning algorithms that support business analysis and decision-making, in comparison with traditional assumptions from business theory.

Assessment Criteria

What is AI and Theory behind AI

What is Q learning

The Monte Carlo theory

Task 6.1

Use SQL programming for data analysis.

6.1 Evaluate core SQL for data analytics.

6.2 Carry out data wrangling and analysis in SQL to uncover insights in underutilized data.

Assessment Criteria

SQL programming Basics

Data Wrangling and analysis

Text mining of Twitter data

Task 6.2

Evaluate the concept of transformation and the key technologies that drive it.

6.1 Analyse the technologies that underpin digital transformation.

6.2 Assess the managerial challenges associated with implementing digital transformation successfully.

Assessment Criteria

Fundamentals of Cloud Computing

Compare and contrast cloud computing with traditional computing models

Task 6.3

Assess the strategic impact of the application of Big Data and Artificial Intelligence on business organisations.

6.1 Evaluate theories of strategy and their application to the digital economy and business.

6.2 Analyse examples of the application of Artificial intelligence on business operations or strategy.

Assessment Criteria

Software as a Service

Platform as a Service

Infrastructure as a Service

Business impact of Cloud Computing

Historical development of Artificial Intelligence

Task 6.4

Appraise theories of innovation and distinguish between disruptive and incremental change.

6.1 Evaluate theories of disruptive innovation and how they explain the impact of innovation on industries.

6.2 Evaluate the managerial challenges of promoting and implementing innovation within organizations.

Assessment Criteria

Vs of data – Volume, velocity, variety, veracity and value

Christensen’s theory of disruptive innovation

Task 6.5

Evaluate ethics practices within organisations and how they relate to issues in Data Science.

6.1 Assess the role that codes of ethics play in the operation and sustainability of organisations.

6.2 Evaluate the importance of reporting and disclosure for ethical practice.

Assessment Criteria

Ethical dilemmas and issues in Artificial Intelligence and Big Data

Grades (marks): Distinguished (80+), Excellent (70), Good (60), Proficient (50), Basic (40), Marginal (30), Unacceptable (0)

Criteria

Content (alignment with assessment criteria)
Distinguished: Extensive evaluation and synthesis of ideas; includes substantial original thinking
Excellent: Comprehensive critical evaluation and synthesis of ideas; includes coherent original thinking
Good: Adequate evaluation and synthesis of key ideas beyond basic descriptions; includes original thinking
Proficient: Describes main ideas with evidence of evaluation; includes some original thinking
Basic: Describes some of the main ideas but omits some concepts; limited evidence of evaluation; confused original thinking
Marginal: Largely incomplete description of main issues; misses key concepts; no original thinking
Unacceptable: Inadequate information or containing information not relevant to the topic

Application of Theory and Literature
Distinguished: In-depth, detailed and relevant application of theory; expertly integrates literature to support ideas and concepts
Excellent: Clear and relevant application of theory; fully integrates literature to support ideas and concepts
Good: Appropriate application of theory; integrates literature to support ideas and concepts
Proficient: Adequate application of theory; uses literature to support ideas and concepts
Basic: Limited application of theory; refers to literature but may not use it consistently
Marginal: Confused application of theory; does not use literature for support
Unacceptable: Little or no evidence of application of theory and relevant literature

Knowledge and Understanding
Distinguished: Extensive depth of understanding and exploration beyond key principles and concepts
Excellent: Comprehensive knowledge and depth of understanding of key principles and concepts
Good: Sound understanding of principles and concepts
Proficient: Basic knowledge and understanding of key concepts and principles
Basic: Limited and superficial knowledge and understanding of key concepts and principles
Marginal: Confused or inadequate knowledge and understanding of key concepts and principles
Unacceptable: Little or no evidence of knowledge or understanding of key concepts and principles

Presentation and Writing Skills
Distinguished: Logical, coherent and polished presentation exceeding expectations at this level; free from errors in mechanics and syntax
Excellent: Logical, coherent presentation demonstrating mastery; free from errors in mechanics and syntax
Good: Logical structure to presentation; makes few errors in mechanics and syntax which do not prohibit meaning
Proficient: Orderly presentation; minor errors in mechanics and syntax
Basic: Somewhat weak presentation; errors in mechanics and syntax may interfere with meaning
Marginal: Confused presentation; errors in mechanics and syntax often interfere with meaning
Unacceptable: Illogical presentation lacking cohesion; contains significant errors that interfere with meaning

Referencing
Distinguished: Advanced use of in-text citation and references
Excellent: Mastery of in-text citation and referencing
Good: Appropriate use of in-text citation and referencing
Proficient: Adequate use of in-text citation and referencing
Basic: Limited use of in-text citation and referencing
Marginal: Inadequate use of citation and referencing
Unacceptable: Little or no evidence of appropriate referencing or use of source

Directions:

For each of the criteria listed in the first column, circle one box in the corresponding column to the right which best reflects the student’s work on this particular assessment activity (e.g., project, presentation, essay).

Provide specific feedback to a student about each of the criteria scores he/she earned by writing comments and suggestions for improvement in the last row titled “Instructor’s comments.”

To arrive at the final mark, total the scores in the circled boxes and divide by 5.

Example:

Grade ranges: Distinguished 80-100; Excellent 70-79; Good 60-69; Proficient 50-59; Basic 40-49; Marginal 35-39; Unacceptable 0-34

Criteria and scores:

Content: 50
Application of Theory and Literature: 40
Knowledge and Understanding: 50
Presentation/Writing Skills: 40
Referencing: 40

Total Score: 220/5 = 44, Basic
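
The arithmetic in the example can be reproduced with a short Python sketch. The names `BANDS`, `final_mark` and `grade_band` are hypothetical; the band boundaries are the lower limits of the ranges given in the example, and the scores are the five example scores.

```python
# Lower bound of each grade range from the example, highest first.
BANDS = [
    (80, "Distinguished"),
    (70, "Excellent"),
    (60, "Good"),
    (50, "Proficient"),
    (40, "Basic"),
    (35, "Marginal"),
    (0, "Unacceptable"),
]


def final_mark(scores):
    """Total the criterion scores and divide by the number of criteria."""
    return sum(scores.values()) / len(scores)


def grade_band(mark):
    """Return the name of the grade range that contains mark."""
    for lower_bound, name in BANDS:
        if mark >= lower_bound:
            return name
    raise ValueError("mark must not be negative")


example_scores = {
    "Content": 50,
    "Application of Theory and Literature": 40,
    "Knowledge and Understanding": 50,
    "Presentation/Writing Skills": 40,
    "Referencing": 40,
}

mark = final_mark(example_scores)
print(mark, grade_band(mark))  # 44.0 Basic
```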

HEAD OFFICE

7 Acorn Business Park Commercial Gate, Nottingham Nottinghamshire

NG18 1EX

LONDON OFFICE

Golden Cross House

8 Duncannon Street, London WC2N 4JF

Copyright 2019 Qualifi Ltd

Page 11 of 11

QUALIFI ASSESSMENT DOCUMENT

Qualification

Qualifi Level 7 Diploma in Data Science

Qualification No (RQF) Unit Name

Unit Reference

No of Credits

J/618/4970

Exploratory Data Analysis

DS01

20 Credits

Introduction

Prior to attempting this coursework assignment, learners must familiarise themselves with the following policies:

Centre Specification

o Can be found at https://qualifi.net/qualifi-level-7-diploma-data-science/

Qualifi Quality Assurance Standards

Qualifi Quality Policy Statement

Plagiarism and Collusion

In submitting the assignment Learner’s must complete a statement of authenticity confirming that the work submitted for all tasks is their own. The statement should also include the word count.

Your accredited study centre will direct you to the appropriate software that checks the level of similarity. Qualifi recommends the use of https://www.turnitin.com as a part of the assessment.

Plagiarism and collusion are treated very seriously. Plagiarism involves presenting work, excerpts, ideas or passages of another author without appropriate referencing and attribution.

Collusion occurs when two or more learners submit work which is so alike in ideas, content, wording and/or structure that the similarity goes beyond what might have been mere coincidence

Please familiarise yourself on Qualifi’s Malpractice and Maladministration policy, where you can find further information

Referencing

A professional approach to work is expected from all learners. Learners must therefore identify and acknowledge ALL sources/methodologies/applications used.

The learner must use an appropriate referencing system to achieve this. Marks are not awarded for the use of English; however, the learner must express ideas clearly and ensure that appropriate terminology is used to convey accuracy in meaning.

Qualifi recommends using Harvard Style of Referencing throughout your work.

Appendices

You may include appendices to support your work, however appendices must only contain additional supporting information, and must be clearly referenced in your assignment.

You may also include tables, graphs, diagrams, Gantt chart and flowcharts that support the main report should be incorporated into the back of the assignment report that is submitted.

Any published secondary information such as annual reports and company literature, should be referenced in the main text of the assignment, in accordance of Harvard Style Referencing, and referenced at the end of the assignment.

Confidentiality

Where a Learner is using organisational information that deals with sensitive material or issues, they must seek the advice and permission from that organisation about its inclusion.

Where confidentiality is an issue, Learners are advised to anonymise their assignment report so that it cannot be attributed to that particular organisation.

Word Count Policy

Learners must comply with the required word count, within a margin of +10%. These rules exclude the index, headings, tables, images, footnotes, appendices and information contained within references and bibliographies.

When an assessment task requires learners to produce presentation slides with supporting notes, the word count applies to the supporting notes only.

Submission of Assignments

All work to be submitted on the due date as per Centre’s advice.

All work must be submitted in a single electronic document (.doc file), or via Turnitin, where applicable.

This should go to the tutor and Centre Manager/Programme Director, plus one hard copy posted to the Centre Manager (if required)

Marking and grades

Qualifi uses a standard marking rubric for all assignments, and you can find the details at the end of this document.

Unless stated elsewhere, Learners must answer all questions in this document.

Assignment Question

Task 1.1

Handle and manage multiple datasets within R and Python environments.

1.1 Work smoothly in R and Python development environments.

1.2 Import and export data sets and create data frames within R and Python in accordance with instructions.

1.3 Sort, merge, aggregate and append data sets in accordance with instructions.

Assessment Criteria

Learning UI about basics rule of programming in both R and Python

Create and import external datasets in R and python

Export R data frames into external flat files

Data Management in R and Python (Sort, merge, aggregate and subset)

Task 1.2

Use measures of central tendency to summarize data and assess both the symmetry and variation in the data.

1.1 Differentiate between variable types and measurement scales.

1.2 Calculate the most appropriate (mean, median or mode etc.) measure of central tendency based on variable type.

1.3 Compare variation in two datasets using the coefficient of variation.

1.4 Assess symmetry of data using measures of skewness

Assessment Criteria

Introduction to basic concepts of Statistics, such as measures of central tendency, variation, skewness, kurtosis

Task 1.3

Present and summarise distributions of data and the relationships between variables graphically.

1.1 Select the most appropriate graph to present the data.

1.2 Assess distribution using Box-Plot and Histogram.

1.3 Visualize bivariate relationships using scatter-plots.

1.4 Present time-series data using motion charts.

Assessment Criteria

Frequency tables crosstabs and bivariate correlation analysis

Data visualization: what and why? Grammar of graphics, handling data for visualization

Commonly used charts and graphs using ggplot2 package in R and matplotlib in python

Advanced graphics in R and Python Data Management in R and Python (Sort, merge, aggregate and subset)

Data Management in R and Python (Sort, merge, aggregate and subset)

Task 1.4

Evaluate standard discrete and standard continuous distributions.

Analyse the statistical distribution of a discrete random variable.

Calculate probabilities using R for Binomial and Poisson Distribution.

Fit Binomial and Poisson distributions to observed data.

Evaluate the properties of Normal and Log Normal distributions.

Calculate probabilities using R for normal and Log normal distributions.

Fit normal, Log normal and exponential distributions to observed data.

1.7 Evaluate the concept of sampling distribution (t, F and Chi Square).

Assessment Criteria

Concept of random variables and statistical distribution

Discrete vs. Continuous Random Variables

t tests (one sample, independent samples, paired sample)

Standard discrete distributions-Bernoulli, Binomial and Poisson

Using R to calculate probabilities

Fitting of discrete distributions to observed data

Standard continuous distributions-Normal, Log Normal, Exponential

Introduction to sampling distributions

Task 1.5

Formulate research hypotheses and perform hypothesis testing.

1.1 Write R and Python programmes that evaluate appropriate hypothesis tests

1.2 Draw statistical inference using output in R.

1.3 Translate research problems into statistical hypotheses.

 

1.4 Assess the most appropriate statistical test for a hypothesis

Assessment Criteria

Statistical Hypothesis Testing-concepts and terminology

Parameter, test statistics, level of significance, power, critical region

Parametric vs. non-Parametric Tests

Z tests for proportions (single and independent samples)

Non-parametric tests (Mann-Whitney U, Wilcoxon’s signed rank)

Tests for Normality, Q-Q plot
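
To make the Z test for a single proportion concrete, here is a minimal sketch using only the Python standard library. The function name, the sample figures (60 successes out of 100) and the 5% significance level are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test_proportion(successes, n, p0):
    """Two-sided one-sample z test of H0: p = p0 against H1: p != p0."""
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical example: 60 successes in 100 trials, testing H0: p = 0.5
z, p_value = z_test_proportion(60, 100, 0.5)
reject_h0 = p_value < 0.05  # decision at the assumed 5% level of significance
```

The test statistic here is about 2, so the null hypothesis is rejected at the assumed 5% level; a stricter level such as 1% would lead to the opposite decision.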

Task 2.1

Analyse the concept of analysis of variance (ANOVA) and select an appropriate ANOVA or ANCOVA model.

2.1 Define variable, factor and level for a given research problem.

2.2 Evaluate the sources of variation, explained variation and unexplained variation.

2.3 Define a linear model for ANOVA/ANCOVA.

2.4 Confirm the validity of assumptions based on definitions and analysis of variation.

2.5 Perform analysis using R and Python programs to confirm validity of assumptions.

2.6 Draw inferences from statistical analysis of the research problem.

Assessment Criteria

What is analysis of variance?

Definitions: Variable, factor, levels

One Way Analysis of Variance

Two Way Analysis of Variance (including interaction effects)

Multi Way Analysis of Variance

Analysis of Covariance

Kruskal-Wallis Test

Friedman Test
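
The split into explained and unexplained variation in a one-way ANOVA can be sketched as below. This is a minimal plain-Python illustration with hypothetical data for one factor at three levels; it computes the F statistic only and leaves the p-value to a statistical package.

```python
def one_way_anova(groups):
    """Return (SS between, SS within, F statistic) for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Explained variation: group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Unexplained variation: observations around their own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_stat


# Hypothetical data: one factor with three levels, three observations each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
ss_between, ss_within, f_stat = one_way_anova(groups)
```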

Task 2.2

Carry out global and individual testing of parameters used in defining predictive models.

2.1 Evaluate dependent variables and predictors.

2.2 Develop linear models using the lm function in R and the .ols function in Python.

2.3 Interpret signs and values of estimated regression coefficients.

2.4 Interpret output of global testing using F distributions.

2.5 Identify significant and insignificant variables.

Assessment Criteria

Concept of random variables and statistical distribution

Concept of a statistical model

Estimation of model parameters using the Least Squares Method

Interpreting regression coefficients

Assessing the goodness of fit of a model

Global hypothesis testing using F distribution

Individual testing using t distributions
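
As a small worked instance of least squares estimation and goodness of fit, the sketch below fits a simple linear regression with one predictor in plain Python. It stands in for, and is not equivalent to, the lm and ols functions named in the task; the data values are hypothetical.

```python
def fit_line(x, y):
    """Least squares estimates (intercept, slope) for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variation in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]
intercept, slope = fit_line(x, y)
fit_quality = r_squared(x, y, intercept, slope)
```

A positive slope means the predicted value of y rises as x rises; its size is the expected change in y per unit change in x.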

Task 2.3

Validate assumptions in multiple linear regression.

2.1 Resolve multicollinearity problems.

2.2 Revise a model after resolving the problem.

2.3 Assess the performance of the ridge regression model.

2.4 Perform residual analysis – graphically & using statistical tests to analyse results.

2.5 Resolve problems of non-normality of errors and heteroscedasticity.

Assessment Criteria

Concept of Multicollinearity

Calculating Variance Inflation Factors

Resolving problem by dropping variables

Ridge regression method

Stepwise regression as a strategy

Residual analysis

Shapiro-Wilk test, K-S test and Q-Q plot for residuals

White’s test and Breusch-Pagan Test

Partitioning data using the caret package
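
The Variance Inflation Factor for predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on the others. The sketch below covers only the special case of two predictors, where R_j^2 equals their squared correlation; the data and function names are hypothetical.

```python
def r_squared_between(x1, x2):
    """Squared Pearson correlation between two predictors."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    return s12**2 / (s11 * s22)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a model with exactly two predictors."""
    return 1 / (1 - r_squared_between(x1, x2))


# Hypothetical predictors.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 5, 4, 5]
vif = vif_two_predictors(x1, x2)
```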

Task 2.4

Validate models via data partitioning, out of sample testing and cross-validation.

2.1 Develop models and implement them on testing data in accordance with the specification.

2.2 Evaluate the stability of the models using k-fold cross validation.

2.3 Evaluate influential observations using Cook’s distance and hat matrix.

Assessment Criteria

Model development on training data

Model validation on testing data using R squared and RMSE

Concept of k-fold cross validation

Performing k-fold cross validation using the caret package

Identifying influential observations
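
The idea of k-fold cross validation can be sketched without any package. The example below is a simplified stand-in for the caret workflow named above: it assumes contiguous, unshuffled folds and uses a baseline model that predicts the training mean, so that only the partitioning and RMSE logic is on show. All names are hypothetical.

```python
from math import sqrt


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    folds = []
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_rmse(y, k):
    """RMSE on each held-out fold for a model that predicts the training mean."""
    rmses = []
    for test_idx in k_fold_indices(len(y), k):
        held_out = set(test_idx)
        train = [v for i, v in enumerate(y) if i not in held_out]
        prediction = sum(train) / len(train)
        squared_errors = [(y[i] - prediction) ** 2 for i in test_idx]
        rmses.append(sqrt(sum(squared_errors) / len(squared_errors)))
    return rmses
```

Similar RMSE values across folds suggest a stable model; in practice the rows would usually be shuffled before the folds are formed.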

Task 2.5

Develop models using binary logistic regression and assess their performance.

2.1 Evaluate when to use binary logistic regression correctly.

2.2 Develop realistic models using functions in R and Python.

2.3 Interpret output of global testing using the likelihood ratio test (LRT) in order to assess the results.

2.4 Perform out of sample validation that tests predictive quality of the model.

Assessment Criteria

Model definition and parameter estimation

Estimation of model parameters using MLE

Interpreting regression coefficients and odds ratio

Assessing goodness of fit of the model

Global hypothesis testing using the likelihood ratio test (LRT)

Individual testing using Wald’s test
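
Two of the quantities above lend themselves to a short worked example: the odds ratio, which is the exponential of a logistic regression coefficient, and the likelihood ratio statistic, which is -2 times the difference in log-likelihoods. The sketch assumes a model with a single predictor (one degree of freedom) and uses made-up log-likelihood values; for one degree of freedom the chi-square tail probability can be obtained from the standard Normal distribution.

```python
from math import exp, sqrt
from statistics import NormalDist


def odds_ratio(beta):
    """Multiplicative change in the odds for a one-unit increase in the predictor."""
    return exp(beta)


def lrt_statistic(loglik_null, loglik_model):
    """Likelihood ratio statistic comparing the null model with the fitted model."""
    return -2 * (loglik_null - loglik_model)


def chi2_upper_tail_df1(g):
    """P(chi-square with 1 df > g), using the fact that Z^2 is chi-square(1)."""
    return 2 * (1 - NormalDist().cdf(sqrt(g)))


# Hypothetical log-likelihoods for an intercept-only and a one-predictor model.
g = lrt_statistic(loglik_null=-68.0, loglik_model=-64.0)
p_value = chi2_upper_tail_df1(g)
```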

Task 3.1

Develop applications of multinomial logistic regression and ordinal logistic regression.

3.1 Select method for modelling categorical variable.

3.2 Develop models for nominal and ordinal scaled dependent variable in R and Python correctly.

Assessment Criteria

Classification table

ROC curve

K-S Statistic

Multinomial and Ordinal Logistic Regression – model building and parameter estimation

Interpretation of regression coefficients

Classification table and deviance test
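
A classification table compares actual outcomes with the classes predicted at a chosen cut-off. The sketch below is a minimal illustration for the binary case; the 0.5 cut-off, the function name and the data are assumptions.

```python
def classification_table(actual, predicted_prob, cutoff=0.5):
    """Counts of true/false positives and negatives at the given cut-off."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted_prob):
        predicted = 1 if p >= cutoff else 0
        if a == 1 and predicted == 1:
            counts["TP"] += 1
        elif a == 0 and predicted == 1:
            counts["FP"] += 1
        elif a == 0 and predicted == 0:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return counts


# Hypothetical outcomes and predicted probabilities.
actual = [1, 0, 1, 1, 0, 0]
predicted_prob = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1]
table = classification_table(actual, predicted_prob)

accuracy = (table["TP"] + table["TN"]) / len(actual)
sensitivity = table["TP"] / (table["TP"] + table["FN"])
specificity = table["TN"] / (table["TN"] + table["FP"])
```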

Task 3.2

Develop generalised linear models and carry out survival analysis and Cox regression.

3.1 Evaluate the concept of generalised linear models.

3.2 Apply the Poisson regression model and negative binomial regression to count data correctly.

3.3 Model a ‘time to event’ variable using Cox regression.

Assessment Criteria

Concept of GLM and link function, and the GLM function

Poisson Regression

Negative Binomial Regression

Survival Analysis Introduction

Cox Regression

Task 3.3

Assess the concepts and uses of time series analysis and test for stationarity in time series data.

3.1 Create a time series object in R and Python correctly, including decomposing the time series and assessing its different components.

3.2 Assess whether a time series is stationary.

3.3 Transform non-stationary time series data into stationary time series data.

Assessment Criteria

Components of time series

Seasonal decomposition

Trend analysis

Auto-correlogram

Partial auto-correlogram

Dickey-Fuller test

Converting non-stationary time series data into stationary time series data
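
Differencing is a common way to remove a trend, and the auto-correlogram is built from sample autocorrelations at successive lags. The following is a minimal plain-Python sketch of both calculations; the function names and series are hypothetical, and a formal stationarity check such as the Dickey-Fuller test would still need a statistical package.

```python
def difference(series, lag=1):
    """Differenced series: each value minus the value `lag` steps earlier."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def autocorrelation(series, lag):
    """Sample autocorrelation at the given lag (one point of the correlogram)."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((v - mean) ** 2 for v in series)
    numerator = sum(
        (series[i] - mean) * (series[i - lag] - mean) for i in range(lag, n)
    )
    return numerator / denominator


# Hypothetical series with a linear trend: first differences are constant.
trend = [2, 4, 6, 8, 10]
differenced = difference(trend)

lag1 = autocorrelation([1, 2, 3, 4, 5], 1)
```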

Task 3.4

Validate ARIMA (Auto Regressive Integrated Moving Average) models and use estimation.

3.1 Identify p, d and q of the ARIMA model using the ACF (auto-correlation function) and the PACF (partial auto-correlation function) to describe how well values are related.

3.2 Develop ARIMA models using R and Python and evaluate whether errors follow the white noise process.

3.3 Finalize the model and forecast n-period ahead to make accurate predictions.

Assessment Criteria

Concepts of AR, MA and ARIMA models

Model identification using ACF and PACF

Parameter estimation

Residual analysis (testing for white noise process)

Selection of optimal model

Task 3.5

Implement panel data regression methods.

3.1 Evaluate the concept of panel data regression.

3.2 Analyse the features of panel data.

3.3 Build panel data regression models in a range of contexts.

3.4 Evaluate the difference between fixed effect and random effect models.

Assessment Criteria

What is Panel data?

Need for different models for Panel data

Panel data regression methods

Task 4.1

Define Principal Component Analysis (PCA) and its derivations and assess their application.

4.1 Evaluate the need for data reduction.

4.2 Perform principal component analysis and develop scoring models using R and Python to minimise data loss and improve interpretability of data.

4.3 Resolve multi-collinearity using Principal Component Regression.

Assessment Criteria

Concept of Data reduction

Definition of first, second, …, p-th principal components

Deriving principal component using Eigenvectors

Deciding optimum number of principal components

Developing scoring models using PCA

Principal component regression
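
Principal components are derived from the eigenvalues and eigenvectors of the covariance (or correlation) matrix, and the eigenvalues show how much variance each component retains. The sketch below is restricted to two variables so that the eigenvalues have a closed form; the data and function names are hypothetical.

```python
from math import sqrt


def covariance_matrix_2d(x, y):
    """Sample variances and covariance: returns (var_x, cov_xy, var_y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    return var_x, cov_xy, var_y


def eigenvalues_symmetric_2x2(a, b, c):
    """Eigenvalues of the matrix [[a, b], [b, c]], largest first."""
    centre = (a + c) / 2
    spread = sqrt(((a - c) / 2) ** 2 + b**2)
    return centre + spread, centre - spread


# Hypothetical data on two correlated variables.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
var_x, cov_xy, var_y = covariance_matrix_2d(x, y)
first, second = eigenvalues_symmetric_2x2(var_x, cov_xy, var_y)

# Share of total variance retained by the first principal component.
explained_by_first = first / (first + second)
```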

Task 4.2

Understand hierarchical and non-hierarchical cluster analysis and assess their outputs.

4.1 Perform data reduction and derive interpretable factors and use factor scores to interpret the data set.

4.2 Obtain a brand perception map using multi-dimensional scaling.

Assessment Criteria

Orthogonal factor model

Estimation of loading matrix

Interpreting factor solution

Deciding optimum number of factors

Using factor scores for further analysis

Factor rotation

Concept of MDS

Variable reduction using MDS

Task 4.3

Evaluate the concept of panel data regression and implement panel data methods.

4.1 Evaluate the need for cluster analysis.

4.2 Obtain clusters using suitable methods.

4.3 Interpret cluster solutions and analyse the use of clusters for business strategies.

Assessment Criteria

Concept of cluster analysis

Hierarchical cluster analysis methods (linkage methods)

Using dendrogram to estimate optimum number of clusters

k-means clustering methods

Using k-means runs function in R and Python to find optimum number of k
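
The k-means procedure alternates between assigning each observation to its nearest centroid and recomputing the centroids. The following is a minimal one-dimensional sketch with fixed starting centroids, so the result is deterministic; it is not the k-means runs function mentioned above, and the data are hypothetical.

```python
def k_means_1d(values, centroids, iterations=10):
    """Run a fixed number of k-means iterations on one-dimensional data."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical data with two well-separated groups.
values = [1, 2, 3, 10, 11, 12]
centroids, clusters = k_means_1d(values, centroids=[1, 12])
```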

Task 4.4

Appraise classification methods including Naïve Bayes and the support vector machine algorithm.

4.1 Evaluate different methods of classification and the performance of classifiers.

4.2 Design optimum classification rules to achieve minimum error rates.

Assessment Criteria

Bayes theorem and its applications

Constructing classifier using Naïve Bayes method

Concept of Hyperplane

Support vector machine algorithm

Comparison with Binary Logistic Regression
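
Bayes theorem, which underlies the Naïve Bayes classifier, can be shown with a two-class calculation. The probabilities below are invented for illustration (a hypothetical spam filter looking at a single word); a full Naïve Bayes classifier multiplies such likelihoods across many features.

```python
def posterior(prior, likelihood, likelihood_other):
    """P(class | evidence) for a two-class problem via Bayes theorem."""
    evidence = likelihood * prior + likelihood_other * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical figures: 20% of messages are spam; the word appears in
# 50% of spam messages and 10% of non-spam messages.
p_spam_given_word = posterior(prior=0.2, likelihood=0.5, likelihood_other=0.1)
```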

Task 4.5

Apply decision tree and random forest algorithms to classification and regression problems.

4.1 Use decision trees for classification and regression problems in comparison with classical methodologies.

4.2 Analyse concepts of bootstrapping and bagging.

4.3 Apply the random forest method in a range of business and social contexts.

Assessment Criteria

Basics of Decision Tree

Concept of CART

CHAID algorithm

ctree function in R

Bootstrapping and bagging

Random forest algorithm
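
Bootstrapping draws repeated samples with replacement from the observed data, and bagging averages the results of a model fitted to each resample. The sketch below illustrates the idea with the sample mean standing in for the model; the seed, sample and function names are assumptions.

```python
import random
from statistics import mean


def bootstrap_means(sample, n_resamples, seed=0):
    """Means of resamples drawn with replacement, each the size of the sample."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        mean(rng.choices(sample, k=len(sample))) for _ in range(n_resamples)
    ]


# Hypothetical sample.
sample = [3, 5, 7, 9, 11]
resampled_means = bootstrap_means(sample, n_resamples=200, seed=42)

# Bagging-style aggregate: average the estimates from all resamples.
bagged_estimate = mean(resampled_means)
```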

Task 5.1

Analyse Market Baskets and apply neural networks to classification problems.

5.1 Analyse transactions data for possible associations and derive baskets of associated products.

5.2 Apply neural networks to a classification problem in domains such as speech recognition, image recognition and document categorisation.

Assessment Criteria

Definitions of support, confidence and lift

Apriori algorithm for market basket analysis

Neural networks for classification problems
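
Support, confidence and lift can be computed directly from a list of transactions. The sketch below uses four invented baskets; it shows the three definitions only and does not implement the Apriori search itself.

```python
def support(transactions, items):
    """Share of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """P(consequent | antecedent), estimated from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence relative to the consequent's baseline support."""
    return confidence(transactions, antecedent, consequent) / support(
        transactions, consequent
    )


# Hypothetical transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
s = support(transactions, {"bread", "milk"})
c = confidence(transactions, {"bread"}, {"milk"})
l = lift(transactions, {"bread"}, {"milk"})
```

A lift below 1, as in this toy data, indicates that the two items appear together less often than independence would predict.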

Task 5.2

Perform text mining on social media data.

5.1 Appraise the concepts and techniques used in text mining.

5.2 Analyse unstructured data and perform sentiment analysis of Twitter data to identify the positive, negative or neutral tone of the text.

Assessment Criteria

What is text mining?

Term Document Matrix

Word cloud

Establishing connection with Twitter using twitteR package and Tweepy in Python
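
A term-document matrix records how often each term occurs in each document. The following is a minimal sketch with two invented documents, using whitespace tokenisation only; real text mining would add steps such as lower-casing, stop-word removal and stemming.

```python
from collections import Counter


def term_document_matrix(documents):
    """Return (sorted vocabulary, matrix) with one row per term, one column per document."""
    counts = [Counter(doc.split()) for doc in documents]
    vocabulary = sorted(set(term for c in counts for term in c))
    matrix = [[c[term] for c in counts] for term in vocabulary]
    return vocabulary, matrix


# Hypothetical documents.
documents = ["great service great price", "poor service"]
vocabulary, matrix = term_document_matrix(documents)
```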

Task 5.3

Develop web pages using the SHINY package.

5.1 Build interpretable dashboards using the SHINY package.

5.2 Host standalone applications on a web page to present the results of data analysis.

Assessment Criteria

Introduction to SHINY

Introduction to R Markdown

Build dashboards

Host standalone apps on a webpage or embed them in R Markdown documents or build dashboards.

Task 5.4

Apply the Hadoop framework in Big Data Analytics.

5.1 Evaluate core concepts of Hadoop.

5.2 Appraise applications of Big Data Analytics in various industries.

5.3 Evaluate the use of the HADOOP platform for performing Big Data Analytics.

Assessment Criteria

What is Big Data?

Features of Big Data (Volume, Velocity and Variety)

Big Data in different industries (Healthcare, Telecom, etc.)

HADOOP architecture

Introduction to R HADOOP package

Task 5.5

Evaluate the fundamental concepts of artificial intelligence.

5.1 Build a simple AI model using common machine learning algorithms that support business analysis and decision-making, in comparison with traditional assumptions from business theory.

Assessment Criteria

What is AI and Theory behind AI

What is Q learning

The Monte Carlo theory

Task 6.1

Use SQL programming for data analysis.

6.1 Evaluate core SQL for data analytics.

6.2 Carry out data wrangling and analysis in SQL to uncover insights in underutilized data.

Assessment Criteria

SQL programming Basics

Data Wrangling and analysis

Text mining of Twitter data
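
Basic SQL data wrangling (filtering, grouping and aggregating) can be tried out with the sqlite3 module that ships with Python. The table name, columns and rows below are hypothetical and held in an in-memory database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 100.0), ("South", 250.0), ("North", 50.0), ("South", None)],
)

# Drop missing amounts, then aggregate by region and sort by total.
rows = conn.execute(
    """
    SELECT region, COUNT(amount) AS n, SUM(amount) AS total
    FROM sales
    WHERE amount IS NOT NULL
    GROUP BY region
    ORDER BY total DESC
    """
).fetchall()
conn.close()
```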

Task 6.2

Evaluate the concept of digital transformation and the key technologies that drive it.

6.1 Analyse the technologies that underpin digital transformation.

6.2 Assess the managerial challenges associated with implementing digital transformation successfully.

Assessment Criteria

Fundamentals of Cloud Computing

Compare and contrast cloud computing with traditional computing models

Task 6.3

Assess the strategic impact of the application of Big Data and Artificial Intelligence on business organisations.

6.1 Evaluate theories of strategy and their application to the digital economy and business.

6.2 Analyse examples of the application of Artificial intelligence on business operations or strategy.

Assessment Criteria

Software as a Service

Platform as a Service

Infrastructure as a Service

Business impact of Cloud Computing

Historical development of Artificial Intelligence

Task 6.4

Appraise theories of innovation and distinguish between disruptive and incremental change.

6.1 Evaluate theories of disruptive innovation and how they explain the impact of innovation on industries.

6.2 Evaluate the managerial challenges of promoting and implementing innovation within organizations.

Assessment Criteria

Vs of data – Volume, velocity, variety, veracity and value

Christensen’s theory of disruptive innovation

Task 6.5

Evaluate ethics practices within organisations and how they relate to issues in Data Science.

6.1 Assess the role that codes of ethics play in the operation and sustainability of organisations.

6.2 Evaluate the importance of reporting and disclosure for ethical practice.

Assessment Criteria

Ethical dilemmas and issues in Artificial Intelligence and Big Data

Criteria and grade bands (score): Distinguished (80+), Excellent (70), Good (60), Proficient (50), Basic (40), Marginal (30), Unacceptable (0)

Content (alignment with assessment criteria)
Distinguished: Extensive evaluation and synthesis of ideas; includes substantial original thinking
Excellent: Comprehensive critical evaluation and synthesis of ideas; includes coherent original thinking
Good: Adequate evaluation and synthesis of key ideas beyond basic descriptions; includes original thinking
Proficient: Describes main ideas with evidence of evaluation; includes some original thinking
Basic: Describes some of the main ideas but omits some concepts; limited evidence of evaluation; confused original thinking
Marginal: Largely incomplete description of main issues; misses key concepts; no original thinking
Unacceptable: Inadequate information or containing information not relevant to the topic

Application of Theory and Literature
Distinguished: In-depth, detailed and relevant application of theory; expertly integrates literature to support ideas and concepts
Excellent: Clear and relevant application of theory; fully integrates literature to support ideas and concepts
Good: Appropriate application of theory; integrates literature to support ideas and concepts
Proficient: Adequate application of theory; uses literature to support ideas and concepts
Basic: Limited application of theory; refers to literature but may not use it consistently
Marginal: Confused application of theory; does not use literature for support
Unacceptable: Little or no evidence of application of theory and relevant literature

Knowledge and Understanding
Distinguished: Extensive depth of understanding and exploration beyond key principles and concepts
Excellent: Comprehensive knowledge and depth of understanding of key principles and concepts
Good: Sound understanding of principles and concepts
Proficient: Basic knowledge and understanding of key concepts and principles
Basic: Limited and superficial knowledge and understanding of key concepts and principles
Marginal: Confused or inadequate knowledge and understanding of key concepts and principles
Unacceptable: Little or no evidence of knowledge or understanding of key concepts and principles

Presentation and Writing Skills
Distinguished: Logical, coherent and polished presentation exceeding expectations at this level; free from errors in mechanics and syntax
Excellent: Logical, coherent presentation demonstrating mastery; free from errors in mechanics and syntax
Good: Logical structure to presentation; makes few errors in mechanics and syntax which do not prohibit meaning
Proficient: Orderly presentation; minor errors in mechanics and syntax
Basic: Somewhat weak presentation; errors in mechanics and syntax may interfere with meaning
Marginal: Confused presentation; errors in mechanics and syntax often interfere with meaning
Unacceptable: Illogical presentation lacking cohesion; contains significant errors that interfere with meaning

Referencing
Distinguished: Advanced use of in-text citation and references
Excellent: Mastery of in-text citation and referencing
Good: Appropriate use of in-text citation and referencing
Proficient: Adequate use of in-text citation and referencing
Basic: Limited use of in-text citation and referencing
Marginal: Inadequate use of citation and referencing
Unacceptable: Little or no evidence of appropriate referencing or use of sources

Directions:

For each of the criteria listed in the first column, circle one box in the corresponding column to the right which best reflects the student’s work on this particular assessment activity (e.g., project, presentation, essay).

Provide specific feedback to a student about each of the criteria scores he/she earned by writing comments and suggestions for improvement in the last row titled “Instructor’s comments.”

To arrive at a final mark, total the scores for the five criteria and divide by 5.

Example:

Grade bands and ranges: Distinguished (80-100), Excellent (70-79), Good (60-69), Proficient (50-59), Basic (40-49), Marginal (35-39), Unacceptable (0-34)

Criteria and score:

Content: 50

Application of Theory and Literature: 40

Knowledge and Understanding: 50

Presentation/Writing Skills: 40

Referencing: 40

Total Score: 220/5 = 44, Basic
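
The arithmetic in this example can be checked with a short script. This is only an illustrative sketch: the band boundaries are copied from the ranges in the example above, and the function and variable names are hypothetical.

```python
# Lower bound of each grade band, taken from the ranges in the example.
BANDS = [
    (80, "Distinguished"),
    (70, "Excellent"),
    (60, "Good"),
    (50, "Proficient"),
    (40, "Basic"),
    (35, "Marginal"),
    (0, "Unacceptable"),
]


def final_mark(scores):
    """Total the criteria scores and divide by the number of criteria."""
    return sum(scores.values()) / len(scores)


def band(mark):
    """Return the name of the grade band that contains the mark."""
    for lower_bound, name in BANDS:
        if mark >= lower_bound:
            return name
    raise ValueError("mark must be zero or greater")


scores = {
    "Content": 50,
    "Application of Theory and Literature": 40,
    "Knowledge and Understanding": 50,
    "Presentation/Writing Skills": 40,
    "Referencing": 40,
}
mark = final_mark(scores)  # 220 / 5
grade = band(mark)
```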

HEAD OFFICE

7 Acorn Business Park Commercial Gate, Nottingham Nottinghamshire

NG18 1EX

LONDON OFFICE

Golden Cross House

8 Duncannon Street, London WC2N 4JF

Copyright 2019 Qualifi Ltd
