QUALIFI ASSESSMENT DOCUMENT
Qualification: Qualifi Level 7 Diploma in Data Science
Qualification No (RQF): J/618/4970
Unit Name: Exploratory Data Analysis
Unit Reference: DS01
No of Credits: 20 Credits
Introduction
Prior to attempting this coursework assignment, learners must familiarise themselves with the following policies:
Centre Specification (can be found at https://qualifi.net/qualifi-level-7-diploma-data-science/)
Qualifi Quality Assurance Standards
Qualifi Quality Policy Statement
Plagiarism and Collusion
In submitting the assignment, learners must complete a statement of authenticity confirming that the work submitted for all tasks is their own. The statement should also include the word count.
Your accredited study centre will direct you to the appropriate software that checks the level of similarity. Qualifi recommends the use of https://www.turnitin.com as a part of the assessment.
Plagiarism and collusion are treated very seriously. Plagiarism involves presenting work, excerpts, ideas or passages of another author without appropriate referencing and attribution.
Collusion occurs when two or more learners submit work which is so alike in ideas, content, wording and/or structure that the similarity goes beyond what might have been mere coincidence.
Please familiarise yourself with Qualifi’s Malpractice and Maladministration policy, where you can find further information.
Referencing
A professional approach to work is expected from all learners. Learners must therefore identify and acknowledge ALL sources/methodologies/applications used.
The learner must use an appropriate referencing system to achieve this. Marks are not awarded for the use of English; however, the learner must express ideas clearly and ensure that appropriate terminology is used to convey accuracy in meaning.
Qualifi recommends using the Harvard style of referencing throughout your work.
Appendices
You may include appendices to support your work; however, appendices must only contain additional supporting information and must be clearly referenced in your assignment.
Tables, graphs, diagrams, Gantt charts and flowcharts that support the main report should be incorporated into the back of the assignment report that is submitted.
Any published secondary information, such as annual reports and company literature, should be cited in the main text of the assignment in accordance with Harvard style referencing and listed in the references at the end of the assignment.
Confidentiality
Where a Learner is using organisational information that deals with sensitive material or issues, they must seek advice and permission from that organisation about its inclusion.
Where confidentiality is an issue, Learners are advised to anonymise their assignment report so that it cannot be attributed to that particular organisation.
Word Count Policy
Learners must comply with the required word count, within a margin of +10%. These rules exclude the index, headings, tables, images, footnotes, appendices and information contained within references and bibliographies.
When an assessment task requires learners to produce presentation slides with supporting notes, the word count applies to the supporting notes only.
Submission of Assignments
All work is to be submitted on the due date as per the Centre’s advice.
All work must be submitted in a single electronic document (.doc file), or via Turnitin, where applicable.
This should go to the tutor and Centre Manager/Programme Director, plus one hard copy posted to the Centre Manager (if required).
Marking and grades
Qualifi uses a standard marking rubric for all assignments, and you can find the details at the end of this document.
Unless stated elsewhere, Learners must answer all questions in this document.
Assignment Question
Task 1.1
Handle and manage multiple datasets within R and Python environments.
1.1 Work smoothly in R and Python development environments.
1.2 Import and export data sets and create data frames within R and Python in accordance with instructions.
1.3 Sort, merge, aggregate and append data sets in accordance with instructions.
Assessment Criteria
Learning the user interface and the basic rules of programming in both R and Python
Create and import external datasets in R and Python
Export R data frames into external flat files
Data Management in R and Python (Sort, merge, aggregate and subset)
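As an illustration only, the data management operations named in Task 1.1 might look something like the following in Python. This is a minimal sketch that assumes the pandas library is available; the data frames, column names, values and file name are all invented for the example and are not part of the assignment data.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data, invented purely for illustration.
sales = pd.DataFrame({
    "store_id": [2, 1, 2, 1],
    "amount": [250.0, 100.0, 150.0, 300.0],
})
stores = pd.DataFrame({
    "store_id": [1, 2],
    "region": ["North", "South"],
})
extra_sales = pd.DataFrame({"store_id": [1], "amount": [50.0]})

# Append: stack the rows of two data frames that share the same columns.
all_sales = pd.concat([sales, extra_sales], ignore_index=True)

# Sort: order rows by store, then by amount (largest first).
sorted_sales = all_sales.sort_values(["store_id", "amount"], ascending=[True, False])

# Merge: attach the region of each store (a left join on store_id).
merged = sorted_sales.merge(stores, on="store_id", how="left")

# Aggregate: total sales amount per region.
totals = merged.groupby("region", as_index=False)["amount"].sum()

# Export to a flat file in the temporary directory, then import it again.
path = os.path.join(tempfile.gettempdir(), "region_totals.csv")
totals.to_csv(path, index=False)
reloaded = pd.read_csv(path)
```

The equivalent steps in R would normally use functions such as merge, aggregate and write.csv; the choice of pandas here is only one reasonable option.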
Task 1.2
Use measures of central tendency to summarize data and assess both the symmetry and variation in the data.
1.1 Differentiate between variable types and measurement scales.
1.2 Calculate the most appropriate (mean, median or mode etc.) measure of central tendency based on variable type.
1.3 Compare variation in two datasets using the coefficient of variation.
1.4 Assess symmetry of data using measures of skewness.
Assessment Criteria
Introduction to basic concepts of Statistics, such as measures of central tendency, variation, skewness, kurtosis
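The coefficient of variation and skewness referred to in Task 1.2 can be sketched as follows. This example uses only the Python standard library; the two datasets are hypothetical, and the skewness shown is the simple moment-based coefficient (third central moment divided by the cubed population standard deviation), which is one of several definitions in use.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


def skewness(values):
    """Moment-based skewness: m3 / m2 ** 1.5 (zero for symmetric data)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical datasets, invented for illustration.
dataset_a = [10, 12, 14, 16, 18]   # symmetric around 14
dataset_b = [5, 10, 15, 20, 50]    # long right tail

cv_a = coefficient_of_variation(dataset_a)
cv_b = coefficient_of_variation(dataset_b)
skew_a = skewness(dataset_a)
skew_b = skewness(dataset_b)
```

Here dataset_b has both the larger coefficient of variation and a positive skewness, reflecting its long right tail, while the skewness of dataset_a is zero.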
Task 1.3
Present and summarise distributions of data and the relationships between variables graphically.
1.1 Select the most appropriate graph to present the data.
1.2 Assess distribution using Box-Plot and Histogram.
1.3 Visualize bivariate relationships using scatter-plots.
1.4 Present time-series data using motion charts.
Assessment Criteria
Frequency tables, crosstabs and bivariate correlation analysis
Data visualization: what and why? Grammar of graphics, handling data for visualization
Commonly used charts and graphs using the ggplot2 package in R and matplotlib in Python
Advanced graphics in R and Python
Data Management in R and Python (Sort, merge, aggregate and subset)
Task 1.4
Evaluate standard discrete and standard continuous distributions.
1.1 Analyse the statistical distribution of a discrete random variable.
1.2 Calculate probabilities using R for Binomial and Poisson Distribution.
1.3 Fit Binomial and Poisson distributions to observed data.
1.4 Evaluate the properties of Normal and Log Normal distributions.
1.5 Calculate probabilities using R for normal and Log normal distributions.
1.6 Fit normal, Log normal and exponential distributions to observed data.
1.7 Evaluate the concept of sampling distribution (t, F and Chi Square).
Assessment Criteria
Concept of random variables and statistical distribution
Discrete vs. Continuous Random Variables
t tests (one sample, independent samples, paired sample)
Standard discrete distributions: Bernoulli, Binomial and Poisson
Using R to calculate probabilities
Fitting of discrete distributions to observed data
Standard continuous distributions: Normal, Log Normal, Exponential
Introduction to sampling distributions
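Task 1.4 asks for these probabilities to be calculated in R, where functions such as dbinom, pbinom and dpois are normally used. Purely to show the underlying formulas, the sketch below computes the same quantities in Python using the standard library only; the function names and the example parameters are assumptions made for this illustration.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam ** k / factorial(k)


# Hypothetical examples.
p_two_heads = binomial_pmf(2, 10, 0.5)      # exactly 2 successes in 10 trials
p_at_most_three = binomial_cdf(3, 10, 0.3)  # at most 3 successes in 10 trials
p_no_events = poisson_pmf(0, 2.0)           # no events when the mean rate is 2
```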
Task 1.5
Formulate research hypotheses and perform hypothesis testing.
1.1 Write R and Python programmes that evaluate appropriate hypothesis tests.
1.2 Draw statistical inference using output in R.
1.3 Translate research problems into statistical hypotheses.
1.4 Assess the most appropriate statistical test for a hypothesis.
Assessment Criteria
Statistical Hypothesis Testing: concepts and terminology
Parameter, test statistics, level of significance, power, critical region
Parametric vs. non-Parametric Tests
Z tests for proportions (single and independent samples)
Non-parametric tests (Mann-Whitney U, Wilcoxon’s signed rank)
Tests for Normality, Q-Q plot
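As one example of the tests listed above, a single-sample Z test for a proportion could be sketched as follows. The sketch assumes a two-sided alternative and a sample large enough for the normal approximation to hold; the function name and the figures (60 successes out of 100, tested against 0.5) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_proportion_z_test(successes, n, p0):
    """Two-sided Z test of H0: p = p0 using the normal approximation."""
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical sample: 60 successes in 100 trials, H0: p = 0.5.
z, p_value = one_sample_proportion_z_test(successes=60, n=100, p0=0.5)
reject_at_5_percent = p_value < 0.05
```

With these figures the test statistic is approximately 2 and the p-value is just below 0.05, so the null hypothesis would be rejected at the 5% level of significance.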
Task 2.1
Analyse the concept of analysis of variance (ANOVA) and select an appropriate ANOVA or ANCOVA model.
2.1 Define variable, factor and level for a given research problem.
2.2 Evaluate the sources of variation, explained variation and unexplained variation.
2.3 Define a linear model for ANOVA/ANCOVA.
2.4 Confirm the validity of assumption based on definitions and analysis of variation.
2.5 Perform analysis using R and Python programs to confirm validity of assumptions.
2.6 Draw inferences from statistical analysis of the research problem.
Assessment Criteria
What is analysis of variance?
Definitions: Variable, factor, levels
One Way Analysis of Variance
Two Way Analysis of Variance (including interaction effects)
Multi Way Analysis of Variance
Analysis of Covariance
Kruskal-Wallis Test
Friedman Test
Task 2.2
Carry out global and individual testing of parameters used in defining predictive models.
2.1 Evaluate dependent variables and predictors.
2.2 Develop linear models using the lm function in R and the .ols function in Python.
2.3 Interpret signs and values of estimated regression coefficients.
2.4 Interpret output of global testing using F distributions.
2.5 Identify significant and insignificant variables.
Assessment Criteria
Concept of random variables and statistical distribution
Concept of a statistical model
Estimation of model parameters using Least Square Method
Interpreting regression coefficients
Assessing the goodness of fit of a model
Global hypothesis testing using F distribution
Individual testing using t distributions
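The least squares estimation and global F test listed above can be illustrated with the short sketch below. It assumes NumPy is available and deliberately works from the formulas rather than from the lm or ols functions named in the task; the function name fit_ols and the data are invented for the example.

```python
import numpy as np


def fit_ols(X, y):
    """Least squares fit with an intercept.

    Returns the coefficients (intercept first), R squared and the
    global F statistic for H0: all slope coefficients are zero.
    """
    n, k = X.shape                              # n observations, k predictors
    design = np.column_stack([np.ones(n), X])   # add the intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    sse = residuals @ residuals                 # unexplained variation
    sst = ((y - y.mean()) ** 2).sum()           # total variation
    r_squared = 1 - sse / sst
    f_stat = (r_squared / k) / ((1 - r_squared) / (n - k - 1))
    return beta, r_squared, f_stat


# Hypothetical data: six observations, two predictors.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
X = np.column_stack([x1, x2])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1, 12.9])

beta, r_squared, f_stat = fit_ols(X, y)
```

The F statistic would be compared with the F distribution on k and n - k - 1 degrees of freedom, which is what the summary output of lm in R reports.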
Task 2.3
Validate assumptions in multiple linear regression.
2.1 Resolve multicollinearity problems.
2.2 Revise a model after resolving the problem.
2.3 Assess the performance of the ridge regression model.
2.4 Perform residual analysis – graphically & using statistical tests to analyse results.
2.5 Resolve problems of non-normality of errors and heteroscedasticity.
Assessment Criteria
Concept of Multicollinearity
Calculating Variance Inflation Factors
Resolving problem by dropping variables
Ridge regression method
Stepwise regression as a strategy
Residual analysis
Shapiro Wilk test, K-S test and Q-Q plot for residuals
White’s test and Breusch-Pagan Test
Partitioning data using the caret package
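To make the Variance Inflation Factor concrete, the sketch below computes it directly from its definition, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors. It assumes NumPy is available; the function name and data are hypothetical, and in practice a library routine would normally be used instead.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column of X (predictors only, no intercept)."""
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        # Regress predictor j on the other predictors.
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ beta
        r_squared = 1 - (residuals @ residuals) / ((target - target.mean()) ** 2).sum()
        vifs.append(1 / (1 - r_squared))
    return vifs


# Hypothetical predictors that are fairly strongly correlated.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
vifs = variance_inflation_factors(np.column_stack([x1, x2]))
```

A VIF of 1 indicates no linear relationship with the other predictors; larger values indicate increasing multicollinearity.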
Task 2.4
Validate models via data partitioning, out of sample testing and cross-validation.
2.1 Develop models and implement them on testing data in accordance with the specification.
2.2 Evaluate the stability of the models using k-fold cross validation.
2.3 Evaluate influential observations using Cook’s distance and hat matrix.
Assessment Criteria
Model development on training data
Model validation on testing data using R squared and RMSE
Concept of k-fold cross validation
Performing k-fold cross validation using the caret package
Identifying influential observations
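The idea behind k-fold cross validation can be sketched without the caret package as follows. This is an illustrative Python version that assumes NumPy is available and fits a simple straight line in each fold; the function name, seed and data are assumptions made for the example.

```python
import numpy as np


def k_fold_rmse(x, y, k=5, seed=0):
    """Return the out-of-fold RMSE of a straight-line fit for each of k folds."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(x))       # shuffle the observations
    folds = np.array_split(indices, k)      # k roughly equal test folds
    scores = []
    for test_idx in folds:
        train_idx = np.setdiff1d(indices, test_idx)
        # Fit on the training folds only.
        slope, intercept = np.polyfit(x[train_idx], y[train_idx], deg=1)
        predictions = intercept + slope * x[test_idx]
        # Evaluate on the held-out fold.
        scores.append(np.sqrt(np.mean((y[test_idx] - predictions) ** 2)))
    return scores


# Hypothetical data: a noisy linear relationship.
x = np.arange(20.0)
noise = np.random.default_rng(1).normal(scale=0.5, size=20)
y = 3.0 + 2.0 * x + noise

fold_scores = k_fold_rmse(x, y, k=5)
```

Similar RMSE values across the folds suggest a stable model; one fold with a much larger error would be a reason to look for influential observations.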
Task 2.5
Develop models using binary logistic regression and assess their performance.
2.1 Evaluate when to use binary logistic regression correctly.
2.2 Develop realistic models using functions in R and Python.
2.3 Interpret output of global testing using the likelihood ratio test (LRT) in order to assess the results.
2.4 Perform out of sample validation that tests predictive quality of the model.
Assessment Criteria
Model definition and parameter estimation
Estimation of model parameters using MLE
Interpreting regression coefficients and odds ratio
Assessing goodness of fit of the model
Global hypothesis testing using the likelihood ratio test (LRT)
Individual testing using Wald’s test
Task 3.1
Develop applications of multinomial logistic regression and ordinal logistic regression.
3.1 Select method for modelling categorical variable.
3.2 Develop models for nominal and ordinal scaled dependent variable in R and Python correctly.
Assessment Criteria
Classification table
ROC curve
K-S Statistic
Multinomial and Ordinal Logistic Regression – model building and parameter estimation
Interpretation of regression coefficients
Classification table and deviance test
Task 3.2
Develop generalised linear models and carry out survival analysis and Cox regression.
3.1 Evaluate the concept of generalised linear models.
3.2 Apply the Poisson regression model and negative binomial regression to count data correctly.
3.3 Model a ‘time to event’ variable using Cox regression.
Assessment Criteria
Concept of GLM, link function and .GLM
Poisson Regression
Negative Binomial Regression
Survival Analysis Introduction
Cox Regression
Task 3.3
Assess the concepts and uses of time series analysis and test for stationarity in time series data.
3.1 Create time series object in R and Python correctly including decomposing time series and assessing different components.
3.2 Assess whether a time series is stationary.
3.3 Transform non-stationary time series data into stationary time series data.
Assessment Criteria
Components of time series
Seasonal decomposition
Trend analysis
Auto-correlogram
Partial auto-correlogram
Dickey-Fuller test
Converting non-stationary time series data into stationary time series data
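Differencing is one common way of converting a non-stationary series into a stationary one. The sketch below shows the operation itself in plain Python; the function name and series are hypothetical, and it does not perform the Dickey-Fuller test, for which a statistical library (for example the adfuller function in statsmodels) would normally be used.

```python
def difference(series, lag=1):
    """Return the lagged differences series[t] - series[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical series with a linear trend: one difference removes the trend.
linear_trend = [3, 5, 7, 9, 11]
first_difference = difference(linear_trend)

# Hypothetical series with a quadratic trend: two differences are needed.
quadratic_trend = [1, 4, 9, 16, 25]
second_difference = difference(difference(quadratic_trend))
```

The number of differences needed to reach stationarity corresponds to the d term of an ARIMA model.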
Task 3.4
Validate ARIMA (Auto Regressive Integrated Moving Average) models and use estimation.
3.1 Identify p, d and q of ARIMA model using ACF (auto-correlation function) and a PACF (partial auto-correlation function) to describe how well values are related.
3.2 Develop ARIMA models using R and Python and evaluate whether errors follow the white noise process.
3.3 Finalize the model and forecast n-period ahead to make accurate predictions.
Assessment Criteria
Concepts of AR, MA and ARIMA models
Model identification using ACF and PACF
Parameter estimation
Residual analysis (testing for white noise process)
Selection of optimal model
Task 3.5
Implement panel data regression methods.
3.1 Evaluate the concept of panel data regression.
3.2 Analyse the features of panel data.
3.3 Build panel data regression models in a range of contexts.
3.4 Evaluate the difference between fixed effect and random effect models.
Assessment Criteria
What is Panel data?
Need for different models for Panel data
Panel data regression methods
Task 4.1
Define Principal Component Analysis (PCA) and its derivations and assess their application.
4.1 Evaluate the need for data reduction.
4.2 Perform principal component analysis and develop scoring models using R and Python to minimise data loss and improve interpretability of data.
4.3 Resolve multi-collinearity using Principal Component Regression.
Assessment Criteria
Concept of Data reduction
Definition of first, second, … pth principal component
Deriving principal component using Eigenvectors
Deciding optimum number of principal components
Developing scoring models using PCA
Principal component regression
Task 4.2
Understand hierarchical and non-hierarchical cluster analysis and assess their outputs.
4.1 Perform data reduction and derive interpretable factors and use factor scores to interpret the data set.
4.2 Obtain a brand perception map using multi-dimensional scaling.
Assessment Criteria
Orthogonal factor model
Estimation of loading matrix
Interpreting factor solution
Deciding optimum number of factors
Using factor scores for further analysis
Factor rotation
Concept of MDS
Variable reduction using MDS
Task 4.3
Evaluate the concept of panel data regression and implement panel data methods.
4.1 Evaluate the need for cluster analysis.
4.2 Obtain clusters using suitable methods.
4.3 Interpret cluster solutions and analyse the use of clusters for business strategies.
Assessment Criteria
Concept of cluster analysis
Hierarchical cluster analysis methods (linkage methods)
Using dendrogram to estimate optimum number of clusters
k-means clustering methods
Using k-means runs function in R and Python to find optimum number of k
Task 4.4
Appraise classification methods including Naïve Bayes and the support vector machine algorithm.
4.1 Evaluate different methods of classification and the performance of classifiers.
4.2 Design optimum classification rules to achieve minimum error rates.
Assessment Criteria
Bayes theorem and its applications
Constructing classifier using Naïve Bayes method
Concept of Hyperplane
Support vector machine algorithm
Comparison with Binary Logistic Regression
Task 4.5
Apply decision tree and random forest algorithms to classification and regression problems.
4.1 Use decision trees for classification and regression problems in comparison with classical methodologies.
4.2 Analyse concepts of bootstrapping and bagging.
4.3 Apply the random forest method in a range of business and social contexts.
Assessment Criteria
Basics of Decision Tree
Concept of CART
CHAID algorithm
ctree function in R
Bootstrapping and bagging
Random forest algorithm
Task 5.1
Analyse Market Baskets and apply neural networks to classification problems.
5.1 Analyse transactions data for possible associations and derive baskets of associated products.
5.2 Apply neural networks to a classification problem in domains such as speech recognition, image recognition and document categorisation.
Assessment Criteria
Definitions of support, confidence and lift
Apriori algorithm for market basket analysis
Neural networks for classification problems
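Support, confidence and lift can be computed directly from their definitions, as in the sketch below. It uses plain Python and a small invented set of transactions; it illustrates the three measures only and is not an implementation of the Apriori algorithm, which additionally searches for frequent itemsets.

```python
def support(transactions, items):
    """Share of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence relative to the baseline support of the consequent."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


# Hypothetical baskets, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
rule_lift = lift(transactions, {"bread"}, {"milk"})
```

A lift below 1, as in this invented example, indicates that the two products appear together less often than would be expected if they were bought independently.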
Task 5.2
Perform text mining on social media data.
5.1 Appraise the concepts and techniques used in text mining.
5.2 Analyse unstructured data and perform sentiment analysis of Twitter data to identify the positive, negative or neutral tone of the text.
Assessment Criteria
What is text mining?
Term Document Matrix
Word cloud
Establishing connection with Twitter using twitteR package and Tweepy in Python
Task 5.3
Develop web pages using the SHINY package.
5.1 Build interpretable dashboards using the SHINY package.
5.2 Host standalone applications on a web page to present the results of data analysis.
Assessment Criteria
Introduction to SHINY
Introduction to R Markdown
Build dashboards
Host standalone apps on a webpage or embed them in R Markdown documents or build dashboards.
Task 5.4
Apply the Hadoop framework in Big Data Analytics.
5.1 Evaluate core concepts of Hadoop.
5.2 Appraise applications of Big Data Analytics in various industries.
5.3 Evaluate the use of the HADOOP platform for performing Big Data Analytics.
Assessment Criteria
What is Big Data?
Features of Big Data (Volume, Velocity and Variety)
Big Data in different industries (Healthcare, Telecom, etc.)
HADOOP architecture
Introduction to R HADOOP package
Task 5.5
Evaluate the fundamental concepts of artificial intelligence.
5.1 Build a simple AI model using common machine learning algorithms that support business analysis and decision-making, in comparison with traditional assumptions from business theory.
Assessment Criteria
What is AI and Theory behind AI
What is Q learning
The Monte Carlo theory
Task 6.1
Use SQL programming for data analysis.
6.1 Evaluate core SQL for data analytics.
6.2 Carry out data wrangling and analysis in SQL to uncover insights in underutilized data.
Assessment Criteria
SQL programming Basics
Data Wrangling and analysis
Text mining of Twitter data
Task 6.2
Evaluate the concept of digital transformation and the key technologies that drive it.
6.1 Analyse the technologies that underpin digital transformation.
6.2 Assess the managerial challenges associated with implementing digital transformation successfully.
Assessment Criteria
Fundamentals of Cloud Computing
Compare and contrast cloud computing with traditional computing models
Task 6.3
Assess the strategic impact of the application of Big Data and Artificial Intelligence on business organisations.
6.1 Evaluate theories of strategy and their application to the digital economy and business.
6.2 Analyse examples of the application of Artificial Intelligence to business operations or strategy.
Assessment Criteria
Software as a Service
Platform as a Service
Infrastructure as a Service
Business impact of Cloud Computing
Historical development of Artificial Intelligence
Task 6.4
Appraise theories of innovation and distinguish between disruptive and incremental change.
6.1 Evaluate theories of disruptive innovation and how they explain the impact of innovation on industries.
6.2 Evaluate the managerial challenges of promoting and implementing innovation within organizations.
Assessment Criteria
Vs of data – Volume, velocity, variety, veracity and value
Christensen’s theory of disruptive innovation
Task 6.5
Evaluate ethics practices within organisations and how they relate to issues in Data Science.
6.1 Assess the role that codes of ethics play in the operation and sustainability of organisations.
6.2 Evaluate the importance of reporting and disclosure for ethical practice.
Assessment Criteria
Ethical dilemmas and issues in Artificial Intelligence and Big Data
Criteria | Distinguished (80+) | Excellent (70) | Good (60) | Proficient (50) | Basic (40) | Marginal (30) | Unacceptable (0)
Content (alignment with assessment criteria) | Extensive evaluation and synthesis of ideas; includes substantial original thinking | Comprehensive critical evaluation and synthesis of ideas; includes coherent original thinking | Adequate evaluation and synthesis of key ideas beyond basic descriptions; includes original thinking | Describes main ideas with evidence of evaluation; includes some original thinking | Describes some of the main ideas but omits some concepts; limited evidence of evaluation; confused original thinking | Largely incomplete description of main issues; misses key concepts; no original thinking | Inadequate information or containing information not relevant to the topic
Application of Theory and Literature | In-depth, detailed and relevant application of theory; expertly integrates literature to support ideas and concepts | Clear and relevant application of theory; fully integrates literature to support ideas and concepts | Appropriate application of theory; integrates literature to support ideas and concepts | Adequate application of theory; uses literature to support ideas and concepts | Limited application of theory; refers to literature but may not use it consistently | Confused application of theory; does not use literature for support | Little or no evidence of application of theory and relevant literature
Knowledge and Understanding | Extensive depth of understanding and exploration beyond key principles and concepts | Comprehensive knowledge and depth of understanding key principles and concepts | Sound understanding of principles and concepts | Basic Knowledge and understanding of key concepts and principles | Limited and superficial knowledge and understanding of key concepts and principles | Confused or inadequate knowledge and understanding of key concepts and principles | Little or no evidence of knowledge or understanding of key concepts and principles
Presentation and Writing Skills | Logical, coherent and polished presentation exceeding expectations at this level; free from errors in mechanics and syntax | Logical, coherent presentation demonstrating mastery; free from errors in mechanics and syntax | Logical structure to presentation; makes few errors in mechanics and syntax which do not prohibit meaning | Orderly presentation; minor errors in mechanics and syntax | Somewhat weak presentation; errors in mechanics and syntax may interfere with meaning | Confused presentation; errors in mechanics and syntax often interfere with meaning | Illogical presentation lacking cohesion; contains significant errors that interfere with meaning
Referencing | Advanced use of in-text citation and references | Mastery of in-text citation and referencing | Appropriate use of in-text citation and referencing | Adequate use of in-text citation and referencing | Limited use of in-text citation and referencing | Inadequate use of citation and referencing | Little or no evidence of appropriate referencing or use of source
Directions:
For each of the criteria listed in the first column, circle one box in the corresponding column to the right which best reflects the student’s work on this particular assessment activity (e.g., project, presentation, essay).
Provide specific feedback to a student about each of the criteria scores he/she earned by writing comments and suggestions for improvement in the last row titled “Instructor’s comments.”
To arrive at the final mark, total the scores for the five criteria and divide by 5.
Example:
Grade | Distinguished | Excellent | Good | Proficient | Basic | Marginal | Unacceptable
Range | 80-100 | 70-79 | 60-69 | 50-59 | 40-49 | 35-39 | 0-34

Criteria | Score
Content | 50
Application of Theory and Literature | 40
Knowledge and Understanding | 50
Presentation/Writing Skills | 40
Referencing | 40
Total Score | 220/5 = 44, Basic
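The arithmetic in the example can be reproduced with the short sketch below. The scores and grade ranges are copied from the example above; the function names are invented for illustration and this is not an official Qualifi marking tool.

```python
# Scores copied from the worked example in this document.
scores = {
    "Content": 50,
    "Application of Theory and Literature": 40,
    "Knowledge and Understanding": 50,
    "Presentation/Writing Skills": 40,
    "Referencing": 40,
}

# Lower bound of each grade range, taken from the example range table.
bands = [
    (80, "Distinguished"),
    (70, "Excellent"),
    (60, "Good"),
    (50, "Proficient"),
    (40, "Basic"),
    (35, "Marginal"),
    (0, "Unacceptable"),
]


def final_mark(criteria_scores):
    """Total the criteria scores and divide by the number of criteria."""
    return sum(criteria_scores.values()) / len(criteria_scores)


def grade_for(mark):
    """Return the first grade whose lower bound the mark reaches."""
    for lower_bound, grade in bands:
        if mark >= lower_bound:
            return grade
    raise ValueError("mark must be zero or greater")


mark = final_mark(scores)
grade = grade_for(mark)
print(mark, grade)  # prints: 44.0 Basic
```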
HEAD OFFICE
7 Acorn Business Park, Commercial Gate, Nottingham, Nottinghamshire
NG18 1EX
LONDON OFFICE
Golden Cross House
8 Duncannon Street, London WC2N 4JF
Copyright 2019 Qualifi Ltd
Page 11 of 11
QUALIFI ASSESSMENT DOCUMENT
-
Qualification
Qualifi Level 7 Diploma in Data Science
Qualification No (RQF) Unit Name
Unit Reference
No of Credits
J/618/4970
Exploratory Data Analysis
DS01
20 Credits
Introduction
Prior to attempting this coursework assignment, learners must familiarise themselves with the following policies:
Centre Specification
o Can be found at https://qualifi.net/qualifi-level-7-diploma-data-science/
Qualifi Quality Assurance Standards
Qualifi Quality Policy Statement
Plagiarism and Collusion
In submitting the assignment Learner’s must complete a statement of authenticity confirming that the work submitted for all tasks is their own. The statement should also include the word count.
Your accredited study centre will direct you to the appropriate software that checks the level of similarity. Qualifi recommends the use of https://www.turnitin.com as a part of the assessment.
Plagiarism and collusion are treated very seriously. Plagiarism involves presenting work, excerpts, ideas or passages of another author without appropriate referencing and attribution.
Collusion occurs when two or more learners submit work which is so alike in ideas, content, wording and/or structure that the similarity goes beyond what might have been mere coincidence
Please familiarise yourself on Qualifi’s Malpractice and Maladministration policy, where you can find further information
Referencing
A professional approach to work is expected from all learners. Learners must therefore identify and acknowledge ALL sources/methodologies/applications used.
The learner must use an appropriate referencing system to achieve this. Marks are not awarded for the use of English; however, the learner must express ideas clearly and ensure that appropriate terminology is used to convey accuracy in meaning.
Qualifi recommends using Harvard Style of Referencing throughout your work.
Appendices
You may include appendices to support your work, however appendices must only contain additional supporting information, and must be clearly referenced in your assignment.
You may also include tables, graphs, diagrams, Gantt chart and flowcharts that support the main report should be incorporated into the back of the assignment report that is submitted.
Any published secondary information such as annual reports and company literature, should be referenced in the main text of the assignment, in accordance of Harvard Style Referencing, and referenced at the end of the assignment.
Confidentiality
Where a Learner is using organisational information that deals with sensitive material or issues, they must seek the advice and permission from that organisation about its inclusion.
Where confidentiality is an issue, Learners are advised to anonymise their assignment report so that it cannot be attributed to that particular organisation.
Word Count Policy
Learners must comply with the required word count, within a margin of +10%. These rules exclude the index, headings, tables, images, footnotes, appendices and information contained within references and bibliographies.
When an assessment task requires learners to produce presentation slides with supporting notes, the word count applies to the supporting notes only.
Submission of Assignments
All work to be submitted on the due date as per Centre’s advice.
All work must be submitted in a single electronic document (.doc file), or via Turnitin, where applicable.
This should go to the tutor and Centre Manager/Programme Director, plus one hard copy posted to the Centre Manager (if required)
Marking and grades
Qualifi uses a standard marking rubric for all assignments, and you can find the details at the end of this document.
Unless stated elsewhere, Learners must answer all questions in this document.
Assignment Question
Task 1.1
Handle and manage multiple datasets within R and Python environments.
1.1 Work smoothly in R and Python development environments.
1.2 Import and export data sets and create data frames within R and Python in accordance with instructions.
1.3 Sort, merge, aggregate and append data sets in accordance with instructions.
Assessment Criteria
Learning UI about basics rule of programming in both R and Python
Create and import external datasets in R and python
Export R data frames into external flat files
Data Management in R and Python (Sort, merge, aggregate and subset)
Task 1.2
Use measures of central tendency to summarize data and assess both the symmetry and variation in the data.
1.1 Differentiate between variable types and measurement scales.
1.2 Calculate the most appropriate (mean, median or mode etc.) measure of central tendency based on variable type.
1.3 Compare variation in two datasets using the coefficient of variation.
1.4 Assess symmetry of data using measures of skewness
Assessment Criteria
Introduction to basic concepts of Statistics, such as measures of central tendency, variation, skewness, kurtosis
Task 1.3
Present and summarise distributions of data and the relationships between variables graphically.
1.1 Select the most appropriate graph to present the data.
1.2 Assess distribution using Box-Plot and Histogram.
1.3 Visualize bivariate relationships using scatter-plots.
1.4 Present time-series data using motion charts.
Assessment Criteria
Frequency tables crosstabs and bivariate correlation analysis
Data visualization: what and why? Grammar of graphics, handling data for visualization
Commonly used charts and graphs using ggplot2 package in R and matplotlib in python
Advanced graphics in R and Python Data Management in R and Python (Sort, merge, aggregate and subset)
Data Management in R and Python (Sort, merge, aggregate and subset)
Task 1.4
Evaluate standard discrete and standard continuous distributions.
Analyse the statistical distribution of a discrete random variable.
Calculate probabilities using R for Binomial and Poisson Distribution.
Fit Binomial and Poisson distributions to observed data.
Evaluate the properties of Normal and Log Normal distributions.
Calculate probabilities using R for normal and Log normal distributions.
Fit normal, Log normal and exponential distributions to observed data.
1.7 Evaluate the concept of sampling distribution (t, F and Chi Square).
Assessment Criteria
Concept of random variables and statistical distribution
Discrete vs. Continuous Random Variables
t tests (one sample, independent samples, paired sample)
Standard discrete distributions-Bernoulli, Binomial and Poisson
Using R to calculate probabilities
Fitting of discrete distributions to observed data
Standard continuous distributions-Normal, Log Normal, Exponential
Introduction to sampling distributions
Task 1.5
Formulate research hypotheses and perform hypothesis testing.
1.1 Write R and Python programmes that evaluate appropriate hypothesis tests
1.2 Draw statistical inference using output in R.
1.3 Translate research problems into statistical hypotheses.
1.4 Assess the most appropriate statistical test for a hypothesis
Assessment Criteria
Statistical Hypothesis Testing-concepts and terminology
Parameter, test statistics, level of significance, power, critical region
Parametric vs. non-Parametric Tests
Z tests for proportions (single and independent samples)
Non-parametric tests (Mann-Whitney U, Wilcoxon’s signed rank)
Tests for Normality, Q-Q plot
Task 2.1
Analyse the concept of variance (ANOVA) and an select an appropriate ANOVA or ANCOVA model.
2.1 Define variable, factor and level for a given research problem.
2.2 Evaluate the sources of variation, explained variation and unexplained variation.
2.3 Define a linear model for ANOVA/ANCOVA.
2.4 Confirm the validity of assumption based on definitions and analysis of variation.
2.5 Perform analysis using R and Python programs to confirm validity of assumptions.
2.6 Draw inferences from statistical analysis of the research problem.
Assessment Criteria
What is analysis of variance?
Definitions: Variable, factor, levels
One Way Analysis of Variance
Two Way Analysis of Variance (including interaction effects)
Multi Way Analysis of Variance
Analysis of Covariance
Kruskal-Wallis Test
Friedman Test
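To illustrate the split into explained and unexplained variation, the following sketch computes the one-way ANOVA F statistic by hand. The group data and the function name are hypothetical, and the p-value is omitted because the F distribution is not available in the Python standard library.

```python
from statistics import mean


def one_way_anova(groups):
    """Return sums of squares, degrees of freedom and the F statistic."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)

    # Explained variation: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Unexplained variation: observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "f_stat": f_stat,
    }


# Hypothetical data: one factor with three levels, three observations each.
result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(result)
```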
Task 2.2
Carry out global and individual testing of parameters used in defining predictive models.
2.1 Evaluate dependent variables and predictors.
2.2 Develop linear models using the lm function in R and the .ols function in Python.
2.3 Interpret signs and values of estimated regression coefficients.
2.4 Interpret output of global testing using F distributions.
2.5 Identify significant and insignificant variables.
Assessment Criteria
Concept of random variables and statistical distribution
Concept of a statistical model
Estimation of model parameters using Least Square Method
Interpreting regression coefficients
Assessing the goodness of fit of a model
Global hypothesis testing using F distribution
Individual testing using t distributions
Task 2.3
Validate assumptions in multiple linear regression.
2.1 Resolve multicollinearity problems.
2.2 Revise a model after resolving the problem.
2.3 Assess the performance of the ridge regression model.
2.4 Perform residual analysis – graphically & using statistical tests to analyse results.
2.5 Resolve problems of non-normality of errors and heteroscedasticity.
Assessment Criteria
Concept of Multicollinearity
Calculating Variance Inflation Factors
Resolving problem by dropping variables
Ridge regression method
Stepwise regression as a strategy
Residual analysis
Shapiro Wilk test, K-S test and Q-Q plot for residuals
White’s test and Breusch-Pagan Test
Partitioning data using the caret package
Task 2.4
Validate models via data partitioning, out of sample testing and cross-validation.
2.1 Develop models and implement them on testing data in accordance with the specification.
2.2 Evaluate the stability of the models using k-fold cross validation.
2.3 Evaluate influential observations using Cook’s distance and hat matrix.
Assessment Criteria
Model development on training data
Model validation on testing data using R squared and RMSE
Concept of k-fold cross validation
Performing k-fold cross validation using the caret package
Identifying influential observations
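The idea of k-fold cross validation with RMSE can be sketched without any external package. This is an assumption-laden illustration rather than the caret workflow named above: it fits a simple one-predictor least squares line, assigns every k-th observation to the same fold instead of shuffling, and uses hypothetical data and function names.

```python
import math


def fit_line(xs, ys):
    """Least squares estimates (intercept, slope) for y = a + b * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def k_fold_rmse(xs, ys, k):
    """Out-of-sample RMSE for each of k folds."""
    n = len(xs)
    scores = []
    for fold in range(k):
        test_idx = list(range(fold, n, k))  # held-out observations
        held_out = set(test_idx)
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        intercept, slope = fit_line(train_x, train_y)
        predicted = [intercept + slope * xs[i] for i in test_idx]
        actual = [ys[i] for i in test_idx]
        scores.append(rmse(actual, predicted))
    return scores


# Hypothetical data with a roughly linear relationship.
xs = list(range(1, 11))
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 20.1]
print(k_fold_rmse(xs, ys, k=5))
```

Fold RMSE values of similar size suggest a stable model; a fold with a much larger error than the rest would warrant a closer look at the observations it contains.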
Task 2.5
Develop models using binary logistic regression and assess their performance.
2.1 Evaluate when to use Binary Logistic Regression correctly.
2.2 Develop realistic models using functions in R and Python.
2.3 Interpret output of global testing using the Likelihood Ratio Test in order to assess the results.
2.4 Perform out of sample validation that tests predictive quality of the model.
Assessment Criteria
Model definition and parameter estimation
Estimation of model parameters using MLE
Interpreting regression coefficients and odds ratio
Assessing goodness of fit of the model
Global hypothesis testing using the likelihood ratio test (LRT)
Individual testing using Wald’s test
Task 3.1
Develop applications of multinomial logistic regression and ordinal logistic regression.
3.1 Select method for modelling categorical variable.
3.2 Develop models for nominal and ordinal scaled dependent variable in R and Python correctly.
Assessment Criteria
Classification table
ROC curve
K-S Statistic
Multinomial and Ordinal Logistic Regression – model building and parameter estimation
Interpretation of regression coefficients
Classification table and deviance test
Task 3.2
Develop generalised linear models and carry out survival analysis and Cox regression.
3.1 Evaluate the concept of generalised linear models.
3.2 Apply the Poisson regression model and negative binomial regression to count data correctly.
3.3 Model a ‘time to event’ variable using Cox regression.
Assessment Criteria
Concept of GLM and link function, and the GLM function
Poisson Regression
Negative Binomial Regression
Survival Analysis Introduction
Cox Regression
Task 3.3
Assess the concepts and uses of time series analysis and test for stationarity in time series data.
3.1 Create time series object in R and Python correctly including decomposing time series and assessing different components.
3.2 Assess whether a time series is stationary.
3.3 Transform non-stationary time series data into stationary time series data.
Assessment Criteria
Components of time series
Seasonal decomposition
Trend analysis
Auto-correlogram
Partial auto-correlogram
Dickey-Fuller test
Converting non-stationary time series data into stationary time series data
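One common way to convert a trending series into a stationary one is differencing. The sketch below is a minimal illustration with a hypothetical function name and made-up data; whether the differenced series is in fact stationary would still need to be checked, for example with the Dickey-Fuller test listed above.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: y[t] - y[t - lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


# Hypothetical series with a linear trend; first differences are constant.
trend_series = [3, 5, 7, 9, 11]
print(difference(trend_series))

# A lag equal to the seasonal period removes a repeating seasonal pattern.
seasonal_series = [10, 20, 30, 10, 20, 30, 10, 20, 30]
print(difference(seasonal_series, lag=3))
```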
Task 3.4
Validate ARIMA (Auto Regressive Integrated Moving Average) models and use estimation.
3.1 Identify p, d and q of ARIMA model using ACF (auto-correlation function) and a PACF (partial auto-correlation function) to describe how well values are related.
3.2 Develop ARIMA models using R and Python and evaluate whether errors follow the white noise process.
3.3 Finalize the model and forecast n-period ahead to make accurate predictions.
Assessment Criteria
Concepts of AR, MA and ARIMA models
Model identification using ACF and PACF
Parameter estimation
Residual analysis (testing for white noise process)
Selection of optimal model
Task 3.5
Implement panel data regression methods.
3.1 Evaluate the concept of panel data regression.
3.2 Analyse the features of panel data.
3.3 Build panel data regression models in a range of contexts.
3.4 Evaluate the difference between fixed effect and random effect models.
Assessment Criteria
What is Panel data?
Need for different models for Panel data
Panel data regression methods
Task 4.1
Define Principal Component Analysis (PCA) and its derivations and assess their application.
4.1 Evaluate the need for data reduction.
4.2 Perform principal component analysis and develop scoring models using R and Python to minimise data loss and improve interpretability of data.
4.3 Resolve multi-collinearity using Principal Component Regression.
Assessment Criteria
Concept of Data reduction
Definition of first, second, …, p-th principal component
Deriving principal component using Eigenvectors
Deciding optimum number of principal components
Developing scoring models using PCA
Principal component regression
Task 4.2
Understand hierarchical and non-hierarchical cluster analysis and assess their outputs.
4.1 Perform data reduction and derive interpretable factors and use factor scores to interpret the data set.
4.2 Obtain a brand perception map using multi-dimensional scaling.
Assessment Criteria
Orthogonal factor model
Estimation of loading matrix
Interpreting factor solution
Deciding optimum number of factors
Using factor scores for further analysis
Factor rotation
Concept of MDS
Variable reduction using MDS
Task 4.3
Evaluate the concept of panel data regression and implement panel data methods.
4.1 Evaluate the need for cluster analysis.
4.2 Obtain clusters using suitable methods.
4.3 Interpret cluster solutions and analyse the use of clusters for business strategies.
Assessment Criteria
Concept of cluster analysis
Hierarchical cluster analysis methods (linkage methods)
Using dendrogram to estimate optimum number of clusters
k-means clustering methods
Using the k-means runs function in R and Python to find the optimum number of clusters (k)
Task 4.4
Appraise classification methods including Naïve Bayes and the support vector machine algorithm.
4.1 Evaluate different methods of classification and the performance of classifiers.
4.2 Design optimum classification rules to achieve minimum error rates.
Assessment Criteria
Bayes theorem and its applications
Constructing classifier using Naïve Bayes method
Concept of Hyperplane
Support vector machine algorithm
Comparison with Binary Logistic Regression
Task 4.5
Apply decision tree and random forest algorithms to classification and regression problems.
4.1 Use decision trees for classification and regression problems in comparison with classical methodologies.
4.2 Analyse concepts of bootstrapping and bagging.
4.3 Apply the random forest method in a range of business and social contexts.
Assessment Criteria
Basics of Decision Tree
Concept of CART
CHAID algorithm
ctree function in R
Bootstrapping and bagging
Random forest algorithm
Task 5.1
Analyse Market Baskets and apply neural networks to classification problems.
5.1 Analyse transactions data for possible associations and derive baskets of associated products.
5.2 Apply neural networks to a classification problem in domains such as speech recognition, image recognition and document categorisation.
Assessment Criteria
Definitions of support, confidence and lift
Apriori algorithm for market basket analysis
Neural networks for classification problems
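The definitions of support, confidence and lift can be made concrete with a short sketch. The transactions and function names below are hypothetical and chosen only for illustration; this is not the Apriori algorithm itself, which additionally searches for frequent item sets.

```python
def support(transactions, items):
    """Share of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence of the rule relative to the baseline support of the consequent."""
    return confidence(transactions, antecedent, consequent) / support(
        transactions, consequent
    )


# Hypothetical basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(transactions, {"bread", "milk"}))
print(confidence(transactions, {"bread"}, {"milk"}))
print(lift(transactions, {"bread"}, {"milk"}))
```

A lift above 1 indicates that the two item sets occur together more often than would be expected if they were independent; a lift below 1 indicates the opposite.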
Task 5.2
Perform text mining on social media data.
5.1 Appraise the concepts and techniques used in text mining.
5.2 Analyse unstructured data and perform sentiment analysis of Twitter data to identify the positive, negative or neutral tone of the text.
Assessment Criteria
What is text mining?
Term Document Matrix
Word cloud
Establishing connection with Twitter using twitteR package and Tweepy in Python
Task 5.3
Develop web pages using the SHINY package.
5.1 Build interpretable dashboards using the SHINY package.
5.2 Host standalone applications on a web page to present the results of data analysis.
Assessment Criteria
Introduction to SHINY
Introduction to R Markdown
Build dashboards
Host standalone apps on a webpage or embed them in R Markdown documents or build dashboards.
Task 5.4
Apply the Hadoop framework in Big Data Analytics.
5.1 Evaluate core concepts of Hadoop.
5.2 Appraise applications of Big Data Analytics in various industries.
5.3 Evaluate the use of the HADOOP platform for performing Big Data Analytics.
Assessment Criteria
What is Big Data?
Features of Big Data (Volume, Velocity and Variety)
Big Data in different industries (Healthcare, Telecom, etc.)
HADOOP architecture
Introduction to R HADOOP package
Task 5.5
Evaluate the fundamental concepts of artificial intelligence.
5.1 Build a simple AI model using common machine learning algorithms that support business analysis and decision-making, in comparison with traditional assumptions from business theory.
Assessment Criteria
What is AI and Theory behind AI
What is Q learning
The Monte Carlo theory
Task 6.1
Use SQL programming for data analysis.
6.1 Evaluate core SQL for data analytics.
6.2 Carry out data wrangling and analysis in SQL to uncover insights in underutilized data.
Assessment Criteria
SQL programming Basics
Data Wrangling and analysis
Text mining of Twitter data
Task 6.2
Evaluate the concept of transformation and the key technologies that drive it.
6.1 Analyse the technologies that underpin digital transformation.
6.2 Assess the managerial challenges associated with implementing digital transformation successfully.
Assessment Criteria
Fundamentals of Cloud Computing
Compare and contrast cloud computing with traditional computing models
Task 6.3
Assess the strategic impact of the application of Big Data and Artificial Intelligence on business organisations.
6.1 Evaluate theories of strategy and their application to the digital economy and business.
6.2 Analyse examples of the application of Artificial intelligence on business operations or strategy.
Assessment Criteria
Software as a Service
Platform as a Service
Infrastructure as a Service
Business impact of Cloud Computing
Historical development of Artificial Intelligence
Task 6.4
Appraise theories of innovation and distinguish between disruptive and incremental change.
6.1 Evaluate theories of disruptive innovation and how they explain the impact of innovation on industries.
6.2 Evaluate the managerial challenges of promoting and implementing innovation within organizations.
Assessment Criteria
Vs of data – Volume, velocity, variety, veracity and value
Christensen’s theory of disruptive innovation
Task 6.5
Evaluate ethics practices within organisations and how they relate to issues in Data Science.
6.1 Assess the role that codes of ethics play in the operation and sustainability of organisations.
6.2 Evaluate the importance of reporting and disclosure for ethical practice.
Assessment Criteria
Ethical dilemmas and issues in Artificial Intelligence and Big Data
Criteria | Distinguished (80+) | Excellent (70) | Good (60) | Proficient (50) | Basic (40) | Marginal (30) | Unacceptable (0)
Content (alignment with assessment criteria) | Extensive evaluation and synthesis of ideas; includes substantial original thinking | Comprehensive critical evaluation and synthesis of ideas; includes coherent original thinking | Adequate evaluation and synthesis of key ideas beyond basic descriptions; includes original thinking | Describes main ideas with evidence of evaluation; includes some original thinking | Describes some of the main ideas but omits some concepts; limited evidence of evaluation; confused original thinking | Largely incomplete description of main issues; misses key concepts; no original thinking | Inadequate information or containing information not relevant to the topic
Application of Theory and Literature | In-depth, detailed and relevant application of theory; expertly integrates literature to support ideas and concepts | Clear and relevant application of theory; fully integrates literature to support ideas and concepts | Appropriate application of theory; integrates literature to support ideas and concepts | Adequate application of theory; uses literature to support ideas and concepts | Limited application of theory; refers to literature but may not use it consistently | Confused application of theory; does not use literature for support | Little or no evidence of application of theory and relevant literature
Knowledge and Understanding | Extensive depth of understanding and exploration beyond key principles and concepts | Comprehensive knowledge and depth of understanding key principles and concepts | Sound understanding of principles and concepts | Basic knowledge and understanding of key concepts and principles | Limited and superficial knowledge and understanding of key concepts and principles | Confused or inadequate knowledge and understanding of key concepts and principles | Little or no evidence of knowledge or understanding of key concepts and principles
Presentation and Writing Skills | Logical, coherent and polished presentation exceeding expectations at this level; free from errors in mechanics and syntax | Logical, coherent presentation demonstrating mastery; free from errors in mechanics and syntax | Logical structure to presentation; makes few errors in mechanics and syntax which do not prohibit meaning | Orderly presentation; minor errors in mechanics and syntax | Somewhat weak presentation; errors in mechanics and syntax may interfere with meaning | Confused presentation; errors in mechanics and syntax often interfere with meaning | Illogical presentation lacking cohesion; contains significant errors that interfere with meaning
Referencing | Advanced use of in-text citation and references | Mastery of in-text citation and referencing | Appropriate use of in-text citation and referencing | Adequate use of in-text citation and referencing | Limited use of in-text citation and referencing | Inadequate use of citation and referencing | Little or no evidence of appropriate referencing or use of source
Directions:
For each of the criteria listed in the first column, circle one box in the corresponding column to the right which best reflects the student’s work on this particular assessment activity (e.g., project, presentation, essay).
Provide specific feedback to a student about each of the criteria scores he/she earned by writing comments and suggestions for improvement in the last row titled “Instructor’s comments.”
To arrive at the final mark, total the scores for the five criteria and divide by 5.
Example:
Level | Distinguished | Excellent | Good | Proficient | Basic | Marginal | Unacceptable
Range | 80-100 | 70-79 | 60-69 | 50-59 | 40-49 | 35-39 | 0-34

Criteria | Score
Content | 50
Application of Theory and Literature | 40
Knowledge and Understanding | 50
Presentation/Writing Skills | 40
Referencing | 40
Total Score | 220/5 = 44, Basic
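The arithmetic in the example above can be sketched in Python as follows. The function and variable names are hypothetical, and the lower bounds of each band are taken from the Range row of the example; this is an illustration of the calculation, not an official marking tool.

```python
# Lower bound of each band, taken from the Range row of the example.
BANDS = [
    (80, "Distinguished"),
    (70, "Excellent"),
    (60, "Good"),
    (50, "Proficient"),
    (40, "Basic"),
    (35, "Marginal"),
    (0, "Unacceptable"),
]


def final_mark(scores):
    """Average the criteria scores and look up the matching band."""
    mark = sum(scores.values()) / len(scores)
    for lower_bound, band in BANDS:
        if mark >= lower_bound:
            return mark, band
    raise ValueError("mark must not be negative")


# Scores from the worked example.
example_scores = {
    "Content": 50,
    "Application of Theory and Literature": 40,
    "Knowledge and Understanding": 50,
    "Presentation/Writing Skills": 40,
    "Referencing": 40,
}
mark, band = final_mark(example_scores)
print(f"{mark:.0f}, {band}")
```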
HEAD OFFICE
7 Acorn Business Park Commercial Gate, Nottingham Nottinghamshire
NG18 1EX
LONDON OFFICE
Golden Cross House
8 Duncannon Street, London WC2N 4JF
Copyright 2019 Qualifi Ltd