School of Computing, Napier University
Assessment Brief
1. Module number | SET11121 002 |
2. Module title | Data Wrangling |
3. Module leader | Yanchao Yu |
4. Tutor with responsibility for this Assessment | Yanchao Yu ([email protected]), Jeff Mitchell ([email protected]) |
5. Assessment | Coursework |
6. Weighting | 70% of module assessment |
7. Size and/or time limits for assessment | Up to 1000 words. You can additionally include an unlimited number of tables, figures and a reference list. |
8. Deadline of submission | 15/04/22 – 3 pm UK time. Your attention is drawn to the penalties for late submission. |
9. Arrangements for submission | Your coursework must be submitted via Moodle. Further submission instructions are included in the attached specification and on Moodle. |
10. Assessment Regulations | All assessments are subject to the University Regulations. |
11. The requirements for the assessment | See Attached |
12. Special instructions | See Attached |
13. Return of work | Feedback and marks will be provided within four weeks of submission. |
14. Assessment criteria | Your coursework will be marked using the marking sheet attached as Appendix A. This specifies the criteria that will be used to mark your work. Further discussion of criteria is also included in the coursework specification attached. |
SET11121 002 – Data Wrangling
Assessment Brief
The assignment aims to cover the learning outcomes specified for the module:
LO1: Analyse the concepts and process of data analysis, including pre-processing and preparation
of data.
LO2: Analyse and evaluate modelling methods and techniques in data analysis.
LO3: Integrate data analysis algorithms to conduct data analysis and visualisation.
LO4: Critically interpret and evaluate results generated by analysis techniques.
LO5: Integrate specialised techniques for dealing with heterogeneous data sets.
The goal of this assignment is to use machine learning approaches to solve one of the two tasks
described below.
Task 1: Computer Vision
Using the provided dataset (see below), develop a deep neural network (i.e. a Convolutional
Neural Network) for the image classification problem. Your proposed model must be a
Convolutional Neural Network with an architecture proposed by you. You should compare the
performance (i.e. classification accuracy) against a pre-trained model already provided within
Keras. Specifically:
1. Create a convolutional neural network (CNN). You must motivate the choice of CNN
architecture.
2. Provide evidence of parameter tuning, such as learning rate, filter sizes and fully-connected
layers.
3. Provide evidence of a comparison of your model, using appropriate evaluation metrics of
your choice, against one of the predefined models within Keras. You MUST choose one
from this list.
4. Write a report outlining your solution, including a description of the model architecture
(including a graphical representation of the architecture), evidence of parameter tuning,
the evaluation results (including relevant tables and graphs) and a critical
evaluation/reflection.
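As an illustration of step 2, parameter tuning can be organised as a simple grid search. The sketch below is only one possible approach: the search space values are illustrative, and `train_and_evaluate` is a hypothetical placeholder that you would replace with code that builds, trains and validates your own CNN.

```python
from itertools import product

# Hypothetical search space; the values are illustrative, not recommendations.
search_space = {
    "learning_rate": [1e-2, 1e-3],
    "filter_size": [3, 5],
    "dense_units": [64, 128],
}


def train_and_evaluate(config):
    """Placeholder: build, train and validate a CNN for `config`.

    A real version would return validation accuracy. This stand-in
    returns a dummy value so that the loop runs end to end.
    """
    return 0.0


def grid_search(space, score_fn):
    """Score every combination in `space`; return all results and the best one."""
    names = list(space)
    results = []
    for values in product(*(space[name] for name in names)):
        config = dict(zip(names, values))
        results.append((config, score_fn(config)))
    best = max(results, key=lambda item: item[1])
    return results, best


results, best = grid_search(search_space, train_and_evaluate)
```

The `results` list holds one (configuration, score) pair per combination, which maps directly onto a tuning table in the report.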
Data
For this assignment you should use the CIFAR-10 dataset, which can be obtained from the Keras
library as follows:
from keras.datasets import cifar10
The goal of this exercise is not to produce a state-of-the-art model. If your model performs poorly
with respect to the predefined model you have chosen, do not worry—this is not what we are
testing. It should however be appropriately motivated and evaluated (you will be tested on those
aspects).
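When motivating and describing an architecture, it can help to trace how the spatial size of a 32x32 CIFAR-10 image changes from layer to layer. The sketch below uses the standard output-size formula for convolution and pooling layers. The layer stack is a hypothetical example, not a suggested design.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical layer stack: (name, kernel, stride, padding).
layers = [
    ("conv 3x3", 3, 1, 1),
    ("max-pool 2x2", 2, 2, 0),
    ("conv 3x3", 3, 1, 1),
    ("max-pool 2x2", 2, 2, 0),
]

size = 32  # CIFAR-10 images are 32x32 pixels
trace = [size]
for name, kernel, stride, padding in layers:
    size = conv_output_size(size, kernel, stride, padding)
    trace.append(size)
    print(f"{name}: {size}x{size}")
```

A trace like this can be checked against the layer shapes reported by your framework and reused in the graphical representation of the architecture.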
Task 2: Natural Language Processing
Using the provided dataset on Moodle, develop a machine learning algorithm of your choice for
text classification. You can use any combination of text representation and machine learning
approaches. Specifically:
1. Choose and implement a text representation technique (such as a bag of words, word
embeddings etc.). Justify your choice.
2. Using the text representation from (1), build a machine learning model for text classification
(e.g. decision tree, neural network etc.). Justify your choice.
3. Evaluate your model using the appropriate metrics of your choice.
4. Write a report outlining your solution, including a description of the machine learning
setup (including a description and motivation of the text representation technique, and an
appropriate graphical representation if relevant), evidence of parameter tuning (if
relevant), the evaluation results (including relevant tables and graphs) and a
critical evaluation/reflection.
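To make step 1 concrete, here is a minimal sketch of a bag-of-words representation written in plain Python. The tokeniser and the two example documents are made up for illustration. In practice a library implementation (for example scikit-learn's `CountVectorizer`) would normally be preferable.

```python
from collections import Counter


def tokenize(text):
    """Lower-case and split on whitespace (a deliberately simple tokeniser)."""
    return text.lower().split()


def build_vocabulary(documents):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(text, vocabulary):
    """Return a count vector; tokens outside the vocabulary are ignored."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


docs = ["the film was good", "the film was bad bad"]  # made-up examples
vocab = build_vocabulary(docs)
vectors = [bag_of_words(d, vocab) for d in docs]
```

Each document becomes a fixed-length vector of token counts, which can then be passed to the classifier chosen in step 2.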
Data
For this assignment, you should use the dataset provided on Moodle.
Again, the goal of this task is not to produce a state-of-the-art model. If your chosen model
performs poorly by your selected metrics, do not worry—this is not what we are testing. It should
however be appropriately motivated and evaluated (you will be tested on those aspects).
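As a reminder of what common evaluation metrics measure, the sketch below computes accuracy, precision, recall and F1 for one class treated as positive. The labels are invented for illustration and are not taken from the Moodle dataset. For the coursework itself, an existing library implementation (such as those in `sklearn.metrics`) is the more usual choice.

```python
def classification_metrics(y_true, y_pred, positive):
    """Accuracy plus precision, recall and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented labels, for illustration only.
y_true = ["pos", "pos", "neg", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "pos", "pos"]
metrics = classification_metrics(y_true, y_pred, positive="pos")
```

Reporting more than one metric, and explaining why each suits the task, supports the justification that the Evaluation criterion asks for.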
Tips and Clarifications
– If you are struggling to make something work with the volume of data present, you can
subsample (for instance, randomly pick a proportion of the dataset).
– You must use Python and its libraries to tackle this task. You are strongly encouraged to make
use of existing libraries for model building and evaluation rather than writing your own, unless
you specifically need to do something with no library support (e.g. if you want to do feature
engineering).
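The subsampling tip above could be sketched as follows. The function name, the fixed seed and the stand-in data are assumptions made for illustration. The point is to sample examples and labels together, and to fix the seed so that the subset can be reproduced and reported.

```python
import random


def subsample(examples, labels, fraction, seed=0):
    """Randomly keep `fraction` of the (example, label) pairs, without replacement."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    k = max(1, int(len(examples) * fraction))
    indices = sorted(rng.sample(range(len(examples)), k))
    return [examples[i] for i in indices], [labels[i] for i in indices]


examples = list(range(100))         # stand-in data
labels = [i % 2 for i in examples]  # stand-in labels
sub_x, sub_y = subsample(examples, labels, fraction=0.2)
```

If you subsample, state the proportion and seed in the report so that the evaluation can be repeated.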
Your report can include images, such as your neural network architecture design (for Task 1 or for
Task 2 if you choose a deep learning approach), and plots. Many programs can be used to draw the
architecture; for example, you can draw it with PowerPoint.
Deadline: Friday 15 April at 3 pm (UK time).
You will submit:
1. One .pdf file of up to 1000 words, excluding tables, references etc., as outlined earlier.
2. The code of your solution as a notebook. If you do any pre-processing to the data, please also
include the script you use to do this (or a list of the commands run).
Marking:
Both tasks will be marked as follows:
25% Method/Model;
10% Evaluation
For the report [35%]:
5% Structure;
15% Content;
15% Criticality/Discussion.
See Appendix A for more explanations.
Feedback:
In addition to your marks, you will receive written feedback via Moodle 3 – 5 weeks after the
submission deadline. The feedback will explain what you did well and what needs to improve,
with reference to your marks.
Late submission policy
Coursework submitted after the agreed deadline will be marked at a maximum of 40%.
Coursework submitted over five working days after the agreed deadline will be given 0%.
Extensions
If you require an extension, please contact the module leader before the deadline. Extensions are
only provided for exceptional circumstances and evidence may be required. See the Fit to Sit
regulations for more details.
Plagiarism
Plagiarised work will be dealt with according to the university’s guidelines (please read these,
especially if this is your first time at a UK university): http://www2.napier.ac.uk/ed/plagiarism/
Appendix A: Marking Scheme
Grade bands: No Submission | Very poor | Inadequate | Adequate | Good | Very good | Excellent | Outstanding
B1 Method / Model 25%
– No Submission: No work submitted
– Very poor: Code with bugs and algorithm/model not well described
– Inadequate: Code with bugs but algorithm/model well described
– Adequate: Code with a minor bug but algorithm/model not well described and justified
– Good: Code with a minor bug but algorithm/model well described and justified
– Very good: Code without bugs but algorithm/model not described or justified
– Excellent: Code without bugs but algorithm/model not described and justified in great detail
– Outstanding: Code without bugs and algorithm/model described and justified in detail
B2 Evaluation 10%
– No Submission: No work submitted
– Very poor: Not appropriate evaluation metric chosen
– Inadequate: Neither the evaluation setup nor the results are described appropriately
– Adequate: Evaluation setup is not justified but almost correctly executed and results are mentioned
– Good: Evaluation setup is not justified but correctly executed and results are mentioned
– Very good: Evaluation setup is somewhat justified and results are somewhat mentioned and discussed
– Excellent: Evaluation setup is somewhat justified but results fully described and discussed
– Outstanding: Evaluation setup is justified and results fully described and discussed
B3 Report / Reflection 35%
– No Submission: No work submitted
– Very poor: Poor report that misses essential parts, and results/method not critically described
– Inadequate: Poor report that misses essential parts, and results/method and not adequate reflection provided
– Adequate: Results adequately described and analysed, although the report would benefit from some further critical analysis.
– Good: Good report and reflection. However, more explanations would be needed.
– Very good: Good report, reflection and explanation of results.
– Excellent: Excellent report, reflection and explanation of results, providing detailed analysis of results.
– Outstanding: Excellent report, reflection and excellent insights of results (including outlining limitations and strengths).