Data Wrangling - Australia Assignments

School of Computing, Napier University
Assessment Brief

1. Module number	SET11121 002
2. Module title	Data Wrangling
3. Module leader	Yanchao Yu
4. Tutor with responsibility for this Assessment	Yanchao Yu ([email protected]) Jeff Mitchell ([email protected])
5. Assessment	Coursework
6. Weighting	70% of module assessment
7. Size and/or time limits for assessment	Up to 1000 words. You can additionally include an unlimited number of tables, figures and a reference list.
8. Deadline of submission (your attention is drawn to the penalties for late submission)	15/04/22 – 3 pm UK time
9. Arrangements for submission	Your Coursework must be submitted via Moodle. Further submission instructions are included in the attached specification and on Moodle
10. Assessment Regulations	All assessments are subject to the University Regulations.
11. The requirements for the assessment	See Attached
12. Special instructions	See Attached
13. Return of work	Feedback and marks will be provided within four weeks of submission.
14. Assessment criteria	Your coursework will be marked using the marking sheet attached as Appendix A. This specifies the criteria that will be used to mark your work. Further discussion of criteria is also included in the coursework specification attached.

SET11121 002 – Data Wrangling
Assessment Brief
The assignment aims to cover the learning outcomes specified for the module:
LO1: Analyse the concepts and process of data analysis, including pre-processing and preparation
of data.
LO2: Analyse and evaluate modelling methods and techniques in data analysis.
LO3: Integrate data analysis algorithms to conduct data analysis and visualisation.
LO4: Critically interpret and evaluate results generated by analysis techniques.
LO5: Integrate specialised techniques for dealing with heterogeneous data sets.
The goal of this assignment is to use machine learning approaches to solve one of the two tasks
described below.
Task 1: Computer Vision
Using the provided dataset (see below), develop a deep neural network (i.e. a Convolutional
Neural Network) for the image classification problem. Your proposed model must be a
Convolutional Neural Network with an architecture proposed by you. You should compare the
performance (i.e. classification accuracy) against a pre-trained model already provided within
Keras. Specifically:
1. Create a convolutional neural network (CNN). You must motivate the choice of CNN
architecture.
2. Provide evidence of parameter tuning, for example of the learning rate, filter sizes and
fully-connected layers.
3. Provide evidence of a comparison of your model, using appropriate evaluation metrics of
your choice, against one of the predefined models within Keras. You MUST choose one
from this list.
4. Write a report outlining your solution, including a description of the model architecture
(including a graphical representation of the architecture), evidence of parameter tuning,
the evaluation results (including relevant tables and graphs) and critical
evaluation/reflection.
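One way to produce the tuning evidence requested in item 2 is to enumerate a small grid of settings and record the validation accuracy of each run in a table. The sketch below is only an illustration: the grid values are arbitrary, and `train_and_evaluate` is a hypothetical placeholder that you would replace with your own Keras training code.

```python
import itertools

# Hypothetical search space; the values are illustrative, not recommendations.
SEARCH_SPACE = {
    "learning_rate": [1e-2, 1e-3],
    "filter_size": [3, 5],
    "dense_units": [64, 128],
}


def grid(search_space):
    """Yield one dict per combination of hyperparameter values."""
    names = list(search_space)
    for values in itertools.product(*(search_space[n] for n in names)):
        yield dict(zip(names, values))


def run_grid(search_space, train_and_evaluate):
    """Score every configuration and return (score, config) pairs, best first.

    train_and_evaluate is a placeholder: it should train a model with the
    given settings and return its validation accuracy as a float.
    """
    results = [(train_and_evaluate(cfg), cfg) for cfg in grid(search_space)]
    return sorted(results, key=lambda r: r[0], reverse=True)
```

The returned list can be written straight into a results table in the report, one row per configuration.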
Data
For this assignment you should use the CIFAR-10 dataset, which can be obtained through the Keras
library as follows:
from keras.datasets import cifar10
The goal of this exercise is not to produce a state-of-the-art model. If your model performs poorly
compared with the predefined model you have chosen, do not worry: this is not what we are
testing. It should, however, be appropriately motivated and evaluated (you will be tested on those
aspects).
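When motivating an architecture, it helps to track how each layer changes the feature-map size. CIFAR-10 images are 32×32 pixels with 3 colour channels. The sketch below applies the usual output-size rules for "valid" and "same" padding; the layer stack in `LAYERS` is a made-up example for illustration, not a recommended design.

```python
import math


def conv_output_size(size, kernel, stride=1, padding="valid"):
    """Spatial output size of a convolution or pooling layer (Keras-style padding)."""
    if padding == "same":
        return math.ceil(size / stride)
    if padding == "valid":
        return (size - kernel) // stride + 1
    raise ValueError(f"unknown padding: {padding}")


# Hypothetical stack: (name, kernel, stride, padding, output channels; None for pooling)
LAYERS = [
    ("conv1", 3, 1, "same", 32),
    ("pool1", 2, 2, "valid", None),
    ("conv2", 3, 1, "valid", 64),
    ("pool2", 2, 2, "valid", None),
]


def trace_shapes(input_size=32, input_channels=3, layers=LAYERS):
    """Return [(layer name, height/width, channels)] for a square input."""
    size, channels = input_size, input_channels
    shapes = []
    for name, kernel, stride, padding, out_channels in layers:
        size = conv_output_size(size, kernel, stride, padding)
        # Pooling layers keep the channel count unchanged.
        channels = out_channels if out_channels is not None else channels
        shapes.append((name, size, channels))
    return shapes
```

With this example stack, a 32×32×3 image ends as a 7×7×64 feature map, so 3136 values would enter the first fully-connected layer.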
Task 2: Natural Language Processing
Using the provided dataset on Moodle, develop a machine learning algorithm of your choice for
text classification. You can use any combination of text representation and machine learning
approaches. Specifically:
1. Choose and implement a text representation technique (such as a bag of words, word
embeddings etc.). Justify your choice.
2. Using the text representation from (1), build a machine learning model for text classification
(e.g. decision tree, neural network etc.). Justify your choice.
3. Evaluate your model using appropriate metrics of your choice.
4. Write a report outlining your solution, including a description of the machine learning
setup (including a description and motivation of the text representation technique and an
appropriate graphical representation if relevant), evidence of parameter tuning (if
relevant), the evaluation results (including relevant tables and graphs) and
critical evaluation/reflection.
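As an illustration of the bag-of-words representation mentioned in item 1, the sketch below turns texts into word-count vectors using only the standard library. The tokenisation rule and the function names are assumptions made for this example; in practice a library class such as scikit-learn's `CountVectorizer` does the same job with more options.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(documents):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(text, vocabulary):
    """Return a count vector for the text; tokens outside the vocabulary are ignored."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector
```

The vocabulary should be built from the training texts only, so that the test texts are vectorised with the same columns and do not leak into the representation.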
Data
For this assignment, you should use the dataset provided on Moodle.
Again, the goal of this task is not to produce a state-of-the-art model. If your chosen model
performs poorly by your selected metrics, do not worry: this is not what we are testing. It should,
however, be appropriately motivated and evaluated (you will be tested on those aspects).
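Both tasks leave the choice of evaluation metrics to you. As a reminder of what the common ones measure, the sketch below computes accuracy and per-class precision, recall and F1 from lists of true and predicted labels. The function names are assumptions made for this example; libraries such as scikit-learn provide equivalent, better-tested functions.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count true positives, false positives and false negatives for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for the class `positive` (0.0 when undefined)."""
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Accuracy alone can be misleading when classes are imbalanced, which is one reason to report per-class precision and recall alongside it.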
Tips and Clarifications
– If you are struggling to make something work with the volume of data present, you can
subsample (for instance, randomly pick a proportion of the dataset).
– You must use Python and its libraries to tackle this task. You are strongly encouraged to
use existing libraries for model building and evaluation rather than writing your own, unless
you specifically need to do something with no library support (e.g. if you want to do feature
engineering).
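The subsampling tip above can be sketched as follows: pick a reproducible random fraction of the examples while keeping each example paired with its label. The function name and the default seed are assumptions made for this example.

```python
import random


def subsample(examples, labels, fraction, seed=0):
    """Return a random subset holding `fraction` of the (example, label) pairs."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if len(examples) != len(labels):
        raise ValueError("examples and labels must have the same length")
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    k = max(1, round(len(examples) * fraction))
    indices = sorted(rng.sample(range(len(examples)), k))
    return [examples[i] for i in indices], [labels[i] for i in indices]
```

A purely random subset may shift the class proportions slightly, so it is worth checking the label counts after subsampling and stating the fraction and seed in the report.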
Your report can include images, such as your neural network architecture design (for Task 1, or for
Task 2 if you choose a deep learning approach), and plots. Many programs can be used for the
architecture design; for example, you can draw your neural network architecture with PowerPoint.
Deadline: Friday 15 April at 3 pm (UK time).
You will submit:
1. One .pdf file of up to 1000 words, excluding tables, references etc., as outlined earlier.
2. The code of your solution as a notebook. If you do any pre-processing of the data, please also
include the script you used to do this (or a list of the commands run).
Marking:
Both tasks will be marked as follows:
25% Method/Model;
10% Evaluation;
For the report [35%]:
5% Structure;
15% Content;
15% Criticality/Discussion.
See Appendix A for more explanations.
Feedback:
Apart from the marks, you will also receive written feedback via Moodle 3 – 5 weeks after the
submission deadline. The feedback will explain what you have done well and what needs to
improve, with reference to your marks.
Late submission policy
Coursework submitted after the agreed deadline will be marked at a maximum of 40%.
Coursework submitted more than five working days after the agreed deadline will be given 0%.
Extensions
If you require an extension, please contact the module leader before the deadline. Extensions are
only provided for exceptional circumstances and evidence may be required. See the Fit to Sit
regulations for more details.
Plagiarism
Plagiarised work will be dealt with according to the university’s guidelines (please read them,
especially if this is your first time at a UK university): http://www2.napier.ac.uk/ed/plagiarism/
Appendix A: Marking Scheme
B1 Method / Model (25%)
No Submission	No work submitted
Very poor	Code with bugs and algorithm/model not well described
Inadequate	Code with bugs but algorithm/model well described
Adequate	Code with a minor bug but algorithm/model not well described and justified
Good	Code with a minor bug but algorithm/model well described and justified
Very good	Code without bugs but algorithm/model not described or justified
Excellent	Code without bugs but algorithm/model not described and justified in great detail
Outstanding	Code without bugs and algorithm/model described and justified in detail

B2 Evaluation (10%)
No Submission	No work submitted
Very poor	Not appropriate evaluation metric chosen
Inadequate	Neither the evaluation setup nor the results are described appropriately
Adequate	Evaluation setup is not justified but almost correctly executed and results are mentioned
Good	Evaluation setup is not justified but correctly executed and results are mentioned
Very good	Evaluation setup is somewhat justified and results are somewhat mentioned and discussed
Excellent	Evaluation setup is somewhat justified but results fully described and discussed
Outstanding	Evaluation setup is justified and results fully described and discussed

B3 Report / Reflection (35%)
No Submission	No work submitted
Very poor	Poor report that misses essential parts, and results/method not critically described
Inadequate	Poor report that misses essential parts, and results/method and not adequate reflection provided
Adequate	Results adequately described and analysed, although the report would benefit from some further critical analysis.
Good	Good report and reflection. However, more explanations would be needed.
Very good	Good report, reflection and explanation of results.
Excellent	Excellent report, reflection and explanation of results, providing detailed analysis of results.
Outstanding	Excellent report, reflection and excellent insights of results (including outlining limitations and strengths).
