Cyber Security Analytics

April 14, 2023

UFCFFY-15-M Cyber Security Analytics
Portfolio Assignment: Worksheet 1
Conduct an investigation on an organisation’s web server application to identify malicious attack activity using Python data science libraries
For this task, the company “UWEcyberSolutions” have enlisted your support as a security data analyst. They know that they have suffered an attack on their web server application; however, they are unable to diagnose exactly what has happened, or which of their users caused the attack. The company have provided you with their recent log data records. You will need to identify any suspicious activity that has occurred in the dataset, based on your knowledge and understanding of web application security, and report your findings back to the company.
Dataset: You will be randomly issued a unique dataset based on your UWE username. Failure to use the dataset assigned to your username will result in a zero grade. Please see the folder “Portfolio Assignment” under the Assignment tab on Blackboard for further details on accessing and downloading the necessary dataset.
Hint: The TryHackMe room “HTTP in detail” may help you decide what to investigate within this large dataset. More information about Microsoft Internet Information Services (IIS) can also be found at the following URL:
https://docs.microsoft.com/en-us/previous-versions/iis/6.0-sdk/ms525410(v=vs.90)
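IIS records requests in the W3C Extended Log File Format. Lines beginning with "#" are directives, and the "#Fields:" directive names the space-separated columns in the records that follow. The sketch below reads such a file by hand. It assumes a single, consistent "#Fields:" line and uses an invented two-record sample with IP addresses taken from documentation ranges. `read_w3c_log` and `SAMPLE_LOG` are hypothetical names and are not part of the worksheet.

```python
import io

import pandas as pd

# Invented sample in W3C Extended Log File Format. Directive lines start with
# "#"; the "#Fields:" directive lists the column names for the data rows.
SAMPLE_LOG = """#Software: Microsoft Internet Information Services 10.0
#Version: 1.0
#Date: 2022-01-01 04:36:00
#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status
2022-01-01 04:36:00 192.0.2.10 GET /index.aspx - 443 - 198.51.100.7 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64) 200
2022-01-01 04:36:15 192.0.2.10 GET /template.css v=alngoccj 443 - 198.51.100.7 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64) 200
"""


def read_w3c_log(handle) -> pd.DataFrame:
    """Hypothetical helper: parse a W3C extended log using its #Fields directive."""
    fields = None
    rows = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # Everything after the directive name is a column name.
            fields = line.split()[1:]
        elif line.startswith("#"):
            continue  # other directives (#Software, #Version, #Date) hold no records
        else:
            # IIS replaces spaces inside a field with "+", so splitting on
            # whitespace keeps each field intact.
            rows.append(line.split())
    if fields is None:
        raise ValueError("no #Fields directive found")
    return pd.DataFrame(rows, columns=fields)


logs = read_w3c_log(io.StringIO(SAMPLE_LOG))
logs["sc-status"] = logs["sc-status"].astype(int)  # parsed as text; make numeric
print(logs[["time", "cs-uri-stem", "cs-uri-query", "sc-status"]])
```

Reading the column names from the directive, rather than from the first line, avoids the off-by-one header that a "#Fields:" token introduces when the file is passed straight to `read_csv`.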
Assessment and Marking
The completion of this worksheet is worth 20% of your portfolio assignment for the UFCFFY-15-M Cyber Security Analytics (CSA) module.
For Part A, each guided question carries individual marks for successful completion, with a maximum of 12 marks available. Where a question is worth more than 1 mark, a partial solution may earn partial marks.
For Part B, the single question is an unguided task that will be graded against three core criteria:
Each criterion is scored from 0 to 4:
Identifying the suspicious activity: 0 = No or very little evidence of progress; 1 = Limited attempt to address this criterion; 2 = A possible solution but with weaknesses; 3 = A good solution with some justification; 4 = An excellent solution with clear justification.
Analytical reasoning to uncover the activity: 0 = No or very little evidence of progress; 1 = Limited attempt to address this criterion; 2 = Some fair attempt but with weaknesses; 3 = A good solution with some justification; 4 = An excellent solution with clear justification.
Clarity and presentation: 0 = No or very little evidence of progress; 1 = Limited attempt to address this criterion; 2 = A reasonable attempt but with some weaknesses; 3 = Good detail and presentation; 4 = Excellent detail, professional presentation.
Submission Documents
Your submission for this task should include one Jupyter notebook exported in PDFviaHTML format.
You should complete your work using the .ipynb file provided (i.e., this document). Once you have completed your work, use the export function in Jupyter to save your notebook as a PDF document (“File”, “Save and Export Notebook As”, “PDFviaHTML”).
Do not submit your .ipynb file, because we will not execute any code during marking. You must therefore ensure that all code cell output is presented clearly in your PDF document before you make your final submission.
The deadline for your portfolio submission is TUESDAY 2ND MAY at 14:00. This assignment is eligible for the 5-day late window policy; however, module staff will not be able to assist with any queries after the deadline.
The portfolio will be submitted to Blackboard as 4 independent documents:
STUDENT_ID-TASK1.pdf (a PDF document exported from your Jupyter notebook)
STUDENT_ID-TASK2.pdf (a PDF document exported from your Jupyter notebook)
STUDENT_ID-TASK3.pdf (a PDF report of your research investigation)
STUDENT_ID-TASK4.mp4 or STUDENT_ID-TASK4.txt (either the video file of your presentation, or a text file that contains instructions for accessing your video online)
Contact
Questions about this assignment should be directed to your module leader ([email protected]). You should use the online Q&A form to ask questions related to this module and this assignment. You can also make use of the on-site teaching sessions.
Student ID: -ENTER STUDENT NUMBER
By submitting this assignment to Blackboard as part of your portfolio, I declare that the submission is my own work.
In the cell below, you will need to change data_file to your own specific data filename. The example data file is purely to demonstrate some initial steps for your investigation and should not be used.
Part A:
Please answer the following questions by providing suitable Python code. Questions 1-6 should require only a single line of code per answer. Question 8 should be answered in two lines of code only. These questions make up 12 possible marks towards the assignment.
Question 1: How many unique machines (defined by client IP address ‘c-ip’) have accessed this web server application? (1 Mark)
Question 2: How many unique usernames (defined by ‘cs-username’) have accessed this web server application? (1 Mark)
Question 3: Which URLs (defined by ‘cs(Referer)’) have been accessed most often? (1 Mark)
Question 4: What is the minimum value in the ‘sc-status’ column? (1 Mark)
Question 5: How many entries in the data column ‘cs-uri-query’ start with the string ‘v=’? (2 Marks)
Question 6: How many entries in the data column ‘cs(User-Agent)’ contain the term ‘Win64’? (2 Marks)
Question 7: Which file extension occurs the most within the ‘cs-uri-stem’ column? (2 Marks)
Question 8: How many entries return a ‘sc-status’ value of 404 before 06:00 AM? (2 Marks)
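The Part A questions map onto a handful of standard pandas operations. The sketch below applies them to a tiny invented DataFrame (`logs`, with made-up values) that uses the same IIS column names. It illustrates the techniques only and gives no answers for any issued dataset. Two interpretation choices here are assumptions. First, IIS writes "-" for an empty field, so the username count is shown both with and without "-". Second, "before 06:00 AM" is read as a time of day on any date.

```python
import pandas as pd

# Invented records using the IIS column names from the worksheet.
logs = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2022-01-01 04:36:00", "2022-01-01 05:10:00",
        "2022-01-01 07:00:00", "2022-01-02 05:59:59",
    ]),
    "c-ip": ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.3"],
    "cs-username": ["-", "alice", "alice", "bob"],
    "cs(Referer)": ["-", "https://example.com/a",
                    "https://example.com/a", "https://example.com/b"],
    "cs-uri-stem": ["/index.aspx", "/template.css", "/site.css", "/app.js"],
    "cs-uri-query": ["-", "v=123", "v=abc", "page=2"],
    "cs(User-Agent)": [
        "Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)",
        "Mozilla/5.0+(X11;+Linux+x86_64)",
        "Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)",
        "curl/7.81.0",
    ],
    "sc-status": [200, 404, 404, 404],
})

# Q1/Q2 pattern: count distinct values in a column.
n_ips = logs["c-ip"].nunique()
n_users_all = logs["cs-username"].nunique()  # treats "-" as a value
n_users_named = logs.loc[logs["cs-username"] != "-", "cs-username"].nunique()

# Q3 pattern: frequency table, most common value first.
top_referers = logs["cs(Referer)"].value_counts().head(3)

# Q4 pattern: column minimum.
min_status = logs["sc-status"].min()

# Q5 pattern: prefix test on a string column, summed as booleans.
n_v_queries = logs["cs-uri-query"].str.startswith("v=", na=False).sum()

# Q6 pattern: literal (non-regex) substring test.
n_win64 = logs["cs(User-Agent)"].str.contains("Win64", regex=False, na=False).sum()

# Q7 pattern: capture the text after the final "." and count each extension.
ext_counts = logs["cs-uri-stem"].str.extract(r"\.([A-Za-z0-9]+)$")[0].value_counts()

# Q8 pattern (two lines): combine conditions with &, then count the True rows.
early_404 = (logs["sc-status"] == 404) & (logs["datetime"].dt.hour < 6)
n_early_404 = early_404.sum()

print(n_ips, n_users_all, n_users_named, min_status, n_v_queries, n_win64, n_early_404)
print(top_referers.index[0], ext_counts.idxmax())
```

`na=False` stops missing values from propagating into the boolean counts. `regex=False` makes `str.contains` treat "Win64" as a literal string rather than a pattern.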
In [3]: # Import libraries as required
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
pd.set_option('display.max_rows', 10)
In [4]: data_file = 'YOUR-USERNAME'
In [178]: # Load in the data set as required
# The header row has one more whitespace-separated token than each data row
# (assumed to be a leading "#Fields:" directive), so pandas fills the last
# column with NaN. Drop that column and shift the names left by one.
data = pd.read_csv(data_file, sep=r'\s+')
temp_df = data[data.columns[:-1]].copy()
temp_df.columns = data.columns[1:]
data = temp_df
# Combine the separate date and time fields into a single datetime column
data['datetime'] = pd.to_datetime(data['date'] + ' ' + data['time'])
data
Out[178]:
       date        time      s-ip          cs-method  cs-uri-stem        cs-uri-query  s-port  cs-username  c-ip            cs(User-Agent)
0      2022-01-01  04:36:00  192.67.2.200  GET        bjgstfyo.js        v=596413      443     -            194.79.31.2     (Windows+NT+10.0;+ ...
1      2022-01-01  04:36:00  192.67.2.200  GET        index.aspx         -             443     -            194.79.31.2     (Windows+NT+10.0;+ ...
2      2022-01-01  04:36:15  192.67.2.200  GET        osivymrb.css       -             443     -            194.79.31.2     (Windows+NT+10.0;+ ...
3      2022-01-01  04:36:15  192.67.2.200  GET        laepfxqk.css       -             443     -            194.79.31.2     (Windows+NT+10.0;+ ...
4      2022-01-01  04:36:15  192.67.2.200  GET        template.css       v=alngoccj    443     -            194.79.31.2     (Windows+NT+10.0;+ ...
...    ...
69545  2022-01-30  23:54:29  192.67.2.200  GET        transactions.aspx  page=4        443     ew361149     81.161.226.136  (X11;+Linux+x86_6 ...
69546  2022-01-30  23:54:32  192.67.2.200  GET        template.css       v=nhxjnpwa    443     ew361149     81.161.226.136  (X11;+Linux+x86_6 ...
69547  2022-01-30  23:54:32  192.67.2.200  GET        favico.ico         -             443     ew361149     81.161.226.136  (X11;+Linux+x86_6 ...
69548  2022-01-30  23:54:32  192.67.2.200  GET        template.css       v=eaftdmgp    443     ew361149     81.161.226.136  (X11;+Linux+x86_6 ...
69549  2022-01-30  23:54:32  192.67.2.200  GET        transactions.aspx  page=5        443     ew361149     81.161.226.136  (X11;+Linux+x86_6 ...

69550 rows × 16 columns
In [209]: # ANSWER
In [208]: # ANSWER
In [207]: # ANSWER
In [206]: # ANSWER
In [205]: # ANSWER
In [204]: # ANSWER
In [203]: # ANSWER
In [211]: # ANSWER
Part B:
Investigate the dataset further to uncover the suspicious activity.
This unguided question will be graded against the following criteria:
Identifying the suspicious activity (4 Marks)
Analytical reasoning to uncover the activity (4 Marks)
Clarity and presentation (4 Marks)
You should state all the suspicious IP addresses you have identified as part of your conclusion. Explain in clear written English how you uncovered this information through your use of Python code for data investigation. Keep this clear and concise, and include only the code that helped you solve the challenge.
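One possible starting point for Part B is to combine several independent heuristics and look for IP addresses that stand out under more than one of them. The sketch below is a hedged illustration on invented records. The IP addresses, usernames, payloads, the 0.5 error-rate threshold and the regular expression are all hypothetical choices, not findings from any issued dataset, and a real investigation would need to justify each one.

```python
import pandas as pd

# Invented records; column names follow the IIS fields in the worksheet.
logs = pd.DataFrame({
    "c-ip": ["203.0.113.5"] * 4 + ["198.51.100.7"] * 3 + ["192.0.2.44"] * 2,
    "cs-username": ["-", "-", "-", "-", "ew100", "ew100", "ew100", "ew200", "ew300"],
    "cs-uri-stem": ["/login.aspx", "/admin.aspx", "/backup.zip", "/.env",
                    "/index.aspx", "/transactions.aspx", "/transactions.aspx",
                    "/transactions.aspx", "/transactions.aspx"],
    "cs-uri-query": ["-", "-", "-", "-", "-", "page=1", "page=2",
                     "id=1'+OR+'1'='1", "id=../../etc/passwd"],
    "sc-status": [404, 403, 404, 404, 200, 200, 200, 500, 200],
})

# Heuristic 1: clients whose requests mostly fail (4xx/5xx), as a scan might.
logs["is_error"] = logs["sc-status"] >= 400
error_rate = (logs.groupby("c-ip")["is_error"]
              .agg(["sum", "mean"])
              .sort_values("mean", ascending=False))
error_heavy = set(error_rate[error_rate["mean"] > 0.5].index)  # hypothetical threshold

# Heuristic 2: query strings containing fragments typical of SQL injection or
# path traversal. Illustrative pattern only, far from exhaustive.
suspicious_pattern = r"(?:'|%27|\bOR\b|\bUNION\b|\.\./|%2e%2e)"
injection_hits = logs[logs["cs-uri-query"].str.contains(
    suspicious_pattern, case=False, regex=True, na=False)]

# Heuristic 3: a single IP authenticating as more than one username.
named = logs[logs["cs-username"] != "-"]
users_per_ip = named.groupby("c-ip")["cs-username"].nunique()
multi_user_ips = users_per_ip[users_per_ip > 1]

suspects = sorted(error_heavy | set(injection_hits["c-ip"]) | set(multi_user_ips.index))
print(error_rate)
print(suspects)
```

Keeping each heuristic in its own variable makes it easy to explain in the written conclusion which evidence flagged which address.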
In [1]: # ANSWER