MIS171 Assignment 3

139 views 3:38 pm 0 Comments September 24, 2023

Description
This assignment requires you to analyse a data set (including performing a multiple regression
analysis), interpret, and draw conclusions from your analysis. You are required to explain the
Summary Output from your multiple regression analysis in a written report, which must also include
TWO recommendations to the City of Greater Geelong Endangered Bat Project.
The assignment must be completed individually and must be submitted electronically in CloudDeakin
by the due date. When submitting electronically, you must check that you have submitted the work
correctly by following the instructions provided in CloudDeakin. Hard copies or assignments
submitted via email will NOT be accepted.
The assignment uses the file 2023 T2 MIS171 Assignment 3 Data.xlsx which can be downloaded from
CloudDeakin. The assignment focuses on materials presented up to and including Week 11. The Excel
file which has been provided has different worksheets explaining and containing the City of Greater
Geelong (CoGG) council dataset. Following is an introduction to this scenario and detailed guidelines.
Context/Scenario: Endangered Bat Population Analysis
The Grey-headed Flying-fox, a vulnerable (endangered) species, is one of the most efficient
pollinators and seed dispersers of native Australian trees. Large colonies of grey-headed flying-foxes
live in the City of Greater Geelong (CoGG). The population of Geelong’s grey-headed flying fox colony
is in decline. The CoGG Parks and Gardens team have observed that grey-headed flying-foxes can
experience dehydration, hyperthermia, and death during major heat events. To address this issue,

2
CoGG’s Parks and Gardens team has monitored the bat population, tree coverage, and weather
details in the Geelong region. Following consultation with native animal experts the CoGG Parks and
Gardens team have installed new technology including specialist sprinkler systems that spray a fine
mist over large bat colonies to cool grey-headed flying-foxes during major heat events. The CoGG
Council has requested a series of reports on the project to protect grey-headed flying-foxes in the
Geelong area.
Assume that you are a business analyst and you have received an email from Rizwan, the City
Intelligence Analyst. Your response will be used as part of a report to the Council. Rizwan’s email
together with guidelines (shown in blue) are presented below:
Email from Rizwan
To: You
From: Rizwan, City Intelligence Analyst, CoGG
Subject: Analysis of the endangered bat population dataset
Hi …,
The Council wants a detailed understanding of some of the key aspects related to the bat population,
including weather, habitat, predators and the impact of the sprinkler installation. I have attached an
Excel file with key data and included some guidelines (shown in blue) to direct your work.
Please provide answers to the following questions. Return the Excel file to me. As I have training in
business analytics, I am comfortable with technical language. The Council wants a report from you
which explains the outcome of your analysis. As they do not have the benefit of training in business
analytics, your report must present the results of your analysis in plain, straight-forward language. I
have provided a template for you to use.
1. Hypothesis Testing (cŽŶƐŝĚĞƌɲсρй)
It has been suggested to the Council that the average Bat Population is not 100 bats per
location regardless of Tree Health. Does the Endangered Bat Population sample data confirm
this hypothesis?
To answer this question, you will need to conduct an appropriate hypothesis test on Bat
Population for each category of Tree Health.

3

2. Multiple Linear Regression Modelling (cŽŶƐŝĚĞƌɲсρй)
Bat Population is the key outcome for the Endangered Bat Project. Build a multiple regression
model to predict Bat Population. Your model should provide insights into which factors have a
significant influence on Bat Population, as well as the ability to predict Bat Population for
various scenarios.
For this analysis, you will need to build a multiple regression model using Bat Population as the
dependent variable. All other variables in the Endangered Bat Project data-set should be
included in the model, except Location ID, Population Band, Tree Age Band, and Tree Height
Band (i.e. exclude Location ID, Population Band, Tree Age Band, and Tree Height Band from your
regression model).
Follow the model building process introduced in the lectures and seminars.
Carefully consider the following:
(a) Transform categorical variables into suitable dummy variables
(i.e. Urbanisation, Tree Health, Predators, and Sprinkler).
Copy the Endangered Bat Population data set to the “Correlation” spreadsheet in the Excel
file that has been provided (no earlier than Column AA – be careful not to overwrite the
Conclusion, Correlation Table and Scatter Diagram frames). Delete the Location ID column.
i. When transforming Urbanisation into dummy variables, consider Moderately as the
baseline category; meaning the created dummy variables for Urbanisation should only
include Highly (Yes and No), Slightly (Yes and No) and Rural (Yes and No).
ii. When transforming Tree Health into dummy variables, consider Good as the baseline
category.
iii. When transforming Predatorsinto dummy variables, consider Medium as the baseline
category.
iv. When transforming Sprinkler into dummy variables, consider No as the baseline
category.

4
Complete the Dummy Variables Summary table which is in the Conclusion section of the
Correlation worksheet. The table summarises the results of your transformation of
categorical variables into dummy variables.
(b) Using the Endangered Bat Project data set (which now includes transformed dummy
variables) as your reference, complete the following steps:
i. Correlation – in the section marked “Correlation Table” (below the Conclusion
section on the “Correlation” worksheet) generate a correlation table.
Use the “Correlation” option in Excel’s Data Analysis ToolPak.
ii. On the correlation table, identify and clearly indicate the Independent Variables
which are (virtually) uncorrelated with the Dependent Variable (i.e. all IVs which
have a correlation coefficient with the DV of between -0.050 and 0.050).
These IVs are to be removed from the model prior to running the first iteration of the
regression model.
iii. Complete the Uncorrelated Independent Variables summary table which is in the
Conclusion section of the Correlation worksheet. This table summarises which
Independent Variables are to be eliminated from the regression model due to being
(virtually) uncorrelated with Bat Population (DV).
iv. Multi-collinearity – review the correlation table for instances of multi-collinearity
between Independent Variables (IV). In cases of multicollinearity, identify and clearly
indicate the IVs with the weakest correlation with the Dependent Variable.
These IVs are to be removed from the model prior to running the first iteration of the
regression model.
v. Complete the Multi-Collinearity summary table which is in the Conclusion section of
the Correlation worksheet. This table summarises which Independent Variables are
to be eliminated from the regression model due to multi-collinearity.
vi. Scatter diagrams – in the section marked “Scatter Diagrams” (below the Correlation
Table section on the “Correlation” worksheet) generate three scatter diagrams, for:
x Bat Population (Dependent Variable, DV) and the numerical (not dummy
categorical) Independent Variable (IV) which has the highest correlation with the
DV.

5

Include a calculation of the correlation coefficient.
Format the diagram, and include a linear trendline, and the coefficient of
determination.
x Bat Population (DV) and the numerical (not dummy categorical) Independent
Variable (IV) which has the highest inverse (i.e. most negative) correlation with
the DV.
Include a calculation of the correlation coefficient.
Format the diagram, and include a linear trendline, and the coefficient of
determination.
x Bat Population (DV) and the numerical (not dummy categorical) Independent
Variable (IV) that is closest to being uncorrelated with the DV (i.e. correlation
coefficient closest to zero).
Include a calculation of the correlation coefficient.
Format the diagram, and include a linear trendline, and the coefficient of
determination.

(c) On the “Regression Model” spreadsheet in the Excel file that has been provided, and using
the Endangered Bat Population data set as your reference (the data set includes the
dummy variables you have created and excludes the Independent Variables which have
been eliminated due to multi-collinearity or being uncorrelated with the Dependent
Variable or Location ID), complete the following steps:
i. Using the “Regression” option in Excel’s Data Analysis ToolPak build a multiple
regression model.
x Assess the model for overall significance
(F test with alpha set at 0.05, i.e. Confidence Level = 95%).
ii. If your first iteration of the overall model is found to be significant, in a step-wise
fashion, sequentially (one at a time) remove the Independent Variables that are least
likely to be contributing to any significant change in the Dependent Variable.
x You will need to conduct t-tests (i.e., check p values) with alpha set at 0.05 to
determine the significance of the various IVs you exclude and include in your
model.

6
(d) Once you have created a regression model where all the remaining Independent Variables
are contributing significantly to a change in Bat Population copy the Summary Output of
your final multiple regression model and paste it into the Outputsection of the “Regression
Model” spreadsheet in the Excel file that has been provided,
i. In the Conclusion section of the “Regression Model” spreadsheet,
x Write the (final) multiple regression equation.
Use the format: } сڋ0 нڋ1X1 нڋ2X2 нڋ3X3 нڋ4X4…

x Explain (interpret) the (final) multiple regression equation/model.
(e) Using the final multiple regression equation (from the previous step),
i. In the Predictions section of the “Regression Model” spreadsheet in the Excel file
that has been provided, for the scenario outlined below:
x Calculate a Point Estimate for Bat Population (DV),
x Calculate a Prediction Interval for Bat Population (DV),
x Calculate a Confidence Interval for Bat Population (DV),
ii. In the Conclusion section of the “Regression Model” spreadsheet in the Excel file
that has been provided, for the scenario outlined below:
x Interpret the Point Estimate calculation
x Interpret the Prediction Interval calculation
x Interpret the Confidence Interval calculation
Independent Variables Scenario
Urbanisation Highly Urbanised
Location Size 650 m2
Tree Number 5 trees
Tree Age 10 years

7

Tree Height 10 metres
Tree Health Good
Temperature 19 degrees
Humidity 75%
Wind Speed 25 km per hour
Predators Low
Sprinklers Yes

I look forward to receiving details of your analysis, and your report, by Monday 2 October, 2023.
Sincerely,

Rizwan

Data Description
The provided data file includes multiple sheets, labelled “Data Description”, “Data” and several other
worksheets for the above questions. The “Data Description” sheet describes all the variables used in
the “Data” set and is copied below for your convenience.

Variable Description
Location ID A unique ID for each location
Bat Variables

Population The bat population count (number of bats in the location)
Population Band Population represented as Low (less than 90 bats), Medium (90-120 bats),

and High (more than 120 bats)

Weather Variables

Temperature The average temperature of the location in degrees Celsius
Humidity The average humidity of the location in percentage
Wind The average wind speed in the location in km/h
Habitat Variables

8
Urbanisation Indicating the level of urbanisation of the location: Highly Urbanised, Moderately

Urbanised, Slightly Urbanised, Non-Urbanised
Size The size of the location (in 10 square metres)
Tree Number The total number of trees in the location
Tree Age The average age of the trees in the location (in years)
Tree Age Band Tree Age represented as Young (less than 10 years), Mature (10-20 years),

and Veteran (more than 20 years)

Tree Height The average height of the trees in the location (in metres)
Tree Height Band Tree Height represented as Large (more than 10 metres), Medium (5-10 metres),

Small (less than 5 metres)

Tree Health The overall health of the trees in the location: Good, Fair, Poor

Other Important Variables

Predators Indicating the degree of predator presence in the location: High, Medium, Low
Sprinkler Presence of the new specially designed irrigating system that cools a location: Yes, No

Assignment instructions
The assignment consists of two parts.
Part 1: Data Analysis
Your data analysis must be performed on the Assignment 3 Excel file.
The file includes tabs (spreadsheets) for:
x Data Description
x Endangered Bat Population Project data set
x Analysis for Hypothesis Testing
x Correlation, which includes:
o creating Dummy variables,
o creating a correlation table,
o eliminating uncorrelated Independent Variables (IVs), and
o eliminating IVs where multi-collinearity is present

9
x Regression Model – building the regression model, including multiple iterations, and
o reporting the Summary Output of the final regression model,
o identifying the final equation, and explaining/interpreting the final equation, and
o calculating and explaining the Point Estimate, Prediction Interval, and Confidence
Interval for the scenario provided.

The only worksheets in the Excel file provided by Rizwan that will be marked are Hypothesis Testing,
Correlation, and Regression Model.
Follow the steps (in blue) in Rizwan’s email for what to include in the reporting sections that will be
marked (e.g. Output and Conclusion sections in the Hypothesis Testing worksheet). There should be
no need to create additional worksheets.
When conducting the hypothesis tests, you need to apply techniques and perform calculations as
outlined in the lectures and seminars including, as appropriate, descriptive analytics and summary
measures, pivot (and other) tables, and visualisations (charts/diagrams). You will also need to draft
appropriate conclusions.
Use the Outputsection to perform your hypothesis test as directed in Rizwan’s email, which supports
your response to his questions (which you will write in the Conclusion section). Analysis in the Output
section will be marked, please make sure your analysis is complete, clear, and easy to follow.
In the Conclusion section of the Hypothesis Testing worksheet there is space allocated for you to
write a succinct response to the hypothesis posed in Rizwan’s email (above). When drafting your
Conclusion, make sure that you directly answer the questions asked. Cite (state) the important
features of the analysis in your Output section. Responses in the Conclusion section will be marked
(use technical language here).
Use the Workings section for calculations and workings that support your analysis. The Workings
section will not be marked.
When performing the multiple regression carefully follow the steps outlined in the lecture and
seminars, and set out in the steps (in blue) in Rizwan’s email.
Make sure your analysis and process is complete, clear, and easy to follow. You may need to add (or
delete, or widen/narrow) rows or columns to present your analysis clearly and completely. Poorly
presented, disorganised analysis or excessive output will be penalised.

Tags: , , , , , ,

Leave a Reply

Your email address will not be published. Required fields are marked *