Council has requested a series of reports on the project to protect grey-headed flying-foxes in the Geelong area.
Assume that you are a business analyst and you have received an email from Rizwan, the City Intelligence Analyst. Your response will be used as part of a report to the Council. Rizwan’s email together with guidelines (shown in blue) are presented below:
Email from Rizwan
To: From: Subject: Hi …,
You
Rizwan, City Intelligence Analyst, CoGG
Analysis of the endangered bat population dataset
The Council wants a detailed understanding of some of the key aspects related to the bat population, including weather and habitat. I have attached an Excel file with key data and included some guidelines (shown in blue) to direct your work.
Please provide answers to the following questions. Return the Excel file to me. As I have training in business analytics, I am comfortable with technical language. The Council wants a report from you which explains the outcome of your analysis. As they do not have the benefit of training in business analytics, your report must present the results of your analysis in plain, straight-forward language. I have provided a template for you to use.
1. Univariate Analysis:
Categorical Variables
1.1. Provide a profile of the categorical variable Predators.
Our presumption is that there was an even spread (similar proportions) across all predator levels. If there was not an even spread of across all predator levels, advise which was the most frequent (and least frequent) level.
You will need to create a suitable table which includes the number and proportion of predator levels.
Create an appropriate graph to illustrate your analysis.
Numerical Variables
1.2. A key measure for the Council is Temperature. Provide an analysis of Temperature. Provide THREE significant observations from your analysis.
2
You will need to generate the appropriate Descriptive/Summary Statistics for Temperature. Also include quartile details, and the interquartile range. Using an appropriate technique, determine whether or not there are any outliers.
Create appropriate graph(s) to illustrate your analysis.
2. Bivariate Analysis:
Categorical/Categorical Variables
2.1. We are interested to understand more about our trees, and any potential relationship between TreeAgeBand and TreeHeightBand. We need you to provide THREE key observations from your analysis.
You will need to create four cross-tabulation tables (pivot-table format will be accepted) that identifies:
- the number of Tree Height Bands in each Tree Age Band,
- the proportion of Tree Height Bands in each Tree Age Band (% of row total),
- the proportion of Tree Height Bands in each Tree Age Band (% of column total), and
- the proportion of Tree Height Bands in each Tree Age Band (% of grand total).
Apply heat-map formatting to each cross-tabulation.
Categorical/Numerical Variables
2.2. We are interested to understand more about Humidity, and any potential relationship between Urbanisation and Humidity. We need you to record some key observations from your analysis in the provided table (in the Excel file).
You will need to create appropriate (pivot) table(s) and/or heat map(s) that identifies, for each Urbanisation level, the key Humidity variables to complete the table.
Create appropriate graphs to illustrate your analysis.
Numerical/Numerical Variables
2.3. Our working assumption is that the Size of a location is strongly positively correlated with the number of trees (TreeNum) in that location. We have also assumed that the number of trees is not correlated with the number of bats (Population). We need you to advise if the data supports our assumptions – i.e. analyse the nature of the relationship (if any) between the following:
- a) Location Size and Tree Number, and
- b) Tree Number and Bat Population
You will need to calculate suitable association measures. Create appropriate graphs to illustrate your analysis.
3. Probability:
3.1.
Assuming that the Temperature is approximately normally distributed, advise which Urbanisation level has the highest probability of having a Temperature of less than 18.5 degrees Celsius.
To answer this question, you will need to do separate probability calculations for each Urbanisation level.
3
3.2. Assuming that the bat Population is approximately normally distributed, advise which Urbanisation level has the lowest probability of having a Bat Population between 95 and 105.
To answer this question, you will need to do separate probability calculations for each Urbanisation level.
4. Confidence Intervals:
The Bat Population is a critical measure for the project. Knowing that the data only contains a sample of all locations in the City of Greater Geelong:
- a) provide an overall estimate of the average number of bats (Population) in each Urbanisation level. Which Urbanisation level appears to have the highest (average) number of Bats? Which Urbanisation level appears to have the lowest (average) number of Bats?
- b) Advise if the Highly Urbanised and Rural locations have a population of 100 bats or more.
You will need to produce a comparative table of descriptive/summary statistics of the Bat Population for each Urbanisation level. Then, you will need to calculate a 95% confidence interval for the average Bat Population in each Urbanisation level.
Create an appropriate graph to illustrate your analysis.
I look forward to receiving details of your analysis, and your report. Sincerely,
Rizwan
Data description
The provided data file includes multiple sheets, labelled “Data Description”, “Data” and several other worksheets for the above questions. The “Data Description” sheet describes all the variables used in the “Data” set and is copied below for your convenience.
Variable
Location ID
Population Population Band
Temperature
Description
A unique ID for each location
Bat Variables
The bat population count (number of bats in the location)
Population represented as Low (less than 90 bats), Medium (90-120 bats), and High (more than 120 bats)
Weather Variables
The average temperature of the location in degrees Celsius
4
Humidity Wind
Urbanisation
Size
Tree Number Tree Age
Tree Age Band Tree Height
Tree Height Band Tree Health
Predators Sprinkler
The average humidity of the location in percentage The average wind speed in the location in km/h
Habitat Variables
Indicating the level of urbanisation of the location: Highly Urbanised, Moderately Urbanised, Slightly Urbanised, Non-Urbanised
The size of the location in 10 square metres
The total number of trees in the location
The average age of the trees in the location in years
Tree Age represented as Young (less than 10 years), Mature (10-20 years), and Veteran (more than 20 years)
The average height of the trees in the location in metres
Tree Height represented as Large (more than 10 metres), Medium (5-10 metres), Small (less than 5 metres)
The overall health of the trees in the location: Good, Fair, Poor Other Important Variables
Indicating the degree of predator presence in the location: High, Medium, Low Presence of the new specially designed irrigating system that cools a location: Yes, No
Assignment instructions
The assignment consists of two parts.
Part 1: Data Analysis
Your data analysis must be performed on the Assignment 2 Excel file. The file includes tabs for: Data Description
Data
Analysis for questions 1, 2, 3, and 4
When conducting the analysis, you need to apply techniques from descriptive analytics, visualisations, probabilities, and confidence interval calculations. You will need to use the appropriate (pivot and other) tables, graphs, and summary measures.
5
The analysis section you submit should be limited to the Q1 to Q4 worksheets of the Excel file. These are the only worksheets which will be marked. Your analysis should be clearly labelled and grouped around each question. Typically, poorly presented, unorganised analysis or excessive output does not gain maximum marks.
In the Conclusion section of each worksheet there is space allocated for you to write a succinct response to the questions posed in Rizwan’s email (above). When drafting your Conclusion, make sure that you directly answer the questions asked. Cite (state) the important features of the analysis in your Output section. Responses in the Conclusion section will be marked (use technical language here).
Use the Output section for your analysis to complete the analysis as directed in Rizwan’s email and supports your response to his questions (which you will write in the Conclusion section). Analysis in the Output section will be marked, please make sure your analysis is complete, clear, and easy to follow. You may need to add rows or columns to present your analysis clearly and completely.
It is useful to produce both numerical and graphical analysis. Sometimes something is revealed in one that is not obvious in the other.
Use the Workings section for calculations and workings that support your analysis. The Workings section will not be marked.