Data Model and Design

45 views 9:56 am 0 Comments March 12, 2023

CST2355 – Database Systems

Assignment #2: Data Model and Design [15%]

This assignment relates to the following Course Learning Requirement:

CLR 2 – Develop Advanced Database Design and Normalization

Background

You work for a small Canadian e-commerce business. When your company first started, it used a vendor and managed its data off-site. Today they’re moving all that data on-site and will manage that data using a new database. The vendor sent you two extracts of your data. You will take this denormalized data use it to create a data model and database design. You will use a visual tool to create your conceptual data model; and use the Oracle tools to create your database design, your Oracle database objects, and to import your data. You will prepare a series of scripts to help with cleaning and importing.

Operating Model

The operating model is how people, process, and technology operate together to delivery value to customers. You are provided with the following information by interviewing key stakeholders in the company. This will help you determine your data model and database design.

One department is supported by one or more employee(s)

One employee applies one or more skill(s)

One customer submits one or more purchase(s)

One product can be delivered for one or more purchase(s)

Data

You are provided with 2 spreadsheets of data. This data came from data dumps from the existing off-site system. The data includes: (see bottom of spreadsheet tabs)

Employee data

Purchase data

The data in each spreadsheet tab is denormalized. If a name or address in one sheet is the same as in the other sheet that doesn’t mean they are the same person or address. Assume the data in the Employee sheet tab is independent from the data in the Purchase sheet tab.

Instructions

Execute the following steps. Please submit your work only where the instructions specify to do so. If you do not follow instructions, your submission will be rejected.. For this assignment, use the tools covered in the Tool Overview. You can use any visual tool for your conceptual data model as long as you use Crowfoot notation and can show the correct maximum and minimum cardinal using that Crowfoot notation. An example of a visual tool you could use is in the Data Model Visual Tool Overview.

Please keep each script you created. You may need to rerun that script in a later activity.

Activity

1

Conceptual Data Models

For GOOGLE and industry, the term data model can be ambiguous. To avoid this ambiguity this course differentiates between data models and database designs. Data Models (conceptual design) are designs used to describe how the business sees and uses the data. Database designs (logical design) are used to show how the data is organized on the computer.

Data models guide and drive database design

You create a data model and transform it into database design.

You NEVER create the database design first and then use it to design the data model. This results in you needing to provide more documentation, more training materials, and more support to manage the life and sunset of that database.

Data models identify what the business wants to track. The business wants to track entities and the relationship these entities have with other things. The biggest mistake a designer makes is confusing entities with tables. Though the diagrams look the same, the meaning of the diagram can be drastically different.

Interpretation: The product is sold to one or to a group of customers.

Example: One product is bought by 100 customers.

Interpretation 1: Can be interpreted that the product is sold to different customers. Each paid their own price.

Interpretation 2: Can also be interpreted that the product is sold to one or to a group of customers. Example: One product is bought by 100 customers.

Interpretation: The product is sold to different individual customers. Not groups of customers. Use this structure for this assignment.

Another major mistake designer make is assuming everything an entity. An entity is where the business shows interest in tracking a thing by collecting data and running reports on that thing. If the business has no interest in it that thing is likely not an entity. It would be modeled as an attribute.

Entity – CUSTOMER, INVOICE, PURCHASE, EMPLOYEE, DEPARTMENT

Attribute – Address, City, Province

2

Install Visual Data Modeling Tool

Feel free to use any visual data modeling tool. It must support Crows Foot and minimum/maximum cardinalities.

3

Create Conceptual Data Model

The data model is a tool for the business to explain how they use their data. You will create an EMPLOYEE entity along with the underlined entities mentioned in the Operating Model section on the first page of this document.

Add an EMPLOYEE entity with the identifying attributes FirstName and LastName

Add the Position and Salary regular attributes to the EMPLOYEE entity

A

Add the following entities to your diagram by referencing the Operating Model section on the first page of this document

DEPARTMENT, EMPLOYEE, CUSTOMER, PURCHASE, PRODUCT, SKILL

Save your diagram

4

Add Cardinalities

Cardinality helps describe how one entity is related to another entity. Cardinalities constrain the data. Maximum cardinality specifies how many children a parent can have and how many parents a child can have. Minimum cardinality specifies if an entity must first exist so another entity can exist.

Relationship labels show how the data ties to the business processes.

Add the cardinalities between the entities based on the Operating Model section on the first page

Label each relationship according to the verb is specified in the Operating Mode section on the first page (labels should be unique and tie to the business process)

Save your diagram

5

Add Attributes

An attribute is an aspect of an entity. It is a characteristic. Some attributes can be used to identify the entity. In most cases, an entity is identified by a name. There is a DepartmentName, ProductName, FirstName, and LastName. In other cases an entity can be identified by its own attributes and the identifying attributes of its parents. PURCHASE is identified by FirstName, LastName, Product, and PurchaseDate. SKILL is identified by FirstName, LastName, and SkillName.

Examine each column headings from your .CSV files and add the appropriate attributes and identifying attributes to your data model diagram. For Employee Name and Customer Name add FirstName and LastName. For Address add Addresses.

Confirm SKILL and PURCHASE have the correct identifying attributes

Confirm PRODUCT has the ListedPrice attribute and PURCHASE has the PurchasePrice attribute.

Confirm CUSTOMER has the Company attribute

6

Submit Conceptual Data Model

Add below a screen shot of your Conceptual Data Model diagram. To receive marks your diagram must includes your entities, relationships, labels, name, and student number with the current date

Exclude menus, toolbars, and object browsers

<paste screenshot here>

7

Create and Import .CSV Files

Now you are going to prepare your data to be imported into Oracle. You will split the provided spreadsheet into two .CSV files so you can import them.

Using a Spreadsheet program open CST2355 – Assignment 2 – DATA.xls

Save the Employee Tab as IMPORT_EMPLOYEE_XX.CSV where XX is your initials. Ensure you delete the columns G, H, I, J,K

Import IMPORT_EMPLOYEE_XX.CSV

Save the Purchase Tab as IMPORT_PURCHASE_XX.CSV where XX is your initials. Ensure you delete the columns G, H, I, J, K

Import IMPORT_PURCHASE_XX.CSV

8

Confirm Data Matches Model

For a data model to be valid, the data structure must reflect that model. You would run a series of scripts to confirm the entities seen in the data have the same maximum cardinalities as specified in your diagram.

For each parent entity in your conceptual data model use the following script to confirm the cardinality between entities.

SELECT parent_identifying_attribute,

COUNT(DISTINCT child_identifying_attribute) ChildCount

FROM import_table

GROUP BY parent_identifying_attribute

HAVING COUNT(DISTINCT child_identifying_attribute) > 1;

Rows returned means PARENT – 1:N – CHILDREN

No rows returned means PARENT – 1:1 – CHILD

NOTE: If the parent or child identifying attribute is FirstName & LastName, use EmployeeName or CustomerName accordingly. To confirm the PURCHASE entity only use PurchaseDate as your identifying attribute. For SKILL use only SkillName.

9

Submit Confirmation Scripts

Go to DOS. Maximize your DOS window. Re-run your confirmation scripts from the previous activity. Copy and paste a screenshot of each query and its results below. To receive marks your screen shot must show your SELECT, your results, and the tables having your initials.

<paste screenshot(s) here>

10

Create Database Design

You now are going to transform your conceptual data model into a database design. In most cases, an entity would be transformed into a table. Each non-identifying attribute would be turned into a column for that table. Each identifying attribute will turn into a column for the table if that identifying attribute doesn’t come from the table’s parent. An identifying attribute will be omitted from the table if that identifying attributes comes from the table’s parent.

In other words the SKILL table doesn’t have FirstName, LastName columns and the PURCHASE table doesn’t have FirstName, LastName, Product columns.

Transform each entity into a table and place it in your database design diagram. In each table name include your initials of your student number. Example – CUSTOMER_BC. Identifying attributes taken from a parent are never fields in the table.

The Addresses attribute cannot be represented in the EMPLOYEE or CUSTOMER table. Create a separate ADDRESS table with the columns AddressID, Street, City, Province, Postal. Make EMPLOYEE and CUSTOMER parents of ADDRESS.

Create the same parent-child relationships in your database design that you see in your data model. Make sure you have Mandatory selected when creating your relationship. This will help with trouble shooting.

Confirm you did not duplicate the same column in two different tables

The SKILL table will have duplicated SkillName values. To remove duplication, add a SKILL_TYPE table with columns SkillTypeID and SkillTypeName. Remove SkillName in the SKILL table. Make SKILL_TYPE the parent of SKILL

Confirm all tables and columns follow proper naming standards. Tables are in UPPER_CASE and fields are in ProperCase.

Confirm all foreign key columns have the form <table>ID. If they don’t you will have difficulty with the final step of the assignment.

Confirm all constraint names are under 30 characters to avoid compile errors

11

Submit Database Design

Add below a screen shot of your Database Design diagram. To receive marks your diagram must includes tables and fields with proper naming, all relationships, plus your name, and student number with the current date.

Exclude menus, toolbars, and object browsers

<paste screenshot here>

12

Forward Engineer Database Design to Oracle

13

Cleanup Columns

The values in the data need to be cleaned up. You will go into both import tables in Oracle. Manually make the changes using copy/paste.

Go into IMPORT_EMPLOYEE and cleanup the DepartmentName column. Update the values to the appropriate name in the following list.

Administration

Education & Training

Finance

Front Desk

Human Resources

Marketing and Sales

Operations

Maintenance & Safety

Software Services

Go into IMPORT_PURCHASE and cleanup the ProductName column. Update the values to the appropriate name in the following list.

Car Battery

Chocolate Coins

Chocolate Mints

Gift Card ($10)

Fishing Rod

Gummy Balls

Gummy Wire

Mobile case (black)

Miscellaneous

Model Kit (Automobile)

Model Kit (Train)

Model Kit (Airplane)

Paper Toffee

Shopping Bag (Organic)

Toy Soldiers

Waterpark Pass

Wine (Red)

Wine (White)

14

Submit Department and Product Updates

Go to DOS. Maximize your DOS window. Run the following queries.

SELECT DISTINCT ProductName FROM IMPORT_PURCHASE;

SELECT DISTINCT DepartmentName FROM IMPORT_EMPLOYEE;

Copy and paste a screenshot of these queries and their results below. To receive marks your screen shot must show tables having your initials.

<paste screenshot(s) here>

15

Format Columns

You now run scripts to clean up the formatting of your columns. You will run these scripts on each appropriate column in your IMPORT_EMPLOYEE and IMPORT_PURCHASE tables. You do this so you can simplify your SQL in a later step for updating your keys.

Use the following UPDATE example for each text column in both import tables to make that text column in ProperCase; and to remove leading and trailing spaces (do not put dates and numbers in ProperCase)

UPDATE import_table

SET TextField1 = TRIM(INITCAP(TextField1));

Use the following UPDATE example for each number column in both import tables to remove its comma (do not remove commas from text and date fields)

UPDATE import_table

SET NumberField1 = REPLACE(NumberField1, ‘,);

Run COMMIT; so all connections can see your data updated

16

Submit Format Script

Go to DOS. Maximize your DOS window. Re-run your format scripts from the previous activity. Copy and paste a screenshot of each query and its results below. To receive marks your screen shot must show your SELECT, your results, and your tables having your initials.

<paste screenshot(s) here>

17

Add Keys

You will now add key columns to your IMPORT_EMPLOYEE and IMPORT_PURCHASE tables so you can automate the assignment of primary and foreign keys column values. Use copy/paste to create these columns. Always make sure you don’t have trailing spaces in a field name. Having one will cause errors.

Add the columns PRIMARYID, EMPLOYEEID, ADDRESSID, SKILLID, SKILLTYPEID, DEPARTMENTID to IMPORT_EMPLOYEE. Make each column a NUMBER with 10 digits.

Add the columns PRIMARYID, CUSTOMERID, ADDRESSID, PURCHASEID, PRODUCTID to IMPORT_PURCHASE. Make each column a NUMBER with 10 digits.

Use the following UPDATE example on each import table to number each row

UPDATE import_table SET PrimaryID = ROWNUM;

Use the following UPDATE example on IMPORT_EMPLOYEE to assign values to update the EmployeeID, DepartmentID, SkillTypeID

field_ID

Identifying_attribute

EmployeeID

EmployeeName

DepartmentID

DepartmentName

SkillTypeID

SkillName

UPDATE IMPORT_EMPLOYEE

SET field_ID =

(

SELECT MIN(PrimaryID) FROM IMPORT_EMPLOYEE IT2

WHERE IT2.identifying_attribute =

IMPORT_EMPLOYEE.identifying_attribute

GROUP BY identifying_attribute

);

Use the following UPDATE example to assign values to update the CustomerID and ProductID

field_ID

identifying_attribute

CustomerID

CustomerName

ProductID

ProductName

UPDATE IMPORT_PURCHASE

SET field_ID =

(

SELECT MIN(PrimaryID) FROM IMPORT_PURCHASE IT2

WHERE IT2.identifying_attribute =

IMPORT_PURCHASE.identifying_attribute

GROUP BY identifying_attribute

);

Use the following UPDATE example to update SkillID, PurchaseID, AddressID in the IMPORT_EMPLOYEE and IMPORT_PURCHASE tables

UPDATE import_table

SET field_ID = PrimaryID;

Use the following UPDATE example to avoid conflicting AddressIDs during imports

UPDATE IMPORT_PURCHASE

SET AddressID = PrimaryID + 9999;

Run COMMIT; so all connections can see your updated data

Confirm all Key columns in both import tables have values.

18

Submit Key SELECTs

Go to DOS and maximize your DOS window. Copy and paste a screenshot of this select statement and its results.

SELECT

PRIMARYID,

EMPLOYEEID,

ADDRESSID,

SKILLID,

SKILLTYPEID,

DEPARTMENTID

FROM IMPORT_EMPLOYEE;

To receive marks your screen shot must show tables having your initials.

<paste screenshot(s) here>

19

Submit Key SELECTs

Go to DOS and maximize your DOS window. Copy and paste a screenshot of this select statement and its results.

SELECT

PRIMARYID,

CUSTOMERID,

ADDRESSID,

PURCHASEID,

PRODUCTID

FROM IMPORT_PURCHASE;

To receive marks your screen shot must show tables having your initials.

<paste screenshot(s) here>

20

Split Name & Address for IMPORT_EMPLOYEE

Rather than manually splitting names and addresses into multiple columns you will run scripts to automate the process.

Add FIRSTNAME and LASTNAME columns to IMPORT_EMPLOYEE

Use the following UPDATE example to populate FirstName and LastName fields in IMPORT_EMPLOYEE

— Update for LastName, FirstName

UPDATE import_table

SET LastName = TRIM(REGEXP_SUBSTR (NameField, ‘[^,]+’, 1, 1)) ,

FirstName = TRIM(REGEXP_SUBSTR (NameField, ‘[^,]+’, 1, 2))

WHERE NameField LIKE ‘%,%’;

— Update for FirstName LastName

UPDATE import_table

SET LastName = TRIM(REGEXP_SUBSTR (NameField, ‘[^ ]+’, 1, 2)),

FirstName = TRIM(REGEXP_SUBSTR (NameField, ‘[^ ]+’, 1, 1))

WHERE NameField NOT LIKE ‘%,%’;

Add STREET, CITY, PROVINCE, POSTAL columns to IMPORT_EMPLOYEE

Use the following UPDATE example to populate these fields in IMPORT_EMPLOYEE

UPDATE import_table

SET Street = TRIM(REGEXP_SUBSTR (AddressField, ‘[^,]+’, 1, 1)),

City = TRIM(REGEXP_SUBSTR (AddressField, ‘[^,]+’, 1, 2)),

Province = TRIM(REGEXP_SUBSTR (AddressField, ‘[^,]+’, 1, 3));

Use the following UPDATE example to split province and postal code

UPDATE import_table

SET Province = TRIM(REGEXP_SUBSTR (ProvincePostalField, ‘[^ ]+’, 1, 1)),

Postal = TRIM(REGEXP_SUBSTR (ProvincePostalField, ‘[^ ]+’, 1, 2)) || ‘ ‘ ||

TRIM(REGEXP_SUBSTR (ProvincePostalField, ‘[^ ]+’, 1, 3));

Use the following UPDATE example to put PROVINCE and POSTAL values in UPPER_CASE

UPDATE import_table

SET Province = UPPER(Province),

Postal = UPPER(Postal);

Run COMMIT; so all connections can see your updated data

Confirm you can see all updates.

21

Submit Split Columns SELECT for IMPORT_EMPLOYEE

Go to DOS. Maximize your DOS window. Copy and paste a screenshot showing your SELECT and the results of the SELECTed split columns from the IMPORT_EMPLOYEE. Use COLUMN to format your column widths. To full receive marks format each column so each row of data is on a single line. No row should span more than one line in your screenshot.

<paste screenshot(s) here>

22

Split Name & Address for IMPORT_PURCHASE

You will rerun a variant of the scripts form the previous step to manually splitting names and addresses into multiple columns.

Add FIRSTNAME, LASTNAME, and COMPANY columns to IMPORT_PURCHASE. Make each column VARCHAR(50).

Use the following UPDATE example to populate FirstName, LastName, and Company fields in IMPORT_PURCHASE

— Update for LastName, FirstName

UPDATE import_table

SET LastName = TRIM(REGEXP_SUBSTR (NameField, ‘[^,]+’, 1, 1)) ,

FirstName = TRIM(REGEXP_SUBSTR (NameField, ‘[^,]+’, 1, 2))

WHERE NameField LIKE ‘%,%’ AND NameField NOT LIKE ‘% %’;

— Update for FirstName LastName

UPDATE import_table

SET LastName = TRIM(REGEXP_SUBSTR (NameField, ‘[^ ]+’, 1, 2)),

FirstName = TRIM(REGEXP_SUBSTR (NameField, ‘[^ ]+’, 1, 1))

WHERE NameField NOT LIKE ‘%,%’ AND NameField NOT LIKE ‘% %’;

— Update for Company FirstName LastName

UPDATE import_table

SET Company = TRIM(REGEXP_SUBSTR (REPLACE(NameField, , |), ‘[^|]+’, 1, 1)),

LastName = TRIM(REGEXP_SUBSTR (REPLACE(NameField, , |), ‘[^|]+’, 1, 2))

WHERE NameField LIKE ‘% %’;

UPDATE import_table

SET LastName = TRIM(REGEXP_SUBSTR (LastName, ‘[^ ]+’, 1, 2)),

FirstName = TRIM(REGEXP_SUBSTR (LastName, ‘[^ ]+’, 1, 1))

WHERE NameField LIKE ‘% %’;

Add STREET, CITY, PROVINCE, POSTAL columns to IMPORT_EMPLOYEE. Make them each VARCHAR(50)

Use the following UPDATE example to populate these fields in IMPORT_EMPLOYEE

UPDATE import_table

SET Street = TRIM(REGEXP_SUBSTR (AddressField, ‘[^,]+’, 1, 1)),

City = TRIM(REGEXP_SUBSTR (AddressField, ‘[^,]+’, 1, 2)),

Province = TRIM(REGEXP_SUBSTR (AddressField, ‘[^,]+’, 1, 3));

Use the following UPDATE example to split province and postal code

Use the following UPDATE example to put PROVINCE and POSTAL values in UPPER_CASE

UPDATE import_table

SET Province = UPPER(Province),

Postal = UPPER(Postal);

Confirm you can see all updates.

23

Submit Split Columns SELECT for IMPORT_PURCHASE

Go to DOS. Maximize your DOS window. Copy and paste a screenshot showing your SELECT and the results of the SELECTed split columns from the IMPORT_PURCHASE. Use COLUMN to format your column widths. To full receive marks format each column so each row of data is on a single line. No row should span more than one line in your screenshot.

<paste screenshot(s) here>

24

Import Data

The last step is to insert data from one import table into the tables you created from your forward engineering.

Use the following script to insert data from your import table into your destination table (please leverage copy/paste)

INSERT INTO destination_table (Field1, Field2,… FieldN)

SELECT DISTINCT Field1, Field2,… FieldN

FROM import_table;

Complete step (i) for each destination table you forward engineered. Insert data into your parents tables before its children to avoid insert errors. Remember to include the foreign key columns in your INSERT and SELECT.

If you get a key violation error inserting into ADDRESS, go to your ADDRESS table and allow for nulls. (Click the ADDRESS table in the object browser, select Column, click Edit, turn Not Null toggle off). Then reattempt the insert.

If you get an error inserting dates into PURCHASE, replace your date columns in your SELECT then reattempt the insert.

DeliveryDate becomes …. to_date(DeliveryDate, ‘DD-MM-YYYY’)

Purchase Date becomes… to_date(PurchaseDate, ‘DD-MM-YYYY’)

25

Submit Import from IMPORT_EMPLOYEE Script

Go to DOS and maximize your DOS window. Delete all your destination table data using DELETE FROM <table_name>. Re-run your Import scripts that use IMPORT_EMPLOYEE. (Omit importing into ADDRESS). To receive full marks, paste a screen shot showing each query consecutively ran one after the other with its results. Your screen shot must show tables having your initials.

<paste screenshot(s) here>

26

Submit Import from IMPORT_PURCHASE Script

Go to DOS and maximize your DOS window. Delete all your destination table data using DELETE FROM <table_name>. Re-run your Import scripts that use IMPORT_PURCHASE. (Omit importing into ADDRESS). To receive full marks, paste a screen shot showing each query consecutively ran one after the other with its results. Your screen shot must show tables having your initials.

<paste screenshot(s) here>

27

Submit Import into ADDRESS Script

Go to DOS and maximize your DOS window. Re-run your Import into ADDRESS scripts that uses IMPORT_EMPLOYEE and IMPORT_PURCHASE. To receive full marks, paste a screen shot showing each query consecutively ran one after the other with its results. Your screen shot must show tables having your initials.

<paste screenshot(s) here>

28

Export to JSON and XML

Export all table data in the JSON format using the built-in functionality of a database tool. Save your output as a .JSON file.

Export all table data in the XML format using the built in functionality of a database tool. Save your output as an .XML file.

29

Submit JSON and XML files

Open the .XML and .JSON files using a text editor. Maximize both windows. Submit a screen shot of each window.

<paste screenshots here>

30

Create JSON and XML SELECT Statements

Create a SELECT statement that takes data from the EMPLOYEE table and return that data in JSON format

Create a SELECT statement that takes data from the EMPLOYEE table and return that data in XML format

31

Submit JSON and XML SELECT

Go to DOS and maximize your DOS window. Re-run your SELECT statements from the previous activity. To receive full marks, paste a screen shot showing each query ran consecutively one after the other with its results. Your screen shot must show the tables having your initials.

<paste screenshot here>

32

End of Lab