The University of Melbourne
School of Computing and Information Systems
COMP10002 Foundations of Algorithms
Semester 2, 2023
Assignment 1
Due: 4pm Friday 26 May 2023
Version 1.0
1 Learning Outcomes
In this assignment you will demonstrate your understanding of structures, linked data structures, and algorithm time complexity analysis. You will further extend your skills in program design and implementation.
2 The Story…
In Assignment 1, we have tasted supervised learning, which is a major subcategory of machine learning
that uses training datasets to derive parameters for mathematical models. In this assignment, we will try
out another major subcategory of machine learning, unsupervised learning, in particular, clustering, which
discovers data patterns by grouping a set of objects such that objects in the same group are more similar to
each other than to those in other groups. Like before, you don’t need to be an expert in machine learning
for the assignment. We will guide you through the process step by step.
Social network apps, such as Facebook, Twitter, and WeChat, have become prevalent and are now part
of our daily lives. A lot of opportunities and challenges arise from such apps. A challenge to look at
in this assignment is called community detection, which aims to find groups of social network app users
that are densely connected, that is, following” each other (or, in a friendship“). Figure 1 illustrates two
communities (each enclosed by a red dashed curve) detected from a social network of 12 users u0, u1, : : :,
u11, where two users connected by a line is considered to be in a friendship.
u0
u1
u2
u3
u4
u5
u6
u7
u8
u9
u10
u11
Figure 1: A social network example
The communities detected offer many opportunities. For example, the organisers of a product launch event
may want to invite influencers from every community to maximise the impact of the event, or they may
analyse the interests of different communities and focus on those that are more likely to be their customers.
As another example, social network platforms may detect groups of malicious users from the communities,
e.g., accounts set up for spreading fake news or for political propaganda at election times.
3 Your Task
In this assignment, you will be given a small social network dataset, and your goal is to implement a
simplified community detection algorithm which has been broken down into a few stages like before.
1
The given input to the program has three sections (a sample input is shown below):
1. At least one and up to 50 lines of user profiles. Each line represents a user, which starts with ‘u[i]’
where 0 <= i < 50 represents the user number. For simplicity, each user number is the same as
the line number, that is, user number i will appear at line i. Each user profile line then contains a
four-digit positive integer representing the year when the user joined the social network. After that,
each line contains at least one and up to 10 hashtags separated by single whitespace characters. Each
hashtag starts with a ‘#’ symbol, followed by at least one and up to 20 lower-case English letters.
In the sample input below, user u0 is at line 0, who joined the social network in the year 2018, with
hashtags #foodiesofinstagram, #foodies, and #fresh.
2. An n × n friendship matrix, where n represents the number of users (in the sample input, there are 12
user profile lines, and so n = 12). The element at row i, column j of the matrix represents whether
users ui and uj are in a friendship. If so, the element has a value of 1, and 0 otherwise.
The friendship matrix in the sample input below corresponds to the social network in Figure 1, where
user u0 is in a friendship with users u1, u2, and u3. Thus, in the friendship matrix, columns 1, 2, and
3 of row 0 are all 1’s, while the rest of row 0 are all 0’s.
To simplify the discussion, this assignment assumes two-way friendships, that is, two users must
follow each other at the same time (in a friendship), or they do not follow each other at all (not in a
friendship). This means that the friendship matrix is symmetric.
3. Two threshold values ths (a positive real number) and thc (a positive integer) separated by a whitespace character, which will be described and used in Stage 4.
u0 2018 #foodiesofinstagram #foodies #fresh
u1 2011 #local #togo #yummy #keyfooddeli #supportsmallbusiness #foodlover
u2 2013 #foodlover #yummy #dinner #foodies #togo
u3 2014 #foodies
u4 2017 #storemade #macncheese
u5 2022 #melbournedemons #richmondtigers #sydneyswans
u6 2021 #mcg #richmondfc #footy
u7 2014 #aussierulesfootball #melbournefc #aflfinals
u8 2019 #afl #footy #football #aussierules #aflw #sport #aussie #melb #syd #tas
u9 2017 #sydneyswans #nfl #aussie #melbournedemons #footy
u10 2018 #startreck
u11 2015 #starwars
0 1 1 1 0 0 0 0 0 0 0 0
1 0 1 1 0 0 0 0 0 0 0 0
1 1 0 1 1 0 0 0 0 0 0 0
1 1 1 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 0 0 0 0 0 0
0 0 0 0 1 0 1 1 1 0 0 0
0 0 0 0 0 1 0 1 0 1 0 0
0 0 0 0 0 1 1 0 1 1 0 0
0 0 0 0 0 1 0 1 0 1 0 1
0 0 0 0 0 0 1 1 1 0 1 0
0 0 0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 1 0 0 0
0.3 2
You may assume that the test data always follows the format above. No input validity checking is needed.
You will be given a skeleton code file named program.c for this assignment on Canvas. The skeleton code
file contains a main function that has been partially completed. There are a few other functions which are
incomplete. You need to add code to all the functions including the main function for the following tasks.
3.1 Stage 1: Read User Profiles (Up to 5 Marks)
Your first task is to add code to the stage_one function to read the user profiles. You need to define a
struct named user_t to represent a user, and an array to store all user profiles. This stage outputs (1) the
2
number of users read, that is, the value of n, (2) the user with the largest number of hashtags, and (3) the
hashtags of the user. If there is a tie, print out the user with the smallest user number among the tied ones.
Hint: Use scanf with u%d” to read a letter ‘u’ and an integer, or use getchar to read the letter ‘u’ first.
Note the newline character ‘n’ at the end of each line if using getchar.
The output for this stage given the above sample input should be (where mac:” is the command prompt):
mac: ./program < test0.txt
Stage 1
==========
Number of users: 12
u8 has the largest number of hashtags:
#afl #footy #football #aussierules #aflw #sport #aussie #melb #syd #tas
Like in Assignment 1, we will again use input redirection to feed test data into your program. Thus, you
should still use the standard input functions such as scanf or getchar to read the data. You do not need
to (and should not) use any file operation functions such as fopen or fread. Your program should not print
anything except for the data requested to be output (as shown in the output example).
You can also modify the stage_one function to read all input in one go. You can (and should) create further
functions to complete the tasks when opportunities arise.
3.2 Stage 2: Compute the Strength of Connection (Up to 10 Marks)
Add code to the stage_two function to read the first two rows of the friendship matrix. These two rows
represent the users who are in a friendship with the first two users, u0 and u1.
Then, calculate the strength of connection between u0 and u1 with the following equation:
s o c(u0; u1) =
8<:
0 if u0 and u1 are not in a friendship
jF (u0) F (u1)j
jF (u0) [ F (u1)j otherwise (1)
Here, F (u0) and F (u1) refer to the sets of users who are in a friendship with users u0 and u1, respectively;
operator takes the intersection of two sets; operator [ takes the union of two sets; and j · j returns the
number of members of a set (not the abs function as in Assignment 1). Intuitively, two users with more
shared friends will have a larger strength of connection, and s o c(u0; u1) 2 [0; 1].
In the sample input, the set of users who are in a friendship with u0 is F (u0) = fu1; u2; u3g; the set of
users who are in a friendship with u1 is F (u1) = fu0; u2; u3g; F (u0) F (u1) = fu2; u3g (2 members);
F (u0) [ F (u1) = fu0; u1; u2; u3g (4 members). Thus, s o c(u0; u1) = 2
4
= 0:5.
The output for this stage given the above sample input should be (using %4.2f for output formatting):
Stage 2
==========
Strength of connection between u0 and u1: 0.50
3.3 Stage 3: Compute the Strength of Connection Between All Pairs of Users
(Up to 15 Marks)
Add code to the stage_three function to finish reading the friendship matrix. Then, compute the strength
of connection between every pair of users. The output for the stage is a strength of connection matrix, where
the element at row i, column j is s o c(ui; uj). You need to generalise Equation 1 for the computation.
Hint: You may also choose to read the full friendship matrix and compute the strength of connection matrix
in Stage 2. For marking purposes, you should still follow the output requirement of each stage.
Given the sample input above, the output of this stage is shown in the next page (using %4.2f for output
formatting). As the output shows, the strength of connection matrix is also symmetric. For example, the
3
element at row 0, column 1 (that is, s o c(u0; u1)) and the element at row 1, column 0 (that is, s o c(u1; u0))
are the same, which are both 0.5 as calculated above.
Stage 3
==========
0.00 0.50 0.40 0.50 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.50 0.00 0.40 0.50 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.40 0.40 0.00 0.40 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.50 0.50 0.40 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.17 0.33 0.14 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.17 0.00 0.40 0.00 0.17 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.33 0.40 0.00 0.33 0.33 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.14 0.00 0.33 0.00 0.14 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.17 0.33 0.14 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
3.4 Stage 4: Detect Communities and Topics of Interest (Up to 20 Marks)
Stage 4.1 (3 marks)
Add code to the stage_four function to read the strength of connection threshold ths and the core user
threshold thc.
We say that two users are close friends if their strength of connection is greater than ths. For example,
when ths = 0.3, u0 and u1 (or u2, u3) are close friends, as their strength of connection 0.5 is greater than
0.3 as shown in the Stage 3 output, while u5 and u6 are not close friends as s o c(u5; u6) = 0:17 < 0:3.
Further, we say that a user is a core user if the user has more than thc number of close friends. Counting
the number of close friends of a user ui is simple – just count the number of elements at row i of the
strength of connection matrix that are greater than ths. In the Stage 3 output matrix shown above, row 0
has 3 elements greater than ths = 0.3, and so u0 has 3 close friends which is greater than thc = 2. Thus,
u0 is a core user. In total, the sample input (as plotted in Figure 1) has four core users: u0, u1, u2, u3, and u7.
Write a function that takes a user (or user number) as its input and test whether the user is a core user.
Each core user forms a community together with the close friends of the user. For example, u0 forms a
community with u1, u2, and u3; u1 forms a community with u0, u2, and u3; u7 forms a community with
u5, u6, u8, and u9. Technically, when two communities have shared users, they should be merged together
and considered as a single community. For simplicity, in this assignment, we omit this merging process and
just treat each community formed by a different core user separately.
Now, add code to the stage_four function to go through each user and check if the user is a core user, by
calling the core user testing function you just wrote. If so, compute and output the close friends of the user,
in ascending (increasing) order of the user numbers.
Stage 4.2 (for a challenge, 2 marks)
Further compute and output all unique hashtags of the users in the community form by each core user,
sorted in alphabetical order (5 hashtags per line).
To compute the unique hashtags, modify the given linked list code and the insert_unique_in_order function to insert the unique hashtags into the list and maintain the list nodes in alphabetical order. Then, you
just need to go through the list and print out the data (that is, a hashtag) in each node, by adding code to
the print_list function. This step aims to test your learning of linked lists. There may be more efficient
solutions for this step, which you can implement but should not be included in your assignment submission.
At the end of your submission file, you need to add a comment that states the time complexity
of your algorithm for Stage 4.2, and explains why it has that time complexity, assuming that
there are U users, C core users, H hashtags per user, and a maximum length of T per hashtag.
4
For marking purposes, you must adapt the linked list implementation given in the skeleton code to form your
solution instead of starting from scratch. The output for this stage given the sample input is shown below:
Stage 4
==========
Stage 4.1. Core user: u0; close friends: u1 u2 u3
Stage 4.2. Hashtags:
#dinner #foodies #foodiesofinstagram #foodlover #fresh
#keyfooddeli #local #supportsmallbusiness #togo #yummy
Stage 4.1. Core user: u1; close friends: u0 u2 u3
Stage 4.2. Hashtags:
#dinner #foodies #foodiesofinstagram #foodlover #fresh
#keyfooddeli #local #supportsmallbusiness #togo #yummy
Stage 4.1. Core user: u2; close friends: u0 u1 u3
Stage 4.2. Hashtags:
#dinner #foodies #foodiesofinstagram #foodlover #fresh
#keyfooddeli #local #supportsmallbusiness #togo #yummy
Stage 4.1. Core user: u3; close friends: u0 u1 u2
Stage 4.2. Hashtags:
#dinner #foodies #foodiesofinstagram #foodlover #fresh
#keyfooddeli #local #supportsmallbusiness #togo #yummy
Stage 4.1. Core user: u7; close friends: u5 u6 u8 u9
Stage 4.2. Hashtags:
#afl #aflfinals #aflw #aussie #aussierules
#aussierulesfootball #football #footy #mcg #melb
#melbournedemons #melbournefc #nfl #richmondfc #richmondtigers
#sport #syd #sydneyswans #tas
4 Submission and Assessment
This assignment is worth 20% of the final mark (5% per stage). A detailed marking scheme will be provided
on Canvas.
Submitting your code. You should put all your code for the assignment into a single file named
program.c. To submit your code, you will need to: (1) Log in to Canvas LMS subject site, (2) Navigate to Assignment 2″ in the Assignments” page, (3) Click on Load Assignment 2 in a new window”,
and (4) follow the instructions on the Gradescope Assignment 2″ page and click on the Submit” link to
make a submission. You can submit as many times as you want to. Only the last submission made before the
deadline will be marked. Submissions made after the deadline will be marked with late penalties as detailed
at the end of this document. Do not submit after the deadline unless a late submission is intended. Two
hidden tests will be run for marking purposes. Results of these tests will be released after the marking is done.
You can (and should) submit both early and often { to check that your program compiles correctly on
our test system, which may have some different characteristics to your own machines.
Testing on your own computer. You will be given a sample test file test0.txt and the sample output
test0-output.txt. You can test your code on your own machine with the following command and compare
the output with test0-output.txt:
mac: ./program < test0.txt /* Here ‘<’ feeds the data from test0.txt into program */
Note that we are using the following command to compile your code on the submission testing system (we
name the source code file program.c).
gcc -Wall -std=c17 -o program program.c -lm
The flag -std=c17” enables the compiler to use a modern standard of the C language { C17. To ensure
that your submission works properly on the submission system, you should use this command to compile
your code on your local machine as well.
5
You may discuss your work with others, but what gets typed into your program must be individual work,
not from anyone else. Do not give (hard or soft) copies of your work to anyone else; do not lend” your
memory stick to others; and do not ask others to give you their programs just so that I can take a look
and get some ideas, I won’t copy, honest”. The best way to help your friends in this regard is to say a
very firm no” when they ask for a copy of, or to see, your program, pointing out that your no”, and their
acceptance of that decision, is the only thing that will preserve your friendship. A sophisticated program that
undertakes deep structural analysis of C code identifying regions of similarity will be run over all submissions
in compare every pair” mode. See https://academichonesty.unimelb.edu.au for more information.
Deadline: Programs not submitted by 4pm Friday 26 May 2023 will lose penalty marks at the rate of
3 marks per day or part day late. Late submissions after 4pm Monday 29 May 2023 will not be accepted.
Students seeking extensions for medical or other outside my control” reasons should email the lecturer at
[email protected]. If you attend a GP or other health care professional as a result of illness,
be sure to take a Health Professional Report (HRP) form with you (get it from the Special Consideration
section of the Student Portal), you will need this form to be filled out if your illness develops into something
that later requires a Special Consideration application to be lodged. You should scan the HPR form and
send it in connection with any non-Special Consideration assignment extension requests.
And remember, Algorithms are fun!
c 2023 The University of Melbourne
Prepared by Jianzhong Qi
6