算法分析代写 | CSC263 Winter 2021 Problem Set 2

这个作业是设计一个数据结构并完成相应的操作算法分析
CSC263 Winter 2021 Problem Set 2

Question 1
In this question you will design a data structure to support the following ADT.
Data A collection of Piazza posts for a course where each post has at least the following fields 1
• postID: a unique integer identifying a post within the course
• date: the date when the post was initially created 2
• views: an integer giving the total number of times this post has been viewed
Operations Each of these operations must run in worst-case O log(n) time where n is the number of Piazza
posts in the collection.
• Insert(postID, date, views) Add an item with these values to the collection. If an
item with postID already exists in the collection, calling Insert updates the views but
does not change the date.
• Delete(postID) Remove this item from the collection.
• Search(postID) Return the information in the collection about this post.
• MaxViewedAfter(earliest date) Of all posts on or after earliest date, return
the postID of the one that has the most views. If multiple posts meet the date criteria
and are tied for the most views, return any one of them.
(a) Describe the data structure you will use to implement this ADT. If your solution uses an augmentation of one or more standard data structures, be clear about any extra information that
will be stored.
(b) Draw a diagram of your data structure with the following values already inserted (in any order
you wish).
(c) Describe the algorithm for Insert and justify that it runs in O(log(n)) time.
(d) Describe the algorithm for Delete and justify that it runs in O(log(n)) time.
(e) Describe the algorithm for Search and justify that it runs in O(log(n)) time.
(f) Provide pseudocode for MaxViewedAfter and justify that it runs in O(log(n)) time.
Question 2
You are working on a project modelling COVID-19 positive test counts. You have an array A of
two-element tuples where the first element is a date and the second element is the number of positive
tests on that date. The array is sorted in date order with earlier dates first.
You have been asked to compute a new array as follows. Each output element i corresponds to
input element i. It holds the number of days from input day i until the future date in the input array
that has the closest test count to the count on day i without being over the count on day i. If there
is no future date with a test count lower than a given day, that element in the output array should be
set to 0.
Here is an example input array:
[(2021/01/01, 3363), (2021/01/02, 2964), (2021/01/07, 4249),
(2021/01/11, 2903), (2021/01/15, 3056), (2021/01/16, 3422),
(2021/02/01, 1172), (2021/02/04, 1670), (2021/02/09, 1072)]
and the corresponding expected output:
[14, 9, 9, 24, 20, 19, 8, 5, 0]
In your algorithm you may use the statement A[i].date – A[j].date to determine the number
of days between tuples A[i] and A[j]. This is an O(1) call.
(a) The obvious naive algorithm takes O(n
2
) time. Describe this algorithm.
(b) Design an algorithm and supporting data structure to implement this task in asymptotically less
time. Describe the algorithm including any data structures it uses.
(c) State and justify the worst-case complexity of your algorithm.
Question 3
Envision an AI algorithm which tracks student engagement minute by minute during Zoom lectures.
This is hypothetical of course, though it is not too far fetched to have such feature in the near future.
Engagement is measured by a complicated function that takes into account the activity, the relevance
of chat messages to course content, the instructor’s voice tone, poll response rate, number of attendees,
and the facial expressions of those with video on. The function finds the engagement score over a time
period (not at a single point in time) and returns a value between 0 and 100 for each time period.
Each time period is represented as an integer t rather than an interval or tuple. For example, t1
refers to the first, i.e. earliest, time period measured in that lecture (which could be minute 1, the
first 120 seconds, from 00:00:00 to 00:02:59, or the first half of the lecture). Analogously, t2 would
be the second time period measured in that lecture: minute 2, the second 120 seconds, from 00:03:00
to 00:04:59, or the second half of the lecture. Since a lecture is made up of a bunch of time periods,
we add the subscript i to indicate an ordering relationship between periods. A relationship between
ti and tj where i < j indicates that time period ti is temporally earlier than tj and there might be other periods between them. You do not need to worry about the specific units of the time periods, just that the periods are consecutive, are represented as integers, do not overlap, and are measured in a consistent way within a single instance of the ADT. We wish to design the implementation of a data-structure that supports the following operations: • Engagement(L,t): Given a single time period t in L, return the engagement score for that time period. L is the ADT storing the time periods (and any other necessary data) for a single lecture. • AverageEngagement(L,ti,tj): Given two time periods ti and tj from L where i ≤ j, return the average engagement score (per time period) for that interval. This average should include both endpoint time periods in the interval if i 6= j. For example, if t1 has engagement score 50, t2 has engagement score 67, t3 has 80, and t4 has 40, then the average engagement over the interval t2 to t4 is 62.33. • Update(L,t,e): Update the engagement score for time period t to e. You can assume that t is already in L. Your implementation will be based on an augmented AVL tree and you should make the following assumptions: 1. n is the number of time periods in a zoom lecture. A lecture is broken up into a collection of time periods t1, t2, …., tn, all stored in L. Your algorithm must work for any size n. Recall that what t means as a time period will always be consistent within a single ADT. 2. A tree L already exists for each lecture. (L.root is the root of the AVL tree.) 3. A node already exists for every time period ti in L where 1 ≤ i ≤ n. Each node has the associated engagement score so you do not need to implement or use Insert. 4. The nodes in L have the in-order traversal t1, t2, …., tn. Here is what you need to do. (a) Describe any additional information that you will need to store in your AVL tree to implement the ADT operations in worst-case running time of O log(n). To get full marks, your data structure must not waste space by storing unncessary data. (b) – (d) For each of the three operations (Engagement, AverageEngagement, and Update), explain the algorithm used to implement the operation and justify that the operation runs in the required worst-case time. For AverageEngagement, provide pseudocode as part of your explanation. (e) Assume that we drop assumption 3. That means L could have some of the ti for 1 ≤ i ≤ n but not necessarily all. For example, if L only has the following time periods: t2 with engagement score 67, t4 with engagement score 40, t5 with engagement score 50, and t7 with engagement score 80, then the average engagement for the interval t2 to t7 is 59.25. The average here is the sum of period scores that we have in the interval / the number of periods (again, that we have in the interval). There is no need for interpolation or estimating the missing data. Describe briefly what you would have to do and what extra information you have to store so that AverageEngagement will calculate the average. Describe any changes you would make to the algorithm for AverageEngagement. You do not need to completely rewrite the pseudocode.