大数据代写 | COSC2636/2632 Big Data Management

这个作业是给定一组广告牌(每个都有影响力和成本)和一个预算,在预算内找到一组广告牌,它们的数量最大
COSC2636/2632 Big Data Management
Assignment 3
Assessment Type: Individual assignment; no group work. Submit online via Canvas→Assignments→Assignment 3. Marks
awarded for meeting requirements as closely as possible. Clarifications/updates may be made via announcements.
Due date: 23:59pm, Sunday of Week 11, 24/5/2020; Deadlines will not be advanced but they may be extended.
Please check Canvas→Syllabus or via Canvas→Assignments→Assignment 3 for the most up to date information.
As this is a major assignment in which you demonstrate your understanding, a university standard late penalty of
10% per each working day applies for up to 5 working days late, unless special consideration has been granted.
Weighting: 39 marks
1. Overview
Problem Description:
Given a set of billboards (each with an influence and a cost), and a budget, find a set of billboards within the budget, which have the maximum
total influence.
Step-by-Step Guidance to complete this assignment:
a. Read the paper “Trajectory-driven Influential Billboard Placement” which has been introduced as part of the lecture.
i. Sections 1-4 of the paper.
ii. Understand the EnumSel Algorithm (Algorithm 2).
iii. Understand the PartSel Algorithm (Algorithm 3).
b. Have a look at the Billboard dataset (available at /dataset/dataset.txt).
c. Understand how to find the local solution for each cluster
i. Implement Algorithm 1 (i.e., GreedySel) first.
ii. Enumerate the candidate sets where the cardinality is smaller than 4 (Algorithm 2, line 2.6).
iii. Based on each candidate set, calling Algorithm 1 to fill up each candidate set until reach the budget constraint (Algorithm 2, lines 2.7-
2.9).
iv. Output candidate sets as local solution.
d. Understand how to find the global solution based on local solutions
i. Calculate the influence 𝛏 [𝐢][𝐥𝐢
] of the local solution for a given cluster 𝐂𝐢 and a budget 𝐥𝐢
(Algorithm 3, line 3.7). (In order to simplify this
assignment, we provide you the billboard object with influence and cost. Therefore, you do not need to calculate 𝑝𝑟(𝑏𝑖
,𝑡𝑗).)
ii. Using a dynamic programming approach to construct the global solution based on the influence of location solutions 𝛏 [𝐢][𝐥𝐢
]
maintained by different clusters (Algorithm3, lines 3.8-3.9).
e. Complete three algorithms (i.e., GreedySel.java, EnumSel.java, and PartSel.java)
Besides the specification, we provide the Ass3.zip that includes the java program that you need to complete, and a readme.pdf file that explains
the structure of the java program. Before you start to implement your codes, please read the Readme.pdf carefully.
What you will learn:
You need to understand the basic greedy algorithm, and the dynamic programming algorithm. You need to understand why we first find the local
solution from each cluster, then generate the global solution from each local solution. You need to think about the how the different size of
cluster affects the efficiency and effectiveness.
2. Marking Criteria
We will evaluate your program in terms of its correctness and efficiency on our test cases. We have prepared seven test datasets: two for
GreedySel; two for EnumSel; three for PartSel.
1. Correctness is worth 21 marks.
1. For testing GreedySel, we test two datasets, each of them is worth 1 mark.
2. For testing EnumSel, we test two datasets, each of them is worth 2 marks.
3. For testing PartSel, we test three datasets, each of them is worth 5 marks.
,
2. The efficiency is worth 6 marks, 2 marks for each algorithm. We will not evaluate the efficiency of one algorithm that fails to pass all
corresponding test datasets. For example, if GreedySel does not pass all two tests, the efficiency of GreedySel will not be tested. For the
efficiency evaluation, the mark is calculated based on the following conditions:
1. X/Y<= 1.5, efficiency marks=2.
2. 1.5<X/Y< 3, efficiency marks=1.
3. 3<X/Y, efficiency marks=0.
where X is the average running time (in seconds) of your program and Y is the one of our programs. For each algorithm, among all tests
(two for GreedySel; two for EnumSel; three for PartSel), we evaluate the efficiency based on the worst case respectively.
3. At last, you are required to answer five questions in the report document. They are worth 12 marks. You can find the five questions at
the last of this document.
Note. No marks will be given if you change the function “main()” (e.g., output format and default parameter setting.).
Note: Please ensure that you have read section 1 and 2 of this document before going further.
3. Learning Outcomes
This assessment is relevant to the following Learning Outcomes:
1. Devise solutions to simple computing problems under specific requirements
2. Encode the devised solutions into computer programs and test the programs on a computer
3. Demonstrate understanding of standard coding conventions and ethical considerations in programming.
4. Referencing guidelines
What: This is an individual assignment, and all submitted contents must be your own (except for the main body of the code provided if any). If
you have used sources of information other than that, you must give acknowledge the sources and give references using IEEE referencing style.
Where: Add a code comment near the work to be referenced and include the reference in the IEEE style.
How: To generate a valid IEEE style reference, please use the citethisforme tool if unfamiliar with this style. Add the detailed reference before
any relevant code (within code comments).
5. Submission format
Submit a zip file via Canvas→Assignments→Assignment 3. The zip file should contain two parts:
(1) the report (using the naming convention StudentNumber_A3.pdf (e.g., s1234567_A3.pdf)), and
(2) one zip file consists of the whole “Ass3” folder.
Naming convention: StudentNumber_A3.zip (e.g., s1234567_A3.zip).
It is the responsibility of the student to correctly submit their files. Please verify that your submission is correctly submitted by downloading what
you have submitted to see if the files include the correct contents.
6. Academic integrity and plagiarism (standard warning)
Academic integrity is about honest presentation of your academic work. It means acknowledging the work of others while developing your own
insights, knowledge and ideas. You should take extreme care that you have:
• Acknowledged words, data, diagrams, models, frameworks and/or ideas of others you have quoted (i.e. directly copied), summarised,
paraphrased, discussed or mentioned in your assessment through the appropriate referencing methods,
• Provided a reference list of the publication details so your reader can locate the source if necessary. This includes material taken from
Internet sites.
If you do not acknowledge the sources of your material, you may be accused of plagiarism because you have passed off the work and
ideas of another person without appropriate referencing, as if they were your own.
RMIT University treats plagiarism as a very serious offence constituting misconduct. Plagiarism covers a variety of inappropriate
behaviours, including:
• Failure to properly document a source
• Copyright material from the internet or databases
,
• Collusion between students
For further information on our policies and procedures, please refer to the University website.
7. Assessment declaration
When you submit work electronically, you agree to the assessment declaration.
8. Rubric/assessment criteria for marking
Please refer to section 4 for details. The marking is based on the correctness of your implemented algorithm, as well as the efficiency upon the
correctness of the results obtained.
9. Report questions
Answer the following questions (12 marks).
• Q1 (2 marks). What is the advantage of EnumSel, as compared to GreedySel? What is the advantage of PartSel, as compared to EnumSel?
• Q2 (2 marks). Does the naive greedy hold any theoretical guarantee on the approximation ratio of the result? If yes, what is the
theoretical guarantee? If no, please explain the reason.
• Q3 (2 marks). Let 𝑝𝑟(𝑏𝑖
,𝑡𝑗) denote the influence of a billboard 𝑏𝑖
to a trajectory 𝑡𝑗
, let 𝑝𝑟(𝑆,𝑡𝑗) denote the influence of a billboard set 𝑆
to a trajectory 𝑡𝑗
, can we compute 𝑝𝑟(𝑆,𝑡𝑗) as ∑ 𝑝𝑟(𝑏𝑖
,𝑡 𝑏 𝑗)
𝑖∈𝑆 ? If not, how can we get 𝑝𝑟(𝑆,𝑡𝑗)?
• Q4 (3 marks). In the paper, how does the 𝜃 − 𝑝𝑎𝑟𝑖𝑡𝑖𝑡𝑖𝑜𝑛 affect the efficiency and effectiveness of PartSel? What if 𝜃 is too large or too
small (e.g., 𝜃 = 0 𝑜𝑟 1)?
• Q5 (3 marks). In the paper, θ controls the size of clusters, which affects the efficiency and effectiveness of PartSel. Will the size of clusters
also affect the efficiency and effectiveness in your experimental results? If yes, how does it works? If no, why does it not work?