Python Tutoring | Assignment – Analysing use cases
This assignment uses Python to analyse use cases, design machine learning algorithms and evaluate the final design/implementation.
Introduction
This assignment is about analysing use cases, designing machine learning algorithms and
evaluating the resulting design/implementation. This year’s use case is based on the
MediaEval 2015 “verifying multimedia use” task.
Background: The MediaEval 2015 “verifying multimedia use” task aims to test automatic
ways to classify viral social media content propagating fake images or presenting real images
in a false context. After a high-impact event, a lot of controversial
information goes viral on social media, and investigation is needed to debunk it
and decide whether the shared multimedia represents real information. As there is a lack of
publicly accessible tools for assessing the veracity of user-generated content, the task
intends to help news professionals, such as journalists, verify their sources and fulfil the
goals of journalism, which imposes a strict code of faithfulness to reality and objectivity.
The task is to design/build algorithm(s) to classify social media posts within the MediaEval
2015 “verifying multimedia use” challenge dataset as ‘real’ or ‘fake’.
Definition of fake posts:
• Reposting of real multimedia, such as real photos from the past re-posted as being
associated with a current event
• Digitally manipulated multimedia
• Synthetic multimedia, such as artworks or snapshots presented as real imagery
You will evaluate a set of possible machine learning algorithm designs to classify posts within
the MediaEval 2015 “verifying multimedia use” dataset. You will analyse the use case (task
and data) and identify 5 possible algorithm designs. Each algorithm design will include a
choice of pre-processing, feature selection, dimensionality reduction technique(s) and
machine learning algorithm. In addition, you will write a final report that explains your
use case analysis, justifies your 5 algorithm design choices and critically reviews them. This
critical review will identify 3 strengths and 3 weaknesses for each algorithm design, compare
all 5 algorithm designs against each other, and then rank them in order of suitability to the
use case problem (with justifications for the ranking).
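As an illustration of what a single algorithm design could look like, the sketch below chains the four stages (pre-processing, feature selection, dimensionality reduction, machine learning algorithm) into one pipeline. It assumes scikit-learn, which the brief does not mandate. TF-IDF, chi-squared selection, truncated SVD and logistic regression are example choices only, the function name build_design is hypothetical, and the toy posts are made up rather than taken from the dataset.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def build_design(n_selected="all", n_components=2):
    """Return one candidate design as a scikit-learn Pipeline.

    The stage choices are illustrative assumptions, not requirements.
    """
    return Pipeline([
        # Pre-processing + feature extraction: lower-case, word uni/bigrams.
        ("tfidf", TfidfVectorizer(lowercase=True, ngram_range=(1, 2))),
        # Feature selection: keep the k terms most associated with the label.
        ("select", SelectKBest(chi2, k=n_selected)),
        # Dimensionality reduction that works on sparse TF-IDF matrices.
        ("svd", TruncatedSVD(n_components=n_components, random_state=0)),
        # Machine learning algorithm: a simple linear classifier.
        ("clf", LogisticRegression(max_iter=1000)),
    ])


# Made-up posts for illustration only; not taken from the dataset.
texts = [
    "shark swimming on the flooded highway",
    "unbelievable photo of a shark in the street",
    "old picture shared again as if taken today",
    "flooding reported in the city centre this morning",
    "bridge closed after the storm, police confirm",
    "emergency services respond to storm damage",
]
labels = ["fake", "fake", "fake", "real", "real", "real"]

design = build_design()
design.fit(texts, labels)
predictions = design.predict(["photo of a shark on the highway"])
print(predictions)
```

Swapping any single stage (for example, a different vectoriser or classifier) gives another candidate design, which makes it straightforward to compare the 5 designs on equal terms.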
Final report
The final report will be structured as below:
• Introduction and data analysis – Describe the problem being addressed. Provide a detailed
characterization of the task dataset in terms of format, volume, quality and bias.
• Algorithm design – Describe 5 possible algorithm designs, each including pre-processing,
feature selection, dimensionality reduction and a machine learning algorithm. Outline all
choices made when selecting these designs, justifying why they were considered good in the
context of the wider options available and data characteristics.
• Evaluation – Describe for each algorithm design 3 strengths and 3 weaknesses, then
critically compare all 5 algorithm designs against each other using these strengths and
weaknesses. Rank your algorithm designs in order of suitability to the task, and include
justifications for this ranking.
• Conclusion – Summarize your findings and lessons learnt, and suggest some areas for
future improvement.
The PDF report has no page or word limit, but it would normally be 5 to 10
pages long. Use any document style (e.g. reference style) as long as it’s clear and easy to
read. It is strongly suggested that you include a section for each of (a) Introduction and data
analysis, (b) Algorithm design, (c) Evaluation and (d) Conclusion.
You need to explain both your design and the design choices, alternatives considered, and
justifications for each choice in the context of the data and problem characteristics. This may
involve writing software to (a) examine pre-processing and feature selection options,
allowing you to show concrete examples, (b) perform a detailed data characterization and (c)
gain some practical experience with candidate algorithms you might not have used before.
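A minimal sketch of the kind of data characterization software mentioned above is shown here. It uses only the Python standard library and summarises volume, class balance, duplication and text length. The record keys "text" and "label", the function name characterise and the sample posts are all assumptions made for illustration; the real dataset's field names and format may differ.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")


def characterise(posts):
    """Summarise volume, class balance, duplication and text length.

    `posts` is a list of dicts with the hypothetical keys "text" and "label".
    """
    if not posts:
        raise ValueError("posts must not be empty")
    texts = [p["text"] for p in posts]
    labels = Counter(p["label"] for p in posts)
    n = len(posts)
    # Exact duplicates after lower-casing and collapsing whitespace:
    # a crude proxy for reposted content.
    distinct = {" ".join(t.lower().split()) for t in texts}
    return {
        "n_posts": n,
        "label_counts": dict(labels),
        "label_share": {k: v / n for k, v in labels.items()},
        "duplicate_posts": n - len(distinct),
        "posts_with_url": sum(1 for t in texts if URL_RE.search(t)),
        "mean_length_chars": sum(len(t) for t in texts) / n,
    }


# Made-up records for illustration only; not taken from the dataset.
sample_posts = [
    {"text": "Shark swimming on the highway! http://example.com/a", "label": "fake"},
    {"text": "shark swimming on the  highway! http://example.com/a", "label": "fake"},
    {"text": "Flooded street in the city centre this morning", "label": "real"},
    {"text": "Bridge closed after the storm http://example.com/b", "label": "real"},
]

summary = characterise(sample_posts)
print(summary)
```

Figures such as the label share and the number of duplicates can feed directly into the discussion of quality and bias in the data analysis section of the report.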
You are not expected to implement 5 algorithms; you need only analyse the use case and
identify 5 possible algorithm designs with enough evidence to justify your choices.
The marking scheme shows you how marks are allocated to each section.