Python Data Processing Assignment Writing | COMP20008 2021 SM1 Assignment

This Australian coursework job is a Python data processing assignment.

The starting repository contains the following files:
main.py: This file contains code to verify your answers. You must not edit anything
in main.py.

assignment1.py: This file contains one function for each task in the assignment. You
should fill in the relevant function to complete the task. You may choose to create
additional functions to segment your code, but all the code you write must be
contained in this file.

data/data.json: This file contains details of recent soccer matches in the English
Premier League, which you will need in order to complete your assignment.

data/football: This folder contains a number of news articles about soccer matches
in the English Premier League. You will need to load these files in order to complete
your assignment.

Write a function task1() that loads the data/data.json file into Python. Your function
should return a list of team codes, sorted in alphabetical order by team code.
You can test your implementation with the following command: python main.py task1
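
A minimal sketch of the Task 1 logic follows. The layout of data.json is not described in this section, so the key name "teams_codes" and the helper names sorted_team_codes() and load_json() are assumptions made for illustration only; inspect the real file before relying on them.

```python
import json


def sorted_team_codes(data):
    """Return the team codes from the parsed JSON in alphabetical order."""
    # "teams_codes" is an assumed key name, not confirmed by the assignment text.
    return sorted(data["teams_codes"])


def load_json(path="data/data.json"):
    """Load the assignment data file into a Python object."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Inline stand-in for the real file so the sketch runs on its own.
sample = json.loads('{"teams_codes": ["MUN", "ARS", "LIV"]}')
print(sorted_team_codes(sample))  # ['ARS', 'LIV', 'MUN']
```

In the assignment itself, task1() would presumably return sorted_team_codes(load_json()).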

Write a function task2() that uses the information contained in the clubs objects to work out
how many goals were scored by and against each team in total throughout the season. Your
function should output this information to a csv file called task2.csv. Your csv file should
contain the following headings: team code, goals scored by team, goals scored against team.
Each row in the file should contain the details for one team, sorted in alphabetical order by
team code.

You can test your implementation with the following command: python main.py task2
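
The CSV output for Task 2 could be sketched as below. The field names "club_code", "goals_scored" and "goals_conceded" are hypothetical stand-ins for whatever the clubs objects actually contain, and the sketch assumes each object already holds season totals. The headings are copied as they appear in the task description.

```python
import csv
import io

HEADINGS = ["team code", "goals scored by team", "goals scored against team"]


def write_goal_totals(clubs, out):
    """Write one row per club to `out`, sorted by team code."""
    writer = csv.writer(out)
    writer.writerow(HEADINGS)
    # The three field names below are assumptions about the JSON layout.
    for club in sorted(clubs, key=lambda c: c["club_code"]):
        writer.writerow(
            [club["club_code"], club["goals_scored"], club["goals_conceded"]]
        )


# Made-up sample values, used only to show the output shape.
sample_clubs = [
    {"club_code": "MUN", "goals_scored": 3, "goals_conceded": 1},
    {"club_code": "ARS", "goals_scored": 2, "goals_conceded": 4},
]
buffer = io.StringIO()
write_goal_totals(sample_clubs, buffer)
print(buffer.getvalue())
```

To write the real file, pass open("task2.csv", "w", newline="") instead of the in-memory buffer; newline="" stops the csv module's line endings from being translated a second time.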

In addition to the information contained in the data.json file, we also have a number of
news articles written about soccer matches. Each article is located in a separate text file in
the data/football folder. For this task we will assume that each article is written about a
match. Write a function task3() to extract the largest match score identified in each article.
Add the number of goals scored by each side together to produce the total number of goals
scored in the match.

For example, if the largest match score mentioned in an article is 14-6, your program
should calculate 20 as the total number of goals. For this task we define the largest match
score as the one with the highest total number of goals, so a score of 14-6 is considered larger
than a score of 16-2.
If a suitable score cannot be found in the article, your function should return 0 as the
total number of goals for that article. You will need to use regular expressions to accomplish
this.

Your function should produce a csv file containing the filename and the total number of
goals for each article. Your csv file should contain two columns, filename and total goals.
Each row in the file should contain the details for one article, sorted in ascending alphabetical
order by filename. Save this file as task3.csv.

You can test your implementation with the following command: python main.py task3
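
One possible shape for the Task 3 score extraction is sketched below. This section does not define what counts as a "suitable score", so the pattern is an assumption: one or two digits on each side of a hyphen, not embedded in a longer run of digits or hyphens. The names SCORE_PATTERN, largest_total_goals() and task3_rows() are illustrative.

```python
import re

# Assumed score format: 1-2 digits, a hyphen, 1-2 digits. The lookarounds
# reject matches inside longer tokens, so "2019-20" is not read as a score.
SCORE_PATTERN = re.compile(r"(?<![\d-])(\d{1,2})-(\d{1,2})(?![\d-])")


def largest_total_goals(text):
    """Return the highest combined score found in `text`, or 0 if none."""
    totals = [int(a) + int(b) for a, b in SCORE_PATTERN.findall(text)]
    return max(totals, default=0)


def task3_rows(articles):
    """Map {filename: article text} to (filename, total goals) rows,
    sorted by filename."""
    return [(name, largest_total_goals(articles[name])) for name in sorted(articles)]


# Made-up article text, used only to exercise the functions.
sample_articles = {
    "002.txt": "No score is mentioned here.",
    "001.txt": "They led 16-2 early but it finished 14-6.",
}
print(task3_rows(sample_articles))  # [('001.txt', 20), ('002.txt', 0)]
```

The rows can then be written to task3.csv with csv.writer, after a header row of filename and total goals.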

We now wish to understand whether there are outliers present in the number of goals we
calculated in Task 3. Write a function task4() that produces a boxplot showing the
distribution of values for total goals. Any values more than 1.5 interquartile ranges above Q3
should be identified as outliers on the plot. This boxplot should be saved as task4.png.
For all tasks involving visualisations, you should ensure that your plots contain a title and
labels for all relevant axes.

You can test your implementation with the following command: python main.py task4
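
The outlier rule in Task 4 can be checked by hand with a short calculation. The sketch below uses only the standard library and does not draw the plot. Its quartile convention (the "inclusive" method of statistics.quantiles) is an assumption, and a plotting library may compute quartiles slightly differently. matplotlib's boxplot, for instance, places whiskers at 1.5 interquartile ranges by default through its whis parameter.

```python
import statistics


def upper_fence(values):
    """Return Q3 + 1.5 * IQR, the threshold above which a value is an outlier."""
    # "inclusive" is one of several quartile conventions; chosen here as an assumption.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)


def upper_outliers(values):
    """Return the values lying strictly above the upper fence, sorted."""
    fence = upper_fence(values)
    return sorted(v for v in values if v > fence)


# Made-up totals: Q1 = 2.25, Q3 = 4.0, IQR = 1.75, fence = 6.625.
sample_totals = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12]
print(upper_fence(sample_totals))     # 6.625
print(upper_outliers(sample_totals))  # [12]
```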

We now wish to understand how often each club is mentioned by the media. The data.json
file also contains a list of club names. Write a function task5() that searches through each
of the news articles for mentions of each club and counts the articles in which each club is
mentioned at least once. Your function should produce a csv file containing the club name
and number of mentions for each club. Your csv file should contain the following column
headings: club name and number of mentions. Save this file as task5.csv. Each row in
the file should contain the details for one team, sorted in ascending alphabetical order by club
name. Your function should also produce a bar chart conveying this information, saved as
task5.png.

You can test your implementation with the following command: python main.py task5
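
The counting step in Task 5 might look like the sketch below. It treats a mention as a case-sensitive substring match on the club name, which is an assumption because the task text does not specify the matching rule. The helper names count_article_mentions() and write_mentions() are illustrative.

```python
import csv
import io


def count_article_mentions(club_names, articles):
    """Count, for each club, the articles that mention it at least once.

    `articles` maps filename -> article text. A club named several times
    in one article still contributes only 1 for that article.
    """
    return {
        club: sum(1 for text in articles.values() if club in text)
        for club in club_names
    }


def write_mentions(counts, out):
    """Write the counts as CSV rows sorted by club name."""
    writer = csv.writer(out)
    writer.writerow(["club name", "number of mentions"])
    for club in sorted(counts):
        writer.writerow([club, counts[club]])


# Made-up sample data, used only to exercise the functions.
sample_counts = count_article_mentions(
    ["Chelsea", "Arsenal"],
    {
        "001.txt": "Arsenal beat Chelsea. Arsenal were dominant.",
        "002.txt": "Chelsea drew at home.",
    },
)
buffer = io.StringIO()
write_mentions(sample_counts, buffer)
print(buffer.getvalue())
```

The same counts dictionary could supply the bar heights for task5.png, with club names on one axis and article counts on the other.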