Python data visualisation | COMP20008 Elements of Data Processing Assignment

This assignment mainly involves using Python to analyse and visualise COVID-19 pandemic data.

COMP20008 Elements of Data Processing Assignment 1

Due date March 3, 2021

The assignment is worth 20 marks (20% of the subject grade) and is due at 8:00am on Thursday 1st April 2021, Australia/Melbourne time.

Background Learning outcomes

The learning objectives of this assignment are to:

documentation from Web resources.

Your tasks

There are three parts in this assignment, Part A, Part B, and Part C. Part A and Part B are worth 9 marks each and Part C is worth 2 marks.

Part A (Total 9 marks)
For Part A, download the complete “Our World in Data COVID-19 dataset” (“owid-covid-

Part A Task 1 Data pre-processing (3 marks)

Program in Python to produce a dataframe by aggregating the values of the following variables:

• total cases
• new cases
• total deaths
• new deaths

by month and location in the year 2020.
The dataframe should contain the following columns after completion of this sub-task:

• location
• month
• total cases
• new cases
• total deaths
• new deaths

Note: if there are no entries for certain combinations of locations and months, there should be no entry for those combinations in the dataframe.

The final dataframe should contain the columns in the following order:

• location
• month
• case fatality rate
• total cases
• new cases
• total deaths
• new deaths

and the rows are to be sorted by location and month in ascending order.

COMP20008 2021 SM1

Print the first 5 rows of the final dataframe to the standard output.
Save the new dataframe to a CSV file named “owid-covid-data-2020-monthly.csv” in the same directory as the Python program. Your program should be called from the command line as follows:

Hint: You will need to use appropriate functions for the aggregation based on your understanding of the variables.
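
As a rough illustration of the aggregation step, the sketch below uses pandas groupby. It assumes the dataset's columns are named location, date, total_cases, new_cases, total_deaths and new_deaths. It also assumes that the cumulative totals are best summarised by their monthly maximum and the daily counts by their monthly sum; both choices are assumptions rather than requirements stated in the task. The function names monthly_summary and print_and_save are hypothetical, and the case fatality rate column is not computed here.

```python
import pandas as pd


def monthly_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (location, month) for the year 2020."""
    dates = pd.to_datetime(df["date"])
    # Keep only 2020 rows and attach the month number (1-12).
    in_2020 = df.loc[dates.dt.year == 2020].assign(month=dates.dt.month)

    # Assumption: total_* columns are cumulative, so the largest value in the
    # month is used; new_* columns are daily counts, so they are summed.
    monthly = (
        in_2020.groupby(["location", "month"], as_index=False)
        .agg(
            total_cases=("total_cases", "max"),
            new_cases=("new_cases", "sum"),
            total_deaths=("total_deaths", "max"),
            new_deaths=("new_deaths", "sum"),
        )
        .sort_values(["location", "month"])  # ascending by default
        .reset_index(drop=True)
    )
    return monthly


def print_and_save(monthly, out_path="owid-covid-data-2020-monthly.csv"):
    """Print the first 5 rows and write the dataframe to a CSV file."""
    print(monthly.head(5))
    monthly.to_csv(out_path, index=False)
```

Because groupby only creates groups for combinations that actually occur in the data, locations with no entries in a given month produce no row in the result.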

Part A Task 2 Visualisation (2 marks)

Program in Python to produce two scatter plots:

Your program should be called from the command line as follows:

Part A Task 3 Discussion and visual analysis (4 marks)
Write a short report of your visual analysis of the two plots produced in Task 2.

It is expected that the visual analysis would include:

The report is to be 500 – 600 (maximum) words excluding figures, about 1 page, in PDF format, and must include the two plots, scatter-a.png and scatter-b.png, produced in Part A Task 2.
The filename of the report must be “owid-covid-2020-visual-analysis.pdf”.

Part B Task 1: Regular Expressions (1 mark)

Each article contains a document ID which uniquely identifies the document. This document ID consists of four letters followed by a hyphen, then three digits, and optionally ends in a letter. For example, each of the following is a valid document ID:

• ABCD-123
• ABCD-123V
• XKCD-999A
• COMP-200
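
One possible pattern for this format is sketched below using Python's re module. It assumes the letters are always uppercase A-Z, as in the examples, and uses word boundaries so that longer strings such as ABCD-1234 are not partially matched; neither assumption is stated in the task. The names DOC_ID and find_document_id are hypothetical.

```python
import re

# Four uppercase letters, a hyphen, three digits, then an optional letter.
# Assumption: letters are uppercase A-Z, as in all of the examples.
DOC_ID = re.compile(r"\b[A-Z]{4}-[0-9]{3}[A-Z]?\b")


def find_document_id(text):
    """Return the first document ID found in text, or None if there is none."""
    match = DOC_ID.search(text)
    return match.group(0) if match else None
```

For instance, find_document_id("Match report XKCD-999A filed") picks out XKCD-999A regardless of where the ID sits in the text.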

The document IDs are not located in a consistent place in each article. Use a regular expression to identify the document ID for each document in the dataset. Write a Python program in partb1.py that produces a CSV file called partb1.csv containing the filenames and document IDs for each document in the dataset. Your CSV file should contain the following columns in the order below:

• filename
• documentID

Your program should be called from the command line along with the name of the CSV file:

python partb1.py partb1.csv
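
A minimal sketch of how partb1.py might be organised follows, using only the standard library. The folder name "cricket", the uppercase-only pattern, the empty string written when no ID is found, and the function names write_document_ids and main are all assumptions made for illustration.

```python
import csv
import os
import re
import sys

# Assumption: letters in the ID are uppercase A-Z.
DOC_ID = re.compile(r"\b[A-Z]{4}-[0-9]{3}[A-Z]?\b")


def write_document_ids(folder, csv_path):
    """Write one (filename, documentID) row per file in folder."""
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["filename", "documentID"])
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            if not os.path.isfile(path):
                continue  # skip sub-directories
            with open(path, encoding="utf-8", errors="replace") as f:
                match = DOC_ID.search(f.read())
            # Assumption: leave the ID blank when no match is found.
            writer.writerow([name, match.group(0) if match else ""])


def main(argv):
    # argv[1] is the output CSV name, e.g. partb1.csv.
    # Assumption: the articles live in a folder named "cricket".
    write_document_ids("cricket", argv[1])
```

To run it as python partb1.py partb1.csv, main(sys.argv) would be called under an if __name__ == "__main__": guard at the bottom of the file.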

Part B Task 2: Preprocessing (1 mark)
We now wish to perform the following preprocessing on each article in the cricket folder in order to make them easier to search:

Create a Python program in partb2.py that performs this preprocessing.
Your program should be called from the command line along with the filename of a document. For example:

Your program should then load the specified file, perform the preprocessing steps above and print the results to standard output.

Hint: You may wish to create a function for performing this preprocessing, as you will need to perform it as part of each task in Part B.
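
The hint suggests a layout along the following lines: one reusable function that every Part B program can import, plus a thin command-line wrapper. This is only a sketch. The body of preprocess uses lowercasing and whitespace collapsing purely as stand-in steps, which are assumptions and should be replaced by the steps the task actually specifies; the names preprocess and main are hypothetical.

```python
import re
import sys


def preprocess(text):
    """Placeholder preprocessing, reusable from every Part B program."""
    # Stand-in steps only (NOT taken from the task description):
    # lowercase the text and collapse runs of whitespace to single spaces.
    return re.sub(r"\s+", " ", text.lower()).strip()


def main(argv):
    # argv[1] is the document filename given on the command line.
    with open(argv[1], encoding="utf-8", errors="replace") as f:
        print(preprocess(f.read()))
```

Keeping the logic in a single function means the later tasks can call preprocess directly instead of duplicating the steps.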