Python assignment help | COMP9321 Assignment-1

This assignment uses Python to analyze movie data and plot charts.
Assignment-1
The assignment data has been extracted from a Movie dataset on Kaggle, with some
minor modifications to make things interesting. The dataset is split into two CSV files,
credits and movies. Use the datasets to answer the following questions:
Question 1: (based on both datasets) (0.5 Mark)
Join the two datasets based on the “id” columns in the datasets, keeping the rows
as long as there is a match between the id columns of both datasets (do not
concatenate the datasets).
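The join described in Question 1 behaves like an inner join. The following is only a minimal sketch on toy data, not the expected solution; the two small dataframes are hypothetical stand-ins for the real CSV files, and the only assumption carried over from the spec is that both share an “id” column.

```python
import pandas as pd

# Hypothetical stand-ins for movies.csv and credits.csv (the real files have more columns).
movies = pd.DataFrame({"id": [1, 2, 3], "title": ["A", "B", "C"]})
credits = pd.DataFrame({"id": [2, 3, 4], "cast": ["[]", "[]", "[]"]})

# An inner join keeps only the rows whose id appears in both dataframes.
df1 = pd.merge(movies, credits, on="id", how="inner")
print(df1)
```

Rows with id 1 and id 4 are dropped because each appears in only one of the two dataframes.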
Question 2: (based on the dataframe created in Question-1) (0.5 Mark)
Keep the following columns in the resultant dataframe (remove the rest of the columns
from the result dataset):
‘id’, ‘title’, ‘popularity’, ‘cast’, ‘crew’, ‘budget’, ‘genres’, ‘original_language’,
‘production_companies’, ‘production_countries’, ‘release_date’, ‘revenue’, ‘runtime’,
‘spoken_languages’, ‘vote_average’, ‘vote_count’
Question 3: (based on the dataframe created in Question-2) (0.5 Mark)
Set the index of the resultant dataframe as ‘id’.
Question 4: (based on the dataframe created in Question-3) (0.5 Mark)
Drop all rows where the budget is 0.
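Questions 2 to 4 chain three small dataframe operations: column selection, setting the index, and row filtering. A minimal sketch on toy data follows; the dataframe contents and the shortened keep list are hypothetical and only illustrate the pattern.

```python
import pandas as pd

# Toy dataframe; "homepage" stands in for any column that is not on the keep list.
df1 = pd.DataFrame({
    "id": [10, 11, 12],
    "title": ["A", "B", "C"],
    "budget": [100, 0, 250],
    "homepage": ["x", "y", "z"],
})

keep = ["id", "title", "budget"]   # shortened here; the spec lists 16 columns
df2 = df1[keep]                    # Question 2: keep only the listed columns
df3 = df2.set_index("id")          # Question 3: use "id" as the index
df4 = df3[df3["budget"] != 0]      # Question 4: drop rows with a zero budget
print(df4)
```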
Question 5: (based on the dataframe created in Question-4) (1 Mark)
Assume that there is a ranking scheme for movies defined by “(revenue -
budget)/budget”. Add a new column to the dataframe, name it
“success_impact”, and calculate it for each movie based on the given formula.
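The formula can be applied to whole columns at once. This is a sketch on hypothetical numbers, assuming zero-budget rows have already been removed so that the division is safe.

```python
import pandas as pd

# Toy data: the first movie earns 50% more than its budget, the second loses half of it.
df4 = pd.DataFrame(
    {"budget": [100, 200], "revenue": [150, 100]},
    index=pd.Index([1, 2], name="id"),
)

df5 = df4.copy()
# (revenue - budget) / budget, evaluated element-wise for every row
df5["success_impact"] = (df5["revenue"] - df5["budget"]) / df5["budget"]
print(df5)
```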
Question 6: (based on the dataframe created in Question-5) (1 Mark)
Normalize the “popularity” column by scaling it between 0 and 100. The least popular
movie should be 0 and the most popular one must be 100.
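One common way to obtain this range is min-max scaling: subtract the minimum, divide by the range, and multiply by 100. The spec does not name a method, so treat the following as an assumption; the values are hypothetical.

```python
import pandas as pd

df5 = pd.DataFrame({"popularity": [2.0, 6.0, 10.0]})

df6 = df5.copy()
lowest = df6["popularity"].min()
highest = df6["popularity"].max()
# Min-max scaling: the minimum maps to 0 and the maximum maps to 100.
df6["popularity"] = (df6["popularity"] - lowest) / (highest - lowest) * 100
print(df6)
```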
Question 7: (based on the dataframe created in Question-6) (0.5 Mark)
Change the data type of the “popularity” column to int16.
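A type conversion like this can be done with `astype`. Note that converting floats to integers this way truncates the fractional part rather than rounding; the spec does not say which is expected, so this sketch simply shows the default behaviour on hypothetical values.

```python
import pandas as pd

df6 = pd.DataFrame({"popularity": [0.0, 49.7, 100.0]})

df7 = df6.copy()
# astype truncates toward zero, so 49.7 becomes 49 (not 50).
df7["popularity"] = df7["popularity"].astype("int16")
print(df7.dtypes)
```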
Question 8: (based on the dataframe created in Question-7) (1.5 Marks)
Clean the “cast” column by converting the complex values (JSON) to comma-separated
values. The cleaned “cast” column should be a comma-separated list of
characters, sorted alphabetically by name (e.g., Angela, Athena,
Betty, Chester Rush).
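The cleaning step in Question 8 amounts to parsing each cell, extracting one field, sorting, and joining. The sketch below assumes each cell is a valid JSON list of objects with a “character” key and uses “, ” as the separator; both are assumptions about the data, and `clean_cast` is a hypothetical helper name.

```python
import json
import pandas as pd


def clean_cast(raw):
    """Turn a JSON list of cast entries into a sorted, comma-separated string."""
    entries = json.loads(raw)
    characters = sorted(entry["character"] for entry in entries)
    return ", ".join(characters)


df7 = pd.DataFrame({
    "cast": [
        '[{"character": "Betty", "name": "X"}, {"character": "Angela", "name": "Y"}]',
        "[]",
    ]
})

df8 = df7.copy()
df8["cast"] = df8["cast"].apply(clean_cast)
print(df8)
```

If the cells turn out to be Python literals rather than strict JSON (for example, single-quoted strings), `ast.literal_eval` from the template's imports may be the better parser.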
Question 9: (based on the dataframe created in Question-8) (1.5 Marks)
Return a list containing the names of the top 10 movies according to the number
of characters in the movie. The first element in the list should be the movie with the
largest number of characters.
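A ranking like this can be sketched by counting the entries in the cleaned “cast” string and taking the largest counts. The example assumes the cleaned column is comma-separated and that no character name itself contains a comma; `count_characters` and the helper column `n_characters` are hypothetical names.

```python
import pandas as pd

df8 = pd.DataFrame({
    "title": ["A", "B", "C"],
    "cast": ["x, y", "", "a, b, c"],
})


def count_characters(cast):
    """Count comma-separated entries; an empty string means no characters."""
    return len(cast.split(",")) if cast else 0


counts = df8["cast"].apply(count_characters)
# nlargest returns rows ordered from the highest count downwards.
top = df8.assign(n_characters=counts).nlargest(10, "n_characters")
movies = top["title"].tolist()
print(movies)
```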
Question 10: (based on the dataframe created in Question-8) (1 Mark)
Sort the dataframe by the release date (the most recently released movie should
be the first row in the dataframe).
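Sorting on a date column is safest after parsing the strings as dates. The sketch below assumes ISO-style date strings (YYYY-MM-DD); the actual format in the assignment data may differ, in which case `pd.to_datetime` would need a matching `format` argument.

```python
import pandas as pd

df8 = pd.DataFrame(
    {
        "title": ["A", "B", "C"],
        "release_date": ["2009-12-10", "2015-06-09", "1999-03-31"],
    },
    index=pd.Index([1, 2, 3], name="id"),
)

# Parse the strings as dates, sort newest first, and reorder the rows accordingly.
order = pd.to_datetime(df8["release_date"]).sort_values(ascending=False).index
df10 = df8.loc[order]
print(df10)
```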
Question 11: (based on the dataframe created in Question-8) (2 Marks)
– (1.5 Marks) Plot a pie chart showing the distribution of genres in the dataset
(e.g., Family, Drama).
– (0.5 Mark) Show the percentage of each genre in the pie chart. Please note
that the following figure is just a sample and it does not reflect the real values or
the list of all genres in the dataset.
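A genre distribution can be sketched by flattening the genre lists, counting each name, and passing `autopct` to the pie plot to print percentages. The example assumes each “genres” cell is a JSON list of objects with a “name” key, which is an assumption about the data; the rows are hypothetical.

```python
import json
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window pops up
import matplotlib.pyplot as plt
import pandas as pd

df10 = pd.DataFrame({
    "genres": [
        '[{"id": 18, "name": "Drama"}, {"id": 10751, "name": "Family"}]',
        '[{"id": 18, "name": "Drama"}]',
        "[]",
    ]
})

# One list of genre names per movie, then one row per (movie, genre) pair.
names = df10["genres"].apply(lambda raw: [g["name"] for g in json.loads(raw)])
genre_counts = names.explode().dropna().value_counts()

# autopct prints each slice's share as a percentage.
ax = genre_counts.plot.pie(autopct="%1.1f%%")
ax.set_ylabel("")
```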
Question 12: (based on the dataframe created in Question-8) (2 Marks)
– (1.5 Marks) Plot a bar chart of the countries in which movies have been
produced. For each country you need to show the count of movies.
– (0.5 Mark) Countries should be alphabetically sorted according to their names.
Please note that the following figure is just a sample and it does not reflect the
real values or the list of all countries in the dataset.
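The country counts can be sketched the same way as any list-valued column: flatten, count, then sort the index so the bars appear alphabetically. The example assumes each “production_countries” cell is a JSON list of objects with a “name” key, which is an assumption about the data; the rows are hypothetical.

```python
import json
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window pops up
import matplotlib.pyplot as plt
import pandas as pd

df10 = pd.DataFrame({
    "production_countries": [
        '[{"name": "United States of America"}, {"name": "Australia"}]',
        '[{"name": "United States of America"}]',
    ]
})

names = df10["production_countries"].apply(
    lambda raw: [c["name"] for c in json.loads(raw)]
)
# Count movies per country, then sort by country name for the x axis.
country_counts = names.explode().dropna().value_counts().sort_index()

ax = country_counts.plot.bar()
ax.set_ylabel("Number of movies")
```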
Question 13: (based on the dataframe created in Question-8) (2.5 Marks)
– (1.5 Marks) Plot a scatter chart with x axis being “vote_average” and y axis being
“success_impact”.
– (0.5 Marks) Color the bubbles based on the movie language (e.g., English, French). In
case of having multiple languages for the same movie, you are free to pick any one
as you wish.
– (0.5 Marks) Add a legend showing the names of the languages and their associated
colors.
Please note that the following figure is just a sample and it does not reflect the
real values or the list of all languages in the dataset.
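One way to get per-language colors and a matching legend is to draw one scatter series per language. The sketch below assumes a helper column named “language” has already been derived for each movie; that column name and the toy values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window pops up
import matplotlib.pyplot as plt
import pandas as pd

df10 = pd.DataFrame({
    "vote_average": [7.2, 6.1, 8.0],
    "success_impact": [1.5, -0.2, 3.0],
    "language": ["English", "French", "English"],  # hypothetical helper column
})

fig, ax = plt.subplots()
# Each call to scatter takes the next color in the default cycle,
# and the label feeds the legend.
for language, group in df10.groupby("language"):
    ax.scatter(group["vote_average"], group["success_impact"], label=language)

ax.set_xlabel("vote_average")
ax.set_ylabel("success_impact")
ax.legend(title="Language")
```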
What not to forget!
Due Date: Friday the 13th of March 2020 17:59
Submit your script named “YOUR_ZID.py” (e.g., z2123232.py), which contains your code.
You are required to use the following code template (it is not complete; please
download the file) for your submission:
import ast
import json
import matplotlib.pyplot as plt
import pandas as pd
import sys
import os

studentid = os.path.basename(sys.modules[__name__].__file__)


#################################################
# Your personal methods can be here …
#################################################


def log(question, output_df, other):
    print("--------------- {}----------------".format(question))
    if other is not None:
        print(question, other)
    if output_df is not None:
        print(output_df.head(5).to_string())


def question_1(movies, credits):
    """
    :param movies: the path for the movie.csv file
    :param credits: the path for the credits.csv file
    :return: df1
            Data Type: Dataframe
            Please read the assignment specs to know how to create the output dataframe
    """

    #################################################
    # Your code goes here …
    #################################################

    log("QUESTION 1", output_df=df1, other=df1.shape)
    return df1


if __name__ == "__main__":
    df1 = question_1("movies.csv", "credits.csv")
    df2 = question_2(df1)
    df3 = question_3(df2)
    df4 = question_4(df3)
    df5 = question_5(df4)
    df6 = question_6(df5)
    df7 = question_7(df6)
    df8 = question_8(df7)
    movies = question_9(df8)
    df10 = question_10(df8)
    question_11(df10)
    question_12(df10)
    question_13(df10)
You can download the code template from :
https://raw.githubusercontent.com/mysilver/COMP9321-DataServices/master/20t1/z1111111.py
If you do not follow this structure, you will not be marked.
You can only add code in the specified lines (do not edit the rest of the lines):
#################################################
# Your code goes here …
#################################################
If your code does not run on CSE machines for any reason (e.g., a hard-coded
file path such as C://Users/), you will be penalized by at least 5 marks. We
assume that the two CSV files are located in the same directory as your script,
and that their names are the same as those in the template (movies.csv and
credits.csv).
Please look at the documentation for each question method; it describes the
inputs (e.g., a dataframe) and output (e.g., dataframe, list of movies) of the
method.
    """
    :param df7: the dataframe created in question 7
    :return: df8
            Data Type: Dataframe
            Please read the assignment specs to know how to create the output dataframe
    """
Please use the same variable names as mentioned in the comments (e.g., in
question 8, you are supposed to create a dataframe and name it df8).
In the last three questions, you need to plot charts; please do not use the
“plt.show()” function to pop up charts. The code template will automatically
save the chart to disk. All you need to do is call the plot
functions of the dataframe (e.g., df.plot.pie()). We highly recommend that you go
through the lab activities to learn how to plot charts.
FAQ:
Can I pass extra variables to functions?
No
Can we create our own functions besides the question functions (e.g.,
question_1)?
Yes
Can I call another function inside the question functions? e.g., calling
question_1 inside question_2
Yes
What should I do if my charts are not shown automatically?
Look at the lab sample code; if you still need help, ask your tutor during the labs.
How should I print my dataframe?
print(df.to_string())
Is it okay that the graph for Q8 does not pop up until the graph for Q7 is
closed, or should they both pop up at the same time?
This is fine.
Do the charts need to look the same (colors, legend position, grid) as the
examples shown, or would it be fine to just use the default plotting from
pandas?
The default colours/fonts are fine.
How are our submissions marked?
They are marked manually by tutors, by running the following command:
python3 z{YOUR_ZID}.py
What python packages can I use in my assignment?
You can only use pandas and matplotlib to do the assignment.
What version of python should I use?
Python 3+
How can I submit my assignment?
Go to the assignment page and click on the “Make Submission” tab; pick your files,
which must be named “YOUR_ZID.py”. Make sure that the files are not empty, and
submit the files together.
Can I submit my file after the deadline?
Yes, you can. But 25% of your assignment mark will be deducted as a late penalty per
day. In other words, if you are more than 3 days late, you will not be marked.
Plagiarism
This is an individual assignment. The work you submit must be your own work.
Submission of work partially or completely derived from any other person or jointly
written with any other person is not permitted. The penalties for such an offence may
include negative marks, automatic failure of the course and possibly other academic
discipline. Assignment submissions will be examined manually.
Do not provide or show your assignment work to any other person – apart from the
teaching staff of this course. If you knowingly provide or show your assignment work to
another person for any reason, and work derived from it is submitted, you may be
penalized, even if the work was submitted without your knowledge or consent. Note
that it is also your duty to protect your code artifacts. If you are using any
online solution to store your code artifacts (e.g., GitHub), make sure to keep the
repository private and do not share access with anyone.
Reminder: Plagiarism is defined as using the words or ideas of others and presenting
them as your own. UNSW and CSE treat plagiarism as academic misconduct, which
means that it carries penalties as severe as being excluded from further study at UNSW.
There are several on-line sources to help you understand what plagiarism is and how it is
dealt with at UNSW:
Make sure that you read and understand these. Ignorance is not accepted as an excuse
for plagiarism. In particular, you are also responsible for ensuring that your assignment
files are not accessible by anyone but you by setting the correct permissions in your CSE
directory and code repository, if using one (e.g., GitHub and similar). Note also that
plagiarism includes paying or asking another person to do a piece of work for you and
then submitting it as your own work.
UNSW has an ongoing commitment to fostering a culture of learning informed by
academic integrity. All UNSW staff and students have a responsibility to adhere to this
principle of academic integrity. Plagiarism undermines academic integrity and is not
tolerated at UNSW.