QBUS2820 Predictive Analytics Individual Assignment 1


1. Required submissions (through Canvas/Assignments/Individual Assignment 1)

a. ONE written report (Word or PDF format) for both tasks.
b. ONE Jupyter Notebook .ipynb file for Task A.

2. Due date/time and closing date/time: See Canvas. The late penalty for the
assignment is 5% of the assigned mark per day, starting after 4pm on the due date.

The closing date/time is the last date/time on which an assessment will be accepted
for marking.

3. Weight: 30% of the total mark.

4. Length: The main text of your report (including Task A and Task B) should have a
maximum of 15 pages with the usual font size 11-12. For Task A, you should write a
complete report including sections such as business context, problem formulation,
data processing, EDA, methodology, analysis, conclusions and limitations, etc.

5. If you wish to include additional material, you can do so by creating an appendix.
There is no page limit for the appendix. Keep in mind that making good use of your
audience’s time is an essential business skill. Every sentence, table and figure has
to count. Extraneous and/or wrong material will reduce your mark no matter the
quality of the assignment.

6. Anonymous marking: In line with the University's anonymous marking policy, please
only include your student ID in the submitted report, and do NOT include your
name. Name your report and code files using the format below, replacing “SID” with
your Student ID. Example: SID_Qbus2820_Assignment1.

7. Presentation of the assignment is part of the assignment. Markers will allocate 5
marks for clarity of writing and presentation. Numbers with decimals should be
reported to four decimal places.

Carefully read the requirements for each part of the assignment.

Please follow any further instructions announced on Canvas.

You must use Python for the assignment. Use “random_state=1” when needed, e.g.
when using the “train_test_split” function. For all other parameters that are not
specified in the questions, use the default values of the corresponding Python functions.
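
As an illustration of the seeding rule above, here is a minimal sketch of a reproducible split with scikit-learn's train_test_split. The data are synthetic stand-ins (an assumption made only for this example, not the NBA data), and the variable names are hypothetical. Only random_state is set; test_size is left at its default.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical): 100 rows, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.normal(size=100)

# random_state=1 as the assignment requires; every other parameter
# keeps its default value (test_size defaults to 0.25).
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=1)

# Repeating the call with the same seed reproduces the same split.
X_train2, X_valid2, y_train2, y_valid2 = train_test_split(X, y, random_state=1)

print(X_train.shape, X_valid.shape)
print(np.array_equal(X_train, X_train2))
```

Fixing the seed is what makes the notebook regenerate the same numbers each time it is run.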

Reproducibility is fundamental in data analysis, so you are required to submit a
Jupyter Notebook that generates your results. Not submitting your code will lead to a
loss of 50% of the assignment marks.

Failure to read information and follow instructions may lead to a loss of marks.
Furthermore, note that it is your responsibility to be informed of the University of
Sydney and Business School rules and guidelines, and follow them.

Referencing: Harvard Referencing System. (You may find the details at:

Task A (Lab): 70 Marks

You will work on the NBA salary dataset.

Note: This task does not require prior knowledge of basketball. You should not add any
personal subjective assumptions about these data based on your existing knowledge. This
can lead to inaccurate results. You should use the techniques that we have learnt and
that you have discovered yourself to complete the prediction task.

1. Problem description

You are a consultant working for a sports analytics company. The NBA has approached you to
develop models that predict NBA player salaries using data analysis techniques. To
enable this task, you have been provided with a dataset containing highly detailed performance
statistics of NBA players.

Use two models (or, optionally, three) to predict NBA player salary from performance statistics.
These models are:

• a linear regression model,
• a kNN regression model,
• an optional third model, for which you may be awarded a maximum of 10 bonus marks.
This can be any model of your choice (for example, a kernel regression
model, or even a model not covered in the QBUS2820 unit). This is to encourage
you to self-explore and self-study, since the ability to self-study is critical in the
rapidly evolving field of machine learning.
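
The two required model types can be sketched as follows. This is a minimal illustration that assumes scikit-learn and uses synthetic data with hypothetical variable names; it is not the NBA dataset and not a recommended modelling pipeline. Both estimators are created with their default parameters.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data (hypothetical): 200 rows, 4 features.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 4))
true_coef = np.array([2.0, -1.0, 0.5, 0.0])
y_train = X_train @ true_coef + rng.normal(scale=0.1, size=200)
X_new = rng.normal(size=(5, 4))

# Both estimators use default parameters (kNN defaults to n_neighbors=5).
models = {
    "linear": LinearRegression(),
    "knn": KNeighborsRegressor(),
}

# Fit each model on the training data and predict for the new rows.
predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_new)

for name, pred in predictions.items():
    print(name, np.round(pred, 4))
```

Keeping the models in a dictionary makes it straightforward to add an optional third model without changing the fitting loop.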

As part of the contract, you need to write a report according to the details below.

You can download the “NBA_Train.csv” and “NBA_test.csv” data from Canvas. The
response is the SALARY($Millions) column in the dataset.

The NBA glossary link below and the glossary table at the end of Task A can help you
better understand the meaning of the variables:

You should use the given test set to evaluate the performance of your work. The
performance/scoring metric is the Root Mean Squared Error (RMSE) on the test set.

Your target is a test set RMSE of less than 4.1 ($Millions).
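
For reference, RMSE is the square root of the mean of the squared differences between actual and predicted values. The sketch below computes it with NumPy on a tiny made-up example (the numbers are illustrative only, and the helper name rmse is hypothetical) and formats the result to four decimal places.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Squared Error between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up values for illustration: errors are 0, 0 and -2.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]

score = rmse(y_true, y_pred)   # sqrt((0 + 0 + 4) / 3)
report = f"{score:.4f}"        # four decimal places
print(report)                  # prints 1.1547
```

Because the response is measured in $Millions, the RMSE is in $Millions as well.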

The purpose of the report is to describe, explain, and justify your solution to the client with
a polished presentation. Be concise and objective. Find ways to say more with less. When in
doubt, put it in the appendix. Refer to the file “TaskA_instructions” for more
detailed instructions on how to work on Task A, including writing the report.