Python Assignment | Question 1: NZ Health survey data – Wrangling, reshaping, functions and plotting

This New Zealand assignment is primarily a timed Python machine learning test.

Question 1: NZ Health survey data – Wrangling, reshaping, functions and plotting (35 marks)
This question relates to the “health_survey.csv” dataset, which you can download from the Stream site. This dataset is
from the NZ Health survey. You can learn more about this dataset and what the column names and labels represent
here:
https://www.health.govt.nz/nz-health-statistics/national-collections-and-surveys/surveys/new-zealand-health-survey

a) Importing data:
• Read in the health data and save it to a dataframe object. There is an encoding argument that can be set
when using read_csv. You may need to set this to ‘latin’ to avoid the “‘utf-8’ codec can’t decode…” error (see
the Pandas documentation for further information).
• Remove the first unnamed column and the seven ‘p.value’ columns.
• Change all ‘percent’ column names to their associated ‘Year’ values (e.g. the name for ‘percent.16’ changes
to ‘2016’) and change the column name of ‘short.description’ to ‘description’.
(3 marks)
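The import steps above could be sketched roughly as follows. This is a minimal illustration, not a model answer: the in-memory `sample_csv` is a hypothetical stand-in for “health_survey.csv”, and its layout (an unnamed leading column, ‘p.value.YY’ column names, two-digit years in the 2000s) is assumed rather than taken from the real file. The helper name `load_health_data` is also made up for this sketch.

```python
import io

import pandas as pd

# Hypothetical stand-in for health_survey.csv. The real file's columns may
# differ; only 'percent.16', 'short.description', 'Group' and 'population'
# are named in the question. The 'p.value.YY' naming is an assumption.
sample_csv = (
    ",population,short.description,Group,percent.16,p.value.16,percent.17,p.value.17\n"
    "1,adult,Diabetes,Total,5.8,0.41,6.0,0.52\n"
    "2,child,Healthy weight,Total,66.2,0.03,65.9,0.07\n"
).encode("latin-1")


def load_health_data(source):
    """Read the survey CSV and tidy its column names (illustrative sketch)."""
    # 'latin' is a Python alias for the latin-1 (ISO-8859-1) codec.
    df = pd.read_csv(source, encoding="latin")
    # Drop the first (unnamed) column by position.
    df = df.drop(columns=df.columns[0])
    # Drop every column whose name contains the literal text 'p.value'.
    df = df.loc[:, ~df.columns.str.contains("p.value", regex=False)]
    # 'percent.16' -> '2016'; assumes all years fall in the 2000s.
    renames = {
        col: "20" + col.split(".")[-1]
        for col in df.columns
        if col.startswith("percent.")
    }
    renames["short.description"] = "description"
    return df.rename(columns=renames)


health = load_health_data(io.BytesIO(sample_csv))
print(health.columns.tolist())
```

With the real data, the same function would be called with the path to the downloaded file instead of the `io.BytesIO` object; the column patterns should be checked against `df.columns` first.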
b) Filtering:
• Display the unique labels in the ‘description’ column so that you can inspect them.
• Save a new dataframe object (with an appropriate name), stored in a new memory location, that contains only
the rows that meet all of the following criteria:
    • They match six of the ‘description’ labels of your choosing. For instance, if you were interested in
    knowing about ‘Physically active’, ‘Anxiety disorder’, ‘Daily smokers’, ‘Diabetes’, ‘Healthy weight’, and
    ‘Self-rated health – very good’, then your dataframe would contain only rows that matched these
    ‘description’ labels.
    • They also match the ‘Total’ label in the Group column.
    • They also match the ‘adult’ label in the population column.
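One way to combine the three criteria is a single boolean mask joined with `&`, so a row is kept only if every condition holds. The sketch below uses a small hypothetical dataframe in place of the cleaned survey data; the ‘Men’ group label, the values, and the name `adult_totals` are assumptions for illustration, and only two labels are chosen instead of the six the task requires.

```python
import pandas as pd

# Hypothetical miniature of the cleaned dataframe from part (a).
health = pd.DataFrame(
    {
        "population": ["adult", "adult", "child", "adult", "adult"],
        "description": [
            "Diabetes",
            "Diabetes",
            "Healthy weight",
            "Daily smokers",
            "Healthy weight",
        ],
        "Group": ["Total", "Men", "Total", "Total", "Total"],
        "2016": [5.8, 6.3, 66.2, 14.2, 32.1],
    }
)

# Inspect the available labels before choosing any.
print(health["description"].unique())

# The task asks for six labels of your own choosing; two are shown here.
chosen = ["Diabetes", "Daily smokers"]

# All three conditions must hold at once, hence '&' rather than '|'.
mask = (
    health["description"].isin(chosen)
    & (health["Group"] == "Total")
    & (health["population"] == "adult")
)

# .copy() creates an independent dataframe instead of a view of `health`.
adult_totals = health.loc[mask].copy()
print(adult_totals)
```

Calling `.copy()` is what places the result in its own memory location, so later changes to `adult_totals` do not affect `health` or trigger chained-assignment warnings.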

Note: this is not an ‘either/or’ filter. All rows in your dataframe must meet all of the above conditions. Only a
maximum of half marks will be awarded if you choose the exact same ‘description’ labels as given in the