Python Assignment | CS 3753 HW5

This is a Python data science homework assignment from a U.S. course.

1. Pandas plots and probability models (24 pts)

Use pandas to load the file hw5q1.xlsx into a DataFrame. The first line of the file is the column header, and there
is no row index column. (Your data should have 1000 rows and 4 columns.) Do the following.
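A minimal loading sketch, assuming an Excel engine such as openpyxl is installed (pandas needs one to read .xlsx files). `header=0` takes the first row as column names, and `index_col=None` leaves pandas' default 0..n-1 integer index in place because the file has no index column. Since hw5q1.xlsx is not reproduced here, `make_stand_in` is a hypothetical helper that draws 1000 rows from the four distributions named in part f, with invented column names and parameters, so the later steps can be tried without the file.

```python
import numpy as np
import pandas as pd


def load_hw5q1(path: str = "hw5q1.xlsx") -> pd.DataFrame:
    # First row is the header; there is no index column, so pandas assigns 0..n-1.
    df = pd.read_excel(path, header=0, index_col=None)
    print(df.shape)  # the assignment says this should be (1000, 4)
    return df


def make_stand_in(n: int = 1000, seed: int = 0) -> pd.DataFrame:
    # HYPOTHETICAL substitute for hw5q1.xlsx; column names and parameters are invented.
    rng = np.random.default_rng(seed)
    return pd.DataFrame(
        {
            "A": rng.normal(loc=50.0, scale=5.0, size=n),
            "B": rng.lognormal(mean=0.0, sigma=1.0, size=n),
            "C": rng.exponential(scale=1.0, size=n),
            # NumPy's pareto() draws Lomax values; adding 1 gives a classical Pareto with x_m = 1.
            "D": rng.pareto(3.0, size=n) + 1.0,
        }
    )
```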

a. (4pts) Show a boxplot of the data.
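One way to draw the boxplot, as a sketch: `DataFrame.plot.box()` puts one box per column on a shared y-axis, so a single heavy-tailed column can squash the others, which is likely part of why part b repeats the plot on a log2 scale. The Agg backend line is only there so the sketch runs without a display; in a notebook, drop it (in a script, call `matplotlib.pyplot.show()` after plotting).

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend; remove in a notebook
import numpy as np
import pandas as pd


def show_boxplot(df: pd.DataFrame, title: str = "hw5q1 data"):
    # One box per column, all sharing the same y-axis.
    ax = df.plot.box(title=title)
    ax.set_ylabel("value")
    return ax
```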

b. (4pts) Apply a log2 transformation to the data (using applymap with np.log2) and show the boxplot of the transformed data.
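A sketch of the elementwise transform. pandas 2.1 renamed `DataFrame.applymap` to `DataFrame.map` and deprecated the old name, so the helper picks whichever the installed version provides. Any zero or negative entries (possible in a normal column) turn into -inf or NaN; the `np.errstate` block only silences the corresponding NumPy warnings. The vectorized `np.log2(df)` yields the same values.

```python
import numpy as np
import pandas as pd


def log2_transform(df: pd.DataFrame) -> pd.DataFrame:
    # Prefer DataFrame.map (pandas >= 2.1); fall back to applymap on older versions.
    elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
    # log2 of 0 gives -inf and of a negative number gives NaN; suppress those warnings.
    with np.errstate(divide="ignore", invalid="ignore"):
        return elementwise(np.log2)
```

The boxplot of the transformed data is then `log2_transform(df).plot.box()`.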

c. (4pts) Use the pandas function describe() to print out the summary statistics of the data.
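A short sketch of part c. `describe()` returns one column per numeric column, with rows count, mean, std, min, 25%, 50% (the median), 75%, and max. The helper `mean_minus_median` is a hypothetical extra: a large positive gap between the mean and the median usually points to a long right tail, which can help with part f.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    # Rows: count, mean, std, min, 25%, 50%, 75%, max for every numeric column.
    summary = df.describe()
    print(summary)
    return summary


def mean_minus_median(summary: pd.DataFrame) -> pd.Series:
    # HYPOTHETICAL helper: a rough skew cue; large positive values suggest a right tail.
    return summary.loc["mean"] - summary.loc["50%"]
```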

d. (4pts) Use the pandas function hist() to show the histogram of each column of the DataFrame. (Use the option
density=True so that it plots probability density instead of counts.)
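A sketch of part d. `DataFrame.hist` draws one subplot per column and forwards extra keywords such as `density=True` to matplotlib's `hist`, which rescales each histogram so that the bar areas (height times bin width) sum to 1. The bar heights are therefore density values, not per-bar probabilities. The bin count of 30 is an arbitrary choice for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend; remove in a notebook
import numpy as np
import pandas as pd


def show_histograms(df: pd.DataFrame, bins: int = 30) -> np.ndarray:
    # One subplot per column; density=True makes the bar areas of each histogram sum to 1.
    axes = df.hist(bins=bins, density=True)
    return np.asarray(axes)
```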

e. (4pts) Use the pandas function hist() to show the histogram of each column of the log-transformed DataFrame. (Use the
option density=True so that it plots probability density instead of counts.)
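A sketch of part e under the same assumptions as part d. It uses the vectorized `np.log2(df)`, which gives the same values as the applymap route in part b, and turns any +/-inf (from zeros) into NaN so they cannot break the bin range; pandas drops NaN per column before plotting. As a cross-check on what `density=True` means, it also recomputes each histogram with `np.histogram(..., density=True)` and sums the bar areas, which should come out as 1.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend; remove in a notebook
import numpy as np
import pandas as pd


def show_log_histograms(df: pd.DataFrame, bins: int = 30):
    with np.errstate(divide="ignore", invalid="ignore"):
        logged = np.log2(df)  # same values as applying np.log2 element by element
    # Replace +/-inf with NaN so each column's bin range stays finite.
    logged = logged.replace([np.inf, -np.inf], np.nan)
    axes = logged.hist(bins=bins, density=True)

    # Cross-check: with density=True the bar areas of each histogram sum to 1.
    areas = {}
    for col in logged.columns:
        values = logged[col].dropna().to_numpy()
        heights, edges = np.histogram(values, bins=bins, density=True)
        areas[col] = float(np.sum(heights * np.diff(edges)))
    return axes, areas
```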

f. (4 pts) Based on the information and plots you obtained above, which column do you think comes from a
normal distribution, and which column comes from a log-normal distribution? (FYI: the data in the four columns
come from the four distributions we discussed in class: normal, log-normal, exponential, and Pareto.
See slides 4.1-stats.pptx, pages 27-49.)
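The visual reading of the plots is what part f asks for; the sketch below is only a hypothetical numeric cross-check of our own, not a method taken from the slides. It relies on sample skewness: a normal column should be roughly symmetric (skew near 0) on the raw scale, and a log-normal column becomes normal after the log, so its skew moves near 0 on the log2 scale. For reference, the log of exponential data is left-skewed and the log of Pareto data is exponential, hence right-skewed, so neither should win the second comparison.

```python
import numpy as np
import pandas as pd


def guess_normal_and_lognormal(df: pd.DataFrame) -> tuple:
    # Raw-scale skewness: the normal column should be the most symmetric.
    raw_skew = df.skew()
    # Log2-scale skewness: a log-normal column turns normal, so its skew is near 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        logged = np.log2(df).replace([np.inf, -np.inf], np.nan)
    log_skew = logged.skew()  # NaN from non-positive values is skipped

    normal_col = raw_skew.abs().idxmin()
    # Exclude the normal pick: its log can also look nearly symmetric.
    lognormal_col = log_skew.drop(normal_col).abs().idxmin()
    return normal_col, lognormal_col
```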

2. Pandas DataFrame operations (36 pts)

a. (6 pts) Load the data stored in brfss.csv into a pandas DataFrame. The first line of the file is the column
header, and the first column of the file is the row index. Drop the rows that have any NaN values, and drop
the 'wtyrago' and 'wtkg2' columns. Change the sex column so that 1 (True) means male and 0 (False)
means female. (Originally, sex == 1 means male and sex == 2 means female.) Rename the columns to
age, weight, height, and male. Name the new DataFrame brfss, and print out its shape (number of rows
and columns).
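A sketch of part a, assuming brfss.csv has a column named `sex` plus the two columns to drop. The headers `wt` and `ht` and the `SAMPLE_CSV` rows are hypothetical placeholders, not the real file's contents, so substitute the actual header names before using it. The statement lists the row drop before the column drop, and the order matters: dropping the columns first would keep rows whose only NaN was in 'wtyrago' or 'wtkg2'. The comparison `sex == 1` yields a boolean column (True for male); append `.astype(int)` if 1/0 integers are preferred.

```python
import io

import pandas as pd

# HYPOTHETICAL sample: column names "wt" and "ht" and all values are invented.
SAMPLE_CSV = """,age,sex,wtyrago,wt,wtkg2,ht
0,30,1,70,80,80,180
1,25,2,,55,55,160
2,40,2,60,60,60,165
3,,1,90,90,90,175
"""


def clean_brfss(source) -> pd.DataFrame:
    # First line is the header; the first column holds the row index.
    raw = pd.read_csv(source, index_col=0)
    # Drop rows with any NaN first, then the two unused columns.
    brfss = raw.dropna().drop(columns=["wtyrago", "wtkg2"])
    # Original coding: 1 = male, 2 = female  ->  True = male, False = female.
    brfss["sex"] = brfss["sex"] == 1
    # "wt" and "ht" are placeholder names; map the real headers here.
    brfss = brfss.rename(columns={"sex": "male", "wt": "weight", "ht": "height"})
    return brfss[["age", "weight", "height", "male"]]


if __name__ == "__main__":
    brfss = clean_brfss(io.StringIO(SAMPLE_CSV))  # pass "brfss.csv" for the real file
    print(brfss.shape)  # (2, 4) for this sample
```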

b. (30 pts – 3 pts each) Based on the DataFrame brfss, print out answers to the following questions using
pandas DataFrame functionality.

i. What is the max age for people in the dataset?
ii. What is the mean weight for people in the dataset?
iii. What is the mean weight for males in the dataset?
iv. What is the median height for females in the dataset?
v. What is the mean weight for females younger than 20 years old?
vi. How many males are in the dataset?
vii. How many individuals in the dataset have height > 190 cm and weight < 50 kg?
viii. What is the average height of females whose weight is between 59 and 61 kg?
ix. Print out rows 2001 to 2010 by position (inclusive, a total of ten rows) from the DataFrame. (To be
clear, row counting starts from row 0.)
x. Print out the rows whose row index is from 2001 to 2010 (including 2010, but possibly fewer than 10 rows
because rows with NaNs were dropped) from the DataFrame. (Note that the row indices of this
DataFrame are the integers assigned when we loaded the data from the CSV file; each record keeps its
index for the lifetime of the DataFrame unless we change it explicitly.)
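A sketch of part b, assuming `brfss` has the four columns produced in part a. Questions i through viii are boolean masks combined with `&` and `~`; `Series.between` includes both endpoints by default, which is one reading of "between 59 and 61 kg". Questions ix and x differ only in how rows are addressed: `iloc` slices by position and excludes the end, so ten rows need `2001:2011`, while `loc` slices by index label and includes both ends. On a sorted index, labels removed by `dropna()` are simply not returned. The function names are hypothetical.

```python
import pandas as pd


def answer_part_b(brfss: pd.DataFrame) -> dict:
    male = brfss["male"].astype(bool)  # works whether the flag is bool or 0/1
    female = ~male
    age, weight, height = brfss["age"], brfss["weight"], brfss["height"]
    return {
        "i": age.max(),
        "ii": weight.mean(),
        "iii": weight[male].mean(),
        "iv": height[female].median(),
        "v": weight[female & (age < 20)].mean(),
        "vi": int(male.sum()),
        "vii": int(((height > 190) & (weight < 50)).sum()),
        # between() includes both 59 and 61 by default
        "viii": height[female & weight.between(59, 61)].mean(),
    }


def rows_by_position(df: pd.DataFrame, first: int, last: int) -> pd.DataFrame:
    # iloc is positional and end-exclusive, so add 1 to include `last`.
    return df.iloc[first:last + 1]


def rows_by_label(df: pd.DataFrame, first: int, last: int) -> pd.DataFrame:
    # loc slices by index label and includes both endpoints.
    return df.loc[first:last]
```

For the assignment, ix is `rows_by_position(brfss, 2001, 2010)` and x is `rows_by_label(brfss, 2001, 2010)`.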