# Python Data Science Assignment Help | HW3 – Errorbars and correlation

This is a Python data science assignment.

To complete this homework, you need to download one CSV file, which contains the monthly totals of the number of new cases of measles, mumps, and chicken pox, respectively, for New York City during the years 1931-1971 (a total of 41 years). The data file contains 123 rows and 12 columns. Each column represents a month from Jan to Dec. The first 41 rows are the number of new measles cases in each year during that period, the next 41 rows are for mumps, and the remaining 41 rows are for chicken pox. Within each disease, the rows are ordered by year in chronological order.

Complete the Python script skeleton to analyze the data for the following tasks. For your information, the data has been loaded with the pandas package and reorganized into a NumPy 3D array of shape (3, 41, 12), where the first dimension represents the three diseases in the order mentioned above, the second the 41 years, and the third the 12 months. Several other variables are also defined for your convenience.
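The reorganization described above can be sketched as a single reshape. This is only an illustration, not the actual skeleton: the stand-in `table` array and the names `data`, `measles`, `mumps`, and `chicken_pox` are assumptions, and with the real file the table would come from pandas (for example via `DataFrame.to_numpy()`).

```python
import numpy as np

# Stand-in for the 123 x 12 table (rows: 3 diseases x 41 years, columns: Jan..Dec).
# With the real file this would come from pandas, e.g. df.to_numpy().
table = np.arange(123 * 12).reshape(123, 12)

# Rows 0-40 are measles, 41-81 mumps, 82-122 chicken pox, so a row-major
# reshape splits the row axis into (disease, year).
data = table.reshape(3, 41, 12)

# Unpacking along the first axis gives one (41, 12) year-by-month array per disease.
measles, mumps, chicken_pox = data

print(data.shape)  # (3, 41, 12)
```

Because the rows are grouped by disease first and year second, no transpose is needed before the reshape.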

Q1 (10 pts). Calculate the mean number of cases per year for each disease, and estimate the 95% confidence interval of the mean (Lec4.pptx slide #4). Plot the result as an errorbar plot. (Use marker='d', linestyle='', capsize=5 to produce a figure similar to example Figure 1 on the next page.)
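One possible approach to Q1 is sketched below. The lecture slide is not reproduced here, so the interval is an assumption: the normal approximation, mean ± 1.96 × standard error, applied to the 41 annual totals. With n = 41 a t-based multiplier (about 2.02) would give a slightly wider interval. The function names and the synthetic Poisson data are hypothetical stand-ins.

```python
import numpy as np


def mean_with_ci(data, z=1.96):
    """Mean annual cases per disease and the half-width of an approximate 95% CI."""
    annual = data.sum(axis=2)                  # (3, 41): total cases per year
    n_years = annual.shape[1]
    mean = annual.mean(axis=1)                 # (3,)
    sem = annual.std(axis=1, ddof=1) / np.sqrt(n_years)  # standard error of the mean
    return mean, z * sem


def plot_errorbars(mean, half_width, labels):
    """Draw one diamond marker with capped error bars per disease."""
    import matplotlib.pyplot as plt  # imported here so the numeric part runs without it

    fig, ax = plt.subplots()
    positions = np.arange(len(labels))
    ax.errorbar(positions, mean, yerr=half_width,
                marker='d', linestyle='', capsize=5)
    ax.set_xticks(positions)
    ax.set_xticklabels(labels)
    ax.set_ylabel("Mean cases per year")
    return fig


# Synthetic stand-in for the real (3, 41, 12) array.
rng = np.random.default_rng(0)
data = rng.poisson(lam=500, size=(3, 41, 12))
mean, half_width = mean_with_ci(data)
```

Calling `plot_errorbars(mean, half_width, ["measles", "mumps", "chicken pox"])` followed by `plt.show()` would display the figure.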

Q2 (10 pts). For each disease, calculate the fraction of cases that occurred in each month of the year during this period. You will end up with a matrix C of size 3 x 12, where each row corresponds to a disease, and the value in the i-th row and j-th column, Cij, is the total number of cases of disease i that occurred in month j (over all 41 years), divided by the total number of cases of disease i. (Hint: use matrix multiplication instead of for loops if you can.) Plot the three rows of C as three lines in one graph. (See example Figure 2.)
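The matrix-multiplication hint can be read as "multiply by vectors of ones to sum along an axis". The sketch below follows that reading; the function names and synthetic data are assumptions, and `data.sum(axis=1)` would give the same monthly totals without any multiplication.

```python
import numpy as np


def monthly_fractions(data):
    """Return the 3 x 12 matrix C of per-month case fractions for each disease."""
    n_years, n_months = data.shape[1], data.shape[2]
    # (1, 41) @ (3, 41, 12) -> (3, 1, 12): sums over the year axis for each disease.
    monthly_totals = (np.ones((1, n_years)) @ data)[:, 0, :]
    # (3, 12) @ (12,) -> (3,): total number of cases per disease.
    disease_totals = monthly_totals @ np.ones(n_months)
    # Divide each row by its disease total so every row sums to 1.
    return monthly_totals / disease_totals[:, np.newaxis]


def plot_fractions(C, labels):
    """Plot each row of C as one line over months 1..12."""
    import matplotlib.pyplot as plt  # imported here so the numeric part runs without it

    fig, ax = plt.subplots()
    months = np.arange(1, C.shape[1] + 1)
    for row, label in zip(C, labels):
        ax.plot(months, row, label=label)
    ax.set_xlabel("Month")
    ax.set_ylabel("Fraction of cases")
    ax.legend()
    return fig


# Synthetic stand-in for the real (3, 41, 12) array.
rng = np.random.default_rng(0)
data = rng.poisson(lam=500, size=(3, 41, 12))
C = monthly_fractions(data)
```

A quick sanity check is that each row of `C` sums to 1, since every case of a disease falls in exactly one month.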

Q3.1 (8 pts). Scatter plot the mean monthly mumps cases for each month of the year during this period against the mean monthly chicken pox cases. In other words, you are scatter plotting two vectors, x and y, each of which has 12 values, representing the average number of mumps (x) or chicken pox (y) cases in each of the 12 months, averaged over the 41 years. (See example Figure 3.) Annotating the figure with months is optional (lecture2 slides #27).

Q3.2 (7 pts). Calculate the Pearson correlation coefficient as well as the Spearman correlation coefficient between the mean monthly mumps cases and the mean monthly chicken pox cases (the two vectors x and y you calculated in Q3.1), print them on screen, and display the values (with a precision of 0.0001, i.e., four decimal places) in the upper left corner of the figure (decide the x and y positions ad hoc from your figure).
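A sketch of Q3.1 and Q3.2 together, assuming SciPy is available and that index 1 of the first axis is mumps and index 2 is chicken pox (per the disease order stated earlier). The helper names and synthetic data are hypothetical. The text is placed in axes coordinates, which is one alternative to choosing data coordinates ad hoc.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr


def monthly_means(data, disease_index):
    """Average cases in each calendar month over all years -> shape (12,)."""
    return data[disease_index].mean(axis=0)


def correlation_label(x, y):
    """Format both coefficients to four decimal places."""
    r_pearson, _ = pearsonr(x, y)     # second value is the p-value, unused here
    r_spearman, _ = spearmanr(x, y)
    return f"Pearson r = {r_pearson:.4f}\nSpearman r = {r_spearman:.4f}"


def plot_scatter(x, y, label):
    """Scatter x against y and write the label in the upper left corner."""
    import matplotlib.pyplot as plt  # imported here so the numeric part runs without it

    fig, ax = plt.subplots()
    ax.scatter(x, y)
    ax.set_xlabel("Mean monthly mumps cases")
    ax.set_ylabel("Mean monthly chicken pox cases")
    # (0.05, 0.95) in axes coordinates is near the upper left corner.
    ax.text(0.05, 0.95, label, transform=ax.transAxes, va="top")
    return fig


# Synthetic stand-in for the real (3, 41, 12) array.
rng = np.random.default_rng(0)
data = rng.poisson(lam=500, size=(3, 41, 12))
x = monthly_means(data, 1)   # mumps
y = monthly_means(data, 2)   # chicken pox
label = correlation_label(x, y)
print(label)
```

With real data the two coefficients can differ noticeably, because Pearson measures linear association while Spearman only depends on the rank order of the values.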

Q4.1 (8 pts). Scatter plot the total number of mumps cases in each year against that of chicken pox cases. In other words, you are scatter plotting two vectors, x and y, each of which has 41 values, representing the total number of mumps or chicken pox cases in the years 1931, 1932, etc. (See example Figure 4.)

Q4.2 (7 pts). Calculate the Pearson correlation coefficient as well as the Spearman correlation coefficient between the annual mumps cases and the annual chicken pox cases (the two vectors x and y you calculated in Q4.1), print them on screen, and display the values (with a precision of 0.0001, i.e., four decimal places) in a bottom corner of the figure (decide the x and y positions ad hoc from your figure).
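For the annual vectors the only change from the monthly case is the axis being reduced: summing over months instead of averaging over years. The sketch below also spells out Spearman as Pearson applied to ranks, as a way to see how the two coefficients relate. The helper names, the synthetic data, and the choice of the lower right corner are assumptions.

```python
import numpy as np
from scipy.stats import rankdata


def annual_totals(data, disease_index):
    """Total cases in each year (summed over the 12 months) -> shape (41,)."""
    return data[disease_index].sum(axis=1)


def pearson(x, y):
    """Pearson correlation: off-diagonal entry of the 2 x 2 correlation matrix."""
    return np.corrcoef(x, y)[0, 1]


def spearman(x, y):
    """Spearman correlation: Pearson on ranks (ties receive average ranks)."""
    return pearson(rankdata(x), rankdata(y))


def plot_scatter(x, y, label):
    """Scatter x against y and write the label in the lower right corner."""
    import matplotlib.pyplot as plt  # imported here so the numeric part runs without it

    fig, ax = plt.subplots()
    ax.scatter(x, y)
    ax.set_xlabel("Annual mumps cases")
    ax.set_ylabel("Annual chicken pox cases")
    ax.text(0.95, 0.05, label, transform=ax.transAxes, ha="right", va="bottom")
    return fig


# Synthetic stand-in for the real (3, 41, 12) array.
rng = np.random.default_rng(0)
data = rng.poisson(lam=500, size=(3, 41, 12))
x = annual_totals(data, 1)   # mumps
y = annual_totals(data, 2)   # chicken pox
label = f"Pearson r = {pearson(x, y):.4f}\nSpearman r = {spearman(x, y):.4f}"
print(label)
```

`scipy.stats.pearsonr` and `scipy.stats.spearmanr` return the same coefficients along with p-values, so either route can be used.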

Q5 (10 pts). Calculate and show the correlation matrix between the 12 months for the number of mumps cases. More formally, you have a matrix M of size 41 x 12, where Mij is the number of mumps cases in year i and month j. You need to calculate a matrix C of size 12 x 12, where Cij is the correlation between columns Mi and Mj (here Mi denotes the i-th column of M, i.e., the 41 yearly values for month i). Use plt.imshow(C) to display the matrix and plt.colorbar() to show the color scale. Changing the month labels from 0-11 to 1-12 is optional but can be done with xticks and yticks as usual: xticks(range(12), range(1,13)). (See example Figure 5.)
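A minimal sketch of Q5, assuming the correlation meant is Pearson's. `np.corrcoef` treats rows as variables by default, so `rowvar=False` is needed to correlate the 12 month columns. The function names and the synthetic matrix are hypothetical.

```python
import numpy as np


def month_correlation_matrix(M):
    """Pearson correlation between the columns (months) of a years x months matrix."""
    return np.corrcoef(M, rowvar=False)   # (12, 12) for a (41, 12) input


def show_matrix(C):
    """Display C as an image with a colorbar and month labels 1..12."""
    import matplotlib.pyplot as plt  # imported here so the numeric part runs without it

    plt.imshow(C)
    plt.colorbar()
    plt.xticks(range(12), range(1, 13))
    plt.yticks(range(12), range(1, 13))
    plt.show()


# Synthetic stand-in for the 41 x 12 mumps matrix (data[1] in the real script).
rng = np.random.default_rng(0)
M = rng.poisson(lam=200, size=(41, 12))
C = month_correlation_matrix(M)
```

The result should be symmetric with ones on the diagonal, which is a convenient check before plotting.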