STA 402/502 Homework 5
Friday, April 03 2020
Submit an electronic copy of your homework via Canvas as a single file (e.g., a PDF or DOC file). See the PDF
file ‘Homework Preparation Requirements’ on Canvas (Week 1 handouts) for more information.
A. (20 points) Write a single DATA step that takes an existing SAS data set and creates a new SAS data
set for which each observation consists of the mean and standard deviation of the absolute values of all
numeric variables in the corresponding observation from the original set. (Note: the new data set should
have the same number of observations as the old data set, but only two variables.) Use arrays with the
functions MEAN, STD, and ABS for the numerical calculations. Your code should employ macro
variables so that it works for any SAS data set. Apply your code to the data set ‘manivars.sas7bdat’ on the
course Canvas site, using PROC PRINT to show the result.
Hint. It can help to try your code on a similar but much smaller data set.
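Following the hint, the array technique can be tried on a tiny made-up data set first. The sketch below is only an illustration under stated assumptions: the data set names small and absstats, the macro variables inset and outset, and the output variable names MeanAbs and StdAbs are all hypothetical and not part of the assignment.

```sas
/* Hypothetical macro variables: change these two lines to point at any data set. */
%let inset  = work.small;
%let outset = work.absstats;

/* Tiny made-up test data with one character and three numeric variables. */
data small;
  input name $ a b c;
  datalines;
x 1 -2 3
y -4 5 -6
;
run;

data &outset (keep=MeanAbs StdAbs);
  set &inset;
  /* _numeric_ picks up only the numeric variables known at this point, */
  /* i.e. those read from the input data set (not i, MeanAbs, StdAbs).  */
  array nums{*} _numeric_;
  do i = 1 to dim(nums);
    nums{i} = abs(nums{i});       /* replace each value by its absolute value */
  end;
  MeanAbs = mean(of nums{*});     /* row-wise mean of the absolute values */
  StdAbs  = std(of nums{*});      /* row-wise sample standard deviation */
run;

proc print data=&outset;
run;
```

For the two toy rows the absolute values are 1, 2, 3 and 4, 5, 6, so MeanAbs should be 2 and 5, and StdAbs should be 1 in both rows, which is easy to check by hand.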
B. (30 points) A research organization conducted an experiment at three different labs, each taking one
numerical measurement per subject per day for 285 consecutive days. Each lab prepared a permanent
SAS data set and you need to combine the three data sets into a single SAS data set with these variables:
Lab, Subject, Day, and Measurement. However, the separate lab data sets each need to be rearranged
before they can be combined. Specifically, they are organized as follows:
Lab 1. 29 observations with variables Subject, d1, d2, …, d285.
Lab 2. 32 observations with variables ID, day1, day2, …, day285.
Lab 3. 33 observations with variables Label, day_001, day_002, …, day_285.
Write a SAS program that produces the combined data set from the three given sets (lab1results.sas7bdat,
etc. on the course Canvas site). To convince the instructor that the combined data set AllResults is correct,
generate output using the following code:
proc contents data=AllResults varnum;
run;
proc means data=AllResults mean;
by Lab;
var Subject Day / weight=Measurement; * nonsense calculation;
run;
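The wide-to-long rearrangement described in Problem B can be sketched on made-up data with three days instead of 285. Everything here is an illustrative assumption: the toy data sets lab1toy and lab2toy, their values, the array name meas, and the output names lab1long, lab2long, and alltoy are hypothetical and are not the course files.

```sas
/* Made-up stand-ins for two labs, using 3 days instead of 285. */
data lab1toy;
  input Subject d1 d2 d3;
  datalines;
1 10 11 12
2 20 21 22
;
run;

data lab2toy;
  input ID day1 day2 day3;
  datalines;
1 30 31 32
;
run;

/* One output observation per subject per day. */
data lab1long (keep=Lab Subject Day Measurement);
  set lab1toy;
  array meas{*} d1-d3;
  Lab = 1;
  do Day = 1 to dim(meas);
    Measurement = meas{Day};
    output;
  end;
run;

/* Same idea, renaming the identifier so the variable names agree. */
data lab2long (keep=Lab Subject Day Measurement);
  set lab2toy (rename=(ID=Subject));
  array meas{*} day1-day3;
  Lab = 2;
  do Day = 1 to dim(meas);
    Measurement = meas{Day};
    output;
  end;
run;

/* Stack the rearranged sets; the result is already ordered by Lab. */
data alltoy;
  set lab1long lab2long;
run;

proc print data=alltoy;
run;
```

With these toy inputs the stacked set should have 9 observations (6 from the first lab and 3 from the second), which gives a quick sanity check before scaling up to 285 days.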
C. (20 points) A fourth lab also collected data (lab4results.sas7bdat on Canvas) for the experiment in
Problem B. However, the Lab 4 data is transposed relative to Labs 1-3: it has 285 observations
corresponding to days, with variables Subj1, Subj2,…, Subj25. Write a SAS program that reshapes the
Lab 4 data into a set lab4reshaped so that it, like AllResults above, has these variables: Lab, Subject, Day,
and Measurement. Generate output using the following code:
proc means data=lab4reshaped N mean;
var Subject Day / weight=Measurement; * nonsense calculation;
run;
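When rows are days and columns are subjects, the same array idea applies with the roles swapped. The sketch below uses made-up data with two subjects and three days; the names lab4toy and lab4long_toy are hypothetical, and it assumes the observations are stored in day order so that the automatic variable _N_ can serve as the day number. That assumption should be checked against the actual file.

```sas
/* Made-up transposed layout: one row per day, one column per subject. */
data lab4toy;
  input Subj1 Subj2;
  datalines;
5 6
7 8
9 10
;
run;

data lab4long_toy (keep=Lab Subject Day Measurement);
  set lab4toy;
  array s{*} Subj1-Subj2;
  Lab = 4;
  Day = _N_;                      /* assumption: row number equals day number */
  do Subject = 1 to dim(s);
    Measurement = s{Subject};
    output;                       /* one observation per subject per day */
  end;
run;

proc print data=lab4long_toy;
run;
```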
D. (4.8 Exercises Problem 2, 30 points) Fifty animals were exposed to one of five concentration levels of
nitrofen (10 animals per group, but some observations might be missing in the data). The data was
recorded separately for three broods produced by each of the 50 animals. Thus, each animal can have data
in each of the three brood data sets. A particular animal is uniquely identified by the ID variable. Produce
a combined data set containing observations (animals and IDs) that have data on all three broods. In
addition, construct an additional variable for the total number of young produced in all three broods.
Compare the different joins for this data. In particular, construct the following:
Brood1 [inner join] Brood2 [inner join] Brood3
(Brood1 [left join] Brood2) [left join] Brood3
(Brood1 [right join] Brood2) [right join] Brood3
Brood1 [full join] Brood2 [full join] Brood3
Print the results for the different combined data sets. In the DATA steps below, multiple observations are given on each input line.
data B1; * Brood=1 data;
input ID conc number_of_young @@;
datalines;
3 0 6 4 0 6 5 0 6 6 0 5 7 0 6 8 0 5 9 0 3 10 0 6
12 80 5 13 80 6 14 80 5 15 80 8 16 80 3 17 80 5 18 80 7
19 80 5 20 80 3
21 160 6 22 160 6 23 160 2 24 160 6 25 160 6 26 160 6
27 160 6 28 160 5 30 160 6
31 235 4 32 235 6 34 235 6 35 235 6 36 235 6 37 235 7
38 235 4 39 235 6 40 235 7
41 310 6 42 310 6 43 310 7 44 310 0 45 310 5 47 310 6
48 310 4 49 310 6 50 310 5
;
run;
data B2; * Brood=2 data;
input ID conc number_of_young @@;
datalines;
1 0 14 2 0 12 3 0 11 4 0 12 6 0 14 7 0 12 8 0 13
9 0 10 10 0 11
11 80 11 13 80 11 14 80 12 15 80 13 16 80 9 17 80 9
18 80 12 19 80 13 20 80 12
21 160 12 22 160 12 23 160 8 24 160 10 25 160 11
26 160 13 27 160 12 29 160 13 30 160 12
31 235 13 32 235 10 33 235 5 34 235 0 35 235 13
36 235 0 37 235 0 38 235 2 39 235 8 40 235 0
41 310 0 42 310 0 43 310 0 45 310 10 46 310 0 47 310 0
48 310 0 49 310 0 50 310 0
;
run;
data B3; * Brood=3 data;
input ID conc number_of_young @@;
datalines;
1 0 10 2 0 15 3 0 17 4 0 15 5 0 15 6 0 15 7 0 15
8 0 12 10 0 14
11 80 16 12 80 16 13 80 18 14 80 16 15 80 15 16 80 14
17 80 13 18 80 12 19 80 14 20 80 14
21 160 11 22 160 11 23 160 13 24 160 11 25 160 13 26 160 12
27 160 12 28 160 11 29 160 10 30 160 11
31 235 6 32 235 5 33 235 0 34 235 6 35 235 8 36 235 10
38 235 9 39 235 7 40 235 10
41 310 0 42 310 0 43 310 0 44 310 0 45 310 0 46 310 0
48 310 0 49 310 0 50 310 0
;
run;
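The difference between join types can be seen on two small made-up tables before working with the brood data. This is a sketch using PROC SQL, which is one possible approach (a DATA step MERGE with IN= variables is another); the tables t1 and t2, the variables young1 and young2, and the output names inner_toy and full_toy are hypothetical.

```sas
/* Made-up tables: IDs 2 and 3 appear in both, ID 1 only in t1, ID 4 only in t2. */
data t1;
  input ID young1 @@;
  datalines;
1 5 2 6 3 7
;
run;

data t2;
  input ID young2 @@;
  datalines;
2 10 3 11 4 12
;
run;

proc sql;
  /* Inner join: keeps only IDs present in both tables. */
  create table inner_toy as
    select t1.ID, young1, young2,
           young1 + young2 as total
    from t1 inner join t2
      on t1.ID = t2.ID
    order by ID;

  /* Full join: keeps every ID from either table.            */
  /* COALESCE takes the ID from whichever side is nonmissing. */
  /* SUM with two arguments ignores a missing value,          */
  /* whereas young1 + young2 would be missing here.           */
  create table full_toy as
    select coalesce(t1.ID, t2.ID) as ID, young1, young2,
           sum(young1, young2) as total
    from t1 full join t2
      on t1.ID = t2.ID
    order by ID;
quit;

proc print data=inner_toy;
run;
proc print data=full_toy;
run;
```

For these toy tables the inner join should return IDs 2 and 3 with totals 16 and 18, while the full join should return IDs 1 through 4 with totals 5, 16, 18, and 12. Whether a partial total is meaningful for animals with a missing brood is a judgment call for the actual analysis.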