Data 102 Assignment 4



Submit your writeup, including all code and plots, as a PDF via Gradescope. We recommend
reading through the entire homework beforehand and carefully using functions for
testing procedures, plotting, and running experiments. Taking the time to test, maintain,
and reuse code will help in the long run!

Data science is a collaborative activity. While you may talk with others about the
homework, please write up your solutions individually. If you discuss the homework with
your peers, please include their names on your submission. Please make sure any
handwritten answers are legible, as we may deduct points otherwise.

Please note that this homework is slightly shorter than usual, to give you
time to start working on your project.

1 Observational Data on Infant Health

The Infant Health and Development Program (IHDP) was an experiment treating low
birth-weight, premature infants with intensive high-quality childcare from a trained provider.

The goal is to estimate the causal effect of this treatment on the child’s cognitive test
scores. The data does not represent a randomized trial with randomly allocated treatment,
so there may be confounders between treatment and outcome. In this problem, we
devise a propensity score model to control for observed confounders.
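
For reference, the propensity score is the probability of receiving treatment given the observed features. The block below is a minimal statement of that definition together with the form its estimate takes under a logistic regression model. The intercept \beta_0 and coefficient vector \beta are symbols introduced here for illustration and do not appear in the assignment.

```latex
% Propensity score: probability of treatment given features
e(x) = P(z_i = 1 \mid x_i = x)

% Estimate under a logistic regression model
% (\hat\beta_0, \hat\beta are assumed names for the fitted intercept and coefficients)
\hat{e}(x) = \frac{1}{1 + \exp\!\left(-\left(\hat\beta_0 + \hat\beta^{\top} x\right)\right)}
```

Under this model the estimate always lies strictly between 0 and 1, so every unit is assigned a nonzero probability of both treatment and control.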

(a) (2 points) The CSV file ihdp.csv has 27 columns:

In this part, you’ll estimate ê(x) (the predicted probability that z_i = 1) by fitting a
logistic regression model that predicts z_i from x_i. Specifically:

1. Read the data in ihdp.csv (e.g., using the csv package in Python) into three
arrays: Z ∈ {0, 1}^n containing the treatments, Y ∈ ℝ^n containing the outcomes,
and X ∈ ℝ^(n×25) containing the features.

2. To fit a logistic regression model, use the scikit-learn package in Python,
which is imported as sklearn. Start with the following two lines: