python辅导 | ECON 570 Problem Set 1

ECON 570 Problem Set 1
Due: October 4, 2019
1 Computational Complexity
A. Write custom functions for
1. Calculating the inner product of two vectors.
2. Calcuatling the product of two matrices.
B. Generate random m × n matrix X and n × p matrix Y , for m = 50, n = 100, p = 200.
Compute
1. The amount of time it takes to multiply X and Y using the custom code you
wrote.
2. The amount of time it takes to multiply X and Y using built-in NumPy methods.
C. Now increase m, n, p each by a factor of 10.
1. How long do you expect it would take to multiply X and Y using your custom
code? How long does it actually take?
2. How long do you expect it would take to multiply X and Y using built-in NumPy
methods? How long does it actually take?
D. Increase m, n, p each by a factor of 10 again, and repeat the above but using NumPy’s
built-in methods only. How long does it take to multiply X and Y ? Did it increase by
the same factor as it did before when all the dimensions were increased by a factor of
10? Why or why not?
E. Generate a n × p matrix and a n-vector y.
1. Set n = 5000, p = 200. How long does it take to regress y on X?
1
2. Set n = 50000, p = 200. How long do you expect the same regression would take?
How long does it actually take?
3. Set n = 5000, p = 2000. How long do you expect the same regression would take?
How long does it actually take?
2 Breast Cancer Data
Use the breast cancer data from sklearn to perform the following exercises.
A. Load the breast cancer data with the load breast cancer method from the module
sklearn.datasets.
B. Standardize each feature in the data set.
C. Perform PCA on the standardized features. How many principle components must we
keep to explain 90% of the total variance? How much variance is explained if we keep
2?
D. Perform k-means with k = 2 on the full set of features, and on the first 2 principle
components only. Compare how well the clusters found by k-means in each of these
cases compare to the true targets of the data set.
3 Olivetti Faces
Use the Olivetti faces data set available through sklearn to do the following.
A. Fetch and load the data with the fetch olivetti faces method from the module
sklearn.datasets.
B. Demean each face in the data set (no need to divide by standard deviation as every
dimension is a number between a fixed range representing a pixel).
C. Compute and display the first 9 eigenfaces.
D. In class we showed that any given face in the data set can be represented as a linear
combination of the eigenfaces. For any face in the data set, show how it progresses as
we combine 1, 51, 101, . . . eigenfaces, until the full image is recovered.
2