MGTF 495: Machine Learning Homework 2

1. Instructions
The answers to the questions and the code should be submitted on Canvas by 23 May 2021, 11:59 pm.
You may submit the notebook downloaded as a PDF to Canvas, but please make sure that the questions
are clearly segmented and labelled. To secure full marks for a question, both the answer and the code
must be correct. Completely wrong (or missing) code with a correct answer will receive zero marks.

2. Data
Download the MNIST train and test data from Canvas along with their corresponding label files. The train
and test data consist of 6000 and 1000 binarized MNIST images, respectively.

3. Generative Learning
For Questions 1, 2 and 3, please do not use the ready-made functions from the scikit-learn library;
write your own implementation instead.

Question 1: Compute and report the prior probabilities πj for all labels. (10 marks)
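
As a rough illustration of what Question 1 asks for, the prior of a label can be estimated as the fraction of training examples carrying that label. The sketch below is only one way to write this; the function name `class_priors` and the toy label vector are made up for the example and are not part of the assignment data.

```python
import numpy as np

def class_priors(labels, num_classes=10):
    """Return the fraction of training examples carrying each label."""
    labels = np.asarray(labels, dtype=int)
    # bincount counts how often each label 0..num_classes-1 occurs
    counts = np.bincount(labels, minlength=num_classes)
    return counts / counts.sum()

# Tiny made-up label vector, only to show the call (not the MNIST labels).
toy_labels = np.array([0, 1, 1, 2, 2, 2])
toy_priors = class_priors(toy_labels, num_classes=3)
print(toy_priors)
```

With the real data, `labels` would be the training label file loaded into an integer array.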
Question 2: For each pixel Xi and label j, compute Pji = P(Xi = 1 | y = j) using the maximum likelihood
estimate shown in class, with Laplace smoothing. Report the highest Pji for each label j. (15 marks)
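
A minimal sketch of the smoothed estimate, assuming the common add-one form for binary features: (number of class-j images with pixel i switched on + 1) / (number of class-j images + 2). The exact smoothing constants used in class may differ, and `pixel_likelihoods`, `alpha` and the toy arrays are hypothetical names chosen for this example.

```python
import numpy as np

def pixel_likelihoods(X, y, num_classes=10, alpha=1.0):
    """Estimate P[j, i] = P(X_i = 1 | y = j) with add-alpha smoothing."""
    X = np.asarray(X)
    y = np.asarray(y, dtype=int)
    P = np.zeros((num_classes, X.shape[1]))
    for j in range(num_classes):
        Xj = X[y == j]  # all images with label j
        # Each pixel has two outcomes (0 or 1), hence 2 * alpha in the denominator.
        P[j] = (Xj.sum(axis=0) + alpha) / (Xj.shape[0] + 2 * alpha)
    return P

# Made-up binary data: three "images" with two pixels each.
toy_X = np.array([[1, 0], [1, 1], [0, 0]])
toy_y = np.array([0, 0, 1])
toy_P = pixel_likelihoods(toy_X, toy_y, num_classes=2)
highest_per_label = toy_P.max(axis=1)  # the quantity Question 2 asks to report
```

Smoothing keeps every estimate strictly between 0 and 1, so the logarithms taken later in Naive Bayes stay finite.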
Question 3: Use Naive Bayes (as shown in the lecture slides) to classify the test data. Report the accuracy.
(5 marks)
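
One way to sketch the classification step, assuming the Bernoulli form of Naive Bayes for binarized pixels: for each image, pick the label j that maximizes log πj plus the sum over pixels of the log-likelihood of the observed pixel value. The lecture slides may present it differently; `naive_bayes_predict` and the toy parameters below are invented for illustration.

```python
import numpy as np

def naive_bayes_predict(X, priors, P):
    """Pick, for each row of X, the label with the largest log posterior."""
    X = np.asarray(X, dtype=float)
    log_on = np.log(P)          # log P(X_i = 1 | y = j)
    log_off = np.log(1.0 - P)   # log P(X_i = 0 | y = j)
    # scores[n, j] = log prior_j + sum_i [x_ni * log_on_ji + (1 - x_ni) * log_off_ji]
    scores = np.log(priors) + X @ log_on.T + (1.0 - X) @ log_off.T
    return scores.argmax(axis=1)

# Made-up parameters for two labels and two pixels.
toy_priors = np.array([0.5, 0.5])
toy_P = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
toy_X = np.array([[1, 0], [0, 1]])
toy_pred = naive_bayes_predict(toy_X, toy_priors, toy_P)
toy_accuracy = np.mean(toy_pred == np.array([0, 1]))
```

Working in log space avoids the numerical underflow that multiplying several hundred probabilities per image would cause.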
Note: You may use scikit-learn functions from Question 4 onwards.

Question 4: Compute the confusion matrix (as shown in the lectures) and report the 3 pairs with the
largest absolute number of incorrect classifications. (10 marks)
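
A possible sketch of this step in plain NumPy (scikit-learn's `confusion_matrix` would also be allowed here). It assumes a "pair" means an ordered (true label, predicted label) combination; if the lectures treat pairs as unordered, the two mirrored cells would need to be added together first. The function names and toy labels are hypothetical.

```python
import numpy as np

def confusion_matrix_counts(y_true, y_pred, num_classes=10):
    """C[t, p] counts examples with true label t that were predicted as p."""
    C = np.zeros((num_classes, num_classes), dtype=int)
    # add.at accumulates correctly even when the same (t, p) cell repeats
    np.add.at(C, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return C

def top_confusions(C, k=3):
    """Return the k largest off-diagonal cells as (true, predicted, count)."""
    off = C.copy()
    np.fill_diagonal(off, 0)  # ignore correct classifications
    order = np.argsort(off, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(order, off.shape)
    return [(int(t), int(p), int(off[t, p])) for t, p in zip(rows, cols)]

# Made-up labels, only to show the calls.
toy_true = np.array([0, 0, 0, 1, 1, 2, 2, 2])
toy_pred = np.array([0, 1, 1, 1, 0, 2, 0, 2])
toy_C = confusion_matrix_counts(toy_true, toy_pred, num_classes=3)
toy_top = top_confusions(toy_C, k=1)
```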

Question 5: Visualizing mistakes: Print two MNIST images from the test data that your classifier
misclassified. Give the true and the predicted label for each of these two digits. (10 marks)
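
A small sketch of how the mistakes could be located, assuming each image is stored as a flattened row of side * side binary pixels (28 * 28 for MNIST; the toy example uses 2 * 2). The helper names are invented, and the text rendering is only a stand-in: in a notebook, matplotlib's `imshow` would be the usual way to display the reshaped image.

```python
import numpy as np

def pick_misclassified(X, y_true, y_pred, how_many=2, side=28):
    """Return (image, true label, predicted label) for the first few mistakes."""
    X = np.asarray(X)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    wrong = np.flatnonzero(y_true != y_pred)[:how_many]
    return [(X[i].reshape(side, side), int(y_true[i]), int(y_pred[i])) for i in wrong]

def render_ascii(image):
    """Draw a binary image as text: '#' for pixels that are on, '.' for off."""
    return "\n".join("".join("#" if v else "." for v in row) for row in image)

# Made-up 2x2 "images" and labels.
toy_X = np.array([[1, 0, 0, 1],
                  [1, 1, 0, 0],
                  [0, 1, 1, 0]])
toy_true = np.array([3, 5, 8])
toy_pred = np.array([3, 6, 0])
toy_mistakes = pick_misclassified(toy_X, toy_true, toy_pred, side=2)
for image, true_label, predicted_label in toy_mistakes:
    print(f"true={true_label} predicted={predicted_label}")
    print(render_ascii(image))
```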

Now, we will implement a Gaussian Mixture Model and Linear Discriminant Analysis on the breast cancer
data (sklearn.datasets.load_breast_cancer) available in sklearn.datasets. Load the data and split it into
train, validation and test sets (40-20-40 split). Do not shuffle the data; otherwise your results will be different.
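
An unshuffled 40-20-40 split can be sketched with plain slicing, as below. The rounding of the split points is an assumption (the assignment does not say how fractional sizes are handled), and `sequential_split` is a hypothetical helper. With the real data, `X` and `y` could come from `load_breast_cancer(return_X_y=True)`; the toy arrays here only demonstrate the slicing.

```python
import numpy as np

def sequential_split(X, y, fractions=(0.4, 0.2, 0.4)):
    """Split X, y into train/validation/test blocks in their original order."""
    n = len(X)
    n_train = int(round(fractions[0] * n))  # rounding rule is an assumption
    n_val = int(round(fractions[1] * n))
    train = slice(0, n_train)
    val = slice(n_train, n_train + n_val)
    test = slice(n_train + n_val, n)        # the test block takes the remainder
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])

# Made-up data with 10 rows, so the blocks have 4, 2 and 4 rows.
toy_X = np.arange(20).reshape(10, 2)
toy_y = np.arange(10)
(toy_Xtr, toy_ytr), (toy_Xval, toy_yval), (toy_Xte, toy_yte) = sequential_split(toy_X, toy_y)
```

Because no shuffling is involved, the same rows always land in the same block.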

Question 6: Implement a Gaussian Mixture Model on the data as shown in class. Tune the covariance_type
parameter on the validation data. Use the selected value to compute the test accuracy. As always, train
the model on train+validation data to compute the test accuracy. (10 marks)
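
The sketch below shows one possible reading of this question, not necessarily the approach shown in class: fit one single-component `GaussianMixture` per class, classify by log prior plus log-likelihood, and choose `covariance_type` by validation accuracy. The helper names and the synthetic two-blob data are assumptions made for the example. Note that with one component per class, "tied" behaves like "full".

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_class_gaussians(X, y, covariance_type):
    """Fit one single-component GaussianMixture per class, plus log priors."""
    classes = np.unique(y)
    models = [GaussianMixture(n_components=1, covariance_type=covariance_type,
                              random_state=0).fit(X[y == c]) for c in classes]
    log_priors = np.log([np.mean(y == c) for c in classes])
    return classes, models, log_priors

def predict_class_gaussians(fitted, X):
    classes, models, log_priors = fitted
    # score_samples returns log p(x | class); add the log prior, take the argmax
    scores = np.column_stack([m.score_samples(X) for m in models]) + log_priors
    return classes[scores.argmax(axis=1)]

def select_covariance_type(X_tr, y_tr, X_val, y_val,
                           options=("full", "tied", "diag", "spherical")):
    """Return the option with the best validation accuracy and all accuracies."""
    accs = {}
    for ct in options:
        fitted = fit_class_gaussians(X_tr, y_tr, ct)
        accs[ct] = float(np.mean(predict_class_gaussians(fitted, X_val) == y_val))
    return max(accs, key=accs.get), accs

# Synthetic, well-separated blobs standing in for the real data.
rng = np.random.default_rng(0)
toy_X = np.vstack([rng.normal(0.0, 1.0, size=(40, 2)),
                   rng.normal(6.0, 1.0, size=(40, 2))])
toy_y = np.repeat([0, 1], 40)
order = rng.permutation(80)  # mixed here only so both toy blocks contain both classes
toy_X, toy_y = toy_X[order], toy_y[order]
best, accs = select_covariance_type(toy_X[:50], toy_y[:50], toy_X[50:], toy_y[50:])
```

After the selection, the chosen `covariance_type` would be used to refit on the concatenated train+validation rows before scoring the test rows.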

Question 7: Apply a Linear Discriminant Analysis model to the train+validation data and report the
accuracy obtained on the test data. Report the transformation matrix (w) along with the intercept. (5 marks)
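
A minimal sketch using scikit-learn's `LinearDiscriminantAnalysis`, under the assumption that "w" refers to the fitted `coef_` attribute and the intercept to `intercept_`; if the lectures mean the projection used by `transform`, the `scalings_` attribute would be the one to report. The synthetic data stands in for the train+validation rows, and the accuracy is computed on the same toy rows only to show the call.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic two-class data with three features (not the breast cancer data).
rng = np.random.default_rng(0)
toy_X = np.vstack([rng.normal(0.0, 1.0, size=(50, 3)),
                   rng.normal(4.0, 1.0, size=(50, 3))])
toy_y = np.repeat([0, 1], 50)

lda = LinearDiscriminantAnalysis()
lda.fit(toy_X, toy_y)

w = lda.coef_        # shape (1, n_features) for a two-class problem
b = lda.intercept_   # shape (1,)
toy_accuracy = lda.score(toy_X, toy_y)  # mean accuracy on the given rows
print(w, b, toy_accuracy)
```

For the assignment itself, `fit` would receive the train+validation rows and `score` the held-out test rows.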