# Python Assignment Help | MGTF 495: Machine Learning Homework 2

This US assignment-help order mainly covers Python fundamentals and a machine learning homework.

1. Instructions

Submit your answers and code on Canvas by 23 May 2021, 11:59 pm. You may submit the notebook downloaded as a PDF, but make sure the questions are clearly segmented and labelled. To earn full marks for a question, both the answer and the code must be correct. A correct answer with completely wrong (or missing) code will receive zero marks.

2. Data

Download the MNIST train and test data from Canvas, along with their corresponding label files. The train and test data consist of 6000 and 1000 binarized MNIST images, respectively.

3. Generative Learning

For Questions 1, 2, and 3, please don’t use the ready-made functions from the scikit-learn library; write your own implementation instead.

Question 1: Compute and report the prior probabilities πj for all labels. (10 marks)
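
As a rough illustration of the quantity asked for in Question 1, the prior for label j can be estimated as the fraction of training examples carrying that label. The sketch below is only one way to do this with NumPy; the function name `class_priors` and the toy label array are assumptions for illustration, not part of the assignment files.

```python
import numpy as np

def class_priors(labels, num_classes=10):
    """Estimate pi_j as (number of examples with label j) / (total examples)."""
    labels = np.asarray(labels, dtype=int)
    counts = np.bincount(labels, minlength=num_classes)
    return counts / counts.sum()

# Toy labels standing in for the real training label file (hypothetical data).
toy_labels = np.array([0, 1, 1, 2, 2, 2])
priors = class_priors(toy_labels, num_classes=3)
print(priors)
```

With the real label file, `num_classes` would presumably be 10, one entry per digit.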

Question 2: For each pixel Xi and label j, compute Pji = P(Xi = 1 | y = j) using the maximum likelihood estimate shown in class, with Laplacian smoothing. Report the highest Pji for each label j. (15 marks)
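
One common form of Laplacian (add-one) smoothing for a binary pixel adds 1 to the count of "on" pixels and 2 to the class count, since each pixel has two possible values. The sketch below assumes that form; the version shown in class may differ, and the names `pixel_probabilities` and `alpha` are illustrative only.

```python
import numpy as np

def pixel_probabilities(images, labels, num_classes=10, alpha=1.0):
    """Return a (num_classes, n_pixels) array of smoothed P(X_i = 1 | y = j)."""
    images = np.asarray(images, dtype=float)  # shape (n_samples, n_pixels), entries 0 or 1
    labels = np.asarray(labels, dtype=int)
    probs = np.zeros((num_classes, images.shape[1]))
    for j in range(num_classes):
        class_images = images[labels == j]
        on_counts = class_images.sum(axis=0)  # how often each pixel is 1 in class j
        # Assumed smoothing: add alpha to the count and 2 * alpha to the class size.
        probs[j] = (on_counts + alpha) / (len(class_images) + 2 * alpha)
    return probs

# Toy data with two pixels and two labels (hypothetical, not the MNIST files).
toy_images = np.array([[1, 0], [1, 1], [0, 0]])
toy_labels = np.array([0, 0, 1])
P = pixel_probabilities(toy_images, toy_labels, num_classes=2)
highest_per_label = P.max(axis=1)  # the largest P_ji for each label j
print(highest_per_label)
```

Because of the smoothing, every entry lies strictly between 0 and 1, which keeps later logarithms finite.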

Question 3: Use naive Bayes (as shown in the lecture slides) to classify the test data. Report the accuracy. (5 marks)
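
A Bernoulli naive Bayes classifier picks, for each image, the label that maximizes the log prior plus the sum over pixels of log P(Xi = xi | y = j). The sketch below is a minimal version under the assumption that the priors and smoothed pixel probabilities are already available as arrays; the function name and toy numbers are hypothetical, and the formulation in the lecture slides may be arranged differently.

```python
import numpy as np

def predict_naive_bayes(images, priors, probs):
    """Return the most probable label for each row of a binary image matrix."""
    images = np.asarray(images, dtype=float)
    log_on = np.log(probs)         # log P(X_i = 1 | y = j), shape (num_classes, n_pixels)
    log_off = np.log(1.0 - probs)  # log P(X_i = 0 | y = j)
    # Work in log space so products over hundreds of pixels do not underflow.
    scores = images @ log_on.T + (1.0 - images) @ log_off.T + np.log(priors)
    return scores.argmax(axis=1)

# Toy parameters for two labels and two pixels (hypothetical values).
priors = np.array([0.5, 0.5])
probs = np.array([[0.9, 0.1], [0.2, 0.8]])
test_images = np.array([[1, 0], [0, 1], [1, 1]])
test_labels = np.array([0, 1, 1])

predictions = predict_naive_bayes(test_images, priors, probs)
accuracy = np.mean(predictions == test_labels)
print(predictions, accuracy)
```

This assumes every prior is positive and every pixel probability lies strictly between 0 and 1; otherwise the logarithms would produce infinities.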

Note: You may use scikit-learn functions from Question 4 onwards.

Question 4: Compute the confusion matrix (as shown in the lectures) and report the top 3 pairs with the most incorrect classifications (by absolute count). (10 marks)
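
The confusion matrix counts how often each true label is predicted as each label, so the most frequent mistakes are its largest off-diagonal entries. The sketch below builds the matrix with NumPy (rows are true labels, columns are predicted labels, the same layout as `sklearn.metrics.confusion_matrix`) and assumes a "pair" means an ordered (true, predicted) pair; if the lectures treat pairs as unordered, the counts for (i, j) and (j, i) would need to be added together. The helper names are illustrative.

```python
import numpy as np

def confusion_matrix_counts(y_true, y_pred, num_classes=10):
    """cm[i, j] = number of examples with true label i predicted as label j."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def top_confused_pairs(cm, k=3):
    """Return the k largest off-diagonal entries as (true, predicted, count)."""
    off_diag = cm.copy()
    np.fill_diagonal(off_diag, 0)  # ignore correct classifications
    flat_order = np.argsort(off_diag, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat_order, off_diag.shape)
    return [(int(i), int(j), int(off_diag[i, j])) for i, j in zip(rows, cols)]

# Toy labels and predictions (hypothetical, three classes only).
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0, 0, 0, 2])
cm = confusion_matrix_counts(y_true, y_pred, num_classes=3)
pairs = top_confused_pairs(cm, k=3)
print(cm)
print(pairs)
```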

Question 5: Visualizing mistakes: Print two MNIST images from the test data that your classifier misclassified. Write the true and predicted labels for each of these misclassified digits. (10 marks)
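
Since the images are binarized, one lightweight way to inspect mistakes is to print each misclassified image as a grid of characters next to its true and predicted labels. The sketch below assumes each image is stored as a flattened square vector (28 x 28 for standard MNIST, which the assignment does not state explicitly); the helper names and toy 2 x 2 images are hypothetical. A plotting call such as matplotlib's `imshow` on the reshaped array would serve the same purpose in a notebook.

```python
import numpy as np

def render_ascii(image_vector, side=28):
    """Turn a flattened binary image into rows of '#' (pixel on) and '.' (pixel off)."""
    grid = np.asarray(image_vector).reshape(side, side)
    return "\n".join("".join("#" if px else "." for px in row) for row in grid)

def show_mistakes(images, y_true, y_pred, how_many=2, side=28):
    """Print the first few misclassified images and return their indices."""
    wrong = np.flatnonzero(np.asarray(y_true) != np.asarray(y_pred))[:how_many]
    for idx in wrong:
        print(f"index {idx}: true label {y_true[idx]}, predicted label {y_pred[idx]}")
        print(render_ascii(images[idx], side))
    return wrong

# Toy 2 x 2 "images" so the example stays small (hypothetical data).
toy_images = np.array([[1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 0, 0]])
toy_true = np.array([3, 5, 8])
toy_pred = np.array([3, 6, 9])
shown = show_mistakes(toy_images, toy_true, toy_pred, how_many=2, side=2)
```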

Now we will implement a Gaussian Mixture Model and Linear Discriminant Analysis on the breast cancer data (sklearn.datasets.load_breast_cancer) available in sklearn.datasets. Load the data and split it into train, validation, and test sets (40-20-40 split). Don’t shuffle the data; otherwise your results will be different.
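
A 40-20-40 split without shuffling can be done by slicing the arrays in their original order. The sketch below uses synthetic arrays as a stand-in; with the real data, `X` and `y` would presumably come from `sklearn.datasets.load_breast_cancer(return_X_y=True)`. The function name and the choice to round the boundaries down with `int()` are assumptions, since the assignment does not say how to handle a sample count that does not divide evenly.

```python
import numpy as np

def sequential_split(X, y, train_frac=0.4, val_frac=0.2):
    """Split into train/validation/test by position, keeping the original order."""
    n = len(X)
    train_end = int(train_frac * n)              # assumed rounding: truncate
    val_end = int((train_frac + val_frac) * n)   # the remainder becomes the test set
    return (
        (X[:train_end], y[:train_end]),
        (X[train_end:val_end], y[train_end:val_end]),
        (X[val_end:], y[val_end:]),
    )

# Synthetic stand-in for the breast cancer features and labels (hypothetical data).
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])
(X_train, y_train), (X_val, y_val), (X_test, y_test) = sequential_split(X, y)
print(len(X_train), len(X_val), len(X_test))
```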

Question 6: Implement a Gaussian Mixture Model on the data as shown in class. Tune the covariance_type parameter on the validation data, then use the selected value to compute the test accuracy. As always, train the model on the train+validation data to compute the test accuracy. (10 marks)
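
The assignment does not spell out how the mixture model is turned into a classifier, so the sketch below makes an assumption: it fits one single-component `sklearn.mixture.GaussianMixture` per class and predicts the class with the highest log-likelihood plus log prior. The approach shown in class may differ. It tunes `covariance_type` on a validation set, then refits on train+validation before scoring the test set. The data is a synthetic stand-in and the helper names are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_class_gmms(X, y, covariance_type):
    """Fit one single-component Gaussian per class (assumed reading of the task)."""
    classes = np.unique(y)
    models = [
        GaussianMixture(n_components=1, covariance_type=covariance_type,
                        random_state=0).fit(X[y == c])
        for c in classes
    ]
    log_priors = np.log(np.array([np.mean(y == c) for c in classes]))
    return classes, models, log_priors

def predict_class_gmms(fitted, X):
    """Pick the class with the largest log-likelihood plus log prior."""
    classes, models, log_priors = fitted
    scores = np.column_stack([m.score_samples(X) for m in models]) + log_priors
    return classes[scores.argmax(axis=1)]

def accuracy(fitted, X, y):
    return float(np.mean(predict_class_gmms(fitted, X) == y))

# Synthetic two-class data with alternating labels (hypothetical stand-in).
rng = np.random.default_rng(0)
y = np.tile([0, 1], 100)
X = rng.normal(loc=5.0 * y[:, None], scale=1.0, size=(200, 2))
X_train, y_train = X[:80], y[:80]        # 40%
X_val, y_val = X[80:120], y[80:120]      # 20%
X_test, y_test = X[120:], y[120:]        # 40%

# Tune covariance_type on the validation set.
candidates = ("full", "tied", "diag", "spherical")
val_scores = {cov: accuracy(fit_class_gmms(X_train, y_train, cov), X_val, y_val)
              for cov in candidates}
best_type = max(val_scores, key=val_scores.get)

# Refit on train+validation with the selected value, then score the test set.
X_fit = np.vstack([X_train, X_val])
y_fit = np.concatenate([y_train, y_val])
test_accuracy = accuracy(fit_class_gmms(X_fit, y_fit, best_type), X_test, y_test)
print(best_type, test_accuracy)
```

Ties in validation accuracy are broken here by the order of `candidates`, which is an arbitrary choice.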

Question 7: Apply a Linear Discriminant Analysis model to the train+validation data and report the accuracy obtained on the test data. Report the transformation matrix (w) along with the intercept. (5 marks)
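
With scikit-learn, Question 7 could be approached roughly as sketched below using `sklearn.discriminant_analysis.LinearDiscriminantAnalysis`. The sketch assumes that the "transformation matrix (w)" refers to the fitted `coef_` attribute and the intercept to `intercept_`; if the course means the projection used for dimensionality reduction, the `scalings_` attribute would be the one to report instead. The data is a synthetic stand-in for the train+validation and test portions of the breast cancer data.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic two-class data with alternating labels (hypothetical stand-in).
rng = np.random.default_rng(0)
y = np.tile([0, 1], 100)
X = rng.normal(loc=5.0 * y[:, None], scale=1.0, size=(200, 2))

# First 60% plays the role of train+validation, last 40% the test set.
X_fit, y_fit = X[:120], y[:120]
X_test, y_test = X[120:], y[120:]

lda = LinearDiscriminantAnalysis().fit(X_fit, y_fit)
test_accuracy = lda.score(X_test, y_test)  # mean accuracy on the test set

w = lda.coef_        # shape (1, n_features) for a two-class problem
b = lda.intercept_   # shape (1,) for a two-class problem
print(test_accuracy)
print(w, b)
```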