

ECE 520.438 & 520.638: Deep Learning

Homework 3: Perceptron, Logistic Regression

Spring 2021

Answer the following questions.

1. What kind of decision boundary is learned by perceptrons?
2. How can we use perceptrons to do C-class classification?
3. Briefly explain stochastic gradient descent.
4. Briefly explain mini-batch gradient descent.
5. Let σ(a) = 1/(1 + e^(−a)). Show that σ′(a) = σ(a)[1 − σ(a)].
6. What is an epoch?
7. If the dataset size is 2000 samples and the batch size is 50, how many iterations will an epoch consist of?
8. Briefly describe Dropout.
9. What is early stopping?
10. Why is regularization used when training a neural network?
11. What are the different ways in which we can reduce over-fitting in a neural network?
12. Why do we use batch normalization?
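Two of the questions above can be checked numerically: the sigmoid derivative identity σ′(a) = σ(a)[1 − σ(a)] via a finite-difference comparison, and the epoch arithmetic for 2000 samples with batch size 50. A minimal sketch (NumPy assumed; variable names are illustrative):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

a = np.linspace(-5, 5, 11)
h = 1e-6
# Central finite-difference approximation of the derivative
numeric = (sigmoid(a + h) - sigmoid(a - h)) / (2 * h)
# Closed-form identity: sigma'(a) = sigma(a) * (1 - sigma(a))
analytic = sigmoid(a) * (1 - sigmoid(a))
assert np.allclose(numeric, analytic, atol=1e-7)

# Epoch arithmetic: 2000 samples / batch size 50
iterations_per_epoch = 2000 // 50  # 40
```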

Part 2: Perceptron


In this problem, you will classify handwritten digits using the MNIST Database.
Please download the four files found there; these will be used for this homework. To reduce computation time, please use only the first 20,000 training images/labels and only the first 2,000 testing images/labels.

Read in the data from the files. Each image is 28 × 28 pixels and can be viewed as a 784-dimensional vector of pixel intensities. For each image, append a ‘1’ to the beginning of each x-vector; this will act as an intercept term (for bias). Use the gradient descent algorithm for the perceptron derived in class to classify, given x ∈ R^785, whether a digit’s label is k or some “other” digit, for each k ∈ {0, · · · , 9}. For instance, if you are classifying “2”, you would designate y = 2 as the positive class and all other digits as the negative class. Do this two-way classification for all 10 digits. You will train 10 perceptrons that will collectively learn to classify the handwritten digits in the MNIST dataset. Each perceptron will have 785 inputs and one output.
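The downloaded MNIST files use the idx binary format (a big-endian header giving the dimensions, followed by raw unsigned bytes). A sketch of a loader — the function name and the header parsing are assumptions about the downloaded files, not part of the assignment:

```python
import struct
import numpy as np

def load_idx(path):
    """Parse an MNIST idx file: a big-endian header, then raw uint8 data.
    The fourth header byte gives the number of dimensions; each dimension
    size is a 4-byte big-endian unsigned integer."""
    with open(path, "rb") as f:
        data = f.read()
    ndim = data[3]
    dims = struct.unpack(">" + "I" * ndim, data[4:4 + 4 * ndim])
    return np.frombuffer(data, dtype=np.uint8, offset=4 + 4 * ndim).reshape(dims)
```

For the image files this yields an (N, 28, 28) array that can be flattened to (N, 784) and prefixed with a column of ones for the bias term.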

When testing, assign each image the label of the perceptron with the highest output; for example, for an image of a “2”, count it as correctly classified if the perceptron test of {2} vs. {0,1,3,4,5,6,7,8,9} had the highest output of all the 10 two-way classifications.
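One possible sketch of this one-vs-rest setup, assuming the images are already loaded into a NumPy array with the leading bias ‘1’ appended (function names, learning rate, and epoch count are illustrative, not necessarily the exact algorithm derived in class):

```python
import numpy as np

def train_perceptron(X, y, epochs=10, lr=1.0):
    """Train one binary perceptron.
    X: (n_samples, 785) images, each with a leading 1 for the bias term.
    y: (n_samples,) labels in {+1, -1}."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * np.dot(w, xi) <= 0:   # misclassified sample
                w += lr * yi * xi         # perceptron update
    return w

def train_all_digits(X, labels):
    """One perceptron per digit k: class k vs. all other digits."""
    return np.array([train_perceptron(X, np.where(labels == k, 1, -1))
                     for k in range(10)])

def predict(W, X):
    """Assign each image the digit whose perceptron output is highest."""
    return np.argmax(X @ W.T, axis=1)
```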

Part 3: Logistic Regression

Logistic regression is a binary classification method that can be modeled as a single neuron reading in an input vector x ∈ R^d and parameterized by a weight vector w ∈ R^d, where the neuron outputs the probability of the class being y = 1 given x:

P(y = 1|x) = g_w(x) = 1/(1 + exp(−w^T x)) = σ(w^T x),

P(y = 0|x) = 1 − P(y = 1|x) = 1 − g_w(x).
Given {(x^(i), y^(i))}, i = 1, . . . , N, the cross-entropy loss function is defined as follows:

J(w) = − Σ_{i=1}^{N} [ y^(i) log(g_w(x^(i))) + (1 − y^(i)) log(1 − g_w(x^(i))) ],

where N denotes the total number of training samples. We will optimize this cost function

via gradient descent.
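The model and loss above translate directly into NumPy. A sketch (the eps clipping guards against log(0) and is an implementation choice, not part of the definition):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def g(w, X):
    """P(y = 1 | x) for each row of X, i.e. sigma(w^T x)."""
    return sigmoid(X @ w)

def cross_entropy(w, X, y, eps=1e-12):
    """J(w), summed over the N training samples; eps guards log(0)."""
    p = np.clip(g(w, X), eps, 1 - eps)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
```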
1. Show that the gradient of the cost function with respect to the parameter w is:

∂J(w)/∂w_j = Σ_{i=1}^{N} x_j^(i) (g_w(x^(i)) − y^(i)).

Show your work.
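Once this gradient is in hand, batch gradient descent repeatedly applies w ← w − η ∇J(w). A minimal sketch (the learning rate and iteration count are assumptions, not values specified by the assignment):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def grad(w, X, y):
    """Gradient of the cross-entropy loss: sum_i x^(i) (g_w(x^(i)) - y^(i))."""
    return X.T @ (sigmoid(X @ w) - y)

def gradient_descent(X, y, lr=0.1, iters=2000):
    """Batch gradient descent on the cross-entropy loss."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w -= lr * grad(w, X, y)
    return w
```

On a small linearly separable example the learned weights quickly classify all samples correctly, even though for separable data the optimal weights grow without bound.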

Grading: You will be graded based on the code you develop, plus your homework report summarizing your findings. If possible, please write your report using LaTeX.