Advanced Learning Models
First Homework 2019-2020
Xavier Alameda-Pineda & Julien Mairal
Due date: January 10th, 2020
Provide a LaTeX-generated PDF to xavier.alameda-pineda@inria.fr AND julien.mairal@inria.fr by the due
date. Delays will affect the grade.
1 Neural Networks
Let $X = (x_{ij})_{ij}$, $i, j \in \{1, \ldots, 5\}$ denote the input of a convolutional layer with no bias. Let $W = (w_{ij})_{ij}$, $i, j \in \{1, \ldots, 3\}$
denote the weights of the convolutional filter. Let $Y = (y_{ij})_{ij}$, $i \in \{1, \ldots, I\}$, $j \in \{1, \ldots, J\}$ denote the output of the
convolution operation.
1. What is the output size (i.e., the values of $I$ and $J$) if (a numerical sanity check is sketched after this list):
(a) the convolution has no padding and no stride?
(b) the convolution has stride 1 and no padding?
(c) the convolution has no stride and padding 2?
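A minimal sanity check for these sizes, assuming the usual output-size formula for 2-D convolutions and reading "no stride" as stride 1 and "no padding" as padding 0; the helper `conv_output_size` is our own illustrative name, not part of the exercise:

```python
# Quick sanity check for question 1, assuming the standard formula
# out = floor((n + 2*padding - kernel) / stride) + 1 along each spatial dimension,
# with "no stride" read as stride 1 and "no padding" as padding 0.

def conv_output_size(n=5, kernel=3, padding=0, stride=1):
    """Output length along one spatial dimension of a 2-D convolution."""
    return (n + 2 * padding - kernel) // stride + 1

cases = {
    "1.(a) no padding, no stride": dict(padding=0, stride=1),
    "1.(b) no padding, stride 1":  dict(padding=0, stride=1),
    "1.(c) padding 2, no stride":  dict(padding=2, stride=1),
}
for label, params in cases.items():
    print(f"{label}: I = J = {conv_output_size(**params)}")
```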
2. Let us suppose that we are in situation 1.(b) (i.e., stride 1 and no padding). Let us also assume that the output of the
convolution goes through a ReLU activation, whose output is denoted by $Z = (z_{ij})_{ij}$, $i \in \{1, \ldots, I\}$, $j \in \{1, \ldots, J\}$; a reference loop implementation of this forward pass is sketched after this question:
(a) Derive the expression of the output pixels $z_{ij}$ as a function of the input and the weights.
(b) How many multiplications and additions are needed to compute the output (the forward pass)?
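A sketch of this forward pass with explicit loops, so that every multiplication and addition is visible; it assumes the cross-correlation convention commonly used in deep learning (flipping the filter does not change the operation count), and the name `conv2d_relu_forward` is our own:

```python
import numpy as np

def conv2d_relu_forward(X, W):
    """Stride-1, no-padding 2-D convolution (cross-correlation convention) followed by ReLU.

    X: (5, 5) input, W: (3, 3) filter.  Returns Z with shape (I, J).
    """
    n, k = X.shape[0], W.shape[0]
    out = n - k + 1                          # I = J = out for a square input and filter
    Y = np.zeros((out, out))
    for i in range(out):                     # every output pixel y_ij ...
        for j in range(out):
            for u in range(k):               # ... accumulates k*k products of inputs and weights
                for v in range(k):
                    Y[i, j] += X[i + u, j + v] * W[u, v]
    return np.maximum(Y, 0.0)                # ReLU activation giving z_ij

# Example usage with random data matching the sizes of the exercise.
rng = np.random.default_rng(0)
Z = conv2d_relu_forward(rng.standard_normal((5, 5)), rng.standard_normal((3, 3)))
print(Z.shape)
```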
3. Assume now that we are provided with the derivative of the loss w.r.t. the output of the convolutional layer,
$\partial L / \partial z_{ij}$, $\forall\, i \in \{1, \ldots, I\}$, $j \in \{1, \ldots, J\}$; a finite-difference check that can be used to verify the answers is sketched after this question:
(a) Derive the expression of $\partial L / \partial x_{ij}$, $\forall\, i, j \in \{1, \ldots, 5\}$.
(b) Derive the expression of $\partial L / \partial w_{ij}$, $\forall\, i, j \in \{1, \ldots, 3\}$.
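One way to verify the expressions derived in (a) and (b) is a central finite-difference check. The sketch below is self-contained: it re-implements the stride-1, no-padding convolution compactly and uses an arbitrary illustrative loss $L = \sum_{ij} z_{ij}$; the helper names `conv_relu` and `numerical_grad` are our own.

```python
import numpy as np

def conv_relu(X, W):
    """Compact stride-1, no-padding convolution (cross-correlation) followed by ReLU."""
    k = W.shape[0]
    out = X.shape[0] - k + 1
    Y = np.array([[np.sum(X[i:i + k, j:j + k] * W) for j in range(out)] for i in range(out)])
    return np.maximum(Y, 0.0)

def numerical_grad(loss_fn, A, eps=1e-6):
    """Central finite-difference estimate of dL/dA for a scalar-valued loss_fn."""
    G = np.zeros_like(A)
    for idx in np.ndindex(A.shape):
        A[idx] += eps; plus = loss_fn(A)
        A[idx] -= 2 * eps; minus = loss_fn(A)
        A[idx] += eps                        # restore the perturbed entry
        G[idx] = (plus - minus) / (2 * eps)
    return G

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 5))
W = rng.standard_normal((3, 3))

# Illustrative scalar loss: the sum of all outputs, so that dL/dz_ij = 1 for every output pixel.
dL_dX = numerical_grad(lambda X_: conv_relu(X_, W).sum(), X)   # compare with 3.(a)
dL_dW = numerical_grad(lambda W_: conv_relu(X, W_).sum(), W)   # compare with 3.(b)
print(dL_dX.shape, dL_dW.shape)
```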
Let us now consider a fully connected layer, with two input and two output neurons, without bias and with a sigmoid
activation. Let $x_i$, $i = 1, 2$ denote the inputs, and $z_j$, $j = 1, 2$ the outputs. Let $w_{ij}$ denote the weight connecting input $i$ to
output $j$. Let us also assume that the gradient of the loss at the output, $\partial L / \partial z_j$, $j = 1, 2$, is provided.
4. Derive the expressions for the following derivatives (a numerical sketch for checking them follows this list):
(a) $\frac{\partial L}{\partial x_i}$
(b) $\frac{\partial L}{\partial w_{ij}}$
(c) $\frac{\partial^2 L}{\partial w_{ij}^2}$
(d) $\frac{\partial^2 L}{\partial w_{ij}\, \partial w_{i'j'}}$, $i \neq i'$, $j \neq j'$
(e) The elements in (c) and (d) are the entries of the Hessian matrix of $L$ w.r.t. the weight vector. Imagine now
that storing the weights of a network requires 40 MB of disk space: how much disk space would be required to store the
gradient? And the Hessian?
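A small sketch for experimenting with this question: it implements the two-by-two sigmoid layer and estimates the Hessian of an illustrative loss ($L = z_1 + z_2$, a stand-in for the provided $\partial L / \partial z_j$) with respect to the flattened weights by nested central finite differences, which can be compared with the expressions from (c) and (d); `fc_sigmoid` and `numerical_hessian` are our own names.

```python
import numpy as np

def fc_sigmoid(x, W):
    """Two-input, two-output fully connected layer without bias, with sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-(x @ W)))        # W[i, j] connects input i to output j

def loss(w_flat, x):
    """Illustrative scalar loss: the sum of the two outputs z_1 + z_2."""
    return fc_sigmoid(x, w_flat.reshape(2, 2)).sum()

def numerical_hessian(f, w, eps=1e-4):
    """Hessian of a scalar function f at w via nested central finite differences."""
    n = w.size
    H = np.zeros((n, n))
    for p in range(n):
        for q in range(n):
            for sp, sq, sign in [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]:
                w_pert = w.copy()
                w_pert[p] += sp * eps
                w_pert[q] += sq * eps
                H[p, q] += sign * f(w_pert)
            H[p, q] /= 4 * eps ** 2
    return H

rng = np.random.default_rng(0)
x = rng.standard_normal(2)
W = rng.standard_normal((2, 2))
H = numerical_hessian(lambda w: loss(w, x), W.ravel())
print(H)    # 4 x 4 matrix: for N weights the gradient has N entries, the Hessian N * N
```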
2 Conditionally positive definite kernels
Let $\mathcal{X}$ be a set. A function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is called conditionally positive definite (c.p.d.) if and only if it is symmetric
and satisfies
$$\sum_{i,j=1}^{n} a_i a_j\, k(x_i, x_j) \ge 0$$
for any $n \in \mathbb{N}$, $(x_1, x_2, \ldots, x_n) \in \mathcal{X}^n$ and $(a_1, a_2, \ldots, a_n) \in \mathbb{R}^n$ with $\sum_{i=1}^{n} a_i = 0$.
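As a minimal numerical illustration of this definition (an experiment, not a proof), one can evaluate the quadratic form on random coefficient vectors centered to sum to zero; the linear kernel and the helper name `quadratic_form` below are our own illustrative choices.

```python
import numpy as np

def quadratic_form(kernel, points, a):
    """Evaluate sum_{i,j} a_i a_j k(x_i, x_j) for a kernel given as a two-argument function."""
    K = np.array([[kernel(x, y) for y in points] for x in points])
    return float(a @ K @ a)

rng = np.random.default_rng(0)
points = rng.standard_normal((6, 3))             # six random points in R^3
linear = lambda x, y: float(np.dot(x, y))        # an example kernel, chosen only for illustration

for _ in range(3):
    a = rng.standard_normal(len(points))
    a -= a.mean()                                # enforce the zero-sum constraint sum_i a_i = 0
    print(quadratic_form(linear, points, a))     # nonnegative values are consistent with c.p.d.
```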
1. Show that a positive definite (p.d.) function is c.p.d.
2. Is a constant function p.d.? Is it c.p.d.?
3. If $\mathcal{X}$ is a Hilbert space, is $k(x, y) = -\|x - y\|^2$ p.d.? Is it c.p.d.?
4. Let $\mathcal{X}$ be a nonempty set, and $x_0 \in \mathcal{X}$ a point. For any function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$, let $\tilde{k} : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ be the function
defined by:
$$\tilde{k}(x, y) = k(x, y) - k(x_0, x) - k(x_0, y) + k(x_0, x_0).$$
Show that $k$ is c.p.d. if and only if $\tilde{k}$ is p.d.
5. Let $k$ be a c.p.d. kernel on $\mathcal{X}$ such that $k(x, x) = 0$ for any $x \in \mathcal{X}$. Show that there exists a Hilbert space $\mathcal{H}$ and a
mapping $\Phi : \mathcal{X} \to \mathcal{H}$ such that, for any $x, y \in \mathcal{X}$,
$$k(x, y) = -\|\Phi(x) - \Phi(y)\|^2.$$
6. Show that if $k$ is c.p.d., then the function $\exp(t\, k(x, y))$ is p.d. for all $t \ge 0$.
7. Conversely, show that if the function $\exp(t\, k(x, y))$ is p.d. for any $t \ge 0$, then $k$ is c.p.d.
8. Show that the shortest-path distance on a tree is c.p.d. over the set of vertices (a tree is a connected undirected graph without
cycles; the shortest-path distance between two vertices is the number of edges of the unique path that connects them). Is
the shortest-path distance over graphs c.p.d. in general? (A small numerical experiment for this question is sketched below.)
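A small sketch for experimenting with this question numerically (an experiment, not a proof): it computes all-pairs shortest-path distances by breadth-first search on a hand-built tree and, for comparison, on a graph with a cycle, and evaluates the quadratic form from the c.p.d. definition on random zero-sum coefficient vectors. The example graphs and all helper names are our own illustrative choices.

```python
import numpy as np
from collections import deque

def shortest_path_matrix(adj):
    """All-pairs shortest-path distances (number of edges) by BFS from every vertex."""
    n = len(adj)
    D = np.full((n, n), np.inf)
    for s in range(n):
        D[s, s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if D[s, v] == np.inf:
                    D[s, v] = D[s, u] + 1
                    queue.append(v)
    return D

def quadratic_forms(D, trials=5, seed=0):
    """Values of sum_{i,j} a_i a_j D_ij for random coefficient vectors with zero sum."""
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(trials):
        a = rng.standard_normal(D.shape[0])
        a -= a.mean()                            # enforce sum_i a_i = 0
        values.append(float(a @ D @ a))
    return values

tree  = {0: [1, 2], 1: [0, 3, 4], 2: [0], 3: [1], 4: [1]}        # a small tree on 5 vertices
cycle = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}  # a 5-cycle, for comparison

print("tree: ", quadratic_forms(shortest_path_matrix(tree)))
print("cycle:", quadratic_forms(shortest_path_matrix(cycle)))
```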