
Advanced Learning Models

First Homework 2019-2020

Xavier Alameda-Pineda & Julien Mairal

Due date January 10th, 2020

Provide a LaTeX-generated PDF to xavier.alameda-pineda@inria.fr AND julien.mairal@inria.fr by the due date. Delays will affect the grade.

1 Neural Networks

Let X = (x_{ij})_{ij}, i, j ∈ {1, . . . , 5} denote the input of a convolutional layer with no bias. Let W = (w_{ij})_{ij}, i, j ∈ {1, . . . , 3} denote the weights of the convolutional filter. Let Y = (y_{ij})_{ij}, i ∈ {1, . . . , I}, j ∈ {1, . . . , J} denote the output of the convolution operation.

1. What is the output size (i.e. values of I and J) if:

(a) the convolution has no padding and no stride?

(b) the convolution has stride 1 and no padding?

(c) the convolution has no stride and padding 2?
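As a self-check for question 1, the standard output-size formula ⌊(n + 2p − k)/s⌋ + 1 for one axis can be coded directly. This is a minimal sketch; the function name and the convention that "no stride" means stride 1 are mine, not the assignment's:

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Output length along one axis of a convolution:
    floor((n + 2*padding - k) / stride) + 1."""
    return (n + 2 * padding - k) // stride + 1

# 5x5 input, 3x3 filter, as in the exercise:
print(conv_output_size(5, 3))              # stride 1, no padding
print(conv_output_size(5, 3, padding=2))   # padding 2
```

Comparing these values against your hand derivation is the point of the exercise.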

2. Let us suppose that we are in situation 1.(b) (i.e. stride 1 and no padding). Let us also assume that the output of the convolution goes through a ReLU activation, whose output is denoted by Z = (z_{ij})_{ij}, i ∈ {1, . . . , I}, j ∈ {1, . . . , J}:

(a) Derive the expression of the output pixels z_{ij} as a function of the input and the weights.

(b) How many multiplications and additions are needed to compute the output (the forward pass)?
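One way to sanity-check an operation count for question 2(b) is to instrument a naive implementation so it tallies its own multiplications and additions. A sketch, assuming stride 1, no padding, and one multiply-accumulate chain per output pixel:

```python
import numpy as np

def conv2d_count(X, W):
    """Valid 2-D cross-correlation (stride 1, no padding) that also
    counts the multiplications and additions it performs."""
    n, k = X.shape[0], W.shape[0]
    out = n - k + 1
    Y = np.zeros((out, out))
    mults = adds = 0
    for i in range(out):
        for j in range(out):
            acc = 0.0
            first = True
            for a in range(k):
                for b in range(k):
                    term = X[i + a, j + b] * W[a, b]
                    mults += 1
                    if first:
                        acc, first = term, False
                    else:
                        acc += term   # one addition per extra term
                        adds += 1
            Y[i, j] = acc
    return Y, mults, adds

X = np.arange(25, dtype=float).reshape(5, 5)
W = np.ones((3, 3))
Y, mults, adds = conv2d_count(X, W)
print(Y.shape, mults, adds)
```

The printed counts should agree with whatever closed-form expression you derive by hand.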

3. Assume now that we are provided with the derivative of the loss w.r.t. the output of the convolutional layer, ∂L/∂z_{ij}, ∀ i ∈ {1, . . . , I}, j ∈ {1, . . . , J}:

(a) Derive the expression of ∂L/∂x_{ij}, ∀ i, j ∈ {1, . . . , 5}.

(b) Derive the expression of ∂L/∂w_{ij}, ∀ i, j ∈ {1, . . . , 3}.
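Whatever expressions you derive for question 3 can be checked against finite differences. The sketch below assumes a toy loss (the sum of the ReLU outputs), chosen only so that ∂L/∂z_{ij} is known, and verifies a single filter entry:

```python
import numpy as np

def forward(X, W):
    """Valid cross-correlation (stride 1, no padding) followed by ReLU."""
    k = W.shape[0]
    out = X.shape[0] - k + 1
    Y = np.array([[np.sum(X[i:i + k, j:j + k] * W) for j in range(out)]
                  for i in range(out)])
    return np.maximum(Y, 0.0)

def loss(X, W):
    # toy loss: sum of the ReLU outputs, so dL/dz_ij = 1 wherever z_ij > 0
    return forward(X, W).sum()

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 5))
W = rng.standard_normal((3, 3))

# finite-difference estimate of dL/dw_00
eps = 1e-6
Wp = W.copy(); Wp[0, 0] += eps
Wm = W.copy(); Wm[0, 0] -= eps
num_grad = (loss(X, Wp) - loss(X, Wm)) / (2 * eps)

# analytic counterpart: dL/dw_ab = sum_ij (dL/dz_ij) * x_{i+a, j+b}
dZ = (forward(X, W) > 0).astype(float)
ana_grad = sum(dZ[i, j] * X[i, j] for i in range(3) for j in range(3))
print(num_grad, ana_grad)
```

The two printed numbers should agree to several decimal places; the same check can be repeated for the other entries of W and for the entries of X.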

Let us now consider a fully connected layer, with two input and two output neurons, without bias and with a sigmoid activation. Let x_i, i = 1, 2 denote the inputs, and z_j, j = 1, 2 the outputs. Let w_{ij} denote the weight connecting input i to output j. Let us also assume that the gradient of the loss at the output, ∂L/∂z_j, j = 1, 2, is provided.

4. Derive the expressions for the following derivatives:

(a) ∂L/∂x_i

(b) ∂L/∂w_{ij}

(c) ∂²L/∂w_{ij}²

(d) ∂²L/(∂w_{ij} ∂w_{i′j′}), i ≠ i′, j ≠ j′

(e) The elements in (c) and (d) are the entries of the Hessian matrix of L w.r.t. the weight vector. Imagine now that storing the weights of a network requires 40 MB of disk space: how much would it require to store the gradient? And the Hessian?
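A finite-difference probe also works for the fully connected layer of question 4. Since only ∂L/∂z_j is given, the sketch below uses a linear surrogate loss whose gradient at the output is exactly that vector; all concrete numbers are arbitrary:

```python
import numpy as np

def sigmoid(t):
    """Elementwise logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(1)
x = rng.standard_normal(2)        # inputs x_1, x_2
W = rng.standard_normal((2, 2))   # W[i, j] connects input i to output j
dL_dz = rng.standard_normal(2)    # assumed given, as in the exercise

z = sigmoid(x @ W)                # z_j = sigmoid(sum_i w_ij * x_i)

# chain rule: dL/dw_ij = dL/dz_j * z_j * (1 - z_j) * x_i
dL_dW = np.outer(x, dL_dz * z * (1 - z))

# finite-difference check on w_00, using a linear surrogate loss
# L(z) = dL_dz . z, whose gradient at the output is exactly dL_dz
loss = lambda W_: dL_dz @ sigmoid(x @ W_)
eps = 1e-6
Wp = W.copy(); Wp[0, 0] += eps
Wm = W.copy(); Wm[0, 0] -= eps
num = (loss(Wp) - loss(Wm)) / (2 * eps)
print(num, dL_dW[0, 0])
```

The same scheme, with second-order differences, can be used to probe the Hessian entries of (c) and (d).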


2 Conditionally positive definite kernels

Let X be a set. A function k : X × X → R is called conditionally positive definite (c.p.d.) if and only if it is symmetric and satisfies:

\sum_{i,j=1}^{n} a_i a_j k(x_i, x_j) ≥ 0

for any n ∈ N, x_1, x_2, . . . , x_n ∈ X and a_1, a_2, . . . , a_n ∈ R with \sum_{i=1}^{n} a_i = 0.
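The definition can be probed numerically: draw random zero-sum coefficient vectors and evaluate the quadratic form for a candidate kernel. The sketch below uses k(x, y) = −||x − y||², which question 3 below asks about (a sanity check, not a proof):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
xs = rng.standard_normal((n, 3))                       # points in R^3
# candidate kernel k(x, y) = -||x - y||^2 (cf. question 3 below)
K = -np.sum((xs[:, None, :] - xs[None, :, :]) ** 2, axis=2)

quad_forms = []
for _ in range(1000):
    a = rng.standard_normal(n)
    a -= a.mean()                                      # enforce sum(a) = 0
    quad_forms.append(a @ K @ a)
print(min(quad_forms))
```

A nonnegative minimum over many random draws is consistent with (but does not prove) the c.p.d. property; dropping the zero-sum constraint lets you probe plain positive definiteness the same way.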

1. Show that a positive definite (p.d.) function is c.p.d.

2. Is a constant function p.d.? Is it c.p.d.?

3. If X is a Hilbert space, then is k(x, y) = −||x − y||² p.d.? Is it c.p.d.?

4. Let X be a nonempty set, and x0 ∈ X a point. For any function k : X × X → R, let k̃ : X × X → R be the function defined by:

k̃(x, y) = k(x, y) − k(x0, x) − k(x0, y) + k(x0, x0).

Show that k is c.p.d. if and only if k̃ is p.d.

5. Let k be a c.p.d. kernel on X such that k(x, x) = 0 for any x ∈ X. Show that there exists a Hilbert space H and a mapping Φ : X → H such that, for any x, y ∈ X,

k(x, y) = −||Φ(x) − Φ(y)||².

6. Show that if k is c.p.d., then the function exp(tk(x, y)) is p.d. for all t ≥ 0.

7. Conversely, show that if the function exp(tk(x, y)) is p.d. for any t ≥ 0, then k is c.p.d.
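Questions 6-7 can likewise be illustrated numerically: for the candidate c.p.d. kernel k(x, y) = −||x − y||², the matrix exp(tK) is the Gram matrix of the Gaussian kernel, so its eigenvalues should all be nonnegative (again a check, not a proof; points and values of t are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
xs = rng.standard_normal((8, 2))
# K_ij = k(x_i, x_j) = -||x_i - x_j||^2
K = -np.sum((xs[:, None, :] - xs[None, :, :]) ** 2, axis=2)

min_eigs = []
for t in (0.1, 1.0, 5.0):
    G = np.exp(t * K)                 # Gaussian-kernel Gram matrix
    min_eigs.append(np.linalg.eigvalsh(G).min())
print(min_eigs)
```

All three minimum eigenvalues should come out nonnegative up to floating-point rounding.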

8. Show that the shortest-path distance on a tree is c.p.d. over the set of vertices (a tree is a connected undirected graph without cycles; the shortest-path distance between two vertices is the number of edges of the unique path that connects them). Is the shortest-path distance over graphs c.p.d. in general?
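For question 8, shortest-path distances on a small hand-built tree can be computed by BFS and the quadratic form probed with zero-sum coefficients. The sketch below uses −d, the sign that matches the ≥ 0 convention in the definition of Section 2; the tree and all constants are illustrative:

```python
from collections import deque

import numpy as np

# Hypothetical 6-vertex tree (edge list chosen arbitrarily for the check)
edges = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5)]
n = 6
adj = {v: [] for v in range(n)}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

def bfs_dist(src):
    """Shortest-path (edge-count) distances from src via BFS."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return [dist[v] for v in range(n)]

D = np.array([bfs_dist(v) for v in range(n)], dtype=float)

# probe the c.p.d. condition for k = -d with zero-sum coefficients
rng = np.random.default_rng(3)
worst = float("inf")
for _ in range(1000):
    a = rng.standard_normal(n)
    a -= a.mean()
    worst = min(worst, a @ (-D) @ a)
print(worst)
```

Replacing the tree with a graph that contains a cycle lets you hunt numerically for a counterexample to the second part of the question.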
