Machine Learning Assignment Help | COMPSCI 3314 Introduction to Statistical Machine Learning

This assignment consists of test questions on statistical machine learning.
COMPSCI 3314
Introduction to Statistical Machine Learning

Question 1
(a) Cross-validation is a method to (Choose the best single answer from multiple choices):
(A) Remove the curse of dimensionality
(B) Assess how the results of a machine learning model will generalise to an unseen data set
(C) Remove noise or outliers from a data set
[2 marks]
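
For context: cross-validation repeatedly holds part of the data out of training and evaluates on it, which is how it estimates generalisation. Below is a minimal sketch, assuming scikit-learn is available; the synthetic data and the SVC model are illustrative placeholders, not part of the question.

```python
# Minimal 5-fold cross-validation sketch (scikit-learn assumed; the data
# and model are illustrative placeholders).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic binary classification data.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Each of the 5 folds is held out once and scored, estimating how the
# model would perform on unseen data.
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=5)
print("per-fold accuracy:", scores, "mean:", scores.mean())
```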
(b) Kernel Principal Component Analysis is a method for (Choose the best single answer from multiple choices):
(A) Classification
(B) Dimensionality reduction
(C) Probability estimation
(D) Regression
[2 marks]
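
For context, a minimal Kernel PCA sketch, again assuming scikit-learn; the random 10-dimensional data is an illustrative placeholder. It maps the data through an implicit RBF feature space and keeps the top two principal components there.

```python
# Minimal Kernel PCA sketch: reduce 10-D data to 2-D via an RBF kernel
# (scikit-learn assumed; the random data is an illustrative placeholder).
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))  # 200 points in R^10

# Principal components are computed in the implicit RBF feature space.
X_reduced = KernelPCA(n_components=2, kernel="rbf").fit_transform(X)
print(X_reduced.shape)  # (200, 2)
```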
(c) Which of the following statements is best practice in Machine Learning for building a real system? (Choose the best single answer from multiple choices)
(A) Use all the data available for training to obtain optimal performance
(B) Use all the data available for testing the performance of your algorithm
(C) Split the training data into two separate sets. Use the first subset for training and perform cross-validation solely on the second subset
(D) Perform cross-validation on training, validation and testing sets
[3 marks]
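
For context, the sketch below shows one common split-based workflow, assuming scikit-learn: a test set is held out untouched, and cross-validation for model selection runs only on the training portion. The data and model are illustrative placeholders, and the sketch is not an endorsement of any particular option above.

```python
# Hold out a test set, then cross-validate only on the training portion
# (scikit-learn assumed; data and model are illustrative placeholders).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# The test set is never touched during model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Model selection via cross-validation on the training data only.
print("CV accuracy:", cross_val_score(SVC(C=1.0), X_train, y_train, cv=5).mean())
```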
(d) Which of the following statements about Machine Learning is False? (Choose the best single answer from multiple choices)
(A) Machine learning algorithms often suffer from the curse of dimensionality
(B) Machine learning algorithms cannot generalise to data that are not observed during training of the algorithm
(C) Machine learning algorithms are typically sensitive to noise
(D) Machine learning algorithms typically perform better in terms of testing accuracy when more training data become available
[3 marks]
(e) Which of the following statements is (are) true? (Select all the correct ones)
(A) Gaussian mixture model (GMM) is a supervised learning method.

Question 2
Let $\{(x_i, y_i)\}_{i=1}^{n}$ be the training data for a binary classification problem, where $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, 1\}$. Let $w \in \mathbb{R}^d$ be the parameter vector, $b \in \mathbb{R}$ be the offset, and $\xi_i$ be the slack variable for $i = 1, \dots, n$.
Here the notation $\langle p, q \rangle = p \cdot q$ denotes the inner product of two vectors.
(a) What is wrong with the following primal form of the soft margin SVMs?

$$\min_{w,\,b,\,\xi}\ \ \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{n}\xi_{i},$$

$$\text{s.t.}\ \ y_{i}\bigl(\langle x_{i}, w\rangle + b\bigr) \ge 1 - \xi_{i}, \quad i = 1, \cdots, n.$$
[2 marks]
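
For comparison, a common textbook statement of the soft-margin primal is sketched below (a reference form, not an official model answer); checking the question's form against it line by line is a reasonable way to start:

```latex
% A common textbook soft-margin primal, shown for comparison with the
% form given in the question (reference only, not a model answer):
\begin{align*}
\min_{w,\,b,\,\xi}\quad & \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{n}\xi_{i} \\
\text{s.t.}\quad & y_{i}\bigl(\langle x_{i}, w\rangle + b\bigr) \ge 1 - \xi_{i},
  \quad i = 1,\dots,n, \\
& \xi_{i} \ge 0, \quad i = 1,\dots,n.
\end{align*}
```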
(b) After fixing the problem in the above form, what is the estimated $w$ if $C = 0$?
[2 marks]
(c) The dual form of the soft margin SVMs is given below. How can it be modified (slightly) to become the dual form of the hard margin SVMs?

$$\max_{\alpha}\ \ \sum_{i=1}^{n}\alpha_{i} - \frac{1}{2}\sum_{i,j}\alpha_{i}\alpha_{j}y_{i}y_{j}\langle x_{i}, x_{j}\rangle$$

$$\text{s.t.}\ \ 0 \le \alpha_{i} \le C, \quad i = 1, \cdots, n,$$

$$\sum_{i=1}^{n}\alpha_{i}y_{i} = 0.$$
[2 marks]
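
For side-by-side comparison, a common textbook hard-margin dual is sketched below (a reference form, not an official model answer); it keeps the same objective and equality constraint:

```latex
% Textbook hard-margin dual, for side-by-side comparison (reference only):
\begin{align*}
\max_{\alpha}\quad & \sum_{i=1}^{n}\alpha_{i}
  - \frac{1}{2}\sum_{i,j}\alpha_{i}\alpha_{j}y_{i}y_{j}\langle x_{i}, x_{j}\rangle \\
\text{s.t.}\quad & \alpha_{i} \ge 0, \quad i = 1,\dots,n, \\
& \sum_{i=1}^{n}\alpha_{i}y_{i} = 0.
\end{align*}
```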
(d) Express $b$ using the dual variables and the training data.
[3 marks]
(e) An RBF kernel corresponds to lifting to a feature space with how many dimensions?
[3 marks]
(f) Let $u = [w; b]$ and $z = [x; 1]$. We can rewrite $(\langle w, x\rangle + b)$ as $\langle u, z\rangle$. This means that if we augment the training data $\{(x_i, y_i)\}_{i=1}^{n}$ to $\{(z_i, y_i)\}_{i=1}^{n}$, where $z_i = [x_i; 1]$, we only need to learn one parameter $u$ instead of two parameters $w$ and $b$.
1. Please write down the primal form of the soft margin SVMs using the decision function $\mathrm{sign}[\langle u, z\rangle]$.
2. Is the new primal form equivalent to the old primal form? In other words, if we train two SVMs (the standard SVM and this new re-parameterised SVM), will we, in general, obtain exactly the same classification function?
3. Please prove your answer to the above question (i.e. use a derivation to show why they are or are not equivalent).
[6 marks]
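
As a starting point for part 1, one natural way to write the re-parameterised primal is sketched below (a sketch, not an official model answer). Note that $\|u\|^{2} = \|w\|^{2} + b^{2}$, so the regulariser now also penalises the offset, which is worth keeping in mind for parts 2 and 3:

```latex
% A sketch of the re-parameterised soft-margin primal over u = [w; b]
% and z_i = [x_i; 1] (note that \|u\|^2 = \|w\|^2 + b^2):
\begin{align*}
\min_{u,\,\xi}\quad & \frac{1}{2}\|u\|^{2} + C\sum_{i=1}^{n}\xi_{i} \\
\text{s.t.}\quad & y_{i}\,\langle u, z_{i}\rangle \ge 1 - \xi_{i},
  \quad \xi_{i} \ge 0, \quad i = 1,\dots,n.
\end{align*}
```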
(g) Suppose that we have a kernel $K(\cdot, \cdot)$ such that there is an implicit high-dimensional feature map $\Phi : \mathbb{R}^d \to \mathbb{R}^D$ satisfying, for all $x, z \in \mathbb{R}^d$,

$$K(x, z) = \langle \Phi(x), \Phi(z)\rangle = \Phi(x)^{\top}\Phi(z) = \sum_{i=1}^{D}\Phi(x)_i\,\Phi(z)_i,$$

the inner product in the $D$-dimensional space.
Show how to compute the squared $\ell_2$ distance in the $D$-dimensional space,

$$\|\Phi(x) - \Phi(z)\|^{2} = \sum_{i=1}^{D}\bigl(\Phi(x)_i - \Phi(z)_i\bigr)^{2},$$

without explicitly calculating the values in the $D$-dimensional vectors. You are asked to provide a formal proof.
[6 marks]
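
For orientation, the proof comes down to expanding the square and rewriting each inner product as a kernel evaluation, as the identity below sketches (every step uses only the definition of $K$ above):

```latex
% Expanding the squared distance and rewriting each inner product via K:
\begin{align*}
\|\Phi(x) - \Phi(z)\|^{2}
  &= \sum_{i=1}^{D}\Phi(x)_{i}^{2}
     - 2\sum_{i=1}^{D}\Phi(x)_{i}\Phi(z)_{i}
     + \sum_{i=1}^{D}\Phi(z)_{i}^{2} \\
  &= \langle \Phi(x), \Phi(x)\rangle
     - 2\,\langle \Phi(x), \Phi(z)\rangle
     + \langle \Phi(z), \Phi(z)\rangle \\
  &= K(x, x) - 2\,K(x, z) + K(z, z).
\end{align*}
```

Only three kernel evaluations are needed; $\Phi$ itself is never computed.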