ELEC6008 Pattern Recognition and Machine Learning

(This assignment consists of theory questions on pattern recognition and machine learning.)

Q1.
(a) Let the likelihoods of the two classes $\omega_1$ and $\omega_2$ with respect to $x$ be given by
$$p(x \mid \omega_1) = \frac{1}{\sqrt{2\pi}}\, e^{-(x+2)^2/2} \qquad \text{and} \qquad p(x \mid \omega_2) = \frac{1}{2\sqrt{2\pi}}\, e^{-(x-5)^2/8}\,.$$
(Sub-total: 18)
The a priori probabilities for the two classes are given by $P(\omega_1) = 0.8$ and $P(\omega_2) = 0.2$.
i) Find the Maximum Likelihood Classifier.
(5 marks)
ii) Using the Bayes rule
$$P(\omega_i \mid x) = \frac{p(x \mid \omega_i)\, P(\omega_i)}{p(x)},$$
find the classifier(s) for the two classes. (If there is more than one decision boundary, you should find them all.)
(5 marks)
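A numerical sketch of (i) and (ii), assuming the reconstructed Gaussian likelihoods $N(-2, 1)$ and $N(5, 4)$ above: equating the log posteriors of the two classes gives a quadratic in $x$, and its real roots are the decision boundaries. The helper `boundaries` is an illustrative name, not part of the paper.

```python
import numpy as np

# Assumed (reconstructed) class-conditional densities:
#   p(x|w1) = N(mean=-2, var=1),  p(x|w2) = N(mean=5, var=4)
# Boundary condition: ln p(x|w1) + ln P(w1) = ln p(x|w2) + ln P(w2).
# Expanding the Gaussian logs yields a2*x^2 + a1*x + a0 = 0.

def boundaries(m1, v1, m2, v2, P1, P2):
    # ln N(m, v) = -0.5*ln(2*pi*v) - (x - m)^2 / (2v)
    a2 = -1 / (2 * v1) + 1 / (2 * v2)
    a1 = m1 / v1 - m2 / v2
    a0 = (-m1**2 / (2 * v1) + m2**2 / (2 * v2)
          - 0.5 * np.log(v1) + 0.5 * np.log(v2)
          + np.log(P1) - np.log(P2))
    roots = np.roots([a2, a1, a0])
    return np.sort(roots[np.isreal(roots)].real)

# (i) ML classifier: likelihoods only (equal priors)
print("ML boundaries:   ", boundaries(-2, 1, 5, 4, 0.5, 0.5))
# (ii) Bayes classifier with P(w1) = 0.8, P(w2) = 0.2
print("Bayes boundaries:", boundaries(-2, 1, 5, 4, 0.8, 0.2))
```

Because the two variances differ, the quadratic term does not cancel, so there are two boundaries in each case (one of them far out in the left tail).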
Client X wants to apply the above classifier for bio-medical applications and has
suggested the following loss functions for Bayes classification:
iii) Write down the 4 different values of the loss function $\lambda(\omega_1 \mid \omega_1)$, $\lambda(\omega_1 \mid \omega_2)$, $\lambda(\omega_2 \mid \omega_1)$ and $\lambda(\omega_2 \mid \omega_2)$.
(2 marks)
iv) Find the Bayes Minimum Risk Classifier using the new loss function in (iii).
(4 marks)
v) Suggest and explain whether the new classifier in (iv) is still a minimum error
rate classifier.
(2 marks)
(b) Consider the following criterion function for finding a hyperplane to separate the two classes of samples, which contain $x_1 = [4, 1]^T$, $x_2 = [3, 2]^T$ (Class 1) and $x_3 = [6, 8]^T$, $x_4 = [9, 9]^T$ (Class 2):
$$J_q(a) = \sum_{y \in \tilde{Y}} (-a^T y),$$
where $\tilde{Y}$ is the set of misclassified samples. (Sub-total: 15)

i) Gradient Descent can be used to minimize $J_q(a)$. Write down the expression in terms of $\eta(k)$, $\nabla_a J_q(a)$, $a^{(k+1)}$ and $a^{(k)}$ that solves for $a$ iteratively.
(2 marks)
ii) Suppose the augmented feature vector is defined as $y = [1, x_1, x_2]^T$. Using (i), find $a^{(2)}$ and $a^{(3)}$ with the initialization $a^{(1)} = [0, 0, 0]^T$ and a step size $\eta(k) = 1$.
(6 marks)
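Part (ii) can be checked with a short sketch of the batch perceptron update. Two conventions are assumed here (they are not stated in the paper): class-2 samples are negated after augmentation, and a sample counts as misclassified when $a^T y \le 0$.

```python
import numpy as np

# Batch perceptron on J_q(a) = sum over misclassified y of (-a^T y).
# Augmented samples y = [1, x1, x2]; class-2 samples are negated so that
# a correctly classified sample satisfies a^T y > 0.
Y = np.array([[ 1.,  4.,  1.],   # x1, class 1
              [ 1.,  3.,  2.],   # x2, class 1
              [-1., -6., -8.],   # x3, class 2 (negated)
              [-1., -9., -9.]])  # x4, class 2 (negated)

a = np.zeros(3)                  # a^(1) = [0, 0, 0]^T
eta = 1.0                        # step size eta(k) = 1
history = [a.copy()]
for k in range(2):
    mis = Y[Y @ a <= 0]              # misclassified set Y~
    a = a + eta * mis.sum(axis=0)    # a^(k+1) = a^(k) - eta * grad J_q
    history.append(a.copy())

print("a(2) =", history[1])  # [0, -8, -14]
print("a(3) =", history[2])  # [2, -1, -11]
```

With $a^{(1)} = 0$ every sample is misclassified, so the first step sums all four augmented vectors; after that only the two class-1 samples remain misclassified.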
iii) Student Y suggests that a soft-margin SVM should be employed rather than the perceptron, which is given as
$$\min_{w,\, w_0} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \max\bigl(0,\; 1 - z_i (w^T x_i + w_0)\bigr), \qquad z_i \in \{1, -1\}.$$
Using an initialization $\tilde{w}^{(1)} = [w^{(1)T}, w_0^{(1)}]^T = [0, 0, 0]^T$, step size $\eta(k) = 0.1$ and regularization parameter $C = 10$, find $\tilde{w}^{(2)}$ and $\tilde{w}^{(3)}$.
(7 marks)
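One way to check (iii) is plain subgradient descent on the soft-margin objective. The conventions below are assumptions (the bias $w_0$ is not regularized, and every sample whose hinge term is strictly positive contributes to the subgradient); a different subgradient convention would change the iterates.

```python
import numpy as np

# Subgradient descent on
#   0.5*||w||^2 + C * sum_i max(0, 1 - z_i*(w^T x_i + w0)).
X = np.array([[4., 1.], [3., 2.], [6., 8.], [9., 9.]])
z = np.array([1., 1., -1., -1.])   # class 1 -> +1, class 2 -> -1
C, eta = 10.0, 0.1

w, w0 = np.zeros(2), 0.0
history = [np.append(w, w0)]
for k in range(2):
    viol = 1 - z * (X @ w + w0) > 0             # margin violators
    gw = w - C * (z[viol, None] * X[viol]).sum(axis=0)
    gw0 = -C * z[viol].sum()
    w, w0 = w - eta * gw, w0 - eta * gw0
    history.append(np.append(w, w0))

print("w~(2) =", history[1])  # [-8, -14, 0]
print("w~(3) =", history[2])  # [-0.2, -9.6, 2]
```

At $\tilde{w}^{(1)} = 0$ every sample violates the margin and the label sum cancels, so $w_0$ stays at 0 on the first step; on the second step only the two class-1 samples violate, which moves the bias.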
Q2.
(a) Let the likelihood of a parameter $\theta$ of the density function be given as
$$p(x \mid \theta) = \begin{cases} \dfrac{8}{3\sqrt{\pi}}\, \theta^{5/2} x^4 \exp(-\theta x^2), & x \ge 0 \\[4pt] 0, & \text{otherwise.} \end{cases}$$
(Sub-total: 16)
i) Given a set of independent feature samples $\{x_1, x_2, x_3, x_4\} = \{2, 5, 7, 11\}$, determine the maximum likelihood estimate of $\theta$.
(6 marks)
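Assuming the density reconstructed above, $p(x \mid \theta) \propto \theta^{5/2} x^4 e^{-\theta x^2}$, the log-likelihood of $N$ samples is $\frac{5N}{2} \ln\theta - \theta \sum_i x_i^2$ up to constants, so setting the derivative to zero gives the closed form $\hat\theta = \frac{5N/2}{\sum_i x_i^2}$. A sketch that cross-checks this against a grid search:

```python
import numpy as np

# Log-likelihood (up to additive constants in theta):
#   L(t) = (5N/2) * ln(t) - t * sum(x_i^2)
# dL/dt = 0  =>  t_hat = (5N/2) / sum(x_i^2)
x = np.array([2., 5., 7., 11.])
s2 = np.sum(x**2)                     # = 199
theta_hat = 2.5 * len(x) / s2         # closed form: 10/199

# Numerical cross-check: maximize L(t) on a fine grid
t = np.linspace(1e-4, 1.0, 200001)
L = 2.5 * len(x) * np.log(t) - t * s2
theta_grid = t[np.argmax(L)]

print(theta_hat, theta_grid)          # both ~ 10/199
```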
Assume that the parameter $\theta$ has an a priori probability
$$p(\theta) = 0.5\,[\delta(\theta - 2) + \delta(\theta - 3)],$$
where $\delta(\cdot)$ is the ideal unit impulse function informally defined as
$$\delta(y) = \begin{cases} \infty, & y = 0 \\ 0, & \text{otherwise} \end{cases} \qquad \text{and} \qquad \int_{-\infty}^{\infty} \delta(y)\, dy = 1.$$
ii) Determine the posterior probability $p(\theta \mid x_1, x_2, x_3, x_4)$.
(6 marks)
iii) Find the Maximum A Posteriori (MAP) estimate of $\theta$.
(4 marks)
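Because the prior is a pair of impulses, the posterior has mass only at $\theta = 2$ and $\theta = 3$, and (with equal prior weights) the MAP estimate reduces to comparing the two likelihoods. A sketch using the reconstructed density, so the exponent $-199\theta$ is an assumption tied to that reconstruction:

```python
import numpy as np

# Prior p(t) = 0.5*[delta(t-2) + delta(t-3)]: posterior mass only at t=2, 3.
# Compare log-likelihoods: ln L(t) = 10*ln(t) - t*sum(x_i^2) (+ const).
x2sum = 2.0**2 + 5**2 + 7**2 + 11**2             # = 199
logL = {t: 10 * np.log(t) - t * x2sum for t in (2.0, 3.0)}

# Posterior weights on {2, 3}; equal prior weights cancel in the ratio
m = max(logL.values())
w = {t: np.exp(v - m) for t, v in logL.items()}
Z = sum(w.values())
post = {t: wi / Z for t, wi in w.items()}

theta_map = max(post, key=post.get)
print(post, "MAP =", theta_map)   # posterior overwhelmingly at t = 2
```

The ratio $L(2)/L(3) = (2/3)^{10} e^{199}$ is astronomically large, so essentially all posterior mass sits at $\theta = 2$.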
(b) Consider the following independently drawn samples $X = \{1, 2, 2, 4, 5, 7, 8, 9, 9\}$, $N = 9$. (Sub-total: 17)

i) Find $p(x)$ for $x = 4.5$ using the Parzen window method with a rectangular window of bandwidth $h_d = 2$.
(5 marks)
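A sketch of the rectangular-window Parzen estimate; the edge convention (a point exactly half a bandwidth away counts as inside) is an assumption, though no sample lands on the edge here.

```python
import numpy as np

# Rectangular-window Parzen estimate:
#   p(x) = (1/(N*h)) * sum_i 1{ |x - x_i| / h <= 1/2 }
X = np.array([1., 2., 2., 4., 5., 7., 8., 9., 9.])
N, h, x = len(X), 2.0, 4.5

inside = np.abs(x - X) / h <= 0.5   # window covers [3.5, 5.5] -> {4, 5}
p = inside.sum() / (N * h)
print(p)                            # 2 / 18 = 1/9
```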
ii) Silverman's Rule is a method for choosing the bandwidth. Suggest under what situation the determined bandwidth is optimal.
(1 mark)
iii) Find $p(x)$ for $x = 4.5$ using the kNN method with $k_n = 3$.
(5 marks)
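A sketch of the 1-D kNN density estimate, assuming the common convention that the "volume" is the interval of width $2r$ centred at $x$, where $r$ is the distance to the $k_n$-th nearest sample:

```python
import numpy as np

# kNN density estimate: p(x) = k_n / (N * V), V = 2 * r in 1-D,
# with r the distance from x to its k_n-th nearest sample.
X = np.array([1., 2., 2., 4., 5., 7., 8., 9., 9.])
N, kn, x = len(X), 3, 4.5

d = np.sort(np.abs(x - X))   # [0.5, 0.5, 2.5, 2.5, ...]
r = d[kn - 1]                # 3rd-nearest distance = 2.5
p = kn / (N * 2 * r)
print(p)                     # 3 / (9 * 5) = 1/15
```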
iv) Suppose $X_1 = [1, 1, 3, 4, 5, 5]$ and $X_2 = [8, 9, 9, 11, 12, 12]$ belong to class 1 and class 2 respectively. Suggest which class an arbitrary value $x = 7$ belongs to if the kNN method with $k_n = 3$ is used.
(5 marks)
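A sketch of the pooled kNN vote for part (iv). Note the tie it exposes: after the nearest sample 8 (class 2, distance 1), the points 5, 5 (class 1) and 9, 9 (class 2) all sit at distance 2 from $x = 7$, so the decision depends on the tie-breaking rule. The stable sort used below keeps the class-1 samples first, which is one possible convention, not the paper's.

```python
import numpy as np

# Pool both classes, take the k_n nearest samples to x, majority vote.
X1 = np.array([1., 1., 3., 4., 5., 5.])       # class 1
X2 = np.array([8., 9., 9., 11., 12., 12.])    # class 2
X = np.concatenate([X1, X2])
labels = np.array([1] * len(X1) + [2] * len(X2))
x, kn = 7.0, 3

order = np.argsort(np.abs(x - X), kind="stable")  # ties kept in array order
nearest = labels[order[:kn]]
votes = {c: int(np.sum(nearest == c)) for c in (1, 2)}
pred = max(votes, key=votes.get)
print("neighbour labels:", nearest, "-> class", pred)
```

With this tie-break the three neighbours are 8 (class 2) plus 5, 5 (class 1), giving class 1; breaking the tie the other way would give class 2, which is exactly the ambiguity the question probes.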
v) Explain why an even $k_n$ should not be used in a two-class classification problem.