Machine learning assignment help | ELEC6008 Pattern recognition and machine learning
This assignment consists of theory questions on pattern recognition, to be completed using machine learning.
Q1.
 (a) Let the likelihoods of the two classes $\omega_1$ and $\omega_2$ with respect to $x$ be given by
 $p(x|\omega_1) = \frac{1}{\sqrt{2\pi}} e^{-\frac{(x+2)^2}{2}}$ and $p(x|\omega_2) = \frac{1}{2\sqrt{2\pi}} e^{-\frac{(x-5)^2}{8}}$. (Sub-total: 18)
 The a priori probabilities for the two classes are given by $P(\omega_1) = 0.8$ and $P(\omega_2) = 0.2$.
 i) Find the Maximum Likelihood Classifier.
 (5 marks)
 ii) Using the Bayes rule $P(\omega_i|x) = \frac{p(x|\omega_i) P(\omega_i)}{p(x)}$, find the classifier(s) for the two classes. (If there is more than one decision boundary, you should find all of them as well.)
 (5 marks)
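 A minimal numerical sketch of parts (i) and (ii), assuming the two likelihoods are the Gaussians N(-2, 1) and N(5, 2^2) and the priors are 0.8 and 0.2. Equating the log discriminants ln[p(x|w1)P(w1)] = ln[p(x|w2)P(w2)] under that assumption reduces to the quadratic 3x^2 + 26x - 9 - 8 ln(2 P(w1)/P(w2)) = 0, so each classifier has two decision boundaries. The function names below are illustrative only.

```python
import math

def log_g1(x, prior):
    # ln[p(x|w1) P(w1)], with p(x|w1) assumed to be N(-2, 1)
    return -0.5 * (x + 2) ** 2 - math.log(math.sqrt(2 * math.pi)) + math.log(prior)

def log_g2(x, prior):
    # ln[p(x|w2) P(w2)], with p(x|w2) assumed to be N(5, 2^2)
    return -((x - 5) ** 2) / 8 - math.log(2 * math.sqrt(2 * math.pi)) + math.log(prior)

def boundaries(prior1, prior2):
    # Roots of 3x^2 + 26x + c = 0, obtained from log_g1 = log_g2
    c = -9.0 - 8.0 * math.log(2.0 * prior1 / prior2)
    root = math.sqrt(26.0 ** 2 - 4.0 * 3.0 * c)
    return ((-26.0 - root) / 6.0, (-26.0 + root) / 6.0)

# Equal priors give the maximum likelihood classifier.
ml_bounds = boundaries(0.5, 0.5)
# The stated priors give the Bayes (minimum error rate) classifier.
bayes_bounds = boundaries(0.8, 0.2)

print("ML boundaries:", ml_bounds)
print("Bayes boundaries:", bayes_bounds)
```

 Under these assumptions class 1 is chosen for x between the two roots and class 2 outside them, because the class 2 density has the larger variance and dominates in both tails.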
 Client X wants to apply the above classifier for bio-medical applications and has
 suggested the following loss functions for Bayes classification:
 iii) Write down the 4 different values of the loss function $\lambda(\alpha_1|\omega_1)$, $\lambda(\alpha_1|\omega_2)$, $\lambda(\alpha_2|\omega_1)$ and $\lambda(\alpha_2|\omega_2)$.
 (2 marks)
 iv) Find the Bayes Minimum Risk Classifier using the new loss function in (iii).
 (4 marks)
 v) State, with an explanation, whether the new classifier in (iv) is still a minimum error rate classifier.
 (2 marks)
 (b) Consider the following criterion function for finding a hyperplane to separate the two classes of samples, which contain $x_1 = [4,1]^T$, $x_2 = [3,2]^T$ (Class 1) and $x_3 = [6,8]^T$, $x_4 = [9,9]^T$ (Class 2),
 $J_q(a) = -\sum_{y \in Y_C} a^T \tilde{y}$. (Sub-total: 15)
 i) Gradient descent can be used to minimize $J_q(a)$. Write down the expression in terms of $\eta(k)$, $\nabla_a J_q(a)$, $a^{(k+1)}$ and $a^{(k)}$ that solves for $a$ iteratively.
 (2 marks)
 ii) Suppose the augmented feature vector is defined as $y = [1, x_1, x_2]^T$. Using (i) and (ii), find $a^{(2)}$ and $a^{(3)}$ with an initialization $a^{(1)} = [0,0,0]^T$ and a step size $\eta(k) = 1$.
 (6 marks)
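 The iteration in (i) and (ii) can be sketched as follows. This is a hedged illustration, not a model answer: it assumes that Y_C is the set of currently misclassified samples (those with a^T y <= 0), that Class 2 samples are sign-normalised (multiplied by -1) after augmentation, and that the update is a(k+1) = a(k) - eta(k) * grad J_q(a(k)). The helper names are hypothetical.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def batch_perceptron_step(a, normalised, eta):
    """One gradient-descent step a(k+1) = a(k) - eta * grad J_q(a(k))."""
    # Assumption: Y_C holds the samples that are currently misclassified.
    misclassified = [y for y in normalised if dot(a, y) <= 0]
    # grad J_q(a) is minus the sum of the misclassified samples.
    grad = [-sum(y[j] for y in misclassified) for j in range(len(a))]
    return [aj - eta * gj for aj, gj in zip(a, grad)]

class1 = [(4, 1), (3, 2)]
class2 = [(6, 8), (9, 9)]
# Augment with a leading 1; negate Class 2 so "correct" always means a^T y > 0.
normalised = ([[1, x1, x2] for x1, x2 in class1]
              + [[-1, -x1, -x2] for x1, x2 in class2])

a1 = [0, 0, 0]
a2 = batch_perceptron_step(a1, normalised, eta=1)
a3 = batch_perceptron_step(a2, normalised, eta=1)
print(a2, a3)
```

 With the zero initialization every sample counts as misclassified in the first step, so the first update is simply the sum of all sign-normalised samples.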
 iii) Student Y suggests that the soft-margin SVM should be employed rather than the perceptron, which is given as
 $\min_{w, w_0, \xi_i} \|w\|_2^2 + C \sum_{i=1}^{N} \max(0, 1 - z_i(w^T x_i + w_0))$, $z_i = 1, -1$. Using an initialization $\tilde{w}^{(1)} = [w^{(1)T}, w_0^{(1)}]^T = [0,0,0]^T$, step size $\eta(k) = 0.1$ and regularization parameter $C = 10$, find $\tilde{w}^{(2)}$ and $\tilde{w}^{(3)}$.
 (7 marks)
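 A hedged sketch of the subgradient iteration for (iii). It assumes the objective is ||w||^2 + C * sum_i max(0, 1 - z_i (w^T x_i + w_0)) with no factor of 1/2 on the regulariser, that Class 1 is labelled z = +1 and Class 2 is labelled z = -1, that the bias w_0 is not regularised, and that a hinge term contributes to the subgradient only when its margin is strictly below 1. The function name is hypothetical.

```python
def svm_subgradient_step(w_tilde, xs, zs, eta, C):
    """One subgradient step on ||w||^2 + C * sum(max(0, 1 - z (w.x + w0)))."""
    w, w0 = w_tilde[:-1], w_tilde[-1]
    grad_w = [2.0 * wj for wj in w]  # gradient of the assumed regulariser ||w||^2
    grad_w0 = 0.0
    for x, z in zip(xs, zs):
        margin = z * (sum(wj * xj for wj, xj in zip(w, x)) + w0)
        if margin < 1:  # hinge term is active
            grad_w = [g - C * z * xj for g, xj in zip(grad_w, x)]
            grad_w0 -= C * z
    new_w = [wj - eta * g for wj, g in zip(w, grad_w)]
    return new_w + [w0 - eta * grad_w0]

xs = [(4, 1), (3, 2), (6, 8), (9, 9)]
zs = [1, 1, -1, -1]  # assumed labels: Class 1 -> +1, Class 2 -> -1

w1 = [0.0, 0.0, 0.0]  # ordered as [w_1, w_2, w_0]
w2 = svm_subgradient_step(w1, xs, zs, eta=0.1, C=10)
w3 = svm_subgradient_step(w2, xs, zs, eta=0.1, C=10)
print(w2, w3)
```

 Swapping the label assignment or using (1/2)||w||^2 as the regulariser changes the numbers, so the printed vectors are only meaningful under the assumptions listed above.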
 Q2.
 (a) Let the likelihood of a parameter $\theta$ of the density function be given as
 $p(x|\theta) = \begin{cases} \frac{4}{3\sqrt{\pi}} \theta^{5/2} x^4 \exp(-\theta x^2) & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$. (Sub-total: 16)
 i) Given a set of independent feature samples $\{x_1, x_2, x_3, x_4\} = \{2, 5, 7, 11\}$, determine the maximum likelihood estimate of $\theta$.
 (6 marks)
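 A short sketch for (i), assuming the density has the form c * theta^(5/2) * x^4 * exp(-theta * x^2) for x >= 0, with c a constant that does not depend on theta. Under that assumption the log-likelihood of N samples is, up to a constant, (5N/2) ln(theta) - theta * sum(x_i^2), and setting its derivative to zero gives theta = 5N / (2 * sum(x_i^2)). The function names are illustrative.

```python
import math

def log_likelihood(theta, samples):
    # Log-likelihood up to an additive constant independent of theta
    n = len(samples)
    return 2.5 * n * math.log(theta) - theta * sum(x * x for x in samples)

def theta_mle(samples):
    # Stationary point of the log-likelihood: 5N / (2 * sum of squares)
    return 5 * len(samples) / (2 * sum(x * x for x in samples))

samples = [2, 5, 7, 11]
theta_hat = theta_mle(samples)
print(theta_hat)  # 10/199 for these samples
```

 The second derivative, -5N / (2 theta^2), is negative for every theta > 0, so the stationary point is a maximum.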
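 Assume that the parameter $\theta$ has an a priori probability $p(\theta) = 0.5[\delta(\theta - 2) + \delta(\theta - 3)]$, where $\delta(\cdot)$ is the ideal unit impulse function informally defined as:
 $\delta(y) = \begin{cases} \infty & y = 0 \\ 0 & \text{otherwise} \end{cases}$ and $\int_{-\infty}^{\infty} \delta(y)\,dy = 1$.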
 ii) Determine the posterior probability $p(\theta | x_1, x_2, x_3, x_4)$.
 (6 marks)
 iii) Find the Maximum A Posteriori (MAP) Estimate of $\theta$.
 (4 marks)
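 Because the prior consists of two impulses, the posterior in (ii) is also a pair of impulses at theta = 2 and theta = 3, each weighted by prior times likelihood and then normalised. The sketch below computes those two weights in the log domain, assuming the samples {2, 5, 7, 11} from (i) and a likelihood proportional to theta^(5/2) * exp(-theta * x^2) per sample (factors that do not depend on theta cancel in the normalisation). Variable names are illustrative.

```python
import math

samples = [2, 5, 7, 11]
prior = {2.0: 0.5, 3.0: 0.5}  # weights of the two impulses in p(theta)

def log_weight(theta):
    # ln[prior * likelihood], dropping terms that do not depend on theta
    n = len(samples)
    s2 = sum(x * x for x in samples)
    return math.log(prior[theta]) + 2.5 * n * math.log(theta) - theta * s2

log_w = {t: log_weight(t) for t in prior}
m = max(log_w.values())  # subtract the maximum for numerical stability
unnorm = {t: math.exp(lw - m) for t, lw in log_w.items()}
total = sum(unnorm.values())
posterior = {t: u / total for t, u in unnorm.items()}

# The MAP estimate is the impulse location with the larger posterior weight.
theta_map = max(posterior, key=posterior.get)
print(posterior, theta_map)
```

 Under these assumptions almost all of the posterior mass sits on theta = 2, since the factor exp(-theta * sum(x_i^2)) penalises the larger value heavily.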
 (b) Consider the following independently drawn samples
 X = {1,2,2,4,5,7,8,9,9}, N = 9 (Sub-total: 17)
 i) Find $p(x)$ for $x = 4.5$ using the Parzen window with a bandwidth $h_d = 2$ and the rectangular window.
 (5 marks)
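 A minimal sketch of the Parzen estimate in (i), assuming the common convention in which the rectangular window is phi(u) = 1 for |u| <= 1/2 and 0 otherwise, with u = (x - x_i) / h, so that h is the full width of the window. Other texts define h as the half-width, which would change the count. The function name is hypothetical.

```python
def parzen_rect(x, samples, h):
    # Count samples falling inside the window of full width h centred on x
    k = sum(1 for xi in samples if abs((x - xi) / h) <= 0.5)
    # In one dimension the window volume is simply h
    return k / (len(samples) * h)

X = [1, 2, 2, 4, 5, 7, 8, 9, 9]
p = parzen_rect(4.5, X, h=2)
print(p)
```

 With this convention only the samples 4 and 5 fall inside the window [3.5, 5.5], giving p(4.5) = 2 / (9 * 2) = 1/9.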
 ii) Silverman's rule is a method for choosing the bandwidth. State under what conditions the bandwidth it determines is optimal.
 (1 mark)
 iii) Find $p(x)$ for $x = 4.5$ using the kNN method with $k_n = 3$.
 (5 marks)
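 A sketch of the kNN density estimate in (iii), assuming the form p(x) = k_n / (N * V), where V is the length of the smallest interval centred on x that contains the k_n nearest samples, that is, twice the distance to the k_n-th nearest neighbour. Some texts use (k_n - 1) in the numerator instead, so treat the convention as an assumption. The function name is hypothetical.

```python
def knn_density(x, samples, k):
    # Distance from x to its k-th nearest sample
    dists = sorted(abs(x - xi) for xi in samples)
    r = dists[k - 1]
    volume = 2 * r  # length of the interval [x - r, x + r]
    return k / (len(samples) * volume)

X = [1, 2, 2, 4, 5, 7, 8, 9, 9]
p = knn_density(4.5, X, k=3)
print(p)
```

 Here the two nearest samples (4 and 5) are at distance 0.5 and the third nearest is at distance 2.5, so V = 5 and p(4.5) = 3 / (9 * 5) = 1/15 under this convention.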
 iv) Suppose $X_1 = [1, 1, 3, 4, 5, 5]$ and $X_2 = [8, 9, 9, 11, 12, 12]$ belong to class 1 and class 2 respectively. State which class an arbitrary value $x = 7$ belongs to if the kNN method with $k_n = 3$ is used.
 (5 marks)
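 One possible reading of (iv), sketched under stated assumptions: the sample sets are X1 = [1, 1, 3, 4, 5, 5] and X2 = [8, 9, 9, 11, 12, 12], the kNN density estimate k_n / (N_i * V_i) is applied to each class separately, and the class priors are taken from the class sizes (equal here). The point is assigned to the class with the larger prior times density. A pooled majority vote is a different procedure and would have to break ties among the samples at distance 2 from x = 7.

```python
def knn_density(x, samples, k):
    # k / (N * V), with V twice the distance to the k-th nearest sample
    dists = sorted(abs(x - xi) for xi in samples)
    return k / (len(samples) * 2 * dists[k - 1])

X1 = [1, 1, 3, 4, 5, 5]       # class 1 (assumed reconstruction)
X2 = [8, 9, 9, 11, 12, 12]    # class 2 (assumed reconstruction)
x = 7
n_total = len(X1) + len(X2)

# Prior (class frequency) times class-conditional kNN density
score1 = len(X1) / n_total * knn_density(x, X1, k=3)
score2 = len(X2) / n_total * knn_density(x, X2, k=3)
predicted = 1 if score1 > score2 else 2
print(score1, score2, predicted)
```

 Under these assumptions the third nearest class 1 sample is at distance 3 and the third nearest class 2 sample is at distance 2, so the class 2 density estimate (1/8) exceeds the class 1 estimate (1/12).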
 v) Explain why an even $k_n$ should not be used in a two-class classification problem.

 
                        