# Machine Learning Assignment Help | COMP3670: Introduction to Machine Learning

This is a machine learning assignment from Australia on an introduction to machine learning.

**Note:** For the purposes of this assignment, we let lowercase *p* denote probability density functions (pdfs), and uppercase *P* denote probabilities. If a random variable *Z* is characterized by a probability density function *p*, we have that

*P*(*a* ≤ *Z* ≤ *b*) = ∫ₐᵇ *p*(*z*) *dz*

You should show your derivations, but **you may use a computer algebra system (CAS) to assist with integration or differentiation**.

**Question 1: Bayesian Inference** (40 credits)

Let *X* be a random variable representing the outcome of a biased coin with possible outcomes *X* = {0, 1}, *x* ∈ *X*. The bias of the coin is itself controlled by a random variable Θ, with outcomes *θ* ∈ **Θ**, where

**Θ** = {*θ* ∈ ℝ : 0 ≤ *θ* ≤ 1}

The two random variables are related by the following conditional probability distribution of *X* given Θ.

*p*(*X* = 1 | Θ = *θ*) = *θ*

*p*(*X* = 0 | Θ = *θ*) = 1 − *θ*

We can use *p*(*X* = 1 | *θ*) as a shorthand for *p*(*X* = 1 | Θ = *θ*).

We wish to learn what *θ* is, based on experiments flipping the coin. Before we flip the coin, we choose as our prior distribution

*p*(*θ*) = 30*θ*²(1 − *θ*)²

which, when plotted, is a symmetric bump on [0, 1] peaked at *θ* = 1/2.

a) (3 credits) Verify that *p*(*θ*) = 30*θ*²(1 − *θ*)² is a valid probability density on [0, 1] (i.e. that it is always non-negative and that it is normalised).
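As a sanity check, the normalisation can be verified symbolically. A minimal sketch using SymPy (one possible CAS, per the note above; the assignment does not require this particular tool):

```python
import sympy as sp

theta = sp.symbols("theta", nonnegative=True)
prior = 30 * theta**2 * (1 - theta)**2

# Non-negativity is immediate: theta**2 and (1 - theta)**2 are squares,
# and the coefficient 30 is positive.
# Normalisation: the density must integrate to 1 over [0, 1].
total = sp.integrate(prior, (theta, 0, 1))
print(total)  # 1
```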

We flip the coin a number of times. After each coin flip, we update the probability distribution for *θ* to reflect our new belief about the distribution of *θ*, based on evidence.

Suppose we flip the coin four times, and obtain the sequence of coin flips *x*1:4 = 0101. For its two subsequences 01 and 0101, denoted by *x*1:2 and *x*1:4 (and for the case before any coins are flipped), complete the following questions.

b) (15 credits) Compute the posterior probability density functions of *θ* after observing the two subsequences *x*1:2 and *x*1:4, respectively.
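For cross-checking a hand derivation: the prior 30*θ*²(1 − *θ*)² is the Beta(3, 3) density, so by Beta–Bernoulli conjugacy the posterior after *h* ones and *t* zeros is Beta(3 + *h*, 3 + *t*). A SymPy sketch of this update (the helper `posterior` is illustrative, not part of the assignment):

```python
import sympy as sp

theta = sp.symbols("theta", nonnegative=True)

def posterior(flips):
    """Posterior density of theta after a 0/1 string, starting from the
    Beta(3, 3) prior p(theta) = 30*theta**2*(1 - theta)**2."""
    h = flips.count("1")  # observed ones
    t = flips.count("0")  # observed zeros
    unnorm = theta**(2 + h) * (1 - theta)**(2 + t)  # prior * likelihood
    z = sp.integrate(unnorm, (theta, 0, 1))         # normalising constant
    return sp.expand(unnorm / z)

print(posterior("01"))    # Beta(4, 4), i.e. 140*theta**3*(1 - theta)**3
print(posterior("0101"))  # Beta(5, 5), i.e. 630*theta**4*(1 - theta)**4
```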

c) (3 credits) Compute the expected value *µ* of *θ* before any evidence, as well as after observing the two subsequences *x*1:2 and *x*1:4, respectively.

d) (3 credits) Compute the variance *σ*² of *θ* before any evidence, as well as after observing the two subsequences *x*1:2 and *x*1:4, respectively.

e) (5 credits) Compute the *maximum a posteriori* estimate *θ*MAP of *θ* before any evidence, as well as after observing the two subsequences *x*1:2 and *x*1:4, respectively.

Present your results in a table.
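Since every distribution involved here is Beta(*a*, *b*) with *a*, *b* > 1, the quantities in (c)–(e) follow from the standard Beta formulas: mean *a*/(*a* + *b*), variance *ab*/((*a* + *b*)²(*a* + *b* + 1)), and mode (*a* − 1)/(*a* + *b* − 2). A small sketch for cross-checking hand derivations (exact arithmetic via `fractions`):

```python
from fractions import Fraction

def beta_summary(a, b):
    """Mean, variance and mode (the MAP estimate) of Beta(a, b).
    The mode formula requires a, b > 1, which holds for every
    distribution in this question."""
    mean = Fraction(a, a + b)
    var = Fraction(a * b, (a + b)**2 * (a + b + 1))
    mode = Fraction(a - 1, a + b - 2)
    return mean, var, mode

# Prior Beta(3, 3); Beta(4, 4) after x = 01; Beta(5, 5) after x = 0101.
for a, b in [(3, 3), (4, 4), (5, 5)]:
    print((a, b), beta_summary(a, b))
```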

f) (5 credits) Plot each of the probability distributions *p*(*θ*), *p*(*θ* | *x*1:2 = 01), and *p*(*θ* | *x*1:4 = 0101) over the interval 0 ≤ *θ* ≤ 1 on the same graph to compare them.
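One way to produce such a plot (Matplotlib assumed; the closed forms used are the Beta(3, 3), Beta(4, 4) and Beta(5, 5) densities, which follow from Beta–Bernoulli conjugacy):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend: write a PNG instead of opening a window
import matplotlib.pyplot as plt

theta = np.linspace(0.0, 1.0, 500)
densities = {
    "p(theta)": 30 * theta**2 * (1 - theta)**2,          # prior, Beta(3, 3)
    "p(theta | 01)": 140 * theta**3 * (1 - theta)**3,    # Beta(4, 4)
    "p(theta | 0101)": 630 * theta**4 * (1 - theta)**4,  # Beta(5, 5)
}
for label, d in densities.items():
    plt.plot(theta, d, label=label)
plt.xlabel("theta")
plt.ylabel("density")
plt.legend()
plt.savefig("q1f_posteriors.png")
```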

g) (6 credits) What behaviour would you expect of the posterior distribution *p*(*θ* | *x*1:*n*) if we updated on a very long sequence of alternating coin flips *x*1:*n* = 01010101…?

What would you expect *µ*, *σ*², and *θ*MAP to look like for large *n*?

Sketch/draw an estimate of what *p*(*θ**|**x*1:*n*) would approximately look like against the other distributions.
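The intuition behind (g) can also be checked numerically: after *n* alternating flips (*n* even), conjugacy gives a Beta(3 + *n*/2, 3 + *n*/2) posterior, so the mean and MAP stay at 1/2 while the variance shrinks. A sketch:

```python
from fractions import Fraction

def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = Fraction(a, a + b)
    var = Fraction(a * b, (a + b)**2 * (a + b + 1))
    return mean, var

# After n alternating flips the posterior is Beta(3 + n/2, 3 + n/2):
# the distribution concentrates around theta = 1/2 as n grows.
for n in (4, 40, 400):
    a = 3 + n // 2
    mean, var = beta_mean_var(a, a)
    print(n, mean, float(var))
```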

**Question 2: Bayesian Inference on Imperfect Information** (50 credits)

We have a Bayesian agent running on a computer, trying to learn information about what the parameter *θ* could be in the coin-flip problem, based on observations through a noisy camera. The noisy camera takes a photo of each coin flip and reports back whether the result was a 0 or a 1. Unfortunately, the side of the coin with a "1" on it is very shiny, and the reflected light causes the camera to sometimes report back the wrong result. The probability that the camera returns a correct answer is parameterised by *ϕ* ∈ [0, 1]. Letting *X* denote the true outcome of the coin, and *X̂* denote what the camera reported back, we can draw the relationship between *X* and *X̂* as a noisy channel from *X* to *X̂*.

We would now like to investigate what posterior distributions are obtained, as a function of the parameter *ϕ*. Let *x̂*1:*n* be a sequence of coin flips as observed by the camera.

a) (5 credits) Briefly comment on how the camera behaves for *ϕ* = 1, *ϕ* = 0.5, and *ϕ* = 0. How would you expect this to change how the agent updates its prior to a posterior on *θ*, given an observation of *X̂*? (No equations required.)

b) (10 credits) Compute *p*(*X̂* = *x* | *θ*) for all *x* ∈ {0, 1}.
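The computation amounts to marginalising over the true outcome *X*, using the fact that the camera is correct with probability *ϕ*. A SymPy sketch of that marginalisation (a cross-check, not a substitute for the derivation the question asks for):

```python
import sympy as sp

theta, phi = sp.symbols("theta phi", nonnegative=True)

# p(X_hat = x | theta) = sum over the true outcome X of
# p(X_hat = x | X) * p(X | theta), with the camera correct w.p. phi.
p_hat_1 = phi * theta + (1 - phi) * (1 - theta)  # true 1 kept, or true 0 flipped
p_hat_0 = phi * (1 - theta) + (1 - phi) * theta  # true 0 kept, or true 1 flipped

# The two cases form a valid conditional distribution:
print(sp.simplify(p_hat_0 + p_hat_1))  # 1
```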

c) (15 credits) The coin is flipped, and the camera reports seeing a zero (i.e. *X̂* = 0). Given an arbitrary prior *p*(*θ*), compute the posterior *p*(*θ* | *X̂* = 0). What does *p*(*θ* | *X̂* = 0) simplify to when *ϕ* = 1? When *ϕ* = 1/2? When *ϕ* = 0? Explain your observations.

d) (10 credits) Compute *p*(*θ* | *X̂* = 0) for the same choice of prior *p*(*θ*) = 30*θ*²(1 − *θ*)² as before. Simplify your expression.
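With this specific prior, the evidence term *p*(*X̂* = 0) works out to 1/2 for every *ϕ*, because the prior mean is 1/2; this makes the normalisation painless. A SymPy cross-check, assuming the camera likelihood *p*(*X̂* = 0 | *θ*) = *ϕ*(1 − *θ*) + (1 − *ϕ*)*θ* obtained by marginalising over the true outcome:

```python
import sympy as sp

theta, phi = sp.symbols("theta phi", nonnegative=True)

prior = 30 * theta**2 * (1 - theta)**2        # Beta(3, 3)
lik0 = phi * (1 - theta) + (1 - phi) * theta  # p(X_hat = 0 | theta)

# Evidence p(X_hat = 0) = integral over theta of likelihood * prior.
evidence = sp.simplify(sp.integrate(lik0 * prior, (theta, 0, 1)))
posterior = sp.expand(lik0 * prior / evidence)

print(evidence)  # 1/2, independent of phi
print(posterior)
```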

e) (10 credits) Plot *p*(*θ* | *X̂* = 0) as a function of *θ*, for all *ϕ* ∈ {0, 0.25, 0.5, 0.75, 1} on the same graph to compare them. Comment on how the shape of the distribution changes with *ϕ*. Explain your observations.
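A plotting sketch for this part (Matplotlib assumed; it relies on the evidence value *p*(*X̂* = 0) = 1/2, which holds for this prior because its mean is 1/2):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend: write a PNG instead of opening a window
import matplotlib.pyplot as plt

theta = np.linspace(0.0, 1.0, 500)
prior = 30 * theta**2 * (1 - theta)**2

for phi in (0.0, 0.25, 0.5, 0.75, 1.0):
    lik0 = phi * (1 - theta) + (1 - phi) * theta  # p(X_hat = 0 | theta)
    post = lik0 * prior / 0.5                     # evidence p(X_hat = 0) = 1/2
    plt.plot(theta, post, label=f"phi = {phi}")
plt.xlabel("theta")
plt.ylabel("p(theta | X_hat = 0)")
plt.legend()
plt.savefig("q2e_posteriors.png")
```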