Data 102 Assignment 6

This assignment covers Markov decision processes (MDPs) and differential privacy in Python.

Submit your writeup including all code and plots as a PDF via Gradescope. We recommend
reading through the entire homework beforehand and carefully using functions for testing
procedures, plotting, and running experiments. Taking the time to test, maintain, and
reuse code will help in the long run!

Data science is a collaborative activity. While you may talk with others about the
homework, please write up your solutions individually. If you discuss the homework with
your peers, please include their names on your submission. Please make sure any
handwritten answers are legible, as we may deduct points otherwise.

A soccer robot R is on a fast break toward the goal, starting in position 1. From positions
1 through 3, it can either shoot (S) or dribble the ball forward (D). From 4 it can only
shoot. If it shoots, it either scores a goal (state G) or misses (state M). If it dribbles, it
either advances a square or loses the ball, ending up in state M.

In this Markov Decision Process (MDP), the states are 1, 2, 3, 4, G, and M, where G
and M are terminal states. The transition model depends on the parameter y, which is
the probability of dribbling successfully (i.e., advancing a square). Assume a discount of
γ = 1. For k ∈ {1, 2, 3, 4}, we have

and rewards are 0 for all other transitions.
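Since the transition table itself is not reproduced above, here is a minimal sketch of policy evaluation for this MDP under stated assumptions: a hypothetical dictionary `p_score[k]` of shooting-success probabilities (placeholders, not the assignment's actual values), a reward of 1 for reaching G and 0 otherwise, and γ = 1. With these assumptions, the always-shoot policy's value at position k is simply the probability of scoring from k, and dribbling from k is worth y times the value one square ahead.

```python
# A sketch of policy evaluation for the soccer-robot MDP. The probabilities
# in p_score are MADE UP for illustration; the assignment's actual transition
# model (omitted above) should be substituted in.

def evaluate_always_shoot(p_score):
    """V^pi(k) for the policy that always shoots.

    With reward 1 for scoring, gamma = 1, and G/M terminal,
    V^pi(k) = P(score from k) = p_score[k].
    """
    return {k: p_score[k] for k in p_score}

def q_dribble(k, y, v):
    """Q(k, D): advance to k+1 with probability y; losing the ball
    lands in terminal state M, which contributes 0 reward."""
    return y * v[k + 1]

# Example with hypothetical shooting probabilities (NOT the real ones):
p = {1: 0.2, 2: 0.4, 3: 0.6, 4: 0.8}
v_shoot = evaluate_always_shoot(p)
```

Under these assumptions, part (a) reduces to reading off `v_shoot[1]`; the same two helpers can be reused for the Q-value questions below.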

(a) (3 points) Denote by V π the value function for the specific policy π. What is V π(1)
for the policy π that always shoots?

(b) (4 points) Denote by Q∗(s, a) the value of a q-state (s, a), which is the expected
utility when starting with action a at state s, and thereafter acting optimally. What
is Q∗(3, D) in terms of y?

(c) (3 points) For what range of values of y is Q∗(3, S) ≥ Q∗(3, D)? Interpret your answer
in plain English.
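For part (c), one way to sanity-check a hand derivation is a quick numeric comparison. The sketch below again uses hypothetical scoring probabilities `p[k]` (not the assignment's values): under those assumptions Q∗(3, S) = p[3], and since from position 4 the robot can only shoot, Q∗(3, D) = y · p[4], so shooting is at least as good exactly when y ≤ p[3]/p[4].

```python
# Hedged numeric check for part (c), using made-up probabilities p[k].
# Q*(3, S) = p[3]; Q*(3, D) = y * Q*(4) = y * p[4] (from 4, only shooting
# is available), so shooting wins whenever y <= p[3] / p[4].

def shoot_at_least_as_good(y, p):
    """True when Q*(3, S) >= Q*(3, D) under the assumed model."""
    return p[3] >= y * p[4]

p = {3: 0.6, 4: 0.8}        # illustrative values only
threshold = p[3] / p[4]      # the cutoff value of y under these numbers
```

Interpreting the inequality: when the dribble-success probability y is small, advancing is too risky and the robot should shoot immediately; once y exceeds the threshold, dribbling toward a better shooting position pays off.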