# Python Homework Help | Data 102 Assignment 6

This assignment covers Markov decision processes (MDPs) and differential privacy in Python.

Submit your writeup, including all code and plots, as a PDF via Gradescope. We recommend reading through the entire homework beforehand and carefully using functions for testing procedures, plotting, and running experiments. Taking the time to test, maintain, and reuse code will help in the long run!

Data science is a collaborative activity. While you may talk with others about the homework, please write up your solutions individually. If you discuss the homework with your peers, please include their names on your submission. Please make sure any handwritten answers are legible, as we may deduct points otherwise.

A soccer robot R is on a fast break toward the goal, starting in position 1. From positions 1 through 3, it can either shoot (S) or dribble the ball forward (D). From position 4 it can only shoot. If it shoots, it either scores a goal (state G) or misses (state M). If it dribbles, it either advances a square or loses the ball, ending up in state M.

In this Markov Decision Process (MDP), the states are 1, 2, 3, 4, G, and M, where G and M are terminal states. The transition model depends on the parameter y, which is the probability of dribbling successfully (i.e., advancing a square). Assume a discount of γ = 1. For k ∈ {1, 2, 3, 4}, we have

and rewards are 0 for all other transitions.
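A compact way to explore MDPs like this one is value iteration. The sketch below uses the structure described above (shoot ends the episode in G or M; dribble advances with probability y or ends in M), but the shot-success probabilities `P_SCORE` and `GOAL_REWARD` are hypothetical placeholders — substitute the transition model and rewards from the problem statement.

```python
# Value-iteration sketch for the soccer MDP described above.
# P_SCORE and GOAL_REWARD are HYPOTHETICAL placeholders; replace them
# with the values given in the problem statement.

GAMMA = 1.0  # discount given in the problem
P_SCORE = {1: 0.2, 2: 0.4, 3: 0.6, 4: 0.8}  # hypothetical P(goal | shoot from k)
GOAL_REWARD = 1.0                            # hypothetical reward for scoring


def value_iteration(y, n_iters=100):
    """Return V*(k) for k = 1..4 given dribble-success probability y."""
    V = {k: 0.0 for k in (1, 2, 3, 4)}
    for _ in range(n_iters):
        new_V = {}
        for k in (1, 2, 3, 4):
            # Shooting ends the episode: expected immediate reward only.
            q_shoot = P_SCORE[k] * GOAL_REWARD
            if k < 4:
                # Dribbling advances with probability y; losing the ball
                # lands in terminal state M, whose value is 0.
                q_dribble = GAMMA * y * V[k + 1]
                new_V[k] = max(q_shoot, q_dribble)
            else:
                new_V[k] = q_shoot  # from 4 the robot can only shoot
        V = new_V
    return V
```

With γ = 1 and only four non-terminal states, the iteration converges after a handful of sweeps.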

(a) (3 points) Denote by V π the value function for the specific policy π. What is V π(1) for the policy π that always shoots?
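Evaluating a fixed policy is simpler than finding the optimal one: under the always-shoot policy the episode ends after a single shot, so the value of each state is just the immediate expected reward. The probabilities and reward below are hypothetical stand-ins for the problem's transition model.

```python
# Policy evaluation sketch for the "always shoot" policy.
# P_SCORE and GOAL_REWARD are hypothetical; use the problem's values.

P_SCORE = {1: 0.2, 2: 0.4, 3: 0.6, 4: 0.8}  # hypothetical P(goal | shoot from k)
GOAL_REWARD = 1.0                            # hypothetical


def v_always_shoot(k):
    """V^pi(k) when pi shoots in every state: the robot shoots
    immediately, reaching terminal state G or M, so the value is the
    expected immediate reward of that one shot."""
    return P_SCORE[k] * GOAL_REWARD
```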

(b) (4 points) Denote by Q∗(s, a) the value of a q-state (s, a), which is the expected utility when starting with action a at state s and thereafter acting optimally. What is Q∗(3, D) in terms of y?
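One useful decomposition: dribbling from 3 advances to 4 with probability y and otherwise ends in the terminal state M, and all such transitions have reward 0 with γ = 1, so

```latex
Q^*(3, D) = y \, V^*(4) + (1 - y) \, V^*(M) = y \, Q^*(4, S),
```

where V∗(M) = 0 because M is terminal, and V∗(4) = Q∗(4, S) because shooting is the only action available from position 4.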

(c) (3 points) For what range of values of y is Q∗(3, S) ≥ Q∗(3, D)? Interpret your answer in plain English.
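To sanity-check an answer to a comparison like this, it can help to evaluate both q-values numerically as functions of y. Under the hypothetical shot probabilities used here (not the problem's actual values), Q∗(3, S) is a constant in y while Q∗(3, D) grows linearly, so they cross at a single threshold.

```python
# Comparing Q*(3, S) and Q*(3, D) as functions of y.
# P3, P4, and R are HYPOTHETICAL shot probabilities / reward.

P3, P4, R = 0.6, 0.8, 1.0  # hypothetical P(goal | shoot from 3), from 4, reward


def q_shoot_3():
    # Shooting from 3 ends the episode immediately.
    return P3 * R


def q_dribble_3(y):
    # Dribble succeeds with probability y, then the robot shoots from 4;
    # losing the ball gives 0.
    return y * P4 * R


# Shooting is at least as good whenever y <= P3 / P4 (for this model).
threshold = P3 / P4
```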