SCDAA Coursework 2022-23

Your overall task will be to implement a numerical algorithm for solving stochastic control problems using policy iteration combined with the “deep Galerkin method” for solving a linear PDE.

This will be broken into steps which a) give you credit for completion and b) allow you to check your progress.

You will be expected to submit a PDF report on what you did on Learn by the deadline. In the PDF, provide a link to an online git repository where your code can be found. The repository should contain a README telling me how to run the code to reproduce any output (graphic / table) you include in the PDF. I will use the code from the latest commit made prior to the deadline.

Use numpy, scipy, matplotlib and torch and all the functions they contain, but do not use other libraries. You will be working in groups of three, and the submitted PDF and git repo should state your names, student numbers and individual contribution fractions. You need to work together on each of the tasks, at least to the extent that you can use code created in one task for the next one. There are 22 marks in total on this assignment but it is marked out of 20. Think of the extra two marks as a bonus.

Please note that a submission that doesn’t demonstrate why things work (e.g. test examples, convergence plots etc.) will get very few marks even if you have code that correctly solves the exercises.

Linear quadratic regulator

We consider

dXs = [HXs + Mαs] ds + σdWs , s ∈ [t, T] , Xt = x . (1)

Our aim is to minimize

Jα(t, x) := Et,x [ ∫_t^T (Xs⊤CXs + αs⊤Dαs) ds + XT⊤RXT ] ,

where C ≥ 0, R ≥ 0 and D > 0 are given deterministic 2 × 2 matrices.

The value function is v(t, x) := infα Jα(t, x). We know how to solve the Bellman PDE to obtain that

v(t, x) = x⊤S(t)x + ∫_t^T tr(σσ⊤S(r)) dr ,

where S is the solution of the Riccati ODE:

S′(r) = −2H⊤S(r) + S(r)MD−1M⊤S(r) − C , r ∈ [t, T] , S(T) = R .

Note that the solution takes values in the space of 2 × 2 matrices. The optimal Markov control is a(t, x) = −D−1M⊤S(t)x .
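For reference, the feedback form can be recovered from the pointwise minimisation over the control in the Bellman equation, using ∇x v(t, x) = 2S(t)x:

```latex
\min_a \big[ a^\top D a + (Ma)^\top \nabla_x v \big] :
\quad 2Da + M^\top \nabla_x v = 0
\;\Rightarrow\;
a(t,x) = -\tfrac12 D^{-1} M^\top \nabla_x v(t,x) = -D^{-1} M^\top S(t)\, x .
```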

Exercise 1.1 (Solving LQR using the Riccati ODE, 1 mark).

Write a class which:

i) Can be initialised with the matrices specifying the LQR problem and T > 0.

ii) Has a method which will solve (approximate) the associated Riccati ODE on a time grid which is an input (numpy array or torch tensor).

iii) Has a method that, given a torch tensor of dimension batch size × 1 × 2, will return a torch tensor of dimension batch size × 1 with entries being the control problem value v(t, x) for each (t, x) in the batch (for x two-dimensional).

iv) Has a method that, given a torch tensor of dimension batch size × 1 × 2, will return a torch tensor of dimension batch size × 2 with entries being the Markov control function for each (t, x) in the batch (for x two-dimensional).
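A minimal sketch of such a class, assuming the Riccati ODE as stated above is integrated backwards from S(T) = R with scipy's dense interpolant. For readability the batched methods below take t as a (batch,) tensor and x as a (batch, 2) tensor; adapting to the batch size × 1 × 2 layout is a reshape. All matrix values in the usage example are made-up illustrations, not assignment data.

```python
import numpy as np
import torch
from scipy.integrate import solve_ivp

class LQRSolver:
    """Sketch of the Exercise 1.1 class; matrix arguments are 2x2 numpy arrays."""

    def __init__(self, H, M, sigma, C, D, R, T):
        self.H, self.M, self.sigma = H, M, sigma
        self.C, self.D, self.R, self.T = C, D, R, T
        self.Dinv = np.linalg.inv(D)
        # integrate the Riccati ODE backwards from S(T) = R once,
        # then evaluate anywhere via the dense interpolant
        self.sol = solve_ivp(self._rhs, (T, 0.0), R.flatten(),
                             dense_output=True, rtol=1e-8, atol=1e-10)

    def _rhs(self, r, s_flat):
        # Riccati ODE in the form stated in the text
        S = s_flat.reshape(2, 2)
        dS = (-2.0 * self.H.T @ S
              + S @ self.M @ self.Dinv @ self.M.T @ S - self.C)
        return dS.flatten()

    def riccati(self, time_grid):
        """S(t) on the input grid, shape (len(grid), 2, 2)."""
        t = np.asarray(time_grid, dtype=float)
        return self.sol.sol(t).T.reshape(-1, 2, 2)

    def value(self, t, x):
        """t: (batch,), x: (batch, 2) -> (batch, 1) value v(t, x)."""
        out = np.empty(len(t))
        for i, (ti, xi) in enumerate(zip(t.numpy(), x.numpy())):
            S_t = self.sol.sol(float(ti)).reshape(2, 2)
            grid = np.linspace(float(ti), self.T, 200)
            Sg = self.sol.sol(grid).T.reshape(-1, 2, 2)
            f = np.trace(self.sigma @ self.sigma.T @ Sg, axis1=1, axis2=2)
            dr = grid[1] - grid[0]
            integral = dr * (f[0] / 2 + f[1:-1].sum() + f[-1] / 2)  # trapezoid
            out[i] = xi @ S_t @ xi + integral
        return torch.tensor(out).unsqueeze(1)

    def control(self, t, x):
        """t: (batch,), x: (batch, 2) -> (batch, 2) control a(t, x)."""
        a = np.empty((len(t), 2))
        for i, (ti, xi) in enumerate(zip(t.numpy(), x.numpy())):
            S_t = self.sol.sol(float(ti)).reshape(2, 2)
            a[i] = -self.Dinv @ self.M.T @ S_t @ xi
        return torch.tensor(a)

# illustrative data (not from the assignment)
H = np.array([[0.5, 0.5], [0.0, 0.5]])
M = np.eye(2); sigma = 0.05 * np.eye(2)
C = 0.1 * np.eye(2); D = 0.1 * np.eye(2); R = np.eye(2)
solver = LQRSolver(H, M, sigma, C, D, R, T=1.0)
# at t = T the value reduces to the terminal cost x^T R x
v = solver.value(torch.tensor([1.0]), torch.tensor([[1.0, 0.0]]))
```

A sanity check like the one at the end (v(T, x) = x⊤Rx, and S(T) = R on the grid) is exactly the kind of evidence the marking scheme asks for.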

Exercise 1.2 (LQR MC checks, 6 marks). Run a Monte Carlo simulation of the system with the optimal solution you have obtained and ensure that you’re converging to the optimal value function you obtained in Exercise 1.1. In particular:

Hint: Let a = a(t, x) denote the optimal control from Exercise 1.1. Once you've plugged this into (1) you get

dXs = [HXs + Ma(s, Xs)] ds + σdWs , s ∈ [t, T] , Xt = x . (2)

To do a time discretisation, fix N ∈ ℕ (the number of time steps) and let τ := T/N be the time step. Assume from now on that you would only ever want to start the SDE at a time tk = kτ for some k = 0, 1, . . . , N. You have a choice of (at least) two schemes, since you know a(t, x) = −D−1M⊤S(t)x.
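One possible scheme is explicit Euler–Maruyama on (2), accumulating the running and terminal costs along each path. The sketch below uses made-up example matrices and a placeholder `S_of_t` (constant equal to R); in the actual check, `S_of_t` should be the Riccati solution from Exercise 1.1, and the Monte Carlo mean should be compared against v(t, x) as N and the number of paths grow.

```python
import numpy as np

rng = np.random.default_rng(0)

# illustrative data (not from the assignment)
H = np.array([[0.5, 0.5], [0.0, 0.5]]); M = np.eye(2)
C = 0.1 * np.eye(2); D = 0.1 * np.eye(2); R = np.eye(2)
sigma = 0.05 * np.eye(2); T = 1.0
Dinv = np.linalg.inv(D)

def S_of_t(t):
    # placeholder: replace with the Riccati solution from Exercise 1.1
    return R

def mc_value(x0, N=100, n_paths=10_000):
    """Monte Carlo estimate of J for the feedback control, starting at t = 0."""
    tau = T / N
    X = np.tile(x0, (n_paths, 1))                    # (n_paths, 2)
    J = np.zeros(n_paths)
    for k in range(N):
        t = k * tau
        a = -(Dinv @ M.T @ S_of_t(t) @ X.T).T        # feedback control per path
        # running cost X^T C X + a^T D a, accumulated with weight tau
        J += tau * (np.einsum('ni,ij,nj->n', X, C, X)
                    + np.einsum('ni,ij,nj->n', a, D, a))
        dW = rng.normal(scale=np.sqrt(tau), size=(n_paths, 2))
        X = X + tau * (X @ H.T + a @ M.T) + dW @ sigma.T
    J += np.einsum('ni,ij,nj->n', X, R, X)           # terminal cost X_T^T R X_T
    return J.mean()
```

Since C, D and R are positive semi-definite, the estimate is nonnegative; plotting it against 1/N and against the number of paths gives the convergence evidence the exercise asks for.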

2 Supervised learning, checking the NNs are good enough

For our purposes a neural network is a parametric function that depends on an input, say x ∈ Rd, and on parameters, say θ ∈ Rp, and takes values in, say, Rd′; we'll write ϕ = ϕ(x; θ).

An example of a one-hidden-layer neural network is

ϕ(x; θ) = ϕ(x; α(1) , α(2) , β(1) , β(2) ) = α(1) ψ(α(2) x + β(2) ) + β(1) ,
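This network can be written directly in torch: the two affine maps α(2)x + β(2) and α(1)(·) + β(1) are `torch.nn.Linear` layers around the nonlinearity ψ. Here ψ = tanh and the layer sizes (d = 2 inputs, 100 hidden units, d′ = 1 output) are illustrative choices, not requirements.

```python
import torch

class OneHiddenLayer(torch.nn.Module):
    """phi(x; theta) = alpha1 * psi(alpha2 x + beta2) + beta1."""

    def __init__(self, d=2, hidden=100, d_out=1):
        super().__init__()
        self.inner = torch.nn.Linear(d, hidden)      # alpha^(2) x + beta^(2)
        self.outer = torch.nn.Linear(hidden, d_out)  # alpha^(1)(.) + beta^(1)
        self.psi = torch.tanh                        # the nonlinearity psi

    def forward(self, x):
        return self.outer(self.psi(self.inner(x)))

phi = OneHiddenLayer()
y = phi(torch.randn(16, 2))   # a batch of 16 inputs in R^2
```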