# Python Tutoring | Predicting Price Direction

This assignment uses Python to predict stock prices.

Unpredictability of short-term asset returns is a subject of asset pricing research: efficient markets produce near-Normal daily returns with low correlation to past values. That limits the usefulness of autoregression on lagged returns. However, progress is possible, and in this assignment you will make direction predictions using **any 2 out of 4** types of classifier of your choice.

Predict the sign of the next daily move. You are welcome to modify the task to predict over a longer period, e.g. a 5-day move. Certain classifiers (SVM, ANN) are better suited to such longer-term predictions.
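A longer-horizon target can be built directly from prices. The sketch below is one possible construction, not a prescribed one: the helper name `forward_direction`, the use of pandas, and the convention that a zero return counts as a down move are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def forward_direction(prices: pd.Series, horizon: int = 5) -> pd.Series:
    """Label each day with the sign of the log-return over the next `horizon` days.

    Hypothetical helper: returns +1 for a positive forward return and -1
    otherwise (zero returns are treated as down moves by assumption).
    """
    # Forward log-return: ln(P_{t+h} / P_t); the last `horizon` rows are NaN.
    fwd = np.log(prices.shift(-horizon) / prices)
    label = pd.Series(np.where(fwd > 0, 1, -1), index=prices.index)
    # Keep only the days for which the forward return is actually observed.
    return label[fwd.notna()]


prices = pd.Series([100, 101, 102, 103, 104, 99, 106, 100.0])
labels_5d = forward_direction(prices, horizon=5)
```

Note that consecutive 5-day labels overlap in time, so neighbouring observations are not independent; this is worth keeping in mind when splitting data into training and test sets.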

The assignment limits the task to binomial prediction of asset price movement: a positive or negative return, labelled $\{-1, 1\}$. For some classifiers, particularly neural networks or when using bagging/boosting, re-label as $\{0, 1\}$.

Start with lagged log-returns $r_{t-1}, r_{t-2}, \dots$ as your features. Use ADDITIONAL simple variations around price $P_t$ from Table 1. More complex indicators (e.g. RSI, Stochastic K, MACD, CCI, Acc/Distrib) are beyond the scope of the assignment.
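As a starting point, the lagged-return features and both label encodings might be assembled as follows. This is a minimal sketch assuming a pandas price series; the function name `make_features`, the column names, and the choice to label a zero return as a down move are illustrative assumptions. The Table 1 variations are not reproduced here.

```python
import numpy as np
import pandas as pd


def make_features(prices: pd.Series, n_lags: int = 3) -> pd.DataFrame:
    """Build lagged log-return features plus direction labels (hypothetical helper)."""
    r = np.log(prices).diff()  # r_t = ln(P_t / P_{t-1})
    # Row t holds r_{t-1}, ..., r_{t-n_lags}, so features never see r_t itself.
    df = pd.DataFrame({f"ret_lag{k}": r.shift(k) for k in range(1, n_lags + 1)})
    df["ret"] = r
    df = df.dropna()  # drop the warm-up rows where some lag is undefined
    # Target in {-1, 1}; zero returns are counted as down moves by assumption.
    df["y"] = np.where(df["ret"] > 0, 1, -1)
    # Same target re-labelled to {0, 1} for classifiers that expect it.
    df["y01"] = (df["y"] + 1) // 2
    return df


prices = pd.Series([100, 101, 100, 102, 103, 101, 104, 105.0])
features = make_features(prices, n_lags=3)
```

The `ret` column is kept only to build the labels and should be excluded from the feature matrix passed to a classifier, since it is the quantity being predicted.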

**Study Design:**

**Classifier A.1 Logistic Classifier and Bayesian Classifier**

**Classifier A.2 Support Vector Machines**

**Classifier A.3 Decision Tree Regressor (or Boosted Random Forest)**

**Classifier A.4 Artificial Neural Network**

If one believes the data carries autoregressive structure, a recurrent neural network model can be a successful alternative to time series regression. a) Attempt to use an LSTM classifier with the features given in Table 1. LSTM can come out as one of the best-predicting models when fed financial ratios, volatility estimators or advanced technical indicators, but those features are beyond the scope. b) Handling sequences of arbitrary length is the major characteristic of LSTM. Attempt prediction of a 5D or 10D return for equity, or 1W or 1M for an FF factor; for robust estimation use more than 5-7 years of data for equity.
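Recurrent layers usually expect input shaped as (samples, timesteps, features), so a flat feature table has to be cut into rolling windows first. The sketch below shows one way to do that with NumPy only; the helper name `make_sequences` and the alignment (each window is paired with the label of its last row) are assumptions, and no particular deep-learning library is implied.

```python
import numpy as np


def make_sequences(X, y, timesteps):
    """Cut a (n_samples, n_features) table into rolling windows (hypothetical helper).

    Returns X_seq with shape (n_samples - timesteps + 1, timesteps, n_features)
    and y_seq holding the label of the last row of each window.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    n_windows = len(X) - timesteps + 1
    if n_windows <= 0:
        raise ValueError("timesteps exceeds the number of available rows")
    # Window i covers rows i .. i + timesteps - 1.
    X_seq = np.stack([X[i:i + timesteps] for i in range(n_windows)])
    y_seq = y[timesteps - 1:]
    return X_seq, y_seq


X = np.arange(12).reshape(6, 2)      # 6 days, 2 features
y = np.array([0, 1, 0, 1, 1, 0])     # direction labels in {0, 1}
X_seq, y_seq = make_sequences(X, y, timesteps=3)
```

This assumes the rows of `X` already contain only information available before the labelled move (for example, lagged returns), so that pairing a window with the label of its last row does not leak the target.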

**Task B.1** Investigate prediction quality using the confusion matrix (precision/recall statistics) and **area under ROC curve**; both are available for all classifiers if the prediction is binomial. In particular, check the quality of predicting down movements (negative sign of return).
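The statistics named in Task B.1 can be collected with `sklearn.metrics`. The sketch below assumes labels encoded as 0 (down) and 1 (up) and a vector of predicted up-move probabilities; the wrapper `evaluate_direction`, the 0.5 cut-off and the toy numbers are illustrative assumptions, not results.

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, precision_score,
                             recall_score, roc_auc_score)


def evaluate_direction(y_true, proba_up, threshold=0.5):
    """Summarise binomial prediction quality (hypothetical helper)."""
    y_true = np.asarray(y_true)
    proba_up = np.asarray(proba_up)
    y_pred = (proba_up >= threshold).astype(int)
    return {
        # Rows are true classes, columns are predicted classes, ordered [down, up].
        "confusion": confusion_matrix(y_true, y_pred, labels=[0, 1]),
        # Treat the down move (label 0) as the class of interest.
        "down_precision": precision_score(y_true, y_pred, pos_label=0),
        "down_recall": recall_score(y_true, y_pred, pos_label=0),
        # AUC uses the probabilities, not the thresholded predictions.
        "auc": roc_auc_score(y_true, proba_up),
    }


# Toy inputs for illustration only.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
proba_up = [0.2, 0.6, 0.4, 0.8, 0.7, 0.3, 0.9, 0.1]
report = evaluate_direction(y_true, proba_up)
```

Reporting precision and recall with `pos_label=0` puts the focus on down moves, which the default settings (positive class 1) would otherwise hide.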

**Task B.2** Improve your use of the classifier by changing features or hyperparameters, for example with *sklearn.model_selection.GridSearchCV*. Alternatively, introduce bagging/boosting and discuss the impact on prediction quality. In boosting, each new model deals with the mistakes of the previous models; a common choice is AdaBoost with decision trees as weak learners. In particular, describe the steps taken to reduce **misclassified negative returns**. Present a comparison BEFORE and AFTER your improvements.
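A hyperparameter search aimed at down-move quality might look like the sketch below. It assumes a logistic classifier, synthetic data standing in for real features, a small grid over the regularisation strength `C`, time-ordered cross-validation via `TimeSeriesSplit`, and recall on the down class as the scoring rule; each of these is a choice made for illustration rather than a requirement of the task.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for lagged-return features and {0, 1} direction labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)

# Scaling lives inside the pipeline so it is re-fitted on each training fold.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}

# Score by recall on the down class (label 0) to target misclassified negatives.
down_recall = make_scorer(recall_score, pos_label=0)

search = GridSearchCV(
    pipe,
    param_grid,
    scoring=down_recall,
    cv=TimeSeriesSplit(n_splits=5),  # training folds always precede test folds
)
search.fit(X, y)

best_C = search.best_params_["logisticregression__C"]
best_down_recall = search.best_score_
```

Using `TimeSeriesSplit` rather than shuffled folds avoids training on observations that come after the test period. The BEFORE figure for the comparison can be obtained by scoring the default model with the same scorer and splits.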

**Task B.3** Develop a scheme that utilises transition probabilities from the *predict_proba()* method. Provide separate scatter plots for the probabilities of up and down moves, using colour codes for correctly/incorrectly realised predictions. Devise a P&L that relies on fractional betting and the edge $p - (1 - p) = 2p - 1$, where the probability of the move $p$ is above a threshold of 75%-90%. Discuss over-reliance on transition probabilities for poorly predicted negative returns.
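One possible reading of the betting scheme is sketched below in plain Python. It assumes the stake on each day equals the edge $2p - 1$ (the fraction an even-odds Kelly bet would use), that a bet is placed only when the probability of the predicted direction reaches the threshold, and that P&L is stake times signed return; the function name `edge_pnl` and the toy inputs are hypothetical.

```python
def edge_pnl(proba_up, returns, threshold=0.75):
    """Daily P&L from fractional bets sized by the edge 2p - 1 (hypothetical helper).

    proba_up : predicted probability of an up move for each day
    returns  : realised return for each day
    """
    pnl = []
    for p_up, r in zip(proba_up, returns):
        p_down = 1.0 - p_up
        if p_up >= threshold:
            direction, p = 1, p_up        # confident up move: go long
        elif p_down >= threshold:
            direction, p = -1, p_down     # confident down move: go short
        else:
            pnl.append(0.0)               # below threshold: no bet
            continue
        stake = 2.0 * p - 1.0             # edge p - (1 - p), assumed bet fraction
        pnl.append(direction * stake * r)
    return pnl


# Toy inputs for illustration only.
daily_pnl = edge_pnl([0.8, 0.6, 0.1], [0.01, 0.02, -0.03], threshold=0.75)
total_pnl = sum(daily_pnl)
```

Because the stake grows with $p$, a classifier that is overconfident about down moves it predicts poorly will place its largest bets exactly where it is least reliable, which is the risk the task asks you to discuss.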

Work on these tasks can be appended to each classifier use case.

__Instructions__

Work on ALL tasks in the format required. Set out the mathematical underpinnings for each chosen classifier. Code must be submitted and must produce the computational output. Full mathematical workings are required for Interest Rates Modeling questions.

__Format and Coding__: Submit ONE .pdf report file and ONE .zip file with data and code, with file names starting with your LASTNAME. It is advantageous to merge all your workings into __one PDF file__.

Report Content and Analytical Quality: