Python Tutoring | Predicting Price Direction

This assignment uses Python to predict stock prices.

Unpredictability of short-term asset returns is a subject of asset pricing research: efficient markets produce near-Normal daily returns with low correlation to past values. That limits the application of autoregression on lagged returns. However, progress is possible, and in this assignment you will make direction predictions using any 2 out of the 4 types of classifier, of your choice.

Predict the sign of the next daily move. You are welcome to modify the task to predict over longer periods, for example a 5-day move. Certain classifiers (SVM, ANN) are better suited to such longer-term predictions.

The assignment limits the task to binomial prediction of asset price movement: positive or negative return, labelled {-1, 1}. For some classifiers, particularly neural networks, or if using bagging/boosting, re-label as {0, 1}.
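The labelling described above can be sketched as follows. This is a minimal illustration, not a prescribed method: the function names, the `horizon` parameter and the sample prices are assumptions made for the example, and treating a zero return as a down move is an arbitrary convention.

```python
import numpy as np

def direction_labels(prices, horizon=1):
    """Return labels in {-1, 1} for the sign of the forward `horizon`-day log-return.

    Label i describes the move from day i to day i + horizon, so the result
    has len(prices) - horizon entries. Zero returns are assigned to -1 here,
    which is an arbitrary convention chosen for this sketch.
    """
    log_p = np.log(np.asarray(prices, dtype=float))
    fwd_ret = log_p[horizon:] - log_p[:-horizon]
    return np.where(fwd_ret > 0, 1, -1)

def to_zero_one(labels):
    """Map {-1, 1} labels to {0, 1}, as some classifiers expect."""
    return (np.asarray(labels) + 1) // 2

prices = [100.0, 101.0, 100.5, 100.5, 102.0]   # illustrative values only
y = direction_labels(prices)                   # next-day direction
y01 = to_zero_one(y)                           # same labels as {0, 1}
y5 = direction_labels(prices, horizon=2)       # a longer horizon works the same way
```

Passing `horizon=5` on a real price series would give the 5-day variant of the task mentioned earlier.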

Start with lagged log-returns r_{t-1}, r_{t-2}, … as your features. Use ADDITIONAL simple variations around the price P_t from Table 1. More complex indicators (e.g., RSI, Stochastic K, MACD, CCI, Acc/Distrib) are beyond the scope of the assignment.
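One possible way to assemble the lagged log-return features is sketched below. The function name, the `n_lags` parameter and the price values are assumptions for illustration. The Table 1 variations are not reproduced here, so the sketch covers only the lagged returns.

```python
import numpy as np

def lagged_return_features(prices, n_lags=3):
    """Build a feature matrix of lagged log-returns and the aligned target.

    Each row holds [r_{t-1}, ..., r_{t-n_lags}] and the matching target is the
    sign of r_t, so no feature uses information from day t itself.
    """
    prices = np.asarray(prices, dtype=float)
    r = np.diff(np.log(prices))            # r[i] is the log-return from day i to i + 1
    rows = range(n_lags, len(r))
    X = np.array([r[t - n_lags:t][::-1] for t in rows])  # most recent lag first
    y = np.where(r[n_lags:] > 0, 1, -1)
    return X, y

prices = [100.0, 101.0, 100.0, 102.0, 103.0, 101.0, 104.0]  # illustrative values only
X, y = lagged_return_features(prices, n_lags=2)
```

The alignment matters more than the exact construction: every row of `X` must contain only returns observed before the return whose sign is being predicted.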

Study Design:

Classifier A.1 Logistic Classifier and Bayesian Classifier

Classifier A.2 Support Vector Machines

Classifier A.3 Decision Tree Regressor (or Boosted Random Forest)

Classifier A.4 Artificial Neural Network

If one believes the data carries autoregressive structure, a recurrent neural network model can be a successful alternative to time series regression. a) Attempt to use an LSTM classifier with the features given in Table 1. LSTM can come out as one of the best-predicting models when fed financial ratios, volatility estimators or advanced technical indicators, but those features are beyond the scope. b) Dealing with sequences of arbitrary length is the major characteristic of LSTM. Attempt prediction of the 5D or 10D return for an equity, or 1W or 1M for an FF factor; for robust estimation use more than 5-7 years of data for equity.
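Before any recurrent model can be fitted, the daily feature rows have to be arranged into fixed-length sequences. The sketch below shows only that data-preparation step, using NumPy. The function name, the `window` parameter and the toy arrays are assumptions. The resulting (samples, timesteps, features) layout is the one commonly expected by LSTM layers, for example in Keras, or in PyTorch with `batch_first=True`. The model itself is left out.

```python
import numpy as np

def make_sequences(features, targets, window):
    """Stack rolling windows into a 3-D array for a recurrent classifier.

    features: array of shape (n_days, n_features)
    targets:  array of shape (n_days,), one label per day
    Returns X of shape (n_days - window + 1, window, n_features) and the labels
    aligned with the last day of each window.
    """
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets)
    n = len(features) - window + 1
    X = np.stack([features[i:i + window] for i in range(n)])
    y = targets[window - 1:]
    return X, y

features = np.arange(20, dtype=float).reshape(10, 2)  # 10 days, 2 toy features
targets = np.array([0, 1] * 5)                        # toy {0, 1} labels
X_seq, y_seq = make_sequences(features, targets, window=4)
```

Here `targets[t]` is assumed to already be the forward-looking label for day t (for example the sign of the following 5D return), so each window is paired with the move that comes after its final day.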

Task B.1 Investigate the prediction quality using a confusion matrix (precision/recall statistics) and the area under the ROC curve; these are available for all classifiers if the prediction is binomial. In particular, check the quality of predicting the down movements (negative sign of return).
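The statistics named in Task B.1 could be computed along these lines with scikit-learn. The labels and probabilities below are made-up numbers, not model output, and the 0.5 cut-off for turning a probability into a direction is an assumption. Setting `pos_label=-1` makes precision and recall refer to the down moves.

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, precision_score,
                             recall_score, roc_auc_score)

y_true = np.array([-1, -1, -1, 1, 1, 1, 1, -1])                 # illustrative labels
p_up = np.array([0.2, 0.6, 0.4, 0.7, 0.8, 0.45, 0.9, 0.3])      # hypothetical P(up)
y_pred = np.where(p_up >= 0.5, 1, -1)

# Rows are true classes and columns are predicted classes, ordered [-1, 1].
cm = confusion_matrix(y_true, y_pred, labels=[-1, 1])

# Precision and recall for the DOWN class specifically.
down_precision = precision_score(y_true, y_pred, pos_label=-1)
down_recall = recall_score(y_true, y_pred, pos_label=-1)

# AUC uses the probability of the larger label (+1) as the score.
auc = roc_auc_score(y_true, p_up)
```

On this toy data the confusion matrix is [[3, 1], [1, 3]], down precision and recall are both 0.75, and the AUC is 0.9375.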

Task B.2 Improve your use of the classifier by changing features or hyperparameters, for example with sklearn.model_selection.GridSearchCV. Alternatively, introduce bagging/boosting and discuss the impact on prediction quality. In boosting, each new model deals with the mistakes of the previous models; a common choice is AdaBoost with decision trees as weak learners. In particular, describe the steps taken to reduce misclassified negative returns. Present a comparison BEFORE and AFTER your improvements.
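A hedged sketch of such a search follows, combining GridSearchCV with AdaBoost over decision stumps. Everything specific here is an assumption made for illustration: the synthetic data, the small parameter grid, the use of TimeSeriesSplit to keep validation folds after their training folds, and scoring by recall on the down class as one way to target misclassified negative returns.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a feature matrix and {-1, 1} labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = np.where(X[:, 0] + 0.5 * rng.normal(size=300) > 0, 1, -1)

# Score each candidate by recall on the down class (label -1).
down_recall = make_scorer(recall_score, pos_label=-1)

param_grid = {"n_estimators": [25, 50], "learning_rate": [0.5, 1.0]}
search = GridSearchCV(
    AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), random_state=0),
    param_grid,
    scoring=down_recall,
    cv=TimeSeriesSplit(n_splits=3),   # folds respect chronological order
)
search.fit(X, y)
best_params = search.best_params_
best_score = search.best_score_
```

Optimising recall on one class alone can favour models that predict that class too often, so a BEFORE and AFTER comparison would normally report precision and the full confusion matrix as well.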

Task B.3 Develop a scheme that utilises the transition probabilities from the predict_proba() method. Provide separate scatter plots for the probabilities of up and down moves, using colour codes for correctly/incorrectly realised predictions. Devise a P&L that relies on fractional betting and the edge p - (1 - p) = 2p - 1, where the probability of the move p is above a threshold in the range 75%-90%. Discuss the risk of over-relying on transition probabilities for poorly predicted negative returns.
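The fractional-betting rule can be illustrated with a few lines of NumPy. This is one possible reading of the task rather than a required design: the function name and sample numbers are hypothetical, the stake is taken to be exactly the edge 2p - 1, and the realised returns are treated as simple returns on unit capital without compounding.

```python
import numpy as np

def fractional_pnl(p_up, realised_returns, threshold=0.75):
    """Daily P&L from betting a fraction of capital equal to the edge 2p - 1.

    p is the probability of the predicted direction, max(p_up, 1 - p_up).
    No position is taken on days when p is below `threshold`.
    """
    p_up = np.asarray(p_up, dtype=float)
    realised_returns = np.asarray(realised_returns, dtype=float)
    direction = np.where(p_up >= 0.5, 1.0, -1.0)          # long if up is more likely
    p = np.maximum(p_up, 1.0 - p_up)
    stake = np.where(p >= threshold, 2.0 * p - 1.0, 0.0)  # edge, or zero below threshold
    return direction * stake * realised_returns

p_up = [0.80, 0.60, 0.10, 0.90]      # hypothetical P(up), e.g. from predict_proba()
rets = [0.01, -0.02, -0.03, -0.01]   # hypothetical realised returns
pnl = fractional_pnl(p_up, rets, threshold=0.75)
total = pnl.sum()
```

In this toy run the second day is skipped because p = 0.60 is below the threshold, and the last day loses money even though the model was 90% confident, which is the kind of case the discussion of over-reliance on probabilities should address.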

Work on these tasks can be appended to each classifier use case.

Instructions

Work on ALL tasks in the format required. Recite the mathematical underpinnings for each chosen Classifier. Code must be submitted and must produce the computational output. Full mathematical workings are required for Interest Rates Modeling questions.

Format and Coding: Submit ONE .pdf report file and ONE .zip file with data and code, with each file name starting with your LASTNAME. It is advantageous to merge all your workings into one PDF file.

Report Content and Analytical Quality: