Python Machine Learning Assignment Writing | COSC 2670/2732 Practical Data Science with Python Assignment 2


The key objectives of this assignment are to learn how to train and evaluate a non-trivial
machine learning model. More specifically, the task is called “learning-to-rank”. You
will be given a set of training features that contain the relevance labels 0 (not relevant),
1 (partially relevant) and 2 (relevant), for a large set of query-document pairs. Your
task is to research this problem, find a suitable solution, train a model, and produce a
result file from a training set that will be scored using standard evaluation measures in
Information Retrieval.
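
The specification does not say which evaluation measures will be used. As an illustration only, the sketch below computes NDCG@k, a measure commonly used in Information Retrieval when relevance labels are graded (such as the 0, 1, 2 labels above). The function names and the exponential gain formula are assumptions chosen for this example, not the official scoring script.

```python
import math


def dcg_at_k(labels, k):
    """Discounted cumulative gain over the first k labels, given in ranked order."""
    # Gain is 2^label - 1; the discount is log2(position + 1) with positions from 1.
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(labels[:k]))


def ndcg_at_k(labels_in_ranked_order, k):
    """NDCG@k for one query: DCG of the ranking divided by DCG of the ideal ranking."""
    ideal = sorted(labels_in_ranked_order, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        # No relevant documents for this query; define the score as 0.
        return 0.0
    return dcg_at_k(labels_in_ranked_order, k) / ideal_dcg


# Labels of the documents for one query, listed in the order a model ranked them.
print(ndcg_at_k([2, 1, 0], k=3))  # a perfect ordering
print(ndcg_at_k([0, 1, 2], k=3))  # the worst ordering of the same labels
```

A per-query score like this is typically averaged over all queries to give a single number for a run.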

If you are unfamiliar with IR, you might want to look through the book Learning to
Rank for Information Retrieval by Tie-Yan Liu, which is available online in the RMIT
library and is now also on Canvas.

The following files are provided:

• A2.pdf : This specification file.
• train.tsv : A large file of labelled query-document pairs suitable for training.
• test.tsv : The holdout set that you will use to create a runfile.
• documents.tsv : A three-field file containing the document id, original HTML, and
clean text parse of each document.
• query.tsv : A two-field file containing the query id and the query text for each query.
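
Learning-to-rank models are trained and evaluated per query, so a common first step is to group the labelled rows by query id. The sketch below does this with the standard csv module. The column names (query_id, doc_id, label) and the presence of a header row are hypothetical, since the exact layout of train.tsv is not described here; adjust them to match the real files.

```python
import csv
import io
from collections import defaultdict


def group_by_query(tsv_file, query_col="query_id"):
    """Read a tab-separated file with a header row and group its rows by query id."""
    groups = defaultdict(list)
    reader = csv.DictReader(tsv_file, delimiter="\t")
    for row in reader:
        groups[row[query_col]].append(row)
    return dict(groups)


# Tiny in-memory stand-in for train.tsv; the column names are hypothetical.
sample = io.StringIO(
    "query_id\tdoc_id\tlabel\n"
    "q1\td1\t2\n"
    "q1\td2\t0\n"
    "q2\td3\t1\n"
)
groups = group_by_query(sample)
for query_id, rows in groups.items():
    print(query_id, [(r["doc_id"], r["label"]) for r in rows])
```

For a real file, pass an open file handle instead, for example open("train.tsv", newline="", encoding="utf-8"). A file with long HTML fields such as documents.tsv may also need a larger limit set through csv.field_size_limit.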

The A2.pdf file is on Canvas, the train and test files are in a zip file you can
download using the URL below, and the documents.tsv and
query.tsv files can be found in the optional download below.

You are allowed to use any Python library you like to solve the problem. There is
a wealth of tools to choose from, including pandas, numpy, and scikit-learn for the
basic processing, and multiple libraries designed specifically for Learning to Rank. I
will let you find these on your own. It should be easy to find several that will work,
and you can try several to determine which works best. We will need you to
ensure your environment is reproducible though, so the correct way to do this is to
create an Anaconda environment for a specific version of Python (I strongly suggest it
be 3.8), install any packages you need using pip (not Anaconda), and then generate a
requirements.txt file to include with your submission. So, something like:

conda create -n SXXXXXX python=3.8
conda activate SXXXXXX
pip install pandas numpy scikit-learn
pip freeze > requirements.txt

This will create a new environment that you can start in Anaconda using “conda
activate SXXXXXX” and exit from using “conda deactivate”.
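
Once the environment is active, a quick check from inside Python can confirm that the interpreter version and packages are what you expect. The sketch below is one way to do this with the standard library; the helper name missing_packages and the list of required modules are illustrative assumptions.

```python
import importlib.util
import sys


def missing_packages(module_names):
    """Return the top-level module names that cannot be found in the active environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# Import names can differ from pip names (scikit-learn is imported as sklearn).
required = ["pandas", "numpy", "sklearn"]

print("Python", ".".join(map(str, sys.version_info[:3])))
print("Missing:", missing_packages(required) or "none")
```

If any module is reported missing, install it with pip inside the activated environment and regenerate requirements.txt.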