Implement a Bandit Advertising Recommender in Reco-Gym


The folks at Criteo Research have set up a simulator environment, Reco-Gym, for testing
interactive recommendation algorithms (including bandit algorithms). You can find
introductory material in the project's Getting Started notebook and the accompanying
arXiv paper.

Your challenge is to set up a bandit implementation, and then to get creative and
improve it.

Basic Assignment = 24 points. You will receive up to 24 points for successfully showing
a bandit involving at least three recommender algorithms (they may be the built-in
algorithms), and showing the performance of that bandit compared with the three
(or more) component algorithms and with a baseline (non-learning) bandit that
simply selects from the algorithms at random. Please graph the average click-through
rate over rounds (100 online users, 70 rounds for each user, around 7,000 rounds in
total) for your bandit, for each constituent algorithm, and for the (non-learning)
baseline. What do you observe about how the bandit compares at the beginning, and how
does it learn over time (comment in your notebook)?
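To make the setup concrete, here is a minimal, dependency-free sketch of an ε-greedy bandit choosing among three recommender "agents". The per-agent click probabilities in `AGENT_CTRS` and the `pull()` helper are hypothetical stand-ins for Reco-Gym's simulated users; in the actual assignment you would let each agent act in the reco-gym environment and record whether the simulated user clicked.

```python
import random

random.seed(0)

# Hypothetical stand-ins for three recommender agents: each "agent" is
# modelled as a fixed probability that its recommendation gets clicked.
# In Reco-Gym you would instead ask each agent for a recommendation and
# observe the simulated user's click.
AGENT_CTRS = [0.010, 0.015, 0.025]

def pull(agent_idx):
    """Simulate showing one user the chosen agent's recommendation."""
    return 1 if random.random() < AGENT_CTRS[agent_idx] else 0

def run_epsilon_greedy(n_rounds=7000, epsilon=0.1):
    """Bandit over agents: explore with prob. epsilon, else exploit the best so far."""
    counts = [0] * len(AGENT_CTRS)
    values = [0.0] * len(AGENT_CTRS)  # running mean click rate per agent
    clicks = 0
    for _ in range(n_rounds):
        if random.random() < epsilon:
            a = random.randrange(len(AGENT_CTRS))                       # explore
        else:
            a = max(range(len(AGENT_CTRS)), key=lambda i: values[i])    # exploit
        r = pull(a)
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]  # incremental mean update
        clicks += r
    return clicks / n_rounds

def run_random(n_rounds=7000):
    """Non-learning baseline: pick an agent uniformly at random each round."""
    return sum(pull(random.randrange(len(AGENT_CTRS)))
               for _ in range(n_rounds)) / n_rounds

ctr_bandit = run_epsilon_greedy()
ctr_random = run_random()
print(f"learning bandit CTR: {ctr_bandit:.4f}, random bandit CTR: {ctr_random:.4f}")
```

For the graph, you would record the cumulative click-through rate after every round (rather than only the final average) for the bandit, each component agent, and the random baseline.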

Enhanced Assignment = 6 points. You can receive up to 6 additional points for
improving the performance of the bandit, either by adding at least one additional (and
better-performing) algorithm or by improving the bandit algorithm itself (so that it
learns more effectively). Please graph the results of your initial learning bandit against
your improved bandit to show the improvement in click-through rate.
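One common way to make the bandit itself learn more effectively is Thompson sampling, which keeps a Beta posterior over each agent's click rate and picks the agent whose sampled rate is highest. The sketch below is dependency-free; the per-agent click probabilities and `pull()` are hypothetical stand-ins for the simulated environment, not part of Reco-Gym's API.

```python
import random

random.seed(1)

# Hypothetical per-agent click rates standing in for simulated users.
AGENT_CTRS = [0.010, 0.015, 0.025]

def pull(agent_idx):
    """Simulate one impression: 1 if the user clicks, else 0."""
    return 1 if random.random() < AGENT_CTRS[agent_idx] else 0

def run_thompson(n_rounds=7000):
    """Thompson sampling with a Beta(1, 1) prior on each agent's click rate."""
    k = len(AGENT_CTRS)
    alpha = [1] * k  # 1 + observed clicks per agent
    beta = [1] * k   # 1 + observed non-clicks per agent
    clicks = 0
    for _ in range(n_rounds):
        # Sample a plausible click rate for each agent, play the best sample.
        samples = [random.betavariate(alpha[i], beta[i]) for i in range(k)]
        a = max(range(k), key=lambda i: samples[i])
        r = pull(a)
        alpha[a] += r
        beta[a] += 1 - r
        clicks += r
    return clicks / n_rounds

ctr_thompson = run_thompson()
print(f"Thompson-sampling bandit CTR: {ctr_thompson:.4f}")
```

Thompson sampling tends to cope well with the very low click rates typical of advertising, because its exploration shrinks naturally as the posteriors concentrate.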

Implement a Learning to Rank Model

Ranking is a central part of many recommender systems. Learning-to-rank is a family
of machine learning techniques that focuses specifically on learning an ordering of
items (as opposed to predicting item relevance or rating). Training data consists of
lists of items with some partial order specified among the items in each list. This
order is typically induced by giving a numerical or ordinal score to each item, but may
also be provided as a set of pairwise preferences (A over B) over pairs of items. The
ranking model's job is to rank, i.e. to produce a permutation of, the items in new,
unseen lists in a way that is consistent with the rankings in the training data.
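As a tiny illustration of the definition above: a ranking is just a permutation of a list's items, and a score-based ranker induces that permutation by sorting on the scores. The items and scores here are made up.

```python
# A "ranking" is a permutation of the items, induced by sorting on scores.
items = ["shoes", "hat", "jacket", "scarf"]
scores = {"shoes": 0.91, "hat": 0.12, "jacket": 0.55, "scarf": 0.40}  # made-up relevance scores

ranking = sorted(items, key=lambda item: scores[item], reverse=True)
print(ranking)  # → ['shoes', 'jacket', 'scarf', 'hat']
```

Learning-to-rank is about learning the scoring (or ordering) itself from training lists, rather than being handed the scores.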

There is a library, pytorchltr, for Learning to Rank (LTR) with PyTorch that implements
LTR with several ranking losses. Your first task is to read the example in its Getting
Started page. From this example, you will find that an LTR model has two key elements:

1. Scoring function: generates a ranking score for each item. It could be any
scoring algorithm that makes sense. In this task, use the feedforward network
provided in the linked example.

2. Ranking loss: a measure of how far off the proposed ranking is from the
ground-truth data (used to learn to rank better). We sort a list of items by their
ranking scores and calculate the corresponding ranking loss. There are three major
categories of ranking loss: pointwise (which focuses on the error in
individual item scores), pairwise (which focuses on correct/incorrect orderings of
pairs of items), and listwise (which looks at the correctness of the list order
overall, usually prioritizing correctness at the top of the list). Read the linked
article to understand the three major ranking losses: pointwise, pairwise, and
listwise.
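The linked example builds its scoring function as a PyTorch module; as a dependency-free illustration of the same idea, the sketch below hand-rolls a tiny two-layer feedforward scorer that maps each item's feature vector to a single ranking score. The layer sizes, random initialisation, and item features are all made up for illustration.

```python
import math
import random

random.seed(0)

def init_layer(n_in, n_out):
    """Random weight matrix and zero bias for one dense layer."""
    W = [[random.gauss(0, 1 / math.sqrt(n_in)) for _ in range(n_in)]
         for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b

def dense(x, layer, activation=None):
    """Apply one dense layer (optionally with ReLU) to a feature vector."""
    W, b = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    if activation == "relu":
        out = [max(0.0, v) for v in out]
    return out

# Two-layer feedforward scorer: item features -> hidden -> single score.
N_FEATURES, HIDDEN = 4, 8
layer1 = init_layer(N_FEATURES, HIDDEN)
layer2 = init_layer(HIDDEN, 1)

def score(features):
    return dense(dense(features, layer1, "relu"), layer2)[0]

# Score a made-up list of three items and rank them by score, descending.
doc_features = [[0.1, 0.9, 0.3, 0.5], [0.8, 0.2, 0.7, 0.1], [0.4, 0.4, 0.4, 0.4]]
scores = [score(f) for f in doc_features]
ranking = sorted(range(len(doc_features)), key=lambda i: scores[i], reverse=True)
print(scores, ranking)
```

In the real task the network weights would be learned by backpropagating one of the ranking losses below, which plain Python cannot do conveniently; that is exactly what PyTorch's autograd provides.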
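The three loss families can each be sketched in a few lines. The scores and relevance labels below are made up, and the particular losses shown (squared error, pairwise hinge, ListNet-style softmax cross-entropy) are just one representative of each family; pytorchltr ships its own loss implementations for actual training.

```python
import math

scores = [2.0, 0.5, 1.0]      # a model's ranking scores for one list (made up)
relevance = [1.0, 0.0, 0.5]   # ground-truth graded relevance labels (made up)

# Pointwise: treat each item's score as a prediction of its own relevance.
pointwise = sum((s, y) == (s, y) and (s - y) ** 2
                for s, y in zip(scores, relevance)) / len(scores)

# Pairwise (hinge): penalise every pair whose order contradicts the labels.
pairwise = 0.0
for i in range(len(scores)):
    for j in range(len(scores)):
        if relevance[i] > relevance[j]:  # item i should outrank item j
            pairwise += max(0.0, 1.0 - (scores[i] - scores[j]))

# Listwise (ListNet-style): cross-entropy between the softmax distributions
# induced by the labels and by the scores over the whole list.
def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

p_true, p_pred = softmax(relevance), softmax(scores)
listwise = -sum(pt * math.log(pp) for pt, pp in zip(p_true, p_pred))

print(f"pointwise={pointwise:.3f} pairwise={pairwise:.3f} listwise={listwise:.3f}")
```

Note how the pairwise loss here comes entirely from the one misordered-or-close pair (items 3 and 2), while the pointwise loss charges every item individually regardless of the resulting order.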