# Python NLP | Assignment 2

This is an assignment on NLP in Python.

1. Recall that attention can be viewed as an operation on a query $q \in \mathbb{R}^d$, a set of key vectors $\{k_1, \dots, k_i, \dots, k_n\}$, $k_i \in \mathbb{R}^d$, and a set of value vectors $\{v_1, \dots, v_i, \dots, v_n\}$, $v_i \in \mathbb{R}^d$.

a) Please write down the equations for the attention weights $a_i$ and the output $c \in \mathbb{R}^d$, a correspondingly weighted average over the value vectors.

b) Describe what properties of the inputs to the attention operation would result in the output $c$ being approximately equal to $v_j$ for some $j \in \{1, \dots, n\}$.

c) Consider a set of key vectors $\{k_1, \dots, k_i, \dots, k_n\}$, $k_i \in \mathbb{R}^d$, where $k_i \perp k_j$ for all $i \neq j$ and $\|k_i\| = 1$. Let $v_a, v_b \in \{v_1, \dots, v_n\}$ and $k_a, k_b \in \{k_1, \dots, k_n\}$. Give an expression for a query vector $q$ such that the output $c$ is approximately $\frac{1}{2}(v_a + v_b)$.
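To make the operation in Problem 1 concrete, the following is a minimal numerical sketch of attention for a single query. It assumes the unscaled dot-product form, in which the weights are a softmax over the scores between the query and each key; the lecture notes may use a scaled variant. The function name `attend`, the toy vectors, and the scale `beta` are hypothetical and chosen only for illustration.

```python
import math

def attend(q, keys, values):
    """Dot-product attention for one query (unscaled form, an assumption).

    q:      list of d floats (the query)
    keys:   list of n key vectors, each a list of d floats
    values: list of n value vectors, each a list of d floats
    Returns (weights, c): n attention weights and the d-dimensional output.
    """
    # Score of each key against the query: the dot product of k_i and q.
    scores = [sum(qj * kj for qj, kj in zip(q, k)) for k in keys]
    # Softmax over the scores, shifted by the max for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Output: weighted average of the value vectors.
    d = len(values[0])
    c = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d)]
    return weights, c

# Toy data: three orthonormal keys in R^3 and arbitrary values.
keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
values = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]

beta = 20.0                        # hypothetical large scale factor
q = [beta * x for x in keys[1]]    # query aligned with the second key
weights, c = attend(q, keys, values)
```

With a query that is strongly aligned with one key, almost all of the weight falls on that key, so `c` lands very close to the matching value vector.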

2. Perplexity.

a) You are given a training set of 100 numbers that consists of 10 each of digits 0-9. Now we see the following test set: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. What is the unigram perplexity of the test set?

b) You are given a training set of 100 numbers that consists of 91 zeros and 1 each of the other digits 1-9. Now we see the following test set: {0, 0, 0, 0, 0, 6, 0, 0, 0, 0}. What is the unigram perplexity of this test set? Please first use your intuition to describe whether the perplexity should go up or down compared with the result in a). Then, calculate the number to see if it aligns with your intuition.
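The perplexity calculations above can be checked numerically. The sketch below assumes a maximum-likelihood unigram model with no smoothing, so it only works when every test token was seen in training. The function name `unigram_perplexity` is hypothetical. It is applied here to the uniform setting of part a).

```python
import math
from collections import Counter

def unigram_perplexity(train, test):
    """Perplexity of `test` under a unigram model estimated from `train`.

    Uses maximum-likelihood estimates without smoothing (an assumption),
    so every test token must occur at least once in `train`.
    """
    counts = Counter(train)
    total = len(train)
    # Sum of log-probabilities of the test tokens.
    log_prob = sum(math.log(counts[token] / total) for token in test)
    # Perplexity is the exponential of the average negative log-probability.
    return math.exp(-log_prob / len(test))

# Part a): 10 copies of each digit 0-9, tested on one of each digit.
train = [digit for digit in range(10) for _ in range(10)]
test = list(range(10))
pp = unigram_perplexity(train, test)
```

Swapping in the skewed training set and test set from part b) only requires changing `train` and `test`.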

3. In the lecture, we talked about the process of attention, Transformer, and one popular pretrained model, BERT. Here we will try to implement some details of them in Python.

For every function or class, please state clearly in comments the shape and meaning of each input and output. Add comments where needed to make your code easy to understand.

In addition, all inputs and outputs in the problems below are organized in batches, with the batch size equal to batch_size. You should choose the other dimensions of the variables sensibly.

a) The core equations of attention are part of Problem 1. Here, please implement them in Python. Define a function called attention that takes four arrays (numpy.array) as inputs: 𝑞, 𝑘, 𝑣, and an attention mask.

You will first do a sanity check on the dimensions of all input matrices. If the dimensions fail the check, please raise an error. Only if all the dimensions are valid should the function calculate the attention weights and the final output. The default value for the attention mask is None. Below is a sketch of the function.

    def attention(q, k, v, attn_mask=None):
        # Sanity check on dimensions of q, k, v, attn_mask
        # Calculate attention weight and final output
        return attn_weight, outputs
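One possible way to fill in this sketch is shown below. It is not the reference solution. It assumes the scaled dot-product form (scores divided by the square root of the key dimension), 3-D batched inputs, and a boolean mask in which True marks positions that may be attended to. All shape names (`len_q`, `len_k`, `d_k`, `d_v`) and the mask convention are assumptions, since the assignment leaves them to the student.

```python
import numpy as np

def attention(q, k, v, attn_mask=None):
    """Batched scaled dot-product attention (a sketch under stated assumptions).

    q: (batch_size, len_q, d_k) queries
    k: (batch_size, len_k, d_k) keys
    v: (batch_size, len_k, d_v) values
    attn_mask: optional boolean array (batch_size, len_q, len_k);
               True = may attend, False = blocked (assumed convention)
    Returns:
      attn_weight: (batch_size, len_q, len_k), rows sum to 1
      outputs:     (batch_size, len_q, d_v)
    """
    # Sanity checks on dimensions.
    if q.ndim != 3 or k.ndim != 3 or v.ndim != 3:
        raise ValueError("q, k, v must be 3-D: (batch, length, dim)")
    if not (q.shape[0] == k.shape[0] == v.shape[0]):
        raise ValueError("q, k, v must share the same batch size")
    if q.shape[2] != k.shape[2]:
        raise ValueError("q and k must share the same feature dimension")
    if k.shape[1] != v.shape[1]:
        raise ValueError("k and v must have the same number of positions")

    # Scores: (batch_size, len_q, len_k).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[2])

    if attn_mask is not None:
        if attn_mask.shape != scores.shape:
            raise ValueError("attn_mask must have shape (batch, len_q, len_k)")
        # Blocked positions get a very negative score, so their weight is ~0.
        scores = np.where(attn_mask, scores, -1e9)

    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    exp_scores = np.exp(scores)
    attn_weight = exp_scores / exp_scores.sum(axis=-1, keepdims=True)

    # Weighted average of the values.
    outputs = attn_weight @ v
    return attn_weight, outputs

# Usage with random toy inputs.
rng = np.random.default_rng(0)
q = rng.normal(size=(2, 3, 4))
k = rng.normal(size=(2, 5, 4))
v = rng.normal(size=(2, 5, 6))
attn_weight, outputs = attention(q, k, v)
```

Raising `ValueError` is one reasonable choice for the failed sanity check; the assignment only asks that an error be raised.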

b) A Transformer encoder block consists of multi-headed attention, layer normalization, and a feed-forward network. We follow the standard construction of the feed-forward network, which is a combination of two stacked linear layers with dropout. All modules mentioned above have corresponding implementations in torch.nn. Please construct a Transformer encoder block in PyTorch. Below is a sketch of the class.

    class TransformerEncoderBlock(Module):
        def __init__(self):  # Please fill in all the related parameters
            # construct self-attention, normalization, feed-forward network
            # remember to save layers and parameters to self
            pass

        def forward(self, source, source_attn_mask=None, source_key_padding_mask=None):
            # implement the forward process using defined layers in __init__
            # source_attn_mask and source_key_padding_mask are optional
            # source_attn_mask is the attention mask
            # source_key_padding_mask is for the padding tokens
            pass

(Please construct this block in your own code, without using any of PyTorch's built-in implementations of the Transformer structure.)
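As an illustration of how such a block might be assembled from `torch.nn` modules, here is a minimal sketch. The constructor parameters (`d_model`, `num_heads`, `d_ff`, `dropout`), the ReLU activation, the post-layer-normalization ordering, and the batch-first layout are all assumptions, since the assignment leaves these choices open. It uses `nn.MultiheadAttention`, `nn.LayerNorm`, `nn.Linear`, and `nn.Dropout`, and none of PyTorch's ready-made Transformer classes.

```python
import torch
from torch import nn

class TransformerEncoderBlock(nn.Module):
    def __init__(self, d_model, num_heads, d_ff, dropout=0.1):
        # d_model: feature size of each token; num_heads: attention heads;
        # d_ff: hidden size of the feed-forward network (all assumed names).
        super().__init__()
        self.self_attn = nn.MultiheadAttention(
            d_model, num_heads, dropout=dropout, batch_first=True
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        # Two stacked linear layers with dropout in between.
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(d_ff, d_model),
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, source, source_attn_mask=None, source_key_padding_mask=None):
        # source: (batch_size, seq_len, d_model)
        # source_attn_mask: optional (seq_len, seq_len) attention mask
        # source_key_padding_mask: optional bool (batch_size, seq_len),
        #   True marks padding tokens to be ignored
        attn_out, _ = self.self_attn(
            source, source, source,
            attn_mask=source_attn_mask,
            key_padding_mask=source_key_padding_mask,
            need_weights=False,
        )
        # Residual connection followed by layer normalization (post-LN).
        x = self.norm1(source + self.dropout(attn_out))
        x = self.norm2(x + self.dropout(self.ffn(x)))
        return x  # (batch_size, seq_len, d_model)

# Usage with toy sizes.
block = TransformerEncoderBlock(d_model=16, num_heads=4, d_ff=32)
block.eval()  # disable dropout for a deterministic forward pass
x = torch.randn(2, 5, 16)
pad = torch.zeros(2, 5, dtype=torch.bool)
pad[:, -1] = True  # treat the last token of each sequence as padding
out = block(x, source_key_padding_mask=pad)
```

A pre-layer-normalization variant, which normalizes before each sub-layer instead of after it, would be an equally valid reading of the task.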

c) Please stack multiple Transformer encoder blocks to construct a Transformer encoder. The number of blocks can be controlled via num_block. Below is a sketch of the class.

    class TransformerEncoder(Module):
        def __init__(self, encoder_block, num_block):  # Please fill in other necessary inputs
            # save layers and parameters to self
            pass

        def forward(self, source, source_attn_mask=None, source_key_padding_mask=None):
            # implement the forward process using defined layers in __init__
            pass
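The stacking itself can be sketched as follows. This version assumes that `encoder_block` is an already constructed module and that each layer should get its own independent copy of it, which is done here with `copy.deepcopy`. `ToyBlock` is a hypothetical stand-in that only keeps the example self-contained; in the assignment it would be the encoder block from part b).

```python
import copy
import torch
from torch import nn

class TransformerEncoder(nn.Module):
    def __init__(self, encoder_block, num_block):
        # encoder_block: a module whose forward accepts
        #   (source, source_attn_mask, source_key_padding_mask)
        # num_block: number of stacked blocks
        super().__init__()
        # Deep copies give every layer its own parameters.
        self.blocks = nn.ModuleList(
            [copy.deepcopy(encoder_block) for _ in range(num_block)]
        )
        self.num_block = num_block

    def forward(self, source, source_attn_mask=None, source_key_padding_mask=None):
        # source: (batch_size, seq_len, d_model); the shape is preserved.
        output = source
        for block in self.blocks:
            output = block(output, source_attn_mask, source_key_padding_mask)
        return output

class ToyBlock(nn.Module):
    """Hypothetical stand-in for a real encoder block (illustration only)."""
    def __init__(self, d_model):
        super().__init__()
        self.linear = nn.Linear(d_model, d_model)

    def forward(self, source, source_attn_mask=None, source_key_padding_mask=None):
        # Ignores the masks; a real block would pass them to self-attention.
        return source + self.linear(source)

encoder = TransformerEncoder(ToyBlock(8), num_block=3)
encoded = encoder(torch.randn(2, 5, 8))
```

Using `nn.ModuleList` rather than a plain Python list matters here, because it registers each block's parameters with the parent module.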

d) What are the objective functions of BERT? These objectives are popular choices in various NLP tasks. Please write down the objective functions as mathematical equations and construct these objectives in PyTorch. (There is no need to write the whole program; the loss functions with clear descriptions of their inputs and outputs are enough.)
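As a starting point for the equations, BERT's two pretraining objectives, masked language modeling (MLM) and next sentence prediction (NSP), are commonly written as cross-entropy losses. The notation below is an assumption rather than the lecture's: $\tilde{x}$ is the input sequence after masking, $M$ is the set of masked positions, $x_i$ is the original token at position $i$, and $y$ is the binary label saying whether the second sentence actually follows the first.

```latex
\mathcal{L}_{\mathrm{MLM}} = -\frac{1}{|M|} \sum_{i \in M} \log p_\theta\left(x_i \mid \tilde{x}\right)

\mathcal{L}_{\mathrm{NSP}} = -\log p_\theta\left(y \mid \tilde{x}\right)

\mathcal{L} = \mathcal{L}_{\mathrm{MLM}} + \mathcal{L}_{\mathrm{NSP}}
```

Both terms are cross-entropy losses: the first over the vocabulary at each masked position, the second over the two sentence-pair classes.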