# Python Assignment Help | Natural Language Processing Assignment 2

This assignment-help request from Hong Kong, China is a Python NLP assignment.

1. Recall that attention can be viewed as an operation on a query $q \in \mathbb{R}^d$, a set of key vectors $\{k_1, \dots, k_i, \dots, k_n\}$, $k_i \in \mathbb{R}^d$, and a set of value vectors $\{v_1, \dots, v_i, \dots, v_n\}$, $v_i \in \mathbb{R}^d$.

a) Please write down the equations for the attention weights $a_i$ and the output $c \in \mathbb{R}^d$, a correspondingly weighted average over the value vectors.

b) Describe what properties of the inputs to the attention operation would result in the output $c$ being approximately equal to $v_j$ for some $j \in \{1, \dots, n\}$.

c) Consider a set of key vectors $\{k_1, \dots, k_i, \dots, k_n\}$, $k_i \in \mathbb{R}^d$, where $k_i \perp k_j$ for all $i \neq j$ and $\lVert k_i \rVert = 1$. Let $v_a, v_b \in \{v_1, \dots, v_n\}$ and $k_a, k_b \in \{k_1, \dots, k_n\}$. Give an expression for a query vector $q$ such that the output $c$ is approximately $\frac{1}{2}(v_a + v_b)$.
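The setup in this problem can be checked numerically. The sketch below assumes the common unscaled dot-product form, $a_i = \exp(k_i^\top q) / \sum_j \exp(k_j^\top q)$ and $c = \sum_i a_i v_i$, which the problem statement does not spell out. The helper name `single_query_attention`, the toy keys and values, and the scaling constant `beta` are all illustrative assumptions. With orthonormal keys, a query of the form $q = \beta (k_a + k_b)$ with large $\beta$ puts nearly all weight on positions $a$ and $b$ in equal shares.

```python
import numpy as np

def single_query_attention(q, keys, values):
    """q: (d,), keys: (n, d), values: (n, d) -> weights (n,), output (d,)."""
    scores = keys @ q                    # (n,) dot products k_i . q
    scores = scores - scores.max()       # subtract the max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax over the n keys
    return weights, weights @ values     # output is the weighted average of values

n = d = 4
keys = np.eye(n)                         # orthonormal keys: k_i is the i-th basis vector
values = np.arange(n * d, dtype=float).reshape(n, d)  # arbitrary toy values
a, b = 0, 2                              # positions whose values we want to average
beta = 50.0                              # assumed large constant; larger means sharper weights
q = beta * (keys[a] + keys[b])

weights, c = single_query_attention(q, keys, values)
print(weights.round(3))                  # weight concentrates on positions a and b
print(c, 0.5 * (values[a] + values[b]))  # the two vectors should nearly coincide
```

Shrinking `beta` toward zero flattens the weights toward a uniform average over all values, which is one way to see why the scale of the query matters.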

**2. Perplexity.**

a) You are given a training set of 100 numbers that consists of 10 each of digits 0-9. Now we see the following test set: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. What is the unigram perplexity of the test set?

b) You are given a training set of 100 numbers that consists of 91 zeros and 1 each of the other digits 1-9. Now we see the following test set: {0, 0, 0, 0, 0, 6, 0, 0, 0, 0}. What is the unigram perplexity of this test set? Please first use your intuition to describe whether the perplexity should go up or down compared with the result in a). Then, calculate the number to see if it aligns with your intuition.
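Both perplexities can be computed with a few lines of Python. This is a minimal sketch, assuming unsmoothed maximum-likelihood unigram probabilities (count divided by training size) and the usual definition of perplexity as the exponential of the negative average log-probability; the function name `unigram_perplexity` is illustrative, not part of the assignment.

```python
import math
from collections import Counter

def unigram_perplexity(train, test):
    """Perplexity of `test` under an unsmoothed unigram model fit on `train`."""
    counts = Counter(train)
    total = len(train)
    # Sum of log P(x) over the test tokens. A token unseen in training would
    # have probability 0 and make math.log fail; no smoothing is assumed here.
    log_prob = sum(math.log(counts[x] / total) for x in test)
    return math.exp(-log_prob / len(test))

# Part a): 10 copies of each digit 0-9, test set contains each digit once.
train_a = [digit for digit in range(10) for _ in range(10)]
test_a = list(range(10))

# Part b): 91 zeros plus one each of 1-9, test set is mostly zeros.
train_b = [0] * 91 + list(range(1, 10))
test_b = [0, 0, 0, 0, 0, 6, 0, 0, 0, 0]

pp_a = unigram_perplexity(train_a, test_a)
pp_b = unigram_perplexity(train_b, test_b)
print(pp_a)  # about 10: every digit has probability 0.1
print(pp_b)  # about 1.73: (0.91**9 * 0.01) ** (-1/10)
```

The second value is much lower because the model assigns high probability to the token that dominates the test set, even though the single 6 is very unlikely under it.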

3. In the lecture, we discussed attention, the Transformer, and one popular pretrained model, BERT. Here we will implement some of their details in Python.

For every function or class, please state the shape and meaning of each input and output clearly in comments. Please add comments where necessary to make your code easy to understand.

In addition, all inputs and outputs in the problems below are batched, with the batch size equal to batch_size. You should choose the other dimensions of the variables sensibly.

a) The core equations of attention are part of Problem 1. Here, please implement them in Python. Define a function called attention that takes four inputs: the matrices (numpy.array) $q$, $k$, $v$, and an attention mask. The function should first run a sanity check on the dimensions of all input matrices and raise an error if the check fails. Only if all dimensions are valid should it calculate the attention weights and the final output. The default value for the attention mask is None. Below is a sketch of the function.

    def attention(q, k, v, attn_mask=None):
        # Sanity check on dimensions of q, k, v, attn_mask
        # Calculate attention weight and final output
        return attn_weight, outputs
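One possible way to fill in this sketch is shown below. It is a hedged illustration, not the reference solution: the shapes `(batch_size, n_q, d_k)`, `(batch_size, n_k, d_k)` and `(batch_size, n_k, d_v)`, the boolean mask convention (True means the position may be attended to), the use of `ValueError`, and the $1/\sqrt{d_k}$ scaling borrowed from the Transformer are all assumptions. Dropping the scaling recovers plain dot-product attention.

```python
import numpy as np

def attention(q, k, v, attn_mask=None):
    """Scaled dot-product attention over a batch.

    q: (batch_size, n_q, d_k) queries
    k: (batch_size, n_k, d_k) keys
    v: (batch_size, n_k, d_v) values
    attn_mask: optional boolean (batch_size, n_q, n_k); True = may attend
               (assumed convention, the assignment does not fix one)
    Returns attn_weight (batch_size, n_q, n_k), outputs (batch_size, n_q, d_v).
    """
    q, k, v = (np.asarray(x, dtype=float) for x in (q, k, v))
    # Sanity check on dimensions of q, k, v, attn_mask
    if q.ndim != 3 or k.ndim != 3 or v.ndim != 3:
        raise ValueError("q, k, v must be 3-D: (batch_size, length, dim)")
    if not (q.shape[0] == k.shape[0] == v.shape[0]):
        raise ValueError("q, k, v must share the same batch size")
    if q.shape[2] != k.shape[2]:
        raise ValueError("q and k must share the feature dimension d_k")
    if k.shape[1] != v.shape[1]:
        raise ValueError("k and v must contain the same number of positions")

    # Calculate attention weight and final output
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[2])  # (batch, n_q, n_k)
    if attn_mask is not None:
        attn_mask = np.asarray(attn_mask, dtype=bool)
        if attn_mask.shape != scores.shape:
            raise ValueError("attn_mask must have shape (batch_size, n_q, n_k)")
        scores = np.where(attn_mask, scores, -1e9)  # masked positions get ~0 weight
    scores = scores - scores.max(axis=-1, keepdims=True)  # stabilise the softmax
    exp_scores = np.exp(scores)
    attn_weight = exp_scores / exp_scores.sum(axis=-1, keepdims=True)
    outputs = attn_weight @ v                              # (batch, n_q, d_v)
    return attn_weight, outputs

# Usage with random toy inputs; the last key position is masked out.
rng = np.random.default_rng(0)
q = rng.normal(size=(2, 3, 4))
k = rng.normal(size=(2, 5, 4))
v = rng.normal(size=(2, 5, 6))
mask = np.ones((2, 3, 5), dtype=bool)
mask[:, :, -1] = False
attn_weight, outputs = attention(q, k, v, attn_mask=mask)
print(attn_weight.shape, outputs.shape)
```

A large negative fill value is used instead of `-inf` so that a fully masked row degrades to uniform weights rather than producing NaN.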

b) A Transformer encoder block consists of multi-head attention, layer normalization, and a feed-forward network. We follow the usual construction of the feed-forward network: two stacked linear layers with dropout. All of the modules mentioned above have corresponding implementations in torch.nn. Please construct a Transformer encoder block in PyTorch. Below is a sketch of the class.

    from torch.nn import Module

    class TransformerEncoderBlock(Module):
        def __init__(self):  # Please fill in all the related parameters
            # construct self-attention, normalization, feed-forward network
            # remember to save layers and parameters to self
            pass

        def forward(self, source,
                    source_attn_mask=None,
                    source_key_padding_mask=None):
            # implement the forward process using defined layers in __init__
            # source_attn_mask and source_key_padding_mask are optional
            # source_attn_mask is the attention mask
            # source_key_padding_mask is for the padding tokens
            pass

(Please construct this block with your own code, without using any of PyTorch's built-in implementations of the Transformer structure.)
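A minimal sketch of such a block follows, built from `nn.MultiheadAttention`, `nn.LayerNorm`, `nn.Linear` and `nn.Dropout` rather than from `nn.TransformerEncoderLayer`. Several choices here are assumptions the assignment leaves open: the constructor parameters (`d_model`, `num_heads`, `d_ff`, `dropout`) and their defaults, the post-norm residual layout, the ReLU activation, and the batch-first input shape `(batch_size, seq_len, d_model)`.

```python
import torch
from torch import nn

class TransformerEncoderBlock(nn.Module):
    def __init__(self, d_model=64, num_heads=4, d_ff=256, dropout=0.1):
        # d_model: embedding size; num_heads: attention heads (must divide d_model)
        # d_ff: hidden size of the feed-forward network; dropout: dropout probability
        super().__init__()
        self.self_attn = nn.MultiheadAttention(
            d_model, num_heads, dropout=dropout, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.linear1 = nn.Linear(d_model, d_ff)   # first feed-forward layer
        self.linear2 = nn.Linear(d_ff, d_model)   # second feed-forward layer
        self.dropout = nn.Dropout(dropout)

    def forward(self, source, source_attn_mask=None,
                source_key_padding_mask=None):
        # source: (batch_size, seq_len, d_model)
        # source_attn_mask: optional (seq_len, seq_len) attention mask
        # source_key_padding_mask: optional bool (batch_size, seq_len), True = padding
        attn_out, _ = self.self_attn(
            source, source, source,
            attn_mask=source_attn_mask,
            key_padding_mask=source_key_padding_mask,
            need_weights=False)
        x = self.norm1(source + self.dropout(attn_out))    # residual + layer norm
        ff_out = self.linear2(self.dropout(torch.relu(self.linear1(x))))
        return self.norm2(x + self.dropout(ff_out))        # (batch_size, seq_len, d_model)

# Usage with toy inputs; the last token of every sequence is treated as padding.
block = TransformerEncoderBlock()
block.eval()                                  # disable dropout for a deterministic pass
source = torch.randn(2, 5, 64)
padding_mask = torch.zeros(2, 5, dtype=torch.bool)
padding_mask[:, -1] = True
out = block(source, source_key_padding_mask=padding_mask)
print(out.shape)
```

A pre-norm variant, where layer normalization is applied before each sub-layer instead of after the residual sum, would be an equally reasonable reading of the task.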