11-411/611 Natural Language Processing Assignment 4: POS Tagging


POS tagging forms an important part of the NLP workflow for most modern-day NLP systems. Within the
NLP community, POS tagging is largely perceived as a solved problem, or at least solved well enough
that most people do not put much effort into improving POS tagging for its own sake. However, you have
not solved it yet, so you can have a crack at it. It is a good problem for learning about NLP, and that is what
we are aiming for in this assignment.

Download the handout, and unzip the archive. You should have the following files:

hw04-handout
train_hmm.py
viterbi.py
tag_acc.py
ptb.2-21.tgs
ptb.2-21.txt
ptb.22.tgs
ptb.22.txt
ptb.23.txt

For this assignment, we have provided you with sections 2-23 of the Penn Treebank (PTB) (the ptb.*.txt
files). We have also provided the gold-standard tags for sections 2-22 (the ptb.*.tgs files). Sections 2-21 are
used for training, so they are concatenated into ptb.2-21.txt and ptb.2-21.tgs for convenience. You
will use section 22 for evaluation. Section 23 is for us to test your POS tagger, so you do not have access to
its gold-standard tags.
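
The data files appear to be line-aligned: each line of a .txt file holds one whitespace-tokenized sentence, and the corresponding line of the matching .tgs file holds one tag per token. An illustrative (made-up) line pair might look like:

The cat sat .
DT NN VBD .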

In this assignment, you will be using an HMM to perform POS tagging. You are constrained to use an HMM,
and you can only train on the provided training data (no external resources/libraries allowed). You should
train your model with ptb.2-21.txt and ptb.2-21.tgs , and evaluate the model with ptb.22.txt and
ptb.22.tgs .

Recall that a bigram HMM is one where the transition probability is defined as P(q_i | q_j), for q_i, q_j ∈ Q, the
set of states. Take a look at train_hmm.py. It should:

1. Read from the command line the input file for tags, the input file for text, and an output file to write
the HMM model to.

2. For every pair of states q_i, q_j ∈ Q, calculate the transition probability P(q_i | q_j).

3. For every state q_i ∈ Q and every token x_t in the text file, calculate the emission probability P(x_t | q_i).

4. Write the trained HMM model to an output file.

Some of the code has already been implemented for you. You should fill in the sections commented with
TODO. You can ignore smoothing for this task. Further instructions are in the train_hmm.py file itself.
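
To make steps 2-4 concrete, here is a minimal sketch of the maximum-likelihood counting they involve. This is not the handout's skeleton: the function name train_hmm_counts, the <s>/</s> sentence-boundary states, and the nested-dictionary layout are all illustrative assumptions, and smoothing is omitted, as the instructions allow.

from collections import defaultdict

def train_hmm_counts(tag_lines, txt_lines):
    # tag_lines and txt_lines are parallel lists of strings:
    # one sentence per line, whitespace-separated tags/tokens.
    trans_counts = defaultdict(lambda: defaultdict(int))  # prev tag -> next tag -> count
    emit_counts = defaultdict(lambda: defaultdict(int))   # tag -> token -> count

    for tag_line, txt_line in zip(tag_lines, txt_lines):
        tags, tokens = tag_line.split(), txt_line.split()
        prev = "<s>"                     # illustrative start-of-sentence state
        for tag, token in zip(tags, tokens):
            trans_counts[prev][tag] += 1
            emit_counts[tag][token] += 1
            prev = tag
        trans_counts[prev]["</s>"] += 1  # illustrative end-of-sentence state

    # Turn counts into relative-frequency (maximum-likelihood) estimates.
    trans, emit = {}, {}
    for prev, nexts in trans_counts.items():
        total = sum(nexts.values())
        trans[prev] = {t: c / total for t, c in nexts.items()}
    for tag, words in emit_counts.items():
        total = sum(words.values())
        emit[tag] = {w: c / total for w, c in words.items()}
    return trans, emit

The model file format you write in step 4 is specified in the handout code; the dictionaries above are just one possible in-memory representation.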
You can run train_hmm.py from the command line as follows:

$ python train_hmm.py tagfile.tgs textfile.txt model.hmm

where tagfile.tgs is the tag file for training, textfile.txt is the token file for training, and model.hmm
is the output file that train_hmm.py will write to.

Recall that the Viterbi algorithm takes a sequence of string tokens as input and outputs a sequence of POS
tags. Take a look at viterbi.py. It should:

1. Read from the command line the HMM model, the input file for text, and an output file to write the
tags to.

2. Use the Viterbi algorithm to find the best POS tag sequence for each line of the text.

3. Write the output tags to an output file.

Some of the code has already been implemented for you. You should fill in the sections commented with
TODO. Further instructions are in the viterbi.py file itself.
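
To make these steps concrete, here is a minimal sketch of Viterbi decoding for one sentence, assuming the trans/emit dictionaries and the <s>/</s> boundary states from the training sketch above. It is illustrative, not the handout's skeleton; it works in log space to avoid underflow on long sentences, and because smoothing is ignored, unseen tokens simply get probability zero (log-probability -inf) under every state.

import math

def viterbi(tokens, states, trans, emit):
    # tokens: non-empty list of token strings for one sentence
    # states: list of tag strings (e.g., the keys of emit)
    # trans[prev][tag], emit[tag][token]: probabilities from training
    def logp(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][q]: log-prob of the best path ending in state q at position t
    # back[t][q]: the state at position t-1 on that best path
    best = [{} for _ in tokens]
    back = [{} for _ in tokens]

    for q in states:
        best[0][q] = (logp(trans.get("<s>", {}).get(q, 0.0)) +
                      logp(emit.get(q, {}).get(tokens[0], 0.0)))

    for t in range(1, len(tokens)):
        for q in states:
            e = logp(emit.get(q, {}).get(tokens[t], 0.0))
            score, prev = max(
                (best[t - 1][p] + logp(trans.get(p, {}).get(q, 0.0)) + e, p)
                for p in states)
            best[t][q], back[t][q] = score, prev

    # Fold in the end-of-sentence transition, then follow the backpointers.
    last = max(states, key=lambda q: best[-1][q] +
               logp(trans.get(q, {}).get("</s>", 0.0)))
    tags = [last]
    for t in range(len(tokens) - 1, 0, -1):
        tags.append(back[t][tags[-1]])
    return list(reversed(tags))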

You can run viterbi.py from the command line as follows:

$ python viterbi.py model.hmm input.txt myoutput.tgs

where model.hmm is the HMM file generated by your train_hmm.py, input.txt is the file of text tokens,
and myoutput.tgs is the output file to write your POS tags into.
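
Once you have myoutput.tgs for section 22, you can score it against the gold-standard tags with the provided tag_acc.py. Assuming it takes the gold tag file followed by your hypothesized tag file (check the script itself for its exact usage), the invocation would look something like:

$ python tag_acc.py ptb.22.tgs myoutput.tgs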