Python代写 | 11-411/611 Natural Language Processing Homework 3
In this assignment, you will write programs (in Python3) to compute the Levenshtein Edit Distance
with transposition between two strings. Levenshtein Distance is commonly used in natural language
processing applications, for example as one of the components of a spell-checking system.
Specifically, given a list of possibly misspelled words and a list of Dictionary words, you will have
to find the word in the Dictionary with the smallest Levenshtein Distance with transposition for
each element in the first list.
For the purposes of this assignment, in cases where there is a tie in the number of edits with multiple
strings, your program can choose any of them. In the setup for Levenshtein Distance with
transposition, there are four operations that can be performed:
• insert a character to find the correct word (e.g, ollege college)
• delete a character to find the correct word (e.g, collegfe college)
• substitute character with another one (e.g, cillege college)
• transpose two adjacent characters to find the correct word (e.g, college → ocllege)
You can refer to the Python handout package and implement the TODO; details about the functions
are written in the code.
For Task 2 of the homework, your will write another function which must take two file paths as
input and return a list of integer minimum Levenshtein distances. The two files are
• dictionary.txt – file that contains correctly spelled words. This file will contain multiple
lines, one per word.
• raw.txt – file that contains misspelled words, one per line.
For example, an element in your return list should be “1” if your misspelled word is “ollege” and
the dictionary contains “college” which has the minimum Levenshtein distance with “ollege”. You
need to compare all the words in the dictionary until you find the minimum Levenshtein distance of
the current word in
raw.txt. Your return list should have the number of lines as raw.txt.
Your code will be evaluated on the command line as follows:
ed = EditDistance()
student_res = task2(“dictionary.txt”,”raw.txt”)
Tasks 1 refers to the editdistance.py file and Task 2 refers to task2.py.
Implement Levenshtein Distance with transposition (as described above) and have your program
return the minimum Levenshtein distance with transposition between two strings. See the pseudo
code provided in editdistance.py.
Remember to add an additional operation to your Levenshtein Distance function, transposition.
Transposition allows swapping between two adjacent characters in the string. For example, the
minimum Levenshtein distance with transposition between college and ocllege is only one, since
we can swap the first two letters.
This task is to design your program such that finding the best word(s) in the dictionary is more
efficient than the brute force approach of comparing the misspelled word to each of the words in the
dictionary. (Hint: redefine the minimum Levenshtein distance calculations for a string and a Trie,
(https://en.wikipedia.org/wiki/Trie ) rather than two strings.) For this task, use the minimum
Levenshtein Distance with transposition from Task 1, which includes all four operations (insertion,
deletion, substitution, and transposition).
We provide a dictionary of 50,000 Spanish words constructed from Reuters news reports
(dictionaryfile.txt). We also provide misspelled words (raw.txt). You will be given
these two files and your program must return a list of these minimum Levenshtein distances. The
raw.txt file we use for test will be different in Gradescope.