11-4/611 Assignment 1: Warming up with Regex
Download the assignment package from Piazza. You should have the following files when you uncompress
the folder.
working directory/
    nyt/
        file0.txt
        file1.txt
        ...
    cleaner.py
    names.txt
    decode.py
    encoded.txt
    key.txt
The year is 1984. Your name is Winston Smith. You live in Airstrip One and work for the Ministry of Truth.
You have been assigned the important task of making sure that the media only reports stories that are true;
that is, that agree with the Inner Party’s official historical records.
It has come to the Party’s attention that one particularly suspect media outlet, the New York Times, has
made several references to people who ‘do not exist’. Your job is to wipe out the old, incorrect names and
anonymize them, replacing them with the Ministry’s preferred name, John Smith.
There are far too many documents for the number of employees in the Ministry of Truth to manually comb
over and make the relevant adjustments. The Ministry would like to write a program to do this automatically.
Implement the method clean in the class Cleaner found in the file cleaner.py. A blank template has
been provided for you.
The clean method should take in three arguments: a list of banned full names, a list of banned last names,
and an input string to process. It should return as output the input string after the replacement of the
banned full names and last names, if any. Specifically, the method should:
1. Replace all instances of the banned full names with the officially approved name John Smith.
2. Replace all banned last names with the officially approved last name Smith.
More specific details can be found in the docstring of the class and method.
Suppose we have a segment of text as follows, where the full name Tiger Woods and the last name Woods
are on the banned lists:
I believe that Tiger Woods is an amazing golfer. Tiger Woods’ abilities are far beyond my own.
My favorite golfer is Tiger Woods, and I have a shirt with TIGER WOODS printed on it.
The article named Tiger-Woods was published yesterday. Woods are the material that form the
trunks and branches of trees.
After careful processing by your program, this segment should read:
I believe that John Smith is an amazing golfer. John Smith’ abilities are far beyond my own.
My favorite golfer is John Smith, and I have a shirt with TIGER WOODS printed on it.
The article named Tiger-Woods was published yesterday. Smith are the material that form the
trunks and branches of trees.
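The behavior in the example above can be approximated with two regex passes. The following is only a minimal sketch, not the provided template: the helper _replace and the argument names are hypothetical, and the matching rules (case-sensitive, and no match when the name touches a word character or a hyphen) are assumptions inferred from the example, where TIGER WOODS and Tiger-Woods are left unchanged. The docstring in cleaner.py takes precedence wherever it differs.

```python
import re


class Cleaner:
    """Illustrative stand-in for the Cleaner class in cleaner.py (assumed shape)."""

    def clean(self, banned_full_names, banned_last_names, text):
        # Pass 1: banned full names become "John Smith".
        text = self._replace(banned_full_names, "John Smith", text)
        # Pass 2: any banned last names still present become "Smith".
        return self._replace(banned_last_names, "Smith", text)

    @staticmethod
    def _replace(names, replacement, text):
        # Hypothetical helper; not part of the assignment template.
        if not names:
            return text
        # Escape each name so characters such as "." are matched literally,
        # and try longer names first so they win over shorter prefixes.
        ordered = sorted(names, key=len, reverse=True)
        alternatives = "|".join(re.escape(name) for name in ordered)
        # Assumption: a name only counts when it is not directly attached to
        # a word character or a hyphen on either side ("Tiger-Woods" stays).
        pattern = re.compile(r"(?<![\w-])(?:" + alternatives + r")(?![\w-])")
        return pattern.sub(replacement, text)


if __name__ == "__main__":
    sample = (
        "My favorite golfer is Tiger Woods, and I have a shirt with "
        "TIGER WOODS printed on it. The article named Tiger-Woods was "
        "published yesterday. Woods are the material that form the "
        "trunks and branches of trees."
    )
    print(Cleaner().clean(["Tiger Woods"], ["Woods"], sample))
```

Running the full names pass first matters: if last names were replaced first, Tiger Woods would become Tiger Smith and would no longer match the banned full name.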
You can test your program locally by running cleaner.py as an executable. Template code has been
provided to read an input file and a file containing the banned names, one on each line.
You can run the executable as follows. This will read in the text data from input.txt and print the
processed output to STDOUT.
$ python cleaner.py names.txt input.txt