This assignment uses machine learning to build predictive text that suggests words or spelling corrections to users as they type.
Human-Computer Interaction (CS45901_S20)
UG: Mini Project 3: Designing a better predictive text system
BACKGROUND:
Predictive text is a statistical or machine-learning approach that suggests words
or spelling corrections to users as they enter text. Modern predictive text
systems learn from user habits, but they still make mistakes in three cases:
when users type words they rarely use, when they use alternate spellings, and
when they enter unknown words that are not in the predictive text dictionary.
Peter Norvig, Director of Research at Google, hosts a text file on his website that
contains data from Project Gutenberg, Wiktionary, and the British National Corpus:
https://norvig.com/big.txt (a local copy is also provided). You can use it as the
basis for your correction system, or use any other corpus. The Birkbeck spelling
error corpus contains a list of spelling mistakes and can be used to generate your
list of erroneous words: http://ota.ox.ac.uk/headers/0643.xml.
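Here is a minimal Python sketch of loading these two resources. It assumes the corpus is plain text. It also assumes that the error list puts each correct word on a line starting with `$`, followed by that word's misspellings. That layout is an assumption, so check it against the actual file before relying on it. The function names `tokenize`, `build_word_counts` and `parse_error_list` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_word_counts(text: str) -> Counter:
    """Count word occurrences in a training corpus such as big.txt."""
    return Counter(tokenize(text))


def parse_error_list(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Turn an error list into (misspelling, correct) pairs.

    Assumed layout (hypothetical, verify against the real file): a line
    starting with '$' holds the correct word, and the lines after it hold
    misspellings of that word until the next '$' line.
    """
    pairs = []
    correct = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("$"):
            correct = line[1:].lower()
        elif correct is not None:  # skip misspellings that precede any '$' line
            pairs.append((line.lower(), correct))
    return pairs


if __name__ == "__main__":
    counts = build_word_counts("The cat sat on the mat. The end.")
    print(counts["the"])  # 3
    print(parse_error_list(["$because", "becuase", "becase", "$their", "thier"]))
    # [('becuase', 'because'), ('becase', 'because'), ('thier', 'their')]
```

To train on the full corpus, read the file first, e.g. `build_word_counts(open("big.txt", encoding="utf-8").read())`. The local path is an assumption. The resulting pairs can double as a test set: feed each misspelling to your corrector and count how often it returns the expected word.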
To determine word viability you can use letter-bigram analysis. For example, the
word Firefox is viable because its bigrams {fi, ir, re, ef, fo, ox} all appear
commonly in English, but the word Firefx is not viable because the bigram {fx} is
very rare. You can download a frequency list of common bigrams from Peter Norvig's
site: http://norvig.com/ngrams/count_2l.txt (a local copy is also provided).
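The bigram check can be sketched as follows. The sketch assumes each line of the frequency file holds a two-letter bigram followed by a count, separated by whitespace. It also uses a hypothetical `min_count` threshold that you would need to tune on real data. All function names are illustrative.

```python
from typing import Dict, Iterable, List, Optional


def load_bigram_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Parse 'bigram<whitespace>count' lines into a dict (assumed file layout)."""
    counts = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counts[parts[0].lower()] = int(parts[1])
    return counts


def letter_bigrams(word: str) -> List[str]:
    """Return the overlapping two-letter sequences of a word."""
    w = word.lower()
    return [w[i:i + 2] for i in range(len(w) - 1)]


def is_viable(word: str, counts: Dict[str, int], min_count: int) -> bool:
    """Treat a word as viable when every one of its bigrams is common enough."""
    return all(counts.get(bg, 0) >= min_count for bg in letter_bigrams(word))


def rarest_bigram(word: str, counts: Dict[str, int]) -> Optional[str]:
    """Return the least frequent bigram, useful for explaining a rejection."""
    return min(letter_bigrams(word), key=lambda bg: counts.get(bg, 0), default=None)


if __name__ == "__main__":
    # Toy counts for illustration only; real values would come from count_2l.txt.
    toy = load_bigram_counts(["fi 500", "ir 800", "re 900", "ef 300",
                              "fo 400", "ox 50", "fx 1"])
    print(is_viable("Firefox", toy, min_count=10))  # True
    print(is_viable("Firefx", toy, min_count=10))   # False
    print(rarest_bigram("Firefx", toy))             # fx
```

Testing the minimum bigram count rather than an average stops one rare bigram such as fx from being masked by common ones. A length-normalised log-probability over all bigrams is a softer alternative.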
TASK:
Your task is to create a unified predictive text system that corrects spelling errors,
performs word viability analysis, and suggests words for the user as they type a
body of text. You are not required to build a graphical user interface; your program
can be a simple text-based system that reads data from a file or from the
command prompt. The system should perform the following three tasks:
1. allow the user to automatically correct a body of text that has been provided.
2. if the user has entered a word that is not recognized, e.g. Firefox or resectioning,
then the system should determine the viability of the word and provide suggestions
if the word seems too obscure.
3. provide a suggestion for the next word based on what the user is typing.
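The third task could start from word-bigram counts. For the previous word, the system suggests the words that most often followed it in the training text, optionally filtered by the letters typed so far. This is a minimal sketch and its names (`train_next_word`, `suggest_next`) are hypothetical. It ignores sentence boundaries and returns nothing for words it has never seen.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List


def train_next_word(text: str) -> Dict[str, Counter]:
    """Map each word to a Counter of the words that directly follow it."""
    words = re.findall(r"[a-z']+", text.lower())
    following: Dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):  # punctuation is dropped, so pairs cross sentences
        following[prev][nxt] += 1
    return following


def suggest_next(following: Dict[str, Counter], prev_word: str,
                 prefix: str = "", k: int = 3) -> List[str]:
    """Return up to k likely next words that start with the typed prefix."""
    seen = following.get(prev_word.lower(), Counter())
    ranked = [w for w, _ in seen.most_common() if w.startswith(prefix.lower())]
    return ranked[:k]


if __name__ == "__main__":
    model = train_next_word("The cat sat. The cat ran. The dog sat.")
    print(suggest_next(model, "the"))              # ['cat', 'dog']
    print(suggest_next(model, "the", prefix="d"))  # ['dog']
```

For words the model has never seen, a fallback such as the overall most frequent words or a longer context (for example trigrams with back-off) would be needed.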
SUGGESTED READINGS:
Peter Norvig has a very simple and elegant spelling correction system written in
Python on his website: https://norvig.com/spell-correct.html (backup link:
https://github.com/anderscui/spellchecker/blob/master/wiki/norvig.md). The
concept behind it is simple: it applies a simplified form of Bayes' theorem to find
the most likely intended word. It generates candidate words within an edit distance
of 1 or 2 of the input and picks the one its probabilistic model rates most likely.
Peter Norvig's site (http://norvig.com/ngrams/) has a wide selection of data that
you will find useful for this project.
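The small sketch below illustrates the edit-distance idea; it is not a solution to copy. It generates every string one edit away (deletion, transposition, replacement or insertion) and applies that step twice to reach distance 2. It then picks the known candidate with the highest corpus count. Using raw counts as the only score is a simplifying assumption, because it treats all edits at the same distance as equally likely. The `counts` argument is assumed to be a word-frequency `Counter`, for example one built from big.txt.

```python
import string
from collections import Counter
from typing import Iterable, List, Set

LETTERS = string.ascii_lowercase


def edits1(word: str) -> Set[str]:
    """All strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in LETTERS}
    inserts = {a + c + b for a, b in splits for c in LETTERS}
    return deletes | transposes | replaces | inserts


def edits2(word: str) -> Set[str]:
    """All strings two edits away (this set grows quickly for long words)."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def known(words: Iterable[str], counts: Counter) -> List[str]:
    """Keep only the words that appear in the frequency table."""
    return [w for w in words if w in counts]


def correct(word: str, counts: Counter) -> str:
    """Prefer the word itself, then distance-1, then distance-2 candidates."""
    word = word.lower()
    # `or` short-circuits, so edits2 is only computed when nothing closer is known.
    candidates = (known([word], counts) or known(edits1(word), counts)
                  or known(edits2(word), counts) or [word])
    return max(candidates, key=lambda w: counts.get(w, 0))


if __name__ == "__main__":
    counts = Counter({"spelling": 10, "spewing": 2, "the": 50})
    print(correct("speling", counts))  # spelling
    print(correct("teh", counts))      # the
```

A fuller model would replace the flat preference for smaller distances with an error model P(w | c). Such a model could be estimated, for instance, from the misspelling pairs in the Birkbeck corpus.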
I wrote a paper several years ago on automated spelling correction. My
work uses the idea that, for a given incorrect word w, the distribution of
words before and after it should match the distribution of words before and
after the correct word c. The words w and c should also be separated by a small
edit distance of 1 or 2. You can download the paper here.
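The sketch below is one possible reading of that idea under simplifying assumptions; it is not the method from the paper. For every corpus word it counts the left and right neighbours. For an unknown word it considers vocabulary words within edit distance 2 and prefers the one whose neighbour counts best match the words around the error. Ties are broken by edit distance and then by frequency. The scoring rule and all function names are hypothetical. The brute-force scan over the vocabulary would also need an index to be practical on a full corpus.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insert, delete, substitute) by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def build_context(text: str) -> Tuple[Counter, Dict[str, Counter], Dict[str, Counter]]:
    """Word frequencies plus left/right neighbour counts for every word."""
    words = re.findall(r"[a-z]+", text.lower())
    left: Dict[str, Counter] = defaultdict(Counter)   # left[c][x]: x seen just before c
    right: Dict[str, Counter] = defaultdict(Counter)  # right[c][y]: y seen just after c
    for x, y in zip(words, words[1:]):
        right[x][y] += 1
        left[y][x] += 1
    return Counter(words), left, right


def correct_in_context(tokens: List[str], i: int, freq: Counter,
                       left: Dict[str, Counter], right: Dict[str, Counter],
                       max_dist: int = 2) -> str:
    """Replace tokens[i] with the nearby vocabulary word that best fits its neighbours."""
    w = tokens[i].lower()
    if w in freq:
        return w
    before: Optional[str] = tokens[i - 1].lower() if i > 0 else None
    after: Optional[str] = tokens[i + 1].lower() if i + 1 < len(tokens) else None
    candidates = [c for c in freq if levenshtein(w, c) <= max_dist]
    if not candidates:
        return w

    def score(c: str) -> Tuple[int, int, int]:
        # Context matches first, then smaller edit distance, then corpus frequency.
        context = left.get(c, Counter())[before] + right.get(c, Counter())[after]
        return context, -levenshtein(w, c), freq[c]

    return max(candidates, key=score)


if __name__ == "__main__":
    freq, left, right = build_context(
        "The sea is blue. The sea is deep. The sea is cold. "
        "I drink hot tea. I like hot tea.")
    print(correct_in_context(["i", "drink", "hot", "xea"], 3, freq, left, right))  # tea
    print(correct_in_context(["the", "xea", "is"], 1, freq, left, right))          # sea
```

In the toy corpus, "sea" is more frequent than "tea", yet the neighbour "hot" pulls the first correction toward "tea". That is the behaviour the context idea is meant to capture.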
DEADLINES:
May 5: demos over Skype.
May 6: source code and instructions to be uploaded to Moodle.
May 6: team and individual write up to be uploaded to Moodle.
NOTE:
I am quite familiar with the myriad of examples on the web for performing spelling
correction and word prediction. I expect teams to develop their own algorithm and
not copy and paste something from the web. You are free to read other papers and
cite them as needed in your work. Also, do not post your code to any online
repositories as we’ve had issues with code being taken without consent. You will be
asked to explain your algorithm during the code demo day.
Due date: Wednesday, May 6, 2020, 11:59 PM