Human-Computer Interaction (CS45901_S20)
This assignment uses machine learning to build a predictive text system that suggests words or spelling corrections to users as they type.
 UG: Mini Project 3: Designing a better predictive text system
 BACKGROUND:
 Predictive text is a statistical or machine-learning-based approach that suggests
 words or spelling corrections to a user as they enter text. Modern predictive text
 systems learn from user habits, but they make mistakes when users enter words
 that they do not use frequently, use alternate spellings, or type unknown words
 that are not found in the predictive text dictionary.
 Peter Norvig, Director of Research at Google, has a repository on his website that
 contains data from Project Gutenberg, Wiktionary, and the British National Corpus:
 https://norvig.com/big.txt or locally linked here. You can use it as the basis for your
 correction system, or use any other repositories. The Birkbeck spelling error corpus
 contains a list of spelling mistakes and can be used to generate your list of
 erroneous words: http://ota.ox.ac.uk/headers/0643.xml.
 To determine word viability you can use bigram analysis. For example, the word
 Firefox is viable because its bigrams {fi, ir, re, ef, fo, ox} all appear commonly in
 English. The word Firefx, however, is not viable because the bigram {fx} is very
 rare. You can download a frequency list of common bigrams from Peter Norvig’s site:
 http://norvig.com/ngrams/count_2l.txt or locally here.
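As an illustration of the bigram check described above, here is a minimal Python sketch. The counts in `BIGRAM_COUNTS` are made-up toy values, not figures from count_2l.txt, and the function names and the smoothed average log-frequency score are assumptions chosen for this example rather than a required method.

```python
import math

# Toy letter-bigram counts (hypothetical values, for illustration only).
# In a real system these would be loaded from a bigram frequency list.
BIGRAM_COUNTS = {
    "fi": 500, "ir": 700, "re": 3000, "ef": 300, "fo": 800, "ox": 60, "fx": 0,
}


def letter_bigrams(word):
    """Return the adjacent letter pairs of a word, lowercased."""
    w = word.lower()
    return [w[i:i + 2] for i in range(len(w) - 1)]


def viability(word, counts, smoothing=1):
    """Average log relative frequency of the word's bigrams.

    Additive smoothing keeps unseen bigrams from producing log(0).
    Higher (less negative) scores mean the word looks more like English.
    """
    grams = letter_bigrams(word)
    if not grams:
        return float("-inf")  # words shorter than two letters have no bigrams
    total = sum(counts.values()) + smoothing * len(counts)
    logs = [math.log((counts.get(g, 0) + smoothing) / total) for g in grams]
    return sum(logs) / len(logs)


def rarest_bigram(word, counts):
    """Return the bigram of the word with the lowest count."""
    return min(letter_bigrams(word), key=lambda g: counts.get(g, 0))


print(letter_bigrams("Firefox"))
print(viability("Firefox", BIGRAM_COUNTS) > viability("Firefx", BIGRAM_COUNTS))
print(rarest_bigram("Firefx", BIGRAM_COUNTS))  # fx
```

A threshold on the score, tuned on words known to be valid, could then decide whether a word is "too obscure"; the right threshold depends on the frequency list used.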
 TASK:
 Your task is to create a unified predictive text system that corrects spelling errors,
 performs word viability analysis, and suggests words for the user as they type a
 body of text. You are not required to have a graphical user interface; your program
 can be a simple text-based system that reads in data from a file or from the
 command prompt. The system should perform the following three tasks:
 1. allow the user to automatically correct a body of text that has been provided.
 2. if the user has entered a word that is not recognized, e.g. Firefox or resectioning,
 then the system should determine the viability of the word and provide suggestions
 if the word seems too obscure.
 3. provide a suggestion for the next word based on what the user is typing.
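For the third task, one simple starting point is a word-bigram model that counts which words follow which. The sketch below is only an assumption about how this could be approached; the function names and the toy corpus are hypothetical, and a real system would train on a much larger text such as big.txt.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


def build_next_word_model(text):
    """Map each word to a Counter of the words observed right after it."""
    model = defaultdict(Counter)
    words = tokenize(text)
    for prev, nxt in zip(words, words[1:]):
        model[prev][nxt] += 1
    return model


def suggest_next(model, typed, k=3):
    """Suggest up to k likely next words given the text typed so far."""
    words = tokenize(typed)
    if not words:
        return []
    followers = model.get(words[-1])  # .get avoids adding new keys
    if not followers:
        return []
    return [w for w, _ in followers.most_common(k)]


# Toy corpus, for illustration only.
corpus = "the cat sat on the mat. the cat ate the fish. the dog sat on the rug."
model = build_next_word_model(corpus)
print(suggest_next(model, "I saw the", k=1))  # ['cat']
print(suggest_next(model, "it sat"))          # ['on']
```

This sketch ignores sentence boundaries and only looks one word back; longer contexts or smoothing are natural extensions.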
 SUGGESTED READINGS:
 Peter Norvig has a very simple and elegant spelling correction system written in
 Python on his website: https://norvig.com/spell-correct.html (or backup link
 here: https://github.com/anderscui/spellchecker/blob/master/wiki/norvig.md). The
 concept behind it is simple: it uses a reduced form of Bayes’ Theorem to generate
 the most likely word. The approach uses a probabilistic model that queries the
 likelihood of words at an edit distance of 1 or 2. Peter Norvig’s site
 (http://norvig.com/ngrams/) has a wide selection of data that you will find useful
 for this project.
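The edit-distance idea described above can be sketched in Python as follows. This is a simplified illustration in the spirit of that approach, not a substitute for your own algorithm: it assumes that every known word at the smallest edit distance is an equally likely source of the typo, so the choice reduces to picking the most frequent candidate. The toy corpus and the name `correct` are assumptions made for this example.

```python
import re
import string
from collections import Counter


def words(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def edits1(word):
    """All strings one edit away: delete, transpose, replace, or insert."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts):
    """Return the most frequent known word within edit distance 2, else the word."""
    word = word.lower()
    if word in counts:
        return word
    one_away = edits1(word)
    near = {w for w in one_away if w in counts}
    if not near:
        # Fall back to distance 2 only when nothing is found at distance 1.
        near = {w2 for w1 in one_away for w2 in edits1(w1) if w2 in counts}
    if not near:
        return word
    return max(near, key=lambda w: counts[w])


# Toy corpus, for illustration only.
corpus = "spelling is hard but spelling correction is fun and the spelling corpus is large"
counts = Counter(words(corpus))
print(correct("speling", counts))    # spelling
print(correct("corection", counts))  # correction
```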
 I wrote a paper several years ago on performing automated spelling correction. My
 work uses the idea that for a given word w that is incorrect, the distribution of
 words before and after it will be the same as the distribution of words before and
 after the correct word c. The words w and c should also be separated by a small edit
 distance of 1 or 2. You can download the paper from HERE.
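To make the context idea above concrete, here is a rough Python sketch. It is not the algorithm from the paper, which is not reproduced here. It assumes that context is represented by counts of the immediately preceding and following words, that similarity is measured with cosine similarity, and that edit distance is plain Levenshtein distance. All function names and the sample text are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def context_vectors(tokens):
    """For each word, count the words seen just before ('L') and after ('R') it."""
    vectors = defaultdict(Counter)
    for i, w in enumerate(tokens):
        if i > 0:
            vectors[w][("L", tokens[i - 1])] += 1
        if i + 1 < len(tokens):
            vectors[w][("R", tokens[i + 1])] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def edit_distance(s, t):
    """Levenshtein distance using insertions, deletions, and substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]


def best_correction(word, vectors, max_dist=2):
    """Pick the nearby word whose context looks most like the given word's context."""
    target = vectors.get(word, Counter())  # read before iterating over vectors
    scored = [
        (cosine(target, vectors[c]), c)
        for c in list(vectors)
        if c != word and edit_distance(word, c) <= max_dist
    ]
    return max(scored)[1] if scored else word


# Sample text, for illustration only; "recieve" is the erroneous word w.
text = ("i will receive the letter today you will receive the parcel soon "
        "they will recieve the letter tomorrow we will review a book")
vectors = context_vectors(tokenize(text))
print(best_correction("recieve", vectors))  # receive
```

The sketch assumes the word has already been flagged as erroneous; it does not decide on its own whether a word needs correcting.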
 DEADLINES:
 May 5: demos over Skype.
 May 6: source code and instructions to be uploaded to Moodle.
 May 6: team and individual write-up to be uploaded to Moodle.
 NOTE:
 I am quite familiar with the myriad of examples on the web for performing spelling
 correction and word prediction. I expect teams to develop their own algorithm and
 not copy and paste something from the web. You are free to read other papers and
 cite them as needed in your work. Also, do not post your code to any online
 repositories as we’ve had issues with code being taken without consent. You will be
 asked to explain your algorithm during the code demo day.
 Due date: Wednesday, May 6, 2020, 11:59 PM

 
                        