avatar
拼写检查的变种# JobHunting - 待字闺中
d*w
1
拼写检查做法有个计算编辑距离,http://en.wikipedia.org/wiki/Levenshtein_distance
google人工智能老大曾经写过一个非常简短但很高效的拼写检查器,http://norvig.com/spell-correct.html.
下面这个题目也有些类似
Write a program that reads a large list of English words (e.g. from /usr/
share/dict/words on a unix system) into memory, and then reads words from
stdin, and prints either the best spelling suggestion, or "NO SUGGESTION" if
no suggestion can be found. The program should print ">" as a prompt before
reading each word, and should loop until killed.
Your solution should be faster than O(n) per word checked, where n is the
length of the dictionary. That is to say, you can't scan the dictionary
every time you want to spellcheck a word.
For example:
> sheeeeep
sheep
> peepple
people
> sheeple
NO SUGGESTION
The class of spelling mistakes to be corrected is as follows:
Case (upper/lower) errors: "inSIDE" => "inside"
Repeated letters: "jjoobbb" => "job"
Incorrect vowels: "weke" => "wake"
Any combination of the above types of error in a single word should be
corrected (e.g. "CUNsperrICY" => "conspiracy").
If there are many possible corrections of an input word, your program can
choose one in any way you like. It just has to be an English word that is a
spelling correction of the input by the above rules.
Final step: Write a second program that *generates* words with spelling
mistakes of the above form, starting with correctly spelled English words.
Pipe its output into the first program and verify that there are no
occurrences of "NO SUGGESTION" in the output.
avatar
l*3
2
貌似穷举edit distance是1的所有variation和edit distance是2的所有variations.你
给的link里面的文章说edit distance是2的就可以cover98%以上的情况了.然后可以对
每种variation到dictionary里面去找.这样时间上应该比遍历整个dictionary快

if
before

【在 d********w 的大作中提到】
: 拼写检查做法有个计算编辑距离,http://en.wikipedia.org/wiki/Levenshtein_distance
: google人工智能老大曾经写过一个非常简短但很高效的拼写检查器,http://norvig.com/spell-correct.html.
: 下面这个题目也有些类似
: Write a program that reads a large list of English words (e.g. from /usr/
: share/dict/words on a unix system) into memory, and then reads words from
: stdin, and prints either the best spelling suggestion, or "NO SUGGESTION" if
: no suggestion can be found. The program should print ">" as a prompt before
: reading each word, and should loop until killed.
: Your solution should be faster than O(n) per word checked, where n is the
: length of the dictionary. That is to say, you can't scan the dictionary

相关阅读
logo
联系我们隐私协议©2024 redian.news
Redian新闻
Redian.news刊载任何文章,不代表同意其说法或描述,仅为提供更多信息,也不构成任何建议。文章信息的合法性及真实性由其作者负责,与Redian.news及其运营公司无关。欢迎投稿,如发现稿件侵权,或作者不愿在本网发表文章,请版权拥有者通知本网处理。