Sorasyn commented on a Page, Levenshtein Distance alias  -  Jun 08, 2013

Unique snippet. :) So, from the article you posted, this is a snippet that finds the number of single character transformations it would take to make one word into another?

Yawhatnever  -  Jun 08, 2013

Yeah. Some practical usage scenarios could be phishing detection or fuzzy string searching. For example: $levenshtein(nickserv, nikserv) which returns 1, or $levenshtein(runescaepee, runescape) which returns 2. I did some work on a bk-tree lookup [ http://nullwords.wordpress.com/2013/03/13/the-bk-tree-a-data-structure-for-spell-checking/ ] snippet for fuzzy searching which uses $levenshtein(), but it has a few issues and I haven't taken the time to sort them out.

Sorasyn  -  Jun 09, 2013

Is this caps sensitive as well? That might be a prudent addition to the snippet, if you're looking to build onto it. Being able to switch in and between caps sensitive, and insensitive, would broaden it's applicable usefulness,

Hawkee  -  Jun 09, 2013

This is a very handy function for searching engines.

Yawhatnever  -  Jun 09, 2013

Use $levenshteincs() or $editdistancecs() for case-sensitive comparisons.

Sorasyn  -  Jun 09, 2013

Reminds me of an algorithm we implemented in a programming project over the spring semester. It could take in two files, and based on what words were used and how frequent they appeared, it would give an approximation on the likelihood that the authors were the same. Used more often than not to find plagiarized texts.

Yawhatnever  -  Jun 09, 2013

You could do exactly that using this algorithm. I think the best way might be to read a file into a binary variable, and then instead of passing $mid(word, %i, 1) to $levenshteinequal() you would loop through with $bfind() and pass the words.

Even in its current state it could compare the text of two files (assuming they were within the line length limit), although it would be comparing the characters rather than the words.

Sign in to comment

Are you sure you want to unfollow this person?
Are you sure you want to delete this?
Click "Unsubscribe" to stop receiving notices pertaining to this post.
Click "Subscribe" to resume notices pertaining to this post.