## deletion distance between words

Question

I'm trying to find how many characters I would need to delete to make the two words the same. For instance "at", "cat" would be 1 because I could delete the c, "boat" and "got" would be 3 because I could delete the b,a and g to make it ot. I put the words into a dictionary with their count as the value. Then I iterate over the dictionary and see if that key exists in the other dictionary otherwise I add 1 to the difference. Is this a very inefficient algorithm?

But it is overestimating the number of deletions I need.

```
def deletiondistance(firstword, secondword):
dfw = {}
dsw = {}
diff = 0
for i in range(len(firstword)):
print firstword[i]
if firstword[i] in dfw:
dfw[firstword[i]]+=1
else:
dfw[firstword[i]]=1
for j in range(len(secondword)):
if secondword[j] in dsw:
dsw[secondword[j]] +=1
else:
dsw[secondword[j]]=1
for key, value in dfw.iteritems():
if key in dsw:
#print "key exists"
pass
else:
diff +=1
print "diff",diff
```

Show source

## Answers ( 3 )

I think your goal is similar to levenshtein distance.

Levenshtein distance is a metric for measuring distance between 2 strings.

Here is a wiki-link. https://en.wikipedia.org/wiki/Levenshtein_distance

And here is pypi package for levenshtein distance. https://pypi.python.org/pypi/python-Levenshtein

You can use difflib for this.

Example:

Prints:

As @Hulk mentioned this is similar to levenshtein distance. The only difference is that substitutions are not allowed but that can be rectified by using substitution cost of 2 which is the same as removing character from both strings. Example:

Output: