[LUNI] variation on a theme suggested by SyL
Martin Maney
maney at two14.net
Wed Dec 13 18:39:26 CST 2006
Syl has so much disk space that he doesn't know what media files he
might have dupes of. So he wanted to find likely candidates, and
today's notion was just to troll the output of, e.g., "du -a" for
matching sizes and basenames. Since you need the full path names, I
figured it was easier to knock up something using whichever scripting
language was handy, and it turned out to look something like this:
<file name="dupes.py">
# usage: du [-a] <target> | python dupes.py
# prints sets of files with the same sizes and basename
import sys

# map (size, basename) -> list of full paths with that size and basename
sizes = {}
for l in sys.stdin.readlines():
    size, path = l.strip().split(None, 1)
    key = (size, path.split('/')[-1])
    sizes[key] = sizes.get(key, [])
    sizes[key].append(path)

# only keys seen more than once are possible dupes
candidates = [paths for paths in sizes.values() if len(paths) > 1]
print("found %d candidate sets" % len(candidates))
for cs in candidates:
    print(', '.join(cs))
</file>
Prolly should have called it filedupes, or filedupes1, since it's
really a fairly crude approximation (misses even trivially different
names, reports coincidental matches (which may be less unlikely for
other uses than this particular case))...
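
One way to weed out the coincidental matches would be to compare the
actual contents of each candidate set. What follows is only a rough
sketch, not part of dupes.py: the names file_digest and confirm_dupes
are made up for illustration, and it assumes the paths are readable
regular files. It does nothing for the other problem (dupes under
different names); for that you would presumably have to key on size
alone and let the hashing sort it out.

```python
# Sketch only: confirm candidate duplicates by comparing file contents.
# file_digest and confirm_dupes are hypothetical helpers, not part of dupes.py.
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        # read fixed-size chunks so large media files are never loaded whole
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()


def confirm_dupes(paths):
    """Split one candidate set into groups of files with identical contents."""
    by_digest = {}
    for path in paths:
        by_digest.setdefault(file_digest(path), []).append(path)
    # keep only digests shared by more than one file
    return [group for group in by_digest.values() if len(group) > 1]
```

Each list in candidates could be passed through confirm_dupes before
printing, at the cost of reading every candidate file once.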
--
We've all heard that a million monkeys banging on
a million typewriters will eventually reproduce
the entire works of Shakespeare. Now, thanks to the
Internet, we know this is not true. -- Robert Wilensky