[LUNI] variation on a theme suggested by SyL
John Mason
jlm at uic.edu
Thu Dec 14 12:29:40 CST 2006
On Wed, Dec 13, 2006 at 06:39:26PM -0600, Martin Maney wrote:
>
> SyL has so much disk space that he doesn't know what media files he
> might have dupes of. So he wanted to find likely candidates, and
> today's notion was just to troll the output of, e.g., "du -a" for
> matching sizes and basenames. Since you need the full path names, I
> figured it was easier to knock up something using whichever scripting
> language was handy, and it turned out to look something like this:
>
> <file name="dupes.py">
> # usage: du [-a] <target> | python dupes.py
> # prints sets of files with the same sizes and basename
>
> import sys
>
> sizes = {}
> for l in sys.stdin.readlines():
>     size, path = l.strip().split(None, 1)
>     key = (size, path.split('/')[-1])
>     sizes[key] = sizes.get(key, [])
>     sizes[key].append(path)
> candidates = [paths for paths in sizes.values() if len(paths) > 1]
> print "found %d candidate sets" % len(candidates)
> for cs in candidates:
>     print ', '.join(cs)
> </file>
>
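Consider taking md5sums of the candidate dups: matching size and basename
only suggest that two files are the same, while matching checksums
confirm it.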
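
Something along these lines might do as a follow-up pass. It is only a
rough sketch: the script name confirm_dupes.py and the helpers md5_of
and confirm_dupes are made up for illustration and are not part of
Martin's script. It assumes you hand it the paths of one candidate set
as command-line arguments and that the files are readable.

```python
# confirm_dupes.py (hypothetical name): check one candidate set by content.
# usage: python confirm_dupes.py <path> <path> [<path> ...]
import hashlib
import sys


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        # Read fixed-size chunks so large media files are never
        # loaded into memory all at once.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def confirm_dupes(paths):
    """Group paths by MD5 digest; keep groups with more than one file."""
    by_digest = {}
    for path in paths:
        by_digest.setdefault(md5_of(path), []).append(path)
    return [group for group in by_digest.values() if len(group) > 1]


if __name__ == '__main__':
    # Print each confirmed set in the same comma-separated style
    # that dupes.py uses for its candidate sets.
    for group in confirm_dupes(sys.argv[1:]):
        print(', '.join(group))
```

Since only files that already match on size and basename get hashed,
the expensive read of file contents is limited to the likely suspects.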
--
%40 <- Ceci n'est pas une @. John Mason - jlm at uic.edu
University of Illinois at Chicago - Academic Computing and Communications Center
Usenet Administrator, Listserv Administrator, Sun Software Contact et al.