[LUNI] variation on a theme suggested by SyL

John Mason jlm at uic.edu
Thu Dec 14 12:29:40 CST 2006


On Wed, Dec 13, 2006 at 06:39:26PM -0600, Martin Maney wrote:
> 
> Syl has so much disk space that he doesn't know which media files he
> might have dupes of.  So he wanted to find likely candidates, and
> today's notion was just to troll the output of, e.g., "du -a" for
> matching sizes and basenames.  Since you need the full path names, I
> figured it was easier to knock something up in whichever scripting
> language was handy, and it turned out to look something like this:
> 
> <file name="dupes.py">
> # usage: du [-a] <target> | python dupes.py
> # prints sets of files with the same size and basename
> 
> import os
> import sys
> 
> # map (size, basename) -> list of full paths seen with that key
> sizes = {}
> for line in sys.stdin:
>     line = line.rstrip('\n')
>     if not line:
>         continue
>     # du separates size and path with a tab; split on the first tab
>     # only, so spaces inside file names survive intact
>     size, path = line.split('\t', 1)
>     key = (size, os.path.basename(path))
>     sizes.setdefault(key, []).append(path)
> candidates = [paths for paths in sizes.values() if len(paths) > 1]
> print("found %d candidate sets" % len(candidates))
> for cs in candidates:
>     print(', '.join(cs))
> </file>
> 

Consider comparing md5sums of the candidate dupes.
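A minimal sketch of that follow-up step, assuming Python 3 and a list of candidate sets like the ones dupes.py builds: within each set, hash every file with hashlib.md5 and keep only the groups whose digests agree. The helper names file_md5 and confirm_dupes are hypothetical, not from the thread.

```python
import hashlib
from collections import defaultdict


def file_md5(path, chunk_size=1 << 20):
    """Hypothetical helper: return the hex MD5 digest of a file.

    The file is read in chunks so large media files are never held
    in memory all at once.
    """
    h = hashlib.md5()
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps calling f.read until it returns b''
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()


def confirm_dupes(candidate_sets):
    """Hypothetical helper: split each candidate set (same size and
    basename) into groups of files whose contents hash identically,
    dropping any file that has no match."""
    confirmed = []
    for paths in candidate_sets:
        by_digest = defaultdict(list)
        for path in paths:
            by_digest[file_md5(path)].append(path)
        confirmed.extend(g for g in by_digest.values() if len(g) > 1)
    return confirmed
```

To use it, call confirm_dupes(candidates) at the end of dupes.py and print each returned group. Hashing matters here because du reports disk usage in blocks rather than exact byte counts, so two different files can share a du size. A final byte-for-byte check such as filecmp.cmp(a, b, shallow=False) would rule out the unlikely case of an MD5 collision.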
-- 
%40 <- Ceci n'est pas une @.                           John Mason - jlm at uic.edu
University of Illinois at Chicago - Academic Computing and Communications Center
   Usenet Administrator, Listserv Administrator, Sun Software Contact et al.

