[LUNI] File duplicate finder contest!

Martin Maney maney at two14.net
Sun Dec 17 11:00:23 CST 2006


On Sat, Dec 16, 2006 at 07:22:33PM -0600, Gene Jannece wrote:
> The files include anything from 50meg text files, unsorted mp3 
> collections, full disc copies of entire DVD box sets

For these I would expect the simple du postprocessor I posted to give
good results.

>, entire system backups (both windows and linux OS)

For this, maybe not so good - lots of smaller files, hence lots of size
collisions.
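
To illustrate why size collisions matter, here is a minimal sketch of
the size-first idea. It is not the du postprocessor mentioned above;
the function names (group_by_size, file_digest, find_duplicates) and
the choice of SHA-256 are assumptions made for the example. Files are
bucketed by size using metadata only, and contents are read just for
files that share a size with at least one other file.

```python
import hashlib
import os
from collections import defaultdict


def group_by_size(root):
    """Map file size -> list of paths, reading metadata only."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                by_size[os.path.getsize(path)].append(path)
            except OSError:
                continue  # unreadable or vanished; ignore in this sketch
    return by_size


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files never sit wholly in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return sorted groups of paths whose size and digest both match."""
    groups = []
    for _size, paths in group_by_size(root).items():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

With a handful of huge files, nearly every size is unique and almost
nothing gets hashed. With a system backup, many small files share a
size, so the second pass ends up reading much more of the data.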

> collections of pictures featuring "artistic" nudes, etc.

These should be okay, but maybe I should revise the script to make an
offsite backup just in case something goes wrong.  :-)

> That's 1.1T total with du -c -h

Probably more important to know how *many* files there are in the data
set.  A ballpark figure for average path length and directory size
(count of entries, not bytes), too, since you imply that some tools
have run out of memory processing this collection, and these are the
parameters that will affect the space used during the early stages of
processing (or at least tell you that the first pass can't be done the
easy way and must be revised to accommodate that).

Assuming anyone finds the problem or the prizes sufficiently motivating,
that is.  :-)
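
The figures asked for above (file count, average path length, entries
per directory) can be gathered in one cheap pass that keeps only
running totals in memory. The following is a rough sketch; the
function name survey and the keys of the returned dictionary are made
up for the example.

```python
import os


def survey(root):
    """Walk a tree once and return rough size-of-problem statistics."""
    file_count = 0
    dir_count = 0
    total_path_chars = 0
    total_entries = 0
    max_dir_entries = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Directory size here means count of entries, not bytes.
        entries = len(dirnames) + len(filenames)
        dir_count += 1
        total_entries += entries
        max_dir_entries = max(max_dir_entries, entries)
        for name in filenames:
            file_count += 1
            total_path_chars += len(os.path.join(dirpath, name))
    return {
        "files": file_count,
        "directories": dir_count,
        "avg_path_length": total_path_chars / file_count if file_count else 0.0,
        "avg_dir_entries": total_entries / dir_count if dir_count else 0.0,
        "max_dir_entries": max_dir_entries,
    }
```

Multiplying the file count by the average path length gives a ballpark
for how much memory a tool needs if it holds every path at once.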

> 5) your entry CAN NOT delete anything! it must dump the results into a 
> text file for review.

How about it encrypts a few files and holds them for ransom, just to
make things more interesting?

-- 
I've just realised that one of the things I really hate hate hate
about Windows is that it doesn't have any personality. It's corporate
and it hates me but it wouldn't ever do anything but smile falsely
and refuse to talk to me.  -- Jo Walton http://www.bluejo.demon.co.uk

And now, with Vista, it can seize your documents, too.  Innit that
innovative?

