Programming in perl

My goal here is to write apps in perl that are actually useful to the end user. I've found the biggest obstacle is distributing the dependencies, but things are slowly changing: Debian users have managed to package findimagedupes, for example.

findimagedupes

[2002/02/06 23:55] PixiePlus now supports finding similar images using an algorithm based on mine, and for those unable to run a current version of KDE, GQView will also find your similar images, albeit with a different algorithm whose results I haven't compared with my own. Both are FAR faster than findimagedupes and, I would say, both make it obsolete. If someone else would like to continue its development for web or other non-GUI purposes (this means you, Debian maintainers ;) ), by all means feel free, but consider my itch scratched.

[2001/09/20 08:30] Suddenly I've been getting a lot of 'bug reports' about an uninitialized value at line 212 that stops findimagedupes from working. Apparently the Perl API for ImageMagick has changed since the version I'm running: the "Ping" method no longer returns a comma-separated list, but only one value. Happily, that happens to be the one value I use, so if you replace line 206:

($width, $height, $size, $format) = split(',', $image->Ping($file));

with:

$format = $image->Ping($file);

it should work again. I see no reason to update findimagedupes until it breaks for me (i.e. until I'm running a distribution that includes the new version of ImageMagick; I don't install libs outside of the RPM system if I can avoid it), so if someone wants to take it over temporarily or permanently, please drop me a line.
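If you'd rather not depend on which PerlMagick version is installed, one option is to accept whatever Ping returns and take the format from the end. A minimal sketch, assuming (as described above) that older versions return a single "width,height,size,format" string and newer ones return only the format, and also tolerating four separate values; `ping_format` is a hypothetical helper, not part of findimagedupes or PerlMagick:

```perl
use strict;
use warnings;

# Hypothetical helper: extract the image format from whatever
# Image::Magick's Ping returned.
# Assumption: in all three tolerated cases (old comma-separated string,
# new single value, or four separate values) the format is the last field.
sub ping_format {
    my @fields = @_;
    return unless @fields;    # Ping returned nothing
    # Old API: one "width,height,size,format" string, so split it up.
    @fields = split /,/, $fields[0] if @fields == 1 && defined $fields[0];
    return $fields[-1];
}

# In findimagedupes this would replace line 206, e.g.
#   $format = ping_format($image->Ping($file));
print ping_format('640,480,51234,JPEG'), "\n";    # old-style string
print ping_format('JPEG'), "\n";                  # new single value
```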

[2001/03/03 10:05] Markus Schoder has contributed finddupes.cpp, GPL'ed source code for a C++-based version of my horribly slow compare routine. In his testing on a directory of 35,000 images, it was about 300 times faster than findimagedupes' perl implementation. It's included here for everyone who has experienced the speed problem. I'll probably integrate it into the next release somehow.

You can compile finddupes.cpp with

g++ -O3 finddupes.cpp -o finddupes

(or download this gzipped executable, built on Mandrake 7.2) and run it like so:

finddupes .95 <imagedupes-db.txt

where .95 is the desired threshold, and .9 is the default. Thanks, Markus!
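Why the compare routine is the bottleneck: every fingerprint has to be compared with every other one, so n images need n(n-1)/2 comparisons, which for the 35,000-image directory above is 612,482,500 pairs. A minimal perl sketch of that all-pairs loop, assuming each fingerprint is a packed 256-bit (32-byte) string keyed by filename and that a pair matches when at most `$max_diff` bits differ; `similar_pairs` is a hypothetical name, not the routine in findimagedupes or finddupes.cpp:

```perl
use strict;
use warnings;

# Hypothetical all-pairs comparison: %$fp maps filename => packed
# 32-byte fingerprint. Returns [file_a, file_b, differing_bits] for
# every pair whose fingerprints differ in at most $max_diff bits.
sub similar_pairs {
    my ($fp, $max_diff) = @_;
    my @files = sort keys %$fp;
    my @pairs;
    for my $i (0 .. $#files - 1) {
        for my $j ($i + 1 .. $#files) {
            # XOR the two bit strings, then count the 1 bits with
            # unpack's checksum feature: that count is the Hamming distance.
            my $diff = unpack('%32b*', $fp->{ $files[$i] } ^ $fp->{ $files[$j] });
            push @pairs, [ $files[$i], $files[$j], $diff ] if $diff <= $max_diff;
        }
    }
    return @pairs;
}

my %fp = (
    'a.jpg' => "\x00" x 32,
    'b.jpg' => "\x00" x 31 . "\x07",    # 3 bits differ from a.jpg
    'c.jpg' => "\xff" x 32,             # all 256 bits differ from a.jpg
);
printf "%s ~ %s (%d bits)\n", @$_ for similar_pairs(\%fp, 25);
```

Each inner comparison is cheap (one XOR and one bit count), so the total cost is dominated by the number of pairs, which is where a compiled version pays off.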

[2001/02/11 21:00] Version 0.1.3 released with fixes and performance enhancements from Paul Cassella and Max Stekelenburg, as well as bugfixes to make it work with Linux-Mandrake 7.2 and a new "GUI mode" (not an actual GUI, but it produces output that ought to be easier for a GUI to use).

[2000/10/01 15:30] findimagedupes performs a rough "visual diff" on two or more images. This command-line program will scan two pictures (or a whole tree of pictures) and determine whether any of them look alike. It uses a simple algorithm, hopefully documented well in the code, to reduce every picture to a 16x16x1 bitmap, and then counts the bits that differ between each pair. It's something like 98% accurate on typical image subjects. Text or other graffiti added to pictures will usually not confuse the program, but if you take a lot of very similar pictures (like sunsets or webcam grabs) they will probably turn up as false positives.
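The core idea can be sketched without ImageMagick by assuming the picture has already been shrunk to a 16x16 grid of grayscale values (0-255). Each cell becomes one bit (here: 1 if brighter than the grid's mean), giving a 256-bit fingerprint, and two fingerprints are compared by counting the bits that differ. The mean threshold and the helper names `fingerprint` and `diff_bits` are assumptions for illustration; findimagedupes' own preprocessing is documented in its source and may differ:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical: turn a 16x16 grid of grayscale values (arrayref of 16
# row arrayrefs) into a packed 256-bit fingerprint. A cell becomes 1
# when it is brighter than the mean of the whole grid, else 0.
sub fingerprint {
    my ($grid) = @_;
    my @cells = map { @$_ } @$grid;
    die "expected 256 cells\n" unless @cells == 256;
    my $mean = sum(@cells) / @cells;
    my $bits = join '', map { $_ > $mean ? 1 : 0 } @cells;
    return pack('b*', $bits);    # 256 bits -> 32 bytes
}

# Number of bits that differ between two fingerprints (Hamming distance).
sub diff_bits {
    my ($fp1, $fp2) = @_;
    return unpack('%32b*', $fp1 ^ $fp2);
}

# Two synthetic "pictures": dark left half, light right half, and the
# same picture with one bright pixel of "graffiti" added.
my @img1 = map { [ (0) x 8, (255) x 8 ] } 1 .. 16;
my @img2 = map { [@$_] } @img1;
$img2[0][0] = 255;
my $d = diff_bits(fingerprint(\@img1), fingerprint(\@img2));
printf "%d of 256 bits differ\n", $d;
```

Because each cell is judged against the whole picture's average, a small added mark flips only a few bits, which is consistent with graffiti rarely confusing the comparison.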

NEW (20010211): Download findimagedupes 0.1.3.

Download findimagedupes 0.1.2.

NEW (20010218): Download updated Debian Sid package (0.1.3-1) kindly contributed by Guenter Bechly.

Usage:

findimagedupes [options] [<file1> <file2>]
Options:
       -rescan         = rescan fingerprints of all files in directory
       -f <file>       = use <file> as image fingerprint database
       -d <dir>        = scan <dir> instead of current directory
       -t <num>        = use <num> as threshold% of similarity (default 90)
       -v <program>    = launch <program> (in bg) to view each set of dupes
       -c <file>       = create GQView collection <file>.gqv of duplicates
       <file1> <file2> = diff just those two files, using -v if present
                         (other options ignored if files are specified)
       -p              = only valid when files specified; prints the
                         hex of the actual fingerprint of each file.
       -g              = GUI mode: produce only machine-friendly output.
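To relate the -t percentage to the 256-bit fingerprints, one plausible reading (an interpretation, not something stated on this page) is that similarity means the fraction of fingerprint bits that agree. Under that assumption a threshold converts into a maximum number of differing bits; `max_diff_bits` and `is_dupe` are hypothetical helpers:

```perl
use strict;
use warnings;
use POSIX qw(floor);

use constant FP_BITS => 256;    # 16x16x1 fingerprint

# Hypothetical: largest number of differing bits that still meets a
# similarity threshold given in percent (e.g. 90 for -t 90).
sub max_diff_bits {
    my ($percent) = @_;
    # similarity = (FP_BITS - diff) / FP_BITS >= percent / 100
    # => diff <= FP_BITS * (100 - percent) / 100
    return floor(FP_BITS * (100 - $percent) / 100);
}

# True if two fingerprints differing in $diff bits count as dupes.
sub is_dupe {
    my ($diff, $percent) = @_;
    return $diff <= max_diff_bits($percent);
}

printf "-t %d allows up to %d differing bits\n", $_, max_diff_bits($_)
    for 90, 95;
```

Under this reading, the default of 90 tolerates 25 differing bits and 95 only 12; finddupes' .95 would correspond to 95 here.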

Requirements:

  • perl - as with everything on this page
  • ImageMagick - library for manipulating images
  • PerlMagick (Image::Magick) - Perl interface to above
  • pwd, find, sort, tput (curses), file (so if this works right under NT, I'd be surprised)
  • A bunch of pictures of which you've totally lost control
  • (optional) GQView - to manage collections of duplicate images visually
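External tools like find are likely part of why findimagedupes wouldn't run under NT. As a sketch of how that piece could be done in pure perl, the core File::Find module can walk a directory tree; the extension list and the `find_images` name are assumptions for illustration, and matching by filename is cruder than checking file contents:

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: recursively collect files under $dir whose
# names end in a common image extension (case-insensitive).
sub find_images {
    my ($dir) = @_;
    my @found;
    find(
        {
            # With no_chdir, $_ holds the full path, same as $File::Find::name.
            wanted => sub {
                push @found, $File::Find::name
                    if -f $_ && /\.(?:jpe?g|png|gif|bmp|tiff?)\z/i;
            },
            no_chdir => 1,
        },
        $dir
    );
    return sort @found;
}

# e.g.: print "$_\n" for find_images('.');
```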

CD::Info

[2000/05/19 23:30] This perl module is very primitive right now, but if you're running Linux and put a data CD in the drive, it will let you get the CD title (CD::Info::cdtitle()) or other info, which for now basically means the number of tracks (%info = CD::Info::cdinfo()). I guess I'll submit it to CPAN if I ever make it do more, like navigate multisession CDs (which I myself never make).
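For the track count, one Linux-only approach is the CDROMREADTOCHDR ioctl from <linux/cdrom.h>, which fills a two-byte struct with the first and last track numbers. The sketch below is not CD::Info's code; the /dev/cdrom path is an assumption about a typical Linux system, and `track_count` and `tracks_from_tochdr` are hypothetical names:

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY O_NONBLOCK);

use constant CDROMREADTOCHDR => 0x5305;    # from <linux/cdrom.h>

# struct cdrom_tochdr { __u8 cdth_trk0; __u8 cdth_trk1; }
sub tracks_from_tochdr {
    my ($buf) = @_;
    my ($first, $last) = unpack 'C2', $buf;
    return $last - $first + 1;
}

# Hypothetical: ask the drive for its TOC header and return the number
# of tracks, or undef if the device can't be opened or queried.
sub track_count {
    my ($dev) = @_;
    sysopen(my $fh, $dev, O_RDONLY | O_NONBLOCK) or return;
    my $buf = "\0" x 2;    # room for the two-byte struct
    ioctl($fh, CDROMREADTOCHDR, $buf) or return;
    return tracks_from_tochdr($buf);
}

my $n = track_count('/dev/cdrom');
print defined $n ? "$n track(s)\n" : "no disc or no drive\n";
```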

If anyone objects to my using a new perl module namespace (CD), please suggest an alternative. I really have no idea what existing category this would fit under. It is currently OS-specific, but there's no Linux category, and anyway I hope it won't be OS-specific forever.

kcdfind

[2001/03/01 21:55] Because PerlQt is apparently broken under Qt 2.2 and KDE 2.x, and because I can't debug Perl bindings against C++ libraries myself, kcdfind is pretty much dead at this point. I'm looking at alternatives, such as writing a converter to migrate existing cdcat files to another cataloging program (and patching that program to use the CD label when it exists) or writing a new GTK-based front end to cdfind. Sorry for any inconvenience this may cause the 2 other users of kcdfind ;)

[2000/05/20 18:41] Kudla's CD Finder is a PerlQt CD catalog app that also has a command-line version. It is still in its early stages and should be considered unstable, though on my machine it works great. ;)

[2000/05/22 23:30] You can download version 0.10 which includes both kcdfind and cdfind. Here is a screenshot as of this evening.

Basically it does the same sort of thing every other CD catalog program does (scan CD-ROMs, save info on all the files, let you search for files), but no Linux-based CD cataloger I could find would use the CD title, which my old (windoze-based) cataloger relies on (it really saves a lot of confusion and typing). Oh yeah, at the moment it probably only works under Linux, because CD::Info does ioctl stuff that I assume is not portable. If you can help me out with that, please let me know!
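As for portability: on ISO 9660 data discs the volume label sits at a fixed place, so it can be read without ioctls. The Primary Volume Descriptor is in logical sector 16 (2048-byte sectors), starts with type byte 1 and the signature "CD001", and holds the 32-byte, space-padded volume identifier at offset 40. A sketch that reads it from the raw device (or an .iso image), assuming the device path is readable by the user; `iso_volume_label` is a hypothetical helper, not part of CD::Info:

```perl
use strict;
use warnings;

use constant {
    SECTOR_SIZE => 2048,
    PVD_SECTOR  => 16,    # Primary Volume Descriptor location
};

# Hypothetical helper: return the ISO 9660 volume label of a device or
# image file, or undef if it can't be read or isn't ISO 9660.
sub iso_volume_label {
    my ($path) = @_;
    open my $fh, '<:raw', $path or return;
    seek($fh, SECTOR_SIZE * PVD_SECTOR, 0) or return;
    my $got = read($fh, my $pvd, SECTOR_SIZE);
    close $fh;
    return unless defined $got && $got == SECTOR_SIZE;
    # Byte 0 = descriptor type (1 = primary), bytes 1-5 = "CD001".
    my ($type, $magic) = unpack 'C a5', $pvd;
    return unless $type == 1 && $magic eq 'CD001';
    # Bytes 40-71: volume identifier, padded with spaces.
    (my $label = substr($pvd, 40, 32)) =~ s/\s+\z//;
    return $label;
}

# e.g.: print iso_volume_label('/dev/cdrom') // 'no label', "\n";
```

This only covers plain ISO 9660 discs; it says nothing about audio CDs or multisession layouts.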

Note: Before running either kcdfind or cdfind for the first time, type "touch cdcat" in the directory you're running it from. I'll fix this in the next release.
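Until then, a program could do the equivalent of "touch cdcat" itself before opening the catalog. A minimal sketch; the file name cdcat comes from the note above, while `ensure_catalog` is a hypothetical helper:

```perl
use strict;
use warnings;

# Hypothetical helper: create an empty catalog file if it doesn't exist
# yet, like running "touch cdcat" by hand. Opening in append mode
# (rather than '>') never truncates an existing file.
sub ensure_catalog {
    my ($file) = @_;
    return 1 if -e $file;
    open my $fh, '>>', $file or die "Can't create $file: $!\n";
    close $fh or die "Can't close $file: $!\n";
    return 1;
}

# e.g.: ensure_catalog('cdcat');
```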

Requirements:

  • CD::Info - my CD info module.
  • DBI - Perl database interface
    I use Mandrake 7.0, and this came in RPM form on the installation CD. You should install it from your distribution's CD if you can.
  • DBD::CSV - CSV file driver for DBI.
    This allows the program to save information in a database without needing you to set up a database server. As a result, the program is slower when you have a big database, but you avoid the security hassles of running a database server.
  • SQL::Statement - required by DBD::CSV.
  • Text::CSV - also required by DBD::CSV.
If you only want to use the command-line cdfind, you don't need:
  • PerlQt - Perl interface to Qt widget set.
  • Qt - if you have KDE, you should have this already.

Rob's perl programming page, March 2001, webmaster@kudla.org