I realize I'm responding to my own post, but I wanted to post this separately from my actual question.
As for the problem of having to choose between database entries that are very easily orphaned and tags embedded in the image itself (which changes the contents of the image and breaks "duplicate image finders"), I have always wondered whether there is a reason why an additional column couldn't be added to the ACDSee database to store an image's SHA1 or MD5 hash. Additional intrinsic attributes, for instance file size, could be appended if necessary to further reduce the chance of hash collisions. This is exactly how p2p file sharing networks (like eMule and Limewire) identify files. The magnet URI specification has demonstrated the effectiveness of this approach for tracking files in distributed networks, so why not apply the same technique to managing an image collection?
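To make the idea concrete, here is a rough sketch in Python of what such a fingerprint could look like. The function name `fingerprint` and the "hash-size" string format are my own assumptions for illustration; I have no idea what ACDSee would actually use internally.

```python
import hashlib
import os


def fingerprint(path, chunk_size=1 << 20):
    """Return '<sha1 hex digest>-<file size in bytes>' for the file at path.

    Hypothetical helper: the combined format is an assumption, not
    anything ACDSee defines.
    """
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in chunks so large images are never loaded into memory whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha1.update(chunk)
    return f"{sha1.hexdigest()}-{os.path.getsize(path)}"
```

Since only the file's bytes go into the calculation, renaming the file or moving it to another folder gives the same result, while any change to the contents (including writing embedded tags) gives a different one.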
Given that rating/categorization metadata is intended to describe a specific *image* (irrespective of its specific location), it makes much more sense from a design standpoint to associate said metadata with an image's "digital fingerprint" (i.e. hash value). This is because no matter where an image is moved, it will generate the same hash value as long as its contents are unchanged. Despite the minor computational overhead of calculating the hash, keying organizational metadata to an image's hash would make the ACDSee database significantly more versatile and powerful. Some advantages that come to mind, in no particular order:
1) Any image you have previously tagged/rated will be recognized and can have the appropriate rating/categorization info automatically associated with it (this saves you from tagging the same images over and over, and ensures that your categorization ontology is synchronized across your entire image collection... after all, if two images are identical it stands to reason that they should share the same tags).
2) The categorization/rating information in the database would be far more portable. Databases could easily be shared, merged, or backed up through a simple import/export interface. It's very unlikely that two different people will use an identical file structure; however, identical images will always produce identical hashes. Leveraging this fact, teams of workers can collaborate and exchange categorization ontologies extremely easily (this allows for a much more efficient workflow in a group setting). Presently, the only way to share tagging/rating info with someone is to embed the info into the actual image and then transmit a copy of that image to said person (even if they already have a copy of the original). This becomes completely impractical if large amounts of data need to be exchanged. By using file hashes, all that is required to exchange category/rating info is the data contained within a given database. Because the data maintained in the database is text only, the data for several hundred gigabytes of image files can be condensed to several hundred megabytes... if that.
3) Lastly, by keying database entries to an "intrinsic" property of an image (rather than something as tenuous as the file's location), it is trivial to make the database self-healing. If, for the sake of efficiency, a column that keeps track of a file's specific location is necessary, then upon "optimizing" the database, the database can automatically attempt to rebind orphaned entries, or simply remove the invalid "location" entry (while leaving the remainder of the data intact and ready to be reassociated with the appropriate file).
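For anyone who wants to poke at the three points above, here is a small sketch using Python's built-in sqlite3 module. To be clear, the table names, column names, and helper functions are all made up by me for illustration; this is not ACDSee's schema, and the fingerprints are just strings (for example a hash with the file size appended).

```python
import sqlite3

# Hypothetical schema: ratings/tags are keyed by fingerprint, and file
# locations live in a separate table that is only a disposable cache.
SCHEMA = """
CREATE TABLE IF NOT EXISTS metadata (
    fingerprint TEXT PRIMARY KEY,
    rating      INTEGER,
    tags        TEXT
);
CREATE TABLE IF NOT EXISTS locations (
    path        TEXT PRIMARY KEY,
    fingerprint TEXT NOT NULL
);
"""


def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db


def tag(db, fp, rating, tags):
    """Store (or overwrite) the rating and tags for one fingerprint."""
    db.execute("INSERT OR REPLACE INTO metadata VALUES (?, ?, ?)",
               (fp, rating, tags))


def lookup(db, fp):
    """Point 1: any copy of a known image finds its existing metadata."""
    return db.execute(
        "SELECT rating, tags FROM metadata WHERE fingerprint = ?", (fp,)
    ).fetchone()


def merge(dst, src):
    """Point 2: import another database; entries already in dst win."""
    rows = src.execute(
        "SELECT fingerprint, rating, tags FROM metadata").fetchall()
    dst.executemany("INSERT OR IGNORE INTO metadata VALUES (?, ?, ?)", rows)


def rebind(db, scanned):
    """Point 3: rebuild the location cache from a fresh disk scan.

    scanned maps path -> fingerprint. The metadata table is never touched,
    so moved files are simply re-associated by their fingerprint.
    """
    db.execute("DELETE FROM locations")
    db.executemany("INSERT INTO locations VALUES (?, ?)",
                   list(scanned.items()))
```

The design choice that matters here is that nothing in the metadata table depends on a path, so the locations table can be thrown away and rebuilt at any time without losing a single rating. How conflicts should be resolved during a merge (here the local entry simply wins) is a policy question I have glossed over.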
Anyway, I realize this discussion is of a more technical nature and has more to do with the architecture of ACDSee than with photography itself. That being said, I really think ACDSee is a great application, and I am only posting this in hopes that someone who has some say-so in the actual program development might see this and say "hey, that's not a bad idea" (or alternatively, make a post and say "that's not a bad idea... except for this, that, and the other").
Well, if anyone else has any programming or database know-how and wants to pitch in their two cents on the pros and cons of using a hash-based database structure, I look forward to hearing from you.
Posted On November 20, 2008 - 07:49 AM