CorrelOracle3

CorrelOracle3 is the program that generated the links at the bottom of most ZhurnalWiki pages. It's the most recent, and perhaps (for a little while) the last, ^z attempt to automagically cross-correlate files using co-occurrence measurements of words and word patterns. See CorrelOracle and CorrelOracle2 for lengthier background discussion; see CorrelOracle3SourceCode for the Perl code that did the link-building. For specific details of why two pages are connected, consult the CorrelationLog.

To improve the quality of the inter-file connections, version 0.3 incorporates several changes from earlier editions:

  • stopwords now include days of the week, months of the year, time zones, and the label "Datetag"
  • a "-" is now appended to the end of each word to indicate possible stemming
  • two-word phrases (after stemming and stopword elimination) are now analyzed on the same footing as individual words
  • output is now more informative and (arguably) more æsthetic
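
As a rough illustration of the first three changes, here is a minimal Python sketch (not the actual Perl; see CorrelOracle3SourceCode for that). The stopword list, the suffix rules in crude_stem, and the function names are all assumptions made up for this example; the real program's stemming rules and stopwords may well differ.

```python
import re

# Illustrative stopword list (an assumption, not the list CorrelOracle3 uses):
# a few common words plus days, months, some time zones, and "Datetag".
STOPWORDS = {
    "the", "a", "an", "and", "of", "to", "in", "on", "is", "it",
    "monday", "tuesday", "wednesday", "thursday", "friday",
    "saturday", "sunday",
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
    "est", "edt", "pst", "pdt", "gmt", "utc",
    "datetag",
}


def crude_stem(word):
    """Strip one common suffix (hypothetical rules), then append "-".

    The trailing "-" marks the word as possibly stemmed, whether or not
    a suffix was actually removed.
    """
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three letters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    return word + "-"


def extract_features(text):
    """Return stemmed words followed by adjacent two-word phrases.

    Phrases are built after stopword elimination and stemming, so they
    are counted on the same footing as single words.
    """
    words = re.findall(r"[a-z]+", text.lower())
    stems = [crude_stem(w) for w in words if w not in STOPWORDS]
    phrases = [a + " " + b for a, b in zip(stems, stems[1:])]
    return stems + phrases


print(extract_features("Datetag: on Monday the oracle linked pages"))
```

With these made-up rules the sample line reduces to the stems oracle-, link-, and pag-, plus the phrases "oracle- link-" and "link- pag-". Note that a phrase can span a gap where a stopword was dropped, which follows from building phrases only after stopword elimination.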

The actual code has been cleaned up a tiny bit as well. I've learned how to test for non-existence of array and hash elements, for instance, so running "perl -w" on CorrelOracle no longer produces a sea of usage warnings. On the other hand, I have done nothing to optimize the software for speed or memory efficiency; that is something to consider in the next major rewrite.

Meanwhile, I hope to be looking into clustering algorithms and other potential improvements. Please send me comments, criticism, pointers to anomalies, suggestions, and any other feedback that comes to mind concerning CorrelOracle. Thank you!

TopicProgramming - 2001-09-16


(correlates: Buss and Ride, CorrelOracle2, CorrelationLog, ...)