30 Dec 2006

Today I've re-built the correlation links among ZhurnalWiki pages yet again ... to view the gory statistical details of the strongest inter-page correlations please see ...

24 Apr 2005

A fresh set of autolinks is now in place, modulo minor tweaks, among the 1,719 ZhurnalWiki files that exist as of this date (omitting the RecentChanges files, the TopicIndex files, and a few others) ... ^z

4 Apr 2004

OK, new inter-page word-correlations are now computed and installed (as of yesterday's files) ... ^z

14 Nov 2002

After almost a year, it's time again to automagically cross-reference the ZhurnalWiki. At the bottom of most pages, therefore, you'll see something new that looks like this example (which appears at the bottom of the page DearDiary):

(correlates: WhyThis, ReadingsOnThinkingAndLiving, TidyTime, ...)

What that means is that, as far as my "Oracle" program is concerned, the listed "correlates" have something in common with the current page ... and therefore might be worth looking at if you're interested in pursuing a subject further. In this specific example, the details read:


  • WhyThis: 0.60 = 0.59 opinion- thought-, ...
  • ReadingsOnThinkingAndLiving: 0.44 = 0.07 self- management-, 0.07 personalenerg-, 0.07 humannatur-, 0.07 book- self-, 0.07 bennet- book-, ...
  • TidyTime: 0.37 = 0.29 pepy- diar-, 0.07 pepy-, 0.01 diar-, ...

So WhyThis is linked to DearDiary with strength 0.60, mostly (0.59) because of the (stemmed) phrase elements "opinion- thought-". More on all that another time; ask me if you're curious.
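To make the arithmetic concrete, here is a minimal Python sketch using the TidyTime numbers from the example above. The dictionary layout and the function names are assumptions for illustration only, not how the Oracle actually stores or reports its results; and since the listing ends in "...", there may be smaller terms that aren't shown.

```python
# Per-phrase contributions listed for TidyTime in the DearDiary example.
# The original listing is truncated ("..."), so smaller terms may be missing.
tidytime_terms = {
    "pepy- diar-": 0.29,
    "pepy-": 0.07,
    "diar-": 0.01,
}


def link_strength(terms):
    """Add up the per-phrase contributions, rounded to two decimal places."""
    return round(sum(terms.values()), 2)


def strongest_term(terms):
    """Return the (phrase, weight) pair that contributes the most."""
    return max(terms.items(), key=lambda item: item[1])


if __name__ == "__main__":
    print(link_strength(tidytime_terms))
    print(strongest_term(tidytime_terms))
```

Here the three listed terms add up to the reported strength of 0.37, with the two-word phrase "pepy- diar-" doing most of the work.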

The process for doing auto-linkage among ^zhurnal entries is rather a hodge-podge of tinkering. In brief, I have to:

  • download all the ZhurnalWiki pages
  • remove pages that aren't appropriate for auto-linking (RecentChanges, FindPage, TopicIndex, etc.)
  • strip off the headers and footers and older machine-generated hyperlinks (using SnipPattern)
  • run the CorrelOracle against the remaining pages (which takes about an hour of processor time)
  • proofread, test, and validate
  • upload the results to the 'Net
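
The filtering and stripping steps in the list above could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not the actual SnipPattern tool: the pages are assumed to sit in an in-memory dictionary of name to text, the excluded names are matched exactly, and the footer pattern is a guess based on the "(correlates: ...)" example line shown earlier. The names `EXCLUDED`, `select_pages`, `strip_old_links`, and `prepare` are all hypothetical.

```python
import re

# Page names taken from the list above; the "etc." is deliberately not filled in.
EXCLUDED = {"RecentChanges", "FindPage", "TopicIndex"}

# Assumed format of an older machine-generated footer line,
# modeled on the "(correlates: ...)" example.
CORRELATES_LINE = re.compile(r"^\(correlates:.*\)\s*$", re.MULTILINE)


def select_pages(pages):
    """Keep only the pages that are appropriate for auto-linking."""
    return {name: text for name, text in pages.items() if name not in EXCLUDED}


def strip_old_links(text):
    """Drop any older '(correlates: ...)' line and tidy trailing whitespace."""
    return CORRELATES_LINE.sub("", text).rstrip() + "\n"


def prepare(pages):
    """Filter out excluded pages, then strip old autolinks from the rest."""
    return {name: strip_old_links(text) for name, text in select_pages(pages).items()}


if __name__ == "__main__":
    sample = {
        "DearDiary": "Some text\n(correlates: WhyThis, TidyTime, ...)\n",
        "RecentChanges": "a list of recent edits\n",
    }
    print(prepare(sample))
```

Downloading, running the Oracle, proofreading, and uploading are left out, since nothing here says how those steps are actually carried out.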

Unfortunately, the links suggested by the Oracle are sometimes rather bizarre, since the program simply uses a weighted-average statistical analysis of the co-occurrence of words and short phrases, not any sort of deep natural-language "understanding" of page content. (See CorrelOracle, CorrelOracle2, CorrelOracle3, etc. for gory details of how the correlations are calculated.)
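
For a feel of what scoring pages by shared words and short phrases can look like, here is a toy Python sketch. It is not the CorrelOracle algorithm, whose details live on the pages named above. The weighting (log of total pages over pages containing a feature), the use of adjacent word pairs as "short phrases", and the lack of any stemming are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def features(text):
    """Return the set of lowercase words and adjacent word pairs in a page."""
    words = re.findall(r"[a-z]+", text.lower())
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return set(words) | set(pairs)


def correlate(pages):
    """Score each pair of pages by their shared features.

    A shared feature counts for more when few pages contain it; a feature
    that appears on every page contributes log(1) = 0.
    """
    feats = {name: features(text) for name, text in pages.items()}
    doc_freq = Counter(f for fs in feats.values() for f in fs)
    total = len(pages)
    names = sorted(feats)
    scores = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = feats[a] & feats[b]
            scores[(a, b)] = sum(math.log(total / doc_freq[f]) for f in shared)
    return scores


if __name__ == "__main__":
    sample = {
        "A": "Pepys kept a diary",
        "B": "my diary is like Pepys",
        "C": "running in the rain",
    }
    for pair, score in correlate(sample).items():
        print(pair, round(score, 2))
```

Even this toy shows why odd links turn up: two pages that happen to share a rare word score well whether or not they are about the same thing.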

But on the brighter side, many of the links that the Oracle recommends aren't half bad ... they're a sporadically helpful navigation aid between pages which otherwise wouldn't have been linked, because no human being has enough time to link them all by hand. CPU cycles are cheap nowadays; we might as well burn some if the result can help people a wee bit. (And "burn" has a more literal meaning on my Macintosh iBook, since the little laptop gets uncomfortably toasty against my leg after so much heavy computation.)

