KwicsChinksAndChunks

 

Half a century ago John Tukey (of Fast Fourier Transform, "FFT", fame) invented a superb user interface for free-text information retrieval. Tukey took titles from thousands of papers published by the American Statistical Association and built an amazingly powerful yet easy-to-use index to them. His design incorporated three key insights:

  • KWIC — "Key Word In Context," a form of permuted-term display that puts each chosen word at the center of a line with context on each side
  • Chinks — little words, delimiters that break apart the meaningful units of thought (like "a", "of", "is", etc.)
  • Chunks — phrases that cohere, terms of art (like "standard deviation", "chi squared", "normal distribution", etc.)

Tukey used the computers of his time to do index generation, permutation, and sorting — with decks of punch cards and reams of all-caps line printer output. He split titles at chinks, bound chunks together as units, and generated KWICs from the results. The product was a three-volume reference set, useful in its time, now almost impossible to locate — essentially forgotten.
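The split/bind/permute steps can be sketched in a few lines of modern code. What follows is only an illustration under stated assumptions, not a reconstruction of Tukey's actual procedure: the CHINKS and CHUNKS lists are tiny made-up samples, the function names (tokenize, bind_chunks, kwic) are hypothetical, and chinks are simply kept in the context but never used as index keys.

```python
import re

# Illustrative sample lists only (assumptions for this sketch, not Tukey's lists).
CHINKS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}
CHUNKS = [("standard", "deviation"), ("chi", "squared"), ("normal", "distribution")]


def tokenize(title):
    """Lower-case a title and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", title.lower())


def bind_chunks(words):
    """Merge any known multi-word term (chunk) into a single unit."""
    units = []
    i = 0
    while i < len(words):
        for chunk in CHUNKS:
            n = len(chunk)
            if tuple(words[i:i + n]) == chunk:
                units.append(" ".join(chunk))
                i += n
                break
        else:
            # No chunk starts here, so the word stands alone.
            units.append(words[i])
            i += 1
    return units


def kwic(titles, width=30):
    """Return sorted KWIC lines: left context, KEY in capitals, right context."""
    entries = []
    for title in titles:
        units = bind_chunks(tokenize(title))
        for i, unit in enumerate(units):
            if unit in CHINKS:
                continue  # chinks never serve as index keys
            left = " ".join(units[:i])
            right = " ".join(units[i + 1:])
            entries.append((unit, left, right))
    entries.sort()  # alphabetical by key, then by context
    return [
        f"{left[-width:]:>{width}}  {key.upper()}  {right[:width]}"
        for key, left, right in entries
    ]


if __name__ == "__main__":
    sample = [
        "The Standard Deviation of a Normal Distribution",
        "Chi Squared Tests for Independence",
    ]
    for line in kwic(sample):
        print(line)
```

Each key lands in the same column, so the eye can run down the sorted keys and then glance left or right for context. Because "standard deviation" is bound as one unit, it is indexed once as a phrase rather than separately under "standard" and "deviation".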

I know of John Tukey's work only because of a chance meeting with him in the early 1990s. I showed him my little indexer/browser software (see http://www.his.com/~z/ftirp.html = "Notes on Free Text Information Retrieval" and http://www.his.com/~z/c/ = "Free Text Archive"). Tukey laughed with joy. He had dreamed of a virtually identical information interface but could only implement a static version of it on paper; now the personal computer had brought it to life.

Monday, January 31, 2000 at 10:44:52 (EST) = 2000-01-31

TopicPersonalHistory - TopicScience - TopicProfiles

