Digital systems are not more delicate than analog ones for archival information storage and retrieval, in spite of common statements to the contrary. Sure, one can record digital data in an easily-lost format, just as one can write analog-fashion in the sand or draw in an ephemeral medium. But with a little care, digital storage can be extraordinarily resistant to loss. Aside from all the ordinary precautions that one takes with any object to safeguard it — control of light, humidity, and temperature, use of stable substrates, physical isolation, etc. — digital technology opens the door to a new means of preservation: mathematical error correction.

Error correction is like holography but with an extra twist. Holograms encode an image and spread it out over a surface. The non-locality of the data means that if part of the hologram is destroyed, the whole picture is still there, only somewhat fuzzier. (Footnote for quibblers: we're talking about classic Fourier-transform-style holography, not white-light or other variants which put image information into narrow strips or patches.) Digital error correction similarly takes data and spreads it out, but with a little extra redundancy. That way if some bits are lost, they can be reconstructed. A trivial form of error correction is to repeat the data three times ("What I tell you three times is true!") and then, when reading it out, take a "majority rules" approach wherever the copies differ. That's simple but wasteful: it triples the storage, and it can fix only a single error in each triplet, since two bad copies outvote the good one. More sophisticated math permits the detection and correction of many mistakes with only a slight overhead in extra storage.
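The "tell you three times" scheme is small enough to sketch in a few lines of Python. This is only an illustration of the idea above, not a practical code; the function names `encode` and `decode` and the sample message are made up for the example.

```python
def encode(bits, copies=3):
    """Repeat every bit `copies` times in a row."""
    return [b for b in bits for _ in range(copies)]


def decode(received, copies=3):
    """Recover each bit by majority vote over its group of copies."""
    out = []
    for i in range(0, len(received), copies):
        group = received[i:i + copies]
        out.append(1 if sum(group) * 2 > len(group) else 0)
    return out


message = [1, 0, 1, 1]          # hypothetical sample data
sent = encode(message)

# One flipped bit in each of the first two triplets: still recoverable.
damaged = sent.copy()
damaged[1] ^= 1
damaged[5] ^= 1
recovered = decode(damaged)

# Two flipped bits in the same triplet: the bad copies win the vote.
too_damaged = sent.copy()
too_damaged[0] ^= 1
too_damaged[1] ^= 1
misread = decode(too_damaged)

print(recovered == message)     # True
print(misread == message)       # False
```

The second case shows the limit mentioned above: the price is three times the storage, and the protection still fails as soon as two copies of the same bit go bad.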

In fact, though they often don't realize it, when people use words in any form they're dealing with a digital bitstream, whether the text is printed on paper, handwritten as calligraphy, or spoken into a recorder. Words are made of letters, and letters are chosen from a finite set of symbols. The same holds for spoken language, which is built of phonemes. Alphabets typically encode four to eight bits per symbol; a script like Chinese, with tens of thousands of characters, may carry up to about 16 bits per ideogram. And there's redundancy galore: in English "Q" is almost always followed by "U", the letter "E" is commonest, followed by "T", and so forth. A slightly garbled or faded message can be recovered quite reliably, and vn txts wrttn wtht ny vwls cn b rd strghtfrwrdly n mny crcmstncs. That's natural error correction for a digital medium. It's used by cryptographers to break codes, and by linguists to reconstruct lost languages from fragmentary inscriptions. In a manner of speaking, redundancy and error correction supply a built-in Rosetta Stone for digital media: a safety net that analog systems lack.
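For the curious, the bits-per-symbol figures come from a base-2 logarithm: a set of N equally likely symbols needs log2(N) bits to pick one out. The rough Python sketch below works that out and also performs the vowel-stripping trick. The helper names are invented for the example, and the 50,000-character inventory is an assumed round number for a very large character set, not a figure from the text.

```python
import math


def bits_per_symbol(alphabet_size):
    """Bits needed to single out one symbol from a set of this size."""
    return math.log2(alphabet_size)


def strip_vowels(text):
    """Drop a, e, i, o, u (either case); 'y' is kept as a consonant."""
    return "".join(ch for ch in text if ch.lower() not in "aeiou")


print(round(bits_per_symbol(26), 2))      # 4.7  (plain English letters)
print(round(bits_per_symbol(50_000), 1))  # 15.6 (assumed large inventory)
print(strip_vowels("even texts written without any vowels"))
# vn txts wrttn wtht ny vwls
```

These are raw capacities. Because real text is so redundant, the information actually carried per letter is well below the log2 figure, which is exactly why the vowel-free version can still be read.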

Tuesday, January 04, 2000 at 05:49:05 (EST) = 2000-01-04
