RegExplanations

Happy, happy, happy! I've finally grokked (at least momentarily) the connection between Finite State Automata and Regular Expressions ... and at last I begin to understand why some regexps with nested quantifiers are so incredibly slow, a puzzle that has pestered me for several years now.

But first, let's back up to get a modicum of context: what's all this about?

Now you know enough to go on. So we've got these little mathematical machines, FSAs ... and we've got this little logical language, regexps ... hmmm! Maybe they can work together? Maybe one could be translated into the other? Maybe they can jointly be used to solve complex and important problems with great speed? Yes to all of the above!
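To make the "translated into the other" idea concrete, here is a minimal sketch of my own (not taken from any of the essays mentioned here): a hand-built FSA for the tiny regexp ab*c, written as a Python transition table and checked against Python's re module. The names TRANSITIONS, fsa_accepts, and agrees_with_regexp are made up for illustration, and real regexp engines do not necessarily build their machines this way.

```python
import re

# Hypothetical hand translation of the regexp ab*c into an FSA.
# State 0: nothing read yet; state 1: read 'a' then any number of 'b'; state 2: read the final 'c'.
TRANSITIONS = {
    (0, "a"): 1,
    (1, "b"): 1,
    (1, "c"): 2,
}
START = 0
ACCEPT = {2}


def fsa_accepts(text):
    """Run the machine over text; accept only if it ends in an accepting state."""
    state = START
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no arrow for this character: the machine is stuck
            return False
    return state in ACCEPT


def agrees_with_regexp(text):
    """Check that the FSA and the regexp give the same verdict on the whole string."""
    return fsa_accepts(text) == (re.fullmatch(r"ab*c", text) is not None)
```

Each arrow in the table consumes exactly one character, so the machine never has to reconsider a choice; it reads the input once, left to right, and then simply reports whether it stopped in an accepting state.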

But heretofore I had only been told that FSAs and regexps were intimately linked; I had never seen an explicit explanation of how they cohabit. Textbook assertions like "It can be demonstrated ..." failed to scratch my itch.

Then, by sheer good fortune, Darren Neimke read a ZhurnalWiki note on a completely unrelated theme (SemioticArsenal, 20 Nov 2003) and posted a comment. That led me to Darren's http://regexblogs.com/ site where I found a thicket of good information on the technology of regular expressions ("regexes" to Darren, "regexen" to the affected --- but traditarian that I am, still "regexps" to me). My thirst for understanding the FSA-regexp link was aroused from its long slumber.

So I asked Darren, and he kindly pointed me to the lovely essay by Mark-Jason Dominus, "How Regexes Work" (http://perl.plover.com/Regex/article.html). I read MJD's words and the scales fell from my eyes --- much like the Eureka! moment of revelation that I experienced when I figured out how to solve a Rubik's Cube (see RubikCubism1, 16 Mar 2001).

And the other question that had nagged me? Some regexps are woefully inefficient --- they seem to take forever to run. Why? A Perl programming book had cautioned me not to use nested wild-card quantifiers, but it didn't explain the source of the danger. Dominus did, beautifully. It's all a matter of how Perl implements regexps as FSAs, and in particular the backtracking depth-first search strategy that Perl uses to match patterns: when quantifiers are nested, there can be exponentially many ways to divide the same stretch of text among them, and a backtracking matcher may try every one of them before it gives up. (See BreadthAndDepth, 11 Jun 1999.)
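A toy illustration of that blow-up, as I understand it: the sketch below is my own simplified stand-in for a backtracking matcher, hard-wired to the nested-quantifier pattern (a+)+b, and it is not Perl's actual engine. The function name count_attempts is hypothetical. It counts how many ways of carving up a run of a's the matcher tries before it can answer.

```python
def count_attempts(text):
    """Naive depth-first matcher for (a+)+b, anchored at the start of text.

    Returns (matched, attempts), where attempts counts every choice of
    where one 'a+' group ends. This is a simplified model, not a real engine.
    """
    attempts = 0

    def match_groups(pos):
        nonlocal attempts
        # Find how far the run of 'a' characters extends from pos.
        end = pos
        while end < len(text) and text[end] == "a":
            end += 1
        # Greedy: try the longest group first, then back off one character at a time.
        for stop in range(end, pos, -1):
            attempts += 1
            # After this group, first try to start yet another group ...
            if match_groups(stop):
                return True
            # ... and otherwise try to finish the pattern with 'b'.
            if stop < len(text) and text[stop] == "b":
                return True
        return False

    matched = match_groups(0)
    return matched, attempts
```

When the text does end in b, as in count_attempts("aaab"), the first greedy guess works and only one attempt is needed. Take the b away and the matcher has to exhaust every way of splitting the a's into groups before admitting defeat: in this model a run of n a's costs 2**n - 1 attempts, so each extra character roughly doubles the work.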

In a letter to me Darren said that "... finite state machines sit about 4 layers of abstraction below my level of competency :-) ...". Me too! But it's important to occasionally go spelunking and get a glimpse of those deeper levels, just as it's important (and fun, for a physicist's personality anyway) to have at least some vague notion of how everything works --- and thereby get a better feel for the limitations of tools that are built on those foundations. (see KnowHowAndFearNot, 19 Nov 1999)

And in another sense, maybe it's not down but up that the progression is going --- to greater levels of abstraction, richer and more powerful conceptual spaces. (see CreativeDevices, 1 Jan 2001, and No Concepts At All, 22 Feb 2001, and Darren Neimke's "Death by Abstraction" within http://regexblogs.com/, 26 Nov 2003)


TopicProgramming - TopicScience - TopicLanguage - TopicPersonalHistory - Datetag20031203



