charlie's blog

Saturday, November 1, 2008

a hofstadter moment

(or, why platypuses don't make good biologists)

Douglas Hofstadter loves the idea of self-reference. A cognitive scientist, author, composer, and teacher, he creates works that often refer back to themselves or fold in on themselves in strange and delightful ways. His recent book I Am a Strange Loop is all about self-reference, and I can't recommend it highly enough. A few years ago I had what I would call a Hofstadter Moment: an unexpected case of self-reference. It came as I was developing Foogle, a tool to index and search computer code.

As a software developer, I work every day with a code base made up of many thousands of files, which together contain millions of lines of code. I often need to search this code to find where and how different parts of the system are used. Having grown accustomed to Google and its ability to search billions of Web pages instantaneously, I found that waiting five or ten minutes for Windows Search to find what I was looking for just didn't cut it. So I decided to write a tool that could index our code every night and let me perform fast, accurate searches of that index any time I needed. In homage to Google (my favorite search engine) and "foo" (the official nonsense-word of programmers everywhere), I decided to call the tool Foogle.

Foogle's job would be to scan through each code file, break it up into individual parts ('tokens', in programmer-speak), and add an entry for each part to the index, which could later be searched. This is very much like indexing a book, where you separate the text on each page into individual words and then add the words of interest to the index. The big difference is that computer code isn't nearly as easy to decipher as the written word: it tends to look like an explosion of letters and punctuation. This meant that the code for tokenizing a file was relatively tricky to write.
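Foogle's own code isn't shown here, so what follows is only a minimal sketch of the general idea in Python: split each line into tokens and record, for every token, the files and line numbers where it appears (an "inverted index", much like the index at the back of a book). The regular expression, the function names `tokenize`, `build_index` and `search`, and the (file, line) layout are all illustrative assumptions, not Foogle's actual design.

```python
import re
from collections import defaultdict

# Illustrative token pattern (an assumption, not Foogle's): identifiers,
# runs of digits, or runs of punctuation kept together (so "+=" is one token).
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\w\s]+")

def tokenize(line):
    """Split one line of source code into a list of tokens."""
    return TOKEN_RE.findall(line)

def build_index(files):
    """Build an inverted index from a dict of filename -> source text.

    Each token maps to the set of (filename, line_number) pairs where it occurs.
    """
    index = defaultdict(set)
    for name, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for token in tokenize(line):
                index[token].add((name, lineno))
    return index

def search(index, token):
    """Return every (filename, line_number) where token appears, sorted."""
    # .get() avoids inserting empty entries into the defaultdict.
    return sorted(index.get(token, set()))
```

For example, indexing `{"util.c": "int count = 0;\ncount += step;\n", "main.c": "return count;"}` and searching for `"count"` gives `[("main.c", 1), ("util.c", 1), ("util.c", 2)]`. This sketch deliberately ignores string literals and comments, which a real code tokenizer would have to handle and which is where much of the trickiness lies.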

Once I had a working prototype, I turned it loose to start indexing code. First I tried it on some simple, made-up code files, and things worked pretty well. Then I let it index our entire "utilities" folder, which happened to contain the code for Foogle itself. Much to my surprise, when it hit the Foogle code, it broke: it couldn't index itself. Not only that, but as I investigated further, I was amazed to find that the line of code that couldn't be indexed was the very line of code that had the bug! This line of code, when running as part of a program, was unable to cope with its own written representation. It's a bit like the double-take you might experience when reading "I like the color red" with the word "red" printed in a different color (psychologists call this the Stroop effect).

As it turned out, fixing the problem was relatively easy, and before long I had a working program. Still, this was one of the most interesting bugs I've ever found, purely because of its strange, self-referential nature. It seemed rather like a platypus trying to be a biologist. With their furry bodies and duck-like bills, their laying of eggs and nursing of young, platypuses[1] just don't properly conform to any of the standard animal categories. Imagine a platypus as a biologist, on the day when it walks past a mirror and realizes that it can't even categorize itself!

1. You might say I'm wrong and that 'platypi' is the correct form. However, I've consulted Wikipedia, and I stand by my 'platypuses' (and my platypuses).
