charlie's blog

Monday, February 9, 2009

ten not equal to 10 (for some values of 10)

The title of this post was inspired by a classic geek joke, which has been lovingly immortalized on a t-shirt by ThinkGeek:
There are only 10 types of people in the world: Those who understand binary and those who don't.
The joke is that most people will read this as "there are ten types of people in the world", when in fact the correct reading is "there are two types of people in the world". That's because "10" is being used to represent the number two, as written in binary.

Taken at its most literal, this is a (hopefully friendly) poke at non-geek types, who likely have never been exposed to binary numbers. On a deeper level, though, I think it's a great introduction to a very interesting distinction: the difference between a number and its representation. As a programmer, I deal with this distinction regularly, and a recent conversation made me decide to write about it.


At this point, it would be perfectly legitimate to ask what it even means to distinguish between a number and its representation. Is there really a difference?

In a word, yes. There's as much difference between a written "6" and the abstract number six as there is between the word "cow" and a smelly, cud-chewing quadruped. One is a name, a label, and the other is the thing that is named or labeled. In the case of numbers, the thing being labeled is a concept rather than a mammal, so a more apt analogy might be the difference between the word "angry" and the feeling of anger. If you aren't convinced that there's a difference between the word "angry" and the feeling itself, consider that the word would mean nothing to a non-English speaker, yet such a person almost certainly knows what it feels like to be angry. Thus, the name is clearly not the same thing as what's being named.

Getting back to numbers, I can be clearer now: "6" and "six" are both labels for a concept: the quantity six. Of course, I used "six" in describing what "six" labels, so that's cheating, but here's an example of the quantity I'm referring to: @@@@@@ (six @ signs). The point is that quantity is a distinct concept, and two people from different countries or even different planets can still agree on whether they are looking at six widgets or seven, even if they use different words to count them. Likewise, the quantity two is the same whether you call it "two", "dos", or "dva". Since "quantity" probably has less association with written forms than "number", I'll use the former to refer to the pure concept of a number.

Representing quantities

Now that we're clear on the distinction between a quantity and its representation, I can get to the next interesting bit: the representations themselves. I'll bet that you know of at least three distinct ways to represent the number six. If you're a programmer, you probably know more.

First, the most basic one, the one you've probably known since you were only a few years old. You may or may not have given it up since then, but I'm sure you still know how to do it: counting on your fingers. Go ahead and do it - hold up six fingers - and you'll see a very simple representation of the quantity six. This is the most basic and possibly the least often used (among educated adults), but in my opinion it's also one of the best. It doesn't get much clearer than holding up six fingers and saying "this many". I'm guessing that this is one of the first ways that kids learn to use numbers.

The next most obvious forms are the ones I've used throughout this discussion: the common spoken form "six", and the common numeral form "6". These are good once you start working with bigger quantities, since humans run short on fingers pretty quickly. These are also where the question of representation starts to gain more depth.

The spoken and numeral forms typically build up numbers by parts. Generally the biggest parts are named first, and then the smaller parts. So you might say "five thousand one hundred and two", indicating five groups of a thousand, one group of a hundred, and two more. Adding up the parts gives you the actual quantity of interest. "Thousand" and "hundred" are just handy names for certain quantities, ones that are used often enough to warrant their own names. The numeral forms are similar: "132" is understood by convention to indicate one group of a hundred plus three groups of ten plus two more.
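For the programmers in the audience, here's a minimal sketch of that "add up the parts" reading in Python. The function name expand_base_ten is just something I made up for illustration, and it assumes the numeral is a plain string of base-ten digits.

```python
def expand_base_ten(numeral):
    """Return (digit, group_size) pairs for a base-ten numeral, biggest group first."""
    width = len(numeral)
    # The leftmost digit counts the biggest group; each step right divides the group by ten.
    return [(int(ch), 10 ** (width - 1 - i)) for i, ch in enumerate(numeral)]


parts = expand_base_ten("132")
print(parts)                         # [(1, 100), (3, 10), (2, 1)]
print(sum(d * g for d, g in parts))  # 132
```

Adding up digit times group size gets you back to the quantity, which is exactly what the convention promises.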

Numeric bases

An interesting thing to note with "132" is that nothing really requires the leftmost digit to count groups of a hundred. It could be groups of seven, or groups of eighty-five; it's really just a matter of convention. Of course, using the digits to count powers of ten is by far the most common convention, so that's how most people will read it. Since ten is in some sense the "base" of this series, we call this system "base ten". Most people have base ten so deeply ingrained that they never even consider other possible ways to name quantities. Its popularity probably comes from the fact that most people have ten fingers for counting on.
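To make the "matter of convention" point concrete, here's a small Python sketch that reads the same three digits under a few different bases. It only covers conventions where each digit counts a power of one fixed base (which is what Python's built-in int() supports, for bases 2 through 36), and the bases I picked are arbitrary.

```python
numeral = "132"

# Same written digits, different convention for what each position counts.
readings = {base: int(numeral, base) for base in (7, 10, 20)}

for base, quantity in readings.items():
    print(f"'{numeral}' read in base {base} is the quantity {quantity}")
# '132' read in base 7 is the quantity 72
# '132' read in base 10 is the quantity 132
# '132' read in base 20 is the quantity 462
```

The marks on the page never change; only the agreement about what they mean does.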

Even so, many folks have heard of at least one or two other bases. For instance, you may not realize it, but you likely have also had some exposure to base twenty. The Gettysburg Address opens with "Four score and seven years ago", and while it may sound antiquated, it's still reasonably clear: it means four groups of twenty, and seven more. You could write this as "47", where the "7" represents ones and the "4" represents twenties.

You could use base twenty to represent larger quantities too. Since we're using powers of twenty, the third digit from the right counts groups of four hundred, so eight hundred and sixty-five would be "235": two groups of four hundred, three groups of twenty, and five more. Actually, you probably wouldn't speak it the way I did, since "eight hundred" obviously still employs base ten. Maybe you'd have a name for four hundred, like "tav" in Hebrew, and you'd say something like "two tav three twent and five". It probably feels weird to break things down into powers of twenty, but it's really not much different from base ten, and in fact base twenty has been used by a number of cultures throughout history.
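If you'd rather let a computer do the breaking down, repeated division does the trick. This is a sketch in Python, with to_digits being a name I invented for the example; it assumes a non-negative whole number and returns the digits as a list of integers, since base twenty needs digit values bigger than nine.

```python
def to_digits(quantity, base):
    """Return the digits of a non-negative integer in the given base, biggest group first."""
    if quantity == 0:
        return [0]
    digits = []
    while quantity > 0:
        # The remainder is the digit for the current (smallest remaining) group.
        quantity, remainder = divmod(quantity, base)
        digits.append(remainder)
    return digits[::-1]


print(to_digits(865, 20))  # [2, 3, 5]
print(to_digits(87, 20))   # [4, 7]
```

The second line is the Gettysburg Address example: eighty-seven is four score and seven.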

Another common base, used heavily by programmers and other computer people, is base sixteen, or hexadecimal. Here the digits represent powers of sixteen, so "23" would correspond to the quantity thirty-five (two sixteens plus three). An interesting problem here (and with base twenty) is how to represent a quantity like twelve, since it has to fit in one digit. The most common solution for hexadecimal is to use the letters "A" through "F" to represent ten through fifteen, so twelve would be "C", and thirty would be "1E" (one sixteen plus fourteen).
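Most programming languages have hexadecimal built in, so these examples are easy to check. Here's a quick sketch using Python's int() and format(); the variable names are mine.

```python
# Reading a hexadecimal numeral as a quantity.
thirty_five = int("23", 16)
print(thirty_five)          # 35

# Writing quantities as hexadecimal numerals.
twelve_in_hex = format(12, "X")
thirty_in_hex = format(30, "X")
print(twelve_in_hex)        # C
print(thirty_in_hex)        # 1E

# Python also accepts hexadecimal literals directly, marked with a 0x prefix.
print(0x1E == 30)           # True
```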

As I mentioned earlier, we programmers also use base two, or binary. This means we're working with powers of two, so the rightmost digit counts ones, and moving to the left you count twos, fours, eights, sixteens, etc. So "1011" would be one eight, one two, and one one, for a total of eleven. And with that, we're finally back to the joke from the beginning: while the most common reading of "10" is ten, you can also read it as two, if you view it as a quantity represented in binary.
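And here's the joke itself as a Python sketch: one numeral, two readings. Again, the variable names are just for illustration.

```python
numeral = "10"

as_binary = int(numeral, 2)    # read the digits as powers of two
as_decimal = int(numeral, 10)  # read the digits as powers of ten
print(as_binary, as_decimal)   # 2 10

# The other example from above: one eight, one two, and one one.
eleven = int("1011", 2)
print(eleven)                  # 11

# Going the other way, the quantity two written in binary.
print(format(2, "b"))          # 10
```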

Speaking of numbers...

Another fun thing about this joke is that it doesn't work if you try to say it out loud. That's because the spoken representation of a quantity is usually unambiguous, at least in English. If you read "10" as "ten", you've blown the joke, and if you read it as "one zero", it just sounds weird.

Thinking about how people verbalize numbers makes me think of something else too, something I've always found odd about spoken Spanish. I'm thinking of how telephone numbers are read: I often hear the first three digits read as "five hundred twenty three" (in Spanish, of course), rather than the English "five two three". The oddness here comes from the fact that telephone numbers are not actually descriptions of any quantity, but rather arbitrary sequences of digits used to uniquely identify a telephone. So it seems odd to use quantity words like "five hundred" when you're not actually talking about a quantity.

On the other hand, there are some things that make more sense in Spanish than they do in English, like the naming of years. In standard English, I was born in "nineteen seventy-nine", but years actually are quantities, specifically a number of years since some starting point. So it would make more sense to call it "one thousand nine hundred seventy-nine", which is exactly how it's spoken in Spanish (but again, in actual Spanish). English has switched back to using thousands for this decade, since saying "twenty nine" would be heard as 29 rather than 2009, but I'm sure many people will go back to "twenty ten" next year. Presumably we Americans prefer this form because it saves some syllables.


So there you have it: a nerdy joke made even more nerdy by a bunch of blathering about quantities and bases. Nothing kills a joke like having to explain it, but I still think the whole thing is pretty interesting.
