Bit by Bit by Bit

In this sample chapter from Code: The Hidden Language of Computer Hardware and Software, 2nd Edition, Charles Petzold explains why the bit, coined to mean binary digit, has come to be regarded in the computer age as the basic building block of information.

A story dating from at least the 1950s tells of a man traveling home after a stint in a distant prison. He doesn’t know if he’ll be welcomed back, so he requests a sign in the form of some cloth tied around a branch of a tree. In one version of the story, the man is traveling by train to his family, and he hopes to see a white ribbon on an apple tree. In another, he’s traveling by bus to his wife, and he’s looking for a yellow handkerchief on an oak tree. In both versions of the story, the man arrives to see the tree covered with hundreds of these banners, leaving no doubt of his welcome.

The story was popularized in 1973 with the hit song “Tie a Yellow Ribbon Round the Ole Oak Tree,” and since then, displaying a yellow ribbon has also become a custom when family members or loved ones are away at war.

The man who requested that yellow ribbon wasn’t asking for elaborate explanations or extended discussion. He didn’t want any ifs, ands, or buts. Despite the complex feelings and emotional histories that would have been at play, all the man really wanted was a simple yes or no. He wanted a yellow ribbon to mean “Yes, even though you messed up big time and you’ve been in prison for three years, I still want you back with me under my roof.” And he wanted the absence of a yellow ribbon to mean “Don’t even think about stopping here.”

These are two clear-cut, mutually exclusive alternatives. Equally effective as the yellow ribbon (but perhaps more awkward to put into song lyrics) would be a traffic sign in the front yard: perhaps “Merge” or “Wrong Way.”

Or a sign hung on the door: “Open” or “Closed.”

Or a flashlight in the window, turned on or off.

You can choose from lots of ways to say yes or no if that’s all you need to say. You don’t need a sentence to say yes or no; you don’t need a word, and you don’t even need a letter. All you need is a bit, and by that I mean all you need is a 0 or a 1.

As you discovered in the two previous chapters, there’s nothing all that special about the decimal number system that we normally use for counting. It’s pretty clear that we base our number system on ten because that’s the number of fingers we have. We could just as reasonably base our number system on eight (if we were cartoon characters) or four (if we were lobsters) or even two (if we were dolphins).

There’s nothing special about the decimal number system, but there is something special about binary, because binary is the simplest number system possible. There are only two binary digits—0 and 1. If we want something simpler than binary, we’ll have to get rid of the 1, and then we’ll be left with just a 0, and we can’t do much of anything with just that.

The word bit, coined to mean binary digit, is surely one of the loveliest words invented in connection with computers. Of course, the word has the normal meaning, “a small portion, degree, or amount,” and that normal meaning is perfect because one binary digit is a very small quantity indeed.

Sometimes when a word is invented, it also assumes a new meaning. That’s certainly true in this case. Beyond the binary digits used by dolphins for counting, the bit has come to be regarded in the computer age as the basic building block of information.

Now that’s a bold statement, and of course, bits aren’t the only things that convey information. Letters and words and Morse code and Braille and decimal digits convey information as well. The thing about the bit is that it conveys very little information. A bit of information is the tiniest amount of information possible, even if that information is as important as the yellow ribbon. Anything less than a bit is no information at all. But because a bit represents the smallest amount of information possible, more complex information can be conveyed with multiple bits.

“Listen, my children, and you shall hear / Of the midnight ride of Paul Revere,” wrote Henry Wadsworth Longfellow, and while he might not have been historically accurate when describing how Paul Revere alerted the American colonies that the British had invaded, he did provide a thought-provoking example of the use of bits to communicate important information:

He said to his friend “If the British march

By land or sea from the town to-night,

Hang a lantern aloft in the belfry arch

Of the North Church tower as a signal light—

One, if by land, and two, if by sea…”

To summarize, Paul Revere’s friend has two lanterns. If the British are invading by land, he will put just one lantern in the church tower. If the British are coming by sea, he will put both lanterns in the church tower.

However, Longfellow isn’t explicitly mentioning all the possibilities. He left unspoken a third possibility, which is that the British aren’t invading just yet. Longfellow implies that this circumstance will be conveyed by the absence of lanterns in the church tower.

Let’s assume that the two lanterns are actually permanent fixtures in the church tower. Normally they aren’t lit:

chap11fig01.jpg

This means that the British aren’t yet invading. If one of the lanterns is lit,

chap11fig02.jpg

or

chap11fig03.jpg

the British are coming by land. If both lanterns are lit,

chap11fig04.jpg

the British are coming by sea.

Each lantern is a bit and can be represented by a 0 or 1. The story of the yellow ribbon demonstrates that only one bit is necessary to convey one of two possibilities. If Paul Revere needed only to be alerted that the British were invading and not where they were coming from, one lantern would have sufficed. The lantern would have been lit for an invasion and unlit for another evening of peace.

Conveying one of three possibilities requires another lantern. Once that second lantern is present, however, the two bits allow communicating one of four possibilities:

00 = The British aren’t invading tonight.

01 = They’re coming by land.

10 = They’re coming by land.

11 = They’re coming by sea.

What Paul Revere did by sticking to just three possibilities was actually quite sophisticated. In the lingo of communication theory, he used redundancy to offset the effect of noise. The word noise is used in communication theory to refer to anything that interferes with communication. A bad mobile connection is an obvious example of noise that interferes with a phone communication. Communication over the phone is usually successful even in the presence of noise because spoken language is heavily redundant. We don’t need to hear every syllable of every word in order to understand what’s being said.

In the case of the lanterns in the church tower, noise can refer to the darkness of the night and the distance of Paul Revere from the tower, both of which might prevent him from distinguishing one lantern from the other. Here’s the crucial passage in Longfellow’s poem:

And lo! As he looks, on the belfry’s height

A glimmer, and then a gleam of light!

He springs to the saddle, the bridle he turns,

But lingers and gazes, till full on his sight

A second lamp in the belfry burns!

It certainly doesn’t sound as if Paul Revere was in a position to figure out exactly which one of the two lanterns was first lit.
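Here is a minimal Python sketch of that redundancy. The function name `decode_lanterns` is made up for illustration. The point it tries to show is that the message depends only on how many lanterns are lit, so mistaking one lantern for the other can't change the meaning.

```python
def decode_lanterns(bits: str) -> str:
    """Interpret a two-lantern signal such as '01' (hypothetical helper)."""
    lit = bits.count("1")  # only the number of lit lanterns matters
    if lit == 0:
        return "The British aren't invading tonight."
    if lit == 1:
        return "They're coming by land."
    return "They're coming by sea."


for code in ("00", "01", "10", "11"):
    print(code, "=", decode_lanterns(code))
```

The codes 01 and 10 produce the same answer, which is exactly why Paul Revere didn't need to know which lantern was lit first.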

The essential concept here is that information represents a choice among two or more possibilities. When we talk to another person, every word we speak is a choice among all the words in the dictionary. If we numbered all the words in the dictionary from 1 through 351,482, we could just as accurately carry on conversations using the numbers rather than words. (Of course, both participants would need dictionaries in which the words are numbered identically, as well as plenty of patience.)

The flip side of this is that any information that can be reduced to a choice among two or more possibilities can be expressed using bits. Needless to say, there are plenty of forms of human communication that do not represent choices among discrete possibilities and that are also vital to our existence. This is why people don’t form romantic relationships with computers. (Let’s hope not, anyway.) If you can’t express something in words, pictures, or sounds, you’re not going to be able to encode the information in bits. Nor would you want to.

For over a decade toward the end of the 20th century, the film critics Gene Siskel and Roger Ebert demonstrated a use of bits in the TV program they hosted, called At the Movies. After delivering their more detailed movie reviews, they would issue a final verdict with a thumbs-up or a thumbs-down.

If those two thumbs are bits, they can represent four possibilities:

00 = They both hated it.

01 = Siskel hated it; Ebert loved it.

10 = Siskel loved it; Ebert hated it.

11 = They both loved it.

The first bit is the Siskel bit, which is 0 if Siskel hated the movie and 1 if he liked it. Similarly, the second bit is the Ebert bit.

So back in the day of At the Movies, if your friend asked you, “What was the verdict from Siskel and Ebert about that new movie Impolite Encounter?” instead of answering, “Siskel gave it a thumbs-up and Ebert gave it a thumbs-down” or even “Siskel liked it; Ebert didn’t,” you could have simply said, “One zero,” or if you converted to quaternary, “Two.” As long as your friend knew which was the Siskel bit and which was the Ebert bit, and that a 1 bit meant thumbs-up and a 0 bit meant thumbs-down, your answer would be perfectly understandable. But you and your friend have to know the code.

We could have declared initially that a 1 bit meant a thumbs-down and a 0 bit meant a thumbs-up. That might seem counterintuitive. Naturally, we like to think of a 1 bit as representing something affirmative and a 0 bit as the opposite, but it’s really just an arbitrary assignment. The only requirement is that everyone who uses the code must know what the 0 and 1 bits mean.

The meaning of a particular bit or collection of bits is always understood contextually. The meaning of a yellow ribbon around a particular oak tree is probably known only to the person who put it there and the person who’s supposed to see it. Change the color, the tree, or the date, and it’s just a meaningless scrap of cloth. Similarly, to get some useful information out of Siskel and Ebert’s hand gestures, at the very least we need to know what movie is under discussion.

If while watching At the Movies you maintained a list of the films and how Siskel and Ebert voted with their thumbs, you could have added another bit to the mix to include your own opinion. Adding this third bit increases the number of different possibilities to eight:

000 = Siskel hated it; Ebert hated it; I hated it.

001 = Siskel hated it; Ebert hated it; I loved it.

010 = Siskel hated it; Ebert loved it; I hated it.

011 = Siskel hated it; Ebert loved it; I loved it.

100 = Siskel loved it; Ebert hated it; I hated it.

101 = Siskel loved it; Ebert hated it; I loved it.

110 = Siskel loved it; Ebert loved it; I hated it.

111 = Siskel loved it; Ebert loved it; I loved it.

One bonus of using bits to represent this information is that we know that we’ve accounted for all the possibilities. We know there can be eight and only eight possibilities and no more or fewer. With 3 bits, we can count only from zero to seven. There are no more three-digit binary numbers. As you discovered toward the end of the previous chapter, these three-digit binary numbers can also be expressed as octal numbers 0 through 7.
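As a small illustration, this Python sketch lists every 3-bit code in counting order and shows its decimal and octal value. It is only a sketch; the variable names are arbitrary.

```python
from itertools import product

# Every combination of three bits, in counting order.
codes = ["".join(bits) for bits in product("01", repeat=3)]

for code in codes:
    value = int(code, 2)  # the bits read as a binary number
    print(code, "=", value, "decimal =", format(value, "o"), "octal")

print(len(codes), "possibilities in all")
```

The loop can't produce a ninth line, which is another way of saying that three bits account for exactly eight possibilities.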

Whenever we talk about bits, we often talk about a certain number of bits. The more bits we have, the greater the number of different possibilities we can convey.

It’s the same situation with decimal numbers, of course. For example, how many telephone area codes are there? The area code is three decimal digits long, and if all the combinations of three digits are used (which they aren’t, but we’ll ignore that), there are $10^3$, or 1000, codes, ranging from 000 through 999. How many seven-digit phone numbers are possible within the 212 area code? That’s $10^7$, or 10,000,000. How many phone numbers can you have with a 212 area code and a 260 prefix? That’s $10^4$, or 10,000.

Similarly, in binary the number of possible codes is always equal to 2 to the power of the number of bits:

Number of Bits    Number of Codes
1                 $2^1 = 2$
2                 $2^2 = 4$
3                 $2^3 = 8$
4                 $2^4 = 16$
5                 $2^5 = 32$
6                 $2^6 = 64$
7                 $2^7 = 128$
8                 $2^8 = 256$
9                 $2^9 = 512$
10                $2^{10} = 1024$

Every additional bit doubles the number of codes.

If you know how many codes you need, how can you calculate how many bits you need? In other words, how do you go backward in the preceding table?

The math you need is the base-two logarithm. The logarithm is the opposite of the power. We know that 2 to the 7th power equals 128. The base-two logarithm of 128 equals 7. To use more mathematical notation, this statement

$2^7 = 128$

is equivalent to this statement:

$\log_2 128 = 7$

So if the base-two logarithm of 128 is 7 and the base-two logarithm of 256 is 8, then what’s the base-two logarithm of numbers in between 128 and 256—for example, 200? It’s actually about 7.64, but we really don’t have to know that. If we needed to represent 200 different things with bits, we’d need 8 bits, just as when Paul Revere needed two lanterns to convey one of three possibilities. Going strictly by the mathematics, the number of bits required for Paul Revere’s three possibilities is the base-two logarithm of 3, or about 1.6, but in a practical sense, he needed 2.
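Going backward in the table can be sketched in a few lines of Python. The function name `bits_needed` is an assumption for illustration. It rounds the base-two logarithm up to a whole number of bits, using integer arithmetic so that floating-point rounding can't interfere.

```python
import math


def bits_needed(possibilities: int) -> int:
    """Smallest whole number of bits that can tell `possibilities` things apart."""
    if possibilities < 1:
        raise ValueError("need at least one possibility")
    # (n - 1).bit_length() equals the base-two logarithm of n rounded up.
    return (possibilities - 1).bit_length()


for n in (2, 3, 128, 200, 256):
    print(n, "possibilities: log2 is about", round(math.log2(n), 2),
          "so", bits_needed(n), "bits")
```

For 3 possibilities this reports 2 bits, and for 200 it reports 8, matching the lanterns and the example above.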

Bits are often hidden from casual observation deep within our electronic appliances. We can’t see the bits encoded inside our computers, or streaming through the wires of our networks, or in the electromagnetic waves surrounding Wi-Fi hubs and cell towers. But sometimes the bits are in clear view.

Such was the case on February 18, 2021, when the Perseverance rover landed on Mars. The parachute seen in a photograph from the rover was assembled from 320 orange and white strips of fabric arranged in four concentric circles:

It didn’t take long for Twitter users to decode the pattern. The key is to divide the strips of fabric into groups of seven containing both orange and white. These groups of seven strips are always separated by three white strips. The areas consisting of consecutive orange strips are ignored. In this diagram, each group of seven strips is surrounded by a heavy black line:

Each of these groups is a binary number with a white strip representing 0 and an orange strip representing 1. Right above the inner circle is the first group. Going clockwise, these seven strips encode the binary number 0000100, or decimal 4. The 4th letter of the alphabet is D. The next one going clockwise is 0000001, or decimal 1. That’s an A. Next is 0010010, or decimal 18. The 18th letter of the alphabet is R. Next is 0000101, or decimal 5, which is an E. The first word is DARE.

Now jump to the next outer level. The bits are 0001101, or decimal 13, the letter M. When you finish, you’ll spell out three words, a phrase that originated with Teddy Roosevelt and that has become the unofficial motto of the NASA Jet Propulsion Laboratory.
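If you'd rather let a computer do the lookup, here is a short Python sketch of the decoding described above. The helper name `letter_from_group` is hypothetical, and only the groups quoted in the text are included, so the rest of the phrase is still left for you to find.

```python
def letter_from_group(bits: str) -> str:
    """Map a 7-strip group (white = 0, orange = 1) to a letter, with 1 = A."""
    position = int(bits, 2)
    if not 1 <= position <= 26:
        raise ValueError(f"{bits} is not a letter position")
    return chr(ord("A") + position - 1)


first_word = ["0000100", "0000001", "0010010", "0000101"]
print("".join(letter_from_group(group) for group in first_word))  # DARE
print(letter_from_group("0001101"))                                # M
```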

Around the outer circle are some encoded numbers as well, revealing the latitude and longitude of the Jet Propulsion Laboratory: 34°11′58″N 118°10′31″W. With the simple coding system used here, there’s nothing that distinguishes letters and numbers. The numbers 10 and 11 that are part of the geographic coordinates could be the letters J and K. Only the context tells us that they’re numbers.

Perhaps the most common visual display of binary digits is the ubiquitous Universal Product Code (UPC), that little barcode symbol that appears on virtually every packaged item that we purchase. The UPC is one of dozens of barcodes used for various purposes. If you have the printed version of this book, you’ll see on the back cover another type of barcode that encodes the book’s International Standard Book Number, or ISBN.

Although the UPC inspired some paranoia when it was first introduced, it’s really an innocent little thing, invented for the purpose of automating retail checkout and inventory, which it does fairly successfully. Prior to the UPC, it wasn’t possible for supermarket registers to provide an itemized sales receipt. Now it’s commonplace.

Of interest to us here is that the UPC is a binary code, although it might not seem like one at first. It might be interesting to decode the UPC and examine how it works.

In its most common form, the UPC is a collection of 30 vertical black bars of various widths, divided by gaps of various widths, along with some digits. For example, this is the UPC that appears on the 10¾-ounce can of Campbell’s Chicken Noodle Soup:

chap11fig08.jpg

That same UPC appeared in the first edition of this book. It hasn’t changed in over 20 years!

We’re tempted to try to visually interpret the UPC in terms of thin bars and thick bars, narrow gaps and wide gaps, and indeed, that’s one way to look at it. The black bars in the UPC can have four different widths, with the thicker bars being two, three, or four times the width of the thinnest bar. Similarly, the wider gaps between the bars are two, three, or four times the width of the thinnest gap.

But another way to look at the UPC is as a series of bits. Keep in mind that the whole barcode symbol isn’t exactly what the scanner “sees” at the checkout counter. The scanner doesn’t try to interpret the numbers printed at the bottom, for example, because that would require a more sophisticated computing technique, known as optical character recognition, or OCR. Instead, the scanner sees just a thin slice of this whole block. The UPC is as large as it is to give the checkout person something to aim the scanner at. The slice that the scanner sees can be represented like this:

This looks almost like Morse code, doesn’t it? In fact, the original invention of scannable barcodes was partially inspired by Morse code.

As the computer scans this information from left to right, it assigns a 1 bit to the first black bar it encounters and a 0 bit to the next white gap. The subsequent gaps and bars are read as a series of 1, 2, 3, or 4 bits in a row, depending on the width of the gap or the bar. The correspondence of the scanned barcode to bits is simply:

So the entire UPC is simply a series of 95 bits. In this particular example, the bits can be grouped as follows:

tab128-01.jpg

The first 3 bits are always 101. This is known as the left-hand guard pattern, and it allows the computer-scanning device to get oriented. From the guard pattern, the scanner can determine the width of the bars and gaps that correspond to single bits. Otherwise, the UPC would have to be a specific size on all packages.

The left-hand guard pattern is followed by six groups of 7 bits each. You’ll see shortly how each of these is a code for a numeric digit 0 through 9. A 5-bit center guard pattern follows. The presence of this fixed pattern (always 01010) is a form of built-in error checking. If the computer scanner doesn’t find the center guard pattern where it’s supposed to be, it won’t acknowledge that it has interpreted the UPC. This center guard pattern is one of several precautions against a code that has been tampered with or badly printed.

The center guard pattern is followed by another six groups of 7 bits each, which are then followed by a right-hand guard pattern, which is always 101. This guard pattern at the end allows the UPC code to be scanned backward (that is, right to left) as well as forward.

So the entire UPC encodes 12 numeric digits. The left side of the UPC encodes six digits, each requiring 7 bits. You can use the following table to decode these bits:

Left-Side Codes

0001101 = 0    0110001 = 5
0011001 = 1    0101111 = 6
0010011 = 2    0111011 = 7
0111101 = 3    0110111 = 8
0100011 = 4    0001011 = 9

Notice that each 7-bit code begins with a 0 and ends with a 1. If the scanner encounters a 7-bit code on the left side that begins with a 1 or ends with a 0, it knows either that it hasn’t correctly read the UPC code or that the code has been tampered with. Notice also that each code has only two groups of consecutive 1 bits. This implies that each digit corresponds to two vertical bars in the UPC code.

Examine these codes more closely, and you’ll discover that they all have an odd number of 1 bits. This is another form of error and consistency checking, known as parity. A group of bits has even parity if it has an even number of 1 bits and odd parity if it has an odd number of 1 bits. Thus, all of these codes have odd parity.

To interpret the six 7-bit codes on the right side of the UPC, use the following table:

Right-Side Codes

1110010 = 0    1001110 = 5
1100110 = 1    1010000 = 6
1101100 = 2    1000100 = 7
1000010 = 3    1001000 = 8
1011100 = 4    1110100 = 9

These codes are the opposites or complements of the earlier codes: Wherever a 0 appeared is now a 1, and vice versa. These codes always begin with a 1 and end with a 0. In addition, they have an even number of 1 bits, which is even parity.
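The relationship between the two tables is easy to check with a little Python. This is only a sketch: the names `LEFT_CODES`, `complement`, and `parity` are my own, and the left-side codes are copied from the first table.

```python
LEFT_CODES = {
    0: "0001101", 1: "0011001", 2: "0010011", 3: "0111101", 4: "0100011",
    5: "0110001", 6: "0101111", 7: "0111011", 8: "0110111", 9: "0001011",
}


def complement(bits: str) -> str:
    """Swap every 0 for a 1 and every 1 for a 0."""
    return "".join("1" if b == "0" else "0" for b in bits)


def parity(bits: str) -> str:
    """Report whether the number of 1 bits is odd or even."""
    return "odd" if bits.count("1") % 2 else "even"


# The right-side codes are derived here rather than typed in.
RIGHT_CODES = {digit: complement(code) for digit, code in LEFT_CODES.items()}

for digit in range(10):
    left, right = LEFT_CODES[digit], RIGHT_CODES[digit]
    print(digit, left, parity(left), right, parity(right))
```

Because each code has 7 bits, complementing a code with an odd number of 1 bits always leaves an even number, which is why the parity flips along with the bits.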

So now we’re equipped to decipher the UPC. Using the two preceding tables, we can determine that the 12 decimal digits encoded in the 10¾-ounce can of Campbell’s Chicken Noodle Soup are

0 51000 01251 7

This is very disappointing. As you can see, these are precisely the same numbers that are conveniently printed at the bottom of the UPC. (This makes a lot of sense: If the scanner can’t read the code for some reason, the person at the register can manually enter the numbers. Indeed, you’ve undoubtedly seen this happen.) We didn’t have to go through all that work to decode the numbers, and moreover, we haven’t come close to revealing any secret information. Yet there isn’t anything left in the UPC to decode. Those 30 vertical lines resolve to just 12 digits.
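The whole decoding process can be sketched in Python. Everything named here (`encode_upc`, `decode_upc`, `decode_group`) is hypothetical. The sketch first builds the 95 bits from the printed digits, following the layout of guard patterns and 7-bit groups described earlier, and then decodes them again. It ignores scanning direction and the check digit.

```python
LEFT_CODES = {
    "0": "0001101", "1": "0011001", "2": "0010011", "3": "0111101",
    "4": "0100011", "5": "0110001", "6": "0101111", "7": "0111011",
    "8": "0110111", "9": "0001011",
}
LEFT_DIGITS = {code: digit for digit, code in LEFT_CODES.items()}


def complement(bits: str) -> str:
    """Swap 0s and 1s; right-side codes are complements of left-side codes."""
    return "".join("1" if b == "0" else "0" for b in bits)


def encode_upc(digits: str) -> str:
    """Turn 12 digits into 95 bits: guard, 6 groups, center, 6 groups, guard."""
    left = "".join(LEFT_CODES[d] for d in digits[:6])
    right = "".join(complement(LEFT_CODES[d]) for d in digits[6:])
    return "101" + left + "01010" + right + "101"


def decode_group(group: str, right_side: bool) -> str:
    """Look up one 7-bit group, rejecting anything not in the tables."""
    key = complement(group) if right_side else group
    if key not in LEFT_DIGITS:
        raise ValueError(f"unreadable 7-bit group {group}")
    return LEFT_DIGITS[key]


def decode_upc(bits: str) -> str:
    """Recover the 12 digits from 95 bits scanned left to right."""
    if len(bits) != 95:
        raise ValueError("a UPC has exactly 95 bits")
    if bits[:3] != "101" or bits[45:50] != "01010" or bits[92:] != "101":
        raise ValueError("guard pattern not found")
    left = [decode_group(bits[3 + 7 * i:10 + 7 * i], False) for i in range(6)]
    right = [decode_group(bits[50 + 7 * i:57 + 7 * i], True) for i in range(6)]
    return "".join(left + right)


soup_bits = encode_upc("051000012517")
print(len(soup_bits), "bits")
print(decode_upc(soup_bits))
```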

Of the 12 decimal digits, the first (a 0 in this case) is known as the number system character. A 0 means that this is a regular UPC code. If the UPC appeared on variable-weight grocery items such as meat or produce, the code would be a 2. Coupons are coded with a 5.

The next five digits make up the manufacturer code. In this case, 51000 is the code for the Campbell Soup Company. All Campbell products have this code. The five digits that follow (01251) are the code for a particular product of that company—in this case, the code for a 10 ¾-ounce can of Chicken Noodle Soup. This product code has meaning only when combined with the manufacturer’s code. Another company’s chicken noodle soup might have a different product code, and a product code of 01251 might mean something totally different from another manufacturer.

Contrary to popular belief, the UPC doesn’t include the price of the item. That information has to be retrieved from the computer that the store uses in conjunction with the checkout scanners.

The final digit (a 7 in this case) is called the modulo check character. This character enables yet another form of error checking. You can try it out: Assign each of the first 11 digits (0 51000 01251 in our example) a letter:

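A = 0, B = 5, C = 1, D = 0, E = 0, F = 0, G = 0, H = 1, I = 2, J = 5, K = 1

Now calculate the following:

$3 \times (A + C + E + G + I + K) + (B + D + F + H + J)$

and subtract that from the next highest multiple of 10. In the case of Campbell’s Chicken Noodle Soup, we have

$3 \times (0 + 1 + 0 + 0 + 2 + 1) + (5 + 0 + 0 + 1 + 5) = 3 \times 4 + 11 = 23$

The next highest multiple of 10 is 30, so

$30 - 23 = 7$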

and that’s the modulo check character printed and encoded in the UPC. This is a form of redundancy. If the computer controlling the scanner doesn’t calculate the same modulo check character as the one encoded in the UPC, the computer won’t accept the UPC as valid.
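Here is the same calculation as a Python sketch. The function name `upc_check_digit` is an assumption. The one detail it adds is the case where the sum is already a multiple of 10, in which case I'm treating the check character as 0.

```python
def upc_check_digit(first_eleven: str) -> int:
    """Compute the modulo check character from the first 11 UPC digits."""
    if len(first_eleven) != 11 or not first_eleven.isdigit():
        raise ValueError("expected exactly 11 decimal digits")
    digits = [int(ch) for ch in first_eleven]
    odd_positions = sum(digits[0::2])   # 1st, 3rd, ..., 11th digits
    even_positions = sum(digits[1::2])  # 2nd, 4th, ..., 10th digits
    total = 3 * odd_positions + even_positions
    # Distance up to the next multiple of 10 (0 if already a multiple).
    return (10 - total % 10) % 10


print(upc_check_digit("05100001251"))  # 7
```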

Normally, only 4 bits would be required to specify a decimal digit from 0 through 9. The UPC uses 7 bits per digit. Overall, the UPC uses 95 bits to encode only 11 useful decimal digits. Actually, the UPC includes blank space (equivalent to nine 0 bits) at both the left and right sides of the guard pattern. That means the entire UPC requires 113 bits to encode 11 decimal digits, or over 10 bits per decimal digit!

Part of this overkill is necessary for error checking, as we’ve seen. A product code such as this wouldn’t be very useful if it could be easily altered by a customer wielding a felt-tip pen.

The UPC also benefits by being readable in both directions. If the first digits that the scanning device decodes have even parity (that is, an even number of 1 bits in each 7-bit code), the scanner knows that it’s interpreting the UPC code from right to left. The computer system then uses this table to decode the right-side digits:

Right-Side Codes in Reverse

0100111 = 0    0111001 = 5
0110011 = 1    0000101 = 6
0011011 = 2    0010001 = 7
0100001 = 3    0001001 = 8
0011101 = 4    0010111 = 9

And this table for the left-side digits:

Left-Side Codes in Reverse

1011000 = 0    1000110 = 5
1001100 = 1    1111010 = 6
1100100 = 2    1101110 = 7
1011110 = 3    1110110 = 8
1100010 = 4    1101000 = 9

These 7-bit codes are all different from the codes read when the UPC is scanned from left to right. There’s no ambiguity.
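You can confirm the lack of ambiguity with a short Python sketch. The names are hypothetical. It derives all four tables from the left-side codes, counts how many distinct 7-bit codes result, and uses the parity of the first group to guess the scanning direction.

```python
LEFT = ["0001101", "0011001", "0010011", "0111101", "0100011",
        "0110001", "0101111", "0111011", "0110111", "0001011"]


def complement(bits: str) -> str:
    """Swap every 0 for a 1 and every 1 for a 0."""
    return "".join("1" if b == "0" else "0" for b in bits)


right = [complement(code) for code in LEFT]
right_reversed = [code[::-1] for code in right]  # read right to left
left_reversed = [code[::-1] for code in LEFT]    # read right to left

all_codes = LEFT + right + right_reversed + left_reversed
print(len(all_codes), "codes,", len(set(all_codes)), "distinct")


def scan_direction(first_group: str) -> str:
    """Odd parity in the first group means a normal left-to-right scan."""
    odd = first_group.count("1") % 2 == 1
    return "left to right" if odd else "right to left"


print(scan_direction(LEFT[0]))            # left to right
print(scan_direction(right_reversed[0]))  # right to left
```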

One way to cram more information in a scannable code is to move to two dimensions. Instead of a string of thick and thin bars and spaces, create a grid of black and white squares.

The most common two-dimensional barcode is probably the Quick Response (QR) code, first developed in Japan in 1994 and now used for a variety of purposes.

Creating your own QR code is free and easy. Several websites exist for that very purpose. Software is also readily available that can scan and decode QR codes through a camera on a mobile device. Dedicated QR scanners are available for industrial purposes, such as tracking shipments or taking inventory in warehouses.

Here’s a QR code that encodes the URL of the website for this book, CodeHiddenLanguage.com:

chap11fig11.jpg

If you have an app on your mobile device that can read QR codes, you can point it at that image and go to the website.

QR codes consist of a grid of squares that are called modules in the official QR specification. This particular QR code has 25 modules horizontally and vertically, which is a size called Version 2. Forty different sizes of QR codes are supported; Version 40 has 177 modules horizontally and vertically.

If each little block is interpreted as a bit—0 for white and 1 for black—a grid of this size potentially encodes 25 times 25, or 625 bits. But the real storage capability is about a third of that. Much of the information is devoted to a mathematically complex and sophisticated scheme of error correction. This protects the QR code from tampering and can also aid in recovering data that might be missing from a damaged code. I will not be discussing QR code error correction.
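The sizes follow a simple rule, sketched here in Python. The formula comes from the QR specification rather than from anything stated above: Version 1 is 21 modules on a side, and each later version adds 4. The function name `modules_per_side` is made up.

```python
def modules_per_side(version: int) -> int:
    """Width (and height) of a QR code in modules for a given version."""
    if not 1 <= version <= 40:
        raise ValueError("QR versions run from 1 through 40")
    return 17 + 4 * version


for version in (1, 2, 40):
    side = modules_per_side(version)
    print("Version", version, "is", side, "x", side, "=", side * side, "modules")
```

Version 2 comes out as 25 by 25, or 625 modules, and Version 40 as 177 by 177.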

Most obviously, the QR code also contains several fixed patterns that assist the QR scanner in properly orienting the grid. In the following image, the fixed patterns are shown in black and white, and everything else is shown in gray:

chap11fig12.jpg

The three large squares at the corners are known as finder patterns; the smaller square toward the lower right is known as an alignment pattern. These assist the QR code reader in properly orienting the code and compensating for any distortion. The horizontal and vertical sequences of alternating black and white cells near the top and at the left are called timing patterns and are used for determining the number of cells in the QR code. In addition, the QR code must be entirely surrounded by a quiet zone, which is a white border four times as wide as a cell.

Programs that create a QR code have several options, including different systems of error correction. Information required for a QR code reader to perform this error correction (and other tasks) is encoded in 15 bits called format information. These 15 bits appear twice in the QR code. Here are those 15 bits labeled 0 through 14 on the right and bottom of the upper-left finder pattern, and repeated below the upper-right finder pattern and to the right of the lower-left finder pattern:

chap11fig13.jpg

Bits are sometimes labeled with numbers like this to indicate how they constitute a longer value. The bit labeled 0 is the least significant bit and appears at the far right of the number. The bit labeled 14 is the most significant bit and appears at the left. If white cells are 0 bits and black cells are 1 bits, here is that complete 15-bit number:

equ134-01.jpg

Why is bit 0 the least significant bit? Because it occupies the position in the full number corresponding to 2 to the zero power. (See the top of page 109 if you need a reminder of how bits compose a number.)

The actual numeric value of this 15-bit number is not important, because it consolidates three pieces of information. The two most significant bits indicate one of four error-correction levels. The ten least significant bits specify a 10-bit BCH code used for error correction. (BCH stands for the inventors of this type of code: Bose, Chaudhuri, and Hocquenghem. But I promised I wouldn’t discuss the QR code error correction!)

In between the 2-bit error-correction level and the 10-bit BCH code are three bits that are not used for error correction. I’ve highlighted those three bits in bold:

equ134-02.jpg
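Pulling three fields out of one 15-bit number is a matter of shifting and masking. Here is a minimal Python sketch. The function name is hypothetical, and the example value is made up for illustration; it is not the value shown in the figure.

```python
def split_format_info(value: int) -> tuple:
    """Split a 15-bit number into its 2-bit, 3-bit, and 10-bit fields."""
    if not 0 <= value < 2 ** 15:
        raise ValueError("expected a 15-bit value")
    error_level = (value >> 13) & 0b11   # bits 14 and 13
    middle_bits = (value >> 10) & 0b111  # bits 12 through 10
    bch_code = value & 0b1111111111      # bits 9 through 0
    return error_level, middle_bits, bch_code


example = 0b10_100_0000110101  # made-up value, NOT the one in the figure
level, middle, bch = split_format_info(example)
print(format(level, "02b"), format(middle, "03b"), format(bch, "010b"))
```

Shifting right by 13 drops the 13 less significant bits, and the mask `0b11` keeps only the two that remain of interest. The other two fields work the same way.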

It turns out that QR code readers work best when there are approximately an equal number of black and white squares. With some encoded information, this will not be the case. The program that creates the QR code is responsible for selecting a mask pattern that evens out the number of black and white squares. This mask pattern is applied to the QR code to flip selected cells from white to black, or black to white, and hence the bits that they represent from 0 to 1 and from 1 to 0.

The documentation of the QR code defines eight different mask patterns that can be specified by the eight 3-bit sequences 000, 001, 010, 011, 100, 101, 110, and 111. The value in the QR code that we’re examining is 100, and that corresponds to a mask pattern consisting of a series of horizontal lines alternating every other row:

chap11fig14.jpg

Every cell in the original QR code that corresponds to a white area in this mask remains unchanged. Every cell that corresponds to a black area must be flipped from white to black, or from black to white. Notice that the mask avoids altering the fixed areas and the QR information area. Here’s what happens when this mask is applied to the original QR code:

chap11fig15.jpg

The mask doesn’t change the fixed and information areas. Otherwise, if you compare this image with the original QR code, you’ll see that the top row is reversed in color, the second row is the same, the third row is reversed, and so on.
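The effect of this kind of mask can be sketched in Python on a tiny made-up grid. This is an illustration of the row-flipping idea only, not an implementation of the QR specification: the function name, the grid, and the set of protected cells are all assumptions.

```python
def apply_row_mask(grid, protected):
    """Flip cells in every other row, starting with the top row.

    grid is a list of rows of 0 (white) and 1 (black) cells; protected is a
    set of (row, column) positions, standing in for the fixed and
    information areas, that are never flipped.
    """
    result = []
    for r, row in enumerate(grid):
        new_row = []
        for c, cell in enumerate(row):
            flip = r % 2 == 0 and (r, c) not in protected
            new_row.append(cell ^ 1 if flip else cell)  # XOR with 1 flips a bit
        result.append(new_row)
    return result


grid = [[0, 1, 1, 0],
        [1, 0, 0, 1],
        [1, 1, 0, 0]]
protected = {(0, 0)}  # pretend the top-left cell belongs to a fixed pattern

masked = apply_row_mask(grid, protected)
for row in masked:
    print(row)
```

Applying the same mask a second time restores the original grid, which is how a reader can undo what the program that created the code did.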

Now we’re ready to start digging into the actual data. Begin with the four bits in the lower-right corner. In the following image, those cells are numbered 0 through 3, where 3 is the most significant bit and 0 is the least significant bit:

These four bits are known as the data type indicator, and they indicate what kind of data is encoded in the QR code. Here are a few of the possible values:

Data-Type Indicator    Meaning
0001                   Numbers only
0010                   Uppercase letters and numbers
0100                   Text encoded as 8-bit values
1000                   Japanese kanji

The value for this QR code is 0100, meaning that the data consists of 8-bit values that encode text.

The next item is stored in the eight cells above the data type indicator. These eight bits are numbered 0 through 7 in this illustration:

chap11fig17.jpg

This value is 00011010, which is 26 in decimal. That’s the number of characters encoded in the QR code.

The order of these characters is systematic but weird. The characters begin right above the character count. Each character usually—though not always—occupies an area that is two cells wide and four cells tall, and the characters wind through the grid like this:

chap11fig18.jpg

Not all characters occupy areas that are two cells wide and four cells tall. Fortunately, the official QR specification is quite precise about how the bits are oriented when the area is not rectangular. In this next image, the cells for each of the 26 characters are outlined in red, and the cells are numbered 0 through 7, where 0 denotes the least significant bit and 7 the most significant bit:

chap11fig19.jpg

The QR specification indicates that text is encoded in the QR code using 8-bit values defined in a standard known as ISO/IEC 8859. That’s a fancy term for a variation of the American Standard Code for Information Interchange (ASCII), which I’ll be discussing in more detail in Chapter 13.

The first character is 01110111, which is the ASCII code for w. The next character up is the same. The next character extends to the left, but it is also another w. Now proceed down the next two pairs of columns. The next character is 00101110, which is the period, then 01000011, the uppercase C followed by 01101111, o. The next character straddles the next pair of rows. It’s 01100100: d. The next character begins below the alignment pattern and continues above it. The ASCII code is 01100101, which is e. Continue in this way to spell out www.CodeHiddenLanguage.com.
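Converting these 8-bit values to characters takes only a few lines of Python. This sketch uses just the bit patterns quoted above, plus the character count from earlier; the variable names are arbitrary.

```python
count_bits = "00011010"
print(int(count_bits, 2))  # the character count, 26

# The first eight characters, in the order the text walks through them.
first_bytes = ["01110111", "01110111", "01110111", "00101110",
               "01000011", "01101111", "01100100", "01100101"]

# Each 8-bit value is a character code; chr() turns it into the character.
text = "".join(chr(int(bits, 2)) for bits in first_bytes)
print(text)  # www.Code

url = "www.CodeHiddenLanguage.com"
print(url.startswith(text), len(url))
```

The length of the full URL agrees with the 26 stored in the character count.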

That’s it. Most of what’s left in the QR code is devoted to error correction.

Codes such as the UPC and QR certainly look forbidding at first glance, and people might be forgiven for assuming that they encode secret (and perhaps devious) information. But in order for these codes to be widely used, they must be well documented and publicly available. The more that they’re used, the more potentially valuable they become as another extension of our vast array of communication media.

Bits are everywhere, but toward the end of my discussion of the QR code, I referred to “8-bit values.” There’s a special word for 8-bit values. You may have heard of it.