More Advance Praise for Blown to Bits
“Most writing about the digital world comes from techies writing about tech-
nical matter for other techies or from pundits whose turn of phrase greatly
exceeds their technical knowledge. In Blown to Bits, experts in computer
science address authoritatively the practical issues in which we all have keen
interest.”
—Howard Gardner, Hobbs Professor of Cognition and Education,
Harvard Graduate School of Education,
author of Multiple Intelligences and Changing Minds
“Regardless of your experience with computers, Blown to Bits provides a
uniquely entertaining and informative perspective from the computing indus-
try’s greatest minds.
A fascinating, insightful and entertaining book that helps you understand
computers and their impact on the world in a whole new way.
This is a rare book that explains the impact of the digital explosion in a
way that everyone can understand and, at the same time, challenges experts
to think in new ways.”
—Anne Margulies, Assistant Secretary for Information Technology and
Chief Information Officer of the Commonwealth of Massachusetts
“Blown to Bits is fun and fundamental. What a pleasure to see real teachers
offering such excellent framework for students in a digital age to explore and
understand their digital environment, code and law, starting with the insight
of Claude Shannon. I look forward to you teaching in an open online school.”
—Professor Charles Nesson, Harvard Law School,
Founder, Berkman Center for Internet and Society
“To many of us, computers and the Internet are magic. We make stuff, send
stuff, receive stuff, and buy stuff. It’s all pointing, clicking, copying, and
pasting. But it’s all mysterious. This book explains in clear and comprehen-
sive terms how all this gear on my desk works and why we should pay close
attention to these revolutionary changes in our lives. It’s a brilliant and nec-
essary work for consumers, citizens, and students of all ages.”
—Siva Vaidhyanathan, cultural historian and media scholar
at the University of Virginia and author of Copyrights and Copywrongs:
The Rise of Intellectual Property and How it Threatens Creativity
00_0137135599_FM.qxd 5/7/08 1:00 PM Page i
“The world has turned into the proverbial elephant and we the blind men. The
old and the young among us risk being controlled by, rather than in control
of, events and technologies. Blown to Bits is a remarkable and essential
Rosetta Stone for beginning to figure out how all of the pieces of the new
world we have just begun to enter—law, technology, culture, information—are
going to fit together. Will life explode with new possibilities, or contract
under pressure of new horrors? The precipice is both exhilarating and fright-
ening. Hal Abelson, Ken Ledeen, and Harry Lewis, together, have ably man-
aged to describe the elephant. Readers of this compact book describing the
beginning stages of a vast human adventure will be one jump ahead, for they
will have a framework on which to hang new pieces that will continue to
appear with remarkable speed. To say that this is a ‘must read’ sounds trite,
but, this time, it’s absolutely true.”
—Harvey Silverglate, criminal defense and civil liberties lawyer and writer
Blown to Bits
Your Life, Liberty,
and Happiness After
the Digital Explosion
Hal Abelson
Ken Ledeen
Harry Lewis
Upper Saddle River, NJ • Boston • Indianapolis • San Francisco
New York • Toronto • Montreal • London • Munich • Paris • Madrid
Cape Town • Sydney • Tokyo • Singapore • Mexico City
Many of the designations used by manufacturers and sellers to distinguish their products are
claimed as trademarks. Where those designations appear in this book, and the publisher was
aware of a trademark claim, the designations have been printed with initial capital letters or in
all capitals.
The authors and publisher have taken care in the preparation of this book, but make no
expressed or implied warranty of any kind and assume no responsibility for errors or omissions.
No liability is assumed for incidental or consequential damages in connection with or arising
out of the use of the information or programs contained herein.
The publisher offers excellent discounts on this book when ordered in quantity for bulk
purchases or special sales, which may include electronic versions and/or custom covers and
content particular to your business, training goals, marketing focus, and branding interests. For
more information, please contact:
U.S. Corporate and Government Sales
(800) 382-3419
corpsales@pearsontechgroup.com
For sales outside the United States, please contact:
International Sales
international@pearson.com
Visit us on the Web: www.informit.com/aw
Library of Congress Cataloging-in-Publication Data:
Abelson, Harold.
Blown to bits : your life, liberty, and happiness after the digital explosion / Hal Abelson,
Ken Ledeen, Harry Lewis.
p. cm.
ISBN 0-13-713559-9 (hardback : alk. paper) 1. Computers and civilization. 2. Information
technology—Technological innovations. 3. Digital media. I. Ledeen, Ken, 1946- II. Lewis,
Harry R. III. Title.
QA76.9.C66A245 2008
303.48’33—dc22
2008005910
Copyright © 2008 Hal Abelson, Ken Ledeen, and Harry Lewis
For information regarding permissions, write to:
Pearson Education, Inc.
Rights and Contracts Department
501 Boylston Street, Suite 900
Boston, MA 02116
Fax (617) 671 3447
This work is licensed under the Creative Commons Attribution-Noncommercial-Share Alike
3.0 United States License. To view a copy of this license visit
http://creativecommons.org/licenses/by-nc-sa/3.0/us/ or send a letter to Creative Commons
171 Second Street, Suite 300, San Francisco, California, 94105, USA.
ISBN-13: 978-0-13-713559-2
ISBN-10: 0-13-713559-9
Text printed in the United States on recycled paper at RR Donnelley in Crawfordsville, Indiana.
Third printing December 2008
This Book Is Safari Enabled
The Safari® Enabled icon on the cover of your favorite technology book means the book is
available through Safari Bookshelf. When you buy this book, you get free access to the online
edition for 45 days.
Safari Bookshelf is an electronic reference library that lets you easily search thousands of
technical books, find code samples, download chapters, and access technical information
whenever and wherever you need it.
To gain 45-day Safari Enabled access to this book:
• Go to http://www.informit.com/onlineedition
• Complete the brief registration form
• Enter the coupon code 9SD6-IQLD-ZDNI-AGEC-AG6L
If you have difficulty registering on Safari Bookshelf or accessing the online edition, please
e-mail customer-service@safaribooksonline.com.
Editor in Chief
Mark Taub
Acquisitions Editor
Greg Doench
Development Editor
Michael Thurston
Managing Editor
Gina Kanouse
Senior Project Editor
Kristy Hart
Copy Editor
Water Crest Publishing, Inc.
Indexer
Erika Millen
Proofreader
Williams Woods Publishing Services
Publishing Coordinator
Michelle Housley
Interior Designer and Composition
Nonie Ratcliff
Cover Designer
Chuti Prasertsith
To our children, Amanda, Jennifer, Joshua, Elaheh, Annie,
and Elizabeth, who will see the world changed
yet again in ways we cannot imagine.
Contents
Preface
Chapter 1 Digital Explosion
Why Is It Happening, and What Is at Stake?
  The Explosion of Bits, and Everything Else
  The Koans of Bits
  Good and Ill, Promise and Peril
Chapter 2 Naked in the Sunlight
Privacy Lost, Privacy Abandoned
  1984 Is Here, and We Like It
  Footprints and Fingerprints
  Why We Lost Our Privacy, or Gave It Away
  Little Brother Is Watching
  Big Brother, Abroad and in the U.S.
  Technology Change and Lifestyle Change
  Beyond Privacy
Chapter 3 Ghosts in the Machine
Secrets and Surprises of Electronic Documents
  What You See Is Not What the Computer Knows
  Representation, Reality, and Illusion
  Hiding Information in Images
  The Scary Secrets of Old Disks
Chapter 4 Needles in the Haystack
Google and Other Brokers in the Bits Bazaar
  Found After Seventy Years
  The Library and the Bazaar
  The Fall of Hierarchy
  It Matters How It Works
  Who Pays, and for What?
  Search Is Power
  You Searched for WHAT? Tracking Searches
  Regulating or Replacing the Brokers
Chapter 5 Secret Bits
How Codes Became Unbreakable
  Encryption in the Hands of Terrorists, and Everyone Else
  Historical Cryptography
  Lessons for the Internet Age
  Secrecy Changes Forever
  Cryptography for Everyone
  Cryptography Unsettled
Chapter 6 Balance Toppled
Who Owns the Bits?
  Automated Crimes—Automated Justice
  NET Act Makes Sharing a Crime
  The Peer-to-Peer Upheaval
  Sharing Goes Decentralized
  Authorized Use Only
  Forbidden Technology
  Copyright Koyaanisqatsi: Life Out of Balance
  The Limits of Property
Chapter 7 You Can’t Say That on the Internet
Guarding the Frontiers of Digital Expression
  Do You Know Where Your Child Is on the Web Tonight?
  Metaphors for Something Unlike Anything Else
  Publisher or Distributor?
  Neither Liberty nor Security
  The Nastiest Place on Earth
  The Most Participatory Form of Mass Speech
  Protecting Good Samaritans—and a Few Bad Ones
  Laws of Unintended Consequences
  Can the Internet Be Like a Magazine Store?
  Let Your Fingers Do the Stalking
  Like an Annoying Telephone Call?
  Digital Protection, Digital Censorship—and Self-Censorship
Chapter 8 Bits in the Air
Old Metaphors, New Technologies, and Free Speech
  Censoring the President
  How Broadcasting Became Regulated
  The Path to Spectrum Deregulation
  What Does the Future Hold for Radio?
Conclusion
After the Explosion
  Bits Lighting Up the World
  A Few Bits in Conclusion
Appendix
The Internet as System and Spirit
  The Internet as a Communication System
  The Internet Spirit
Endnotes
Index
Preface
For thousands of years, people have been saying that the world is changing
and will never again be the same. Yet the profound changes happening today
are different, because they result from a specific technological development.
It is now possible, in principle, to remember everything that anyone says,
writes, sings, draws, or photographs. Everything. Once it is digitized, the world
has enough disks and memory chips to save it all, for as long as civilization can
keep producing computers and disk drives. Global computer networks can
make it available everywhere in the world, almost instantly. And computers
are powerful enough to extract meaning from all that information, to find
patterns and make connections in the blink of an eye.
In centuries gone by, others may have dreamed these things could happen,
in utopian fantasies or in nightmares. But now they are happening. We are
living in the middle of the changes, and we can see the changes happening.
But we don’t know how things will turn out.
Right now, governments and the other institutions of human societies are
deciding how to use the new possibilities. Each of us is participating as we
make decisions for ourselves, for our families, and for people we work with.
Everyone needs to know how their world and the world around them are
changing as a result of this explosion of digital information. Everyone should
know how the decisions will affect their lives, and the lives of their children
and grandchildren and everyone who comes after.
That is why we wrote this book.
Each of us has been in the computing field for more than 40 years. The
book is the product of a lifetime of observing and participating in the changes
it has brought. Each of us has been both a teacher and a learner in the field.
This book emerged from a general education course we have taught at
Harvard, but it is not a textbook. We wrote this book to share what wisdom
we have with as many people as we can reach. We try to paint a big picture,
with dozens of illuminating anecdotes as the brushstrokes. We aim to enter-
tain you at the same time as we provoke your thinking.
You can read the chapters in any order. The Appendix is a self-contained
explanation of how the Internet works. You don’t need a computer to read
this book. But we would suggest that you use one, connected to the Internet,
to explore any topic that strikes your curiosity or excites your interest. Don’t
be afraid to type some of the things we mention into your favorite search
engine and see what comes up. We mention many web sites, and give their
complete descriptors, such as bitsbook.com, which happens to be the site for
this book itself. But most of the time, you should be able to find things more
quickly by searching for them. There are many valuable public information
sources and public interest groups where you can learn more, and can par-
ticipate in the ongoing global conversation about the issues we discuss.
We offer some strong opinions in this book. If you would like to react to
what we say, please visit the book’s web site for an ongoing discussion.
Our picture of the changes brought by the digital explosion is drawn
largely with reference to the United States and its laws and culture, but the
issues we raise are critical for citizens of all free societies, and for all people
who hope their societies will become freer.
Cambridge, Massachusetts
January 2008
Acknowledgments
While we take full responsibility for any errors in the book, we owe thanks
to a great many others for any enlightenment it may provide. Specifically, we
are grateful to the following individuals, who commented on parts of the
book while it was in draft or provided other valuable assistance: Lynn
Abelson, Meg Ausman, Scott Bradner, Art Brodsky, Mike Carroll, Marcus
Cohn, Frank Cornelius, Alex Curtis, Natasha Devroye, David Fahrenthold,
Robert Faris, Johann-Christoph Freytag, Wendy Gordon, Tom Hemnes, Brian
LaMacchia, Marshall Lerner, Anne Lewis, Elizabeth Lewis, Jessica Litman,
Lory Lybeck, Fred vonLohmann, Marlyn McGrath, Michael Marcus, Michael
Mitzenmacher, Steve Papa, Jonathan Pearce, Bradley Pell, Les Perelman,
Pamela Samuelson, Jeff Schiller, Katie Sluder, Gigi Sohn, Debora Spar,
René Stein, Alex Tibbetts, Susannah Tobin, Salil Vadhan, David Warsh,
Danny Weitzner, and Matt Welsh.
About the Authors
Hal Abelson is Class of 1922 Professor of Computer Science and Engineering
at MIT, and an IEEE Fellow. He has helped drive innovative educational
technology initiatives such as MIT OpenCourseWare, cofounded Creative Commons
and Public Knowledge, and was a founding director of the Free Software
Foundation. Ken Ledeen, Chairman/CEO of Nevo Technologies, has served on
the boards of numerous technology companies. Harry Lewis, former Dean of
Harvard College, is Gordon McKay Professor of Computer Science at Harvard
and Fellow of the Berkman Center for Internet and Society. He is author of
Excellence Without a Soul: Does Liberal Education Have a Future? Together,
the authors teach Quantitative Reasoning 48, an innovative Harvard course
on information for non-technical, non-mathematically oriented students.
CHAPTER 1
Digital Explosion
Why Is It Happening, and What Is at Stake?
On September 19, 2007, while driving alone near Seattle on her way to work,
Tanya Rider went off the road and crashed into a ravine.* For eight days, she
was trapped upside down in the wreckage of her car. Severely dehydrated and
suffering from injuries to her leg and shoulder, she nearly died of kidney fail-
ure. Fortunately, rescuers ultimately found her. She spent months recuperat-
ing in a medical facility. Happily, she was able to go home for Christmas.
Tanya’s story is not just about a woman, an accident, and a rescue. It is a
story about bits—the zeroes and ones that make up all our cell phone conver-
sations, bank records, and everything else that gets communicated or stored
using modern electronics.
Tanya was found because cell phone companies keep records of cell phone
locations. When you carry your cell phone, it regularly sends out a digital
“ping,” a few bits conveying a “Here I am!” message. Your phone keeps “ping-
ing” as long as it remains turned on. Nearby cell phone towers pick up the
pings and send them on to your cellular service provider. Your cell phone
company uses the pings to direct your incoming calls to the right cell phone
towers. Tanya’s cell phone company, Verizon, still had a record of the last
location of her cell phone, even after the phone had gone dead. That is how
the police found her.
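The bookkeeping described above can be pictured with a small sketch. It is only an illustration under our own assumptions: the names Ping, LastSeenRegistry, and the tower labels are hypothetical, and nothing here describes how Verizon or any other carrier actually stores its records. The point is simply that if a provider remembers the most recent ping from each phone, the last known location survives after the phone itself goes dead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Ping:
    """One "Here I am!" message, as relayed by a tower (hypothetical record)."""
    phone_id: str
    tower_id: str
    received_at: datetime


class LastSeenRegistry:
    """Remembers only the most recent ping for each phone."""

    def __init__(self):
        self._last = {}  # phone_id -> most recent Ping

    def record(self, ping):
        current = self._last.get(ping.phone_id)
        # Ignore pings that arrive out of order and are older than what we hold.
        if current is None or ping.received_at >= current.received_at:
            self._last[ping.phone_id] = ping

    def last_known_tower(self, phone_id):
        ping = self._last.get(phone_id)
        return ping.tower_id if ping else None


def demo():
    """Two pings from one phone; the later one determines the answer."""
    registry = LastSeenRegistry()
    start = datetime(2007, 9, 19, 8, 0)
    registry.record(Ping("phone-1", "tower-A", start))
    registry.record(Ping("phone-1", "tower-B", start + timedelta(minutes=5)))
    return registry.last_known_tower("phone-1")
```

Once the phone stops pinging, nothing overwrites its entry, so the registry keeps answering with the last tower that heard from it.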
So why did it take more than a week?
If a woman disappears, her husband can’t just make the police find her by
tracing her cell phone records. She has a privacy right, and maybe she has
good reason to leave town without telling her husband where she is going. In
Tanya’s case, her bank account showed some activity (more bits!) after her
disappearance, and the police could not classify her as a “missing person.” In
fact, that activity was by her husband. Through some misunderstanding, the
police thought he did not have access to the account. Only when the police
suspected Tanya’s husband of involvement in her disappearance did they
have legal access to the cell phone records. Had they continued to act on the
true presumption that he was blameless, Tanya might never have been found.
* Citations of facts and sources appear at the end of the book. A page number and a phrase
identify the passage.
New technologies interacted in an odd way with evolving standards of pri-
vacy, telecommunications, and criminal law. The explosive combination
almost cost Tanya Rider her life. Her story is dramatic, but every day we
encounter unexpected consequences of data flows that could not have hap-
pened a few years ago.
When you have finished reading this book, you should see the world in a
different way. You should hear a story from a friend or on a newscast and say
to yourself, “that’s really a bits story,” even if no one mentions anything dig-
ital. The movements of physical objects and the actions of flesh and blood
human beings are only the surface. To understand what is really going on, you
have to see the virtual world, the eerie flow of bits steering the events of life.
This book is your guide to this new world.
The Explosion of Bits, and Everything Else
The world changed very suddenly. Almost everything is stored in a computer
somewhere. Court records, grocery purchases, precious family photos, point-
less radio programs…. Computers contain a lot of stuff that isn’t useful today
but somebody thinks might someday come in handy. It is all being reduced
to zeroes and ones—“bits.” The bits are stashed on disks of home computers
and in the data centers of big corporations and government agencies. The
disks can hold so many bits that there is no need to pick and choose what
gets remembered.
So much digital information, misinformation, data, and garbage is being
squirreled away that most of it will be seen only by computers, never by
human eyes. And computers are getting better and better at extracting mean-
ing from all those bits—finding patterns that sometimes solve crimes and
make useful suggestions, and sometimes reveal things about us we did not
expect others to know.
The March 2008 resignation of Eliot Spitzer as Governor of New York is a
bits story as well as a prostitution story. Under anti-money laundering (AML)
rules, banks must report transactions of more than $10,000 to federal regula-
tors. None of Spitzer’s alleged payments reached that threshold, but his
bank’s computer found that transfers of smaller sums formed a suspicious
pattern. The AML rules exist to fight terrorism and organized crime. But while
the computer was monitoring small banking transactions in search of
big-time crimes, it exposed a simple payment for services rendered that
brought down the Governor.
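To make the idea of a "suspicious pattern" concrete, here is a minimal sketch of one way a program might notice it. We do not know what rule the bank's software actually applied; the seven-day window, the function name looks_structured, and the sample figures are all our own assumptions. The only number taken from the text is the $10,000 reporting threshold.

```python
from datetime import date, timedelta

REPORT_THRESHOLD = 10_000      # dollars; the reporting threshold quoted in the text
WINDOW = timedelta(days=7)     # hypothetical look-back window, chosen for illustration


def looks_structured(transfers, threshold=REPORT_THRESHOLD, window=WINDOW):
    """Return True if transfers that each stay at or under the threshold
    add up to more than the threshold within some window of time.

    transfers is a list of (date, amount) pairs.
    """
    # Keep only transfers that would not be reported on their own.
    small = sorted((day, amount) for day, amount in transfers if amount <= threshold)
    start = 0
    running_total = 0
    for day, amount in small:
        running_total += amount
        # Drop transfers that have fallen out of the look-back window.
        while day - small[start][0] > window:
            running_total -= small[start][1]
            start += 1
        if running_total > threshold:
            return True
    return False
```

Three transfers of $4,000, $3,500, and $3,000 made within a single week would each pass unreported, yet together they exceed the threshold, and this check would flag them. The same three transfers spread over three months would not be flagged.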
Once something is on a computer, it can replicate and move around the
world in a heartbeat. Making a million perfect copies takes but an instant—
copies of things we want everyone in the world to see, and also copies of
things that weren’t meant to be copied at all.
The digital explosion is changing the world as much as printing once did—
and some of the changes are catching us unaware, blowing to bits our
assumptions about the way the world works.
When we observe the digital explosion at all, it can seem benign, amus-
ing, or even utopian. Instead of sending prints through the mail to Grandma,
we put pictures of our children on a photo album web site such as Flickr. Then
not only can Grandma see them—so can Grandma’s friends and anyone else.
So what? They are cute and harmless. But suppose a tourist takes a vacation
snapshot and you just happen to appear in the background, at a restaurant
where no one knew you were dining. If the tourist uploads his photo, the
whole world could know where you were, and when you were there.
Data leaks. Credit card records are supposed to stay locked up in a data
warehouse, but escape into the hands of identity thieves. And we sometimes
give information away just because we get something back for doing so. A
company will give you free phone calls to anywhere in the world—if you
don’t mind watching ads for the products its computers hear you talking
about.
And those are merely things that are happening today. The explosion, and
the social disruption it will create, have barely begun.
We already live in a world in which there is enough memory just in digi-
tal cameras to store every word of every book in the Library of Congress a
hundred times over. So much email is being sent that it could transmit the
full text of the Library of Congress in ten minutes. Digitized pictures and
sounds take more space than words, so emailing all the images, movies, and
sounds might take a year—but that is just today. The explosive growth is still
happening. Every year we can store more information, move it more quickly,
and do far more ingenious things with it than we could the year before.
So much disk storage is being produced every year that it could be used to
record a page of information, every minute or two, about you and every other
human being on earth. A remark made long ago can come back to haunt a
political candidate, and a letter jotted quickly can be a key discovery for a
biographer. Imagine what it would mean to record every word every human
being speaks or writes in a lifetime. The technological barrier to that has
already been removed: There is enough storage to remember it all. Should
any social barrier stand in the way?
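The scale of that claim is easier to grasp with the arithmetic written out. The sketch below is a back-of-the-envelope calculation, and every input is our own assumption rather than a figure from the text: a world population of about 6.7 billion, a page of plain text taken as roughly 2,000 bytes, and "every minute or two" read as one page every 1.5 minutes.

```python
# All three inputs are illustrative assumptions, not figures from the text.
WORLD_POPULATION = 6_700_000_000   # assumed: roughly the world's population in 2008
BYTES_PER_PAGE = 2_000             # assumed: one page of plain text, about 2 KB
MINUTES_PER_PAGE = 1.5             # assumed reading of "every minute or two"

MINUTES_PER_YEAR = 365 * 24 * 60


def bytes_needed_per_year(population=WORLD_POPULATION,
                          bytes_per_page=BYTES_PER_PAGE,
                          minutes_per_page=MINUTES_PER_PAGE):
    """Storage needed to keep one page per person at the given rate for a year."""
    pages_per_person = MINUTES_PER_YEAR / minutes_per_page
    return population * pages_per_person * bytes_per_page


if __name__ == "__main__":
    total = bytes_needed_per_year()
    # One exabyte is 10**18 bytes.
    print(f"{total / 1e18:.1f} exabytes per year")
```

Under these assumptions the answer comes to a few exabytes, that is, a few billion gigabytes, per year. Different choices for the page size or the population shift the total proportionally, so the result should be read as an order of magnitude and no more.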
Sometimes things seem to work both better and worse than they used to.
A “public record” is now very public—before you get hired in Nashville,
Tennessee, your employer can figure out if you were caught ten years ago
taking an illegal left turn in Lubbock, Texas. The old notion of a “sealed court
record” is mostly a fantasy in a world where any tidbit of information is
duplicated, cataloged, and moved around endlessly. With hundreds of TV and
radio stations and millions of web sites, Americans love the variety of news
sources, but are still adjusting uncomfortably to the displacement of more
authoritative sources. In China, the situation is reversed: The technology cre-
ates greater government control of the information its citizens receive, and
better tools for monitoring their behavior.
This book is about how the digital explosion is changing everything. It
explains the technology itself—why it creates so many surprises and why
things often don’t work the way we expect them to. It is also about things the
information explosion is destroying: old assumptions about our privacy,
about our identity, and about who is in control of our lives. It’s about how
we got this way, what we are losing, and what remains that society still has
a chance to put right. The digital explosion is creating both opportunities and
risks. Many of both will be gone in a decade, settled one way or another.
Governments, corporations, and other authorities are taking advantage of the
chaos, and most of us don’t even see it happening. Yet we all have a stake in
the outcome. Beyond the science, the history, the law, and the politics, this
book is a wake-up call. The forces shaping your future are digital, and you
need to understand them.
The Koans of Bits
Bits behave strangely. They travel almost instantaneously, and they take
almost no space to store. We have to use physical metaphors to make them
understandable. We liken them to dynamite exploding or water flowing. We
even use social metaphors for bits. We talk about two computers agreeing on
some bits, and about people using burglary tools to steal bits. Getting the right
metaphor is important, but so is knowing the limitations of our metaphors. An
imperfect metaphor can mislead as much as an apt metaphor can illuminate.
We offer seven truths about bits. We call them “koans” because they are
paradoxes, like the Zen verbal puzzles that provoke meditation and enlight-
enment. These koans are oversimplifications and over-generalizations. They
describe a world that is developing but hasn’t yet fully emerged. But even
today they are truer than we often realize. These themes will echo through
our tales of the digital explosion.
Koan 1: It’s All Just Bits
Your computer successfully creates the illusion that it contains photographs,
letters, songs, and movies. All it really contains is bits, lots of them, patterned
in ways you can’t see. Your computer was designed to store just bits—all the
files and folders and different kinds of data are illusions created by computer
programmers. When you send an email containing a photograph, the com-
puters that handle your message as it flows through the Internet have no idea
that what they are handling is part text and part graphic. Telephone calls are
also just bits, and that has helped create competition—traditional phone com-
panies, cell phone companies, cable TV companies, and Voice over IP (VoIP)
service providers can just shuffle bits around to each other to complete calls.
The Internet was designed to handle just bits, not emails or attachments,
which are inventions of software engineers. We couldn’t live without those
more intuitive concepts, but they are artifices. Underneath, it’s all just bits.
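A short sketch may make this koan tangible. It takes the same four bytes and reads them three different ways: as text, as a number, and as the brightness of four gray pixels. The function name interpret and the choice of these particular bytes are ours, picked only for illustration. Nothing in the bits themselves says which reading is the "right" one; that decision belongs to the software doing the reading.

```python
import struct


def interpret(raw):
    """Read the same four bytes as bits, as text, as a number, and as pixels."""
    return {
        # The underlying zeroes and ones, eight per byte.
        "bits": " ".join(f"{byte:08b}" for byte in raw),
        # Treat each byte as an ASCII character code.
        "text": raw.decode("ascii"),
        # Treat all 32 bits as one unsigned integer, most significant byte first.
        "integer": struct.unpack(">I", raw)[0],
        # Treat each byte as a gray level from 0 (black) to 255 (white).
        "pixels": list(raw),
    }


if __name__ == "__main__":
    views = interpret(bytes([0x42, 0x69, 0x74, 0x73]))
    for name, value in views.items():
        print(f"{name:8} {value}")
```

Read as text, these 32 bits spell the word "Bits". Read as pixels, they are four shades of gray. The bits did not change between the two readings.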
This koan is more consequential than you might think. Consider the story
of Naral Pro-Choice America and Verizon Wireless. Naral wanted to form a
text messaging group to send alerts to its members. Verizon decided not to
allow it, citing the “controversial or unsavory” things the messages might
contain. Text message alert groups for political candidates it would allow, but
not for political causes it deemed controversial. Had Naral simply wanted
telephone service or an 800 number, Verizon would have had no choice.
Telephone companies were long ago declared “common carriers.” Like
railroads, phone companies are legally prohibited from picking and choosing
customers from among those who want their services. In the bits world, there
is no difference between a text message and a wireless phone call. It’s all just
bits, traveling through the air by radio waves. But the law hasn’t caught up
to the technology. It doesn’t treat all bits the same, and the common carriage
rules for voice bits don’t apply to text message bits.
Verizon backed down in the case of Naral, but not on the principle. A phone
company can do whatever it thinks will maximize its profits in deciding whose
messages to distribute. Yet no sensible engineering distinction can be drawn
between text messages, phone calls, and any other bits traveling through the
digital airwaves.

CLAUDE SHANNON
Claude Shannon (1916–2001) is the undisputed founding figure of information
and communication theory. While working at Bell Telephone Laboratories after
the Second World War, he wrote the seminal paper, “A mathematical theory of
communication,” which foreshadowed much of the subsequent development of
digital technologies. Published in 1948, this paper gave birth to the
now-universal realization that the bit is the natural unit of information, and
to the use of the term.
Alcatel-Lucent, http:www.bell-labs.com/news/2001/february/26/shannon2_lg.jpeg.

Koan 2: Perfection Is Normal
To err is human. When books were laboriously transcribed by hand, in ancient
scriptoria and medieval monasteries, errors crept in with every copy. Computers
and networks work differently. Every copy is perfect. If you email a photograph
to a friend, the friend won’t receive a fuzzier version than the original. The
copy will be identical, down to the level of details too small for the eye
to see.
EXCLUSIVE AND RIVALROUS
Economists would say that bits, unless controlled somehow, tend to be
non-exclusive (once a few people have them, it is hard to keep them from
others) and non-rivalrous (when someone gets them from me, I don’t have any
less). In a letter he wrote about the nature of ideas, Thomas Jefferson
eloquently stated both properties: “If nature has made any one thing less
susceptible than all others of exclusive property, it is the action of the
thinking power called an idea, which an individual may exclusively possess as
long as he keeps it to himself; but the moment it is divulged, it forces itself
into the possession of every one, and the receiver cannot dispossess himself
of it. Its peculiar character, too, is that no one possesses the less, because
every other possesses the whole of it.”
Computers do fail, of course. Networks break down too. If the power goes out,
nothing works at all. So the statement that copies are normally perfect is
only relatively true. Digital copies are perfect only to the
extent that they can be communicated at all. And yes, it is possible in theory
that a single bit of a big message will arrive incorrectly. But networks don’t
just pass bits from one place to another. They check to see if the bits seem to
have been damaged in transit, and correct them or retransmit them if they
seem incorrect. As a result of these error detection and correction mecha-
nisms, the odds of an actual error—a character being wrong in an email, for
example—are so low that we would be wiser to worry instead about a meteor
hitting our computer, improbable though precision meteor strikes may be.
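As a rough illustration of the check-and-retransmit idea described above, the following Python sketch attaches a checksum to a message and resends it until a copy arrives intact. It is a simplified model rather than a description of any particular network protocol: the function names, the 4-byte CRC-32 trailer, and the noisy_channel that damages one bit on the first attempt are all assumptions made for this example.

```python
import zlib


def make_frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 checksum to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_frame(frame: bytes):
    """Return the payload if its checksum matches, otherwise None."""
    payload, received_crc = frame[:-4], frame[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") == received_crc:
        return payload
    return None


def flip_one_bit(frame: bytes, bit_index: int) -> bytes:
    """Simulate transit damage by inverting a single bit."""
    damaged = bytearray(frame)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)


def noisy_channel(frame: bytes, attempt: int) -> bytes:
    # Hypothetical channel: damages one bit on the first attempt only.
    return flip_one_bit(frame, 3) if attempt == 1 else frame


def send(payload: bytes, channel, max_attempts: int = 5):
    """Resend until a frame passes the check; return (payload, attempts)."""
    frame = make_frame(payload)
    for attempt in range(1, max_attempts + 1):
        received = check_frame(channel(frame, attempt))
        if received is not None:
            return received, attempt
    raise IOError("no undamaged copy arrived")


if __name__ == "__main__":
    message, attempts = send(b"To err is human.", noisy_channel)
    print(message, "delivered after", attempts, "attempts")
```

The receiver never sees the damaged copy as valid, because the flipped bit changes the checksum it recomputes. Real networks layer several such checks, and some codes can repair damage without a resend; this sketch shows only detection plus retransmission.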
The phenomenon of perfect copies has drastically changed the law, a story
told in Chapter 6, “Balance Toppled.” In the days when music was distributed
on audio tape, teenagers were not prosecuted for making copies of songs,
because the copies weren’t as good as the originals, and copies of copies
would be even worse. The reason that thousands of people are today receiv-
ing threats from the music and movie industries is that their copies are per-
fect—not just as good as the original, but identical to the original, so that
even the notion of “original” is meaningless. The dislocations caused by file
sharing are not over yet. The buzzword of the day is “intellectual property.”
But bits are an odd kind of property. Once I release them, everybody has
them. And if I give you my bits, I don’t have any fewer.
Koan 3: There Is Want in the Midst of Plenty
Vast as world-wide data storage is today, five years from now it will be ten
times as large. Yet the information explosion means, paradoxically, the loss
of information that is not online. One of us recently saw a new doctor at a
clinic he had been using for decades. She showed him dense charts of his
blood chemistry, data transferred from his home medical device to the clinic’s
computer—more data than any specialist could have had at her disposal five
years ago. The doctor then asked whether he had ever had a stress test and
what the test had shown. Those records should all be there, the patient
explained, in the medical file. But it was in the paper file, to which the doc-
tor did not have access. It wasn’t in the computer’s memory, and the patient’s
memory was being used as a poor substitute. The old data might as well not
have existed at all, since it wasn’t digital.
Even information that exists in digital form is useless if there are no
devices to read it. The rapid progress of storage engineering has meant that
data stored on obsolete devices effectively ceases to exist. In Chapter 3,
“Ghosts in the Machine,” we shall see how a twentieth-century update of the
eleventh-century British Domesday Book was useless by the time it was only
a sixtieth the age of the original.
Or consider search, the subject of Chapter 4, “Needles in the Haystack.” At
first, search engines such as Google and Yahoo! were interesting conven-
iences, which a few people used for special purposes. The growth of the World
Wide Web has put so much information online that search engines are for
many people the first place to look for something, before they look in books
or ask friends. In the process, appearing prominently in search results has
become a matter of life or death for businesses. We may move on to purchase
from a competitor if we can’t find the site we wanted in the first page or two
of results. We may assume something didn’t happen if we can’t find it quickly
in an online news source. If it can’t be found—and found quickly—it’s just as
though it doesn’t exist at all.
Koan 4: Processing Is Power
The speed of a computer is usually
measured by the number of basic
operations, such as additions, that
can be performed in one second. The
fastest computers available in the
early 1940s could perform about
five operations per second. The
fastest today can perform about a
trillion. Buyers of personal comput-
ers know that a machine that seems
fast today will seem slow in a year
or two.
For at least three decades, the
increase in processor speeds was
exponential. Computers became
twice as fast every couple of years.
These increases were one conse-
quence of “Moore’s Law” (see side-
bar).
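The figures in this koan can be checked with a little arithmetic. The Python sketch below counts how many doublings separate five operations per second from a trillion, and what a doubling every two years amounts to over three decades. The two speed figures come from the text; the 65-year span between them is an assumption made here, since the text gives no exact dates, and the function names are illustrative.

```python
import math

# Figures quoted in the text.
OPS_EARLY_1940S = 5      # about five operations per second
OPS_TODAY = 1e12         # about a trillion operations per second

# ASSUMPTION: roughly 65 years separate the two figures
# (early 1940s to the late 2000s); the text gives no exact dates.
ASSUMED_ELAPSED_YEARS = 65


def doublings(start: float, end: float) -> float:
    """Number of times `start` must double to reach `end`."""
    return math.log2(end / start)


def years_per_doubling(start: float, end: float, elapsed_years: float) -> float:
    """Average doubling period implied by the two speed figures."""
    return elapsed_years / doublings(start, end)


def growth_factor(years: float, doubling_period: float) -> float:
    """Total speedup after `years` if speed doubles every `doubling_period`."""
    return 2 ** (years / doubling_period)


if __name__ == "__main__":
    print("doublings:", doublings(OPS_EARLY_1940S, OPS_TODAY))
    print("years per doubling:",
          years_per_doubling(OPS_EARLY_1940S, OPS_TODAY, ASSUMED_ELAPSED_YEARS))
    print("three decades, doubling every 2 years:", growth_factor(30, 2))
```

Under that assumed time span, the quoted speeds imply a doubling somewhat more often than every two years, which is consistent with "every couple of years."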
Since 2001, processor speed has not followed Moore’s Law; in fact, processors
have hardly grown faster at all. But that doesn’t mean that computers won’t
continue to get faster. New chip designs include multiple processors on the
same chip so the work can be split up and performed in parallel. Such design
innovations promise to achieve the same effect as continued increases in raw
processor speed. And the same technology improvements that make computers
faster also make them cheaper.
MOORE’S LAW
Gordon Moore, co-founder of Intel Corporation, observed that the density of
integrated circuits seemed to double every couple of years. This observation
is referred to as “Moore’s Law.” Of course, it is not a natural law, like the
law of gravity. Instead, it is an empirical observation of the progress of
engineering and a challenge to engineers to continue their innovation. In
1965, Moore predicted that this exponential growth would continue for quite
some time. That it has continued for more than 40 years is one of the great
marvels of engineering. No other effort in history has sustained anything
like this growth rate.
The rapid increase in processing power means that inventions move out of
labs and into consumer goods very quickly. Robot vacuum cleaners and self-
parking vehicles were possible in theory a decade ago, but now they have
become economically feasible. Tasks that today seem to require uniquely
human skills are the subject of research projects in corporate or academic lab-
oratories. Face recognition and voice recognition are poised to bring us new
inventions, such as telephones that know who is calling and surveillance
cameras that don’t need humans to watch them. The power comes not just
from the bits, but from being able to do things with the bits.
Koan 5: More of the Same Can Be a Whole New Thing
Explosive growth is exponential growth—doubling at a steady rate. Imagine
earning 100% annual interest on your savings account—in 10 years, your
money would have increased more than a thousandfold, and in 20 years,
more than a millionfold. A more reasonable interest rate of 5% will hit the
same growth points, just 14 times more slowly. Epidemics initially spread
exponentially, as each infected individual infects several others.
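The interest figures above follow from the compound growth formula, and can be verified with a short Python sketch. The function names are illustrative, and interest is assumed to compound once a year.

```python
import math


def growth_factor(rate: float, years: float) -> float:
    """How many times larger a balance is after `years` at annual `rate`."""
    return (1 + rate) ** years


def doubling_time(rate: float) -> float:
    """Years needed for a balance to double at annual `rate`."""
    return math.log(2) / math.log(1 + rate)


if __name__ == "__main__":
    print("100% for 10 years:", growth_factor(1.00, 10))   # over a thousandfold
    print("100% for 20 years:", growth_factor(1.00, 20))   # over a millionfold
    print("doubling time at 100%:", doubling_time(1.00))
    print("doubling time at 5%:", doubling_time(0.05))     # about 14 years
```

At 100% interest the balance doubles every year; at 5% it doubles roughly every 14 years, which is where the "14 times more slowly" comes from.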
When something grows exponentially, for a long time it may seem not to
be changing at all. If we don’t watch it steadily, it will seem as though some-
thing discontinuous and radical occurred while we weren’t looking.
That is why epidemics at first go unnoticed, no matter how catastrophic
they may be when full-blown. Imagine one sick person infecting two healthy
people, and the next day each of those two infects two others, and the next
day after that each of those four infects two others, and so on. The number
of newly infected each day grows from two to four to eight. In a week, 128
people come down with the disease in a single day, and twice that number
are now sick, but in a population of ten million, no one notices. Even after
two weeks, barely three people in a thousand are sick. But after another week,
40% of the population is sick, and society collapses.
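The numbers in this hypothetical epidemic can be reproduced with a few lines of Python. The sketch assumes the simplest reading of the scenario: one person is sick on day 0, the number of new cases doubles each day starting from two on day 1, and nobody recovers. The function names and the POPULATION constant are illustrative.

```python
POPULATION = 10_000_000  # "a population of ten million"


def newly_infected(day: int) -> int:
    """New cases on a given day: 2 on day 1, 4 on day 2, 8 on day 3, ..."""
    return 2 ** day


def total_sick(day: int) -> int:
    """Everyone infected so far: 1 + 2 + 4 + ... + 2**day."""
    return 2 ** (day + 1) - 1


def fraction_sick(day: int) -> float:
    """Share of the population that is sick by the end of `day`."""
    return total_sick(day) / POPULATION


if __name__ == "__main__":
    for day in (7, 14, 21):
        print(day, newly_infected(day), total_sick(day), fraction_sick(day))
```

Day 7 gives 128 new cases and 255 sick in total, day 14 gives about three sick per thousand, and day 21 gives a little over 40% of the population.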
Exponential growth is actually smooth and steady; it just takes very little
time to pass from unnoticeable change to highly visible. Exponential growth
of anything can suddenly make the world look utterly different than it had
been. When that threshold is passed, changes that are “just” quantitative can
look qualitative.
Another way of looking at the apparent abruptness of exponential
growth—its explosive force—is to think about how little lead time we have to
respond to it. Our hypothetical epidemic took three weeks to overwhelm the
population. At what point was it only half as devastating? The answer is
not “a week and a half.” The answer is on the next to last day. Suppose it took
a week to develop and administer a vaccine. Then noticing the epidemic after
a week and a half would have left ample time to prevent the disaster. But that
would have required understanding that there was an epidemic when only
2,000 people out of ten million were infected.
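The lead-time argument can be checked the same way. This Python sketch assumes the same doubling model as the hypothetical epidemic (one person sick on day 0, new cases doubling daily, no recoveries) and treats day 21 as the end of the third week. It uses day 10 as a stand-in for "a week and a half." All names are illustrative.

```python
FINAL_DAY = 21  # three weeks, when the population is overwhelmed


def total_sick(day: int) -> int:
    """Everyone infected so far: 1 + 2 + 4 + ... + 2**day."""
    return 2 ** (day + 1) - 1


def share_of_final(day: int, final_day: int = FINAL_DAY) -> float:
    """Cases on `day` as a fraction of the cases on the final day."""
    return total_sick(day) / total_sick(final_day)


if __name__ == "__main__":
    # Half the final toll is reached only on the next-to-last day.
    print("day 20 share of final toll:", share_of_final(20))
    # Around a week and a half in, only about 2,000 people are sick.
    print("sick on day 10:", total_sick(10))
    print("days left to act:", FINAL_DAY - 10)
```

On day 10 there are still eleven days left, more than the week a vaccine would take, but the case count is so small that few would recognize an epidemic at all.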
The information story is full of examples of unperceived changes followed
by dislocating explosions. Those with the foresight to notice the explosion
just a little earlier than everyone else can reap huge benefits. Those who move
a little too slowly may be overwhelmed by the time they try to respond. Take
the case of digital photography.
In 1983, Christmas shoppers could buy digital cameras to hook up to their
IBM PC and Apple II home computers. The potential was there for anyone to
see; it was not hidden in secret corporate laboratories. But digital photogra-
phy did not take off. Economically and practically, it couldn’t. Cameras were
too bulky to put in your pocket, and digital memories were too small to hold
many images. Even 14 years later, film photography was still a robust indus-
try. In early 1997, Kodak stock hit a record price, with a 22% increase in
quarterly profit, “fueled by healthy film and paper sales…[and] its motion pic-
ture film business,” according to a news report. The company raised its divi-
dend for the first time in eight years. But by 2007, digital memories had
become huge, digital processors had become fast and compact, and both were
cheap. As a result, cameras had become little computers. The company that
was once synonymous with photography was a shadow of its former self.
Kodak announced that its employee force would be cut to 30,000, barely a
fifth the size it was during the good times of the late 1980s. The move would
cost the company more than $3 billion. Moore’s Law moved faster than
Kodak did.
In the rapidly changing world of bits, it pays to notice even small changes,
and to do something about them.
Koan 6: Nothing Goes Away
2,000,000,000,000,000,000,000.
That is the number of bits that were created and stored away in 2007,
according to one industry estimate. The capacity of disks has followed its own
version of Moore’s Law, doubling every two or three years. For the time being
at least, that makes it possible to save everything, though recent projections
suggest that by 2011, we may be producing more bits than we can store.
In financial industries, federal laws now require massive data retention, to
assist in audits and investigations of corruption. In many other businesses,
economic competitiveness drives companies to save everything they collect
and to seek out new data to retain. Wal-Mart stores have tens of millions of
transactions every day, and every one of them is saved—date, time, item,
store, price, who made the purchase, and how—credit, debit, cash, or gift card.
Such data is so valuable to planning the supply chain that stores will pay
money to get more of it from their customers. That is really what supermarket
loyalty cards provide—shoppers are supposed to think that the store is grant-
ing them a discount in appreciation for their steady business, but actually the
store is paying them for information about their buying patterns. We might
better think of a privacy tax—we pay the regular price unless we want to keep
information about our food, alcohol, and pharmaceutical purchases from the
market; to keep our habits to ourselves, we pay extra.
The massive databases challenge our expectations about what will happen
to the data about us. Take something as simple as a stay in a hotel. When you
check in, you are given a keycard, not a mechanical key. Because the key-
cards can be deactivated instantly, there is no longer any great risk associ-
ated with losing your key, as long as you report it missing quickly. On the
other hand, the hotel now has a record, accurate to the second, of every time
you entered your room, used the gym or the business center, or went in the
back door after-hours. The same database could identify every cocktail and
steak you charged to the room, which other rooms you phoned and when, and
the brands of tampons and laxatives you charged at the hotel’s gift shop. This
data might be merged with billions like it, analyzed, and transferred to the
parent company, which owns restaurants and fitness centers as well as hotels.
It might also be lost, or stolen, or subpoenaed in a court case.
The ease of storing information has meant asking for more of it. Birth cer-
tificates used to include just the information about the child’s and parents’
names, birthplaces, and birthdates, plus the parents’ occupations. Now the
electronic birth record includes how much the mother drank and smoked dur-
ing her pregnancy, whether she had genital herpes or a variety of other med-
ical conditions, and both parents’ social security numbers. Opportunities for
research are plentiful, and so are opportunities for mischief and catastrophic
accidental data loss.
And the data will all be kept forever,
unless there are policies to get rid of it. For
the time being at least, the data sticks
around. And because databases are inten-
tionally duplicated—backed up for security,
or shared while pursuing useful analyses—it is far from certain that data can
ever be permanently expunged, even if we wish that to happen. The Internet
consists of millions of interconnected computers; once data gets out, there is
no getting it back. Victims of identity theft experience daily the distress of
having to remove misinformation from the record. It seems never to go away.
Koan 7: Bits Move Faster Than Thought
The Internet existed before there were personal computers. It predates the
fiber optic communication cables that now hold it together. When it started
around 1970, the ARPANET, as it was called, was designed to connect a hand-
ful of university and military computers. No one imagined a network con-
necting tens of millions of computers and shipping information around the
world in the blink of an eye. Along with processing power and storage capac-
ity, networking has experienced its own exponential growth, in number of
computers interconnected and the rate at which data can be shipped over
long distances, from space to earth and from service providers into private
homes.
The Internet has caused drastic shifts in business practice. Customer ser-
vice calls are outsourced to India today not just because labor costs are low
there. Labor costs have always been low in India, but international telephone
calls used to be expensive. Calls about airline reservations and lingerie
returns are answered in India today because it now takes almost no time and
costs almost no money to send to India the bits representing your voice. The
same principle holds for professional services. When you are X-rayed at your
local hospital in Iowa, the radiologist who reads the X-ray may be half a
world away. The digital X-ray moves back and forth across the world faster
than a physical X-ray could be moved between floors of the hospital. When
you place an order at a drive-through station at a fast food restaurant, the
person taking the order may be in another state. She keys the order so it
appears on a computer screen in the kitchen, a few feet from your car, and
you are none the wiser. Such developments are causing massive changes to
the global economy, as industries figure out how to keep their workers in one
place and ship their business as bits.
In the bits world, in which messages flow instantaneously, it sometimes
seems that distance doesn’t matter at all. The consequences can be startling.
One of us, while dean of an American college, witnessed the shock of a father
receiving condolences on his daughter’s death. The story was sad but famil-
iar, except that this version had a startling twist. Father and daughter were
both in Massachusetts, but the condolences arrived from half-way around the
world before the father had learned that his daughter had died. News, even
the most intimate news, travels fast in the bits world, once it gets out. In the
fall of 2007, when the government of Myanmar suppressed protests by
Buddhist monks, television stations around the world showed video clips
taken by cell phone, probably changing the posture of the U.S. government.
The Myanmar rebellion also shows the power of information control when
information is just bits. The story dropped off the front page of the news-
papers once the government took total control of the Internet and cell phone
towers.
The instantaneous communication of massive amounts of information has
created the misimpression that there is a place called “Cyberspace,” a land
without frontiers where all the world’s people can be interconnected as
though they were residents of the same small town. That concept has been
decisively refuted by the actions of the world’s courts. National and state bor-
ders still count, and count a lot. If a book is bought online in England, the
publisher and author are subject to British libel laws rather than those of the
homeland of the author or publisher. Under British law, defendants have to
prove their innocence; in the U.S., plaintiffs have to prove the guilt of the
defendants. An ugly downside to the explosion of digital information and its
movement around the world is that information may become less available
even where it would be legally protected (we return to this subject in Chapter
7, “You Can’t Say That on the Internet”). Publishers fear “libel tourism”—
lawsuits in countries with weak protection of free speech, designed to intim-
idate authors in more open societies. It may prove simpler to publish only a
single version of a work for sale everywhere, an edition omitting information
that might somewhere excite a lawsuit.
Good and Ill, Promise and Peril
The digital explosion has thrown a lot of things up for grabs and we all have
a stake in who does the grabbing. The way the technology is offered to us, the
way we use it, and the consequences of the vast dissemination of digital infor-
mation are matters not in the hands of technology experts alone. Governments
and corporations and universities and other social institutions have a say. And
ordinary citizens, to whom these institutions are accountable, can influence
their decisions. Important choices are made every year, in government offices
and legislatures, in town meetings and police stations, in the corporate offices
of banks and insurance companies, in the purchasing departments of chain
stores and pharmacies. We all can help raise the level of discourse and under-
standing. We can all help ensure that technical decisions are taken in a con-
text of ethical standards.
We offer two basic morals. The first is that information technology is inher-
ently neither good nor bad—it can be used for good or ill, to free us or to
shackle us. Second, new technology brings social change, and change comes
with both risks and opportunities. All of us, and all of our public agencies and
private institutions, have a say in whether technology will be used for good or
ill and whether we will fall prey to its risks or prosper from the opportunities
it creates.
Technology Is Neither Good nor Bad
Any technology can be used for good or ill. Nuclear reactions create electric
power and weapons of mass destruction. The same encryption technology that
makes it possible for you to email your friends with confidence that no eaves-
dropper will be able to decipher your message also makes it possible for ter-
rorists to plan their attacks undiscovered. The same Internet technology that
facilitates the widespread distribution of educational works to impoverished
students in remote locations also enables massive copyright infringement. The
photomanipulation tools that enhance your snapshots are used by child
pornographers to escape prosecution.
The key to managing the ethical and moral consequences of technology
while nourishing economic growth is to regulate the use of technology with-
out banning or restricting its creation.
It is a marvel that anyone with a smart cell phone can use a search engine
to get answers to obscure questions almost anywhere. Society is rapidly
being freed from the old limitations of geography and status in accessing
information.
The same technologies can be used to monitor individuals, to track their
behaviors, and to control what information they receive. Search engines need
not return unbiased results. Many users of web browsers do not realize that
the sites they visit may archive their actions. Technologically, there could be
a record of exactly what you have been accessing and when, as you browse a
library or bookstore catalog, a site selling pharmaceuticals, or a service offer-
ing advice on contraception or drug overdose. There are vast opportunities to
use this information for invasive but
relatively benign purposes, such as
marketing, and also for more ques-
tionable purposes, such as blacklist-
ing and blackmail. Few regulations
mandate disclosure that the informa-
tion is being collected, or restrict the
use to which the data can be put.
Recent federal laws, such as the USA
PATRIOT Act, give government
agencies sweeping authority to sift
through mostly innocent data look-
ing for signs of “suspicious activity”
by potential terrorists—and to notice
lesser transgressions, such as
Governor Spitzer’s, in the process.
Although the World Wide Web now
reaches into millions of households,
the rules and regulations governing
it are not much better than those of
a lawless frontier town of the old
West.
New Technologies Bring Both Risks and Opportunities
The same large disk drives that enable anyone with a home computer to ana-
lyze millions of baseball statistics also allow anyone with access to confiden-
tial information to jeopardize its security. Access to aerial maps via the
Internet makes it possible for criminals to plan burglaries of upscale houses,
but technologically sophisticated police know that records of such queries can
also be used to solve crimes.
Even the most un-electronic livelihoods are changing because of instant
worldwide information flows. There are no more pool hustlers today—jour-
neymen wizards of the cue, who could turn up in pool halls posing as out-
of-town bumpkins just looking to bet on a friendly game, and walk away
with big winnings. Now when any newcomer comes to town and cleans up,
his name and face are on AZBilliards.com instantly for pool players every-
where to see.
BLACKLISTS AND WHITELISTS
In the bits world, providers of ser-
vices can create blacklists or
whitelists. No one on a blacklist
can use the service, but everyone
else can. For example, an auction-
eer might put people on a blacklist
if they did not pay for their pur-
chases. But service providers who
have access to other information
about visitors to their web sites
might use undisclosed and far more
sweeping criteria for blacklisting.
A whitelist is a list of parties to
whom services are available, with
everyone else excluded. For exam-
ple, a newspaper may whitelist its
home delivery subscribers for
access to its online content, allow-
ing others onto the whitelist only
after they have paid.
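The difference between the two kinds of list comes down to what happens to a party who appears on neither. A minimal Python sketch, with made-up party names and function names chosen only for illustration:

```python
def allowed_by_blacklist(party: str, blacklist: set) -> bool:
    """Blacklist rule: everyone is served except the parties listed."""
    return party not in blacklist


def allowed_by_whitelist(party: str, whitelist: set) -> bool:
    """Whitelist rule: no one is served except the parties listed."""
    return party in whitelist


# Hypothetical data, loosely following the auctioneer and newspaper examples.
auction_blacklist = {"bidder_who_never_paid"}
newspaper_whitelist = {"home_delivery_subscriber"}

if __name__ == "__main__":
    stranger = "first_time_visitor"
    print(allowed_by_blacklist(stranger, auction_blacklist))    # served by default
    print(allowed_by_whitelist(stranger, newspaper_whitelist))  # excluded by default
```

An unknown visitor is served under a blacklist and turned away under a whitelist, which is why the choice of list type sets the default for everyone the provider knows nothing about.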
Social networking sites such as facebook.com, myspace.com, and
match.com have made their founders quite wealthy. They have also given
birth to many thousands of new friendships, marriages, and other ventures.
But those pretending to be your online friends may not be as they seem.
Social networking has made it easier for predators to take advantage of the
naïve, the lonely, the elderly, and the young.
In 2006, a 13-year-old girl, Megan Meier of Dardenne Prairie, Missouri,
made friends online with a 16-year-old boy named “Josh.” When “Josh”
turned against her, writing “You are a bad person and everybody hates you….
The world would be a better place without you,” Megan committed suicide. Yet
Josh did not exist. Josh was a MySpace creation—but of whom? An early
police report stated that the mother of another girl in the neighborhood
acknowledged “instigating” and monitoring the account. That woman’s lawyer
later blamed someone who worked for his client. Whoever may have sent the
final message to Megan, prosecutors are having a hard time identifying any
law that might have been broken. “I can start MySpace on every single one of
you and spread rumors about every single one of you,” said Megan’s mother,
“and what’s going to happen to me? Nothing.”
Along with its dazzling riches and vast horizons, the Internet has created
new manifestations of human evil—some of which, including the cyber-
harassment Megan Meier suffered, may not be criminal under existing law. In
a nation deeply committed to free expression as a legal right, which Internet
evils should be crimes, and which are just wrong?
Vast data networks have made it possible to move work to where the
people are, not people to the work. The results are enormous business oppor-
tunities for entrepreneurs who take advantage of these technologies and new
enterprises around the globe, and also the other side of the coin: jobs lost to
outsourcing.
The difference every one of us can make, to our workplace or to another
institution, can be to ask a question at the right time about the risks of
some new technological innovation—or to point out the possibility of doing
something in the near future that a few years ago would have been utterly
impossible.
We begin our tour of the digital landscape with a look at our privacy, a social
structure the explosion has left in shambles. While we enjoy the benefits of
ubiquitous information, we also sense the loss of the shelter that privacy once
gave us. And we don’t know what we want to build in its place. The good and
ill of technology, and its promise and peril, are all thrown together when
information about us is spread everywhere. In the post-privacy world, we
stand exposed to the glare of noonday sunlight—and sometimes it feels
strangely pleasant.
CHAPTER 2
Naked in the Sunlight
Privacy Lost, Privacy Abandoned
1984 Is Here, and We Like It
On July 7, 2005, London was shaken as suicide bombers detonated four
explosions, three on subways and one on a double-decker bus. The attack on
the transit system was carefully timed to occur at rush hour, maximizing its
destructive impact. 52 people died and 700 more were injured.
Security in London had already been tight. The city was hosting the G8
Summit, and the trial of fundamentalist cleric Abu Hamza al-Masri had just
begun. Hundreds of thousands of surveillance cameras hadn’t deterred the
terrorist act, but the perpetrators were caught on camera. Their pictures were
sent around the world instantly. Working from 80,000 seized tapes, police
were able to reconstruct a reconnaissance trip the bombers had made two
weeks earlier.
George Orwell’s 1984 was published in 1949. Over the subsequent years,
the book became synonymous with a world of permanent surveillance, a soci-
ety devoid of both privacy and freedom:
…there seemed to be no color in anything except the posters that were
plastered everywhere. The black-mustachio’d face gazed down from
every commanding corner. There was one on the house front immedi-
ately opposite. BIG BROTHER IS WATCHING YOU …
The real 1984 came and went nearly a quarter century ago. Today, Big
Brother’s two-way telescreens would be amateurish toys. Orwell’s imagined
02_0137135599_ch02.qxd 4/16/08 1:21 PM Page 19
London had cameras everywhere. His actual city now has at least half a mil-
lion. Across the UK, there is one surveillance camera for every dozen people.
The average Londoner is photographed hundreds of times a day by electronic
eyes on the sides of buildings and on utility poles.
Yet there is much about the digital world that Orwell did not imagine. He
did not anticipate that cameras are far from the most pervasive of today’s
tracking technologies. There are dozens of other kinds of data sources, and
the data they produce is retained and analyzed. Cell phone companies know
not only what numbers you call, but where you have carried your phone.
Credit card companies know not only where you spent your money, but what
you spent it on. Your friendly bank keeps electronic records of your transac-
tions not only to keep your balance right, but because it has to tell the gov-
ernment if you make huge withdrawals. The digital explosion has scattered
the bits of our lives everywhere: records of the clothes we wear, the soaps we
wash with, the streets we walk, and the cars we drive and where we drive
them. And although Orwell’s Big Brother had his cameras, he didn’t have
search engines to piece the bits together, to find the needles in the haystacks.
Wherever we go, we leave digital footprints, while computers of staggering
capacity reconstruct our movements from the tracks. Computers re-assemble
the clues to form a comprehensive image of who we are, what we do, where
we are doing it, and whom we are discussing it with.
Perhaps none of this would have surprised Orwell. Had he known about
electronic miniaturization, he might have guessed that we would develop an
astonishing array of tracking technologies. Yet there is something more fun-
damental that distinguishes the world of 1984 from the actual world of today.
We have fallen in love with this always-on world. We accept our loss of pri-
vacy in exchange for efficiency, convenience, and small price discounts.
According to a 2007 Pew/Internet Project report, “60% of Internet users say
they are not worried about how much information is available about them
online.” Many of us publish and broadcast the most intimate moments of our
lives for all the world to see, even when no one requires or even asks us to
do so. 55% of teenagers and 20% of adults have created profiles on social
networking web sites. A third of the teens with profiles, and half the adults,
place no restrictions on who can see them.
In Orwell’s imagined London, only O’Brien and other members of the Inner
Party could escape the gaze of the telescreen. For the rest, the constant gaze
was a source of angst and anxiety. Today, we willingly accept the gaze. We
either don’t think about it, don’t know about it, or feel helpless to avoid it
except by becoming hermits. We may even judge its benefits to outweigh its
risks. In Orwell’s imagined London, like Stalin’s actual Moscow, citizens spied
on their fellow citizens. Today, we can all be Little Brothers, using our search
engines to check up on our children,
our spouses, our neighbors, our col-
leagues, our enemies, and our
friends. More than half of all adult
Internet users have done exactly
that.
The explosive growth in digital
technologies has radically altered
our expectations about what will be
private and shifted our thinking
about what should be private.
Ironically, the notion of privacy has
become fuzzier at the same time as
the secrecy-enhancing technology of
encryption has become widespread.
Indeed, it is remarkable that we no
longer blink at intrusions that a decade ago would have seemed shocking.
Unlike the story of secrecy, there was no single technological event that
caused the change, no privacy-shattering breakthrough—only a steady
advance on several technological fronts that ultimately passed a tipping point.
Many devices got cheaper, better, and smaller. Once they became useful
consumer goods, we stopped worrying about their uses as surveillance
devices. For example, if the police were the only ones who had cameras in
their cell phones, we would be alarmed. But as long as we have them too, so
we can send our friends funny pictures from parties, we don’t mind so much
that others are taking pictures of us. The social evolution that was supported
by consumer technologies in turn made us more accepting of new enabling
technologies; the social and technological evolutions have proceeded hand in
hand. Meanwhile, international terrorism has made the public in most democ-
racies more sympathetic to intrusive measures intended to protect our secu-
rity. With corporations trying to make money from us and the government
trying to protect us, civil libertarians are a weak third voice when they warn
that we may not want others to know so much about us.

PUBLIC ORGANIZATIONS INVOLVED IN DEFENDING PRIVACY
Existing organizations have focused on privacy issues in recent years, and new
ones have sprung up. In the U.S., important forces are the American Civil
Liberties Union (ACLU, www.aclu.org), the Electronic Privacy Information
Center (EPIC, epic.org), the Center for Democracy and Technology (CDT,
www.cdt.org), and the Electronic Frontier Foundation (www.eff.org).

So we tell the story of privacy in stages. First, we detail the enabling tech-
nologies, the devices and computational processes that have made it easy and
convenient for us to lose our privacy—some of them familiar technologies,
and some a bit more mysterious. We then turn to an analysis of how we have
lost our privacy, or simply abandoned it. Many privacy-shattering things
have happened to us, some with our cooperation and some not. As a result,
the sense of personal privacy is very different today than it was two decades
ago. Next, we discuss the social changes that have occurred—cultural shifts
that were facilitated by the technological diffusion, which in turn made new
technologies easier to deploy. And finally we turn to the big question: What
does privacy even mean in the digitally exploded world? Is there any hope of
keeping anything private when everything is bits, and the bits are stored,
copied, and moved around the world in an instant? And if we can’t—or
won’t—keep our personal information to ourselves anymore, how can we
make ourselves less vulnerable to the downsides of living in such an exposed
world? Standing naked in the sunlight, is it still possible to protect ourselves
against ills and evils from which our privacy used to protect us?
Footprints and Fingerprints
As we do our daily business and lead our private lives, we leave footprints
and fingerprints. We can see our footprints in mud on the floor and in the
sand and snow outdoors. We would not be surprised that anyone who went
to the trouble to match our shoes to our footprints could determine, or guess,
where we had been. Fingerprints are
different. It doesn’t even occur to us
that we are leaving them as we open
doors and drink out of tumblers.
Those who have guilty consciences
may think about fingerprints and
worry about where they are leaving
them, but the rest of us don’t.
In the digital world, we all leave both electronic footprints and electronic
fingerprints—data trails we leave intentionally, and data trails of which we
are unaware or unconscious. The identifying data may be useful for forensic
purposes. Because most of us don’t consider ourselves criminals, however, we
tend not to worry about that. What we don’t think about is that the various
small smudges we leave on the digital landscape may be useful to someone
else—someone who wants to use the data we left behind to make money or to
get something from us. It is therefore important to understand how and where
we leave these digital footprints and fingerprints.

THE UNWANTED GAZE
The Unwanted Gaze by Jeffrey Rosen (Vintage, 2001) details many ways in which
the legal system has contributed to our loss of privacy.

Smile While We Snap!
Big Brother had his legions of cameras, and the City of London has theirs
today. But for sheer photographic pervasiveness, nothing beats the cameras
in the cell phones in the hands of the world’s teenagers. Consider the alleged
misjudgment of Jeffrey Berman. In early December 2007, a man about
60 years old committed a series of assaults on the Boston public transit
system, groping girls and exposing himself. After one of the assaults, a victim
took out her cell phone. Click! Within hours, a good head shot was up on the
Web and was shown on all the Boston area television stations. Within a day,
Berman was under arrest and charged with several crimes. “Obviously we,
from time to time, have plainclothes officers on the trolley, but that’s a very
difficult job to do,” said the chief of the Transit Police. “The fact that this girl
had the wherewithal to snap a picture to identify him was invaluable.”
That is, it would seem, a story with a happy ending, for the victim at least.
But the massive dissemination of cheap cameras coupled with universal
access to the Web also enables a kind of vigilante justice—a ubiquitous Little-
Brotherism, in which we can all be detectives, judges, and corrections offi-
cers. Mr. Berman claims he is innocent; perhaps the speed at which the
teenager’s snapshot was disseminated unfairly created a presumption of his
guilt. Bloggers can bring global disgrace to ordinary citizens.
In June 2005, a woman allowed her dog to relieve himself on a Korean
subway, and subsequently refused to clean up his mess, despite offers from
others to help. The incident was cap-
tured by a fellow passenger and
posted online. She soon became
known as “gae-ttong-nyue” (Korean
for “puppy poo girl”). She was iden-
tified along with her family, was
shamed, and quit school. There is
now a Wikipedia entry about the
incident. Before the digital explo-
sion—before bits made it possible to
convey information instantaneously,
everywhere—her actions would have been embarrassing and would have been
known only to those who were there at the time. It is unlikely that the story
would have made it around the world, or that it would have achieved such
notoriety and permanence.
Still, in these cases, at least someone thought someone did something
wrong. The camera just happened to be in the right hands at just the right
moment. But looking at images on the Web is now a leisure activity that any-
one can do at any time, anywhere in the world. Using Google Street View, you
can sit in a café in Tajikistan and identify a car that was parked in my drive-
way when Google’s camera came by (perhaps months ago). From Seoul, you
can see what’s happening right now, updated every few seconds, in Piccadilly
Circus or on the strip in Las Vegas. These views were always available to the
public, but cameras plus the Web changed the meaning of “public.”
There are many free webcam sites,
at which you can watch what’s hap-
pening right now at places all over
the world. Here are a few:
www.camvista.com
www.earthcam.com
www.webcamworld.com
www.webworldcam.com
And an electronic camera is not just a camera. Harry Potter and the
Deathly Hallows is, as far as anyone knows, the last book in the Harry Potter
series. Its arrival was eagerly awaited, with lines of anxious Harry fans
stretching around the block at bookstores everywhere. One fan got a pre-
release copy, painstakingly photographed every page, and posted the entire
book online before the official release. A labor of love, no doubt, but a bla-
tant copyright violation as well. He doubtless figured he was just posting the
pixels, which could not be traced back to him. If that was his presumption,
he was wrong. His digital fingerprints were all over the images.
Digital cameras encode metadata along with the image. This data, stored in
the Exchangeable Image File Format (EXIF), includes camera settings
(shutter speed, aperture, compression, make, model, orientation), date and
time, and, in the case of our Harry Potter fan, the make, model, and serial
number of his camera (a Canon Rebel 350D, serial number 560151117). If he
registered his camera, bought it with a credit card, or sent it in for service,
his identity could be known as well.
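The tracing step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it does not parse a real image file, the dictionary stands in for what an EXIF reader would report, and the registration table, the field names, and the owner entry are hypothetical. The camera model and serial number are the ones quoted above.

```python
# Hypothetical stand-in for the metadata an EXIF reader would extract from a photo.
photo_metadata = {
    "Make": "Canon",
    "Model": "Rebel 350D",
    "BodySerialNumber": "560151117",
}

# Hypothetical registration records, keyed by (model, serial number).
registrations = {
    ("Rebel 350D", "560151117"): "registered owner (name on file)",
}


def trace_owner(metadata, records):
    """Return the registered owner of the camera that took a photo, or None."""
    key = (metadata.get("Model"), metadata.get("BodySerialNumber"))
    return records.get(key)


owner = trace_owner(photo_metadata, registrations)
```

The point of the sketch is that no image analysis is needed: the serial number is simply a key that can be looked up in any record that also contains it.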
Knowing Where You Are
Global Positioning Systems (GPSs) have improved the marital lives of countless
males too stubborn to ask directions. Put a Garmin or a TomTom in a
car, and it will listen to precisely timed signals from satellites reporting their
positions in space. The GPS calculates its own location from the satellites’
locations and the times their signals are received. The 24 satellites spinning
12,500 miles above the earth enable your car to locate itself within 25 feet,
at a price that makes these systems popular birthday presents.
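The calculation can be illustrated with a simplified sketch in Python. It is not how a real receiver is implemented: it assumes a flat, two-dimensional world, three transmitters, and perfectly synchronized clocks, whereas an actual GPS unit works in three dimensions and also has to correct for its own clock error. The function name and the coordinates are invented for the example.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # meters per second


def locate(transmitters, travel_times):
    """Estimate (x, y) from three transmitter positions and signal travel times."""
    (x1, y1), (x2, y2), (x3, y3) = transmitters
    # Distance to each transmitter is travel time multiplied by signal speed.
    d1, d2, d3 = (SPEED_OF_LIGHT * t for t in travel_times)
    # Subtracting the circle equations pairwise leaves two linear equations.
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("transmitters must not lie on one straight line")
    # Solve the 2x2 system by Cramer's rule.
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)


# Hypothetical layout, coordinates in meters.
transmitters = [(0.0, 0.0), (20_000_000.0, 0.0), (0.0, 20_000_000.0)]
receiver = (3_000_000.0, 4_000_000.0)
travel_times = [
    math.hypot(receiver[0] - x, receiver[1] - y) / SPEED_OF_LIGHT
    for x, y in transmitters
]
estimate = locate(transmitters, travel_times)
```

The receiver never transmits anything in this scheme; it only listens and does arithmetic, which is why a GPS unit by itself reveals your location to no one but you.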
If you carry a GPS-enabled cell phone, your friends can find you, if that
is what you want. If your GPS-enabled rental car has a radio transmitter, you
can be found whether you want it or not. In 2004, Ron Lee rented a car from
Payless in San Francisco. He headed east to Las Vegas, then back to Los
Angeles, and finally home. He was expecting to pay $150 for his little vaca-
tion, but Payless made him pay more—$1,400, to be precise. Mr. Lee forgot to
read the fine print in his rental contract. He had not gone too far; his con-
tract was for unlimited mileage. He had missed the fine print that said, “Don’t
leave California.” When he went out of state, the unlimited mileage clause
was invalidated. The fine print said that Payless would charge him $1 per
Nevada mile, and that is exactly what the company did. They knew where he
was, every minute he was on the road.
A GPS will locate you anywhere on earth; that is why mountain climbers
carry them. They will locate you not just on the map but in three dimensions,
telling you how high up the mountain you are. But even an ordinary cell
phone will serve as a rudimentary positioning system. If you are traveling in
settled territory—any place where you can get cell phone coverage—the sig-
nals from the cell phone towers can be used to locate you. That is how Tanya
Rider was found (see Chapter 1 for details). The location is not as precise as
that supplied by a GPS—only within ten city blocks or so—but the fact that it
is possible at all means that photos can be stamped with identifying informa-
tion about where they were shot, as well as when and with what camera.
Knowing Even Where Your Shoes Are
A Radio Frequency Identification tag—RFID, for short—can be read from a
distance of a few feet. Radio Frequency Identification is like a more elaborate
version of the familiar bar codes that identify products. Bar codes typically
identify what kind of thing an item is—the make and model, as it were.
Because RFID tags have the capacity for much larger numbers, they can pro-
vide a unique serial number for each item: not just “Coke, 12 oz. can” but
“Coke can #12345123514002.” And because RFID data is transferred by radio
waves rather than visible light, the tags need not be visible to be read, and
the sensor need not be visible to do the reading.
RFIDs are silicon chips, typically embedded in plastic. They can be used to
tag almost anything (see Figure 2.1). “Prox cards,” which you wave near a
sensor to open a door, are RFID tags; a few bits of information identifying
you are transmitted from the card to the sensor. Mobil’s “Speedpass” is a lit-
tle RFID on a keychain; wave it near a gas pump and the pump knows whom
to charge for the gasoline. For a decade, cattle have had RFIDs implanted in
their flesh, so individual animals can be tracked. Modern dairy farms log the
milk production of individual cows, automatically relating the cow’s identity
to its daily milk output. Pets are commonly RFID-tagged so they can be
reunited with their owners if the animals go missing for some reason. The
possibility of tagging humans is obvious, and has been proposed for certain
high-security applications, such as controlling access to nuclear plants.
But the interesting part of the RFID story is more mundane—putting tags
in shoes, for example. RFID can be the basis for powerful inventory tracking
systems.
RFID tags are simple devices.
They store a few dozen bits of infor-
mation, usually unique to a particu-
lar tag. Most are passive devices,
with no batteries, and are quite
small. The RFID includes a tiny elec-
tronic chip and a small coil, which
acts as a two-way antenna. A weak current flows through the coil when the RFID
passes through an electromagnetic field—for example, from a scanner in the
frame of a store, under the carpet, or in someone’s hand. This feeble current
is just strong enough to power the chip and induce it to transmit the
identifying information. Because RFIDs are tiny and require no connected power
source, they are easily hidden. We see them often as labels affixed to
products; the one in Figure 2.1 was between the pages of a book bought from a
bookstore. They can be almost undetectable.

SPYCHIPS
This aptly named book by Katherine Albrecht and Liz McIntyre (Plume, 2006)
includes many stories of actual and proposed RFID uses by consumer goods
manufacturers and retailers.

FIGURE 2.1 An RFID found between the pages of a book. A bookstore receiving a
box of RFID-tagged books can check the incoming shipment against the order
without opening the carton. If the books and shelves are scanned during stocking,
the cash register can identify the section of the store from which each purchased
copy was sold.
RFIDs are generally used to improve record-keeping, not for snooping.
Manufacturers and merchants want to get more information, more reliably,
so they naturally think of tagging merchandise. But only a little imagination
is required to come up with some disturbing scenarios. Suppose, for example,
that you buy a pair of red shoes at a chain store in New York City, and the
shoes have an embedded RFID. If you pay with a credit card, the store knows
your name, and a good deal more about you from your purchasing history. If
you wear those shoes when you walk into a branch store in Los Angeles a
month later, and that branch has an RFID reader under the rug at the
entrance, the clerk could greet you by name. She might offer you a scarf to
match the shoes—or to match anything else you bought recently from any
other branch of the store. On the other hand, the store might know that you
have a habit of returning almost everything you buy—in that case, you might
find yourself having trouble finding anyone to wait on you!
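The red-shoes scenario amounts to joining two sets of records on a shared key. The Python sketch below shows how little is required. Every tag number, card label, and name in it is invented for illustration, and it describes no actual retailer's system.

```python
# Hypothetical records a chain might already hold from a credit card sale.
purchases = [
    {"tag_id": "SHOE-0001", "item": "red shoes", "card": "card-A", "branch": "New York"},
]
cardholders = {"card-A": "Pat Customer"}


def identify_visitor(scanned_tag_ids, purchases, cardholders):
    """Map tag IDs read at a store entrance to the names of past purchasers."""
    by_tag = {purchase["tag_id"]: purchase for purchase in purchases}
    names = set()
    for tag_id in scanned_tag_ids:
        purchase = by_tag.get(tag_id)
        if purchase is not None:  # tags the chain did not sell are simply ignored
            names.add(cardholders.get(purchase["card"], "unknown"))
    return names


# A reader under the rug in another branch picks up two tags.
recognized = identify_visitor(["SHOE-0001", "SOMEONE-ELSES-TAG"], purchases, cardholders)
```

Because each tag carries a unique serial number rather than a product type, the lookup identifies one particular pair of shoes and therefore, in all likelihood, one particular person.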
The technology is there to do it. We know of no store that has gone quite
this far, but in September 2007, the Galeria Kaufhof in Essen, Germany
equipped the dressing rooms in the men’s clothing department with RFID
readers. When a customer tries on garments, a screen informs him of avail-
able sizes and colors. The system may be improved to offer suggestions about
accessories. The store keeps track of what items are tried on together and
what combinations turn into purchases. The store will remove the RFID tags
from the clothes after they are purchased—if the customer asks; otherwise,
they remain unobtrusively and could be scanned if the garment is returned
to the store. Creative retailers everywhere dream of such ways to use devices
to make money, to save money, and to give them small advantages over their
competitors. Though Galeria Kaufhof is open about its high-tech men’s
department, the fear that customers won’t like their clever ideas sometimes
holds back retailers—and sometimes simply causes them to keep quiet about
what they are doing.
Black Boxes Are Not Just for Airplanes Anymore
On April 12, 2007, Jon Corzine, Governor of New Jersey, was heading back
to the governor’s mansion in Princeton to mediate a discussion between Don
Imus, the controversial radio personality, and the Rutgers University women’s
basketball team.
His driver, 34-year-old state trooper Robert Rasinski, headed north on the
Garden State Parkway. He swerved to avoid another car and flipped the
Governor’s Chevy Suburban. Governor Corzine had not fastened his seatbelt,
and broke 12 ribs, a femur, his collarbone, and his sternum. The details of
exactly what happened were unclear. When questioned, Trooper Rasinski said
he was not sure how fast they were going—but we do know. He was going 91
in a 65 mile per hour zone. There were no police with radar guns around; no
human being tracked his speed. We know his exact speed at the moment of
impact because his car, like 30 million cars in America, had a black box—an
“event data recorder” (EDR) that captured every detail about what was going
on just before the crash. An EDR is an automotive “black box” like the ones
recovered from airplane crashes.
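One plausible way for a recorder to hold "what was going on just before the crash" is a fixed-size buffer that continuously overwrites its oldest reading and is frozen when a crash is detected. The Python sketch below illustrates that idea only. The class name, the fields, the buffer size, and the sample readings are assumptions made for the example, not a description of any manufacturer's EDR.

```python
from collections import deque


class EventDataRecorder:
    """Keeps only the most recent samples, then freezes them at a crash."""

    def __init__(self, capacity=5):
        self.samples = deque(maxlen=capacity)  # oldest sample drops off automatically
        self.frozen = None

    def record(self, speed_mph, braking, belt_fastened):
        if self.frozen is None:  # stop overwriting once a crash has been captured
            self.samples.append(
                {"speed_mph": speed_mph, "braking": braking, "belt_fastened": belt_fastened}
            )

    def crash(self):
        """Freeze and return the samples leading up to the crash."""
        self.frozen = list(self.samples)
        return self.frozen


# Hypothetical readings, one per second.
edr = EventDataRecorder(capacity=3)
for speed in [55, 58, 61, 64, 67]:
    edr.record(speed, braking=False, belt_fastened=True)
snapshot = edr.crash()
edr.record(0, braking=True, belt_fastened=True)  # ignored after the crash
```

A design like this needs very little memory, which helps explain how such a recorder can sit unnoticed in an ordinary car.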
EDRs started appearing in cars around 1995. By federal law, they will be
mandatory in the United States beginning in 2011. If you are driving a new
GM, Ford, Isuzu, Mazda, Mitsubishi, or Subaru, your car has one—whether
anyone told you that or not. So do about half of new Toyotas. Your insur-
ance company is probably entitled to its data if you have an accident. Yet
most people do not realize that they exist.
EDRs capture information about speed, braking time, turn signal status,
seat belts: things needed for accident reconstruction, to establish responsibil-
ity, or to prove innocence. CSX Railroad was exonerated of all liability in the
death of the occupants of a car when its EDR showed that the car was stopped
on the train tracks when it was hit. Police generally obtain search warrants
before downloading EDR data, but not always; in some cases, they do not
have to. When Robert Christmann struck and killed a pedestrian on October
18, 2003, Trooper Robert Frost of the New York State Police downloaded data
from the car at the accident scene. The EDR revealed that Christmann had
been going 38 MPH in an area where the speed limit was 30. When the data
was introduced at trial, Christmann claimed that the state had violated his
Fourth Amendment rights against unreasonable searches and seizures,
because it had not asked his permission or obtained a search warrant before
retrieving the data. That was not necessary, ruled a New York court. Taking
bits from the car was not like taking something out of a house, and no search
warrant was necessary.
Bits mediate our daily lives. It is almost as hard to avoid leaving digital
footprints as it is to avoid touching the ground when we walk. Yet even if we
lived our lives without walking, we would unsuspectingly be leaving
fingerprints anyway.
Some of the intrusions into our pri-
vacy come because of the unexpected,
unseen side effects of things we do quite
voluntarily. We painted the hypothetical
picture of the shopper with the RFID-
tagged shoes, who is either welcomed or
shunned on her subsequent visits to the store, depending on her shopping his-
tory. Similar surprises can lurk almost anywhere that bits are exchanged. That
is, for practical purposes, pretty much everywhere in daily life.
Tracing Paper
If I send an email or download a web page, it should come as no surprise that
I’ve left some digital footprints. After all, the bits have to get to me, so some
part of the system knows where I am. In the old days, if I wanted to be anony-
mous, I could write a note, but my handwriting might be recognizable, and I
might leave fingerprints (the oily kind) on the paper. I might have typed, but
Perry Mason regularly solved crimes by matching a typewritten note with the
unique signature of the suspect’s typewriter. More fingerprints.
So, today I would laserprint the letter and wear gloves. But even that may
not suffice to disguise me. Researchers at Purdue have developed techniques
for matching laser-printed output to a particular printer. They analyze printed
sheets and detect unique characteristics of each manufacturer and each indi-
vidual printer—fingerprints that can be used, like the smudges of old type-
writer hammers, to match output with source. It may be unnecessary to put
the microscope on individual letters to identify what printer produced a page.
The Electronic Frontier Foundation has demonstrated that many color
printers secretly encode the printer serial number, date, and time on every
page that they print (see Figure 2.2). Therefore, when you print a report, you
should not assume that no one can tell who printed it.
Source: Laser fingerprint. Electronic Frontier Foundation. http://w2.eff.org/Privacy/printers/docucolor/.
FIGURE 2.2 Fingerprint left by a Xerox DocuColor 12 color laser printer. The dots
are very hard to see with the naked eye; the photograph was taken under blue light.
The dot pattern encodes the date (2005-05-21), time (12:50), and the serial number of
the printer (21052857).
There was a sensible rationale behind this technology. The government
wanted to make sure that office printers could not be used to turn out sets of
hundred dollar bills. The technology that was intended to frustrate counter-
feiters makes it possible to trace every page printed on color laser printers
back to the source. Useful technologies often have unintended consequences.
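To see how a handful of faint dots can carry a date, a time, and a serial number, consider the toy encoding sketched below in Python. It is emphatically not the actual DocuColor layout, which is not described here; the column order, the seven-dot columns, and the function names are all assumptions. Only the values being encoded are taken from the caption of Figure 2.2.

```python
def encode_columns(values):
    """Turn each number (0-127) into a column of 7 dots; True means a dot is printed."""
    columns = []
    for value in values:
        if not 0 <= value < 128:
            raise ValueError("each value must fit in 7 bits")
        columns.append([bool((value >> row) & 1) for row in range(7)])
    return columns


def decode_columns(columns):
    """Recover the numbers by adding up the bit values of the printed dots."""
    return [sum(1 << row for row, dot in enumerate(column) if dot) for column in columns]


# Assumed order: minute, hour, day, month, two-digit year,
# then the serial number 21052857 two digits at a time.
fields = [50, 12, 21, 5, 5, 21, 5, 28, 57]
grid = encode_columns(fields)
recovered = decode_columns(grid)
```

Nine columns of seven dots, 63 positions in all, are enough for this toy version, which suggests why such a mark can be small enough to escape notice.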
Many people, for perfectly legal and valid reasons, would like to protect
their anonymity. They may be whistleblowers or dissidents. Perhaps they are
merely railing against injustice in their workplace. Will technologies that
undermine anonymity in political discourse also stifle free expression? A
measure of anonymity is essential in a healthy democracy—and in the U.S.,
has been a weapon used to advance free speech since the time of the
Revolution. We may regret a complete abandonment of anonymity in favor
of communication technologies that
leave fingerprints.
The problem is not just the existence
of fingerprints, but that no one told us
that we are creating them.
The Parking Garage Knows More Than You Think
One day in the spring of 2006, Anthony and his wife drove to Logan Airport
to pick up some friends. They took two cars, which they parked in the garage.
Later in the evening, they paid at the kiosk inside the terminal, and left—or
tried to. One car got out of the garage without a problem, but Anthony’s was
held up for more than an hour, in the middle of the night, and was not
allowed to leave. Why? Because his ticket did not match his license plate.
It turns out that every car entering the airport garage has its license plate
photographed at the same time as the ticket is being taken. Anthony had held
both tickets while he and his wife were waiting for their friends, and then he
gave her back one—the “wrong” one, as it turned out. It was the one he had
taken when he drove in. When he tried to leave, he had the ticket that
matched his wife’s license plate number. A no-no.
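The check that stopped Anthony can be sketched in a few lines of Python. The ticket numbers and plate labels are made up, and the real garage system is not documented here; this is only the simplest logic consistent with what happened.

```python
entry_log = {}  # ticket number -> license plate photographed at the entrance


def enter(ticket, plate):
    """Record which plate was photographed when this ticket was issued."""
    entry_log[ticket] = plate


def may_exit(ticket, plate):
    """Allow exit only if the ticket was issued to the car presenting it."""
    return entry_log.get(ticket) == plate


# Two cars arrive together, and the drivers later swap tickets by accident.
enter("T-1001", "PLATE-A")
enter("T-1002", "PLATE-B")
own_ticket_ok = may_exit("T-1001", "PLATE-A")
swapped_ticket_ok = may_exit("T-1002", "PLATE-A")
```

Note that the check only works because the garage keeps a record linking every ticket to a photographed plate, which is exactly the data the questions that follow are about.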
Who knew that if two cars arrive and try to leave at the same time, they
may not be able to exit if the tickets are swapped? In fact, who knew that
every license plate is photographed as it enters the garage?
There is a perfectly sensible explanation. People with big parking bills
sometimes try to duck them by picking up a second ticket at the end of their
trip. When they drive out, they try to turn in the one for which they would
have to pay only a small fee. Auto thieves sometimes try the same trick. So
the system makes sense, but it raises many questions. Who else gets access to
the license plate numbers? If the police are looking for a particular car, can
they search the scanned license plate numbers of the cars in the garage? How
long is the data retained? Does it say anywhere, even in the fine print, that
your visit to the garage is not at all anonymous?
All in Your Pocket
The number of new data sources—and the proliferation and interconnection
of old data sources—is part of the story of how the digital explosion shattered
privacy. But the other part of the technology story is about how all that data
is put together.
On October 18, 2007, a junior staff member at the British national tax
agency sent a small package to the government’s auditing agency via TNT, a
private delivery service. Three weeks later, it had not arrived at its destina-
tion and was reported missing. Because the sender had not used TNT’s “reg-
istered mail” option, it couldn’t be traced, and as of this writing has not been
found. Perhaps it was discarded by mistake and never made it out of the mail-
room; perhaps it is in the hands of criminals.
The mishap rocked the nation. As a result of the data loss, every bank and
millions of individuals checked account activity for signs of fraud or identity
theft. On November 20, the head of the tax agency resigned. Prime Minister
Gordon Brown apologized to the nation, and the opposition party accused the
Brown administration of having “failed in its first duty—to protect the
public.”
The package contained two computer disks. The data on the disks included
names, addresses, birth dates, national insurance numbers (the British equiv-
alent of U.S. Social Security Numbers), and bank account numbers of 25 mil-
lion people—nearly 40% of the British population, and almost every child in
the land. The tax office had all this data because every British child receives
weekly government payments, and most families have the money deposited
directly into bank accounts. Ten years ago, that much data would have
required a truck to transport, not two small disks. Fifty years ago, it would
have filled a building.
This was a preventable catastrophe. Many mistakes were made, all of them quite
ordinary. The package should have been registered. The disks should
have been encrypted. It should not have taken three weeks for someone to
speak up. But those are all age-old mistakes. Offices have been sending pack-
ages for centuries, and even Julius Caesar knew enough to encrypt informa-
tion if he had to use intermediaries to deliver it. What happened in 2007 that
could not have happened in 1984 was the assembly of such a massive data-
base in a form that allowed it to be easily searched, processed, analyzed, con-
nected to other databases, transported—and “lost.”
Exponential growth—in storage size, processing speed, and communication
speed—has changed the same old thing into something new. Blundering,
stupidity, curiosity, malice, and thievery are not new. The fact that sensitive data
about everyone in a nation could fit on a laptop is new. The ability to search
for a needle in the haystack of the Internet is new. Easily connecting “pub-
lic” data sources that used to be stored in file drawers in Albuquerque and
Atlanta, but are now both electronically accessible from Algeria—that is new
too.
Training, laws, and software all can help. But the truth of the matter is that
as a society, we don’t really know how to deal with these consequences of the
digital explosion. The technology revolution is outstripping society’s capac-
ity to adjust to the changes in what can be taken for granted. The Prime
Minister had to apologize to the British nation because among the things that
have been blown to bits is the presumption that no junior staffer could do
that much damage by mailing a small parcel.
Connecting the Dots
The way we leave fingerprints and footprints is only part of what is new. We
have always left a trail of information behind us, in our tax records, hotel
reservations, and long distance telephone bills. True, the footprints are far
clearer and more complete today than ever before. But something else has
changed—the harnessing of computing power to correlate data, to connect the
dots, to put pieces together, and to create cohesive, detailed pictures from
what would otherwise have been meaningless fragments. The digital explo-
sion does not just blow things apart. Like the explosion at the core of an
atomic bomb, it blows things together as well. Gather up the details, connect
the dots, assemble the parts of the puzzle, and a clear picture will emerge.
Computers can sort through databases too massive and too boring to be
examined with human eyes. They can assemble colorful pointillist paintings
out of millions of tiny dots, when any few dots would reveal nothing. When
a federal court released half a million Enron emails obtained during the cor-
ruption trial, computer scientists quickly identified the subcommunities, and
perhaps conspiracies, among Enron employees, using no data other than the
pattern of who was emailing whom (see Figure 2.3). The same kinds of clus-
tering algorithms work on patterns of telephone calls. You can learn a lot by
knowing who is calling or emailing whom, even if you don’t know what they
are saying to each other—especially if you know the time of the communica-
tions and can correlate them with the time of other events.
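A very crude version of this kind of analysis fits in a short Python sketch. It is not the method used on the Enron emails, which is not specified here. It swaps in the simplest grouping imaginable: count messages between each pair of people, keep only the pairs that correspond heavily, and report the connected groups. The names, the threshold, and the function name are invented for illustration.

```python
from collections import Counter, defaultdict


def heavy_correspondence_groups(messages, min_messages=2):
    """Group people linked by at least min_messages emails, ignoring content."""
    # Count messages per unordered pair, so direction does not matter.
    counts = Counter(frozenset(pair) for pair in messages if pair[0] != pair[1])
    neighbors = defaultdict(set)
    for pair, count in counts.items():
        if count >= min_messages:
            a, b = tuple(pair)
            neighbors[a].add(b)
            neighbors[b].add(a)
    groups, seen = [], set()
    for person in list(neighbors):
        if person in seen:
            continue
        # Walk outward from this person to collect everyone reachable.
        group, stack = set(), [person]
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(neighbors[current] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical (sender, recipient) log with no message text at all.
log = [
    ("ann", "bob"), ("bob", "ann"), ("ann", "bob"),
    ("carol", "dave"), ("dave", "carol"),
    ("ann", "carol"),  # a single message, below the threshold
]
groups = heavy_correspondence_groups(log)
```

Even this toy version separates the log into two cliques without reading a word of any message.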
Sometimes even public information is revealing. In Massachusetts, the
Group Insurance Commission (GIC) is responsible for purchasing health
insurance for state employees. When the premiums it was paying jumped one
year, the GIC asked for detailed information on every patient encounter. And
for good reason: All kinds of health care costs had been growing at prodi-
gious rates. In the public interest, the state had a responsibility to understand
how it was spending taxpayer money. The GIC did not want to know patients’
names; it did not want to track individuals, and it did not want people to
think they were being tracked. Indeed, tracking the medical visits of individ-
uals would have been illegal.
Source: Enron, Jeffrey Heer. Figure 3 from http://jheer.org/enron/v1/.
FIGURE 2.3 Diagram showing clusters of Enron emailers, indicating which
employees carried on heavy correspondence with which others. The evident “blobs”
may be the outlines of conspiratorial cliques.
So, the GIC data had no names, no addresses, no Social Security Numbers,
no telephone numbers—nothing that would be a “unique identifier” enabling
a mischievous junior staffer in the GIC office to see who exactly had a
particular ailment or complaint. To use the official lingo, the data was
“de-identified”; that is, stripped of identifying information. The data did
include the gender, birth date, zip code, and similar facts about individuals
making medical claims, along with some information about why they had
sought medical attention. That information was gathered not to challenge
any particular person, but to learn about patterns—if the truckers in Worcester
are having lots of back injuries, for example, maybe workers in that region
need better training on how to lift heavy items. Most states do pretty much
the same kind of analysis of de-identified data about state workers.
Now this was a valuable data set not just for the Insurance Commission, but
for others studying public health and the medical industry in Massachusetts.
Academic researchers, for example, could use such a large inventory of med-
ical data for epidemiological studies. Because it was all de-identified, there was
no harm in letting others see it, the GIC figured. In fact, it was such good data
that private industry—for example, businesses in the health management sec-
tor—might pay money for it. And so the GIC sold the data to businesses. The
taxpayers might even benefit doubly from this decision: The data sale would
provide a new revenue source to the state, and in the long run, a more
informed health care industry might run more efficiently.
But how de-identified really was the material?
Latanya Sweeney was at the time a researcher at MIT (she went on to
become a computer science professor at Carnegie Mellon University). She
wondered how hard it would be for those who had received the de-identified
data to “re-identify” the records and learn the medical problems of a partic-
ular state employee—for example, the governor of the Commonwealth.
Governor Weld lived, at that time, in Cambridge, Massachusetts.
Cambridge, like many municipalities, makes its voter lists publicly available,
for a charge of $15, and free for candidates and political organizations. If you
know the precinct, they are available for only $.75. Sweeney spent a few dol-
lars and got the voter lists for Cambridge. Anyone could have done the same.
According to the Cambridge voter registration list, there were only six peo-
ple in Cambridge with Governor Weld’s birth date, only three of those were
men, and only one of those lived in Governor Weld’s five-digit zip code.
Sweeney could use that combination of factors (birth date, gender, and zip
code) to recover the Governor’s medical records—and also those for members
of his family, since the data was organized by employee. This type of re-iden-
tification is straightforward. In Cambridge, in fact, birth date alone was suf-
ficient to identify more than 10% of the population. Nationally, gender, zip
code, and date of birth are all it takes to identify 87% of the U.S. population
uniquely.
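The linkage itself takes only a few lines of code. The following Python sketch is an illustration under stated assumptions, not a reconstruction of Sweeney's actual procedure: the field names (`birth_date`, `gender`, `zip`, `name`, `diagnosis`) are hypothetical, and any records passed in would be supplied by the reader. It matches each "de-identified" medical record against a voter list and reports a name only when exactly one voter shares all three attributes.

```python
def reidentify(medical_rows, voter_rows):
    """Attach names to medical records that match exactly one voter.

    Both inputs are lists of dicts. The join uses only birth date,
    gender, and zip code, none of which is a "unique identifier"
    on its own.
    """
    keys = ("birth_date", "gender", "zip")

    # Index the public voter list by the three shared attributes.
    voters_by_key = {}
    for voter in voter_rows:
        key = tuple(voter[k] for k in keys)
        voters_by_key.setdefault(key, []).append(voter["name"])

    # A medical record is re-identified when its key points to one voter.
    matches = []
    for row in medical_rows:
        names = voters_by_key.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:
            matches.append((names[0], row["diagnosis"]))
    return matches
```

Records whose combination of attributes is shared by two or more voters stay ambiguous and are left out, which is why the proportion of people who are unique on these three fields matters so much.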
The data set contained far more than gender, zip code, and birth date. In
fact, any of the 58 individuals who received the data in 1997 could have
identified any of the 135,000 people in the database. “There is no patient con-
fidentiality,” said Dr. Joseph Heyman, president of the Massachusetts Medical
Society. “It’s gone.”
It is easy to read a story like this and scream, “Heads should roll!” But it
is actually quite hard to figure out who, if anyone, made a mistake. Certainly
collecting the information was the right thing to do, given that health costs
are a major expense for all businesses and institutions. The GIC made an hon-
est effort to de-identify the data before releasing it. Arguably the GIC might
not have released the data to other state agencies, but that would be like say-
ing that every department of govern-
ment should acquire its heating oil
independently. Data is a valuable
resource, and once someone has col-
lected it, the government is entirely
correct in wanting it used for the pub-
lic good. Some might object to selling
the data to an outside business, but only in retrospect; had the data really
been better de-identified, whoever made the decision to sell the data might
well have been rewarded for helping to hold down the cost of government.
Perhaps the mistake was the ease with which voter lists can be obtained.
However, it is a tradition deeply engrained in our system of open elections
that the public may know who is eligible to vote, and indeed who has voted.
And voter lists are only one source of public data about the U.S. population.
How many 21-year-old male Native Hawaiians live in Middlesex County,
Massachusetts? In the year 2000, there were four. Anyone can browse the U.S.
Census data, and sometimes it can help fill in pieces of a personal picture:
Just go to factfinder.census.gov.
The mistake was thinking that the GIC data was truly de-identified, when
it was not. But with so many data sources available, and so much computing
power that could be put to work connecting the dots, it is very hard to know
just how much information has to be discarded from a database to make it
truly anonymous. Aggregating data into larger units certainly helps—releas-
ing data by five-digit zip codes reveals less than releasing it by nine-digit zip
codes. But the coarser the data, the less it reveals also of the valuable infor-
mation for which it was made available.
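The trade-off can be made concrete with a small Python sketch. It measures what fraction of records remain unique after zip codes are truncated and birth dates are coarsened. The function name, the field names, and the `YYYY-MM-DD` date format are assumptions made for illustration, and uniqueness within a single data set is only a rough proxy for the risk of re-identification.

```python
from collections import Counter

def fraction_unique(records, zip_digits=5, birth_precision="day"):
    """Share of records that are the only one with their coarsened attributes.

    `records` is a list of dicts with "zip", "birth_date" (YYYY-MM-DD),
    and "gender". `birth_precision` is "day", "month", or "year".
    """
    def coarsen(record):
        year, month, day = record["birth_date"].split("-")
        birth = {
            "day": (year, month, day),
            "month": (year, month),
            "year": (year,),
        }[birth_precision]
        # Keep only the leading digits of the zip code.
        return (record["zip"][:zip_digits], birth, record["gender"])

    counts = Counter(coarsen(r) for r in records)
    unique = sum(1 for r in records if counts[coarsen(r)] == 1)
    return unique / len(records)
```

Calling the function with fewer zip digits or a coarser birth precision will generally lower the share of unique records, and each such step also discards detail that an analyst of regional health patterns might have wanted.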
How can we solve a problem that results from many developments, no one
of which is really a problem in itself?
Why We Lost Our Privacy, or Gave It Away
Information technology did not cause the end of privacy, any more than
automotive technology caused teen sex. Technology creates opportunities and
risks, and people, as individuals and as societies, decide how to live in the
changed landscape of new possibilities. To understand why we have less pri-
vacy today than in the past, we must look not just at the gadgets. To be sure,
we should be wary of spies and thieves, but we should also look at those who
protect us and help us—and we should also take a good look in the mirror.
We are most conscious of our personal information winding up in the
hands of strangers when we think about data loss or theft. Reports like the
one about the British tax office have become fairly common. The theft of
information about 45 million customers of TJX stores, described in Chapter 5,
“Secret Bits,” was even larger than the British catastrophe. In 2003, Scott
Levine, owner of a mass email business named Snipermail, stole more than a
billion personal information records from Acxiom. Millions of Americans are
victimized by identity theft every year, at a total cost in the tens of billions of
dollars annually. Many more of us harbor daily fears that just “a little bit” of
our financial information has leaked out, and could be a personal time bomb
if it falls into the wrong hands.
Why can’t we just keep our personal information to ourselves? Why do so
many other people have it in the first place, so that there is an opportunity
for it to go astray, and an incentive for creative crooks to try to steal it?
We lose control of our personal information because of things we do to
ourselves, and things others do to us. Of things we do to be ahead of the
curve, and things we do because everyone else is doing them. Of things we
do to save money, and things we do to save time. Of things we do to be safe
from our enemies, and things we do because we feel invulnerable. Our loss of
privacy is a problem, but there is no one answer to it, because there is no one
reason why it is happening. It is a messy problem, and we first have to think
about it one piece at a time.
We give away information about ourselves—voluntarily leave visible foot-
prints of our daily lives—because we judge, perhaps without thinking about it
very much, that the benefits outweigh the costs. To be sure, the benefits are
many.
Saving Time
For commuters who use toll roads or bridges, the risk-reward calculation is not
even close. Time is money, and time spent waiting in a car is also anxiety and
frustration. If there is an option to get a toll booth transponder, many com-
muters will get one, even if the device costs a few dollars up front. Cruising
past the cars waiting to pay with dollar bills is not just a relief; it actually
brings the driver a certain satisfied glow.
The transponder, which the driver attaches to the windshield from inside
the car, is an RFID, powered with a battery so identifying information can be
sent to the sensor several feet away as the driver whizzes past. The sensor can
be mounted in a constricted travel lane, where a toll booth for a human toll-
taker might have been. Or it can be mounted on a boom above traffic, so the
driver doesn’t even need to change lanes or slow down.
And what is the possible harm? Of course, the state is recording the fact
that the car has passed the sensor; that is how the proper account balance can
be debited to pay the toll. When the balance gets too low, the driver’s credit
card may get billed automatically to replenish the balance. All that only
makes the system better—no fumbling for change or doing anything else to
pay for your travels.
The monthly bill—for the Massachusetts Fast Lane, for example—shows
where and when you got on the highway—when, accurate to the second. It
also shows where you got off and how far you went. Informing you of the
mileage is another useful service, because Massachusetts drivers can get a
refund on certain fuel taxes, if the fuel was used on the state toll road. Of
course, you do not need a PhD to figure out that the state also knows when
you got off the road, to the second, and that with one subtraction and one
division, its computers could figure out if you were speeding. Technically, in
fact, it would be trivial for the state to print the appropriate speeding fine at
the bottom of the statement, and to bill your credit card for that amount at
the same time as it was charging for tolls. That would be taking convenience
a bit too far, and no state does it, yet.
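The "one subtraction and one division" can be written out directly. In this Python sketch, the function name and the example figures are invented for illustration; real toll records would supply the two timestamps and the mileage.

```python
from datetime import datetime

def average_speed_mph(entry_time, exit_time, miles):
    """Average speed between two toll sensors, in miles per hour."""
    # One subtraction: elapsed time between the two sensor readings.
    hours = (exit_time - entry_time).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("exit must come after entry")
    # One division: distance over time.
    return miles / hours

# Hypothetical trip: 40 miles between sensors in 30 minutes.
entered = datetime(2008, 4, 16, 8, 0, 0)
exited = datetime(2008, 4, 16, 8, 30, 0)
speed = average_speed_mph(entered, exited, 40)
```

Because the result is an average over the whole trip, a figure above the posted limit means the driver must have exceeded the limit at some point along the way, even though the record does not say where.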
What does happen right now, however, is that toll transponder records are
introduced into divorce and child custody cases. You’ve never been within
five miles of that lady’s house? Really? Why have you gotten off the high-
way at the exit near it so many times? You say you can be the better custo-
dial parent for your children, but the facts suggest otherwise. As one lawyer
put it, “When a guy says, ‘Oh, I’m home every day at five and I have dinner
with my kids every single night,’ you subpoena his E-ZPass and you find out
he’s crossing that bridge every night at 8:30. Oops!” These records can be
subpoenaed, and have been, hundreds of times, in family law cases. They
have also been used in employment cases, to prove that the car of a worker
who said he was working was actually far from the workplace.
But most of us aren’t planning to cheat on our spouses or our bosses, so
the loss of privacy seems like no loss at all, at least compared to the time
saved. Of course, if we actually were cheating, we would be in a big hurry,
and might take some risks to save a few minutes!
Saving Money
Sometimes it’s money, not time, which motivates us to leave footprints. Such
is the case with supermarket loyalty cards. If you do not want Safeway to
keep track of the fact that you bought the 12-pack of Yodels despite your
recent cholesterol results, you can make sure it doesn’t know. You simply pay
the “privacy tax”—the surcharge for customers not presenting a loyalty card.
The purpose of loyalty cards is to enable merchants to track individual item
purchases. (Item-level transactions are typically not tracked by credit card
companies, which do not care if you bought Yodels instead of granola, so
long as you pay the bill.) With loyalty cards, stores can capture details of
cash transactions as well. They can process all the transaction data, and draw
inferences about shoppers’ habits. Then, if a lot of people who buy Yodels
also buy Bison Brew Beer, the store’s automated cash register can automati-
cally spit out a discount coupon for Bison Brew as your Yodels are being
bagged. A “discount” for you, and more sales for Safeway. Everybody wins.
Don’t they?
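The inference behind such a coupon can be sketched in a few lines of Python. This is a simplified illustration, not any retailer's actual system: the function names, the 50% threshold, and the basket format (each basket a list of item names) are assumptions.

```python
from collections import Counter
from itertools import combinations

def co_purchase_rates(baskets):
    """For each ordered pair (a, b): share of baskets with a that also hold b."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = set(basket)
        item_counts.update(items)
        for a, b in combinations(sorted(items), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return {pair: n / item_counts[pair[0]] for pair, n in pair_counts.items()}

def coupon_for(item, baskets, threshold=0.5):
    """Pick the product most often bought alongside `item`, if any is common enough."""
    rates = co_purchase_rates(baskets)
    candidates = [
        (rate, other)
        for (bought, other), rate in rates.items()
        if bought == item and rate >= threshold
    ]
    return max(candidates)[1] if candidates else None
```

With a loyalty card attached to each basket, the same counts can be kept per shopper rather than per store, which is what turns a general sales pattern into a personal profile.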
As grocery stores expand their web-based business, it is even easier for
them to collect personal information about you. Reading the fine print when
you sign up is a nuisance, but it is worth doing, so you understand what you
are giving and what you are getting in return. Here are a few sentences of
Safeway’s privacy policy for customers who use its web site:
Safeway may use personal information to provide you with news-
letters, articles, product or service alerts, new product or service
announcements, saving awards, event invitations, personally tailored
coupons, program and promotional information and offers, and other
information, which may be provided to Safeway by other companies.
… We may provide personal information to our partners and suppliers
for customer support services and processing of personal information
on behalf of Safeway. We may also share personal information with
our affiliate companies, or in the course of an actual or potential sale,
re-organization, consolidation, merger, or amalgamation of our busi-
ness or businesses.
Dreary reading, but the language gives Safeway lots of leeway. Maybe you
don’t care about getting the junk mail. Not everyone thinks it is junk, and the
company does let you “opt out” of receiving it (although in general, few
people bother to exercise opt-out rights). But Safeway has lots of “affiliates,”
and who knows how many companies with which it might be involved in a
merger or sale of part of its business. Despite privacy concerns voiced by
groups like C.A.S.P.I.A.N. (Consumers Against Supermarket Privacy Invasion
and Numbering, www.nocards.org), most shoppers readily agree to have the
data collected. The financial incentives are too hard to resist, and most con-
sumers just don’t worry about marketers knowing their purchases. But when-
ever purchases can be linked to your name, there is a record, somewhere in a
huge database, of whether you use regular or super tampons, lubricated or
unlubricated condoms, and whether you like regular beer or lite. You have
authorized the company to share it, and even if you hadn’t, the company
could lose it accidentally, have it stolen, or have it subpoenaed.
Convenience of the Customer
The most obvious reason not to worry about giving information to a com-
pany is that you do business with them, and it is in your interest to see that
they do their business with you better. You have no interest in whether they
make more money from you, but you do have a strong interest in making it
easier and faster for you to shop with them, and in cutting down the amount
of stuff they may try to sell you that you would have no interest in buying.
So your interests and theirs are, to a degree, aligned, not in opposition.
Safeway’s privacy policy states this explicitly: “Safeway Club Card informa-
tion and other information may be used to help make Safeway’s products,
services, and programs more useful to its customers.” Fair enough.
No company has been more progressive in trying to sell customers what
they might want than the online store Amazon. Amazon suggests products to
repeat customers, based on what they have bought before—or what they have
simply looked at during previous visits to Amazon’s web site. The algorithms
are not perfect; Amazon’s computers are drawing inferences from data, not
being clairvoyant. But Amazon’s guesses are pretty good, and recommending
the wrong book every now and then is a very low-cost mistake. If Amazon
does it too often, I might switch to Barnes and Noble, but there is no injury
to me. So again: Why should anyone care that Amazon knows so much about
me? On the surface, it seems benign. Of course, we don’t want the credit card
information to go astray, but who cares about knowing what books I have
looked at online?
Our indifference is another marker of the fact that we are living in an
exposed world, and that it feels very different to live here. In 1988, when a
videotape rental store clerk turned over Robert Bork’s movie rental records to
a Washington, DC newspaper during Bork’s Supreme Court confirmation
hearings, Congress was so outraged that it quickly passed a tough privacy
protection bill, The Video Privacy Protection Act. Videotape stores, if any still
exist, can be fined simply for keeping rental records too long. Twenty years
later, few seem to care much what Amazon does with its millions upon mil-
lions of detailed, fine-grained views into the brains of all its customers.
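How does a site recognize a returning visitor in the first place? One common mechanism is the browser cookie. The following Python sketch uses the standard `http.cookies` module to show the round trip; the cookie name `visitor_id` and the one-year lifetime are assumptions made for illustration, not the practice of any particular site.

```python
from http.cookies import SimpleCookie

ONE_YEAR = 365 * 24 * 60 * 60  # assumed cookie lifetime, in seconds

def make_set_cookie_header(visitor_id):
    """Build the header a server sends so the browser stores an identifier."""
    cookie = SimpleCookie()
    cookie["visitor_id"] = visitor_id  # hypothetical cookie name
    cookie["visitor_id"]["max-age"] = ONE_YEAR
    return cookie.output()

def read_visitor_id(cookie_header_value):
    """Recover the identifier from the Cookie header the browser sends back."""
    cookie = SimpleCookie()
    cookie.load(cookie_header_value)
    morsel = cookie.get("visitor_id")
    return morsel.value if morsel is not None else None
```

Once the identifier comes back on a later visit, the site can look up whatever it recorded under that identifier before, such as pages viewed or items left in a shopping cart.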
HOW SITES KNOW WHO YOU ARE
1. You tell them. Log in to Gmail, Amazon, or eBay, and you are letting
them know exactly who you are.
2. They’ve left cookies on one of your previous visits. A cookie is a small
text file stored on your local hard drive that contains information that
a particular web site wants to have available during your current session
(like your shopping cart), or from one session to the next. Cookies give
sites persistent information for tracking and personalization. Your
browser has a command for showing cookies—you may be surprised how
many web sites have left them!
3. They have your IP address. The web server has to know where you are
so that it can ship its web pages to you. Your IP address is a number like
66.82.9.88 that locates your computer in the Internet (see the Appendix
for details). That address may change from one day to the next. But in
a residential setting, your Internet Service Provider (your ISP—typically
your phone or cable company) knows who was assigned each IP address
at any time. Those records are often subpoenaed in court cases.
If you are curious about who is using a particular IP address, you can check
the American Registry for Internet Numbers (www.arin.net). Services such as
whatismyip.com, whatismyip.org, and ipchicken.com also allow you to
check your own IP address. And www.whois.net allows you to check who
owns a domain name such as harvard.com—which turns out to be the
Harvard Bookstore, a privately owned bookstore right across the street from
the university. Unfortunately, that information won’t reveal who is sending
you spam, since spammers routinely forge the source of email they send you.
It’s Just Fun to Be Exposed
Sometimes, there can be no explanation for our willing surrender of our pri-
vacy except that we take joy in the very act of exposing ourselves to public
view. Exhibitionism is not a new phenomenon. Its practice today, as in the
past, tends to be in the province of the young and the drunk, and those wish-
ing to pretend they are one or the other. That correlation is by no means per-
fect, however. A university president had to apologize when an image of her
threatening a Hispanic male with a stick leaked out from her MySpace page,
with a caption indicating that she had to “beat off the Mexicans because they
were constantly flirting with my daughter.”
And there is a continuum of outrageousness. The less wild of the party
photo postings blend seamlessly with the more personal of the blogs, where
the bloggers are chatting mostly about their personal feelings. Here there is
not exuberance, but some simpler urge
for human connectedness. That pas-
sion, too, is not new. What is new is
that a photo or video or diary entry,
once posted, is visible to the entire
world, and that there is no taking it
back. Bits don’t fade and they don’t yellow. Bits are forever. And we don’t
know how to live with that.
For example, a blog selected with no great design begins:
This is the personal web site of Sarah McAuley. … I think sharing my
life with strangers is odd and narcissistic, which of course is why I’m
addicted to it and have been doing it for several years now. Need
more? You can read the “About Me” section, drop me an email, or you
know, just read the drivel that I pour out on an almost-daily basis.
No thank you, but be our guest. Or consider that there is a Facebook group
just for women who want to upload pictures of themselves uncontrollably
drunk. Or the Jennicam, through which Jennifer Kay Ringley opened her life
to the world for seven years, setting a standard for exposure that many since
have surpassed in explicitness, but few have approached in its endless ordi-
nariness. We are still experimenting, both the voyeurs and viewed.
Because You Can’t Live Any Other Way
Finally, we give up data about ourselves because we don’t have the time,
patience, or single-mindedness about privacy that would be required to live
our daily lives in another way. In the U.S., the number of credit, debit, and
bank cards is in the billions. Every time one is used, an electronic handshake
records a few bits of information about who is using it, when, where, and for
what. It is now virtually unheard of for people to make large purchases of
ordinary consumer goods with cash. Personal checks are going the way of
cassette tape drives, rendered irrelevant by newer technologies. Even if you
could pay cash for everything you buy, the tax authorities would have you
in their databases anyway. There even have been proposals to put RFIDs in
currency notes, so that the movement of cash could be tracked.
Only sects such as the Amish still live without electricity. It will soon be
almost that unusual to live without Internet connectivity, with all the finger-
prints it leaves of your daily searches and logins and downloads. Even the old
dumb TV is rapidly disappearing in favor of digital communications. Digital
TV will bring the advantages of video on demand—no more trips to rent
movies or waits for them to arrive in the mail—at a price: Your television ser-
vice provider will record what movies you have ordered. It will be so attrac-
tive to be able to watch what we want when we want to watch it, that we
won’t miss either the inconvenience or the anonymity of the days when all
the TV stations washed your house with their airwaves. You couldn’t pick the
broadcast times, but at least no one knew which waves you were grabbing
out of the air.
Little Brother Is Watching
So far, we have discussed losses of privacy due to things for which we could,
in principle anyway, blame ourselves. None of us really needs a loyalty card,
we should always read the fine print when we rent a car, and so on. We would
all be better off saying “no” a little more often to these privacy-busters, but
few of us would choose to live the life of constant vigilance that such res-
olute denial would entail. And even if we were willing to make those sacri-
fices, there are plenty of other privacy problems caused by things others do
to us.
The snoopy neighbor is a classic American stock figure—the busybody who
watches how many liquor bottles are in your trash, or tries to figure out
whose Mercedes is regularly parked in your driveway, or always seems to
know whose children were disorderly last Saturday night. But in Cyberspace,
we are all neighbors. We can all check up on each other, without even open-
ing the curtains a crack.
Public Documents Become VERY Public
Some of the snooping is simply what anyone could have done in the past by
paying a visit to the Town Hall. Details that were always public—but inacces-
sible—are quite accessible now.
In 1975, Congress created the Federal Election Commission to administer
the Federal Election Campaign Act. Since then, all political contributions
have been public information. There is a difference, though, between “public”
and “readily accessible.” Making public data available on the Web shattered
the veil of privacy that came from inaccessibility.
Want to know who gave money to Al Franken for Senate? Lorne Michaels
from Saturday Night Live, Leonard Nimoy, Paul Newman, Craig Newmark (the
“craig” of craigslist.com), and Ginnie W., who works with us and may not
have wanted us to know her political leanings. Paul B., and Henry G., friends
of ours, covered their bases by giving to both Obama and Clinton.
The point of the law was to make it easy to look up big donors. But since
data is data, what about checking on your next-door neighbors? Ours defi-
nitely leaned toward Obama over Clinton, with no one in the Huckabee camp.
Or your clients? One of ours gave heartily to Dennis Kucinich. Or your daugh-
ter’s boyfriend? You can find out for yourself, at www.fec.gov or
fundrace.huffingtonpost.com. We’re not telling about our own.
Hosts of other facts are now available for armchair browsing—facts that in
the past were nominally public but required a trip to the Registrar of Deeds.
If you want to know what your neighbor paid for their house, or what it’s
worth today, many communities put all of their real estate tax rolls online. It
was always public; now it’s accessible. It was never wrong that people could
get this information, but it feels very different now that people can browse
through it from the privacy of their home.
If you are curious about someone, you can try to find him or her on
Facebook, MySpace, or just using an ordinary search engine. A college would
not peek at the stupid Facebook page of an applicant, would it? Absolutely
not, says the Brown Dean of Admissions, “unless someone says there’s some-
thing we should look at.”
New participatory websites create even bigger opportunities for informa-
tion-sharing. If you are about to go on a blind date, there are special sites just
for that. Take a look at www.dontdatehimgirl.com, a social networking site
with a self-explanatory focus. When we checked, this warning about one man
had just been posted, along with his name and photograph: “Compulsive
womanizer, liar, internet cheater; pathological liar who can’t be trusted as a
friend much less a boyfriend. Total creep! Twisted and sick—needs mental
help. Keep your daughter away from this guy!” Of course, such information
may be worth exactly what we paid for it. There is a similar site,
www.platewire.com, for reports about bad drivers. If you are not dating or
driving, perhaps you’d like to check out a neighborhood before you move in,
or just register a public warning about the obnoxious revelers who live next
door to you. If so, www.rottenneighbor.com is the site for you. When we
typed in the zip code in which one of us lives, a nice Google map appeared
with a house near ours marked in red. When we clicked on it, we got this
report on our neighbor:
you’re a pretty blonde, slim and gorgeous. hey, i’d come on to you if i
weren’t gay. you probably have the world handed to you like most
pretty women. is that why you think that you are too good to pick up
after your dog? you know that you are breaking the law as well as
being disrespectful of your neighbors. well, i hope that you step in
your own dogs poop on your way to work, or on your way to dinner.
i hope that the smell of your self importance follows you all day.
For a little money, you can get a lot more information. In January 2006, John
Aravosis, creator of Americablog.com, purchased the detailed cell phone
records of General Wesley Clark. For $89.95, he received a listing of all of
Clark’s calls for a three-day period. There are dozens of online sources for this
kind of information. You might think you’d have to be in the police or the
FBI to find out who people are calling on their cell phones, but there are
handy services that promise to provide anyone with that kind of information
for a modest fee. The Chicago Sun-Times decided to put those claims to a test,
so it paid $110 to locatecell.com and asked for a month’s worth of cell
phone records of one Frank Main, who happened to be one of its own
reporters. The Sun-Times did it all with a few keystrokes—provided the tele-
phone number, the dates, and a credit card number. The request went in on
Friday of a long weekend, and on Tuesday morning, a list came back in an
email. The list included 78 telephone numbers the reporter had called—
sources in law enforcement, people he was writing stories about, and editors
in the newspaper. It was a great service for law enforcement—except that
criminals can use it too, to find out whom the detectives are calling. These
incidents stimulated passage of the Telephone Records and Privacy Protection Act of
2006, but in early 2008, links on locatecell.com were still offering to help
“find cell phone records in seconds,” and more.
If cell phone records are not enough information, consider doing a proper
background check. For $175, you can sign up as an “employer” with
ChoicePoint and gain access to reporting services including criminal records,
credit history, motor vehicle records, educational verification, employment
verification, Interpol, sexual offender registries, and warrant searches—they
are all there to be ordered, with a la carte pricing. Before we moved from
paper to bits, this information was publicly available, but largely inaccessi-
ble. Now, all it takes is an Internet connection and a credit card. This is one
of the most important privacy transformations. Information that was previ-
ously available only to professionals with specialized access or a legion of
local workers is now available to everyone.
Then there is real spying. Beverly O’Brien suspected her husband was hav-
ing an affair. If not a physical one, at a minimum she thought he was engag-
ing in inappropriate behavior online. So, she installed some monitoring
software. Not hard to do on the family computer, these packages are pro-
moted as “parental control software”—a tool to monitor your child’s activi-
ties, along with such other uses as employee monitoring, law enforcement,
and to “catch a cheating spouse.” Beverly installed the software, and discov-
ered that her hapless hubby, Kevin, was chatting away while playing Yahoo!
Dominoes. She was an instant spy, a domestic wire-tapper. The marketing
materials for her software neglected to tell her that installing spyware that
intercepts communications traffic was a direct violation of Florida’s Security
of Communications Act, and the trial court refused to admit any of the evi-
dence in their divorce proceeding. The legal system worked, but that didn’t
change the fact that spying has become a relatively commonplace activity,
the domain of spouses and employers, jilted lovers, and business competitors.
Idle Curiosity
There is another form of Little Brother-ism, where amateurs can sit at a com-
puter connected to the Internet and just look for something interesting—not
about their neighbors or husbands, but about anyone at all. With so much
data out there, anyone can discover interesting personal facts, with the
investment of a little time and a little imagination. To take a different kind of
example, imagine having your family’s medical history re-identified from a
paper in an online medical journal.

PERSONAL COMPUTER MONITORING SOFTWARE
PC Pandora (www.pcpandora.com) enables you to “know everything they do
on your PC,” such as “using secret email accounts, chatting with unknown
friends, accessing secret dating profiles or even your private records.” Using it,
you can “find out about secret email accounts, chat partners, dating site
memberships, and more.”
Actual Spy (www.actualspy.com) is a “keylogger which allows you to find
out what other users do on your computer in your absence. It is designed
for the hidden computer monitoring and the monitoring of the computer
activity. Keylogger Actual Spy is capable of catching all keystrokes, captur-
ing the screen, logging the programs being run and closed, monitoring the
clipboard contents.”

Figure 2.4 shows a map of the incidence of a disease, let’s say syphilis, in
a part of Boston. The “syphilis epidemic” in this illustration is actually a sim-
ulation. The data was just made up, but maps exactly like this have been
common in journals for decades. Because the area depicted is more than 10
square kilometers, there is no way to figure out which house corresponds to
a dot, only which neighborhood.
Source: John S. Brownstein, Christopher A. Cassa, Kenneth D. Mandl, No place to hide—reverse
identification of patients from published maps, New England Journal of Medicine, 355:16,
October 19, 2006, 1741-1742.
FIGURE 2.4 Map of part of Boston as from a publication in a medical journal,
showing where a disease has occurred. (Simulated data.)
At least that was true in the days when journals were only print docu-
ments. Now journals are available online, and authors have to submit their
figures as high-resolution JPEGs. Figure 2.5 shows what happens if you
download the published journal article from the journal’s web site, blow up a
small part of the image, and superimpose it on an easily available map of the
corresponding city blocks. For each of the seven disease locations, there is
only a single house to which it could correspond. Anyone could figure out
where the people with syphilis live.
Source: John S. Brownstein, Christopher A. Cassa, Kenneth D. Mandl, No place to hide—reverse
identification of patients from published maps, New England Journal of Medicine, 355:16, October 19,
2006, 1741-1742.
FIGURE 2.5 Enlargement of Figure 2.4 superimposed on a housing map of a few
blocks of the city, showing that individual households can be identified to online
readers, who have access to the high-resolution version of the epidemiology map.
This is a re-identification problem, like the one Latanya Sweeney noted
when she showed how to get Governor Weld’s medical records. There are
things that can be done to solve this one. Perhaps the journal should not use
such high-resolution images (although that could cause a loss of crispness, or
even visibility—one of the nice things about online journals is that the visually
impaired can magnify them, to produce crisp images at a very large scale).
Perhaps the data should be “jittered” or “blurred” so what appears on the screen
for illustrative purposes is intentionally incorrect in its fine details. There are
always specific policy responses to specific re-identification scenarios.
Every scenario is a little different, however, and it is often hard to articu-
late sensible principles to describe what should be fixed.
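To make the "jittering" idea concrete, here is a minimal sketch in Python. It is not the method of any particular journal or of the authors of the map study; the function names, the 250-meter default, and the flat-earth conversion from meters to degrees are all assumptions chosen for illustration. Each plotted point is moved a random distance in a random direction, so the neighborhood-level pattern survives but a dot no longer sits on the patient's house.

```python
import math
import random

# Approximate meters per degree of latitude (an assumption that is good
# enough for displacements of a few hundred meters).
METERS_PER_DEGREE = 111_320.0

def jitter_point(lat, lon, max_offset_m, rng):
    """Move one point a random distance (below max_offset_m) in a random direction."""
    # The square root spreads points evenly over the disc instead of
    # bunching them near the center.
    distance = max_offset_m * math.sqrt(rng.random())
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    dlat = distance * math.cos(bearing) / METERS_PER_DEGREE
    dlon = distance * math.sin(bearing) / (
        METERS_PER_DEGREE * math.cos(math.radians(lat))
    )
    return lat + dlat, lon + dlon

def jitter_points(points, max_offset_m=250.0, seed=None):
    """Return a jittered copy of a list of (latitude, longitude) pairs."""
    rng = random.Random(seed)
    return [jitter_point(lat, lon, max_offset_m, rng) for lat, lon in points]

# Made-up coordinates in the Boston area, used only to exercise the code.
simulated_cases = [(42.3601, -71.0589), (42.3500, -71.0700)]
print(jitter_points(simulated_cases, max_offset_m=250.0, seed=1))
```

The choice of maximum offset is the policy question in miniature: too small and a single house can still be picked out, too large and the map no longer shows where the disease clusters.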
In 2001, four MIT students attempted to re-identify Chicago homicide victims
for a course project. They had extremely limited resources: no proprietary
databases of the kind held by credit-rating companies, no access to government
data, and very limited computing power. Yet they were
able to identify nearly 8,000 individuals from a target set of 11,000.
The source of the data was a free download from the Illinois Criminal
Justice Authority. The primary reference data source was also free. The Social
Security Administration provides a comprehensive death index including
name, birth date, Social Security Number, zip code of last residence, date of
death, and more. Rather than paying the nominal fee for the data (after all,
they were students), these researchers used one of the popular genealogy web
sites, RootsWeb.com, as a free source for the Social Security Death Index
(SSDI) data. They might also have used municipal birth and death records,
which are also publicly available.
The SSDI did not include gender, which was important to completing an
accurate match. But more public records came to the rescue. They found a
database published by the census bureau that enabled them to infer gender
from first names—most people named “Robert” are male, and most named
“Susan” are female. That, and some clever data manipulation, was all it took.
It is far from clear that it was wrong for any particular part of these data sets
to be publicly available, but the combination revealed more than was
intended.
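The kind of matching described above can be sketched in a few lines of Python. This is an illustration, not the students' actual procedure: the record layouts, the field names, the tiny name table, and every record below are invented, and the assumption that the homicide data carried a date of death, an age, and a gender is ours. The point is only that two harmless-looking tables, joined on a few shared attributes, can single out a person.

```python
from datetime import date

# Hypothetical stand-in for a census-derived list of first names.
NAME_TO_GENDER = {"robert": "M", "susan": "F"}

def infer_gender(full_name):
    """Guess gender from the first name; None when the name is not in the table."""
    first = full_name.split()[0].lower()
    return NAME_TO_GENDER.get(first)

def age_on(birth, when):
    """Age in whole years on a given day."""
    years = when.year - birth.year
    if (when.month, when.day) < (birth.month, birth.day):
        years -= 1
    return years

def link(victims, death_index):
    """Return {victim id: name} for victims that match exactly one death record."""
    matches = {}
    for v in victims:
        candidates = [
            d for d in death_index
            if d["death_date"] == v["death_date"]
            and age_on(d["birth_date"], d["death_date"]) == v["age"]
            and infer_gender(d["name"]) == v["gender"]
        ]
        if len(candidates) == 1:  # ambiguous or missing matches are left alone
            matches[v["id"]] = candidates[0]["name"]
    return matches

# Invented records: an "anonymous" victim table and a named death index.
victims = [
    {"id": 1, "death_date": date(1990, 5, 1), "age": 30, "gender": "M"},
    {"id": 2, "death_date": date(1990, 5, 1), "age": 30, "gender": "F"},
    {"id": 3, "death_date": date(1991, 1, 1), "age": 40, "gender": "M"},
]
death_index = [
    {"name": "Robert Example", "birth_date": date(1960, 3, 2), "death_date": date(1990, 5, 1)},
    {"name": "Susan Sample", "birth_date": date(1959, 6, 10), "death_date": date(1990, 5, 1)},
    {"name": "Robert One", "birth_date": date(1950, 6, 1), "death_date": date(1991, 1, 1)},
    {"name": "Robert Two", "birth_date": date(1950, 9, 9), "death_date": date(1991, 1, 1)},
]
print(link(victims, death_index))
```

In this made-up data, the first two victims are each pinned to a single name, and gender is what separates them; the third stays unidentified because two death records fit equally well.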
The more re-identification problems we see, and the more ad hoc solutions
we develop, the more we develop a deep-set fear that our problems may never
end. These problems arise because there is a great deal of public data, no one
piece of which is problematic, but which creates privacy violations in combi-
nation. It is the opposite of what we know about salt—that the component ele-
ments, sodium and chlorine, are both toxic, but the compound itself is safe.
Here we have toxic compounds arising from the clever combination of harm-
less components. What can possibly be done about that?
Big Brother, Abroad and in the U.S.
Big Brother really is watching today, and his job has gotten much easier
because of the digital explosion. In China, which has a long history of track-
ing individuals as a mechanism of social control, the millions of residents of
Shenzhen are being issued identity cards, which record far more than the
bearer’s name and address. According to a report in the New York Times, the
cards will document the individual’s work history, educational background,
religion, ethnicity, police record, medical insurance status, landlord’s phone
number, and reproductive history. Touted as a crime-fighting measure, the new
technology—developed by an American company—will come in handy in case
of street protests or any individual activity deemed suspicious by the author-
ities. The sort of record-keeping that used to be the responsibility of local
authorities is becoming automated and nationalized as the country prospers
and its citizens become increasingly mobile. The technology makes it easier to
know where everyone is, and the government is taking advantage of that
opportunity. Chinese tracking is far more detailed and pervasive than Britain’s
ubiquitous surveillance cameras.
You Pay for the Mike, We’ll Just Listen In
Planting tiny microphones where they might pick up conversations of under-
world figures used to be risky work for federal authorities. There are much
safer alternatives now that many people carry their own radio-equipped
microphones with them all the time.
Many cell phones can be reprogrammed remotely so that the microphone
is always on and the phone is transmitting, even if you think you have pow-
ered it off. The FBI used this technique in 2004 to listen to John Tomero’s con-
versations with other members of his organized crime family. A federal court
ruled that this “roving bug,” installed after due authorization, constituted a
legal form of wiretapping. Tomero could have prevented it by removing the
battery, and now some nervous business executives routinely do exactly that.
The microphone in a General Motors car equipped with the OnStar system
can also be activated remotely, a feature that can save lives when OnStar
operators contact the driver after receiving a crash signal. OnStar warns,
“OnStar will cooperate with official court orders regarding criminal investi-
gations from law enforcement and other agencies,” and indeed, the FBI has
used this method to eavesdrop on conversations held inside cars. In one case,
a federal court ruled against this way of collecting evidence—but not on pri-
vacy grounds. The roving bug disabled the normal operation of OnStar, and
the court simply thought that the FBI had interfered with the vehicle owner’s
contractual right to chat with the OnStar operators!
Identifying Citizens—Without ID Cards
In the age of global terrorism, democratic nations are resorting to digital sur-
veillance to protect themselves, creating hotly contested conflicts with tradi-
tions of individual liberty. In the United States, the idea of a national
identification card causes a furious libertarian reaction from parties not usu-
ally outspoken in defense of individual freedom. Under the REAL ID Act of
2005, uniform federal standards are being implemented for state-issued
drivers’ licenses. Although it passed through Congress without debate, the law
is opposed by at least 18 states. Resistance pushed back the implementation
timetable first to 2009, and then, in early 2008, to 2011. Yet even fully imple-
mented, REAL ID would fall far short of the true national ID preferred by
those charged with fighting crime and preventing terrorism.
As the national ID card debate continues in the U.S., the FBI is making it
irrelevant by exploiting emerging technologies. There would be no need for
anyone to carry an ID card if the govern-
ment had enough biometric data on
Americans—that is, detailed records of
their fingerprints, irises, voices, walking
gaits, facial features, scars, and the shape
of their earlobes. Gather a combination of
measurements on individuals walking in
public places, consult the databases, connect the dots, and—bingo!—their
names pop up on the computer screen. No need for them to carry ID cards;
the combination of biometric data would pin them down perfectly.
Well, only imperfectly at this point, but the technology is improving. And
the data is already being gathered and deposited in the data vault of the FBI’s
Criminal Justice Information Services database in Clarksburg, West Virginia.
The database already holds some 55 million sets of fingerprints, and the FBI
processes 100,000 requests for matches every day. Any of 900,000 federal,
state, and local law enforcement officers can send a set of prints and ask the
FBI to identify it. If a match comes up, the individual’s criminal history is
there in the database too.
But fingerprint data is hard to gather; mostly it is obtained when people
are arrested. The goal of the project is to get identifying information on
nearly everyone, and to get it without bothering people too much. For example,
a simple notice at airport security could advise travelers that a detailed
“snapshot” will be taken as they enter the secure area. The traveler would then
know what is happening, and could refuse (and stay home). As an electronic
identification researcher puts it,
“That’s the key. You’ve chosen it. You have chosen to say, ‘Yeah, I want this
place to recognize me.’” No REAL ID controversies, goes the theory; all the
data being gathered would, in some sense at least, be offered voluntarily.
Friendly Cooperation Between Big Siblings
In fact, there are two Big Brothers, who often work together. And we are, by
and large, glad they are watching, if we are aware of it at all. Only occasion-
ally are we alarmed about their partnership.
The first Big Brother is Orwell’s—the government. And the other Big
Brother is the industry about which most of us know very little: the business
of aggregating, consolidating, analyzing, and reporting on the billions of
individual transactions, financial and otherwise, that take place electronically
every day. Of course, the commercial data aggregation companies are not in
the spying business; none of their data reaches them illicitly. But they do
know a lot about us, and what they know can be extremely valuable, both to
businesses and to the government.
The new threat to privacy is that computers can extract significant infor-
mation from billions of apparently uninteresting pieces of data, in the way
that mining technology has made it economically feasible to extract precious
metals from low-grade ore. Computers can correlate databases on a massive
level, linking governmental data sources together with private and commer-
cial ones, creating comprehensive digital dossiers on millions of people. With
their massive data storage and processing power, they can make connections
in the data, like the clever connections the MIT students made with the
Chicago homicide data, but using brute force rather than ingenuity. And the
computers can discern even very faint traces in the data—traces that may help
track payments to terrorists, set our insurance rates, or simply help us be sure
that our new babysitter is not a sex offender.
And so we turn to the story of the government and the aggregators.
Acxiom is the country’s biggest customer data company. Its business is to
aggregate transaction data from all those swipes of cards in card readers all
over the world—in 2004, this amounted to more than a billion transactions a
day. The company uses its massive data about financial activity to support
the credit card industry, banks, insurers, and other consumers of information
about how people spend money. Unsurprisingly, after the War on Terror
began, the Pentagon also got interested in Acxiom’s data and the ways they
gather and analyze it. Tracking how money gets to terrorists might help find
the terrorists and prevent some of their attacks.
ChoicePoint is the other major U.S. data aggregator. ChoicePoint has more
than 100,000 clients, which call on it for help in screening employment can-
didates, for example, or determining whether individuals are good insurance
risks.
Acxiom and ChoicePoint are different from older data analysis operations,
simply because of the scale of their operations. Quantitative differences have
qualitative effects, as we said in Chapter 1; what has changed is not the tech-
nology, but rather the existence of rich data sources. Thirty years ago, credit
cards had no magnetic stripes. Charging a purchase was a mechanical oper-
ation; the raised numerals on the card made an impression through carbon
paper so you could have a receipt, while the top copy went to the company
that issued the card. Today, if you charge something using your CapitalOne
card, the bits go instantly not only to CapitalOne, but to Acxiom or other
aggregators. The ability to search through huge commercial data sources—
including not just credit card transaction data, but phone call records, travel
tickets, and banking transactions, for example—is another illustration that
more of the same can create something new.
Privacy laws do exist, of course. For a bank, or a data aggregator, to post
your financial data on its web site would be illegal. Yet privacy is still devel-
oping as an area of the law, and it is connected to commercial and govern-
ment interests in uncertain and surprising ways.
A critical development in privacy law was precipitated by the presidency
of Richard Nixon. In what is generally agreed to be an egregious abuse of
presidential power, Nixon used his authority as president to gather informa-
tion on those who opposed him—in the words of his White House Counsel at
the time, to “use the available federal machinery to screw our political ene-
mies.” Among the tactics Nixon used was to have the Internal Revenue
Service audit the tax returns of individuals on an “enemies list,” which
included congressmen, journalists, and major contributors to Democratic
causes. Outrageous as it was to use the IRS for this purpose, it was not ille-
gal, so Congress moved to ban it in the future.
The Privacy Act of 1974 established broad guidelines for when and how
the Federal Government can assemble dossiers on citizens it is not investigat-
ing for crimes. The government has to give public notice about what infor-
mation it wants to collect and why, and it has to use it only for those reasons.
The Privacy Act limits what the government can do to gather information
about individuals and what it can do with records it holds. Specifically, it
states, “No agency shall disclose any record which is contained in a system
of records by any means of communication to any person, or to another
agency, except pursuant to a written request by, or with the prior written con-
sent of, the individual to whom the record pertains, unless ….” If the govern-
ment releases information inappropriately, even to another government
agency, the affected citizen can sue for damages in civil court. The protec-
tions provided by the Privacy Act are sweeping, although not as sweeping as
they may seem. Not every government office is in an “agency”; the courts are
not, for example. The Act requires agencies to give public notice of the uses
to which they will put the information, but the notice can be buried in the
Federal Register where the public probably won’t see it unless news media
happen to report it. Then there is the “unless” clause, which includes signifi-
cant exclusions. For example, the law does not apply to disclosures for
statistical, archival, or historical purposes, civil or criminal law enforcement
activities, Congressional investigations, or valid Freedom of Information Act
requests.
In spite of its exclusions, government practices changed significantly
because of this law. Then, a quarter century later, came 9/11. Law enforcement
should have seen it all coming, was the constant refrain as investigations
revealed how many unconnected dots were in the hands of different govern-
ment agencies. It all could have been prevented if the investigative fiefdoms
had been talking to each other. They should have been able to connect the dots.
But they could not—in part because the Privacy Act restricted inter-agency
data transfers. A response was badly needed. The Department of Homeland
Security was created to ease some of the interagency communication prob-
lems, but that government reorganization was only a start.
In January 2002, just a few months after the World Trade Center attack,
the Defense Advanced Research Projects Agency (DARPA) established the
Information Awareness Office (IAO) with a mission to:
imagine, develop, apply, integrate, demonstrate, and transition infor-
mation technologies, components and prototype, closed-loop, infor-
mation systems that will counter asymmetric threats by achieving
total information awareness useful for preemption; national security
warning; and national security decision making. The most serious
asymmetric threat facing the United States is terrorism, a threat char-
acterized by collections of people loosely organized in shadowy net-
works that are difficult to identify and define. IAO plans to develop
technology that will allow understanding of the intent of these net-
works, their plans, and potentially define opportunities for disrupting
or eliminating the threats. To effectively and efficiently carry this out,
we must promote sharing, collaborating, and reasoning to convert
nebulous data to knowledge and actionable options.
Vice Admiral John Poindexter directed the effort that came to be known as
“Total Information Awareness” (TIA). The growth of enormous private data
repositories provided a convenient way to avoid many of the prohibitions of
the Privacy Act. The Department of Defense can’t get data from the Internal
Revenue Service, because of the 1974 Privacy Act. But they can both buy it
from private data aggregators! In a May 2002 email to Adm. Poindexter, Lt.
Col. Doug Dyer discussed negotiations with Acxiom.
Acxiom’s Jennifer Barrett is a lawyer and chief privacy officer. She’s
testified before Congress and offered to provide help. One of the key
suggestions she made is that people will object to Big Brother, wide-
coverage databases, but they don’t object to use of relevant data for
specific purposes that we can all agree on. Rather than getting all the
data for any purpose, we should start with the goal, tracking terrorists
to avoid attacks, and then identify the data needed (although we can’t
define all of this, we can say that our templates and models of terror-
ists are good places to start). Already, this guidance has shaped my
thinking.
Ultimately, the U.S. may need huge databases of commercial transac-
tions that cover the world or certain areas outside the U.S. This infor-
mation provides economic utility, and thus provides two reasons why
foreign countries would be interested. Acxiom could build this mega-
scale database.
The New York Times broke the story in October 2002. As Poindexter had
explained in speeches, the government had to “break down the stovepipes”
separating agencies, and get more sophisticated about how to create a big
picture out of a million details, no one of which might be meaningful in itself.
The Times story set off a sequence of reactions from the Electronic Privacy
Information Center and civil libertarians. Congress defunded the office in
2003. Yet that was not the end of the idea.
The key to TIA was data mining, looking for connections across disparate
data repositories, finding patterns, or “signatures,” that might identify
terrorists or other undesirables. A Government Accountability Office report on
data mining (GAO-04-548) surveyed 128 federal departments and described 199
separate data mining efforts, of which 122 used personal information.
Although IAO and TIA went away, Project ADVISE at the Department of
Homeland Security continued with large-scale profiling system development.
Eventually, Congress demanded that the privacy issues concerning this pro-
gram be reviewed as well. In his June 2007 report (OIG-07-56), Richard
Skinner, the DHS Inspector General, stated that “program managers did not
address privacy impacts before implementing three pilot initiatives,” and a
few weeks later, the project was shut down. But ADVISE was only one of
twelve data-mining projects going on in DHS at the time.
Similar privacy concerns led to the cancellation of the Pentagon’s TALON
database project. That project sought to compile a database of reports of
suspected threats to defense facilities as part of a larger program of domestic
counterintelligence.
The Transportation Security Administration (TSA) is responsible for airline
passenger screening. One proposed system, CAPPS II, which was ultimately
terminated over privacy concerns, sought to bring together disparate data
sources to determine whether a particular individual might pose a transporta-
tion threat. Color-coded assessment tags would determine whether you could
board quickly, be subject to further screening, or be denied access to air travel.
The government creates projects, the media and civil liberties groups raise
serious privacy concerns, the projects are cancelled, and new ones arise to
take their place. The cycle seems to be endless. In spite of Americans’ tradi-
tional suspicions about government surveillance of their private lives, the
cycle seems to be almost an inevitable consequence of Americans’ concerns
about their security, and the responsibility that government officials feel to
use the best available technologies to protect the nation. Corporate databases
often contain the best information on the people about whom the govern-
ment is curious.
Technology Change and Lifestyle Change
New technologies enable new kinds of social interactions. There were no sub-
urban shopping malls before private automobiles became cheap and widely
used. Thirty years ago, many people getting off an airplane reached for cig-
arettes; today, they reach for cell phones. As Heraclitus is reported to have
said 2,500 years ago, “all is flux”—everything keeps changing. The reach-for-
your-cell phone gesture may not last much longer, since airlines are starting
to provide onboard cell phone coverage.
The more people use a new technology, the more useful it becomes. (This
is called a “network effect”; see Chapter 4, “Needles in the Haystack.”) When
one of us got the email address lewis@harvard as a second-year graduate
student, it was a vainglorious joke; all the people he knew who had email
addresses were students in the same office with him. Email culture could not
develop until a lot of people had email, but there wasn’t much point in hav-
ing email if no one else did.
Technology changes and social changes reinforce each other. Another way
of looking at the technological reasons for our privacy loss is to recognize that
the social institutions enabled by the technology are now more important than
the practical uses for which the technology was originally conceived. Once a
lifestyle change catches on, we don’t even think about what it depends on.
Credit Card Culture
The usefulness of the data aggregated by Acxiom and its kindred data aggre-
gation services rises as the number of people in their databases goes up, and
as larger parts of their lives leave traces in those databases. When credit cards
were mostly short-term loans taken out for large purchases, the credit card
data was mostly useful for determining your creditworthiness. It is still use-
ful for that, but now that many people buy virtually everything with credit
cards, from new cars to fast-food hamburgers, the credit card transaction
database can be mined for a detailed image of our lifestyles. The information
is there, for example, to determine if you usually eat dinner out, how much
traveling you do, and how much liquor you tend to consume. Credit card
companies do in fact analyze this sort of information, and we are glad they
do. If you don’t seem to have been outside Montana in your entire life and
you turn up buying a diamond bracelet in Rio de Janeiro, the credit card com-
pany’s computer notices the deviation from the norm, and someone may call
to be sure it is really you.
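How might a computer "notice the deviation from the norm"? The following Python sketch shows one simple possibility. It is an assumption made for illustration, not a description of any card company's real system: the function name, the three-standard-deviation threshold, and the sample purchases are all invented. A charge is flagged if it comes from a place the cardholder has never used the card, or if the amount is far larger than the cardholder's usual purchases.

```python
from statistics import mean, pstdev

def unusual_reasons(history, txn, z_threshold=3.0):
    """List the reasons a transaction looks out of character.

    history: past purchases as (region, amount) pairs
    txn:     the new purchase as a (region, amount) pair
    """
    reasons = []
    region, amount = txn

    # Has this cardholder ever bought anything in this region before?
    if region not in {past_region for past_region, _ in history}:
        reasons.append("new region")

    # Is the amount many standard deviations above the usual spending?
    amounts = [past_amount for _, past_amount in history]
    if len(amounts) >= 2:
        spread = pstdev(amounts)
        if spread > 0 and (amount - mean(amounts)) / spread > z_threshold:
            reasons.append("unusually large amount")

    return reasons

# Invented spending history for a cardholder who has never left Montana.
history = [("Montana", 40.0), ("Montana", 55.0), ("Montana", 38.0), ("Montana", 62.0)]

print(unusual_reasons(history, ("Rio de Janeiro", 4200.0)))
print(unusual_reasons(history, ("Montana", 50.0)))
```

The diamond bracelet in Rio trips both checks, while an ordinary purchase at home trips neither. Note that the check works only because the full history of purchases is on file.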
The credit card culture is an economic problem for many Americans, who
accept more credit card offers than they need, and accumulate more debt than
they should. But it is hard to imagine the end of the little plastic cards, unless
even smaller RFID tags replace them. Many people carry almost no cash
today, and with every easy swipe, a few more bits go into the databases.
Email Culture
Email is culturally in between telephoning and writing a letter. It is quick, like
telephoning (and instant messaging is even quicker). It is permanent, like a
letter. And like a letter, it waits for the recipient to read it. Email has, to a
great extent, replaced both of the other media for person-to-person commu-
nication, because it has advantages of both. But it has the problems that other
communication methods have, and some new ones of its own.
Phone calls are not intended to last forever, or to be copied and redistrib-
uted to dozens of other people, or to turn up in court cases. When we use
email as though it were a telephone, we tend to forget about what else might
happen to it, other than the telephone-style use, that the recipient will read it
and throw it away. Even Bill Gates probably wishes that he had written his
corporate emails in a less telephonic voice. After he testified in an antitrust
lawsuit that he had not contemplated cutting a deal to divide the web browser
market with a competitor, the government produced a candid email he had
sent, seeming to contradict his denial: “We could even pay them money as
part of the deal, buying a piece of them or something.”
Email is bits, traveling within an ISP and
through the Internet, using email software that
may keep copies, filter it for spam, or submit it
to any other form of inspection the ISP may
choose. If your email service provider is Google,
the point of the inspection is to attach some
appropriate advertising. If you are working within a financial services corpo-
ration, your emails are probably logged—even the ones to your grandmother—
because the company has to be able to go back and do a thorough audit if
something inappropriate happens.
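A short Python sketch shows why inspection of this kind is so easy. It uses the standard library's email package to build a message and turn it into the text that would be handed to a mail server; the addresses, the message, and the keyword scanner are invented for illustration and do not describe any provider's actual software. Unless the sender encrypts it, the serialized message is ordinary readable text.

```python
from email.message import EmailMessage

# Build an ordinary, unencrypted message (addresses are placeholders).
msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "grandma@example.com"
msg["Subject"] = "Sunday dinner"
msg.set_content("See you at six. I'll bring the pie.")

# This is the text that mail software along the way gets to handle.
wire_text = msg.as_string()
print(wire_text)

def mentions(text, keywords):
    """Return the keywords that appear in the message text, ignoring case."""
    lowered = text.lower()
    return [word for word in keywords if word.lower() in lowered]

# A hypothetical scanner: the same few lines could pick ads or flag messages.
print(mentions(wire_text, ["pie", "stock tip"]))
```

Any program that handles the message in transit can do what the last line does, whether the purpose is filtering spam, choosing an advertisement, or keeping an audit trail.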
Email is as public as postcards, unless it is encrypted, which it usually is
not. Employers typically reserve the right to read what is sent through com-
pany email. Check the policy of your own employer; it may be hard to find,
and it may not say what you expect. Here is Harvard’s policy, for example:
Employees must have no expectation or right of privacy in anything
they create, store, send, or receive on Harvard’s computers, networks,
or telecommunications systems. …. Electronic files, e-mail, data files,
images, software, and voice mail may be accessed at any time by
management or by other authorized personnel for any business pur-
pose. Access may be requested and arranged through the system(s)
user, however, this is not required.
Employers have good reason to retain such sweeping rights; they have to be
able to investigate wrongdoing for which the employer would be liable. As a
result, such policies are often less important than the good judgment and
ethics of those who administer them. Happily, Harvard’s are generally good.
But as a general principle, the more people who have the authority to snoop,
the more likely it is that someone will succumb to the temptation.
Commercial email sites can retain copies of messages even after they have
been deleted. And yet, there is very broad acceptance of public, free email
services such as Google’s Gmail, Yahoo! Mail, or Microsoft’s Hotmail. The
technology is readily available to make email private: you can use encryption
tools or a secure email service such as Hushmail, a free, web-based email
service that incorporates PGP-based encryption (see Chapter 5). Usage of these
services, though, is an insignificant fraction of the usage of their unencrypted
counterparts. Google gives us free, reliable email service and we, in return, give up
some space on our computer screen for ads. Convenience and cost trump pri-
vacy. By and large, users don’t worry that Google, or its competitors, have all
their mail. It’s a bit like letting the post office keep a copy of every letter you
send, but we are so used to it, we don’t even think about it.
Web Culture
When we send an email, we think at least a little bit about the impression we
are making, because we are sending it to a human being. We may well say
things we would not say face-to-face, and live to regret that. Because we
can’t see anyone’s eyes or hear anyone’s voice, we are more likely to over-
react and be hurtful, angry, or just too smart for our own good. But because
email is directed, we don’t send email thinking that no one else will ever read
what we say.
The Web is different. Its social sites inherit their communication culture
not from the letter or telephone call, but from the wall in the public square,
littered with broadsides and scribbled notes, some of them signed and some
not. Type a comment on a blog, or post a photo to a photo album, and your
action can be as anonymous as you wish it to be; you do not know to whom
your message is going. YouTube has millions of personal videos. Photo-
archiving sites are the shoeboxes and photo albums of the twenty-first cen-
tury. Online backup now provides easy access to permanent storage for the
contents of our personal computers. We entrust commercial entities with
much of our most private information, without apparent concern. The gener-
ation that has grown up with the Web has embraced social networking in all
its varied forms: MySpace, YouTube, LiveJournal, Facebook, Xanga,
Classmates.com, Flickr, dozens more, and blogs of every shape and size. More
than being taken, personal privacy has been given away quite freely, because
everyone else is doing it—the surrender of privacy is more than a way to
social connectedness, it is a social institution in its own right. There are 70
million bloggers sharing everything from mindless blather to intimate per-
sonal details. Sites like www.loopt.com let you find your friends, while
twitter.com lets you tell the entire world where you are and what you are
doing. The Web is a confused, disorganized, chaotic realm, rich in both gold
and garbage.
The “old” web, “Web 1.0,” as we now refer to it, was just an information
resource. You asked to see something, and you got to see it. Part of the dis-
inhibition that happens on the new “Web 2.0” social networking sites is due
to the fact that they still allow the movie-screen illusion—that we are “just
looking,” or if we are contributing, we are not leaving footprints or finger-
prints if we use pseudonyms. (See Chapter 4 for more on Web 1.0 and
Web 2.0.)
But of course, that is not really the way the Web ever worked. It is impor-
tant to remember that even Web 1.0 was never anonymous, and even “just
looking” leaves fingerprints.
In August 2006, a New York Times reporter called Thelma Arnold of Lilburn,
Georgia. Thelma wasn’t expecting the call. She wasn’t famous, nor was she
involved in anything particularly noteworthy. She enjoyed her hobbies,
helped her friends, and from time to time looked up things on the Web—stuff
about her dogs, and her friends’ ailments.
Then AOL, the search engine she used, decided to release some “anonymous”
query data. Thelma, like most Internet users, may not have known that
AOL had kept every single topic that she, and every other one of its users,
had asked about. But it did. In a moment of unenlightened generosity, AOL
released for research use a small sample: about 20 million queries from
658,000 different users. That is actually not a lot of data by today’s standards.
For example, in July 2007, there were about 5.6 billion search engine queries,
of which roughly 340 million were AOL queries. So 20 million queries amount
to only a couple of days’ worth of AOL search queries. In an effort to protect
its users’ privacy, AOL “de-identified” the queries. AOL never mentioned
anyone by name; it used random numbers instead. Thelma was 4417749.
AOL mistakenly presumed that removing a single piece of personal identifi-
cation would make it hard to figure out who the users were. It turned out that
for some of the users, it wasn’t hard at all.
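The "couple of days" estimate can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the text; the function names are illustrative, and it assumes July 2007 traffic is a fair stand-in for a typical month.

```python
# Figures quoted in the text above.
ALL_QUERIES_JULY_2007 = 5_600_000_000  # all search engines, one month
AOL_QUERIES_JULY_2007 = 340_000_000    # AOL's portion of that month
RELEASED_QUERIES = 20_000_000          # size of the released sample
RELEASED_USERS = 658_000               # distinct users in the sample
DAYS_IN_JULY = 31


def aol_share() -> float:
    """Fraction of all July 2007 queries that went through AOL."""
    return AOL_QUERIES_JULY_2007 / ALL_QUERIES_JULY_2007


def days_of_aol_traffic() -> float:
    """How many days of AOL traffic equal the released sample in volume."""
    per_day = AOL_QUERIES_JULY_2007 / DAYS_IN_JULY
    return RELEASED_QUERIES / per_day


def queries_per_user() -> float:
    """Average number of released queries per pseudonymous user."""
    return RELEASED_QUERIES / RELEASED_USERS


print(f"AOL share of all queries: {aol_share():.1%}")
print(f"Days of AOL traffic:      {days_of_aol_traffic():.1f}")
print(f"Queries per user:         {queries_per_user():.0f}")
```

The sample works out to a little under two days of AOL traffic by volume, yet it averages about 30 queries per user, which is enough of a trail to make some individuals recognizable.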
It didn’t take much effort to match Thelma with her queries. She had
searched for “landscapers in Lilburn, GA” and several people with the last
name “Arnold,” leading to the obvious question of whether there were any
Arnolds in Lilburn. Many of Thelma’s queries were not particularly useful for
identifying her, but were revealing nonetheless: “dry mouth,” “thyroid,” “dogs
that urinate on everything,” and “swing sets.”
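The matching described above can be sketched in a few lines. This is a toy illustration, not a reconstruction of how anyone actually identified Thelma Arnold: the user IDs, the surname query, and the small directory are hypothetical, and only the remaining queries come from the text. It shows how a town name and a surname appearing under one "anonymous" ID narrow the field to a single person.

```python
from collections import defaultdict

# Toy query log: (pseudonymous user ID, query). IDs are hypothetical.
QUERY_LOG = [
    (1001, "landscapers in lilburn, ga"),
    (1001, "john arnold"),  # hypothetical stand-in for a surname search
    (1001, "dry mouth"),
    (1001, "swing sets"),
    (1002, "credit reports"),
    (1002, "schools"),
]

# Hypothetical public directory: (full name, town).
DIRECTORY = [
    ("Thelma Arnold", "Lilburn"),
    ("Pat Example", "Lilburn"),
    ("Sam Arnold", "Springfield"),
]


def words_by_user(log):
    """Collect every word each pseudonymous user ever searched for."""
    words = defaultdict(set)
    for user_id, query in log:
        words[user_id].update(query.lower().replace(",", " ").split())
    return words


def candidates(log, directory):
    """For each user, list directory entries whose town AND surname
    both appear somewhere in that user's queries."""
    result = {}
    for user_id, words in words_by_user(log).items():
        result[user_id] = [
            name
            for name, town in directory
            if town.lower() in words and name.split()[-1].lower() in words
        ]
    return result


print(candidates(QUERY_LOG, DIRECTORY))
```

Neither the town nor the surname is identifying on its own; the combination under one ID is what does the damage.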
Thelma was not the only person to be identified. User 22690686 (Terri)
liked astrology, the Edison National Bank, Primerica, and Budweiser.
User 5779844 (Lawanna) was interested in credit reports and schools. From what
he searched for, user 356693 seems to have been an aide to Chris Shays,
Congressman from Connecticut.
One of the privacy challenges that we confront as we rummage through
the rubble of the digital explosion is that information exists without context.
Was Thelma Arnold suffering from a wide range of ailments? One might read-
ily conclude that from her searches. The fact is that she often tried to help her
friends by understanding their medical problems.
Or consider AOL user 17556639, whose search history was released along
with Thelma Arnold’s. He searched for the following:
how to kill your wife      23 Mar, 22:09
wife killer                23 Mar, 22:11
poop                       23 Mar, 22:12
dead people                23 Mar, 22:13
pictures of dead people    23 Mar, 22:15
killed people              23 Mar, 22:16
dead pictures              23 Mar, 22:17
murder photo               23 Mar, 22:20
steak and cheese           23 Mar, 22:22
photo of death             23 Mar, 22:30
death                      23 Mar, 22:33
dead people photos         23 Mar, 22:33
photo of dead people       23 Mar, 22:35
www.murderdpeople.com      23 Mar, 22:37
decapitated photos         23 Mar, 22:39
car crashes3               23 Mar, 22:40
car crash photo            23 Mar, 22:41
Is this AOL user a potential criminal? Should AOL have called the police? Is
17556639 about to kill his wife? Is he (or she) a researcher with a spelling
problem and an interest in Philly cheese steak? Is reporting him to the police
doing a public service, or is it an invasion of privacy?
There is no way to tell just from these queries if this user was contemplat-
ing some heinous act or doing research for a novel that involves some grisly
scenes. When information is incomplete and decontextualized, it is hard to
judge meaning and intent.
In this particular case, we happen to know the answer. The user, Jason
from New Jersey, was just fooling around, trying to see if Big Brother was
watching. He wasn’t planning to kill his wife at all. Inference from incom-
plete data has the problem of false positives—thinking you have something
that you don’t, because there are other patterns that fit the same data.
Information without context often leads to erroneous conclusions. Because
our digital trails are so often retrieved outside the context within which they
were created, they sometimes suggest incorrect interpretations. Data interpre-
tation comes with balanced social responsibilities, to protect society when
there is evidence of criminal behavior or intent, and also to protect the indi-
vidual when such evidence is too limited to be reliable. Of course, for every
example of misleading and ambiguous data, someone will want to solve the
problems it creates by collecting more data, rather than less.
Beyond Privacy
There is nothing new under the sun, and the struggles to define and enforce
privacy are no exception. Yet history shows that our concept of privacy has
evolved, and the law has evolved with it. With the digital explosion, we have
arrived at a moment where further evolution will have to take place rather
quickly.
Leave Me Alone
More than a century ago, two lawyers raised the alarm about the impact tech-
nology and the media were having on personal privacy:
Instantaneous photographs and newspaper enterprise have invaded
the sacred precincts of private and domestic life; and numerous
mechanical devices threaten to make good the prediction that “what is
whispered in the closet shall be proclaimed from the house-tops.”
This statement is from the seminal law review article on privacy, published in
1890 by Boston attorney Samuel Warren and his law partner, Louis Brandeis,
later to be a justice of the U.S. Supreme Court. Warren and Brandeis went on,
“Gossip is no longer the resource of the idle and of the vicious, but has
become a trade, which is pursued with industry as well as effrontery. To sat-
isfy a prurient taste the details of sexual relations are spread broadcast in the
columns of the daily papers. To occupy the indolent, column upon column is
filled with idle gossip, which can only be procured by intrusion upon the
domestic circle.” New technologies made this garbage easy to produce, and
then “the supply creates the demand.”
And those candid photographs and gossip columns were not merely taste-
less; they were bad. Sounding like modern critics of mindless reality TV,
Warren and Brandeis raged that society was going to hell in a handbasket
because of all that stuff that was being spread about.
Even gossip apparently harmless, when widely and persistently circu-
lated, is potent for evil. It both belittles and perverts. It belittles by
inverting the relative importance of things, thus dwarfing the
thoughts and aspirations of a people. When personal gossip attains
the dignity of print, and crowds the space available for matters of
real interest to the community, what wonder that the ignorant and
thoughtless mistake its relative importance. Easy of comprehension,
appealing to that weak side of human nature which is never wholly
cast down by the misfortunes and frailties of our neighbors, no one
can be surprised that it usurps the place of interest in brains capable
of other things. Triviality destroys at once robustness of thought and
delicacy of feeling. No enthusiasm can flourish, no generous impulse
can survive under its blighting influence.
The problem they perceived was that it was hard to say just why such inva-
sions of privacy should be unlawful. In individual cases, you could say some-
thing sensible, but the individual legal decisions were not part of a general
regime. The courts had certainly applied legal sanctions for defamation—
publishing malicious gossip that was false—but then what about malicious
gossip that was true? Other courts had imposed penalties for publishing an
individual’s private letters—but on the basis of property law, just as though
the individual’s horse had been stolen rather than the words in his letters.
That did not seem to be the right analogy either. No, they concluded, such
rationales didn’t get to the nub. When something private is published about
you, something has been taken from you, you are a victim of theft—but the
thing stolen from you is part of your identity as a person. In fact, privacy was
a right, they said, a “general right of the individual to be let alone.” That right
had long been in the background of court decisions, but the new technolo-
gies had brought this matter to a head. In articulating this new right, Warren
and Brandeis were, they asserted, grounding it in the principle of “inviolate
personhood,” the sanctity of individual identity.
Privacy and Freedom
The Warren-Brandeis articulation of privacy as a right to be left alone was
influential, but it was never really satisfactory. Throughout the twentieth cen-
tury, there were simply too many good reasons for not leaving people alone,
and too many ways in which people preferred not to be left alone. And in the
U.S., First Amendment rights stood in the way of privacy rights. As a general
rule, the government simply cannot stop me from saying anything. In partic-
ular, it usually cannot stop me from saying what I want about your private
affairs. Yet the Warren-Brandeis definition worked well enough for a long
time, because, as Robert Fano put it, “The pace of technological progress was
for a long time sufficiently slow as to enable society to learn pragmatically
how to exploit new technology and prevent its abuse, with society maintain-
ing its equilibrium most of the time.” By the late 1950s, the emerging
electronic technologies, both computers and communication, had destroyed
that balance. Society could no longer adjust pragmatically, because surveil-
lance technologies were developing too quickly.
The result was a landmark study of privacy by the Association of the Bar
of the City of New York, which culminated in the publication, in 1967, of a
book by Alan Westin, entitled Privacy and Freedom. (Fano was reviewing
Westin’s book when he painted the picture of social disequilibrium caused by
rapid technological change.) Westin proposed a crucial shift of focus.
Brandeis and Warren had seen a loss of privacy as a form of personal
injury, which might be so severe as to cause “mental pain and distress, far
greater than could be inflicted by mere bodily injury.” Individuals had to take
responsibility for protecting themselves. “Each man is responsible for his own
acts and omissions only.” But the law had to provide the weapons with which
to resist invasions of privacy.
Westin recognized that the Brandeis-Warren formulation was too absolute,
in the face of the speech rights of other individuals and society’s legitimate
data-gathering practices. Protection might come not from protective shields,
but from control over the uses to which personal information could be put.
“Privacy,” wrote Westin, “is the claim of individuals, groups, or institutions
to determine for themselves when, how, and to what extent information
about them is communicated to others.”
… what is needed is a structured and rational weighing process, with
definite criteria that public and private authorities can apply in com-
paring the claim for disclosure or surveillance through new devices
with the claim to privacy. The following are suggested as the basic
steps of such a process: measuring the seriousness of the need to con-
duct surveillance; deciding whether there are alternative methods to
meet the need; deciding what degree of reliability will be required of
the surveillance instrument; determining whether true consent to sur-
veillance has been given; and measuring the capacity for limitation
and control of the surveillance if it is allowed.
So even if there were a legitimate reason why the government, or some other
party, might know something about you, your right to privacy might limit
what the knowing party could do with that information.
This more nuanced understanding of privacy emerged from the important
social roles that privacy plays. Privacy is not, as Warren and Brandeis had it,
the right to be isolated from society—privacy is a right that makes society
work. Fano mentioned three social roles of privacy. First, “the right to main-
tain the privacy of one’s personality can be regarded as part of the right of
self-preservation”—the right to keep your adolescent misjudgments and per-
sonal conflicts to yourself, as long as they are of no lasting significance to
your ultimate position in society. Second, privacy is the way society allows
deviations from prevailing social norms,
given that no one set of social norms is
universally and permanently satisfactory—
and indeed, given that social progress
requires social experimentation. And third,
privacy is essential to the development of
independent thought—it enables some
decoupling of the individual from society,
so that thoughts can be shared in limited
circles and rehearsed before public exposure.
Privacy and Freedom, and the rooms full of disk drives that sprouted in
government and corporate buildings in the 1960s, set off a round of soul-
searching about the operational significance of privacy rights. What, in prac-
tice, should those holding a big data bank think about when collecting the
data, handling it, and giving it to others?
Fair Information Practice Principles
In 1973, the Department of Health, Education, and Welfare issued “Fair
Information Practice Principles” (FIPP), as follows:
• Openness. There must be no personal data record-keeping systems
whose very existence is secret.
• Disclosure. There must be a way for a person to find out what infor-
mation about the person is in a record and how it is used.
• Secondary use. There must be a way for a person to prevent informa-
tion about the person that was obtained for one purpose from being
used or made available for other purposes without the person’s
consent.
• Correction. There must be a way for a person to correct or amend a
record of identifiable information about the person.
• Security. Any organization creating, maintaining, using, or dissemi-
nating records of identifiable personal data must assure the reliability
of the data for its intended use and must take precautions to prevent
misuses of the data.
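To make the principles concrete, here is a minimal sketch of how a record-keeping system might enforce several of them in software. It is an illustration under stated assumptions, not a description of any real system: the class, field, and purpose names are hypothetical, and the Security principle (reliability and protection against misuse) is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class PersonalRecord:
    """A hypothetical record that carries its own usage rules."""
    subject: str
    data: dict
    collected_for: str                      # the original purpose
    consented_purposes: set = field(default_factory=set)
    access_log: list = field(default_factory=list)

    def use(self, purpose: str) -> dict:
        """Secondary use: release data only for the original purpose or
        one the subject has consented to. Every attempt is logged."""
        allowed = (purpose == self.collected_for
                   or purpose in self.consented_purposes)
        self.access_log.append((purpose, allowed))
        if not allowed:
            raise PermissionError(f"no consent for purpose: {purpose}")
        return dict(self.data)

    def disclose(self) -> dict:
        """Disclosure: the subject can see the data and how it was used."""
        return {"data": dict(self.data), "uses": list(self.access_log)}

    def correct(self, key: str, value) -> None:
        """Correction: the subject can amend the record."""
        self.data[key] = value


record = PersonalRecord("patient", {"address": "old"}, collected_for="billing")
record.use("billing")                 # allowed: the original purpose
try:
    record.use("marketing")           # refused: no consent given
except PermissionError as err:
    print(err)
```

The design choice worth noticing is that the check happens at the point of use, not at the point of collection, which is the shift of focus Westin proposed.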
These principles were proposed for U.S. medical data, but were never adopted.
Nevertheless, they have been the foundation for many corporate privacy
policies. Variations on these principles have been codified in international trade
agreements by the Organization for Economic Cooperation and Development
(OECD) in 1980, and within the European Union (EU) in 1995. In the United
States, echoes of these principles can be found in some state laws, but federal
laws generally treat privacy on a case-by-case or “sectorial” basis. The 1974
Privacy Act applies to interagency data transfers within the federal
government, but places no limitations on data handling in the private sector. The
Fair Credit Reporting Act applies to consumer credit data, but not to
medical data. The Video Privacy Protection Act applies to videotape
rentals, but not to “On Demand” movie downloads, which did not exist when
the Act was passed! Finally, few federal or state laws apply to the huge data
banks in the file cabinets and computer systems of cities and towns.
American government is decentralized, and authority over government data
is decentralized as well.
The U.S. is not lacking in privacy laws. But privacy has been legislated
inconsistently and confusingly, and in terms dependent on technological
contingencies. There is no national consensus on what should be protected,
and how protections should be enforced. Without a more deeply informed
collective judgment on the benefits and costs of privacy, the current legisla-
tive hodgepodge may well get worse
in the United States.
The discrepancy between Ameri-
can and European data privacy stan-
dards threatened U.S. involvement in
international trade, because an EU
directive would prohibit data trans-
fers to nations, such as the U.S., that
do not meet the European “adequacy”
standard for privacy protection.
Although the U.S. sectorial approach
continues to fall short of European
requirements, in 2000 the European
Commission created a “safe harbor”
for American businesses with multinational operations. This allowed
individual corporations to establish that their practices are adequate with
respect to seven principles, covering notice, choice,
onward transfer, access, security, data integrity, and enforcement.
U.S. PRIVACY LAWS
The Council of Better Business Bureaus has compiled a “Review of
Federal and State Privacy Laws”:
www.bbbonline.org/UnderstandingPrivacy/library/fed_statePrivLaws.pdf
The state of Texas has also compiled a succinct summary of major
privacy laws:
www.oag.state.tx.us/notice/privacy_table.htm
It is, unfortunately, too easy to debate whether the European omnibus
approach is more principled than the U.S. piecemeal approach, when the real
question is whether either approach accomplishes what we want it to achieve.
The Privacy Act of 1974 assured us that obscure statements would be buried
deep in the Federal Register, providing the required official notice about mas-
sive governmental data collection plans—better than nothing, but providing
“openness” only in a narrow and technical sense. Most large corporations
doing business with the public have privacy notices, and virtually no one
reads them. Only 0.3% of Yahoo! users read its privacy notice in 2002, for
example. In the midst of massive negative publicity that year when Yahoo!
changed its privacy policy to allow advertising messages, the number of users
who accessed the privacy policy rose only to 1%. None of the many U.S. pri-
vacy laws prevented the warrantless wiretapping program instituted by the
Bush administration, nor the cooperation with it by major U.S. telecommuni-
cations companies.
Indeed, cooperation between the federal government and private industry
seems more essential than ever for gathering information about drug traffick-
ing and international terrorism, because of yet another technological devel-
opment. Twenty years ago, most long-distance telephone calls spent at least
part of their time in the air, traveling by radio waves between microwave
antenna towers or between the ground and a communication satellite.
Government eavesdroppers could simply listen in (see the discussion of
Echelon in Chapter 5). Now many phone calls travel through fiber optic
cables instead, and the government is seeking the capacity to tap this pri-
vately owned infrastructure.
High privacy standards have a cost. They can limit the public usefulness
of data. Public alarm about the release of personal medical information has
led to major legislative remedies. The Health Insurance Portability and
Accountability Act (HIPAA) was intended both to encourage the use of
electronic data interchange for health information, and to impose severe
penalties for the disclosure of “Protected Health Information,” a very broad
category including not just medical histories but, for example, medical
payments. The law mandates the removal of anything that could be used to
re-connect medical records to their source. HIPAA is fraught with problems
in an environment of ubiquitous data and powerful computing. Connecting
the dots by assembling disparate data sources makes it extremely difficult to
achieve the level of anonymity that HIPAA sought to guarantee. But help is
available, for a price, from a whole new industry of HIPAA-compliance advi-
sors. If you search for HIPAA online, you will likely see advertisements for
services that will help you protect your data, and also keep you out of jail.
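Why is anonymity so hard to achieve once data sources can be combined? A minimal sketch of a linkage attack follows. All of the data is hypothetical, and the sketch assumes a data set in which names were removed but quasi-identifiers (here a ZIP code, birth year, and sex) were left in place; it is not a statement of what HIPAA does or does not permit.

```python
from collections import defaultdict

# Hypothetical "de-identified" medical records: no names, but some
# quasi-identifiers remain.
MEDICAL = [
    {"zip": "00001", "birth_year": 1950, "sex": "F", "diagnosis": "condition A"},
    {"zip": "00001", "birth_year": 1982, "sex": "M", "diagnosis": "condition B"},
    {"zip": "00002", "birth_year": 1975, "sex": "F", "diagnosis": "condition C"},
]

# Hypothetical public records that carry the same attributes plus names.
PUBLIC_RECORDS = [
    {"name": "Resident One", "zip": "00001", "birth_year": 1950, "sex": "F"},
    {"name": "Resident Two", "zip": "00001", "birth_year": 1982, "sex": "M"},
    {"name": "Resident Three", "zip": "00001", "birth_year": 1982, "sex": "M"},
]

KEYS = ("zip", "birth_year", "sex")


def link(medical, public, keys=KEYS):
    """Join the two sources on the shared attributes. A medical record is
    re-identified when exactly one public record matches it."""
    index = defaultdict(list)
    for person in public:
        index[tuple(person[k] for k in keys)].append(person["name"])
    linked = []
    for record in medical:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:
            linked.append((names[0], record["diagnosis"]))
    return linked


print(link(MEDICAL, PUBLIC_RECORDS))
```

In this toy data only the first record is re-identified: the second matches two residents and so stays ambiguous, and the third matches no one. The more attributes two sources share, the fewer ambiguous matches remain.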
At the same time as HIPAA and other privacy laws have safeguarded our
personal information, they are making medical research costly and sometimes
impossible to conduct. It is likely that classic studies such as the Framingham
Heart Study, on which much public policy about heart disease was founded,
could not be repeated in today’s environment of strengthened privacy rules.
Dr. Roberta Ness, president of the American College of Epidemiology, reported
that “there is a perception that HIPAA may even be having a negative effect
on public health surveillance practices.”
The European reliance on the Fair Information Practice Principles is often
no more useful, in practice, than the American approach. Travel through
London, and you will see many signs saying “Warning: CCTV in use” to meet
the “Openness” requirement about the surveillance cameras. That kind of
notice throughout the city hardly empowers the individual. After all, even Big
Brother satisfied the FIPP Openness standard, with the ubiquitous notices that
he was watching! And the “Secondary Use” requirement, that European citi-
zens should be asked permission before data collected for one purpose is used
for another, is regularly ignored in some countries, although compliance
practices are a major administrative burden on European businesses and may
cause European businesses at least to pause and think before “repurposing”
data they have gathered. Sociologist Amitai Etzioni repeatedly asks European
audiences if they have ever been asked for permission to re-use data collected
about them, and has gotten only a single positive response, and that was
from a gentleman who had been asked by a U.S. company.
EVER READ THOSE “I AGREE” DOCUMENTS?
Companies can do almost anything they want with your information, as long
as you agree. It seems hard to argue with that principle, but the deck can be
stacked against the consumer who is “agreeing” to the company’s terms. Sears
Holding Corporation (SHC), the parent of Sears, Roebuck and Kmart, gave
consumers an opportunity to join “My Sears Holding Community,” which the
company describes as “something new, something different … a dynamic and
highly interactive online community … where your voice is heard and your
opinion matters.” When you went online to sign up, the terms appeared in a
window on the screen.
The scroll box held only 10 lines of text, and the agreement was 54 boxfuls
long. Deep in the terms was a detail: You were allowing Sears to install
software on your PC that “monitors all of the Internet behavior that occurs on
the computer …, including … filling a shopping basket, completing an
application form, or checking your … personal financial or health information.”
So your computer might send your credit history and AIDS test results to
SHC, and you said it was fine!
The five FIPP principles, and the spirit of transparency and personal con-
trol that lay behind them, have doubtless led to better privacy practices. But
they have been overwhelmed by the digital explosion, along with the
insecurity of the world and all the social and cultural changes that have occurred
in daily life. Fred H. Cate, a privacy scholar at Indiana University,
characterizes the FIPP principles as almost a complete bust:
Modern privacy law is often expensive, bureaucratic, burdensome,
and offers surprisingly little protection for privacy. It has substituted
individual control of information, which it in fact rarely achieves, for
privacy protection. In a world rapidly becoming more global through
information technologies, multinational commerce, and rapid travel,
data protection laws have grown more fractured and protectionist.
Those laws have become unmoored from their principled basis, and
the principles on which they are based have become so varied and
procedural, that our continued intonation of the FIPPS mantra no
longer obscures the fact that this emperor indeed has few if any
clothes left.
Privacy as a Right to Control Information
It is time to admit that we don’t even really know what we want. The bits are
everywhere; there is simply no locking them down, and no one really wants
to do that anymore. The meaning of pri-
vacy has changed, and we do not have a
good way of describing it. It is not the right
to be left alone, because not even the most
extreme measures will disconnect our digi-
tal selves from the rest of the world. It is
not the right to keep our private information to ourselves, because the
billions of atomic factoids no longer lend themselves to binary classification,
private or public.
Reade Seligmann would probably value his privacy more than most
Americans alive today. On Monday, April 17, 2006, Seligmann was indicted
in connection with allegations that a 27-year-old performer had been raped
at a party at a Duke fraternity house. He and several of his lacrosse team-
mates instantly became poster children for everything that is wrong with
American society—an example of national over-exposure that would leave
even Warren and Brandeis breathless if they were around to observe it.
Seligmann denied the charges, and at first it looked like a typical he-said,
she-said scenario, which could be judged only on credibility and presump-
tions about social stereotypes.
But during the evening of that fraternity party, Seligmann had left a trail
of digital detritus. His data trail indicated that he could not have been at the
party long enough, or at the right time, to have committed the alleged rape.
Time-stamped photos from the party showed that the alleged victim
was dancing at 12:02 AM. At 12:24 AM, Seligmann used his ATM card at a bank,
and the bank’s computers kept records of the event. Seligmann used his cell
phone at 12:25 AM, and the phone company tracked every call he made, just
as your phone company keeps a record of every call you make and receive.
Seligmann used his prox card to get into his dormitory room at 12:46 AM,
and the university’s computer kept track of his comings and goings, just as
other computers keep track of every card swipe or RFID wave you and I make
in our daily lives. Even during the ordinary movements of a college student
going to a fraternity party, every step along the way was captured in digital
detail. If Seligmann had gone to the extraordinary lengths necessary to avoid
leaving digital fingerprints—not using a modern camera, a cell phone, or a
bank, and living off campus to avoid electronic locks—his defense would have
lacked important exculpatory evidence.
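Assembling such a data trail amounts to merging time-stamped records from unrelated systems into one ordered timeline. The sketch below does this for the four events and times given in the text; the source labels and event descriptions are paraphrases, and the function names are illustrative.

```python
from datetime import datetime

# Each record keeper holds its own log, deliberately listed out of order
# here. Times are those given in the text.
SOURCES = {
    "university": [("12:46 AM", "prox card opens dormitory room")],
    "phone company": [("12:25 AM", "cell phone call")],
    "party photos": [("12:02 AM", "alleged victim photographed dancing")],
    "bank": [("12:24 AM", "ATM card used")],
}


def parse_time(text):
    """Parse a clock time such as '12:24 AM' (the date is ignored)."""
    return datetime.strptime(text, "%I:%M %p")


def build_timeline(sources):
    """Merge every source's records into one chronological list."""
    events = []
    for source, records in sources.items():
        for when, what in records:
            events.append((parse_time(when), source, what))
    return sorted(events)


def minutes_between(earlier, later):
    """Whole minutes elapsed between two clock times on the same night."""
    delta = parse_time(later) - parse_time(earlier)
    return int(delta.total_seconds() // 60)


for when, source, what in build_timeline(SOURCES):
    print(when.strftime("%I:%M %p"), source, what, sep=" | ")
```

No single log says much by itself. Merged, they bracket a window of 22 minutes between the 12:02 AM photo and the 12:24 AM ATM withdrawal, which is the kind of constraint the defense relied on.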
Which would we prefer—the new world with digital fingerprints every-
where and the constant awareness that we are being tracked, or the old world
with few digital footprints and a stronger sense of security from prying eyes?
And what is the point of even asking the question, when the world cannot be
restored to its old information lock-down?
In a world that has moved beyond the old notion of privacy as a wall
around the individual, we could instead regulate those who would inappro-
priately use information about us. If I post a YouTube video of myself danc-
ing in the nude, I should expect to suffer some personal consequences.
Ultimately, as Warren and Brandeis said, individuals have to take responsibil-
ity for their actions. But society has drawn lines in the past around which
facts are relevant to certain decisions, and which are not. Perhaps, the border
of privacy having become so porous, the border of relevancy could be
stronger. As Daniel Weitzner explains:
New privacy laws should emphasize usage restrictions to guard
against unfair discrimination based on personal information, even if
it’s publicly available. For instance, a prospective employer might be
able to find a video of a job applicant entering an AIDS clinic or a
mosque. Although the individual might have already made such facts
public, new privacy protections would preclude the employer from
making a hiring decision based on that information and attach real
penalties for such abuse.
In the same vein, it is not intrinsically wrong that voting lists and political
contributions are a matter of public record. Arguably, they are essential to the
good functioning of the American democracy. Denying someone a promotion
because of his or her political inclinations would be wrong, at least for most
jobs. Perhaps a nuanced classification of the ways in which others are
allowed to use information about us would relieve some of our legitimate
fears about the effects of the digital explosion.
In The Transparent Society, David Brin wrote:
Transparency is not about eliminating privacy. It’s about giving us the
power to hold accountable those who would violate it. Privacy implies
serenity at home and the right to be let alone. It may be irksome how
much other people know about me, but I have no right to police their
minds. On the other hand I care very deeply about what others do to
me and to those I love. We all have a right to some place where we
can feel safe.
Despite the very best efforts, and the most sophisticated technologies, we
cannot control the spread of our private information. And we often want
information to be made public to serve our own, or society’s, purposes.
Yet there can still be principles of accountability for the misuse of infor-
mation. Some ongoing research is outlining a possible new web technology,
which would help ensure that information is used appropriately even if it is
known. Perhaps automated classification and reasoning tools, developed to
help connect the dots in networked information systems, can be retargeted to
limit inappropriate use of networked information. A continuing border war is
likely to be waged, however, along an existing free speech front: the line sep-
arating my right to tell the truth about you from your right not to have that
information used against you. In the realm of privacy, the digital explosion
has left matters deeply unsettled.
Always On
In George Orwell’s 1984, the pervasive, intrusive technology could be turned off:
As O’Brien passed the telescreen a thought seemed to strike him. He
stopped, turned aside and pressed a switch on the wall. There was a
sharp snap. The voice had stopped.
Julia uttered a tiny sound, a sort of squeak of surprise. Even in the
midst of his panic, Winston was too much taken aback to be able to
hold his tongue.
“You can turn it off!” he said.
“Yes,” said O’Brien, “we can turn it off. We have that privilege. …Yes,
everything is turned off. We are alone.”
Sometimes we can still turn it off today, and should. But mostly we don’t
want to. We don’t want to be alone; we want to be connected. We find it con-
venient to leave it on, to leave our footprints and fingerprints everywhere, so
we will be recognized when we come back. We don’t want to have to keep
retyping our name and address when we return to a web site. We like it when
the restaurant remembers our name, perhaps because our phone number
showed up on caller ID and was linked to our record in their database. We
appreciate buying grapes for $1.95/lb instead of $3.49, just by letting the
store know that we bought them. We may want to leave it on for ourselves
because we know it is on for criminals. Being watched reminds us that they
are watched as well. Being watched also means we are being watched over.
And perhaps we don’t care that so much is known about us because that
is the way human society used to be—kinship groups and small settlements,
where knowing everything about everyone else was a matter of survival.
Having it on all the time may resonate with inborn preferences we acquired
millennia ago, before urban life made anonymity possible. Still today, privacy
means something very different in a small rural town than it does on the
Upper East Side of Manhattan.
We cannot know what the cost will be of having it on all the time. Just as
troubling as the threat of authoritarian measures to restrict personal liberty is
the threat of voluntary conformity. As Fano astutely observed, privacy allows
limited social experimentation—the deviations from social norms that are
much riskier to the individual in the glare of public exposure, but which can
be, and often have been in the past, the leading edges of progressive social
changes. With it always on, we may prefer not to try anything unconven-
tional, and stagnate socially by collective inaction.
For the most part, it is too late, realistically, ever to turn it off. We may
once have had the privilege of turning it off, but we have that privilege no
more. We have to solve our privacy problems another way.
The digital explosion is shattering old assumptions about who knows what.
Bits move quickly, cheaply, and in multiple perfect copies. Information that
used to be public in principle—for example, records in a courthouse, the price
you paid for your house, or stories in a small-town newspaper—is now
available to everyone in the world. Information that used to be private and
available to almost no one—medical records and personal snapshots, for
example—can become equally widespread through carelessness or malice. The
norms and business practices and laws of society have not caught up to the
change.
The oldest durable communication medium is the written document. Paper
documents have largely given way to electronic analogs, from which paper
copies are produced. But are electronic documents really like paper docu-
ments? Yes and no, and misunderstanding the document metaphor can be
costly. That is the story to which we now turn.
CHAPTER 3
Ghosts in the Machine
Secrets and Surprises of Electronic
Documents
What You See Is Not What the Computer
Knows
On March 4, 2005, Italian journalist Giuliana Sgrena was released from cap-
tivity in Baghdad, where she had been held hostage for a month. As the car
conveying her to safety approached a checkpoint, it was struck with gunfire
from American soldiers. The shots wounded Sgrena and her driver and killed
an Italian intelligence agent, Nicola Calipari, who had helped engineer her
release.
A fierce dispute ensued about why U.S. soldiers had rained gunfire on a car
carrying citizens of one of its Iraq war allies. The Americans claimed that the
car was speeding and did not slow when warned. The Italians denied both
claims. The issue caused diplomatic tension between the U.S. and Italy and
was a significant political problem for the Italian prime minister.
The U.S. produced a 42-page report on the incident, exonerating the U.S.
soldiers. The report enraged Italian officials. The Italians quickly released
their own report, which differed from the U.S. report in crucial details.
Because the U.S. report included sensitive military information, it was
heavily redacted before being shared outside military circles (see Figure 3.1).
In another time, passages would have been blacked out with a felt marker,
and the document would have been photocopied and given to reporters. But
in the information age, the document was redacted and distributed electron-
ically, not physically. The redacted report was posted on a web site the allies
used to provide war information to the media. In an instant, it was visible to
any of the world’s hundreds of millions of Internet users.
Source: http://www.corriere.it/Media/Documenti/Classified.pdf, extract from page 10.
FIGURE 3.1 Section from page 10 of redacted U.S. report on the death of Italian
intelligence agent Nicola Calipari. Information that might have been useful to the
enemy was blacked out.
One of those Internet users was an Italian blogger, who scrutinized the U.S.
report and quickly recovered the redacted text using ordinary office software.
The blogger posted the full text of the report (see Figure 3.2) on his own web
site. The unredacted text disclosed positions of troops and equipment, rules
of engagement, procedures followed by allied troops, and other information
of interest to the enemy. The revelations were both dangerous to U.S. soldiers
and acutely embarrassing to the U.S. government, at a moment when tempers
were high among Italian and U.S. officials. In the middle of the most high-
tech war in history, how could this fiasco have happened?
Source: http://www.corriere.it/Media/Documenti/Unclassified.doc.
FIGURE 3.2 The text of Figure 3.1 with the redaction bars electronically removed.
Paper documents and electronic documents are useful in many of the same
ways. Both can be inspected, copied, and stored. But they are not equally use-
ful for all purposes. Electronic documents are easier to change, but paper doc-
uments are easier to read in the bathtub. In fact, the metaphor of a series of
bits as a “document” can be taken only so far. When stretched beyond its
breaking point, the “document” metaphor can produce surprising and dam-
aging results—as happened with the Calipari report.
Office workers love “WYSIWYG” interfaces—“What You See Is What You
Get.” They edit the electronic document on the screen, and when they print
it, it looks just the same. They are deceived into thinking that what is in the
computer is a sort of miniaturized duplicate of the image on the screen,
instead of computer codes that produce the picture on the screen. In fact, the
WYSIWYG metaphor is imperfect, and therefore risky. The report on the death
of Nicola Calipari illustrates what can go wrong when users accept such a
metaphor too literally. What the authors of the document saw was dramati-
cally different from what they got.
The report had been prepared using software that creates PDF files. Such
software often includes a “Highlighter Tool,” meant to mimic the felt markers
that leave a pale mark on ordinary paper, through which the underlying text
is visible (see Figure 3.3). The software interface shows the tool’s icon as a
marker writing a yellow stripe, but the user can change the color of the stripe.
Probably someone tried to turn the Highlighter Tool into a redaction tool by
changing its color to black, unaware that what was visible on the screen was
not the same as the contents of the electronic document.
Reprinted with permission from Adobe Systems Incorporated.
FIGURE 3.3 Adobe Acrobat Highlighter Tool, just above the middle. On the screen,
the “highlighter” is writing yellow ink, but with a menu command, it can be changed
to any other color.
The Italian blogger guessed that the black bars were nothing more than
overlays created using the Highlighter Tool, and that the ghostly traces of the
invisible words were still part of the electronic document that was posted on
the web. With that realization, he easily undid the black “highlighting” to
reveal the text beneath.
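The blogger's insight can be illustrated with a toy model. The sketch below is
not the PDF format or any Adobe API; the names Page, render, and extract_text,
and the sample words, are hypothetical and invented for illustration. It
assumes only what the text describes: the file stores the words and the black
bars as separate objects, and the bars change only what is drawn on the screen.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Toy page: words and overlay bars are stored as separate objects."""
    words: list = field(default_factory=list)  # text kept in the file
    bars: set = field(default_factory=set)     # indexes of words under a black bar

    def render(self):
        # What the viewer draws: covered words appear as solid blocks.
        return " ".join(
            "█" * len(word) if index in self.bars else word
            for index, word in enumerate(self.words)
        )

    def extract_text(self):
        # What copy-and-paste, or any program reading the file, still sees.
        return " ".join(self.words)


page = Page(words=["Checkpoint", "located", "at", "Route", "X"], bars={3, 4})
print(page.render())        # the "redacted" view
print(page.extract_text())  # the full text is still in the file
```

Removing the bars, or simply ignoring them as extract_text does, recovers
everything, because the overlay never touched the stored words.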
Just as disturbing as this mistake is the fact that two major newspapers had
quite publicly made the same mistake only a few years before. On April 16,
2000, the New York Times had detailed a secret CIA history of attempts by
the U.S. to overthrow Iran’s government in 1953. The newspaper reproduced
sections of the CIA report, with black redaction bars to obscure the names of
CIA operatives within Iran. The article was posted on the Web in mid-June,
2000, accompanied by PDFs of several pages of the CIA report. John Young,
who administers a web site devoted to publishing government-restricted doc-
uments, removed the redaction bars and revealed the names of CIA agents. A
controversy ensued about the ethics and legality of the disclosure, but the
names are still available on the Web as of this writing.
The Washington Post made exactly the same mistake in 2002, when it published
an article about a demand letter left by the Washington snipers, John
Allen Muhammad and Lee Boyd Malvo. As posted on the Post’s web site,
certain information was redacted in a way that was easily reversed by an
inquisitive reader of the online edition of the paper (see Figure 3.4). The paper
fixed the problem quickly after its discovery, but not quickly enough to pre-
vent copies from being saved.
Source: Washington Post web site, transferred to web.bham.ac.uk/forensic/news/02/sniper2.html.
Actual images taken from slide 29 of
http://www.ccc.de/congress/2004/fahrplan/files/316-hidden-data-slides.pdf.
FIGURE 3.4 Letter from the Washington snipers. On the left, the redacted letter as
posted on the Washington Post web site. On the right, the letter with the redaction
bars electronically removed.
What might have been done in these cases, instead of posting the PDF with
the redacted text hidden but discoverable? The Adobe Acrobat software has a
security feature, which uses encryption (discussed in Chapter 5, “Secret Bits”)
to make it impossible for documents to be altered by unauthorized persons,
while still enabling anyone to view them. Probably those who created these
documents did not know about this feature, or about commercially available
software called Redax, which government agencies use to redact text from
documents created by Adobe Acrobat.
A clumsier, but effective, option would be to scan the printed page, com-
plete with its redaction bars. The resulting file would record only a series of
black and white dots, losing all the underlying typographical structure—font
names and margins, for example. Whatever letters had once been “hidden”
under the redaction bars could certainly not be recovered, yet this solution
has an important disadvantage.
One of the merits of formatted text documents such as PDFs is that they
can be “read” by a computer. They can be searched, and the text they con-
tain can be copied. With the document reduced to a mass of black and white
dots, it could no longer be manipulated as text.
A more important capability would be lost as well. The report would be
unusable by programs that vocalize documents for visually impaired readers.
A blind reader could “read” the U.S. report on the Calipari incident, because
software is available that “speaks” the contents of PDF documents. A blind
reader would find a scanned version of the same document useless.
Tracking Changes—and Forgetting That They Are
Remembered
In October, 2005, UN prosecutor Detlev Mehlis released to the media a report
on the assassination of former Lebanese Prime Minister Rafik Hariri. Syria
had been suspected of engineering the killing, but Syrian President Bashar
al-Assad denied any involvement. The report was not final, Mehlis said, but
there was “evidence of both Lebanese and Syrian involvement.” Deleted, and
yet uncovered by the reporters who were given the document, was an incen-
diary claim: that Assad’s brother Maher, commander of the Republican Guard,
was personally involved in the assassination.
Microsoft Word offers a “Track Changes” option. If enabled, every change
made to the document is logged as part of the document itself—but ordinar-
ily not shown. The document bears its entire creation history: who made each
change, when, and what it was. Those editing the document can also add
comments—which would not appear in the final document, but may help edi-
tors explain their thinking to their colleagues as the document moves around
electronically within an office.
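How a change history can travel inside a file is easy to see in Word's newer
.docx format (not the binary .doc format in use in 2005). A .docx file is a ZIP
archive whose main part, word/document.xml, marks tracked deletions with w:del
elements and keeps the removed words in w:delText elements. The sketch below
builds a tiny stand-in archive in memory (it is not a complete file that Word
would open) and lists the deletions it carries. The function names
tracked_deletions and make_toy_docx, the author name, and the sample sentence
are made up for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as w:del and w:delText.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def tracked_deletions(docx_bytes):
    """Return (author, deleted_text) pairs recorded in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    found = []
    for deletion in root.iter(f"{{{W}}}del"):
        text = "".join(t.text or "" for t in deletion.iter(f"{{{W}}}delText"))
        found.append((deletion.get(f"{{{W}}}author"), text))
    return found


def make_toy_docx():
    """Build a minimal stand-in archive holding one tracked deletion."""
    xml = (
        f'<w:document xmlns:w="{W}"><w:body><w:p>'
        '<w:r><w:t xml:space="preserve">The plan is to sue </w:t></w:r>'
        '<w:del w:author="Editor A" w:date="2000-01-01T00:00:00Z">'
        '<w:r><w:delText>Company X</w:delText></w:r></w:del>'
        '<w:r><w:t>Company Y.</w:t></w:r>'
        '</w:p></w:body></w:document>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    return buffer.getvalue()


deletions = tracked_deletions(make_toy_docx())
print(deletions)  # the "deleted" words and who deleted them
```

A reader who sees only "The plan is to sue Company Y." on the screen has no
hint that the earlier wording is still stored a few bytes away.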
Of course, information about strategic planning is not meant for outsiders
to see, and in the case of legal documents, can have catastrophic conse-
quences if revealed. It is a simple matter to remove these notes about the doc-
ument’s history—but someone has to remember to do it! The UN prosecutor
neglected to remove the change history from his Microsoft Word document,
and a reporter discovered the deleted text (see Figure 3.5). (Of course, in
Middle Eastern affairs, one cannot be too suspicious. Some thought that
Mehlis had intentionally left the text in the document, as a warning to the
Syrians that he knew more than he was yet prepared to acknowledge.)
A particularly negligent example of document editing involved SCO
Corporation, which claimed that several corporations violated its intellectual
property rights. In early 2004, SCO filed suit in a Michigan court against
Daimler Chrysler, claiming Daimler had violated terms of its Unix software
agreement with SCO. But the electronic version of its complaint carried its
modification history with it, revealing a great deal of information about SCO’s
litigation planning. In particular, when the change history was revealed, it
turned out that until exactly 11:10 a.m. on February 18, 2004, SCO had instead
planned to sue a different company, Bank of America, in federal rather than
state court, for copyright infringement rather than breach of contract!
Source: Section of UN report, posted on Washington Post web site,
www.washingtonpost.com/wp-srv/world/syria/mehlis.report.doc.
FIGURE 3.5 Section from the UN report on the assassination of Rafik Hariri. An
earlier draft stated that Maher Assad and others were suspected of involvement in
the killing, but in the document as it was released, their names were replaced with
the phrase “senior Lebanese and Syrian officials.”
Saved Information About a Document
An electronic document (for example, one produced by text-processing software)
often includes information that is about the document, so-called metadata. The
most obvious example is the name of the file itself. File names carry few risks.
For example, when we send someone a file as an email attachment, we realize that
the recipient is going to see the name of the file as well as its contents.
But the file is often tagged with much more information than just its name. The
metadata generally includes the name associated with the owner of the computer,
and the dates the file was created and last modified. This is often useful
information, since the recipient can tell whether she is receiving an older or
newer version than the version she already has. Some word processors include
version information as well, a record of who changed what, when, and why. But
the unaware can be trapped even by such innocent information, since it tends not
to be visible unless the recipient asks to see it. In Figure 3.6, the metadata
reveals the name of the military officer who created the redacted report on the
death of Nicola Calipari.
FORGING METADATA
Metadata can help prove or refute claims. Suppose Sam emails his teacher a
homework paper after the due date, with a plea that the work had been completed
by the deadline, but was undeliverable due to a network failure. If Sam is a
cheater, he could be exposed if he doesn’t realize that the “last modified” date
is part of the document. However, if Sam is aware of this, he could “stamp” the
document with the right time by re-setting the computer’s clock before saving
the file. The name in which the computer is registered and other metadata are
also forgeable, and therefore are of limited use as evidence in court cases.
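The creation and modification dates discussed here are ordinary data that any
program can read and rewrite. The sketch below is a minimal illustration using
the file-system timestamp that the operating system keeps for every file, which
is a close cousin of the dates a word processor stores inside the document
itself (those are not shown here). It works on a throwaway temporary file; the
one-week offset is an arbitrary choice for the example.

```python
import os
import tempfile
import time

# Create a scratch file so the example touches nothing real.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as scratch:
    scratch.write(b"homework")
    path = scratch.name

info = os.stat(path)
print("size in bytes:", info.st_size)
print("last modified:", time.ctime(info.st_mtime))

# Backdate the file by one week. os.utime takes (access time, modified time).
week_ago = info.st_mtime - 7 * 24 * 3600
os.utime(path, (week_ago, week_ago))
backdated = os.stat(path).st_mtime
print("after backdating:", time.ctime(backdated))

os.remove(path)
```

No clock needs to be reset: a single call changes the recorded date, which is
why such timestamps carry little weight as evidence on their own.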
Reprinted with permission from Adobe Systems Incorporated.
FIGURE 3.6 Part of the metadata of the Calipari report, as revealed by the
“Properties” command of Adobe Acrobat Reader. The data shows that Richard Thelin
was the author, and that he altered the file less than two minutes after creating it.
Thelin was a Lieutenant Colonel in the U.S. Marine Corps at the time of the incident.
Authorship information leaked in this way can have real consequences. In
2003, the British government of Tony Blair released documentation of its case
for joining the U.S. war effort in Iraq. The document had many problems—large
parts of it turned out to have been plagiarized from a 13-year-old PhD thesis.
Equally embarrassing was that the electronic fingerprints of four civil servants
who created it were left on the document when it was released electronically
on the No. 10 Downing Street web site. According to the Evening Standard of
London, “All worked in propaganda units controlled by Alastair Campbell, Tony
Blair’s director of strategy and communications,” although the report had sup-
posedly been the work of the Foreign Office. The case of the “dodgy dossier”
caused an uproar in Parliament.
You don’t have to be a businessperson or government official to be
victimized by documents bearing fingerprints. When you send someone a
document as an attachment to an email, very likely the document’s metadata
shows who actually created it, and when. If you received it from someone else
and then altered it, that may show as well. If you put the text of the docu-
ment into the body of your email instead, the metadata won’t be included;
the message will be just the text you see on the screen. Be sure of what you
are sending before you send it!
Can the Leaks Be Stopped?
Even in the most professional organizations, and certainly in ordinary house-
holds, knowledge about technological dangers and risks does not spread
instantaneously to everyone who should know it. The Calipari report was pub-
lished five years after the New York Times had been embarrassed. How can
users of modern information technology—today, almost all literate people—
stay abreast of knowledge about when and how to protect their information?
It is not easy to prevent the leakage of sensitive information that is hid-
den in documents but forgotten by their creators, or that is captured as meta-
data. In principle, offices should have a check-out protocol so that documents
are cleansed before release. But in a networked world, where email is a criti-
cal utility, how can offices enforce document release protocols without ren-
dering simple tasks cumbersome? A rather harsh measure is to prohibit use
of software that retains such information; that was the solution adopted by
the British government in the aftermath of the “dodgy dossier” scandal. But
the useful features of the software are then lost at the same time. A protocol
can be established for converting “rich” document formats such as that of
Microsoft Word to formats that retain less information, such as Adobe PDF.
But it turns out that measures used to eradicate personally identifiable infor-
mation from documents don’t achieve as thorough a cleansing as is com-
monly assumed.
At a minimum, office workers need education. Their software has great
capabilities they may find useful, but many of those useful features have risks
as well. And we all just need to think about what we are doing with our doc-
uments. We all too mindlessly re-type keystrokes we have typed a hundred
times in the past, not pausing to think that the hundred and first situation
may be different in some critical way!
Representation, Reality, and Illusion
René Magritte, in his famous painting of a pipe, said “This isn’t a pipe” (see
Figure 3.7). Of course it isn’t; it’s a painting of a pipe. The image is made out
of paint, and Magritte was making a metaphysical joke. The painting is enti-
tled “The treachery of images,” and the statement that the image isn’t the
reality is part of the image itself.
Los Angeles County Museum of Art. Purchased with funds provided by the Mr. and Mrs. William
Preston Harrison Collection. Photograph © 2007 Museum Associates/LACMA.
FIGURE 3.7 Painting by Magritte. The legend says “This isn’t a pipe.” Indeed, it’s
only smudges of paint that make you think of a pipe, just as an electronic document
is only bits representing a document.
When you take a photograph, you capture inside the camera something
from which an image can be produced. In a digital camera, the bits in an elec-
tronic memory are altered according to some pattern. The image, we say, is
“represented” in the camera’s memory. But if you took out the memory and
looked at it, you couldn’t see the image. Even if you printed the pattern of 0s
and 1s stored in the memory, the image wouldn’t appear. You’d have to know
how the bits represent the image in order to get at the image itself. In the
world of digital photography, the format of the bits has been standardized, so
that photographs taken on a variety of cameras can be displayed on a vari-
ety of computers and printed on a variety of printers.
The general process of digital photography is shown in Figure 3.8. Some
external reality—a scene viewed through a camera lens, for example—is
turned into a string of bits. The bits somehow capture useful information
about reality, but there is nothing “natural” about the way reality is captured.
The representation is a sort of ghost of the original, not identical to the orig-
inal and actually quite unlike it, but containing enough of the soul of the
original to be useful later on. The representation follows rules. The rules are
arbitrary conventions and the product of human invention, but they have
been widely accepted so photographs can be exchanged.
[Figure 3.8 diagram: REALITY -> (MODELING) -> REPRESENTATION OR MODEL, drawn as a block of bits -> (RENDERING) -> IMAGE]
FIGURE 3.8 Reproducing an image electronically is a two-stage process. First, the
scene is translated into bits, creating a digital model. Then the model is rendered as a
visible image. The model can be stored indefinitely, communicated from one place to
another, or computationally analyzed and enhanced to produce a different model
before it is rendered. The same basic structure applies to the reproduction of video
and audio.
The representation of the photograph in bits is called a model and the
process of capturing it is called modeling. The model is turned into an image
by rendering the model; this is what happens when you transfer the bits rep-
resenting a digital photograph to a computer screen or printer. Rendering
brings the ghost back to life. The image resembles, to the human eye, the
original reality—provided that the model is good enough. Typically, a model
that is not good enough—has too few bits, for example—cannot produce an
image that convincingly resembles the reality it was meant to capture.
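The effect of having too few bits can be seen in a small numerical sketch. It
models one cycle of a smooth wave, standing in for any continuously varying
reality such as brightness or sound pressure, by rounding each sample to one of
2**bits evenly spaced levels, and then measures how far the model strays from
the original. The function names and the choice of a sine wave are assumptions
made for illustration, not a description of how any particular camera or
recorder works.

```python
import math


def quantize(value, bits):
    """Round a value in [-1, 1] to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = 2 / (levels - 1)
    return round((value + 1) / step) * step - 1


def worst_error(bits, samples=1000):
    """Largest gap between a sine wave and its quantized model."""
    points = [math.sin(2 * math.pi * i / samples) for i in range(samples)]
    return max(abs(p - quantize(p, bits)) for p in points)


for bits in (2, 4, 8, 16):
    print(bits, "bits per sample -> worst error", worst_error(bits))
```

Each additional bit roughly halves the worst-case error, so a model with very
few bits per sample cannot be rendered into a convincing copy.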
Modeling always omits information. Magritte’s painting doesn’t smell like
a pipe; it has a different patina than a pipe; and you can’t turn it around to
see what the other side of the pipe looks like. Whether the omitted informa-
tion is irrelevant or essential can’t be judged without knowing how the model
is going to be used. Whoever creates the model and renders it has the power
to shape the experience of the viewer.
The process of modeling followed by rendering applies to many situations
other than digital photography. For example, the same transformations hap-
pen when music is captured on a CD or as an MP3. The rendering process pro-
duces audible music from a digital representation, via stereo speakers or a
headset. CDs and MP3s use quite distinct modeling methods, with CDs gen-
erally capturing music more accurately, using a larger number of bits.
Knowing that digital representations don’t resemble the things they repre-
sent explains the difference between the terms “analog” and “digital.” An
analog telephone uses a continuously varying electric signal to represent a
continuously varying sound—the voltage of the telephone signal is an “ana-
log” of the sound it resembles—in the same way that Magritte applied paint
smoothly to canvas to mimic the shape of the pipe. The shift from analog to
digital technologies, in telephones, televisions, cameras, X-ray machines, and
many other devices, at first seems to lose the immediacy and simplicity of the
old devices. But the enormous processing power of modern computers makes
the digital representation far more flexible and useful.
Indeed, the same general pro-
cesses are at work in situations
where there is no “reality” because
the images are of things that have
never existed. Examples are video
games, animated films, and virtual
walk-throughs of unbuilt architec-
ture. In these cases, the first step of
Figure 3.8 is truncated. The “model”
is created not by capturing reality in
an approximate way, but by pure
synthesis: as the strokes of an artist’s
electronic pen, or the output of com-
puter-aided design software.
The severing of the immediate
connection between representation
and reality in the digital world has
created opportunities, dangers, and
puzzles. One of the earliest triumphs
of “digital signal processing,” the
science of doing computations on
the digital representations of reality,
was to remove the scratches and
noise from old recordings of the
great singer Enrico Caruso. No amount of analog electronics could have
cleaned up the old records and restored the clarity to Caruso’s voice.
And yet the growth of digital “editing” has its dark side as well. Photo-
editing software such as Photoshop can be used to alter photographic evi-
dence presented to courts of law.
CAN WE BE SURE A PHOTO
IS UNRETOUCHED?
Cryptographic methods (discussed
in Chapter 5) can establish that a
digital photograph has not been
altered. A special camera gets a
digital key from the “image verifi-
cation system,” attaches a “digital
signature” (see Chapter 5) to the
image and uploads the image and
the signature to the verification
system. The system processes the
received image with the same key
and verifies that the same signa-
ture results. The system is secure
because it is impossible, with any
reasonable amount of computation,
to produce another image that
would yield the same signature
with this key.
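The sidebar's scheme, in which the verifier recomputes the signature with the
same key, can be sketched with a keyed hash. The example below uses HMAC with
SHA-256 from Python's standard library as a stand-in; the sidebar does not say
which algorithm a real image verification system uses, so that choice, the
function names sign and verify, and the four bytes of "pixel data" are all
assumptions made for illustration.

```python
import hashlib
import hmac
import secrets


def sign(image_bytes, key):
    """Compute a keyed signature over the image data."""
    return hmac.new(key, image_bytes, hashlib.sha256).hexdigest()


def verify(image_bytes, signature, key):
    """Recompute the signature with the same key and compare."""
    return hmac.compare_digest(sign(image_bytes, key), signature)


key = secrets.token_bytes(32)     # issued by the verification system
original = b"\x10\x20\x30\x40"    # stand-in for a photograph's pixel data
tag = sign(original, key)         # attached by the camera

retouched = b"\x10\x20\x30\x41"   # one byte altered after the fact
print(verify(original, tag, key))   # True
print(verify(retouched, tag, key))  # False
```

Changing even a single bit of the image produces a different signature, and
without the key there is no practical way to compute a matching one.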
The movie Toy Story and its descendants are unlikely to put human actors
out of work in the near future, but how should society think about synthetic
child pornography? “Kiddie porn” is absolutely illegal, unlike other forms of
pornography, because of the harm done to the children who are abused to
produce it. But what about pornographic images of children who do not exist
and never have—who are simply the creation of a skilled graphic synthesizer?
Congress outlawed such virtual kiddie porn in 1996, in a law that prohibited
any image that “is, or appears to be, of a minor engaging in sexually explicit
conduct.” The Supreme Court overturned the law on First Amendment
grounds. Prohibiting images that “appear to” depict children is going too far,
the court ruled—such synthetic pictures, no matter how abhorrent, are consti-
tutionally protected free speech.
In this instance at least, real-
ity matters, not what images
appear to show. Chapter 7, “You
Can’t Say That on the Internet,”
discusses other cases in which
society is struggling to control
social evils that are facilitated
by information technology. In
the world of exploded assumptions about reality and artifice, laws that com-
bat society’s problems may also compromise rights of free expression.
What Is the Right Representation?
Figure 3.9 is a page from the Book of
Kells, one of the masterpieces of
medieval manuscript illumination,
produced around A.D. 800 in an Irish
monastery. The page contains a few
words of Latin, portrayed in an
astoundingly complex interwoven
lacework of human and animal fig-
ures, whorls, and crosshatching. The
book is hundreds of pages long, and
in the entire work no two of the let-
ters or decorative ornaments are
drawn the same way. The elaborately
ornate graphic shows just 21 letters
(see Figure 3.10).
DIGITAL CAMERAS AND MEGAPIXELS
Megapixels—millions of pixels—are
a standard figure of merit for digi-
tal cameras. If a camera captures
too few pixels, it can’t take good
photographs. But no one should
think that more pixels invariably
yield a better image. If a digital
camera has a low-quality lens,
more pixels will simply produce a
more precise representation of a
blurry picture!
Copyright © Trinity College, Dublin.
FIGURE 3.9 Opening page of the Gospel of St. John from the Book of Kells.
IN PRINCIPIO ERAT VERBUM
FIGURE 3.10 The words of the beginning of the Gospel of St. John. In the Book of
Kells, the easiest word to spot is ERAT, just to the left of center about a quarter of
the way up the page.
Do these two illustrations contain the same information? The answer
depends on what information is meant to be recorded. If the only important
thing were the Latin prose, then either representation might be equally good,
though Figure 3.10 is easier to read. But the words themselves are far from
the only important thing in the Book of Kells. It is one of the great works of
Western art and craftsmanship.
A graphic image such as Figure 3.9 is represented as a rectangular grid of
many rows and columns, by recording the color at each position in the grid
(see Figure 3.11). To produce such a representation, the page itself is scanned,
one narrow row after the next, and each row is divided horizontally into tiny
square “picture elements” or pixels. An image representation based on a divi-
sion into pixels is called a raster or bitmap representation. The representation
corresponds to the structure of a computer screen (or a digital TV screen),
which is also divided into a grid of individual pixels—how many pixels, and
how small they are, affect the quality and price of the display.
Copyright © Trinity College, Dublin.
FIGURE 3.11 A detail enlarged from the upper-right corner of the opening page of
John from the Book of Kells.
What would be the computer representation of the mere Latin text, Figure
3.10? The standard code for the Roman alphabet, called ASCII for the
American Standard Code for Information Interchange, assigns a different code
to each letter or symbol. ASCII codes are 7 bits long, but each character is
conventionally stored in one byte (8 bits), with a leading 0.
For example, A = 01000001, a = 01100001, $ = 00100100, and 7 = 00110111.
The equation 7 = 00110111 means that the bit pattern used to represent
the symbol “7” in a string of text is 00110111. The space character has its own
code, 00100000. Figure 3.12 shows the ASCII representation of the characters
“IN PRINCIPIO ERAT VERBUM,” a string of 24 bytes or 192 bits. We’ve
separated the long string of bits into bytes to improve readability ever so
slightly! But inside the computer, it would just be one bit after the next.
FIGURE 3.12 ASCII bit string for the characters of “IN PRINCIPIO ERAT VERBUM.”
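The bit string of Figure 3.12 can be regenerated from the text itself. The following Python sketch is only an illustration; the function name ascii_bits is our own invention. It encodes each character as one byte and writes each byte out as eight bits.

```python
def ascii_bits(text: str) -> list[str]:
    """Return one 8-bit string per character of an ASCII text."""
    # str.encode("ascii") raises UnicodeEncodeError for non-ASCII characters.
    return [format(byte, "08b") for byte in text.encode("ascii")]


phrase = "IN PRINCIPIO ERAT VERBUM"
codes = ascii_bits(phrase)

# Separate the bytes with spaces for readability, as the figure does.
print(" ".join(codes))
print(len(codes), "bytes,", 8 * len(codes), "bits")
```

Counting the three spaces, the phrase has 24 characters, which is where the figures of 24 bytes and 192 bits come from.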
So 01001001 represents the letter I. But not always! Bit strings are used to
represent many things other than characters. For example, the same bit string
01001001, if interpreted as the representation of a whole number in binary
notation, represents 73. A computer cannot simply look at a bit string
01001001 and know whether it is supposed to represent the letter I or the
number 73 or data of some other type, a color perhaps. A computer can inter-
pret a bit string only if it knows the conventions that were used to create the
document—the intended interpretation of the bits that make up the file.
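A few lines of Python make the point concrete. This is a sketch, not a description of how any particular program works: the same eight bits are read under three conventions, and the gray-level reading is a hypothetical convention chosen only for illustration.

```python
bits = "01001001"
value = int(bits, 2)             # read the bits as a whole number in binary

as_number = value                # convention 1: a whole number
as_letter = chr(value)           # convention 2: an ASCII character code
# Convention 3 (hypothetical): a gray level from 0 (black) to 255 (white).
as_gray_fraction = value / 255

print(as_number, as_letter, round(as_gray_fraction, 2))
```

Nothing in the bits themselves says which reading is intended; the choice is made by the code that interprets them.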
The meaning of a bit string is a matter of convention. Such conventions
are arbitrary at first. The code for the letter I could have been 11000101 or
pretty much anything else. Once conventions have become accepted through
a social process of agreement and economic incentive, they become nearly as
inflexible as if they were physical laws. Today, millions of computers assume
FILENAME EXTENSIONS
The three letters after the dot at the end of a filename indicate how the
contents are to be interpreted. Some examples are as follows:
Extension   File Type
.doc        Microsoft Word document
.odt        OpenDocument text document
.ppt        Microsoft PowerPoint document
.ods        OpenDocument Spreadsheet
.pdf        Adobe Portable Document Format
.exe        Executable program
.gif        Graphics Interchange Format (uses 256-color palette)
.jpg        JPEG graphic file (Joint Photographic Experts Group)
.mpg        MPEG movie file (Moving Picture Experts Group)
that 01001001, if interpreted as a character, represents the letter I, and the
universal acceptance of such conventions is what makes worldwide informa-
tion flows possible.
The document format is the key to turning the representation into a view-
able document. If a program misinterprets a document as being in a different
format from the one in which it was created, only nonsense will be rendered.
Computers not equipped with software matching the program that created a
document generally refuse to open it.
Which representation is “better,” a raster image or ASCII? The answer
depends on the use to which the document is to be put. For representation of
freeform shapes in a great variety of shades and hues, a raster representation
is unbeatable, provided the pixels are small enough and there are enough of
them. But it is hard even for a trained human to find the individual letters
within Figure 3.9, and it would be virtually impossible for a computer pro-
gram. On the other hand, a document format based on ASCII codes for char-
acters, such as the PDF format, can easily be searched for text strings.
The PDF format includes more than simply the ASCII codes for the text.
PDF files include information about typefaces, the colors of the text and of
the background, and the size and exact positions of the letters. Software that
produces PDFs is used to typeset elegant documents such as this one. In other
words, PDF is actually a page description language and describes visible fea-
tures that are typographically meaningful. But for complicated pictures, a
graphical format such as JPG must be used. A mixed document, such as these
pages, includes graphics within PDF files.
Reducing Data, Sometimes Without Losing Information
Let’s take another look at the page from the Book of Kells, Figure 3.9, and
the enlargement of a small detail of that image, Figure 3.11. The computer file
from which Figure 3.9 was printed is 463 pixels wide and 651 pixels tall, for
a total of about 300,000 individual pixels. The pages of the Book of Kells
measure about 10 by 13 inches, so the raster image has only about 50 pixels
per inch of the original work. That is too few to capture the rich detail of the
original—Figure 3.11 actually shows one of the animal heads in the top-right
corner of the page. A great deal of detail was lost when the original page was
scanned and turned into pixels. The technical term for the problem is under-
sampling. The scanning device “samples” the color value of the original doc-
ument at discrete points to create the representation of the document, and in
this case, the samples are too far apart to preserve detail that is visible to the
naked eye in the original.
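The arithmetic behind these figures is easy to check. The sketch below uses the pixel and page dimensions given above; the 300 samples per inch used for comparison is our own assumed figure for a finer scan, not a number from the original file.

```python
width_px, height_px = 463, 651     # size of the image file, from the text
width_in, height_in = 10, 13       # approximate page size in inches, from the text

total_pixels = width_px * height_px
ppi_horizontal = width_px / width_in
ppi_vertical = height_px / height_in

print(total_pixels)                                   # about 300,000
print(round(ppi_horizontal, 1), round(ppi_vertical, 1))

# Assumed comparison: the same page sampled at 300 points per inch.
ASSUMED_PPI = 300
hypothetical_pixels = (width_in * ASSUMED_PPI) * (height_in * ASSUMED_PPI)
print(hypothetical_pixels, round(hypothetical_pixels / total_pixels, 1))
```

Under that assumption, the finer scan would hold dozens of times as many samples as the file used for Figure 3.9.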
Credit as in Wikipedia, en.wikipedia.org/wiki/Image:Resolution_illustration.png.
FIGURE 3.13 A shape shown at various resolutions, from 1 × 1 to 100 × 100 pixels.
A square block consisting of many pixels of a single shade can be represented much
more compactly than by repeating the code for that shade as many times as there
are pixels.
The answer to undersampling is to increase the resolution of the scan—the
number of samples per inch. Figure 3.13 shows how the quality of an image
improves with the resolution. In each image, each pixel is colored with the
“average” color of part of the original.
But, of course, a price is paid for increased resolution. The more pixels in
the representation of an image, the more memory is needed to hold the
representation. Double the resolution, and the memory needed goes up by a
factor of four, since the resolution doubles both vertically and horizontally.
Standard software uses a variety of representational techniques to represent
raster graphics more concisely. Compression techniques are of two kinds:
“lossless” and “lossy.” A lossless representation is one that allows exactly the
same image to be rendered. A lossy representation allows an approximation
to the same image to be rendered—an image that is different from the
original in ways the human eye may or may not be able to discern.
One method used for lossless image compression takes advantage of the
fact that in most images, the color doesn’t change from pixel to pixel—the
image has spatial coherence, to use the official term. Looking at the middle
and rightmost images in Figure 3.13, for example, makes clear that in the
100 × 100 resolution image, the 100 pixels in a
AUDIO COMPRESSION
MP3 is a lossy compression method
for audio. It uses a variety of tricks
to create small data files. For exam-
ple, human ears are not far enough
apart to hear low-frequency sounds
stereophonically, so MP3s may
record low frequencies in mono
and play the same sound to both
speakers, while recording and
playing the higher frequencies in
stereo! MP3s are “good enough” for
many purposes, but a trained and
sensitive ear can detect the loss of
sound quality.
10 × 10 square in the top-left corner are all the same color; there is no need
to repeat a 24-bit color value 100 times in the representation of the image.
Accordingly, graphic representations have ways of saying “all pixels in this
block have the same color value.” Doing so can reduce the number of bits sig-
nificantly.
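One simple way to exploit runs of identical pixels is run-length encoding. The sketch below is an illustration only: it works along a single row rather than on square blocks as described above, the row of pixels is made up, and real graphics formats use more elaborate schemes.

```python
from itertools import groupby

Pixel = tuple[int, int, int]  # (red, green, blue), one byte each: a 24-bit color


def rle_encode(row: list[Pixel]) -> list[tuple[int, Pixel]]:
    """Collapse each run of identical pixels into a (count, color) pair."""
    return [(len(list(run)), color) for color, run in groupby(row)]


def rle_decode(runs: list[tuple[int, Pixel]]) -> list[Pixel]:
    """Expand (count, color) pairs back into the original row of pixels."""
    return [color for count, color in runs for _ in range(count)]


WHITE, BLACK = (255, 255, 255), (0, 0, 0)
row = [WHITE] * 10 + [BLACK] * 85 + [WHITE] * 5   # a made-up 100-pixel row

runs = rle_encode(row)
assert rle_decode(runs) == row   # lossless: the exact row comes back
print(runs)
print(len(row), "pixel values stored as", len(runs), "runs")
```

Because decoding returns exactly the original row, this is a lossless method. It saves space only when the image really does contain long runs of one color.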
Depending on how an image will be used, a lossy compression method
might be acceptable. What flashes on your TV is gone before you have time
to scrutinize the individual pixels. But in some cases, only lossless compres-
sion is satisfactory. If you have the famous Zapruder film of the Kennedy
assassination and want to preserve it in a digital archive, you want to use a
lossless compression method once you have digitized it at a suitably fine res-
olution. But if you are just shipping off the image to a low-quality printer
such as those used to print newspapers, lossy compression might be fine.
Technological Birth and Death
The digital revolution was possible because the capacity of memory chips
increased, relentlessly following Moore’s Law. Eventually, it became possible to
store digitized images and sounds at such high resolution that their quality was
higher than analog representations. Moreover, the price became low enough
that the storage chips could be included in consumer goods. But more than
electrical engineering is involved. At more than a megabyte per image, digital
cameras and HD televisions would still be exotic rarities. A megabyte is about
a million bytes, and that is just too much data per image. The revolution also
required better algorithms—better computational methods, not just better hard-
ware—and fast, cheap processing chips to carry out those algorithms.
For example, digital video compression utilizes temporal coherence as well
as spatial coherence. Any portion of the image is unlikely to change much in
color from frame to frame, so large parts of a picture typically do not have
to be retransmitted to the home when the frame changes after a thirtieth of a
second. At least, that is true in principle. If a woman in a TV image walks
across a fixed landscape, only her image, and a bit of landscape that newly
appears from behind her once she passes it, needs be transmitted—if it is com-
putationally feasible to compare the second frame to the first before it is
transmitted and determine exactly where it differs from its predecessor. To
keep up with the video speed, there is only a thirtieth of a second to do that
computation. And a complementary computation has to be carried out at the
other end—the previously transmitted frame must be modified to reflect the
newly transmitted information about what part of it should change one frame
time later.
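The two complementary computations can be sketched in a few lines of Python. This is a toy under stated assumptions: frames are small grids of integers, both frames have the same size, and every changed pixel is sent individually. Real video compression is far more sophisticated.

```python
Frame = list[list[int]]  # rows of pixel values; one integer per pixel for simplicity


def frame_delta(prev: Frame, curr: Frame) -> list[tuple[int, int, int]]:
    """Sender side: list (row, col, new_value) for every pixel that changed."""
    return [
        (r, c, value)
        for r, row in enumerate(curr)
        for c, value in enumerate(row)
        if prev[r][c] != value
    ]


def apply_delta(prev: Frame, delta: list[tuple[int, int, int]]) -> Frame:
    """Receiver side: patch the previously received frame with the changes."""
    frame = [row[:] for row in prev]   # copy so the old frame is left intact
    for r, c, value in delta:
        frame[r][c] = value
    return frame


# A made-up 4 x 6 "landscape" (0) with a two-pixel "figure" (9) moving one step right.
frame1 = [[0] * 6 for _ in range(4)]
frame2 = [[0] * 6 for _ in range(4)]
frame1[1][1] = frame1[2][1] = 9
frame2[1][2] = frame2[2][2] = 9

delta = frame_delta(frame1, frame2)
assert apply_delta(frame1, delta) == frame2
print(len(delta), "of", 4 * 6, "pixels transmitted")
```

Only the pixels the figure leaves and the pixels it newly covers appear in the list of changes; the rest of the landscape is never retransmitted.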
Digital movies could not have happened without an extraordinary increase
in speed and drop in price in computing power. Decompression algorithms are
built into desktop photo printers and cable TV boxes, cast in silicon in chips
more powerful than the fastest computers of only a few years ago. Such com-
pact representations can be sent quickly through cables and as satellite sig-
nals. The computing power in the cable boxes and television sets is today
powerful enough to reconstruct the image from the representation of what
has changed. Processing is power.
By contrast, part of the reason the compact disk is dying as a medium for
distributing music is that it doesn’t hold enough data. At the time the CD for-
mat was adopted as a standard, decompression circuitry for CD players would
have been too costly for use in homes and automobiles, so music could not
be recorded in compressed form. The magic of Apple’s iPod is not just the
huge capacity and tiny physical size of its disk—it is the power of the pro-
cessing chip that renders the stored model as music.
The birth of new technologies presages the death of old technologies.
Digital cameras killed the silver halide film industry; analog television sets
will soon be gone; phonograph records gave way to cassette tapes, which in
turn gave way to compact disks, which are themselves now dying in favor of
digital music players with their highly compressed data formats.
The periods of transition between technologies, when one emerges and
threatens another that is already in wide use, are often marked by the exer-
cise of power, not always progressively. Businesses that dominate old tech-
nologies are sometimes innovators, but often their past successes make them
slow to change. At their worst, they may throw up roadblocks to progress in
an attempt to hold their ground in the marketplace. Those roadblocks may
include efforts to scare the public about potential disruptions to familiar prac-
tices, or about the dollar costs of progress.
Data formats, the mere conventions used to intercommunicate information,
can be remarkably contentious when a change threatens the business of
an incumbent party, as the Commonwealth of Massachusetts learned when it
tried to change its document formats. The tale of Massachusetts and
OpenDocument illustrates how hard change can be in the digital world,
although it sometimes seems to change on an almost daily basis.
Data Formats as Public Property
No one owns the Internet, and everyone owns the Internet. No government
controls the whole system, and in the U.S., the federal government controls
only the computers of government agencies. If you download a web page to
your home computer, it will reach you through the cooperation of several,
perhaps dozens, of private companies between the web server and you.
This flexible and constantly
changing configuration of computers
and communication links developed
because the Internet is in its essence
not hardware, but protocols—the
conventions that computers use for
sending bits to each other (see the
Appendix). The most basic Internet
Protocol is known as IP. The Internet
was a success because IP and the
designs for the other protocols
became public standards, available
for anyone to use. Anyone could
build on top of IP. Any proposed
higher-level protocol could be
adopted as a public standard if it met
the approval of the networking com-
munity. The most important protocol
exploiting IP is known as TCP. TCP is
used by email and web software to
ship messages reliably between com-
puters, and the pair of protocols is known as TCP/IP. The Internet might not
have developed that way had proprietary networking protocols taken hold in
the early days of networking.
It was not always thus. Twenty to thirty years ago, all the major computer
companies—IBM, DEC, Novell, and Apple—had their own networking proto-
cols. The machines of different companies did not intercommunicate easily,
and each company hoped that the rest of the world would adopt its protocols
as standards. TCP/IP emerged as a standard because agencies of the U.S. gov-
ernment insisted on its use in research that it sponsored—the Defense
Department for the ARPANET, and the National Science Foundation for
NSFnet. TCP/IP was embedded in the Berkeley Unix operating system, which
was developed under federal grants and came to be widely used in universi-
ties. Small companies quickly moved to use TCP/IP for their new products.
The big companies moved to adopt it more slowly. The Internet, with all of
its profusion of services and manufacturers, could not have come into exis-
tence had one of the incumbent manufacturers won the argument—and they
failed even though their networking products were technologically superior
to the early TCP/IP implementations.
UPLOADING AND DOWNLOADING
Historically, we thought of the
Internet as consisting of powerful
corporate “server” machines
located “above” our little home
computers. So when we retrieved
material from a server, we were
said to be “downloading,” and
when we transferred material from
our machine to a server, we were
“uploading.” Many personal
machines are now so powerful that
the “up” and “down” metaphors
are no longer descriptive, but the
language is still with us. See the
Appendix, and also the explanation
of “peer-to-peer” in Chapter 6,
“Balance Toppled.”
File formats stand at a similar fork in the road today. There is increasing
concern about the risks of commercial products evolving into standards.
Society will be better served, goes the argument, if documents are stored in
formats hammered out by standards organizations, rather than disseminated
as part of commercial software packages. But consensus around one de facto
commercial standard, the .doc format of Microsoft Word, is already well
advanced.
Word’s .doc format is proprietary, developed by Microsoft and owned by
Microsoft. Its details are now public, but Microsoft can change them at any
time, without consultation. Indeed, it does so regularly, in order to enhance
the capabilities of its software—and new releases create incompatibilities with
legacy documents. Some documents created with Word 2007 can’t be opened
in Word 2003 without a software add-on, so even all-Microsoft offices risk
document incompatibilities if they don’t adjust to Microsoft’s format changes.
Microsoft does not exclude competitors from adopting its format as their own
document standard—but competitors would run great risks in building on a
format they do not control.
In a large organization, the cost of licensing Microsoft Office products for
thousands of machines can run into the millions of dollars. In an effort to
create competition and to save money, in 2004 the European Union advanced
the use of an “OpenDocument Format” for exchange of documents among EU
businesses and governments. Using ODF, multiple companies could enter the
market, all able to read documents produced using each other’s software.
In September, 2005, the Commonwealth of Massachusetts decided to fol-
low the EU initiative. Massachusetts announced that effective 15 months
later, all the state’s documents would have to be stored in OpenDocument
Format. About 50,000 state-owned computers would be affected. State offi-
cials estimated the cost savings at about $45 million. But Eric Kriss, the state’s
secretary of administration and finance, said that more than software cost
was at stake. Public documents were public property; access should never
require the cooperation of a single private corporation.
Microsoft did not accept the state’s decision without an argument. The
company rallied advocates for the disabled to its side, claiming that no avail-
able OpenDocument software had the accessibility features Microsoft offered.
Microsoft, which already had state contracts that extended beyond the
switchover date, also argued that adopting the ODF standard would be unfair
to Microsoft and costly to Massachusetts. “Were this proposal to be adopted,
the significant costs incurred by the Commonwealth, its citizens, and the pri-
vate sector would be matched only by the levels of confusion and incompat-
ibility that would result….” Kriss replied, “The question is whether a sovereign
state has the obligation to ensure that its public documents remain forever free
and unencumbered by patent, license,
or other technical impediments. We
say, yes, this is an imperative. Micro-
soft says they disagree and want the
world to use their proprietary for-
mats.” The rhetoric quieted down, but
the pressure increased. The stakes
were high for Microsoft, since where
Massachusetts went, other states
might follow.
Three months later, neither Kriss
nor Peter Quinn, the state’s chief
information officer, was working for the state.
Kriss returned to private industry as
he had planned to do before joining
the state government. The Boston
Globe published an investigation of
Quinn’s travel expenses, but the state
found him blameless. Tired of the
mudslinging, under attack for his
decision about open standards, and
lacking Kriss’s support, on December
24, Quinn announced his resigna-
tion. Quinn suspected “Microsoft
money and its lobbyist machine” of
being behind the Globe investigation
and the legislature’s resistance to his
open standard initiative.
The deadline for Massachusetts to
move to OpenDocument has passed, and as of the fall of 2007, the state’s
web site still says the switchover will occur in the future. In the intervening
months, the state explains, it became possible for Microsoft software to read
and write OpenDocument formats, so the shift to OpenDocument would not
eliminate Microsoft from the office software competition. Nonetheless, other
software companies would not be allowed to compete for the state’s office
software business until “accessibility characteristics of the applications meet
or exceed those of the currently deployed office suite”—i.e., Microsoft’s. For
the time being, Microsoft has the upper hand, despite the state’s effort to
wrest from private hands the formats of its public documents.
Which bits mean what in a document format is a multi-billion dollar business.
As in any big business decision, money and politics count, reason
becomes entangled with rhetoric, and the public is only one of the stake-
holders with an interest in the outcome.
OPENDOCUMENT,
OPEN SOURCE, FREE
These three distinct concepts all
aim, at least in part, to slow the
development of software monopo-
lies. OpenDocument (opendocu-
ment.xml.org) is an open standard
for file formats. Several major
computer corporations have backed
the effort, and have promised not
to raise intellectual property issues
that would inhibit the development
of software meeting the standards.
Open source (opensource.org) is a
software development methodol-
ogy emphasizing shared effort and
peer review to improve quality. The
site openoffice.org provides a full
suite of open source office produc-
tivity tools, available without
charge. Free software—“Free as
in freedom, not free beer”
(www.fsf.org, www.gnu.org)—“is a
matter of the users’ freedom to
run, copy, distribute, study, change,
and improve the software.”
Hiding Information in Images
The surprises in text documents are mostly things of which the authors were
ignorant or unaware. Image documents provide unlimited opportunities for
hiding things intentionally—hiding secrets from casual human observers, and
obscuring open messages destined for human recipients so anti-spam soft-
ware won’t filter them out.
The Spam Wars
Many of us are used to receiving email pleas such as this one: I am Miss
Faatin Rahman the only child/daughter of late mrs helen rahman Address:
Rue 142 Marcory Abidjan Cote d’ivoire west africa, I am 20 years old girl. I
lost my parent, and I have an inheritance from my late mother, My parents
were very wealthy farmers and cocoa merchant when they were alive, After
the death of my father, long ago, my mother was controling his business untill
she was poisoned by her business associates which she suffered and died, …
I am crying and seeking for your kind assistance in the following ways: To
provide a safe bank account into where the money will be transferred for
investment….
If you get such a request, don’t respond to it! Money will flow out of, not
into, your bank account. Most people know not to comply. But mass emails
are so cheap that getting one person out of a million to respond is enough to
make the spammer financially successful.
“Spam filters” are programs that intercept email on its way into the in-box
and delete messages like these before we read them. This kind of spam fol-
lows such a standard style that it is easy to spot automatically, with minimal
risk that any real correspondence with banks or African friends will be fil-
tered out by mistake.
But the spam artists have fought back. Many of us have received emails
like the one in Figure 3.14. Why can’t the spam filter catch things like this?
Word-processing software includes the name and size of the font in con-
junction with the coded characters themselves, as well as other information,
such as the color of the letters and the color of the background. Because the
underlying text is represented as ASCII codes, however, it remains relatively
easy to locate individual letters or substrings, to add or delete text, and to per-
form other such common text-processing operations. When a user positions a
cursor over the letter on the screen, the program can figure out the location
within the file of the character over which the cursor is positioned. Computer
software can, in turn, render the character codes as images of characters.
FIGURE 3.14 Graphic spam received by one of the authors. Although it looks like
text, the computer “sees” it as just an image, like a photograph. Because it doesn’t
realize that the pixels are forming letters, its spam filters cannot identify it as spam.
But just because a computer screen shows a recognizable letter of the
alphabet, this does not mean that the underlying representation is by means
of standard character codes. A digitized photograph of text may well look
identical to an image rendered from a word-processing document—that is, the
two utterly different representations may give rise to exactly the same image.
And that is one reason why, in the battle between spam producers and
makers of spam filters, the spam producers currently have the upper hand.
The spam of Figure 3.14 was produced in graphical form, even though what
is represented is just text. As the underlying representation is pixels and not
ASCII, spam like this makes it through all the filters we know about!
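A toy example shows why the representation matters. The filter below is purely hypothetical: the phrase list, the threshold, and the Email structure are our own assumptions, and real spam filters weigh far richer evidence. It searches only the character codes of a message, so a picture of the same words gives it nothing to match.

```python
from dataclasses import dataclass, field

# Hypothetical phrase list; a real filter would use far richer evidence.
SUSPICIOUS_PHRASES = ("inheritance", "kind assistance", "bank account")


@dataclass
class Email:
    text: str                                           # character codes the filter can search
    images: list[bytes] = field(default_factory=list)   # opaque pixel data


def looks_like_spam(message: Email, threshold: int = 2) -> bool:
    """Flag a message whose text contains enough suspicious phrases."""
    body = message.text.lower()
    hits = sum(phrase in body for phrase in SUSPICIOUS_PHRASES)
    return hits >= threshold


text_spam = Email("I have an inheritance ... provide a safe bank account")
# Placeholder bytes standing in for a picture of the same words.
image_spam = Email(text="", images=[b"placeholder pixel data"])

print(looks_like_spam(text_spam))
print(looks_like_spam(image_spam))
```

The first message is flagged and the second is not, although a human reader would see the same plea in both.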
The problem of converting raster graphics to ASCII text is called character
recognition. The term optical character recognition, or OCR, is used when the
original document is a printed piece of paper. The raster graphic representa-
tion is the result of scanning the document, and then some character recog-
nition algorithm is used to convert the image into a sequence of character
codes. If the original document is printed in a standard typeface and is rela-
tively free of smudges and smears, contemporary OCR software is quite accu-
rate, and is now incorporated into commercially available scanners
commonly packaged as multipurpose devices that also print, photocopy, and
fax. Because OCR algorithms are now reasonably effective and widely avail-
able, the next generation of spam filters will likely classify emails such as
Figure 3.14 as spam.
OCR and spam are merely an illustration of a larger point. Representation
determines what can be done with data. In principle, many representations
may be equivalent. But in practice, the secrecy of formatting information and
the computation required to convert one format to another may limit the use-
fulness of the data itself.
Hiding Information in Plain Sight
During World War I, the German Embassy in Washington, DC sent a message
to Berlin that began thus: “PRESIDENT’S EMBARGO RULING SHOULD HAVE
IMMEDIATE NOTICE.” U.S. intelligence was reading all the German
telegrams, and this one might have seemed innocuous enough. But the first
letters of the words spelled out “PERSHING,” the name of a U.S. Navy vessel.
The entire telegram had nothing to do with embargoes. It was about U.S. ship
movements, and the initial letters read in full, “PERSHING SAILS FROM N.Y.
JUNE 1.”
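Reading such a message takes only a line or two of code once the trick is known. The sketch below, with a function name of our own choosing, applies it to the seven words quoted above. They yield the first seven letters; the rest of the hidden message comes from the remainder of the telegram, which is not quoted here.

```python
def initials(message: str) -> str:
    """Return the first letter of each word, in order."""
    return "".join(word[0] for word in message.split())


opening = "PRESIDENT'S EMBARGO RULING SHOULD HAVE IMMEDIATE NOTICE."
print(initials(opening))
```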
Steganography is the art of sending secret messages in imperceptible ways.
Steganography is different from cryptography, which is the art of sending
messages that are indecipherable. In a cryptographic communication, it is
assumed that if Alice sends a message to Bob, an adversary may well inter-
cept the message and recognize that it holds a secret. The objective is to make
the message unreadable, except to Bob, if it falls into the hands of such an
eavesdropper or enemy. In the world of electronic communication, sending
an encrypted message is likely to arouse the suspicion of electronic monitoring
software. By contrast, in a steganographic message from Alice to Bob, the
communication itself arouses no suspicion. It may even be posted on a web
site and seem entirely innocent. Yet hidden in plain sight, in a way known
only to Alice and Bob, is a coded message.
Steganography has been in use for a long time. The Steganographia of
Johannes Trithemius (1462–1516) is an occult text that includes long conju-
rations of spirits. The first letters of the words of these mystic incantations
encode other hidden messages, and the book was influential for a century
after it was written. Computers have created enormous opportunities for
steganographic communications. As a very simple example, consider an ordi-
nary word-processing document—a simple love letter, for example. Print it
out or view it on the screen, and it seems to be about Alice’s sweet nothings
to Bob, and nothing more. But perhaps Alice included a paragraph at the end
in which she changed the font color to white. The software renders the white
text on the white background, which looks exactly like the white background.
But Bob, if he knows what to look for, can make it visible—for example, by
printing on black paper (just as the text could be recovered from the electron-
ically redacted Calipari report).
If an adversary has any reason to think a trick like this might be in use,
the adversary can inspect Alice’s electronic letter using software that looks
for messages hidden using just this technique. But there are many places to
look for steganographic messages, and many ways to hide the information.
Since each Roman letter has an eight-bit ASCII code, a text can be hidden
within another as long as there is an agreed-upon method for encoding 0s
and 1s. For example, what letter is hidden in this sentence?
Steganographic algorithms hide messages inside photos, text, and
other data.
The answer is “I,” the letter whose ASCII character code is 01001001. In the
first eight words of the sentence, words beginning with consonants encode 0
bits and words beginning with vowels encode 1s (see Figure 3.15).
Steganographic algorithms hide messages inside photos, text, and other data.
0 1 0 0 1 0 0 1
FIGURE 3.15 A steganographic encoding of text within text. Initial consonants
encode 0, vowels encode 1, and the first eight words encode the 8-bit ASCII code for
the letter “I.”
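The decoding rule of Figure 3.15 is simple enough to write down as a short program. This is a sketch of that one agreed-upon convention, with names of our own choosing; any other convention known to both parties would serve as well.

```python
VOWELS = set("AEIOU")


def decode_letter(sentence: str) -> str:
    """Decode one ASCII character from the first eight words of a sentence."""
    words = sentence.split()[:8]
    if len(words) < 8:
        raise ValueError("need at least eight words to carry one byte")
    # A word starting with a vowel encodes 1; any other word encodes 0.
    bits = "".join("1" if w[0].upper() in VOWELS else "0" for w in words)
    return chr(int(bits, 2))


cover = "Steganographic algorithms hide messages inside photos, text, and other data."
print(decode_letter(cover))
```

The recipient ignores what the sentence says and looks only at the pattern of initial letters.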
A steganographic method that would seem to be all but undetectable
involves varying ever so slightly the color values of individual pixels within
a photograph. Red, green, and blue components of a color determine the color
itself. A color is represented internally as one byte each for red, green, and
blue. Each 8-bit string represents a numerical value between 0 and 255.
Changing the rightmost bit from a 1 to a 0 (for example, changing 00110011
to 00110010), changes the numerical value by subtracting one—in this case,
changing the color value from 51 to 50. That results in a change in color so
insignificant that it would not be noticed, certainly not as a change in a sin-
gle pixel. But the rightmost bits of the color values of pixels in the graphics
files representing photographs can then carry quite large amounts of infor-
mation, without raising any suspicions. The recipient decodes the message
not by rendering the bits as visible images, but by inspecting the bits them-
selves, and picking out the significant 0s and 1s.
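The rightmost-bit technique can be sketched in Python as follows. The sketch makes several assumptions: the image is already available as a flat sequence of color bytes (a real graphics file would first have to be decoded), the recipient knows how many bytes of message to expect, and the function names embed and extract are invented for this illustration.

```python
def embed(color_bytes, message):
    """Hide message bits in the rightmost bit of successive color bytes."""
    bits = [(byte >> (7 - i)) & 1 for byte in message for i in range(8)]
    if len(bits) > len(color_bytes):
        raise ValueError("cover data is too small for this message")
    stego = bytearray(color_bytes)
    for position, bit in enumerate(bits):
        # Clear the rightmost bit, then set it to the message bit.
        stego[position] = (stego[position] & 0b11111110) | bit
    return bytes(stego)


def extract(color_bytes, message_length):
    """Rebuild message_length bytes from the rightmost bits."""
    message = bytearray()
    for k in range(message_length):
        value = 0
        for color in color_bytes[8 * k: 8 * k + 8]:
            value = (value << 1) | (color & 1)
        message.append(value)
    return bytes(message)


cover = bytes([51] * 64)       # 64 color values, all 00110011
stego = embed(cover, b"Hi")    # 2 bytes of message need 16 color values
print(extract(stego, 2))       # prints b'Hi'
```

No color value moves by more than one step out of 256, which is why the altered picture looks the same to the eye.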
Who uses steganography today, if anyone? It is very hard to know. In early 2001,
USA Today reported that terrorists were communicating using steganography.
A number of software tools are freely available that make
steganography easy. Steganographic detectors—what are properly known as
steganalysis tools—have also been developed, but their usefulness as yet
seems to be limited. Both steganography and steganalysis software are freely
available on the World Wide Web (see, for example,
www.cotse.com/tools/stega.htm and www.outguess.org/detection.php).
The use of steganography to transmit secret messages is today easy, cheap,
and all but undetectable. A foreign agent who wanted to communicate with
parties abroad might well encode a bit string in the tonal values of an MP3
or the color values of pixels in a pornographic image on a web page. So much
music and pornography flows between the U.S. and foreign countries that the
uploads and downloads would arouse no suspicion!
The Scary Secrets of Old Disks
By now, you may be tempted to delete all the files on your disk drive and
throw it away, rather than run the risk that the files contain unknown secrets.
That isn’t the solution: Even deleted files hold secrets!
A few years ago, two MIT researchers bought 158 used disk drives, mostly
from eBay, and recovered what data they could. Most of those who put the
disks up for sale had made some effort to scrub the data. They had dragged
files into the desktop trash can. Some had gone so far as to use the Microsoft
Windows FORMAT command, which warns that it will destroy all data on
the disk.
Yet only 12 of the 158 disk drives had truly been sanitized. Using several
methods well within the technical capabilities of today’s teenagers, the
researchers were able to recover user data from most of the others. From 42
of the disks, they retrieved what appeared to be credit card numbers. One of
the drives seemed to have come from an Illinois automatic teller machine and
contained 2,868 bank account numbers and account balances. Such data
from single business computers would be a treasure trove for criminals. But
most of the drives from home computers also contained information that the
owners would consider extremely sensitive: love letters, pornography, com-
plaints about a child’s cancer therapy, and grievances about pay disputes, for
example. Many of the disks contained enough data to identify the primary
user of the computer, so that the sensitive information could be tied back to
an individual whom the researchers could contact.
The users of the computers had
for the most part done what they
thought they were supposed to do—
they deleted their files or formatted
their disks. They probably knew not
to release toxic chemicals by dump-
ing their old machines in a landfill,
but they did not realize that by
dumping them on eBay, they might
be releasing personal information
into the digital environment. Any-
one in the world could have bought
the old disks for a few dollars, and
all the data they contained. What is
going on here, and is there anything
to do about it?
Disks are divided into blocks,
which are like the pages of a book—
each has an identifying address, like
a page number, and is able to hold a
few hundred bytes of data, about the
same amount as a page of text in a
book. If a document is larger than
one disk block, however, the docu-
ment is typically not stored in con-
secutive disk blocks. Instead, each
block includes a piece of the docu-
ment, and the address of the block
where the document is continued. So
the entire document may be physically scattered about the disk, although log-
ically it is held together as a chain of references of one block to another.
Logically, the structure is that of a magazine, where articles do not necessar-
ily occupy contiguous pages. Part of an article may end with “Continued on
page 152,” and the part of the article on page 152 may indicate the page on
which it is continued from there, and so on.
Because the files on a disk begin at arbitrary places, an index
records which files begin where on the disk. The index is itself another disk
file, but one whose location on the disk can be found quickly. A disk index
is very much like the index of a book—which always appears at the end, so
readers know where to look for it. Having found the index, they can quickly
find the page number of any item listed in the index and flip to that page.
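The chain-and-index arrangement just described can be modeled in a few lines of Python. This is a toy model under stated assumptions, not the layout of any real file system: the Block class, the next_addr field, the read_file function, and the file name letter.txt are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Block:
    data: bytes = b""
    next_addr: Optional[int] = None  # "continued on block ...", or None at the end


def read_file(disk: List[Block], index: Dict[str, int], name: str) -> bytes:
    """Look up the first block in the index, then follow the chain."""
    pieces = []
    addr = index[name]
    while addr is not None:
        block = disk[addr]
        pieces.append(block.data)
        addr = block.next_addr
    return b"".join(pieces)


# An eight-block disk holding one document scattered over blocks 5, 2, and 7.
disk = [Block() for _ in range(8)]
disk[5] = Block(b"Dear ", next_addr=2)
disk[2] = Block(b"Alice, ", next_addr=7)
disk[7] = Block(b"hello.")
index = {"letter.txt": 5}   # the index records only where each file begins

print(read_file(disk, index, "letter.txt"))  # prints b'Dear Alice, hello.'
```

The document is physically out of order on the disk, yet it reads back in order because each block names its successor.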
CLOUD COMPUTING
One way to avoid having problems
with deleted disk files and expen-
sive document-processing software
is not to keep your files on your
disks in the first place! In “cloud
computing,” the documents stay on
the disks of a central service
provider and are accessed through
a web browser. “Google Docs” is
one such service, which boasts very
low software costs, but other major
software companies are rumored to
be exploring the market for cloud
computing. If Google holds your
documents, they are accessible
from anywhere the Internet
reaches, and you never have to
worry about losing them—Google’s
backup procedures are better than
yours could ever be. But there are
potential disadvantages. Google’s
lawyers would decide whether to
resist subpoenas. Federal investiga-
tors could inspect bits passing
through the U.S., even on a trip
between other countries.
Why aren’t disks themselves organized like books, with documents laid out
on consecutive blocks? Because disks are different from books in two impor-
tant respects. First, they are dynamic. The information on disks is constantly
being altered, augmented, and removed. A disk is less like a book than like a
three-ring binder, to which pages are regularly added and removed as infor-
mation is gathered and discarded. Second, disks are perfectly re-writable. A
disk block may contain one string of 0s and 1s at one moment, and as a result
of a single writing operation, a different string of 0s and 1s a moment later.
Once a 0 or a 1 has been written in a particular position on the disk, there is
no way to tell whether the bit previously in that position was a 0 or a 1. There
is nothing analogous to the faint traces of pencil marks on paper that are left
after an erasure. In fact, there is no notion of “erasure” at all on a disk—all
that ever happens is replacement of some bits by others.
Because disks are dynamic, there are many advantages to breaking the file
into chained, noncontiguous blocks indexed in this way. For example, if the
file contains a long text document and a user adds a few words to the mid-
dle of the text, only one or two blocks in the middle of the chain are affected.
If enough text is added that those blocks must be replaced by five new ones,
the new blocks can be logically threaded into the chain without altering any
of the other blocks comprising the document. Similarly, if a section of text is
deleted, the chain can be altered to “jump over” the blocks containing the
deleted text.
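Threading a new block into a chain, and jumping over an old one, each take only a change to one "continued on" reference. The Python sketch below assumes a simplified disk in which each block is a pair of text and the address of the next block; the names read_chain, splice_after, and skip_next are made up for this example.

```python
def read_chain(blocks, start):
    """Follow the chain from the starting address and join the text."""
    pieces = []
    addr = start
    while addr is not None:
        text, addr = blocks[addr]
        pieces.append(text)
    return "".join(pieces)


def splice_after(blocks, addr, new_addr, new_text):
    """Thread a new block into the chain right after the block at addr."""
    text, old_next = blocks[addr]
    blocks[new_addr] = (new_text, old_next)
    blocks[addr] = (text, new_addr)


def skip_next(blocks, addr):
    """Make the chain jump over the block that follows addr."""
    text, skipped = blocks[addr]
    _, after = blocks[skipped]
    blocks[addr] = (text, after)


blocks = {4: ("The cat ", 9), 9: ("sat.", None)}
splice_after(blocks, 4, 1, "and the dog ")
print(read_chain(blocks, 4))   # prints The cat and the dog sat.
skip_next(blocks, 4)
print(read_chain(blocks, 4))   # prints The cat sat.
```

After the second operation, block 1 is no longer part of the document, but nothing has touched its contents.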
Blocks that are no longer part of any file are added to a “pool” of avail-
able disk blocks. The computer’s software keeps track of all the blocks in the
pool. A block can wind up in the pool either because it has never been used
or because it has been used but abandoned. A block may be abandoned
because the entire file of which it was part has been deleted or because the
file has been altered to exclude the block. When a fresh disk block is needed
for any purpose—for example, to start a new file or to add to an existing file—
a block is drawn from the pool of available blocks.
What Happens to the Data in Deleted Files?
Disk blocks are not re-written when they are abandoned and added to the
pool. When the block is withdrawn from the pool and put back to work as
part of another file, it is overwritten and the old data is obliterated. But until
then, the block retains its old pattern of zeroes and ones. The entire disk file
may be intact—except that there is no easy way to find it. A look in the index
will reveal nothing. But “deleting” a file in this way merely removes the index
entry. The information is still there on the disk somewhere. It has no more
been eradicated than the information in a book would be expunged by tear-
ing out the index from the back of the volume. To find something in a book
without an index, you just have to go through the book one page at a time
looking for it—tedious and time-consuming, but not impossible.
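A toy model in Python shows why a "deleted" file is still on the disk. Everything here is an assumption made for illustration: the ToyDisk class, its 16-byte blocks, and an index that lists all of a file's block addresses (a simplification of the chained layout described earlier).

```python
BLOCK_SIZE = 16  # unrealistically small, to keep the example readable


class ToyDisk:
    def __init__(self, n_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(n_blocks)]
        self.index = {}                    # file name -> block addresses
        self.pool = list(range(n_blocks))  # addresses of available blocks

    def write_file(self, name, data):
        addrs = []
        for start in range(0, len(data), BLOCK_SIZE):
            addr = self.pool.pop(0)        # draw a block from the pool
            chunk = data[start:start + BLOCK_SIZE]
            self.blocks[addr] = chunk.ljust(BLOCK_SIZE, b"\0")
            addrs.append(addr)
        self.index[name] = addrs

    def delete_file(self, name):
        # Only the index entry goes away; the blocks return to the pool
        # with their old contents untouched.
        self.pool.extend(self.index.pop(name))

    def scan(self, needle):
        """Go through every block, one at a time, looking for a pattern."""
        return [addr for addr, block in enumerate(self.blocks) if needle in block]


disk = ToyDisk(8)
disk.write_file("secret.txt", b"PIN 4921 for the vault")
disk.delete_file("secret.txt")
print("secret.txt" in disk.index)   # prints False
print(disk.scan(b"PIN 4921"))       # prints [0]
```

The index no longer knows about the file, but a block-by-block scan still finds its contents.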
And that is essentially what the MIT researchers did with the disks they
bought off eBay—they went through the blocks, one at a time, looking for rec-
ognizable bit patterns. A sequence of sixteen ASCII character codes represent-
ing decimal digits, for example, looks suspiciously like a credit card number.
Even if they were unable to recover an entire file, because some of the blocks
comprising it had already been recycled, they could recognize significant
short character strings such as account numbers.
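The kind of pattern search described above can be approximated with a regular expression over raw bytes. This is a sketch of the general idea, not the researchers' actual method, which the text does not spell out; the name find_card_like and the sample data are invented.

```python
import re

# Exactly sixteen ASCII digits, not part of a longer run of digits.
CARD_LIKE = re.compile(rb"(?<!\d)\d{16}(?!\d)")


def find_card_like(raw):
    """Return every run of exactly sixteen ASCII digits found in raw bytes."""
    return CARD_LIKE.findall(raw)


# Bytes as they might appear in a recycled block: binary junk with
# readable fragments mixed in. The digits are a made-up example.
raw_block = b"\x00\x17junk 4111111111111111\xff order 12345 ok"
print(find_card_like(raw_block))   # prints [b'4111111111111111']
```

A match only "looks suspiciously like" a credit card number; a more careful search would apply further checks before treating it as one.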
Of course, there would be a simple way to prevent sensitive information
from being preserved in fragments of “deleted” files. The computer could be
programmed so that, instead of simply putting abandoned blocks into the
pool, it actually over-wrote the
blocks, perhaps by “zeroing” them—
that is, writing a pattern of all 0s.
Historically, computer and software
manufacturers have thought the ben-
efits of zeroing blocks far less than
the costs. Society has not found
“data leakage” to be a critical prob-
lem until recently—although that
may be changing. And the costs of
constantly zeroing disk blocks would
be significant. Filling blocks with
zeroes might take so much time that
the users would complain about how
slowly their machines were running
if every block were zeroed immediately. With some clever programming the
process could be made unnoticeable, but so far neither Microsoft nor Apple
has made the necessary software investment.
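The difference between the two policies comes down to one extra write per abandoned block. The Python sketch below is hypothetical: the release function, its zero flag, and the 512-byte block size are assumptions chosen for illustration.

```python
BLOCK_SIZE = 512  # assumed block size for this sketch


def release(blocks, pool, addrs, zero=False):
    """Return blocks to the pool, optionally overwriting them with 0s first."""
    for addr in addrs:
        if zero:
            blocks[addr] = bytes(BLOCK_SIZE)  # the extra write that costs time
        pool.append(addr)


blocks = [b"A" * BLOCK_SIZE, b"B" * BLOCK_SIZE]
pool = []
release(blocks, pool, [0])             # the usual practice: data survives
release(blocks, pool, [1], zero=True)  # zeroing: data is gone
print(blocks[0][:4], blocks[1][:4])    # prints b'AAAA' b'\x00\x00\x00\x00'
```

Both blocks end up in the pool, but only the first still holds its old contents.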
And who has not deleted a file and then immediately wished to recover it?
Happily for all of us who have mistakenly dragged the wrong file into the
trash can, as computers work today, deleted files are not immediately added
to the pool—they can be dragged back out. Files can be retrieved this way only until
you execute an “Empty trash” command, which puts the deleted blocks into
the pool, although it does not zero them.
But what about the Windows “FORMAT” command, shown in Figure 3.16?
It takes about 20 minutes to complete. Apparently it is destroying all the bits
on the disk, as the warning message implies. But that is not what is happening.
It is simply looking for faulty spots on the disk. Physical flaws in the
magnetic surface can make individual disk blocks unusable, even though
mechanically the disk is fine and most of the surface is flawless as well. The
FORMAT command attempts to read every disk block in order to identify
blocks that need to be avoided in the future. Reading every block takes a long
time, but rewriting them all would take twice as long. The FORMAT command
identifies the bad blocks and re-initializes the index, but leaves most of the
data unaltered, ready to be recovered by an academic researcher—or an
inventive snooper.
FIGURE 3.16 Warning screen of Microsoft Windows FORMAT command. The
statement that all the data will be lost is misleading—in fact, a great deal of it can be
recovered.
THE LAW ADJUSTS
Awareness is increasing that
deleted data can be recovered from
disks. The Federal Trade Commission
now requires “the destruction or
erasure of electronic media con-
taining consumer information so
that the information cannot practi-
cably be read or reconstructed,”
and a similar provision is in a 2007
Massachusetts Law about security
breaches.
As if the problems with disks were not troubling enough, exactly the same
problems afflict the memory of cell phones. When people get rid of their old
phones, they forget the call logs and email messages they contain. And if they
do remember to delete them, using the awkward combinations of button-
pushes described deep in the phone’s
documentation, they may not really
have accomplished what they hoped.
A researcher bought ten cell phones
on eBay and recovered bank account
numbers and passwords, corporate
strategy plans, and an email
exchange between a woman and her
married boyfriend, whose wife was
getting suspicious. Some of this
information was recovered from
phones whose previous owners had
scrupulously followed the manufac-
turer’s instructions for clearing the
memory.
SOFTWARE TO SCRUB YOUR DISK
If you really want to get rid of all
the data on your disk, a special
“Secure empty trash” command is
available on Macintosh computers.
On Windows machines, DBAN is
free software that really will zero
your disk, available through
dban.sourceforge.net, which has
lots of other useful free software.
Don’t use DBAN on your disk until
you are sure you don’t want any-
thing on it anymore!
In a global sense, bits turn out to be very hard to eradicate. And most of
the time, that is exactly the way we want it. If our computer dies, we are glad
that Google has copies of our data. When our cell phone dies, we are happy
if our contact lists reappear, magically downloaded from our cellular service
provider to our replacement phone. There are upsides and downsides to the
persistence of bits.
Physical destruction always works as a method of data deletion. One of us
uses a hammer; another of us prefers his axe. Alas, these methods, while
effective, do not meet contemporary standards for recovery and recycling of
potentially toxic materials.
Can Data Be Deleted Permanently?
Rumors arise every now and then
that engineers equipped with very
sensitive devices can tell the differ-
ence between a 0 that was written
over a 0 on a disk and a 0 that was
written over a 1. The theory goes that
successive writing operations are not
perfectly aligned in physical space—a
“bit” has width. When a bit is rewrit-
ten, its physical edges may slightly
overlap or fall short of its previous
position, potentially revealing the
previous value. If such microscopic
misalignments could be detected, it
would be possible to see, even on a
disk that has been zeroed, what the
bits were before it was zeroed.
No credible authentication of such
an achievement has ever been pub-
lished, however, and as the density of
hard disks continues to rise, the like-
lihood wanes that such data recovery can be accomplished. On the other hand,
the places most likely to be able to achieve this feat are government intelli-
gence agencies, which do not boast of their successes! So all that can be said
for certain is that recovering overwritten data is within the capabilities of at
most a handful of organizations—and if possible at all, is so difficult and costly
that the data would have to be extraordinarily valuable to make the recovery
attempt worthwhile.
COPIES MAKE DATA HARD
TO DELETE
If your computer has ever been
connected to a network, destroying
its data will not get rid of copies of
the same information that may
exist on other machines. Your
emails went to and from other
people—who may have copies on
their machines, and may have
shared them with others. If you use
Google’s Gmail, Google may have
copies of your emails even after
you have deleted them. If you
ordered some merchandise online,
destroying the copy of the invoice
on your personal computer cer-
tainly won’t affect the store’s
records.
How Long Will Data Really Last?
As persistent as digital information seems to be, and as likely to disclose
secrets unexpectedly, it also suffers from exactly the opposite problem.
Sometimes electronic records become unavailable quite quickly, in spite of
best efforts to save them permanently.
Figure 3.17 shows an early geopolitical and demographic database—the
Domesday Book, an inventory of English lands compiled in 1086 by Norman
monks at the behest of William the Conqueror. The Domesday Book is one of
Britain’s national treasures and rests in its archives, as readable today as it
was in the eleventh century.
British National Archives.
FIGURE 3.17 The Domesday Book of 1086.
In honor of the 900th anniversary of the Domesday Book, the BBC issued a
modern version, including photographs, text, and maps documenting how
Britain looked in 1986. Instead of using vellum, or even paper, the material was
assembled in digital formats and issued on 12-inch diameter video disks, which
could be read only by specially equipped computers (see Figure 3.18). The
project was meant to preserve forever a detailed snapshot of late twentieth-
century Britain, and to make it available immediately to schools and libraries
everywhere.
By 2001, the modern Domesday Book was unreadable. The computers and
disk readers it required were obsolete and no longer manufactured. In 15
years, the memory even of how the information was formatted on the disks
had been forgotten. Mocking the project’s grand ambitions, a British news-
paper exclaimed, “Digital Domesday Book lasts 15 years not 1000.”
“Domesday Redux,” from Ariadne, Issue 56.
FIGURE 3.18 A personal computer of the mid-1980s configured to read the 12-inch
videodisks on which the modern “Domesday Book” was published.
Paper and papyrus thousands of years older even than the original
Domesday Book are readable today. Electronic records become obsolete in a
matter of years. Will the vast amounts of information now available because
of the advances in storage and communication technology actually be usable
a hundred or a thousand years in the future, or will the shift from paper to
digital media mean the loss of history?
The particular story of the modern Domesday Book has a happy ending.
The data was recovered, though just barely, thanks to a concerted effort by
many technicians. Reconstructing the data formats required detective work
on masses of computer codes (see Figure 3.19) and recourse to data structure
books of the period—so that programmers in 2001 could imagine how others
would have attacked the same data representation problems only 15 years
earlier! In the world of computer science, “state of the art” expertise dies very
quickly.
The recovered modern Domesday Book is accessible to anyone via the
Internet. Even the data files of the original Domesday Book have been trans-
ferred to a web site that is accessible via the Internet.
FIGURE 3.19 Efforts to reconstruct, shortly after the year 2000, the forgotten data
formats for the modern “Domesday Book,” designed less than 20 years earlier.
But there is a large moral for any
office or library worker. We cannot
assume that the back-ups and saved
disks we create today will be useful
even ten years from now for retriev-
ing the vast quantities of information
they contain. It is an open question
whether digital archives—much less
the box of disk drives under your bed
in place of your grandmother’s box
of photographs—will be as permanent
as the original Domesday Book. An
extraordinary effort is underway to
archive the entire World Wide Web,
taking snapshots of every publicly
accessible web page at periodic inter-
vals. Can the effort succeed, and can
the disks on which the archive is held
themselves be updated periodically so that the information will be with us
forever?
PRESERVING THE WEB
The Internet Archive
(www.archive.org) periodically
records “snapshots” of publicly
accessible web pages and stores
them away. Anyone can retrieve a
page from the past, even if it no
longer exists or has been altered.
By installing a “Wayback” button
(available from the Internet
Archive) on your web browser, you
can instantly see how any web
page looked in the past—just go to
the web page and click the
Wayback button; you get a list of
the archived copies of the page,
and you can click on any of them
to view it.
Or would we be wisest to do the apparently Luddite thing: to print every-
thing worth preserving for the long run—electronic journals, for example—so
that documents will be preserved in the only form we are certain will remain
readable for thousands of years?
The digital revolution put the power to document ideas into the hands of
ordinary people. The technology shift eliminated many of the intermediaries
once needed to produce office memoranda and books. Power over the
thoughts in those documents shifted as well. The authority that once accom-
panied the physical control of written and printed works has passed into the
hands of the individuals who write them. The production of information has
been democratized—although not always with happy results, as the mishaps
discussed in this chapter tellingly illustrate.
We now turn to the other half of the story: how we get the information
that others have produced. When power over documents was more central-
ized, the authorities were those who could print books, those who had the
keys to the file cabinets, and those with the most complete collections of doc-
uments and publications. Document collections were used both as informa-
tion choke points and as instruments of public enlightenment. Libraries, for
example, have been monuments to imperial power. University libraries have
long been the central institutions of advanced learning, and local public
libraries have been key democratizing forces in literate nations.
If everything is just bits and everyone can have as many bits as they want,
the problem may not be having the information, but finding it. Having a fact
on the disk in your computer, sitting a few inches from your eyes and brain,
is irrelevant, if what you want to know is irretrievably mixed with billions of
billions of other bits. Having the haystack does you no good if you can’t find
your precious needle within it. In the next chapter, we ask: Where does the
power now go, in the new world where access to information means finding
it, as well as having it?
CHAPTER 4
Needles in the Haystack
Google and Other Brokers in the
Bits Bazaar
Found After Seventy Years
Rosalie Polotsky was 10 years old when she waved goodbye to her cousins,
Sophia and Ossie, at the Moscow train station in 1937. The two sisters were
fleeing the oppression of Soviet Russia to start a new life. Rosalie’s family
stayed behind. She grew up in Moscow, taught French, married Nariman
Berkovich, and raised a family. In 1990, she emigrated to the U.S. and settled
near her son, Sasha, in Massachusetts.
Rosalie, Nariman, and Sasha always wondered about the fate of Sophia
and Ossie. The Iron Curtain had utterly severed communication among Jewish
relatives. By the time Rosalie left for the U.S., her ties to Sophia and Ossie had
been broken for so long that she had little hope of reconnecting with them—
and, as the years wore on, less reason for optimism that her cousins were still
alive. Although his grandfather dreamed of finding them, Sasha’s search of
immigrant records at Ellis Island and the International Red Cross provided no
clues. Perhaps, traveling across wartime Europe, the little girls had never even
made it to the U.S.
Then one day, Sasha’s cousin typed “Polotsky” into Google’s search win-
dow and found a clue. An entry on a genealogical web site mentioned
“Minacker,” the name of Sophia’s and Ossie’s father. In short order, Rosalie,
Sophia, and Ossie were reunited in Florida, after 70 years apart. “All the time
when he was alive, he asked me to do something to find them,” said Sasha,
recalling his grandfather’s wish. “It’s something magic.”
The digital explosion has produced vast quantities of informative data, the
Internet has scattered that data across the globe, and the World Wide Web has
put it within reach of millions of ordinary people. But you can’t reach for
something if you don’t know where it is. Most of that vast store of digital
information might as well not exist without a way to find it. For most of us,
the way to find things on the Web is with web search engines. Search is a
wondrous, transformative technology, which both fulfills dreams and shapes
human knowledge. The search tools that help us find needles in the digital
haystack have become the lenses through which we view the digital land-
scape. Businesses and governments use them to distort our picture of reality.
The Library and the Bazaar
In the beginning, the Web was a library. Information providers—mostly
businesses and universities, which could afford to create web pages—posted
information for others to see. Information consumers—mostly others in busi-
ness and academia—found out where to get the information and downloaded
it. They might know where to look because someone sent them the URL (the
“Uniform Resource Locator”), such as mit.edu (the URL for MIT). Ordinary peo-
ple didn’t use the Web. Instead, they
used services such as CompuServe for
organized access to databases of var-
ious kinds of information.
As the Web went commercial,
directories began to appear, includ-
ing printed “Yellow Pages.” These
directories listed places to go on the
Web for various products and ser-
vices. If you wanted to buy a car, you
looked in one place, and you looked
in another place to find a job. These
lists resembled the categories AOL
and CompuServe provided in the
days before consumers could connect
directly to the Internet. Human
beings constructed these lists—
editors decided what went in each
category, and what got left out
entirely.
WEB 1.0 VS. WEB 2.0
In contemporary jargon, the newer,
more participatory web sites to
which users can contribute are
dubbed “Web 2.0.” The older, more
passive web sites are now called
“Web 1.0.” These look like software
release numbers, but “Web 2.0”
describes something subtler and
more complex. Web 2.0 sites—
Facebook and Wikipedia, for
example—exploit what economists
call “network effects.” Because
users are contributing information
as well as utilizing information
others supply, these sites become
more valuable the more people are
using them. See
http://www.oreillynet.com/lpt/a/6228 for a
fuller explanation of Web 2.0.
The Web has changed drastically
since the mid-1990s. First, it is no
longer a passive information resource. Blogs, Wikipedia, and Facebook are
contributory structures, where peer involvement makes the information use-
ful. Web sites are cheap and easy to create; ordinary individuals and even the
smallest of organizations can now have them. As a result, the content and
connectedness of the Web are changing all the time.
Second, the Web has gotten so big and so unstructured that it is not
humanly possible to split it up into neat categories. Web pages simply don’t
lend themselves to organization in a nice structure, like an outline. There is
no master plan for the Web—vast numbers of new pages are added daily in
an utterly unstructured way. You certainly can’t tell what a web page con-
tains by looking at its URL.
Moreover, hierarchical organization is useless in helping you find informa-
tion if you can’t tell where in the hierarchy it might belong. You don’t usu-
ally go to the Web to look for a web page. You go to look for information,
and are glad to get it wherever you can find it. Often, you can’t even guess
where to look for what you want to know, and a nice, structured organiza-
tion of knowledge would do you no good. For example, any sensible organ-
ization of human knowledge, such as an encyclopedia, would have a section
on cows and a section on the moon. But if you didn’t know that there was a
nursery rhyme about the cow jumping over the moon, neither the “cow” nor
the “moon” entry would help you figure out what the cow supposedly did to
the moon. If you typed both words into a search engine, however, you would
find out in the blink of an eye.
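A tiny word index, written here in Python, shows why typing both words works where browsing by category does not. It is a deliberately simplified sketch and says nothing about how Google or any other real search engine is built; the three sample documents and the names build_index and search are invented.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each word to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the documents that contain every word in the query."""
    words = re.findall(r"[a-z]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


docs = {
    "encyclopedia/cow": "The cow is a domesticated animal kept for milk.",
    "encyclopedia/moon": "The moon orbits the Earth.",
    "rhymes/hey-diddle": "Hey diddle diddle, the cat and the fiddle, "
                         "the cow jumped over the moon.",
}
index = build_index(docs)
print(search(index, "cow moon"))   # prints {'rhymes/hey-diddle'}
```

Neither encyclopedia entry matches both words, so the only result is the page that no sensible hierarchy would have led you to.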
Search is the new paradigm for finding information—and not just on the
Web as a whole. If you go to Wal-Mart’s web site, you can trace through its
hierarchical organization. At the top level, you get to choose between “acces-
sories,” “baby,” “boys,” “girls,” and so on. If you click “baby,” your next click
takes you to “infant boys,” “toddler girls,” and so on. There is also a search
window at the top. Type whatever you want, and you may be taken directly
to what you are looking for—but only on Wal-Mart’s site. Such limited search
engines help us share photos, read newspapers, buy books online from
Amazon or Barnes and Noble, and even find old email on our own laptops.
Search makes it possible to find things in
vast digital repositories. But search is more
than a quick form of look-up in a digital
library. Search is a new form of control over
information.
Information retrieval tools such as Google are extraordinarily democratiz-
ing—Rosalie and Sasha Berkovich did not need to hire a professional people-
finder. But the power that has been vested in individuals is not the only kind
that search has created. We have given search engines control over where we
get reliable information—the same control we used to assign to authoritative
sources, such as encyclopedias and “newspapers of record.” If we place
absolute trust in a search engine to find things for us, we are giving the
search engine the power to make it hard or impossible for us to know things.
Use Google in China, and your searches will “find” very different information
about democracy than they will “find” if you use Google in the United States.
Search for “401(K)” on John Hancock’s web site, and Fidelity’s 401(K) plans
will seem not to exist.
For the user, search is the power to find things, and for whoever controls
the engine, search is the power to shape what you see. Search is also power
of a third kind. Because the search
company records all our search
queries, we are giving the search
company the power that comes with
knowing what we want to know. In
its annual “Zeitgeist” report, Google
takes the pulse of the population by
revealing the questions its search
engine is most often asked. It was amusing to know that of the most popular
“Who is …?” searches of 2007, “God” was #1 and “Satan” was #10, with
“Buckethead” beating “Satan” at #6. Search engines also gather similar infor-
mation about each one of us individually. For example, as discussed in
Chapter 2, Amazon uses the information to suggest books you might like to
read once you have used its web site for a bit.
The Web is no longer a library. It is a chaotic marketplace of the billions
of ideas and facts cast up by the bits explosion. Information consumers and
information producers constantly seek out each other and morph into each
other’s roles. In this shadowy bits bazaar, with all its whispers and its couri-
ers running to and fro, search engines are brokers. Their job is not to supply
the undisputed truth, nor even to judge the accuracy of material that others
provide. Search engines connect willing producers of information to willing
consumers. They succeed or fail not on the quality of the information they
provide, because they do not produce content at all. They only make connec-
tions. Search engines succeed or fail depending on whether we are happy
with the connections they make, and nothing more. In the bazaar, it is not
always the knowledgeable broker who makes the most deals. To stay in busi-
ness, a broker just has to give most people what they want, consistently over
time.
Search does more than find things for us. Search helps us discover things
we did not know existed. By searching, we can all be armchair bits detectives,
finding surprises in the book next to the one we were pulling off the digital
bookshelf, and sniffing out curious information fragments cast far and wide
by the digital explosion.
Here are some interesting Google Zeitgeist results from 2007: among “What is”
questions, “love” was #1 and “gout” was #10; among “How to” queries, “kiss”
was #1 and “skateboard” was #10.
Forbidden Knowledge Is Only a Click Away
Schizophrenia is a terrible brain dis-
ease, afflicting millions of people. If
you wanted to know about the latest
treatment options, you might try to
find some web sites and read the
information they contain.
Some people already know where
they think they can find good medical
information—they have bookmarked
a site they trust, such as
WebMD.com or DrKoop.com. If you
were like us, however, you’d use a
search engine—Google.com, Yahoo.
com, or Ask.com, for example. You’d
type in a description of what you
were looking for and start to click
links and read. Of course, you should
not believe uncritically anything you
read from a source you don’t know
anything about—or act on the med-
ical information you got through
your browsing, without checking
with a physician.
When we tried searching for “schizophrenia drugs” using Google, we got
the results shown in Figure 4.1. The top line tells us that if we don’t like these
results, there are a quarter-million more that Google would be glad to show
us. It also says that it took six-hundredths of a second to get these results for
us—we didn’t sense that it took even that long. Three “Sponsored Links”
appear to the right. A link is “sponsored” if someone has paid Google to have
it put there—in other words, it’s an advertisement. To the left is a variety of
ordinary links that Google’s information retrieval algorithms decided were
most likely to be useful to someone wanting information about “schizophrenia
drugs.” Those ordinary links are called the search engine’s organic results,
as opposed to the sponsored results.
BRITNEY IN THE BITS BAZAAR
Providing what most people want creates a tyranny of the majority and a bias
against minority interests. When we searched for “spears,” for example, we got
back three pages of results about Britney Spears and her sister, with only
three exceptions: a link to Spears Manufacturing, which produces PVC piping;
one to comedian Aries Spears; and one to Prof. William M. Spears of the
University of Wyoming. Ironically, Prof. Spears’s web page ranked far below
“Britney Spears’ Guide to Semiconductor Physics,” a site maintained by some
light-hearted physicists at the University of Essex in the UK. That site has a
distinctive URL, britneyspears.ac—where “.ac” stands not for “academic” but for
“Ascension Island” (which gets a few pennies for use of the .ac URL, wherever
in the world the site may be hosted). Whatever the precise reason for this
site’s high ranking, the association with Britney probably didn’t hurt!
THOSE FUNNY NAMES
Yahoo! is an acronym—it stands for “Yet Another Hierarchical Officious
Oracle” (docs.yahoo.com/info/misc/history.html). “Google” comes from
“googol,” which is the number represented by a 1 followed by 100 zeroes.
The Google founders were evidently thinking big!
Google ™ is a registered trademark of Google, Inc. Reprinted by permission.
FIGURE 4.1 Google’s results from a search for “schizophrenia drugs.”
Just looking at this window raises a series of important questions:
• The Web is enormous. How can a search engine find those results so
fast? Is it finding every appropriate link?
• How did Google decide what is search result number 1 and what is
number 283,000?
• If you try another search engine instead of Google, you’ll get
different results. Which is right? Which is better? Which is more
authoritative?
• Are the sponsored links supposed to be better links than the organic
links, or worse? Is the advertising really necessary?
• How much of this does the government oversee? If a TV station kept
reporting lies as the truth, the government would get after them. Does
it do anything with search engines?
We shall take up each of these questions in due course, but for the time being,
let’s just pursue our medical adventure.
When we clicked on the first organic link, it took us to a page from the
web site of a distinguished Swedish university. That page contained some
information about the different kinds of schizophrenia drugs. One of the
drugs it mentioned was “olanzapin (Zyprexa).” The trade name rang a bell for
some reason, so we started over and searched for “Zyprexa.”
The first of the organic links we got back was to www.zyprexa.com, which
described itself as “The Official ZYPREXA Olanzapine Site.” The page was
clearly marked as maintained by Eli Lilly and Company, the drug’s manufac-
turer. It provided a great deal of information about the drug, as well as pho-
tographs of smiling people—satisfied patients, presumably—and slogans such
as “There is Hope” and “Opening the Door to Possibility.” The next few links
on our page of search results were to the medical information sites drugs.com,
rxlist.com, webmd.com, and askapatient.com.
Just below these was a link that took us in a different direction:
“ZyprexaKills wiki.” The drug was associated with some serious side effects,
it seems, and Lilly allegedly kept these side effects secret for a long time. At
the very top of that page of search results, as the only sponsored link, was
the following: “Prescription Drug Lawsuit. Zyprexa-olanzapine-lawyer.com.
Pancreatitis & diabetes caused by this drug? Get legal help today.” That link
took us to a web form where a Houston attorney offered to represent us
against Lilly.
It took only a few more mouse clicks before a document appeared that was
entitled “Olanzapine—Blood glucose changes” (see Figure 4.2). It was an inter-
nal Lilly memorandum, never meant to be seen outside the company, and
marked as a confidential exhibit in a court case. Some patients who had devel-
oped diabetes while using Zyprexa had sued Lilly, claiming that the drug had
caused the disease. In the course of that lawsuit, this memo and other confi-
dential materials were shared with the plaintiffs’ lawyers under a standard dis-
covery protocol. Through a series of improper actions by several lawyers, a
New York Times reporter procured these documents. The reporter then pub-
lished an exposé of Lilly’s slowness to acknowledge the drug’s side effects. The
documents themselves appeared on a variety of web sites.
Source: www.furiousseasons.com/zyprexa%20documents/ZY1%20%20%2000008758.pdf.
FIGURE 4.2 Top and bottom lines of a document filed in a court case. It was
supposed to be kept secret, but once on the Web, anyone searching for “Zyprexa
documents” finds it easily.
Lilly demanded that the documents be returned, that all copies be
destroyed, and that the web sites that had posted them be required to take
them down. A legal battle ensued. On February 13, 2007, Judge Jack B.
Weinstein of the U.S. District Court in New York issued his judgment, order,
and injunction. Yes, what had been done with the documents was grievously
wrong and contrary to earlier court orders. The lawyers and the journalist had
cooked up a scam on the legal system, involving collusion with an Alaska
lawyer who had nothing to do with the case, in order to spring the docu-
ments. The lawyers who conspired to get the documents had to give them
back and not keep any copies. They were enjoined against giving any copies
to anyone else.
But, concluded Judge Weinstein, the web sites were another matter. The
judge would not order the web sites to take down their copies. Lilly was enti-
tled to the paper documents, but the bits had escaped and could not be recap-
tured. As of this writing, the documents are still viewable. We quickly found
them directly by searching for “zyprexa documents.”
The world is a different place from a time when the judge could have
ordered the return of all copies of offending materials. Even if there were
hundreds of copies in file cabinets and desk drawers, he might have been able
to insist on their return, under threat of harsh penalties. But the Web is not a
file cabinet or a desk drawer. “Web sites,” wrote Judge Weinstein, “are prima-
rily fora for speech.” Lilly had asked for an injunction against five web sites
that had posted the documents, but millions of others could post them in the
future. “Limiting the fora available to would-be disseminators by such an
infinitesimal percentage would be a fruitless exercise,” the judge concluded.
It probably would not be effective to issue a broader injunction, and even if
it were, “the risk of unlimited inhibitions of free speech should be avoided
when practicable.”
The judge understood the gravity of the issue he was deciding.
Fundamentally, he was reluctant to use the authority of the government in a
futile attempt to prevent people from saying what they wanted to say and
finding out what they wanted to know. Even if the documents had been vis-
ible only for a short time period, unknown numbers of copies might be
circulating privately among interested parties. Grasping for an analogy, the
judge suggested that God Himself had failed in His attempt to enjoin Adam
and Eve from their pursuit of the truth!
Two sponsored links appeared when we did the search for “zyprexa docu-
ments.” One was for another lawyer offering his services for Zyprexa-related
lawsuits against Lilly. The other, triggered by the word “documents” in our
search term, was for Google itself: “Online Documents. Easily share & edit
documents online for free. Learn more today. docs.google.com.” This was an
ironic reminder that the bits are out there, and the tools to spread them are
there too, for anyone to use. Thanks to search engines, anyone can find the
information they want. Information has exploded out of the shells that used
to contain it.
In fact, the architecture of human knowledge has changed as a result of
search. In a single decade, we have been liberated from information straight-
jackets that have been with us since the dawn of recorded history. And many
who should understand what has happened, do not. In February 2008, a San
Francisco judge tried to shut down the Wikileaks web site, which posts leaked
confidential documents anonymously as an aid to whistleblowers. The judge
ordered the name “Wikileaks” removed from DNS servers, so the URL
“Wikileaks.org” would no longer correspond to the correct IP address. (In
the guts of the Internet, DNS servers provide the service of translating URLs
into IP addresses. See the Appendix.) The publicity that resulted from this
censorship attempt made it easy to find various “mirrors”—identical twins,
located elsewhere on the Web—by searching for “Wikileaks.”
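To see why removing a name from DNS did not remove the documents, it may help to picture DNS as a lookup table from names to addresses. The sketch below is a toy model, not real DNS software: the table, the mirror name, and the addresses (taken from a range reserved for documentation) are all made up for illustration.

```python
# Toy model of name lookup. Real DNS is a distributed system; this
# dictionary, the mirror name, and the addresses are illustrative only.
dns_table = {
    "wikileaks.org": "192.0.2.10",
    "mirror.example": "192.0.2.20",  # hypothetical mirror site
}

# The servers themselves, keyed by address, exist independently of the names.
servers = {
    "192.0.2.10": "leaked documents",
    "192.0.2.20": "leaked documents",
}

def fetch(name):
    """Translate a name to an address, then ask that server for its content."""
    address = dns_table.get(name)
    if address is None:
        return None  # the name no longer resolves
    return servers[address]

# Removing the name does not remove the server or its mirrors.
del dns_table["wikileaks.org"]

print(fetch("wikileaks.org"))   # None: the name is gone
print(fetch("mirror.example"))  # the mirror still answers
print(servers["192.0.2.10"])    # the original server still holds the content
```

In this simplified picture, the order erased one row of the table. The content stayed where it was, and any other row pointing at a copy kept working.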
The Fall of Hierarchy
For a very long time, people have been organizing things by putting them
into categories and dividing those categories into subcategories. Aristotle
tried to classify everything. Living things, for example, were either plants or
animals. Animals either had red blood or did not; red-blooded animals were
either live-bearers or egg-bearers; live-bearers were either humans or other
mammals; egg-bearers either swam or flew; and so on. Sponges, bats, and
whales all presented classification enigmas, on which Aristotle did not think
he had the last word. At the dawn of the Enlightenment, Linnaeus provided
a more useful way of classifying living things, using an approach that gained
intrinsic scientific validity once it reflected evolutionary lines of descent.
Our traditions of hierarchical classification are evident everywhere. We
just love outline structures. The law against cracking copyright protection
(discussed in Chapter 6, “Balance Toppled”) is Title 17, Section 1201, para-
graph (a), part (1), subpart (A). In the Library of Congress system, every book
is in one of 26 major categories, designated by a Roman letter, and these
major categories are internally divided in a similar way—B is philosophy, for
example, and BQ is Buddhism.
If the categories are clear, it may be possible to use the organizing hierar-
chy to locate what you are looking for. That requires that the person doing
the searching not only know the classification system, but be skilled at mak-
ing all the necessary decisions. For example, if knowledge about living things
was organized as Aristotle had it, anyone wanting to know about whales
would have to know already whether a whale was a fish or a mammal in
order to go down the proper branch of the classification tree. As more and
more knowledge has to be stuffed into the tree, the tree grows and sprouts
twigs, which over time become branches sprouting more twigs. The classifi-
cation problem becomes unwieldy, and the retrieval problem becomes practi-
cally impossible.
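The retrieval problem can be made concrete with a small sketch. The nested dictionary below is a hypothetical fragment of an Aristotle-style tree; the category names paraphrase the text, and the leaf entries are our own invention. Finding an entry means supplying the right choice at every level, and a single wrong guess leads to a dead end.

```python
# A hypothetical fragment of a classification tree, as nested dictionaries.
tree = {
    "animals": {
        "red-blooded": {
            "live-bearers": {"whale": "notes on whales"},
            "egg-bearers": {
                "swimmers": {"trout": "notes on trout"},
                "fliers": {"sparrow": "notes on sparrows"},
            },
        },
    },
    "plants": {},
}

def lookup(path):
    """Follow one branch per step; return None if any step is wrong."""
    node = tree
    for choice in path:
        if not isinstance(node, dict) or choice not in node:
            return None
        node = node[choice]
    return node

# A searcher who already knows that whales bear live young finds the entry.
right = lookup(["animals", "red-blooded", "live-bearers", "whale"])

# A searcher who assumes a whale is a kind of fish goes down the wrong branch.
wrong = lookup(["animals", "red-blooded", "egg-bearers", "swimmers", "whale"])

print(right)  # notes on whales
print(wrong)  # None
```

The information about whales is in the tree in both cases. What differs is whether the searcher already knows enough to reach it.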
The system of Web URLs started out as such a classification tree. The site
www.physics.harvard.edu is a web server, of the physics department, within
Harvard University, which is an educational institution. But with the profu-
sion of the Web, this system of domain names is now useless as a way of find-
ing anything whose URL you do not already know.
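Read from right to left, a domain name spells out a path down such a tree. A minimal sketch, using the example name from the text (the function name is ours):

```python
def domain_path(hostname):
    """Return the labels of a domain name from most general to most specific."""
    return list(reversed(hostname.split(".")))

path = domain_path("www.physics.harvard.edu")
print(path)  # ['edu', 'harvard', 'physics', 'www']
```

The path is easy to read once you have the name, but nothing in it helps you guess the name of a site you have never heard of.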
In 1991, when the Internet was barely known outside academic and gov-
ernment circles, some academic researchers offered a program called “Gopher.”
This program provided a hierarchical directory of many web sites, by organ-
izing the directories provided by the individual sites into one big outline.
Finding things using Gopher was tedious by today’s standards, and was
dependent on the organizational skills of the contributors. Yahoo! was
founded in 1994 as an online Internet directory, with human editors placing
products and services in categories, making recommendations, and generally
trying to make the Internet accessible to non-techies. Although Yahoo! has
long since added a search window, it retains its basic directory function to
the present day.
“Gopher” was a pun—it was software you could use to “go for” information on
the Web. It was also the mascot of the University of Minnesota, where the
software was first developed.
The practical limitations of hierarchical organization trees were foreseen
sixty years ago. During World War II, President Franklin Roosevelt appointed
Vannevar Bush of MIT to serve as Director of the Office of Scientific Research
and Development (OSRD). The OSRD coordinated scientific research in support
of the war effort. It was a large effort—30,000 people and hundreds of
projects covered the spectrum of science and engineering. The Manhattan
Project, which produced the atomic bomb, was just a small piece of it.
From this vantage point, Bush saw a major obstacle to continued scientific
progress. We were producing information faster than it could be consumed,
or even classified. Decades before computers became commonplace, he wrote
about this problem in a visionary article, “As We May Think.” It appeared in
the Atlantic Monthly—a popular magazine, not a technical journal. As Bush
saw it,
The difficulty seems to be, not so much that we publish unduly … but
rather that publication has been extended far beyond our present abil-
ity to make real use of the record. The summation of human experi-
ence is being expanded at a prodigious rate, and the means we use for
threading through the consequent maze to the momentarily important
item is the same as was used in the days of square-rigged ships. …
Our ineptitude in getting at the record is largely caused by the artifi-
ciality of systems of indexing.
The dawn of the digital era was at this time barely a glimmer on the horizon.
But Bush imagined a machine, which he called a “memex,” that would aug-
ment human memory by storing and retrieving all the information needed. It
would be an “enlarged intimate supplement” to human memory, which can
be “consulted with exceeding speed and flexibility.”
Bush clearly perceived the problem, but the technologies available at the
time, microfilm and vacuum tubes, could not solve it. He understood that the
problem of finding information would eventually overwhelm the progress of
science in creating and recording knowledge. Bush was intensely aware that
civilization itself had been imperiled in the war, but thought we must proceed
with optimism about what the record of our vast knowledge might bring us.
Man “may perish in conflict before he learns to wield that record for his true
good. Yet, in the application of science to the needs and desires of man, it
would seem to be a singularly unfortunate stage at which to terminate the
process, or to lose hope as to the outcome.”
Capabilities that were inconceivable then are commonplace now. Digital
computers, vast storage, and high-speed networks make information search
and retrieval necessary. They also make it possible. The Web is a realization
of Bush’s memex, and search is key to making it useful.
It Matters How It Works
How can Google or Yahoo! possibly take a question it may never have been
asked before and, in a split second, deliver results from machines around the
world? The search engine doesn’t “search” the entire World Wide Web in
response to your question. That couldn’t possibly work quickly enough—it
would take more than a tenth of a second just for bits to move around the
earth at the speed of light. Instead, the search engine has already built up an
index of web sites. The search engine does the best it can to find an answer
to your query using its index, and then sends its answer right back to you.
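The tenth-of-a-second figure is easy to check. The sketch below assumes the equatorial circumference of the earth (about 40,075 km) and the speed of light in a vacuum (about 299,792 km per second). Signals in real cables and routers travel more slowly, so the true delay would be longer.

```python
EARTH_CIRCUMFERENCE_KM = 40_075    # approximate, at the equator
SPEED_OF_LIGHT_KM_PER_S = 299_792  # approximate, in a vacuum

seconds_around_earth = EARTH_CIRCUMFERENCE_KM / SPEED_OF_LIGHT_KM_PER_S
print(round(seconds_around_earth, 3))  # 0.134
```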
A FUTURIST PRECEDENT
In 1937, H. G. Wells anticipated Vannevar Bush’s 1945 vision of a “memex.”
Wells wrote even more clearly about the possibility of indexing everything,
and what that would mean for civilization:
There is no practical obstacle whatever now to the creation of an
efficient index to all human knowledge, ideas and achievements,
to the creation, that is, of a complete planetary memory for all
mankind. And not simply an index; the direct reproduction of the
thing itself can be summoned to any properly prepared spot. …
This in itself is a fact of tremendous significance. It foreshadows a
real intellectual unification of our race. The whole human memory
can be, and probably in a short time will be, made accessible to
every individual. … This is no remote dream, no fantasy.
To avoid suggesting that there is anything unique about Google or Yahoo!,
let’s name our generic search engine Jen. Jen integrates several different
processes to create the illusion that you simply ask her a question and she
gives back good answers. The first three steps have nothing to do with your
particular query. They are going on repeatedly and all the time, whether
anyone is posing any queries or not. In computer speak, these steps are
happening in the background:
1. Gather information. Jen explores the Web, visiting many sites on a
regular basis to learn what they contain. Jen revisits old pages
because their contents may have changed, and they may contain links
to new pages that have never been visited.
2. Keep copies. Jen retains copies of many of the web pages she visits.
Jen actually has a duplicate copy of a large part of the Web stored on
her computers.
3. Build an index. Jen constructs a huge index that shows, at a mini-
mum, which words appear on which web pages.
When you make a query, Jen goes through four more steps, in the foreground:
4. Understand the query. English has lots of ambiguities. A query like
“red sox pitchers” is fairly challenging if you haven’t grown up with
baseball!
5. Determine the relevance of each possible result to the query. Does
the web page contain information the query asks about?
6. Determine the ranking of the relevant results. Of all the relevant
answers, which are the “best”?
7. Present the results. The results need not only to be “good”; they have
to be shown to you in a form you find useful, and perhaps also in a
form that serves some of Jen’s other purposes—selling more advertis-
ing, for example.
Each of these seven steps involves technical challenges that computer scien-
tists love to solve. Jen’s financial backers hope that her engineers solve them
better than the engineers of competing search engines.
We’ll go through each step in more detail, as it is important to understand
what is going on—at every step, more than technology is involved. Each step
also presents opportunities for Jen to use her information-gathering and edi-
torial powers in ways you may not have expected—ways that shape your view
of the world through the lens of Jen’s search results.
The background processing is like the set-building and rehearsals for a
theatrical production. You couldn’t have a show without it, but none of it
happens while the audience is watching, and it doesn’t even need to happen
on any particular schedule.
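The index of Step 3 and the foreground steps can be sketched in a few lines. This is a toy illustration under our own assumptions, not a description of how any real search engine works: the three pages are invented, "relevance" here means only that a page contains every query word, and "ranking" simply counts how often the query words appear.

```python
from collections import defaultdict

# Invented pages standing in for the copies kept in Step 2.
pages = {
    "a.example": "red sox pitchers and red sox catchers",
    "b.example": "how to wash red socks",
    "c.example": "pitchers of lemonade",
}

def build_index(pages):
    """Step 3: map each word to the set of pages on which it appears."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(query, index, pages):
    """Steps 4 to 6 in miniature: split the query, filter, then rank."""
    words = query.lower().split()  # Step 4, done naively
    if not words:
        return []
    # Step 5: a page is "relevant" if it contains every query word.
    relevant = set.intersection(*(index.get(w, set()) for w in words))

    def score(url):
        page_words = pages[url].lower().split()
        return sum(page_words.count(w) for w in words)

    # Step 6: highest count first; ties are broken alphabetically by URL.
    return sorted(relevant, key=lambda u: (-score(u), u))

index = build_index(pages)
print(search("red sox pitchers", index, pages))  # ['a.example']
print(search("pitchers", index, pages))          # ['a.example', 'c.example']
print(search("red", index, pages))               # ['a.example', 'b.example']
```

Note that the pages to consider are found in the index alone, without visiting any site at query time. A page that was never gathered has no entry in the index and so can never appear in the results, however relevant it might be.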
Step 1: Gather Information
Search engines don’t index everything. The ones we think of as general util-
ities, such as Google, Yahoo!, and Ask, find information rather indiscrimi-
nately throughout the Web. Other search engines are domain-specific. For
example, Medline searches only through medical literature. Artcyclopedia
indexes 2,600 art sites. The FindLaw LawCrawler searches only legal web
sites. Right from the start, with any search engine, some things are in the
index and some are out, because some sites are visited during the gathering
step and others are not. Someone decides what is worth remembering and
what isn’t. If something is left out in Step 1, there is no possibility that you
will see it in Step 7.
Speaking to the Association of National Advertisers in October 2005, Eric
Schmidt, Google’s CEO, observed that of the 5,000 terabytes of information
in the world, only 170 terabytes had been indexed. (A terabyte is about a tril-
lion bytes.) That’s just a bit more than 3%, so 97% was not included. Another
estimate puts the amount of indexed information at only 0.02% of the size of
the databases and documents reachable via the Web. Even in the limited con-
text of the World Wide Web, Jen needs to decide what to look at, and how
frequently. These decisions implicitly define what is important and what is
not, and will limit what Jen’s users can find.
How often Jen visits web pages to index them is one of her precious trade
secrets. She probably pays daily visits to news sites such as CNN.com, so that
if you ask tonight about something that happened this morning, Jen may
point you to CNN’s story. In fact, there is most likely a master list of sites to
be visited frequently, such as whitehouse.gov—sites that change regularly
and are the object of much public interest. On the other hand, Jen probably
has learned from her repeated visits that some sites don’t change at all. For
example, the Web version of a paper published ten years ago doesn’t change.
After a few visits, Jen may decide to revisit it once a year, just in case. Other
pages may not be posted long enough to get indexed at all. If you post a
futon for sale on Craigslist.com, the ad will become accessible to potential
buyers in just a few minutes. If it sells quickly, however, Jen may never see
it. Even if the ad stays up for a while, you probably won’t be able to find it
with most search engines for several days.
Jen is clever about how often she revisits pages—but her cleverness also
codifies some judgments, some priorities—some control. The more important
Jen judges your page to be, the less time it will take for your new content to
show up as responses to queries to Jen’s search engine.
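One plausible way to encode such a revisit policy, offered only as a guess since real schedules are trade secrets, is to shorten the interval when a page has changed since the last visit and lengthen it when it has not. The halving and doubling rule and the one-day and one-year limits below are our assumptions.

```python
MIN_DAYS = 1    # assumed floor: visit at most once a day
MAX_DAYS = 365  # assumed ceiling: visit at least once a year

def next_interval(current_days, page_changed):
    """Halve the wait after a change, double it otherwise, within limits."""
    if page_changed:
        return max(MIN_DAYS, current_days // 2)
    return min(MAX_DAYS, current_days * 2)

# A news site that has changed on every visit settles at daily visits.
interval = 8
for _ in range(5):
    interval = next_interval(interval, page_changed=True)
print(interval)  # 1

# A ten-year-old paper that never changes drifts out to yearly visits.
interval = 8
for _ in range(10):
    interval = next_interval(interval, page_changed=False)
print(interval)  # 365
```

Even in this tiny rule, the starting interval and the limits are choices someone has to make, and they determine how quickly new content becomes findable.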
Jen roams the Web to gather information by following links from the
pages she visits. Software that crawls around the Web is (in typical geek
irony) called a “spider.” Because the spidering process takes days or even
weeks, Jen will not know immediately if a web page is taken down—she will
find out only when her spider next visits the place where it used to be. At
that point, she will remove it from her index, but in the meantime, she may
respond to queries with links to pages that no longer exist. Click on such a
link, and you will get a message such as “Page not found” or “Can’t find the
server.”
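A spider's basic loop can be sketched as follows. The tiny "web" here is an invented dictionary mapping each page to the pages it links to, so that the example runs without a network; a real spider would fetch pages over the Internet and extract their links. The set of pages already seen keeps the spider from circling forever between pages that link to each other, and a page that has been taken down simply turns up missing when the spider arrives. Visiting pages in breadth-first order is one choice among many.

```python
from collections import deque

# Invented link structure: page -> pages it links to.
# A and B link to each other; D has been taken down (it has no entry).
web = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": [],
}

def crawl(start, web):
    """Breadth-first crawl; returns (pages found, dead links discovered)."""
    seen = {start}
    queue = deque([start])
    found, dead = [], []
    while queue:
        page = queue.popleft()
        if page not in web:
            dead.append(page)     # the spider gets "Page not found"
            continue
        found.append(page)
        for link in web[page]:
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return found, dead

found, dead = crawl("A", web)
print(found)  # ['A', 'B', 'C']
print(dead)   # ['D']
```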
Because the Web is unstructured, there is no inherently “correct” order in
which to visit the pages, and no obvious way to know when to stop. Page A
may contain references to page B, and also page B to page A, so the spider
has to be careful not to go around in circles. Jen must organize her crawl of
HOW A SPIDER EXPLORES THE WEB
Search engines gather information by wandering through the World Wide
Web. For example, when a spider visits the main URL of the publisher of this
book, www.pearson.com, it retrieves a page of text, of which this is a fragment:
Subsidiary sites links