Google does it. Amazon does it. Walmart does it. And, as news reports last week made clear, the United States government does it.
Does what? Uses “big data” analysis of the swelling flood of data that is being generated and stored about virtually every aspect of our lives to identify patterns of behavior and make correlations and predictive assessments.
Amazon uses customer data to give us recommendations based on our previous purchases. Google uses our search data and other information it collects to sell ads and to fuel a host of other services and products.
The National Security Agency, a news article in The Guardian revealed last week, is collecting the phone records of millions of American customers of Verizon, “indiscriminately and in bulk” and “regardless of whether they are suspected of any wrongdoing,” under a secret court order. Under another surveillance program called Prism, The Guardian and The Washington Post reported, the agency has been collecting data from e-mails, audio and video chats, photos, documents and logins, from leading Internet companies like Microsoft, Yahoo, Google, Facebook and Apple, to track foreign targets.
Why spread such a huge net in search of a handful of terrorist suspects? Why vacuum up data so indiscriminately? “If you’re looking for a needle in the haystack, you need a haystack,” Jeremy Bash, chief of staff to Leon E. Panetta, the former director of the Central Intelligence Agency and defense secretary, said on Friday.
In “Big Data,” their illuminating and very timely book, Viktor Mayer-Schönberger, a professor of Internet governance and regulation at the Oxford Internet Institute at Oxford University, and Kenneth Cukier, the data editor for The Economist, argue that the nature of surveillance has changed.
“In the spirit of Google or Facebook,” they write, “the new thinking is that people are the sum of their social relationships, online interactions and connections with content. In order to fully investigate an individual, analysts need to look at the widest possible penumbra of data that surrounds the person — not just whom they know, but whom those people know too, and so on.”
Mr. Cukier and Mr. Mayer-Schönberger argue that big data analytics is revolutionizing the way we see and process the world; they even compare its consequences to those of the Gutenberg printing press. And in this volume they give readers a fascinating, and sometimes alarming, survey of big data’s growing effect on just about everything: business, government, science and medicine, privacy and even the way we think. Notions of causality, they say, will increasingly give way to correlation as we try to make sense of patterns.
Data is growing incredibly fast — by one account, it is more than doubling every two years — and the authors of this book argue that as storage costs plummet and algorithms improve, data-crunching techniques, once available only to spy agencies, research labs and gigantic companies, are becoming increasingly democratized.
Big data has given birth to an array of new companies and has helped existing companies boost customer service and find new synergies. Before a hurricane, Walmart learned, sales of Pop-Tarts increased, along with sales of flashlights, and so stores began stocking boxes of Pop-Tarts next to the hurricane supplies “to make life easier for customers” while boosting sales. UPS, the authors report, has fitted its trucks with sensors and GPS so that it can monitor employees, optimize route itineraries and know when to perform preventive vehicle maintenance.
Baseball teams like Billy Beane’s Oakland A’s (immortalized in Michael Lewis’s best-seller “Moneyball”) have embraced new number-crunching approaches to scouting players with remarkable success. The 2012 Obama campaign used sophisticated data analysis to build a formidable political machine for identifying supporters and getting out the vote. And New York City has used data analytics to find new efficiencies in everything from disaster response, to identifying stores selling bootleg cigarettes, to steering overburdened housing inspectors directly to buildings most in need of their attention. In the years to come, Mr. Mayer-Schönberger and Mr. Cukier contend, big data will increasingly become “part of the solution to pressing global problems like addressing climate change, eradicating disease and fostering good governance and economic development.”
There is, of course, a dark side to big data, and the authors provide an astute analysis of the dangers they foresee. Privacy has become much more difficult to protect, especially with old strategies — “individual notice and consent, opting out and anonymization” — losing effectiveness or becoming completely beside the point.
“The ability to capture personal data is often built deep into the tools we use every day, from Web sites to smartphone apps,” the authors write. And given the myriad ways data can be reused, repurposed and sold to other companies, it’s often impossible for users to give informed consent to “innovative secondary uses” that haven’t even been imagined when the data was first collected.
The second danger Mr. Cukier and Mr. Mayer-Schönberger worry about sounds like a scenario from the sci-fi movie “Minority Report,” in which predictions seem so accurate that people can be arrested for crimes before they are committed. In the real near future, the authors suggest, big data analysis (instead of the clairvoyant Pre-Cogs in that movie) may bring about a situation “in which judgments of culpability are based on individualized predictions of future behavior.”
Already, insurance companies and parole boards use predictive analytics to help tabulate risk, and a growing number of places in the United States, the authors of “Big Data” say, employ “predictive policing,” crunching data “to select what streets, groups and individuals to subject to extra scrutiny, simply because an algorithm pointed to them as more likely to commit crime.”
Last week an NBC report noted that in so-called signature drone strikes “the C.I.A. doesn’t necessarily know who it is killing”: in signature strikes “intelligence officers and drone operators kill suspects based on their patterns of behavior — but without positive identification.”
One problem with relying on predictions based on probabilities of behavior, Mr. Mayer-Schönberger and Mr. Cukier argue, is that it can negate “the very idea of the presumption of innocence.”
“If we hold people responsible for predicted future acts, ones they may never commit,” they write, “we also deny that humans have a capacity for moral choice.”
At the same time, they observe, big data exacerbates “a very old problem: relying on the numbers when they are far more fallible than we think.” They point to escalation of the Vietnam War under Robert S. McNamara (who served as secretary of defense to Presidents John F. Kennedy and Lyndon B. Johnson) as a case study in “data analysis gone awry”: a fierce advocate of statistical analysis, McNamara relied on metrics like the body count to measure the progress of the war, even though it became clear that Vietnam was more a war of wills than of territory or numbers.
More recent failures of data analysis include the Wall Street crash of 2008, which was accelerated by hugely complicated trading schemes based upon mathematical algorithms. In his best-selling 2012 book, “The Signal and the Noise,” the statistician Nate Silver, who writes the FiveThirtyEight blog for The New York Times, pointed to failures in areas like earthquake science, finance and biomedical research, arguing that “prediction in the era of Big Data” has not been “going very well” (despite his own successful forecasts in the fields of politics and baseball).
Also, as the computer scientist and musician Jaron Lanier points out in his brilliant new book, “Who Owns the Future?,” there is a huge difference between “scientific big data, like data about galaxy formation, weather or flu outbreaks,” which with lots of hard work can be gathered and mined, and “big data about people,” which, like all things human, remains protean, contradictory and often unreliable.
To their credit, Mr. Cukier and Mr. Mayer-Schönberger recognize the limitations of numbers. Though their book leaves the reader with a keen appreciation of the tools that big data can provide in helping us “quantify and understand the world,” it also warns us about falling prey to the “dictatorship of data.”
“We must guard against overreliance on data,” they write, “rather than repeat the error of Icarus, who adored his technical power of flight but used it improperly and tumbled into the sea.”