Behold Big Data | Transparency and the Hacker Ethic
July 01, 2013
In recent weeks, the world has learned of multiple domestic spying programs that are being conducted by the National Security Agency. The revelations are the work of the elusive Edward Snowden, a whistle-blower formerly employed by the NSA contractor Booz Allen Hamilton.
Mr. Snowden, who spoke with journalists from the Guardian and the Washington Post, has raised the ire of the intelligence community, congressional leaders, and the Obama administration. However, many legal experts – including Harvard’s Laurence Tribe, who once taught Obama as a law student – have expressed concerns over the government’s lack of transparency regarding the surveillance programs. Still others, like Senator Rand Paul (R-KY), have questioned the constitutionality of the laws that are being used to justify the government’s actions.
Nevertheless, in the weeks since the story broke, an often capricious media has focused its gaze on the personality of Mr. Snowden. In an attempt to dismiss the importance of his revelations, the media often portray Mr. Snowden either as a high school dropout who is a “grandiose narcissist,” or as a traitorous low-level functionary who is seeking to profit from his betrayal.
Describing Mr. Snowden to a congressional hearing, NSA Director General Keith Alexander said, “It’s clearly an individual who’s betrayed the trust and confidence we had in him. This is an individual who is not acting, in my opinion, with noble intent.” Congressman Mike Pompeo (R-KS) went further when he wrote that, “Edward Snowden is a modern-day Benedict Arnold.”
On the contrary, by going public Mr. Snowden is acting in the best tradition of the hacker ethic, which celebrates open dialogue, transparency, and decentralization. Unfortunately, many media analysts ignore this important fact in their reporting and prefer instead to investigate Mr. Snowden’s former girlfriend. This facile approach to the story both misses the point of Mr. Snowden’s actions and fails to recognize the profound paradigm shift that his actions have exposed – a digital revolution that is the inevitable outcome of ever more powerful processing and the mercurial rise of Big Data.
Unlike many of his detractors, Mr. Snowden appears to be serious, informed, and earnest. In a recent interview with the journalists Glenn Greenwald, Laura Poitras, and Barton Gellman in Hong Kong, Mr. Snowden explained his motivation: “When you’re in positions of privileged access… you see things that may be disturbing… and you recognize that some of these things are actually abuses… over time that awareness of wrongdoing sort of builds up and you feel compelled to talk about it [to superiors]… and the more you talk about it, the more you’re ignored… until eventually you realize that these things need to be determined by the public not by somebody who was simply hired by the government.”
Mr. Snowden possessed this privileged access because, as an information security professional, he has a unique and valuable skill set. And it is the recognition of his background – as a systems engineer; a systems administrator; a senior adviser for the CIA; a solutions consultant; and a telecommunications systems information officer – that should alert the uninitiated about the grave dangers that these programs may pose. Based on his public pronouncements, professionals in the infosec and hacking communities understand that he is more than just another IT guy with a traitor’s temperament. Mr. Snowden should not be compared to Benedict Arnold; he should be compared to Paul Revere.
Thanks to the alarm raised by Mr. Snowden, the Guardian and Post revealed that the NSA and FBI have obtained “direct access” to the systems of Microsoft, Yahoo, Google, Facebook, AOL, Apple and other US internet giants through a program called Prism. Because these internet companies have denied cooperating with the NSA in the collection of this data, the revelation raises the question: has the NSA hacked into the systems of the most data-rich corporations in the world?
We also now know that the NSA collects metadata from all of Verizon’s customers, and that it collects metadata for internet address packets and device signatures through a program called Blarney. In describing the program called Boundless Informant, the Guardian states that “the top-secret Boundless Informant tool details and maps by country the voluminous amount of information it collects from computer and telephone networks.”
While some of the programs initiated under the Bush administration ended as recently as 2011, documents reported on by the Guardian indicate that the collection of Americans’ data continues today; and with each passing day, new revelations emerge. The history of digital data collection by the NSA and other governmental agencies is long and comprehensive. Although the programs in these revelations have only recently garnered media scrutiny, the government has spent decades developing its digital data mining tools.
Data analysis is the heart of signals intelligence and an important tool in the arsenal of national defense. From the Cold War’s Echelon program of the 1960s through post-9/11 programs like Stellar Wind, Talon, Advise, the Total Information Awareness Office, and the more recent Intelligence Community Comprehensive National Cybersecurity Initiative Data Center, the US government and its allies have made use of data mining tools. Moreover, the US government coordinates with allies in the collection of data.
A recent article in the Guardian details a program called Tempora, operated by Britain’s spy agency, GCHQ. Tempora’s “key innovation has been GCHQ’s ability to tap into and store huge volumes of data drawn from fibre-optic cables for up to 30 days so that it can be sifted and analyzed.” As a consequence of these and other programs, the NSA, GCHQ, and their partners process and analyze vast amounts of data. Much of this data targets bad actors, but it also includes copious amounts of data on people who are completely innocent of any wrongdoing. The question isn’t whether these capabilities should or will be developed; their development is already a fait accompli. The question is how they will be used.
In the aftermath of Mr. Snowden’s disclosures, President Barack Obama made a televised statement defending the programs by saying, in effect: “trust us.” The President went on to say that, “The people who are involved in America’s national security, they take this work very seriously. They cherish our constitution. The last thing they’d be doing is taking programs like this to listen to somebody’s phone calls.” This may be true. However, by reducing these programs to listening in on “somebody’s phone calls,” President Obama dismisses the complex, expansive, and invasive nature of what is most likely the largest aggregation of data in history.
The Guardian journalist Glenn Greenwald explains, “There is a massive apparatus within the United States government that, with complete secrecy, has been building this enormous structure that has only one goal; and that is to destroy privacy and anonymity, not just in the United States but in the world. That is not hyperbole. That is their objective. To make it so that every single form of human communication, human interaction, human behavior can never be beyond their reach. And they have developed extraordinarily sophisticated technologies and enormously expensive mechanisms in order to make that happen. And it’s well past time that we have a debate about whether that’s the kind of country and world in which we want to live, but we haven’t had that debate because it’s all done in secrecy.”
It is precisely because of the exponential growth in the capacity and power of these systems that the government is able, with very little effort, to conduct these programs in secret. Very few people outside the cloistered walls of either Silicon Valley or the intelligence community understand the emergence and consequences of the Big Data revolution. Eric Schmidt of Google recently stated that every two days we create five exabytes of data (one exabyte = one quintillion bytes, or 10^18). That, according to Schmidt, is “equal to all the information created from the dawn of man through 2003.” By 2012 the amount of global data had grown to 2.7 zettabytes (one zettabyte = one sextillion bytes, or 10^21), which is roughly 500 times the amount of all data generated by 2003; and it will be three times larger still by 2015. The challenge of Big Data is that the flood of such large amounts of unstructured data does not fit neatly into existing organizational schemas.
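The scale of these figures is easier to grasp with a quick back-of-the-envelope check of the numbers cited above (a rough illustration; the 2.7-zettabyte figure is itself an estimate):

```python
# Unit definitions, in bytes.
EB = 10**18   # one exabyte  = one quintillion bytes
ZB = 10**21   # one zettabyte = one sextillion bytes

data_through_2003 = 5 * EB    # Schmidt's estimate: everything "through 2003"
data_by_2012 = 2.7 * ZB       # estimated global data in 2012

# How many times larger the 2012 total is than the pre-2003 total.
ratio = data_by_2012 / data_through_2003
print(f"2012 data is roughly {ratio:.0f}x the pre-2003 total")  # ~540x, i.e. "500 times"
```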
Big Data refers to complex data sets characterized by enormous volumes and expansive variety, all of which are generated at a higher velocity than ever before. With data sources growing by orders of magnitude, the organization, analysis, and security of the varied data streams must be addressed. The private and governmental sectors have both created systems that answer these challenges.
MapReduce-based computing, Google’s Dremel interactive query system, and Apache’s Hadoop are among the available resources for addressing the Big Data challenge. But they impose heavy demands on the infrastructure needed to support distributed processing. These systems must extract data from unstructured sources and conduct statistical, relational, geospatial, and temporal analysis across infrastructure built for large-scale, data-intensive jobs that spread queries over clusters of server nodes.
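The MapReduce pattern that underlies Hadoop can be sketched in a few lines of plain Python. This is a conceptual illustration, not Hadoop’s actual API: each record is mapped to intermediate key-value pairs, the framework groups the pairs by key (the “shuffle,” which a real cluster performs across server nodes), and a reduce step aggregates each group.

```python
from collections import defaultdict

def map_phase(record):
    # Map: emit (word, 1) for each word in an unstructured text record.
    return [(word.lower(), 1) for word in record.split()]

def shuffle(pairs):
    # Shuffle: group intermediate values by key; in a real Hadoop
    # cluster the framework does this across the network.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: aggregate the grouped values for each key.
    return key, sum(values)

records = ["signals intelligence data", "big data analysis", "data mining"]
intermediate = [pair for record in records for pair in map_phase(record)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts["data"])  # "data" appears once in each of the three records: 3
```

The same three-phase shape scales from this toy word count to petabyte-sized jobs, because each phase can run in parallel on independent slices of the data.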
Additionally, as the universe of the data clusters expand, so too do the potential attack vectors. Consequently, agencies tasked with security must protect against malicious actors while embracing the hacker ethic of transparency. The complexity of unstructured data requires a complex yet powerful response.
The centerpiece of the NSA’s data-processing capability is Accumulo. As reported in Information Week, Accumulo is based on Google’s BigTable data model, but the NSA added a cell-level security feature that makes it possible to set access controls on individual bits of data. This capability allows intelligence analysts to access information that would otherwise be withheld until data sets could be scrubbed of personally identifiable information. The NSA has shared Accumulo with the Apache Foundation, and the technology has since been commercialized by Sqrrl, a startup launched by six former NSA employees together with former White House cybersecurity strategy director (and now Sqrrl CEO) Ely Kahn.
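The idea behind cell-level security can be modeled simply: every cell carries a visibility marking, and a scan returns only the cells whose markings the reader is authorized to see. The sketch below is a hypothetical Python model of that concept, not the real Accumulo API (Accumulo’s actual visibility expressions also support AND/OR combinations; here each cell just requires a set of labels):

```python
def scan(table, authorizations):
    """Yield only the cells whose required labels are all held by the reader."""
    for row, column, required_labels, value in table:
        if required_labels <= authorizations:  # subset check: reader holds every label
            yield row, column, value

# A toy table with per-cell visibility markings (labels are illustrative).
table = [
    ("rec-001", "country", set(),                  "US"),        # unrestricted cell
    ("rec-001", "carrier", {"SIGINT"},             "VZ"),
    ("rec-001", "content", {"SIGINT", "RESTRICT"}, "[payload]"),
]

# An analyst holding only the SIGINT authorization sees two of three cells;
# the doubly-marked content cell is filtered out at read time.
visible = list(scan(table, {"SIGINT"}))
print([column for _, column, _ in visible])  # ['country', 'carrier']
```

The point of pushing the check down to individual cells, rather than whole tables or rows, is that mixed-sensitivity data can live in one store and be filtered per reader at query time.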
According to Information Week, “Sqrrl has supplemented the Accumulo technology with analytical tools including SQL interfaces, statistical analytics interfaces, text search and graph search engines, and there’s little doubt the NSA has done the same, according to Kahn. Graph search, in particular, is a powerful tool for investigation, as the NSA itself revealed last month when it shared at a Carnegie Mellon technical conference an in-depth presentation on the 4.4-trillion-node graph database it’s running on top of Accumulo.” The nexus between the governmental and commercial development of these technologies will only grow stronger with each new breakthrough. And the line between what is public and what is private will become increasingly blurred. The question is: how will these developments impact the basic fabric of the American polity?
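A toy example conveys what search over such a graph database involves: records (here, hypothetical phone numbers) become nodes, observed contacts become edges, and analysis reduces to traversals such as finding how many hops separate two parties. This is a minimal in-memory sketch; a system like the one described above distributes the same kind of traversal across trillions of nodes:

```python
from collections import deque

# Hypothetical contact graph: nodes are phone numbers, edges are observed calls.
calls = {
    "555-0101": ["555-0102", "555-0103"],
    "555-0102": ["555-0104"],
    "555-0103": ["555-0104"],
    "555-0104": ["555-0105"],
    "555-0105": [],
}

def hops_between(graph, start, target):
    """Breadth-first search: minimum number of call links from start to target."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, hops = queue.popleft()
        if node == target:
            return hops
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return -1  # no connecting path found

print(hops_between(calls, "555-0101", "555-0105"))  # 3 hops
```

It is exactly this kind of “who is connected to whom, and how closely” query that makes graph search so powerful for investigation, and so consequential when the graph contains everyone.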
A better understanding of these technologies by the public is essential for a meaningful dialogue to take place. One hopes that the recent revelations by Mr. Snowden will act as a catalyst for an expanded effort to educate the public about Big Data’s potential costs and anticipated benefits. Furthermore, the government must recognize and embrace the hacker ethic of open dialogue, transparency, and decentralization. And as it pursues powerful new security measures, it must remain mindful of the privacy concerns of its citizens.
An open society requires freedom of thought and expression; a free exchange of ideas is its hallmark. Freedom of thought sometimes gives rise to the expression of ideas that challenge the received wisdom and norms of society. Great ideas do not grow in a vacuum; they are the result of deep and profound probing into the most essential questions that define every epoch. And the right to inquire, to test the unpopular idea, to speak truth to power can never endure if the right to privacy is abridged; it has an inevitable chilling effect on the free exercise of the First Amendment. This is the alarm sounded by Mr. Snowden. This will be his legacy.
by: D.L. Christopher | firstname.lastname@example.org