Metadata is data about data. For a phone call, that means the numbers involved, the time and the duration rather than the conversation itself. Taken in bulk, metadata reflects macro trends that track the evolution of a massive data set. Not surprisingly, in this age of Big Data and data mining, there is a lot of interest in plowing through metadata to see what it might reveal.
The biggest news in Big Data in 2013, no question, has been the revelation of massive data mining operations by the U.S. government and its contractors. The data has been culled from customer records kept by companies like Verizon, the largest phone carrier in the United States. It has been known for years that the National Security Agency (NSA) tracks phone records, emails and other individualized forms of data that are helpful in hunting terrorists. Less noticed in the most recent revelations is that the government also appears to be mining metadata.
This, however, raises an important question. What exactly does the government expect to gain from the metadata it is assembling? Obviously, tying a couple of bank transfers from Russia to a Syrian extremist in Florida who just bought a ticket to Disneyland is an effective way to spot a terrorist plot in the making. But what does the government gain from looking at macro trends in pools of Big Data? Unsurprisingly, the NSA is not very forthcoming about what it intends to spot in the metadata it gathers.
Privacy advocates have equated the efforts to a police search in which the government goes door-to-door without even knowing whom it is searching for. These intrusions are conducted without any probable cause and without warrants to acquire the information. In American law, a warrant always requires specifics about the individuals involved, the materials the government hopes to discover and the government's reasoning for the intrusion. Most privacy advocates in the United States believe that the metadata mining program amounts to the government conducting warrantless searches of the communications of all American citizens.
Security specialists are quick to point out that any intrusion into a specific individual conversation or correspondence is still governed by wiretap laws going as far back as the 1890s. The NSA can turn to FISA courts, special secret courts that handle sensitive national security cases, to obtain warrants where a criminal or national security threat exists. On that basis, these specialists argue, there is no evidence that any of the data mining activities have in fact violated the law.
Patterns in the data mined show that most of the focus was on countries that are seen as major threats to American national security. The heaviest concentration of data gathering was on Iran and its nuclear program. A close second was Pakistan, the country where Osama bin Laden was found hiding in 2011, and where a number of al-Qaeda operatives are still believed to be at large. Intriguingly, the third most studied country was Jordan, one of America’s closest allies in the Middle East. Other hotspots included Egypt, China and India. Another surprising country under heavy surveillance by the U.S. was Germany, also a long-time ally. The NSA has said the data is anonymous and that it is not possible to identify any particular individual from it.
What does the NSA hope to gain from the metadata? One can guess based on similar projects like Google’s flu tracking system. Perhaps the NSA is trying to track events like the 2011 Arab Spring the way epidemiologists track a disease outbreak. The NSA remains tight-lipped about the whole affair. Anyone who knows clearly isn’t telling.
About the Author: David Angeles currently blogs for an IT consulting company in New Jersey. He usually blogs about IT services and recent technology news.