Six degrees of separation is the theory that any two people in the world are six or fewer introductions apart, so that a chain of "friend of a friend" links can connect them in at most six steps.
The same logic applies to data: a small data set, followed far enough, can yield outsized results. In much the same way, big data can destroy anonymity starting from only a handful of details about a person.
In a study titled “Unique in the Shopping Mall: On the Reidentifiability of Credit Card Metadata,” a group of data scientists analyzed credit card transactions made by 1.1 million people in 10,000 stores over a three-month period. The data set contained details including the date of each transaction, the amount charged and the name of the store.
Although the information had been “anonymized” by removing personal details like names and account numbers, the uniqueness of people’s behavior made it easy to single them out.
In fact, knowing just four random pieces of information was enough to reidentify 90 percent of the shoppers as unique individuals and to uncover their records, researchers calculated. And that uniqueness of behavior — or “unicity,” as the researchers termed it — combined with publicly available information, like Instagram or Twitter posts, could make it possible to reidentify people’s records by name.
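To make the idea of unicity concrete, here is a minimal sketch in Python of how such a figure could be estimated. It is not the researchers' code: the function name `unicity`, the toy `traces` data and the choice to describe each purchase as a (day, store) pair are all assumptions made for illustration. The sketch picks a shopper at random, draws k purchases from that shopper's history, and checks whether any other shopper's history also contains all of them.

```python
import random
from collections import defaultdict


def unicity(traces, k, trials=1000, seed=0):
    """Estimate the share of shoppers singled out by k of their own purchases.

    traces: dict mapping a (hypothetical) shopper id to a set of
            (day, store) tuples. Real data would carry more detail.
    """
    rng = random.Random(seed)

    # Inverted index: purchase -> every shopper whose history contains it.
    index = defaultdict(set)
    for shopper, purchases in traces.items():
        for purchase in purchases:
            index[purchase].add(shopper)

    # Only shoppers with at least k purchases can be probed with k points.
    eligible = [s for s, purchases in traces.items() if len(purchases) >= k]
    if not eligible:
        return 0.0

    unique = 0
    for _ in range(trials):
        shopper = rng.choice(eligible)
        # The "outside knowledge": k purchases drawn from this shopper's history.
        known = rng.sample(sorted(traces[shopper]), k)
        # Shoppers whose history is consistent with every known purchase.
        candidates = set.intersection(*(index[p] for p in known))
        if len(candidates) == 1:  # only the shopper we started from matches
            unique += 1
    return unique / trials


# Toy data, invented for illustration only.
traces = {
    "alice": {(1, "bakery"), (2, "cafe"), (3, "gym")},
    "bob": {(1, "bakery"), (2, "cafe"), (4, "cinema")},
    "carol": {(1, "bakery"), (5, "bookshop"), (6, "gym")},
}

for k in (1, 2, 3):
    print(k, unicity(traces, k))
```

On this toy data, three known purchases always single out one shopper, while a single purchase often does not, since all three shoppers visited the bakery on day 1. The estimate depends on how coarsely each purchase is described, so the numbers from a sketch like this say nothing about the study's own figures.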
The result, coming on top of earlier demonstrations that personal identities are easy to pry from anonymized data sets, indicates that such troves need new safeguards. “In light of the results, data custodians should carefully limit access to data,” says Arvind Narayanan, a computer scientist at Princeton University who was not involved with the study. Or as the study's lead author, Yves-Alexandre de Montjoye, an applied mathematician at the Massachusetts Institute of Technology (MIT) in Cambridge, puts it: When it comes to sensitive personal information, “the open sharing of raw data sets is not the future.”