LocalBlox, a data firm that bills itself as "a powerful, scalable and distributed data acquisition platform," is the latest company to mistakenly leave data out in the open on a publicly accessible Amazon Web Services (AWS) S3 bucket.
The company, based in Bellevue, Wash., left a slew of data online: 48 million records containing information on tens of millions of individuals, including names, addresses, and dates of birth. The dataset also included data apparently scraped from Twitter handles, along with LinkedIn and Facebook profiles. Data from Zillow, a popular real estate site, had also been scraped and composited into the dataset.
The company was notified of the unsecured bucket by researchers with UpGuard, a Mountain View firm that has shown a knack lately for uncovering datasets like this. The firm notified LocalBlox on February 28 and the bucket was secured later that day, UpGuard said Wednesday.
The bucket contained a single 151.3 GB compressed file that decompressed to a 1.2 terabyte Newline Delimited JSON file. According to researchers, who combed through the dataset when they first came across it in a subdomain, “lbdumps,” on February 8, each record is in JSON format.
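For readers unfamiliar with the format, newline-delimited JSON stores one complete JSON object per line, which lets a very large file be processed one record at a time rather than loaded whole. The following is a minimal Python sketch of that idea, not a reconstruction of anything UpGuard or LocalBlox ran. It assumes gzip compression (the report says only that the file was compressed), and the function name `iter_ndjson` is hypothetical.

```python
import gzip
import json
from typing import Iterator


def iter_ndjson(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a compressed NDJSON file.

    Assumption: the file is gzip-compressed; the actual compression
    format of the exposed file is not stated in the report.
    """
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines between records
                yield json.loads(line)
```

Because the function is a generator, memory use stays roughly constant regardless of file size, which matters for a file that expands to more than a terabyte.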
Also of interest are source fields that appear to be scraped from other data sources.
"Some are fairly unambiguous, pointing to aggregated content, purchased marketing databases, or even information caches sold by payday loan operators to businesses seeking marketing data," researchers with the firm wrote.
As with many caches of data left unsecured in an S3 bucket, anyone could have accessed and downloaded the files without a password.
News of the leak comes almost a month to the day after it was revealed that upstart voter-profiling company Cambridge Analytica had mined the private data of 87 million Facebook users, most of them from the U.S., without their permission. The data firm gathered the information via a personality quiz app, designed by researcher Aleksandr Kogan, which leveraged Facebook’s Graph API.
Judging by the names of some fields in the dataset, UpGuard suggests LocalBlox may have scraped data on users from Facebook's HTML rather than through the API.
Regardless, it demonstrates how easy it can be, or at least was, for a data analytics firm to harvest data from Facebook. In the wake of the Cambridge Analytica scandal, Facebook announced plans earlier this month to restrict data access on the platform. The service has since curbed how its Pages API, Groups API, and Events API, to name a few, function.