Big Data Means Big Risks

by Paul Roberts on Friday August 6, 2021

The research firm Binaryedge says it found more than a petabyte of data, stored in high-performance databases like MongoDB, exposed to the public Internet.

We hear a lot about the incredible value of data analysis to modern businesses. Simply put: most every business these days is data-driven, either directly or indirectly. If you’re Uber, you’re using “big data” analytics to optimize the work of your drivers and your platform. If you’re Target stores, you’re using it to anticipate shoppers’ desires and preferences based on their past behavior and that of like-minded shoppers.

Behind every big data deployment is a range of supporting technologies like databases and memory caching systems that are used to store and analyze massive data sets at lightning speeds. Many of the most popular platforms, such as MongoDB and Memcached, are also offered as open source projects – greatly reducing the cost of deploying them. That, in turn, has fueled rapid adoption of these technologies by a wide range of established companies and start-ups.

But, as we know, with great power comes great responsibility. And a new report from the security research firm Binaryedge suggests that many of the organizations that are using these powerful data storage and analysis tools are not taking adequate steps to secure them. The result is that more than a petabyte (a thousand terabytes) of stored data is accessible to anyone online with the knowledge of where and how to look for it.

In a blog post on Thursday, the firm reported the results of research that found close to 200,000 such systems that were publicly addressable. Exposed systems were found on the networks of organizations ranging from small start-ups to Fortune 500 firms. Many were running vulnerable, out-of-date software and lacked even basic security protections such as user authentication, the company said.

In a scan of the public Internet, Binaryedge said it found 39,000 MongoDB servers that were publicly addressable and that “didn’t have any type of authentication,” and another 7,000 that were addressable but did require authentication. In all, the exposed MongoDB systems contained more than 600 terabytes of data, stored in databases with names like “local,” “admin,” and “db.”

A scan for deployments of the open source Redis key-value cache and store technology uncovered 35,000 publicly addressable instances that could be accessed without any authentication. Those systems contained about 13 terabytes of data stored in memory.

Scans for Memcached, a popular open source memory caching technology that is used to accelerate database-driven web applications, revealed 118,000 exposed instances online containing around 11 terabytes of data, Binaryedge reported.

A scan for instances of ElasticSearch, a commonly used search engine based on Lucene, found around 9,000 exposed instances containing 531 terabytes of data.
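As a rough check on the headline number, the per-technology figures quoted above can be added up. The sketch below uses the rounded values as they appear in this article (the MongoDB volume is given only as "more than 600 terabytes," so the total is a lower bound); the dictionary and function names are illustrative and do not come from the Binaryedge report.

```python
# Rounded figures as quoted in this article: (exposed instances, terabytes).
# The MongoDB count covers only the servers with no authentication.
REPORTED = {
    "MongoDB": (39_000, 600),
    "Redis": (35_000, 13),
    "Memcached": (118_000, 11),
    "ElasticSearch": (9_000, 531),
}


def totals(figures):
    """Return (total_instances, total_terabytes) across all technologies."""
    instances = sum(count for count, _ in figures.values())
    terabytes = sum(tb for _, tb in figures.values())
    return instances, terabytes


if __name__ == "__main__":
    instances, terabytes = totals(REPORTED)
    print(f"{instances:,} instances holding at least {terabytes:,} TB")
```

With these rounded inputs the totals come to about 201,000 instances and 1,155 terabytes, which is consistent with the "more than a petabyte" claim.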

Because Binaryedge didn’t interrogate the systems it found, we don’t know what kind of data is stored on these systems or how useful it might be to malicious actors. But given that there’s more than a petabyte of data out there, it is reasonable to assume that some of it is sensitive in nature. Because technologies like Memcached are used as cache servers, the data they contain is constantly changing. That means an attacker who accessed them could benefit from a continuous stream of new information, such as authentication session data, Binaryedge noted.

Binaryedge said the problem with these systems appears to be that they are meant to be deployed in secure environments and accessed only from secure clients. Documentation included with many of the technologies makes this clear. However, many of the companies deploying them don’t recognize that limitation.
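For teams that want to see whether their own deployments are reachable from outside a trusted network, a minimal sketch follows. It is not Binaryedge's scanning method. It assumes the services listen on their well-known default ports (27017 for MongoDB, 6379 for Redis, 11211 for Memcached, 9200 for ElasticSearch), and the function names are hypothetical. It should be run only against hosts you own or are authorized to test.

```python
import socket

# Well-known default ports for the four technologies named in the report.
# A deployment may use different ports; adjust as needed.
DEFAULT_PORTS = {
    "MongoDB": 27017,
    "Redis": 6379,
    "Memcached": 11211,
    "ElasticSearch": 9200,
}


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def exposed_services(host, ports=DEFAULT_PORTS, timeout=2.0):
    """List the service names whose port accepts TCP connections on host."""
    return [name for name, port in ports.items()
            if is_port_open(host, port, timeout)]


if __name__ == "__main__":
    # Placeholder address: replace with a host you are authorized to test.
    print(exposed_services("127.0.0.1"))
```

An open port shows only that the service is addressable, not whether it requires authentication. Running the check from a machine outside the internal network gives the most useful picture of what the public Internet can see.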

“Companies are still figuring out how to use these technologies and by default they are not secure,” the company wrote.

The post should be a wake-up call for users of the technologies profiled by Binaryedge, and also for users of similar open source tools and technologies that power public-facing web applications and big data deployments. As Binaryedge noted, it tested for only four technologies. That doesn’t mean that other vulnerable platforms aren’t also prevalent online.

The larger message to companies is that they need to be more aware of where their data is moving and residing. Now that data analytics and big data deployments have become an area of intense investment, efforts to gain more and better visibility into mission-sensitive data are moving at a breakneck pace. Unfortunately, it appears that security and data privacy concerns may be taking a back seat amid that rush.

Paul F. Roberts is the Editor in Chief of The Security Ledger and Founder of The Security of Things Forum.

Tags:  Data Security
