How Should You Classify Your Data? A Guide to Using Context-, Content-, and User-Based Data Classification Effectively

by Bill Bradley on Thursday December 20, 2018

Part 3 in our Definitive Guide to Data Classification series discusses different approaches to data classification with guidelines on choosing the right method for your organization.

Welcome to part three in our blog series on The Definitive Guide to Data Classification. If you have read the first two in the series you understand what data classification is and why you need it to drive your information security strategy. In this installment we will discuss the ways to classify and how to best choose the right method based on your business challenge.

Starting at the most basic level, there are two ways to perform data classification: automated and manual. Automated classification can scale quickly, while a manual approach gives a direct, human touch to the data. To be successful in your data classification, you should leverage both methods. Analyst firm Forrester has the following to say here:

“Dynamic data classification requires the integration of both manual processes involving employees as well as tools for automation and enforcement.”¹

Within that spectrum, these three different approaches are the industry standard for data classification:

Content-based classification
Context-based classification
User-based classification

Each method analyzes a document and assigns a classification level to it; this “tag” is what drives data protection decisions and actions. How each method arrives at that decision, however, varies.

Content-based classification inspects and interprets files looking for sensitive information. This approach answers the question “What is in the document?” and relies upon examining the information inside the file, using techniques such as regular expressions, fingerprinting, or Bayesian engines.
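To make the regular-expression technique concrete, here is a minimal sketch of a content-based check. The patterns, the label names ("PCI", "PII", "UNCLASSIFIED"), and the function names are all assumptions made for illustration; they are not taken from any particular product, and production detection rules are considerably stricter.

```python
import re

# Hypothetical patterns for illustration only.
# US Social Security number written as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return len(digits) >= 13 and total % 10 == 0


def classify_content(text: str) -> str:
    """Assign an assumed label based only on what is inside the text."""
    # A card-like number counts only if it also passes the checksum,
    # which cuts down on false positives from order or tracking numbers.
    if any(luhn_valid(m.group()) for m in CARD_PATTERN.finditer(text)):
        return "PCI"
    if SSN_PATTERN.search(text):
        return "PII"
    return "UNCLASSIFIED"
```

Calling classify_content on a document containing a well-known test card number such as 4111 1111 1111 1111 would return the assumed "PCI" label, while a 16-digit number that fails the checksum would not be flagged. Pairing a pattern with a checksum is one common way to trade a little recall for fewer false positives.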

Context-based classification looks at application, location, or creator, among other variables, as indirect indicators of sensitive information. It answers the questions: How is the data being used? Who is accessing it? Where are they moving it? When are they accessing it? If content looks inside the box, context looks at the shipping label.

Both content- and context-based classification have varying levels of automation in them to drive rapid deployment, scalability, and accuracy.

Finally, user-based classification depends on a manual, end-user selection of each document. User-based classification relies on user knowledge and discretion at creation, edit, review, or dissemination to flag sensitive documents.

Each of those three delivers value, but to be most effective they need to align with the primary business need.

Is your challenge mainly protecting PCI, PII, PHI, or GDPR-protected data? Regulated data is often structured data with a consistent pattern. Leading with content-based classification will provide the greatest ability to accurately classify PII, PHI, PCI, and GDPR data.

Contrast that with the “anything goes” that is typically the case with intellectual property (IP) data. To address this, context-based classification looks to other attributes of the document to assign a classification. For example, all documents created in AutoCAD likely contain proprietary engineering specifications.
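The AutoCAD example can be sketched as a small rule set that never opens the file and looks only at its metadata. Everything named here (the FileContext fields, the file extensions, the department names, and the labels) is a hypothetical choice for illustration, not a description of how any specific tool works.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class FileContext:
    """Metadata about a file; the content itself is never inspected."""
    path: str
    application: str
    creator_department: str


# Assumed extensions commonly associated with CAD drawings.
CAD_EXTENSIONS = {".dwg", ".dxf"}
# Assumed departments whose documents default to a stricter label.
SENSITIVE_DEPARTMENTS = {"legal", "finance"}


def classify_context(ctx: FileContext) -> str:
    """Assign an assumed label from application, file type, and creator."""
    suffix = PurePosixPath(ctx.path).suffix.lower()
    # Rule 1: anything produced by the CAD tool is treated as IP.
    if ctx.application.lower() == "autocad" or suffix in CAD_EXTENSIONS:
        return "CONFIDENTIAL-IP"
    # Rule 2: the creator's department is an indirect sensitivity signal.
    if ctx.creator_department.lower() in SENSITIVE_DEPARTMENTS:
        return "CONFIDENTIAL"
    return "INTERNAL"
```

Because the rules are ordered, a drawing created by someone in finance would still receive the IP label first. In practice the order of such rules is a policy decision that should follow the business need.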

When it comes to including the end user in your overall security program, user-based classification is ideal. Data owners should know their data best. A user-based classification approach allows them to apply this knowledge to improve classification accuracy.

Data classification drives valuable insights about your organization, but to realize them with accuracy you need to choose the right method. Content-, context-, and user-based approaches can each be right or wrong depending on the business need and data type. Automation helps with enterprise scalability, while manual approaches apply the human understanding of data that cannot easily be achieved any other way.

Many enterprises face each of the challenges above, and a mixed classification approach often delivers the most accuracy and visibility. Finally, it is important that any data protection solution you use can see and interpret each of these tags, understand what to do when there is a conflict between them, and apply protective measures based on classification levels. For more information about how data classification can improve your data security program, read our Definitive Guide to Data Classification eBook here.