Data Classification – Building and Pitching a Rock Solid Program
In our final installment, we discuss how to roll all the concepts previously covered into a plan of action. What separates a successful data classification program from a failed one is action. I have reviewed more than 10 programs in my professional career, and lack of action was the key failure in every one. Ideas are easy to understand and easy to communicate, but hard to execute. Therefore, you need a plan. The purpose of this blog post is to give you enough background and understanding, based on experience, to develop one. Let me be clear: this is not a plan. It is a framework, something to develop into a plan and pitch intelligently to senior management. The following will give you the tools to persuade the necessary business leaders to implement a successful Data Classification program.
Corporate Obstructions
First, we must address the elephant in the room. Data Classification, at best, highlights the omissions in most corporate data management strategies. Data handling, or information governance, is a 21st-century concept. Previously, the strategy was to either keep everything or delete everything, and the vast majority of companies kept everything forever. The question of why corporations keep so much data came to a head in the 2014 Sony Pictures breach. The amount and variety of data exposed (movies, emails, financial records) was staggering. The breach made every corporate board pause and reflect on its own data exposure.
The interlude didn’t last long. Organizations wanted to address the issue holistically, but they didn’t. The reason is simple: it is INSANELY expensive to address this problem if an organization wasn’t designed that way from the beginning. Once teams realize they cannot fix the problem completely, they move to a selective, ad hoc process that focuses on the corporation’s most sensitive data.
Are There Data Types or Groups You Should Skip?
The short answer is yes. Corporate legal departments usually have a document management system (e.g., eDOCS, NetDocs, Documentum) to organize and manage their documents. These legal teams must follow data retention and classification standards and requirements set by the courts. Compliance teams are another group historically excluded from data classification activities; again, they follow requirements set by the regulatory body that governs the organization.
Four Levels of Data
The following table is a very typical example of a data classification schedule:
You will notice most of the data used and produced by the organization is classified as “Confidential.” This is normal.
Expected Outcomes
Every organization is going to be different, but the general rules remain the same:
The ratio of “Highly Confidential” data, however, is the most important takeaway. When you sit down with business leadership, you must identify the most sensitive data in your organization. This should be easy. If Highly Confidential data begins to exceed 3% of the total, however, you should have another set of conversations, because the scope of Highly Confidential data probably needs to be revisited. An example from IT/Cyber illustrates this point: there is usually only one file in IT/Cyber that is considered Highly Confidential, the Risk Register. Everything else is classified as Confidential.
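The 3% rule of thumb is simple arithmetic, so it is easy to check once you have an inventory. The sketch below is a hypothetical illustration, not a feature of any product mentioned here. It assumes each classified item is recorded with a department, a label, and a size in bytes (the `Asset` record and the function names are invented for this example), and it measures the Highly Confidential share by volume, which is itself an assumption; counting files instead would work the same way.

```python
from dataclasses import dataclass

# Rule of thumb from the text: above ~3% Highly Confidential, revisit the scope.
HIGHLY_CONFIDENTIAL_CEILING = 0.03


@dataclass
class Asset:
    """One classified item in a department's inventory (hypothetical record)."""
    department: str
    label: str       # e.g. "Confidential" or "Highly Confidential"
    size_bytes: int  # assumption: the share is measured by volume, not file count


def highly_confidential_share(assets: list[Asset]) -> dict[str, float]:
    """Return each department's fraction of data (by bytes) labeled Highly Confidential."""
    totals: dict[str, int] = {}
    highly_conf: dict[str, int] = {}
    for a in assets:
        totals[a.department] = totals.get(a.department, 0) + a.size_bytes
        if a.label == "Highly Confidential":
            highly_conf[a.department] = highly_conf.get(a.department, 0) + a.size_bytes
    # Skip departments with no recorded volume to avoid dividing by zero.
    return {d: highly_conf.get(d, 0) / t for d, t in totals.items() if t > 0}


def departments_to_revisit(
    assets: list[Asset], ceiling: float = HIGHLY_CONFIDENTIAL_CEILING
) -> list[str]:
    """List departments whose Highly Confidential share exceeds the ceiling."""
    shares = highly_confidential_share(assets)
    return sorted(d for d, share in shares.items() if share > ceiling)


# Illustrative numbers only, not real data.
inventory = [
    Asset("IT/Cyber", "Highly Confidential", 2_000_000),  # e.g. the Risk Register
    Asset("IT/Cyber", "Confidential", 98_000_000),
    Asset("Finance", "Highly Confidential", 10_000_000),
    Asset("Finance", "Confidential", 90_000_000),
]
print(departments_to_revisit(inventory))  # ['Finance']
```

Here IT/Cyber sits at 2% and passes, while Finance sits at 10% and is flagged for another round of conversations about what truly belongs in the Highly Confidential tier.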
What Tools Do I Need to Buy?
So far, everything we’ve discussed about data classification has focused on people and process. That is because the tooling for Data Classification isn’t one tool; it’s a capability that requires many different tools. This is no different from other capabilities. Your email DLP platform, for example, will differ from your CASB or web content firewall. Several vendors have come and gone in this space over the years. The major player I have used, and continue to hear good things about, is Spirion (formerly Identity Finder). As far as I know, only a few vendors do data labeling; Oracle is one example. Data classification vendors tend to be either DLP vendors or file activity monitoring vendors such as Varonis or STEALTHbits.
The best vendor is the one that meets your requirements and use case. The cheapest option that does so will probably be your best choice, because there is almost no return on investment in doing data classification or labeling within a product. You need to customize the environment around personally identifiable information (PII), protected health information (PHI), Payment Card Industry (PCI) data, and so on, so what does a vendor get you? (I’ve exaggerated a little for effect, but you get my point.)
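Much of that customization comes down to writing detection rules for your own data types. As a hypothetical illustration of the PCI case, and not a description of how Spirion or any other vendor works, the sketch below flags candidate payment card numbers with a regular expression and discards false positives using the Luhn checksum that card numbers carry. The pattern and function names are assumptions made for this example; a production rule would need tuning against your own data.

```python
import re

# Candidate card numbers: 13-19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Check the Luhn checksum used by payment card numbers."""
    total = 0
    # Walk right to left, doubling every second digit.
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return Luhn-valid card-number candidates found in text, separators removed."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


sample = "Order ref 1234 5678 9012 3456; test card 4111 1111 1111 1111."
print(find_card_numbers(sample))  # ['4111111111111111']
```

The checksum step is what keeps the rule usable: a bare digit pattern also matches order references and other noise, which is exactly the kind of tuning no vendor can do for you.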
In Conclusion
A CISO can’t build a Data Classification capability in isolation. Successful programs will include Risk Management, Legal, and Compliance. A successful program must also focus on business processes rather than technology.

The global migration to data warehouse solutions reinforces this need. Data warehouses are designed to provide broad data integration and accessibility. They empower the business to uncover new opportunities that drive sales, marketing, and operational efficiencies. They do this by removing the historic data siloing found in application-specific databases, pooling all the data, and augmenting it with other data sources. From a classification point of view, the cybersecurity controls historically applied to that data are stripped away in favor of accessibility. Successful Data Classification programs get ahead of these challenges by being part of the design process from the beginning.

The final element of a successful program is the understanding that you cannot boil the ocean; focus on the data that matters. This is only accomplished by working with the business teams to build their Highly Confidential data list. Again, focus on each business unit’s most important data. If Highly Confidential data exceeds 3% of the department’s entire data pool, reevaluate the data types. If you don’t, the controls surrounding Highly Confidential data handling will severely impact the business, which will be the death knell of your Data Classification program.