I have been consulting on data protection services for quite some time now and have noticed three common mistakes CISOs make while choosing and implementing DLP solutions:
- They feel data classification is a necessary prerequisite
- They believe DLP technology alone will solve their leakage problems: that they will identify sensitive data across the enterprise using storage DLP and protect it at the network and endpoint
- They are confused because every vendor pitches its "great" product and points out one or two issues with competing products
Let me share my thoughts on this subject and outline an approach to DLP.
First, data classification and data labeling should not be confused. Data classification is a good starting point for knowing what is critical so it can be adequately protected, but it is not absolutely necessary. Data labeling is a tough task: changing the entire enterprise culture so that people identify, label and maintain labels, whether through processes or an auto-classification enforcement tool, takes significant effort. CISOs think that once data is labeled, the DLP will read the labels and baseline the movement so that appropriate rules can be added. The truth is that the volume of secret and confidential data in motion is far too large to analyze, generalize and convert into a rule base.
One has to know the authorized movement of data in all business processes. For example, if payroll information goes from HR to finance by email on the 15th of every month, and then from the finance manager to a third party for processing, the right approach is a rule that allows payroll data from HR to finance (person specific) and from finance to the third party, for a specific time window and only over SMTP, and blocks all other payroll traffic. To further reduce the risk, one can classify payroll as secret and use encryption or rights-protection technology integrated with DLP. But that is an entirely different topic from reducing all data leakage risks.
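To make the payroll example concrete, here is a minimal sketch of how such a rule could be expressed. The addresses, the "payroll" tag and the rule structure are all assumptions for illustration; real DLP products express this in their own policy language.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Transfer:
    sender: str
    recipient: str
    channel: str      # e.g. "SMTP", "HTTPS", "USB"
    data_tag: str     # e.g. "payroll", as tagged by content inspection
    sent_on: date

# Hypothetical addresses; the two authorized hops from the example above.
ALLOWED_PAYROLL_FLOWS = [
    # (sender, recipient, channel, allowed day-of-month window)
    ("hr.payroll@corp.example",  "finance.mgr@corp.example",  "SMTP", (14, 16)),
    ("finance.mgr@corp.example", "payroll@processor.example", "SMTP", (15, 18)),
]

def is_authorized(t: Transfer) -> bool:
    """Allow only the authorized payroll hops; block all other payroll traffic."""
    if t.data_tag != "payroll":
        return True  # this rule governs payroll traffic only
    for sender, recipient, channel, (start, end) in ALLOWED_PAYROLL_FLOWS:
        if (t.sender == sender and t.recipient == recipient
                and t.channel == channel and start <= t.sent_on.day <= end):
            return True
    return False
```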
The second issue, thinking that technology alone will solve the problem, is a myth. You need a DLP solution, and a solution is technology plus processes.
First, set up the governance for data protection: a classification policy, exception approval and incident management. This should be followed by creating processes to obtain confidential data details from business processes and manage them regularly. You will need business input to verify data movement, its importance, and clarification on data movement inside and outside the enterprise. This should be followed by an incident management process to identify, report, escalate and close data leakage incidents. You cannot make the call on business data leakage from an IT/security perspective; it has to be decided by business owners.
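As a rough illustration of the identify, report, escalate and close flow, here is a simple incident record sketch. The field names and status values are assumptions, not a prescribed workflow.

```python
from dataclasses import dataclass

INCIDENT_STATES = ["identified", "reported", "escalated", "closed"]

@dataclass
class LeakageIncident:
    incident_id: str
    data_owner: str            # the business owner who decides severity and action
    description: str
    status: str = "identified"

    def advance(self) -> None:
        """Move the incident to the next state in the workflow."""
        i = INCIDENT_STATES.index(self.status)
        if i < len(INCIDENT_STATES) - 1:
            self.status = INCIDENT_STATES[i + 1]
```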
On product vendors: one should first identify one's own business requirements and the objective of implementing DLP, then map product features to those requirements. Further, create use cases and ask vendors to demonstrate usability against them. For example, if the objective is to protect regulated data (card information, PII, SSN etc.), the task is relatively simple: focus on a DLP tool that is easy to manage, has patterns to identify regulated data and can generate detailed compliance-specific reports. If you wish to monitor structured data, focus on features that find exact matches with better accuracy and performance, and so on.
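For a feel of what "patterns to identify regulated data" means, here is a hedged sketch of the kind of checks a DLP policy might run for US SSNs and payment card numbers. Real products combine such patterns with validators and surrounding context to cut false positives; the regexes below are simplified for illustration.

```python
import re

SSN_RE  = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")        # e.g. 123-45-6789
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")       # 13-16 digit PAN-like runs

def luhn_ok(number: str) -> bool:
    """Luhn checksum, used to validate card-number candidates."""
    digits = [int(d) for d in re.sub(r"\D", "", number)][::-1]
    total = sum(d if i % 2 == 0 else (d * 2 - 9 if d * 2 > 9 else d * 2)
                for i, d in enumerate(digits))
    return total % 10 == 0

def flag_regulated(text: str) -> list[str]:
    """Return a label for each regulated-data hit found in the text."""
    hits = ["SSN" for _ in SSN_RE.findall(text)]
    hits += ["CARD" for c in CARD_RE.findall(text) if luhn_ok(c)]
    return hits
```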
Coming back to data classification: there is a hue and cry about it being the first step of data protection, and I am not a big fan of this for two reasons:
- Leaving data classification in the hands of end users is difficult; they tend to get confused and over-classify data
- It is practically impossible to classify and label terabytes of structured and unstructured data
What is more effective is data flow analysis: for every business process (payroll, billing etc.), identify what data the process owner receives, from where and in which format; what processing is done, in which format and through which channel; and finally what output is generated, where it is stored, and in what form it is transferred.
Based on the input, processing and output of data, one should ascertain data ownership and its authorized movement. This clearly indicates the authorized movement of data: allow data to move to authorized people via authorized channels through this process only, and block everything else.
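A data flow analysis can be captured in a simple record per business process, along these lines. The field names and example values are illustrative assumptions, not a standard schema.

```python
from dataclasses import dataclass

@dataclass
class DataFlow:
    process: str           # e.g. "payroll"
    owner: str             # accountable business owner
    inputs: list[str]      # what data comes in, and from where
    processing: str        # what is done with it
    outputs: list[str]     # what goes out, where it is stored or sent
    channels: list[str]    # authorized channels (SMTP, SFTP, portal, ...)

payroll_flow = DataFlow(
    process="payroll",
    owner="HR head",
    inputs=["attendance data from HR system", "salary master from finance"],
    processing="monthly payroll calculation by finance",
    outputs=["payroll file to third-party processor", "payslips to employees"],
    channels=["SMTP to finance manager", "SFTP to processor"],
)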
This approach has its own challenges: not everything can be identified accurately with patterns or keywords, and data is dynamic, i.e. it changes over time.
Remember that data classification is a means and not an end, and it should not be treated as a one-off activity.
Classifying terabytes of data, and all its copies, in a way DLP can recognize is a challenging task.
A better way to address the challenge is mass classification for unstructured data. For structured data, identify the data sets that leave the enterprise in some form, such as a bank's monthly customer transaction statements, then create patterns for them and protect them accordingly.
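For structured data sets, the pattern often amounts to exact-data-match style fingerprinting: hash the sensitive values from the source system and screen outbound content against those hashes. The sketch below is a simplification with an assumed salt and field choice; real products manage fingerprint indexes internally.

```python
import hashlib

SALT = b"rotate-this-salt"  # assumed placeholder

def fingerprint(values: list[str]) -> set[str]:
    """Hash sensitive values (e.g. account numbers) from the source data set."""
    return {hashlib.sha256(SALT + v.strip().lower().encode()).hexdigest() for v in values}

def contains_fingerprinted(tokens: list[str], index: set[str]) -> bool:
    """Check whether any token in outbound content matches the fingerprint index."""
    return any(hashlib.sha256(SALT + t.strip().lower().encode()).hexdigest() in index
               for t in tokens)

# Usage: index account numbers from the statement-generation data set,
# then screen tokens extracted from outgoing emails/files against it.
index = fingerprint(["00123456789", "00987654321"])
print(contains_fingerprinted(["hello", "00123456789"], index))  # True
```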
I believe the right steps are -
- Garner executive support – so that business owners give time to get the data in their business processes identified
- Establish governance – identify data owners; define clear roles and responsibilities for business, IT, compliance, information security and regulators; set up governance for ongoing data leakage incident management
- Perform data flow analysis – identify the flow of information in business processes, its sensitivity, users, form and medium; perform information classification, ownership establishment and risk analysis
- Implement technology controls such as DLP, DRM, encryption etc.
- Build a sustenance framework – incident management, improvement, reporting and rule-base fine tuning
If these steps are executed correctly, data protection becomes much easier.
Lastly, touching on data classification again: my view is that data should simply be divided into three categories - regulated data, which you need to monitor, protect and report to regulators; business-sensitive data, which matters the most; and routine transactional data, the output of general business processes. That way, protection is assured for the first two, reporting for the first, and monitoring for all.
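Expressed as a small configuration sketch, the three-tier scheme maps to controls roughly as follows; the control names are illustrative, not features of any particular product.

```python
# Category-to-control mapping for the three-tier scheme above (assumed names).
CONTROLS = {
    "regulated":          {"monitor": True, "protect": True,  "report_to_regulator": True},
    "business_sensitive": {"monitor": True, "protect": True,  "report_to_regulator": False},
    "routine":            {"monitor": True, "protect": False, "report_to_regulator": False},
}

def required_controls(category: str) -> dict:
    """Look up the controls for a data category; default to monitoring only."""
    return CONTROLS.get(category, {"monitor": True, "protect": False, "report_to_regulator": False})
```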