Data loss prevention is a noble endeavor – after all, no organization can accept sensitive data flowing unchecked out of its systems. But the methods used to prevent this loss are letting organizations down. All too often, current solutions from DLP vendors, both large and small, cause more problems than they solve, simply because they are designed to block any kind of sensitive data from being leaked.
Although I already suspected this to be the case, I conducted a small experiment to see what actually happens when a DLP rule is set.
The impact of just one DLP rule…
I created a single DLP rule in Google Workspace DLP and applied it to Reco's own Google Workspace account. In case you missed it, Workspace DLP allows organization admins to create and apply rules that control the content users can include in files shared externally through Google Drive. When the defined sensitive data is included in an externally shared document, Google automatically emails an alert to the Workspace admin.
In this case, I defined a rule to detect a single piece of sensitive data (a social security number) in all files shared externally from Reco through Google Drive.
Image 1: Configuring the rule on Google Workspace
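Under the hood, a rule like this boils down to pattern matching on content. Here is a minimal, purely illustrative sketch of that idea in Python; Google's actual detectors are far more sophisticated, but the core mechanism is the same: match a pattern, fire an alert.

```python
import re

# Illustrative only: real DLP engines use validated detectors, not a bare regex.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def contains_ssn(text: str) -> bool:
    """Return True if the text contains an SSN-shaped string."""
    return bool(SSN_PATTERN.search(text))

# Any externally shared document matching the pattern triggers an alert,
# whether the share is a routine payroll handoff or an actual leak.
print(contains_ssn("Payroll for J. Doe, SSN 123-45-6789"))   # True
print(contains_ssn("Quarterly roadmap, no sensitive data"))  # False
```

Notice what the check cannot see: who the recipient is, or why the document was shared. That blindness is exactly what drives the alert volume described below.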
Just 10 minutes later, I had received 22 email alerts, each flagging an instance where this particular piece of sensitive data was shared with an external user.
Image 2: One of 22 Google Workspace alert emails
The alert itself provides only generic information: the trigger for the alert, when the event occurred, who triggered it, who the file was shared with, the document link, and which rule it pertains to.
However, the alert does not tell me the context in which the document was shared, or whether the action was justified. My only option as the analyst receiving the alert is to open each file and read its content to verify that the share was legitimate.
Image 3: Alert details demonstrating that the defined sensitive data has been shared externally
Scaling up to all sensitive data in an organization
So, 22 emails with 22 files to open may not seem too bad, but scaling up this exercise to all pieces of sensitive data across the organization would quickly send a tsunami of emails back to the IT security team to wade through.
It would be impossible to define every individual piece of sensitive data by name or type, so the IT security team would have to fall back on generic rules that catch as much as possible. In practice, such rules catch far too much, alerting on, or outright blocking, nearly every document in the organization.
Furthermore, this alert is a false positive: the information was shared with an external accounting firm in order to pay an employee. But the alert provides no context to reveal that, so the only way to verify it is to open every document and check its contents, further increasing the burden on the IT security team.
And if the analyst doesn't already have access to a file, they will need to request access to each one. If they are not authorized to have that access, they add risk to the organization simply by trying to remediate a potential risk.
Content might be useful, but context is critical
In the example above, knowing the content of the various documents containing this piece of sensitive data is useful: it lets the individual (or even system) reading the content judge whether the action was justified. But it also comes with ethical and resource implications.
That's where context comes in. A security tool that understands the context of an action can safely ignore the large majority of actions, drastically reducing the number of alerts it raises. This saves the organization time, enables quicker resolution of genuinely malicious actions, and means less stressed security teams, smoother workflows, and less intrusive security overall.
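To make the idea concrete, here is a hypothetical sketch of a context-aware alerting decision. The assumption here, labeled clearly, is that "justified context" can be approximated by an allowlist of known partner domains; a real context engine would weigh far richer signals (sharing history, document purpose, user role, and so on), but even this toy version would have suppressed the payroll false positive above.

```python
# Hypothetical allowlist: domains of known, vetted business partners.
KNOWN_PARTNER_DOMAINS = {"accountingfirm.example"}

def should_alert(contains_sensitive_data: bool, recipient_email: str) -> bool:
    """Alert only when sensitive data goes to an unrecognized recipient."""
    if not contains_sensitive_data:
        return False
    # Extract the recipient's domain and check it against the allowlist.
    domain = recipient_email.rsplit("@", 1)[-1]
    return domain not in KNOWN_PARTNER_DOMAINS

print(should_alert(True, "payroll@accountingfirm.example"))  # False: justified share
print(should_alert(True, "someone@unknown.example"))         # True: worth investigating
```

The content check alone would have fired in both cases; adding even one contextual signal cuts the alert in half here, and richer context cuts it far further.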
Research carried out by Gartner® and reported in their 2023 Strategic Roadmap for Data Security Platform Adoption found that:
“Semantic and contextual sensitive-data visibility and control — This is a high priority capability. Most of the vendors converging on the DSP opportunity have some sort of data discovery capability included. However, large gaps remain because the existing capabilities are examining data stores either for metadata only or for well-known identifiers — they are not finding out what something really is. For example, if a data classification tool finds a date, then it does not know whether it is a date of birth, a transaction date or the dateline of a newspaper article — effectively making data discovery useless in complex environments like large organizations.” 
For us, this means that current data security tools don't make enough use of AI or machine learning to analyze what a piece of data actually is and the context in which it is being used.
Each of those uses of a simple date needs a different level of protection, but not all of them need to trigger an alert. Using AI or ML to understand the context in which a date appears reduces the alerts raised when a document containing a date is shared, drastically cutting the number of false positives.
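Gartner's date example can be sketched in a few lines. The context cues below are invented keyword lists purely for illustration; a production system would use ML/NLP models rather than keywords, but the point stands: the same date string gets a different semantic type, and so a different protection level, depending on the words around it.

```python
# Hypothetical context cues; a real classifier would use ML/NLP, not keyword lists.
CONTEXT_CUES = {
    "date_of_birth":    ("dob", "born", "birth"),
    "transaction_date": ("invoice", "payment", "transaction"),
    "publication_date": ("article", "published", "byline"),
}

def classify_date(text: str) -> str:
    """Guess the semantic type of a date from nearby words (toy heuristic)."""
    lowered = text.lower()
    for label, cues in CONTEXT_CUES.items():
        if any(cue in lowered for cue in cues):
            return label
    return "unknown_date"

print(classify_date("DOB: 04/12/1990"))           # date_of_birth
print(classify_date("Invoice dated 04/12/2023"))  # transaction_date
```

A content-only scanner sees the same `\d{2}/\d{2}/\d{4}` pattern in both strings; only the contextual classification lets a tool alert on the first and ignore the second.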
Creating a fit-for-purpose data loss prevention tool
“Paradigm shift from need-to-know to need-to-share — Traditional data security products were built with one goal in mind — to locate and block an attacker… Clients were promised that if there were a breach, all these log entries would suddenly make sense and reveal what to block. This is like seatbelts. Modern data security products are more like navigation systems. Organizations need to use or share their data (internally or externally) and security products, must enable this. Business realities mandate how data needs to be used — and security enables this. These are completely different mindsets and completely different product architectures.”
Organizations are increasingly giving external partners direct access to their documents in order to collaborate with them. As a result, security controls must enable data sharing while managing the discoverability, reuse, and resharing of that shared data by unauthorized users.
And the way to do that is through, you’ve guessed it, context.
Gal is CTO at Reco. Reco is creating the world’s first collaboration security platform with an AI engine which uses context, not content, to identify unjustified actions, thereby only alerting for the actions that truly require alerts, and supporting organizations to collaborate securely without interruption. Request a demo to learn more about how Reco can help secure your organization’s data.
Gartner, 2023 Strategic Roadmap for Data Security Platform Adoption, Joerg Fritsch, Brian Lowans, 22 September 2022. GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.