What is our primary use case?
Bigger companies benefit from Google Cloud Data Loss Prevention because of the variety of data they hold, but small companies can also use it effectively. For example, I have seen companies use Google BigQuery as backend storage for all kinds of data while relying on only a few, very useful services. Google Cloud Data Loss Prevention integrates with BigQuery, which is helpful. Some clients run full-fledged, higher-budget projects and definitely use it. Startups that want to conduct research with privacy or data protection in mind can use these specific services in a minimal way.
For instance, any company has several data sources. They know they hold sensitive data but do not know where it is located or which data is sensitive. If someone asked them to locate and classify their sensitive data, they would start working manually and would take four or five months just to detect it. Google Cloud Data Loss Prevention helps considerably in this situation. It provides predefined templates. Whatever category counts as sensitive data has been researched and included in the template, with almost 99% accuracy. A person only needs to attach the template and scan their data. Once started, the scan takes time, such as 24 or 48 hours, depending on the size of the data.
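The attach-a-template-and-scan workflow described above can be sketched with the `google-cloud-dlp` Python client. This is a minimal sketch under stated assumptions, not a definitive implementation: the project, dataset, table, and template names are hypothetical placeholders, the helper names `build_inspect_job` and `start_scan` are invented for illustration, and the scan is assumed to target a single BigQuery table.

```python
def build_inspect_job(project_id, dataset_id, table_id, template_name, findings_table_id):
    """Build the inspect_job payload for a DLP scan of one BigQuery table."""
    return {
        # What to scan: a single BigQuery table (hypothetical names).
        "storage_config": {
            "big_query_options": {
                "table_reference": {
                    "project_id": project_id,
                    "dataset_id": dataset_id,
                    "table_id": table_id,
                }
            }
        },
        # Which detectors to apply: taken from an existing inspect template.
        "inspect_template_name": template_name,
        # Where to write the findings report: another BigQuery table.
        "actions": [
            {
                "save_findings": {
                    "output_config": {
                        "table": {
                            "project_id": project_id,
                            "dataset_id": dataset_id,
                            "table_id": findings_table_id,
                        }
                    }
                }
            }
        ],
    }


def start_scan(project_id, inspect_job):
    """Submit the job. Needs the google-cloud-dlp package and valid credentials."""
    from google.cloud import dlp_v2  # imported lazily so the builder runs offline

    client = dlp_v2.DlpServiceClient()
    parent = f"projects/{project_id}/locations/global"
    job = client.create_dlp_job(request={"parent": parent, "inspect_job": inspect_job})
    return job.name  # the job runs asynchronously; poll it by this name
```

The job runs asynchronously, which matches the experience that a scan of a large data set can take a day or more to finish.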
Google Cloud Data Loss Prevention has a system for this, but it is not very mature. There should be a plug-and-play capability. A customer may know that Google Cloud Data Loss Prevention would help, but without plug-and-play integration, adopting it is a challenge. If the sensitive data sits in an on-premises environment, on a single machine, or in a very large database, such as 10 TB, Google should offer easy plug-and-play options. They could provide agents or something similar that continuously scan the data or produce reports, which is currently missing. Additionally, the service is costly and every scan takes time, so this has to be taken into consideration.
What is most valuable?
The good feature is that Google Cloud Data Loss Prevention offers API integration for teams using different programming languages in their systems, such as Java, Ruby, or Python. They can call Google Cloud Data Loss Prevention services and perform whatever tasks they want. For example, consider a UK-based loan or banking company that collects government identity documents for loan applications and stores them in one location. Masking those photos or the sensitive information in them is important: the picture should be blurred or otherwise modified. This can be achieved programmatically by calling Google Cloud Data Loss Prevention services on the existing data. The original data is not altered. For example, they can obscure the name, age, or other very sensitive information in the image. The resulting copy can be placed somewhere else and used for office or other purposes. Someone can view the copy and correlate it, but they cannot see the sensitive information. These tasks can be achieved by Google Cloud Data Loss Prevention services in a programmatic way, very quickly and very efficiently.
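As an illustration of the programmatic masking of identity documents described above, here is a minimal Python sketch using the `google-cloud-dlp` client's image redaction call. It makes several assumptions: the input is a PNG file, the file paths and project ID are hypothetical, and the helper names `build_image_redaction` and `redact_png` are invented for this example. As far as I know, the API covers matched regions with an opaque box rather than a blur.

```python
def build_image_redaction(info_types):
    """Return (inspect_config, image_redaction_configs) for the given infoType names."""
    types = [{"name": name} for name in info_types]
    inspect_config = {"info_types": types}
    # One redaction config per infoType: every match of that type is covered.
    redaction_configs = [{"info_type": t} for t in types]
    return inspect_config, redaction_configs


def redact_png(project_id, source_path, output_path, info_types):
    """Write a redacted copy of a PNG; the source file is left untouched."""
    from google.cloud import dlp_v2  # lazy import: needs google-cloud-dlp and credentials

    inspect_config, redaction_configs = build_image_redaction(info_types)
    with open(source_path, "rb") as f:
        byte_item = {
            "type_": dlp_v2.ByteContentItem.BytesType.IMAGE_PNG,
            "data": f.read(),
        }

    client = dlp_v2.DlpServiceClient()
    response = client.redact_image(
        request={
            "parent": f"projects/{project_id}/locations/global",
            "inspect_config": inspect_config,
            "image_redaction_configs": redaction_configs,
            "byte_item": byte_item,
        }
    )
    # Save the redacted bytes as a separate copy for office use.
    with open(output_path, "wb") as f:
        f.write(response.redacted_image)
```

A call such as `redact_png("my-project", "id_card.png", "id_card_redacted.png", ["PERSON_NAME", "AGE"])` would produce the shareable copy while the original stays in place.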
PII data is very important for compliance. For example, if someone doing business in the UK has earned thousands or millions of dollars but missed a GDPR requirement somewhere, the penalty could exceed that profit. Price does not matter when discussing privacy and compliance. However, the system should be more secure.
What needs improvement?
When someone is asked to locate and classify their sensitive data, they would start working manually and would take four or five months just to detect it. Google Cloud Data Loss Prevention helps significantly. It provides predefined templates, and whatever category counts as sensitive data has been researched with almost 99% accuracy and included in the template. A person only needs to attach the template and scan their data. Once started, the scan takes time, such as 24 or 48 hours, depending on the data size. Once the template is applied and the scan completes, a tagging report of the sensitive data locations is returned. What previously would have taken months of manual work, detecting sensitive PII data, can now be achieved by Google Cloud Data Loss Prevention within 48 hours at most, even for larger data sizes.
Once the report is available, the person knows where the sensitive data is located. How to deal with it is the next question. If data is not to be shared, it has to be hidden, masked, or changed. All of these treatments can be applied to that data. Google Cloud Data Loss Prevention has another feature for this, called de-identification, which handles sensitive data in applications or production environments.
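To make the masking option concrete, the following Python sketch builds a de-identification request that replaces every character of a detected value with a masking character. It is an illustration under assumptions, not the only way to use the feature: the project ID and sample text are hypothetical, the helper names `build_mask_request` and `mask_text` are invented, and character masking is just one of several available transformations.

```python
def build_mask_request(project_id, text, info_types, masking_character="#"):
    """Build a deidentify_content request that masks every detected match."""
    return {
        "parent": f"projects/{project_id}/locations/global",
        # What to look for in the text.
        "inspect_config": {"info_types": [{"name": n} for n in info_types]},
        # How to transform each match: overwrite it with the masking character.
        # number_to_mask is left unset, so the whole match is masked.
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {
                        "primitive_transformation": {
                            "character_mask_config": {
                                "masking_character": masking_character
                            }
                        }
                    }
                ]
            }
        },
        "item": {"value": text},
    }


def mask_text(project_id, text, info_types):
    """Send the request. Needs the google-cloud-dlp package and valid credentials."""
    from google.cloud import dlp_v2  # imported lazily so the builder runs offline

    client = dlp_v2.DlpServiceClient()
    response = client.deidentify_content(
        request=build_mask_request(project_id, text, info_types)
    )
    return response.item.value  # the masked copy; the input string is unchanged
```

Masking keeps the surrounding text readable, so someone can still correlate records without seeing the sensitive values themselves.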
For how long have I used the solution?
Google Cloud Data Loss Prevention is generally about data security. The companies that benefit most are those holding more personal data. I would say banking and pharma are the two major sectors, and companies in those domains are the primary users.
What's my experience with pricing, setup cost, and licensing?
Pricing is based on data size. Google offers different pricing models, and they provide good discounts when someone commits to a long-term engagement with their Google services.
What other advice do I have?
Further strategies can be defined based on specific use cases. For example, in an R&D company, two doctors in different countries may both need to see sensitive data, such as their observations of a patient's adverse event, because they know what to do with it. For this, Google has another service called VPC Service Controls, which adds protection by preventing exfiltration of the data that Google Cloud Data Loss Prevention has identified. For example, if the scan detects that four projects hold sensitive data that should be shared only with a specific person or a specific project, strategies can be defined to control not only sharing by people but also sharing by machines and machine processes.
Google's VPC Service Controls can be used to restrict data exfiltration or to restrict copying from one place to another. Copying can be allowed in one direction only, rather than restricting transfers in both directions. These are strategies that can be defined once the report is available, and obtaining that report through a Google Cloud Data Loss Prevention scan is very helpful.
The first advantage is that text-based queries are available, so the results can be queried in an easy way. Google Cloud Data Loss Prevention helps convert or transform all data into text, and there is an easy option to export it to BigQuery. In BigQuery, someone can write a custom query and obtain data based on that query. Further decisions can then be taken, or business analytics can be performed, to guide the business.
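As a sketch of such a custom query, the Python snippet below counts exported findings per sensitive-data category. It assumes the findings were saved to a BigQuery table using the standard findings export, where each row carries an `info_type.name` field; the table name is a hypothetical placeholder and the helper names `findings_summary_sql` and `run_summary` are invented for illustration.

```python
import re

# Accepts "project.dataset.table" identifiers only, so the value can be
# placed safely inside backticks in the query text.
_TABLE_ID = re.compile(r"^[A-Za-z0-9_\-]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_\-]+$")


def findings_summary_sql(findings_table):
    """Return SQL that counts findings per infoType in a DLP findings table."""
    if not _TABLE_ID.match(findings_table):
        raise ValueError(f"unexpected table identifier: {findings_table!r}")
    return (
        "SELECT info_type.name AS info_type_name, COUNT(*) AS findings\n"
        f"FROM `{findings_table}`\n"
        "GROUP BY info_type_name\n"
        "ORDER BY findings DESC"
    )


def run_summary(findings_table):
    """Run the query. Needs the google-cloud-bigquery package and credentials."""
    from google.cloud import bigquery  # imported lazily so the builder runs offline

    client = bigquery.Client()
    rows = client.query(findings_summary_sql(findings_table)).result()
    return [(row["info_type_name"], row["findings"]) for row in rows]
```

The resulting counts show which categories of sensitive data dominate, which is a reasonable starting point for deciding what to mask or restrict first.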
However, people who try to achieve this or who use these services may be dissatisfied, because Google's documentation is not easily available in the public domain. Other clouds, by comparison, have already published most of the required use cases alongside their services. For example, if ten people search another cloud's documentation for a way to automate something in their existing system, nine of them will find the exact use case they wanted to implement.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Google