Data privacy is a moving target. Organizations are taking in increasingly vast amounts of data—on customers, partners, users, vendors, employees. At the same time, local, national, and international laws governing data privacy and protection are changing almost constantly.
Against this backdrop, Mark Cockerill, vice president of legal in EMEA and head of global privacy at ServiceNow, says we lack a global consensus on what data privacy is and how to approach it. “People are shaped by their experiences,” he says, “so various countries and regions have different ideas of how important data privacy is, or what it even means.”
However, security teams and analysts do agree that companies must invest in a data privacy infrastructure that incorporates “privacy by design.” Rather than taking in data and attempting to secure it post hoc, privacy by design builds data privacy protection into the data collection process.
To that end, organizations are using tools like artificial intelligence (AI) and machine learning (ML) that parse and secure vast quantities of data as it’s collected. But how does that work? Why does it matter? When machines come in contact with human data, what ethical questions are raised?
Manual tools complicate compliance
Organizations are now processing too much data for human operations to parse and secure. That’s a problem. If personal information or personally identifiable information (PII)—think credit card numbers or GPS coordinates—falls into the wrong hands, bad actors can steal financial data or engage in identity theft. But organizations can’t secure their data if they don’t know what they have or where it is stored. And with so much data pouring in all the time, businesses are losing visibility into their data infrastructure.
ServiceNow and BigID, a ServiceNow partner focused on managing sensitive and private data, surveyed IT and engineering leaders to understand how they're handling privacy at their own organizations. The survey revealed that businesses are struggling to comply with regulatory and compliance requirements.
The General Data Protection Regulation, or GDPR, is especially challenging. That’s because compliance with GDPR requires complex documentation and collaboration across the organization to identify what data the company has, who owns it, and how it is being processed. Despite the staggering amounts of data organizations are handling, many are still using manual tools and processes to keep track of it.
According to the survey, companies mostly rely on Excel sheets (53%) and data mapping or visualization tools such as Visio (41%). Given that reliance on manual tools, many respondents are only scanning and identifying data in structured sources (40%), have yet to analyze data in both structured and unstructured locations (12%), or have no initiative in place to scan sensitive data at all (4%). Manual tooling makes it harder for companies to proactively track and manage regulated data, which is vital for compliance with GDPR and other regulations.
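To make the contrast with spreadsheet inventories concrete, here is a minimal sketch of what automated scanning of a structured source can look like. The record set, field names, and regex patterns are hypothetical illustrations (real scanners combine many more patterns with ML-based classifiers), not any vendor's actual implementation:

```python
import re

# Hypothetical detection patterns for two PII types mentioned above:
# credit card numbers and GPS coordinates.
PII_PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "gps_coordinates": re.compile(r"-?\d{1,3}\.\d{4,},\s*-?\d{1,3}\.\d{4,}"),
}

def scan_records(records):
    """Flag which fields in structured records contain likely PII."""
    findings = []
    for i, record in enumerate(records):
        for field, value in record.items():
            for pii_type, pattern in PII_PATTERNS.items():
                if pattern.search(str(value)):
                    findings.append(
                        {"record": i, "field": field, "type": pii_type}
                    )
    return findings

# Toy records standing in for rows pulled from a structured source.
records = [
    {"name": "Alice", "note": "card 4111 1111 1111 1111 on file"},
    {"name": "Bob", "location": "51.5074, -0.1278"},
]
print(scan_records(records))
```

The point of the sketch is that once detection is code, it can run continuously against every source, whereas a spreadsheet inventory is stale the moment it's saved.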
Many don’t seem to be proactively building privacy by design into their processes and products. One-third of organizations say they simply react to the changing privacy landscape without taking steps to optimize their privacy program, while 10% say they are not even in a position to react.
Future of data privacy
That’s where firms like BigID can help. BigID leverages AI and ML to disambiguate different kinds of data. Dimitri Sirota, CEO of BigID, says the goal is to build a map of each data point that belongs to a particular identity. That allows companies to demystify their data: to know what they have, where it is stored, and whether it contains personal information.
“There’s such a sizable volume,” says Sirota. “Structured and unstructured data, cloud and on-premise data… How do [businesses] get a picture of what data and whose data they have? The only way it’s possible is to leverage machine learning of various types. This is about improving transparency and trust.”
BigID uses AI in two ways. First, AI combs through an organization’s data infrastructure, even the parts that are invisible to the organization itself. Algorithms learn where the data is located, what type of data it is, and whether it belongs to a specific individual. Second, AI automates data collection and processing. Individual customers or users can file a request to see whether a company has collected their data, what kind of data the organization has, and how the company plans to use it. AI ensures workflows are in place that make it easier for customers, businesses, and auditors to keep track of that data.
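The two steps Sirota describes, classifying data points and then serving requests against them, can be sketched as a simple catalog keyed by identity. The tuple structure, field names, and functions below are illustrative assumptions, not BigID's actual data model:

```python
from collections import defaultdict

def build_identity_map(findings):
    """Group classified data points by the identity they belong to.

    `findings` is assumed to be the output of an upstream
    classification step: (identity, source_location, data_type)
    tuples.
    """
    identity_map = defaultdict(list)
    for identity, location, data_type in findings:
        identity_map[identity].append(
            {"location": location, "type": data_type}
        )
    return identity_map

def access_request(identity_map, subject):
    """Answer 'what data do you hold about me, and where is it?'"""
    return identity_map.get(subject, [])

# Toy classification output from scans of several sources.
findings = [
    ("alice@example.com", "crm.contacts", "email"),
    ("alice@example.com", "billing.cards", "credit_card"),
    ("bob@example.com", "hr.records", "address"),
]
catalog = build_identity_map(findings)
print(access_request(catalog, "alice@example.com"))
```

With a catalog like this in place, a subject access request becomes a lookup rather than a company-wide scramble, which is what makes the workflow auditable for customers, businesses, and regulators alike.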
Sirota says AI ensures organizations (and individuals) can get a big-picture view of their data. “Historically, data was viewed in a siloed way: legal had one view on data, security had another view, and governance had another. We think it’s important to look at data from a unified perspective.”
At first glance, using AI and ML to classify personal information doesn’t seem as ethically thorny as, say, using machine learning to determine who gets a bank loan or how long a jail sentence should be. But Cockerill says there are still questions to consider.
“As you start to use AI and ML to identify sets of personal data, the ethical challenges start to arise based on where you’re carrying out that analysis,” he explains. “Are you performing that analysis in the same location where that data is stored, or are you transferring it to a centralized database? Are you carrying out that assessment simply to identify the data, or are you then using the dataset for other reasons?”
Cockerill says that when someone hands over their data to a corporation, they might not fully understand all the ways that data will be used. The problem, Cockerill says, is that when an organization collects data for one reason, the organization is often using that data for a different reason too—but the customer or employee might not be aware of that. “Someone might want to know if their information is being moved or used in a way that wasn’t in the original agreement,” he says. “It’s something to consider.”
Moreover, whenever ML or AI comes into contact with human data, there is potential for bias to skew the results. “You can never fully remove bias because you’ve always got a developer making decisions about your algorithms,” says Cockerill. The developer decides what data is used to train the algorithms, for example. Cockerill says the potential for bias and faulty results is amplified when an algorithm built to identify personal data in English is applied to data in a different language.
Checks and balances
A key principle of privacy by design is anticipating data classification challenges—ethical questions, the potential for inaccurate results—and working to mitigate them early in the data processing lifecycle.
Cockerill says that in conversations around data privacy, everyone talks about the data lifecycle—collection, use, storage, and deletion—but he invites organizations to reframe the conversation. “I want to bring it back to a couple of key questions: What are you doing with the data? Who is accessing it? And where?”
Cockerill emphasizes that “checks and balances” are critical. He wants to see more companies asking internal ethics committees to review how the organization is using AI and ML. He says algorithmic impact assessments can help determine whether organizations are making the right value-based decisions. These best practices should be incorporated into the data collection process.
“Privacy by design is really the right way of looking at it,” says Cockerill.