How to get started ensuring data protection and privacy

The volume of data generated every minute brings a risk: that this data will not be protected. Each time data is used or re-used for a different purpose, an individual’s right to privacy must be weighed against the rights of citizens and communities and the wider benefits for society. If privacy is not considered when data is collected, re-used and shared, citizens’ trust will be undermined and regulations and laws may be broken; citizens and the government could also be exposed to fraud and other harmful activity.

Many governments lack the necessary legislative frameworks and regulatory environments for data, which makes individuals more susceptible to fraud, hacking, phishing, and identity theft. Especially when using new data or re-using existing data for decision-making, it is important to consider the confidentiality, integrity, and availability of data in your work.

Obtaining Consent

The first question to ask when planning to process personal data is “What is my reason or justification for processing this personal data?” This question matters because processing personal data is usually lawful only when there is a legal basis for it.

The GDPR, for instance, sets out six potential legal bases: consent; contract; legal obligation; vital interests; public task; and legitimate interests. Where none of the other bases applies, processing personal data requires that the data subject has consented to it.

According to the European Commission (EC), “a consent request needs to be presented in a clear and concise way, using language that is easy to understand, and be clearly distinguishable from other pieces of information such as terms and conditions. […] Consent must be freely given, specific, informed and unambiguous.”

The EC specifies the information that should be provided to data subjects when obtaining consent for processing personal data, including:

  • the identity of the organization processing data,
  • the purposes for which the data is being processed,
  • the type of data that will be processed,
  • the possibility to withdraw consent (for example by sending an email to withdraw consent).

Also see this helpful checklist from the UK Information Commissioner’s Office on what to consider when asking for, recording and managing consent.

Usually, consent should meet the standard of an unambiguous indication by clear affirmative action (opt-in). As long as this standard is met, different ways of obtaining consent are possible, including:

  • signing a consent statement on a paper form,
  • ticking an opt-in box on paper or electronically,
  • clicking an opt-in button or link online,
  • selecting from equally prominent yes/no options,
  • choosing technical settings or preference dashboard settings,
  • responding to an email requesting consent.
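The points above (what to tell the data subject, opt-in by affirmative action, and the possibility to withdraw) can be sketched as a small consent record. This is a minimal illustration, not a compliance tool: the class name ConsentRecord, its fields, and the permits check are hypothetical and are not taken from the EC or ICO guidance.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class ConsentRecord:
    """Hypothetical record of one data subject's consent (illustrative only)."""

    subject_id: str
    controller: str          # identity of the organization processing data
    purposes: List[str]      # purposes for which the data is processed
    data_types: List[str]    # types of data that will be processed
    method: str              # how consent was given, e.g. "opt-in checkbox"
    opted_in: bool = False   # starts False: consent is never pre-ticked
    given_at: Optional[datetime] = None
    withdrawn_at: Optional[datetime] = None

    def give(self) -> None:
        """Record a clear affirmative action by the data subject."""
        self.opted_in = True
        self.given_at = datetime.now(timezone.utc)
        self.withdrawn_at = None

    def withdraw(self) -> None:
        """Record a withdrawal; processing based on consent should then stop."""
        self.withdrawn_at = datetime.now(timezone.utc)

    def permits(self, purpose: str) -> bool:
        """Consent covers only the specific purposes it was given for."""
        return (
            self.opted_in
            and self.withdrawn_at is None
            and purpose in self.purposes
        )


record = ConsentRecord(
    subject_id="s-1",
    controller="Example Statistics Office",
    purposes=["survey analysis"],
    data_types=["age band"],
    method="opt-in checkbox",
)
record.give()
```

Keeping the timestamp and the method alongside the purposes makes it possible to show later when and how consent was obtained, and checking permits before each use keeps processing tied to the purposes the person actually agreed to.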

In recent years, new approaches to informed consent have been developed to enable ongoing engagement and communication between individuals and the users of their data. Dynamic consent is one such example, mainly applied to facilitate participant engagement in clinical and research activities over time. 

Authorized public purpose access (APPA) is another innovative method beyond the explicit, opt-in consent of individuals, promoting data flows while simultaneously protecting people’s rights.

Protecting Personal Data

Data privacy is deeply intertwined with data governance, as protecting data necessitates keeping data in secure locations and in the right hands. With the basics of data stewardship covered under the data governance section, we will focus here on keeping personal data private during public use.

Personal data, such as names, demographic data, and political beliefs, requires particular protections, so that individuals are not identified or targeted based on the data they provide. For instance, an individual should not be targeted for filling out a survey on who they plan to vote for in the next election. In such cases, personal data needs to be kept private, and systems must be designed to keep it so.

The UK Office for National Statistics offers a helpful framework, the Five Safes, outlining the five areas of safety that governments should address when making personal data available for public use:

  • Safe people: researchers must be experienced and accredited, and they must sign a confidentiality contract;
  • Safe projects: every project must be for the public benefit and must be approved, and its results must be made publicly available;
  • Safe settings: data is only accessible in secure environments; 
  • Safe data: the data is de-identified as much as possible; and
  • Safe outputs: the outputs of analysis must not identify any individuals.
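As an illustration of the "safe outputs" idea, one common approach is to suppress small counts before a table is published, so that a category with very few respondents cannot point to specific people. The sketch below assumes a threshold of 5 and a hypothetical helper named suppressed_counts; neither comes from the ONS framework, and the right threshold depends on your own disclosure policy.

```python
from collections import Counter

MIN_CELL_SIZE = 5  # assumed threshold, not an official value


def suppressed_counts(values, min_cell_size=MIN_CELL_SIZE):
    """Count each category, replacing counts below the threshold with None."""
    counts = Counter(values)
    return {
        category: (count if count >= min_cell_size else None)
        for category, count in counts.items()
    }


# Made-up survey responses for illustration only.
responses = ["Party A"] * 12 + ["Party B"] * 7 + ["Party C"] * 2
table = suppressed_counts(responses)
print(table)
```

Suppression on its own may not be enough: if the overall total is also published, a single suppressed cell can be recovered by subtraction, so totals and related tables need to be reviewed together.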

Concerns about privacy and individual protection arise when collecting personal data on subjects such as race, gender, age, and socioeconomic status. The decision to collect, disseminate, and use such data involves a constant tension between the public interest and the need for privacy. Though additional data points can help us better understand a situation, they may also compromise the anonymity of the data and expose individuals to re-identification risks, which is why everyone using personal data, including public officials, must carefully consider these trade-offs when selecting which data to use.

Privacy Enhancing Technologies

In addition to best practices that endeavor to keep data private, there are also more sophisticated emerging technologies, known collectively as Privacy Enhancing Technologies (PETs), which can better protect the privacy and security of data, especially when data are shared.

PETs work by limiting access to individual data, either by transforming it, encrypting it, or storing it on a different system, while still enabling analysis. PETs can be an enabler for innovation by allowing for the safe sharing and processing of data. PETs can also enhance privacy in existing projects. But PETs cannot fully address the privacy challenges in each data-sharing system and must be applied within a wider data privacy and protection infrastructure.

A detailed overview of PETs is available from the Centre for Data Ethics and Innovation. The most common PETs include: 

  • Encryption is one of the principal security technologies used to protect information. Encryption converts legible data into ciphertext, a representation of the data that cannot be read by a human or a computer without the key. The data can only be read by first decrypting it, which requires access to an appropriate decryption key. Thus, data is kept secret from everyone who does not have access to this key.
  • A de-identification technique is defined as any data transformation or modification that reduces the amount of information about an individual or entity in a dataset, and/or reduces the risk that an individual or entity can be re-identified. These methods are distinguished from the PETs described above in that they involve direct manipulation of the raw data, rather than being mechanisms for protecting confidentiality whilst maintaining maximum utility over the underlying data. Some examples of de-identification techniques are: 

      • Redaction: deleting an entire record or field, or obfuscating part of a record or field (e.g. deleting all but the last 4 digits of a credit card number)
      • Tokenization: replacing a real value with a randomly generated value
      • Hashing: applying a function to a value to produce a fixed-length value (or hash)
      • Generalization: transforming a value to a less precise or bucketed value, e.g. replacing a height of 179cm with a range 170-180cm
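The four de-identification techniques listed above can be sketched in a few lines of Python using only the standard library. All function and class names here are hypothetical. Two choices go beyond the text and are assumptions: the hashing example uses a keyed hash (HMAC-SHA256) rather than a plain hash, because plain hashes of guessable values such as phone numbers can be reversed by trying every candidate, and the generalization example uses 10cm buckets.

```python
import hashlib
import hmac
import secrets


def redact_card_number(card_number: str) -> str:
    """Redaction: mask every digit except the last 4."""
    digits = [c for c in card_number if c.isdigit()]
    hidden = max(len(digits) - 4, 0)
    return "*" * hidden + "".join(digits[hidden:])


class Tokenizer:
    """Tokenization: replace a real value with a randomly generated token."""

    def __init__(self):
        # The lookup table links tokens back to real values,
        # so it must itself be stored securely.
        self._tokens = {}

    def tokenize(self, value: str) -> str:
        if value not in self._tokens:
            self._tokens[value] = secrets.token_hex(8)
        return self._tokens[value]


def keyed_hash(value: str, key: bytes) -> str:
    """Hashing: produce a fixed-length value (keyed with HMAC-SHA256 here)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def generalize_height(height_cm: int, bucket: int = 10) -> str:
    """Generalization: replace an exact height with a range."""
    low = (height_cm // bucket) * bucket
    return f"{low}-{low + bucket}cm"


print(redact_card_number("4111 1111 1111 1234"))  # ************1234
print(generalize_height(179))                     # 170-180cm
```

A token or keyed hash maps the same input to the same output, so records can still be linked across datasets; that is useful for analysis but also means these values remain pseudonymous rather than anonymous.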

To determine which types of PETs may be beneficial in your upcoming projects, the Centre for Data Ethics and Innovation’s decision-tree Adoption Guide can be helpful (see illustration below).  

Another comprehensive overview of relevant privacy-preserving techniques can be found in this Handbook from the Big Data UN Global Working Group.

Centre for Data Ethics and Innovation’s PET Adoption Guide

Protecting Data Through Proper Storage

Data storage needs to be handled by a database management system (DBMS). A DBMS is a collection of programs that manages the database structure and controls access to the data stored in the database. A relational database management system (RDBMS) is used for creating, storing and connecting structured data and then rapidly retrieving it via a query language.

Rules can be applied for data security, connecting data and enforcing referential integrity. This is vital to ensure data quality, which cannot be guaranteed using unmanaged data stores such as Excel. The most widely used commercial DBMS are Microsoft SQL Server, Oracle, Sybase and IBM Db2. There are also open-source examples including MySQL, MariaDB, MongoDB and PostgreSQL. (UN DESA)
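To make the idea of referential integrity concrete, here is a minimal sketch using SQLite, chosen only because it ships with Python; it is not one of the systems named above, and the table and column names are invented for illustration. The database refuses a household row that points to a district that does not exist, which a spreadsheet would accept without complaint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE district (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)
conn.execute(
    """
    CREATE TABLE household (
        id INTEGER PRIMARY KEY,
        district_id INTEGER NOT NULL REFERENCES district(id),
        members INTEGER NOT NULL CHECK (members > 0)
    )
    """
)

conn.execute("INSERT INTO district (id, name) VALUES (1, 'North')")
conn.execute("INSERT INTO household (district_id, members) VALUES (1, 4)")

# District 99 does not exist, so the database rejects this row.
try:
    conn.execute("INSERT INTO household (district_id, members) VALUES (99, 3)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Connected tables can then be queried together.
rows = conn.execute(
    """
    SELECT d.name, SUM(h.members)
    FROM household AS h
    JOIN district AS d ON d.id = h.district_id
    GROUP BY d.name
    """
).fetchall()
print(rejected, rows)
```

The same pattern of primary keys, foreign keys and check constraints is available in the relational systems listed above, although the exact syntax and defaults differ between products.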

Related Use Cases

Ecuador
Leveraging Big Data to Build Sustainable and Connected Cities: The Quito Case
DRC
Using Mobile Phone Data for Effective Public Health Measures during Pandemics