Access & collect data

Do you have an overview of the data you need and wonder how to access it? This section gives an overview of what you, as a policymaker, need to know about accessing and collecting data.

What to expect

This section provides an overview of what to consider when accessing and collecting data from new sources, such as the private sector. Even though this step will often be taken by data analysts and technical experts, it is imperative that policymakers understand the potential, limitations and risks involved when working with new data sources. At the end of this section, policymakers will be able to make meaningful requests to their data providers, decide which data sources to use, and identify the legislative frameworks that should be considered when working with new data.

How to get started

Let’s first start with some common ways to access data.

Open data

Open data is data that anyone can access, use or share. This has three advantages.

  1. Open data often does not require any actions to adhere to data protection regulations.
  2. Integrating open data into government systems is technically easy.
  3. Using open data is free of charge.

As a best practice, policymakers should always let the open data provider know that they are accessing the data and what they are using it for. Because open data can be accessed freely, providers often do not know who uses their data or how valuable it is, which makes it difficult for them to improve aspects such as quality or interoperability. In turn, policymakers should also find out if the data or metadata they are collecting can be published openly as well.

Data from within your government

Governments are collecting data all the time and have great resources. It may well be that the data you need already exists somewhere in your organization. The section on mapping your data ecosystem can help you identify which data you can find where and what the formal requirements are to access it.

Data from other governments

Sharing data between countries is important for better information on climate risks, law enforcement, public health and other topics. However, there are many ethical, legal and practical considerations that need to be taken into account when sharing data between different governments. A cross-border data sharing agreement is a formal agreement between two or more countries that outlines the terms and conditions under which data will be shared between them. Its purpose is to facilitate the exchange of information while protecting the privacy and security of the data being shared. The agreement typically specifies what types of data will be shared, how it will be shared and under what conditions it can be used.

Data from external partners

Government entities can partner with non-governmental organizations, academia or private entities to access data. In fact, most governments have mechanisms for systematic engagement, especially with non-profit organizations. For example, a public health agency could partner with a hospital to access patient health records for research purposes. Larger organizations typically offer Application Programming Interfaces (APIs), which allow public institutions to connect directly to data sources and extract data in real time. APIs are commonly used for accessing data from social media platforms, weather services and financial data providers.

For instance, the API of the Humanitarian Data Exchange could be used to automatically access, filter, and sort data and integrate it into your customized dashboard. It is important to note that obtaining data from external partners may also require legal agreements, such as data sharing agreements or non-disclosure agreements. To learn more about data sharing agreements, see the section on building data partnerships.
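To make the idea of API access more concrete, the following is a minimal Python sketch rather than a definitive implementation. It assumes that the data portal exposes a CKAN-style search endpoint (the `BASE_URL` below is an assumption and should be checked against the provider's documentation). The function names and the sample payload are hypothetical and only illustrate how search results could be filtered and sorted before they feed into a dashboard.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the portal offers a CKAN-style "package_search" endpoint.
BASE_URL = "https://data.humdata.org/api/3/action/package_search"


def build_search_url(query: str, rows: int = 10) -> str:
    """Build the search URL for a free-text query."""
    return f"{BASE_URL}?{urlencode({'q': query, 'rows': rows})}"


def fetch(url: str) -> dict:
    """Download and decode the JSON response (requires network access)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def summarize_results(payload: dict) -> list[dict]:
    """Keep the title and modification date of each dataset, newest first."""
    datasets = payload.get("result", {}).get("results", [])
    summary = [
        {"title": d.get("title", ""), "modified": d.get("metadata_modified", "")}
        for d in datasets
    ]
    return sorted(summary, key=lambda d: d["modified"], reverse=True)


# Illustrative payload in the shape a CKAN-style API returns (not real data).
sample_payload = {
    "success": True,
    "result": {
        "results": [
            {"title": "Example dataset A", "metadata_modified": "2023-01-15T10:00:00"},
            {"title": "Example dataset B", "metadata_modified": "2023-06-01T08:30:00"},
        ]
    },
}

search_url = build_search_url("flood risk", rows=5)
latest_first = summarize_results(sample_payload)
```

In practice, your data specialists would call `fetch(search_url)` instead of using the sample payload, and would follow the access terms set by the data provider.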

A large share of the data collected worldwide is held by private technology companies. While several platforms (like Meta) provide some of their data openly, more granular data is often only accessible for a fee. In some cases, private companies may not be willing to share their data at any price due to privacy or proprietary concerns.

The cost of obtaining data from private companies can vary greatly depending on a number of factors such as the type, quantity and quality of the data, the level of detail required and the company's policies on data sharing. In general, the cost can range from a few hundred dollars to tens of thousands of dollars or more. Some companies may offer standardized data packages with set prices, while others may negotiate on a case-by-case basis depending on the specific needs of the requester. Before purchasing any data, ensure that your servers and local IT infrastructure have sufficient computational capacity to manage the new data sets. Geospatial data, in particular, can be computationally intensive to process.

Formulating a data request

A critical step in accessing data is to send a meaningful data request to whoever owns the data. It is important to be very clear about the information you need and the format in which it should be delivered to you. The figure below provides an example of a data request, highlighting the main aspects that you should pay attention to when formulating your own data request.

Figure 1: Formulate a data request | Source: GIZ
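One way to check that a data request is complete before sending it is to write it down in a structured form. The Python sketch below is only an illustration: the field names (`purpose`, `variables`, `geographic_coverage`, `time_period`, `granularity`, `delivery_format`) are assumptions chosen for this example and are not taken from the figure. Adapt them to what you and your data provider need.

```python
from dataclasses import dataclass, fields


@dataclass
class DataRequest:
    """Hypothetical structure of a data request (field names are assumptions)."""
    purpose: str
    variables: list[str]
    geographic_coverage: str
    time_period: str
    granularity: str
    delivery_format: str


def missing_items(request: DataRequest) -> list[str]:
    """Return the names of all fields that were left empty."""
    return [f.name for f in fields(request) if not getattr(request, f.name)]


# Illustrative draft request; the time period has not been specified yet.
draft = DataRequest(
    purpose="Plan the location of flood shelters",
    variables=["population count", "age group"],
    geographic_coverage="Whole country, district level",
    time_period="",
    granularity="Monthly aggregates",
    delivery_format="CSV",
)
gaps = missing_items(draft)
```

Here `gaps` lists the fields that still need to be filled in, which helps avoid back-and-forth with the data owner.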

Data collection from public institutions

After exploring the existing data within and outside your organization, you may still be missing relevant information, so additional data has to be collected through government surveys, administrative systems, focus groups or field research. The collection of data by public institutions has advantages and disadvantages, which should be carefully considered before starting such a process. In some cases, it is sufficient to use an approximation or similar data that has already been collected. Therefore, a first step should always be to consult with your National Statistical Office (NSO) and the data specialists in your entity.

Advantages

Large sample size

Government surveys can reach large sample sizes, making it possible to generate statistically representative data that accurately reflects the population of interest and is granular enough to yield more meaningful results. Data that can be disaggregated by region, gender and other characteristics is often more inclusive and helps ensure that marginalized groups are not excluded from policy design.

Controlled methodology

As public institutions set up their own data collection process, they have full control over the collection design. This allows them to specify the sampling strategy (for example, simple random, cluster or stratified sampling), the survey design (through standardized questionnaires) and the collection methods (face-to-face interviews, phone surveys or online surveys). When the same methodology is applied consistently, the data collected is reliable and comparable across different geographic areas and time periods.
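As an illustration of one of the sampling strategies mentioned above, the following Python sketch draws a stratified random sample with proportional allocation. It is a simplified example under stated assumptions, not a recommended survey design: the sampling frame, the `region` field and the function name `stratified_sample` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(records, stratum_key, sample_size, seed=0):
    """Draw a proportionally allocated stratified random sample."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced

    # Group the sampling frame into strata (for example, by region).
    strata = defaultdict(list)
    for record in records:
        strata[record[stratum_key]].append(record)

    total = len(records)
    sample = []
    for members in strata.values():
        # Proportional allocation, with at least one unit per stratum.
        k = max(1, round(sample_size * len(members) / total))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample


# Hypothetical sampling frame: 80 urban and 20 rural households.
frame = [{"id": i, "region": "urban" if i < 80 else "rural"} for i in range(100)]
sample = stratified_sample(frame, "region", sample_size=10)
counts = {
    region: sum(1 for r in sample if r["region"] == region)
    for region in ("urban", "rural")
}
```

Because of rounding and the minimum of one unit per stratum, the final sample can differ slightly from the requested size in other settings. Your NSO or data specialists can advise on the appropriate allocation.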

Disadvantages

Time- and resource-intensive

Designing and conducting government data collection and analysing the results can be time-consuming and resource-intensive, requiring careful planning and management. Because these processes take so long, the information derived from the data is often outdated by the time it can be used for policymaking.

Limited flexibility

Due to bureaucratic processes, data collection in public institutions is typically designed in advance with limited flexibility to adjust the survey questions or sampling strategy in response to unexpected developments or changes in research questions.

Ad-hoc collection

The ad-hoc collection of data, as opposed to periodic data collection, often makes it difficult to compare results with previous and future data. It only provides a snapshot of the current situation, which makes it difficult to derive meaningful policy decisions.

Data silos

Data silos in governments occur when data is stored and managed separately within different departments or agencies, making it difficult for data to be shared or integrated across different parts of the organization. Therefore, it is essential to coordinate any form of data collection with other departments, especially the NSO. In many instances, similar datasets have already been collected by another entity.

A major data gap arises out of disruptions in how data flows from one entity to another within the national statistical system. If you have done the data ecosystem mapping exercise, you have most likely identified some gaps that emerge when data does not flow from one stakeholder to another in a timely manner and decisions are made without that information. Here are some tips for ensuring that data flows systematically:

  • Start where the biggest source of your data is. In most cases, this may be the National Statistical Office (NSO) or the agency focal points that have the mandate to collect data on the topic in question. Ask for all the data you may need.
  • Ensure all relevant stakeholders are considered when looking at data flows. In some cases, local stakeholders such as civil society organizations may have data that completes the puzzle, for example data on specific vulnerable communities or tribes. However, it is easy to overlook them when other stakeholders hold the majority of the data.
  • Build structured mechanisms so that these data flows are institutionalized and there are fewer ad-hoc data requests.

Personal data

Data that is relevant to policymaking often contains sensitive, personal information about individuals. This information needs to be protected and cannot simply be shared or accessed without considering relevant legislation. Before accessing and collecting any data, it is important to find out whether the information it contains falls under any data protection regulation and, if it does, to take the necessary steps to protect data privacy.

If you are unsure about which data protection regulation applies and how to proceed, see the section on data governance.

Reliability and data quality

Public institutions must ensure that the data they access is accurate, reliable and relevant to their needs. They must establish procedures for data quality assurance and data verification to prevent data errors and inconsistencies. See the section on identifying data sources and ensuring reliability for more information.
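A basic data quality check can be sketched in a few lines of Python. The example below is a minimal illustration, assuming that records arrive as a list of dictionaries. The function name `quality_report`, the field names and the sample records are hypothetical. It counts records with missing required values and records whose identifier has already appeared.

```python
def quality_report(rows, required_fields, id_field):
    """Count records with missing required values and repeated identifiers."""
    # A record counts as incomplete if any required field is None or empty.
    missing = sum(
        1 for row in rows
        if any(row.get(field) in (None, "") for field in required_fields)
    )

    # A record counts as a duplicate if its identifier was seen before.
    seen, duplicates = set(), 0
    for row in rows:
        identifier = row.get(id_field)
        if identifier in seen:
            duplicates += 1
        seen.add(identifier)

    return {"rows": len(rows), "missing": missing, "duplicates": duplicates}


# Illustrative records (not real data).
records = [
    {"id": 1, "district": "North", "population": 1200},
    {"id": 2, "district": "", "population": 950},
    {"id": 2, "district": "South", "population": 950},
    {"id": 3, "district": "East", "population": None},
]
report = quality_report(records, ["district", "population"], "id")
```

Checks like these do not replace a full quality assurance procedure, but they can flag obvious errors and inconsistencies before the analysis starts.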

Have I successfully accessed and collected the data I need?

Once the desired data has been accessed and collected, the data specialists you are working with should have all the resources they need to start the data analysis. To check if this is the case, you can ask the data specialists the following questions:

  • Are all data sources directly accessible to you, so that you can start the data analysis?
  • Did you take care of data privacy and other ethical concerns?  

If the answer to both questions is yes, then you have successfully accessed and collected your data.

What comes next?

With the data at hand, you are now ready to start thinking about the data analysis. This includes validating data, thinking about adequate visualizations and finally interpreting the results. At this stage, it will be important to forge partnerships with other stakeholders and ensure that your data is enriched with qualitative assessments.

Resources

Report: What is open data?
by ODI

Related Use Cases

DRC
Using Mobile Phone Data for Effective Public Health Measures during Pandemics
Ecuador
Leveraging Big Data to Build Sustainable and Connected Cities: The Quito Case