Identify reliable data sources

You already have a good understanding of your current data ecosystem, meaning you know which data sets are available to you and where you can find them (if not, see this guidance). What you need now is additional data and information. This section will support you in identifying new, trusted data sources to address your data needs.

What to expect

This section provides an overview of conventional and emerging data sources that can complement the sources you have already identified. You may be wondering how far you can trust these new sources, so you will also find guidance on assessing their reliability.

How to get started

A good first step is to create a consolidated list of the data needed and not yet available. The list should be informed by the earlier problem definition and data ecosystem mapping exercises. If the list is long, prioritizing the different data needs can help.

As an example, let’s assume you are working on a policy to improve the situation of people with disabilities in your country. You may already know how many people have a disability, what type of disability they have and what their background is (e.g. education level, income level). However, you lack data on their everyday lives: Can they access public spaces (e.g. public transport, theaters)? Is sufficient housing available for people with a disability?

Your data gaps are therefore:

  • Wheelchair accessibility, as well as accessibility for blind people, in public spaces.
  • Number and size of apartments that are accessible by wheelchair.
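
If your list of data gaps grows, it can help to record it in a structured form and rank it. The following is a minimal sketch of one possible approach, not a prescribed method. The `DataGap` class, the two scoring fields and the scores themselves are assumptions made for illustration; your team would define its own criteria and values.

```python
from dataclasses import dataclass


@dataclass
class DataGap:
    """One entry in the consolidated list of missing data (hypothetical structure)."""
    description: str
    policy_relevance: int  # 1 (low) to 5 (high), assigned by your team
    expected_effort: int   # 1 (easy to obtain) to 5 (hard to obtain)

    @property
    def priority(self) -> float:
        # Higher relevance and lower effort give a higher priority.
        return self.policy_relevance / self.expected_effort


def prioritize(gaps):
    """Return the gaps sorted from highest to lowest priority."""
    return sorted(gaps, key=lambda gap: gap.priority, reverse=True)


# Illustrative scores only; they are not taken from any real assessment.
gaps = [
    DataGap("Wheelchair and blind accessibility of public spaces", 5, 4),
    DataGap("Number and size of wheelchair-accessible apartments", 4, 2),
]

ranked = prioritize(gaps)
for gap in ranked:
    print(f"{gap.priority:.2f}  {gap.description}")
```

A simple ratio like this is only a starting point for discussion. You may prefer to weight criteria differently or to add others, such as urgency.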

As a reminder, below you will find different types of data that you may be looking for.

Figure 1: Overview of conventional and emerging data types

The next question is where to find these data.

Data sources in government

The most established data sources are often the National Statistics Office (NSO) and national, regional and municipal governments and their affiliated agencies.

Linking back to the example above, the data on the number of people with a disability, the type of disability and their background may have been provided by your NSO.

You are likely aware of these sources and the data they provide. You may still find the following summary helpful:

Exemplary data

  • Population size and demographic data (age, sex, language spoken)
  • Household data (size, type of housing, ownership status, location)
  • Economic data (sources, level of income)

Common sources

National Statistics Office

Exemplary data

  • Education level
  • Birth records, contraceptive use, attitudes about family planning
  • Maternal and child health
  • Malaria, HIV/AIDS, Tuberculosis, Diabetes

Common sources

Ministry of Health

Exemplary data

  • Individual women: birth history, post-natal health, domestic violence, life satisfaction
  • Individual men: fertility, domestic violence, life satisfaction
  • Children: breastfeeding, child labor, child discipline, parental involvement

Common sources

National Statistics Office, Ministry of Health, UNICEF

Exemplary data

  • Income, working hours
  • Duration and type of contract, social protection and benefits
  • Employer, responsibilities and tasks

Common sources

National Statistics Office, Ministry of Labor, International Labour Organization (ILO)

Exemplary data

  • GDP
  • Inflation rates
  • Unemployment rates

Common sources

National Statistics Office, National Planning Office/Department

Exemplary data

  • Civil registries, farmer registries
  • Tax records
  • Social protection records

Common sources

Different ministries, e.g. Ministry of Finance for tax records

Data sources in the non-government, non-profit sector

More relevant in the context of identifying new data sources are non-government, non-profit public sources:

  1. Independent national research institutions and think tanks: These organizations often collect broad statistical and survey data on a wide range of areas at a large scale. The specifics of the data collected depend on each organization’s research focus. For example, a think tank researching disabled people in a country may provide insights into their living and housing standards. These organizations have often also carried out their own analysis, such as case studies and political analysis, which makes them a good source of qualitative data on their research topics.
  2. Regional, national and local NGOs and other civil society actors: These organizations often collect relatively specific, sometimes sensitive, data linked to a concrete problem for a group of citizens. The data is often collected at a local level, potentially also covering hard-to-reach places. These organizations may fill data gaps that are overlooked by official statistics, making data sets more inclusive. Examples include data on public transport wheelchair accessibility (bus stops, train stations), data on domestic violence and data on marine litter on public beaches. See this use case from Ghana where citizen-generated data provided by NSOs was used.
  3. International organizations: International organizations provide a wide variety of data, depending on their mandates and focus areas. These data typically combine national-level data to provide insights on regions, groups of countries and the world. Coming back to the disability example, global data and survey methods from the World Health Organization (WHO) can provide additional insights, as can the Disability Data Portal and the United Nations Disability Statistics.

New trends for data collection:

Citizen-generated data is the practice of involving the public (the non-scientific community) in gathering knowledge. While it can take different forms, it generally involves citizens collecting and sharing specific information for a dedicated, often non-profit, organization. Examples include volunteers assisting NASA in identifying clouds on Mars and Argentinians separating and documenting their solid waste in townships. This practice is becoming increasingly relevant as technologies make data collection easier. For political decision makers, it can offer significant benefits because more relevant data is collected at a local level, providing important insights for policy design.

Data sources in the private sector

Private sources can generally be split into two different categories.

The first category includes companies (e.g. multinationals and start-ups) that collect data for their own operations. A standard example is social media companies, which hold data on user interests because users share it on their profiles or in their posts. Analysing this data can, for example, help to understand how ideas and movements on disability spread on social media and how they are perceived.

The second category consists of service providers that specialize in data collection and analysis (for example, to conduct market studies).

While both may be relevant data suppliers, the first category provides more data that can be used for policymaking. Focusing on that category, the following types of companies may be relevant to you:

Examples of data provided

  • Call detail records
  • Profile data (age, gender)
  • Location data (number of people in specific locations)
  • Usage (number and duration of calls and text messages)
  • Spend data (monthly charges, currency, on-time payments)

Examples of companies

Orange, Vodafone, Safaricom, Grameenphone

Common sectors

Migration / Refugees, Health, Transport & Infrastructure

Related use cases

  • Argentina – Public Transport
  • Moldova – Refugee Management
  • Mexico - Safer Public Spaces for Women
  • Malawi - Health Care
  • DRC – Health Care / Covid-19
  • Ecuador - Sustainable and Connected Cities

Additional resources

Series on “Using MNO data” by the Digital Impact Alliance (DIAL)

Examples of data provided

  • User details and preferences (age, interests)
  • Engagement of users

Examples of companies

Meta (Data for Good), LinkedIn, WhatsApp

Common sectors

Labour, Health, Diversity & Inclusion, Governance

Related use cases

Mexico - Economic Development

Additional resources

Gender Social Media Monitoring Tool by UNDP

Examples of data provided

  • Real-time traffic data
  • Infrastructure quality

Examples of companies

Waze, Google Maps

Common sectors

Health, Transport & Infrastructure

Related use cases

Additional resources

Waze for Cities program

Examples of data provided

  • Daily data on changes in land surfaces, oceans and atmosphere
  • Crisis data (land after earthquakes and floods)
  • Climate data (sea levels, air quality, weather)
  • Agriculture data (soil health, vegetation, livestock size)
  • Urban planning and rural development data (economic development, energy consumption, informal settlements)

Examples of companies

SpaceKnow, Planet

Common sectors

Crisis, Climate, Agriculture, Food Security, Urban Planning, Rural Development

Related use cases

Additional resources

Integrated Spatial Planning Workbook by UNDP

A number of private companies share open data that are not necessarily related to their core business. Examples include:

The data provided by these sources may be closed data, shared data or openly available data. To learn more about how to access them, see the next section on accessing and collecting data.

Identify the right data source for your needs

Identifying the data source that suits your needs is often an iterative process. It usually requires checking in with your NSO to confirm what data they have available. In addition, someone will need to conduct brief research (desk research, expert interviews) on which non-governmental and private organizations exist in the relevant field. After that, it has proven helpful to engage directly with these actors to understand the quality of their data and their willingness to share it.

In that process, there are two key considerations to keep in mind:

  1. Costs of accessing and collecting the data (vs. the benefit provided by the data). While there are many open datasets available, more granular datasets may come with additional costs. In addition, server costs and time for analysing the data need to be considered.
  2. Trustworthiness of the data source. Assessing this will help you to understand whether the data provided are relevant, accurate, timely and ethically produced. Let’s dive deeper into that.

Assess data source trustworthiness

Reliable data, and thus trustworthy data sources, are essential for policymakers to create effective policies. The previous section on “identifying data gaps” provided guidance on ensuring a conventional data source is complete, timely, accurate and of sufficient quality.

With new data sources, however, the challenge of knowing when to trust a source becomes more complex. The data are usually collected for purposes other than informing policy, which raises questions about the data collection and management methods, the representativeness of the data and the privacy standards applied. Especially in the case of non-profits, the team managing the data may have limited resources, leading to an increased risk of quality issues with the data. What is more, new data sources often provide large datasets, which can amplify existing biases. For example, in countries where a high share of the population is not yet using the internet, data collected from online activities (social media) may not be representative. In addition to the measures outlined in the previous section, you may want to consider the following aspects, which build on this resource.
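
To make the representativeness concern concrete, a data analyst on your team could compare the composition of a new dataset with reference figures, for example from your NSO. The sketch below is one simple way to do this under stated assumptions: the function name `coverage_gaps`, the group labels, the 5 percentage point tolerance and all numbers are hypothetical and chosen for illustration only.

```python
def coverage_gaps(sample_counts, reference_shares, tolerance=0.05):
    """Compare group shares in a dataset with reference population shares.

    Returns {group: (sample_share, reference_share)} for every group whose
    share in the dataset differs from the reference by more than `tolerance`.
    """
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("sample_counts must contain at least one record")

    flagged = {}
    for group, reference_share in reference_shares.items():
        sample_share = sample_counts.get(group, 0) / total
        if abs(sample_share - reference_share) > tolerance:
            flagged[group] = (round(sample_share, 3), reference_share)
    return flagged


# Illustrative numbers only: records per group in a social media dataset,
# and the corresponding population shares from a reference source.
sample_counts = {"urban": 800, "rural": 200}
reference_shares = {"urban": 0.55, "rural": 0.45}

flagged = coverage_gaps(sample_counts, reference_shares)
for group, (sample_share, reference_share) in flagged.items():
    print(f"{group}: {sample_share:.0%} in dataset vs {reference_share:.0%} in population")
```

A flagged group does not mean the dataset is unusable. It signals that results may need to be weighted, complemented with other sources or interpreted with caution.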

Analyze the data producer and owners

Are they reputable? Organizations that are larger or embedded in the local ecosystem can be trusted more easily. Asking trusted partners confidentially about a certain organization may be another way to double check the reputation of an organization. Further, critically questioning the potential political agenda of an organization is crucial.

Do they have a track record of ethical data production? This can be explored by seeing if they worked with other governments in the past, published open datasets or provided resources to help the public use data more ethically.

Do they have sufficient resources available to answer your requests, provide insights into the data and continue to collect the data ethically? The information may be available online. Alternatively, through engaging with them over time, you will be able to get deeper insights into their resources.

Ensure transparency over data

Are data being released in clean datasets? This question may be answered by the NSO or a data analyst on your team.

Has sufficient metadata and other contextual information been provided? There are a few key questions that you can ask the data provider and then double check their answers with your data experts:

  • How do you benchmark your data? How do you know you are measuring what you want to measure? In case of surveys, can you share the survey questions?
  • How often do you revise your measurement? What causes you to revise it?
  • Do you seasonally adjust your data? If so, how do you do it?
  • If several data sources are combined into one indicator, what is the underlying methodology?
  • Are you covering a representative sample? How do you ensure you do?
  • What data privacy measures are in place?

Parts of the information may not be available publicly but should be provided by the data sources upon request. Ensuring you have this level of transparency with the data source will also be important for data analysis later, as comparing and combining data requires your team to have a good understanding of the methodology behind the datasets.

Once data has been shared, does the data source allow for feedback on datasets as well as independent verification of sensitive data? Ideally, the data source agrees to a continuous engagement where feedback provided by you is taken into consideration.

Ensuring the credibility of citizen-generated data may seem tricky. Luckily, there is already a good amount of research available on strategies to address this. These include:

  • Formulate minimum standards for volunteers and provide them with training
  • Collaborate with academic partners (university lab) to develop the project
  • Designate experienced volunteers or academic partners to oversee data collection in the field
  • Employ additional technology to verify data entries (photo-based, automated location recording)
  • Include statistics-driven flagging of incorrect data
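
As an illustration of the last point, statistics-driven flagging can be as simple as marking entries that lie far from the typical value so that someone reviews them. The sketch below uses the median absolute deviation (MAD), a common robust outlier measure. This is one possible technique, not necessarily the one used in the research referred to here. The function name `flag_outliers`, the threshold of 3.5 and the example values are assumptions.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the indices of entries whose modified z-score exceeds `threshold`.

    The score is based on the median absolute deviation (MAD), which is less
    sensitive to extreme entries than the mean and standard deviation.
    """
    med = median(values)
    mad = median([abs(value - med) for value in values])
    if mad == 0:
        # No spread to compare against; such data needs manual review instead.
        return []
    return [
        index
        for index, value in enumerate(values)
        if 0.6745 * abs(value - med) / mad > threshold
    ]


# Illustrative values only: litter items counted per beach survey by volunteers.
counts = [12, 15, 11, 14, 13, 140, 12]
suspicious = flag_outliers(counts)
print(suspicious)
```

Flagged entries are candidates for review rather than proven errors. An unusually high count may be a typing mistake, but it may also reflect a real local event.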

More information can be found here.

Have you successfully identified the right data sources? What comes next?

After working through this section, you should have a list of (new) data sources that will provide more relevant data to address your policy problem. You may have to go back to the ecosystem mapping exercise and the “identify data gaps” section to double check whether all data needs are covered.

In addition, it is important to have confidence in the data sources identified. This means that each source provides complete, timely and accurate data of sufficient quality, with the necessary privacy measures in place, as outlined above.

It is important to remember that this is an iterative process. You may have identified a private sector partner as a new data source now and realize later that the collaboration does not materialize. That is normal.

The next step is to get access to the data or collect it yourself.

Resources

Framework: IMF Data Quality Assessment
by IMF

Related Use Cases

Mexico
From Data to Action: Creating Safer Public Spaces for Women in Mexico City
Moldova (Energy Vulnerability)
Data-Driven Collaboration: Using Technology to Support Refugee Management in Moldova