Identify data sources & ensure reliability

Identify reliable data sources

What to expect

This section provides an overview of conventional and emerging data sources that can complement the sources you have already identified. Because you may wonder whether these new sources can be trusted, it also offers guidance on identifying reliable data sources.

How to get started

A good first step is to create a consolidated list of the data needed and not yet available. The list should be informed by the earlier problem definition and data ecosystem mapping exercises. If the list is long, prioritizing the different data needs can help.

As an example, let’s assume you are working on a policy to improve the situation of people with disabilities in your country. You may already know how many people have a disability, what type of disability they have and what their background is (e.g. education level, income level). However, you lack data on their everyday life: Can they access public spaces (e.g. public transport, theaters)? Is sufficient housing available for people with a disability?

Your data gaps are therefore:

Wheelchair accessibility, as well as accessibility for blind people, in public spaces.
Number and size of apartments that are accessible by wheelchair.
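The consolidated and prioritized list described above can be kept in any form, including a simple spreadsheet. As a minimal sketch, assuming a hypothetical 1 to 5 relevance score that your team assigns to each data need, it could look like this in Python (the class, function and field names are illustrative and not part of this guide):

```python
from dataclasses import dataclass


@dataclass
class DataNeed:
    description: str
    policy_relevance: int  # hypothetical score: 1 (low) to 5 (high), assigned by the team
    available: bool        # True if the data is already in hand


def prioritize_gaps(needs):
    """Return the needs that are not yet available, most relevant first."""
    gaps = [need for need in needs if not need.available]
    return sorted(gaps, key=lambda need: need.policy_relevance, reverse=True)


# Entries taken from the disability example; the scores are made up for illustration.
needs = [
    DataNeed("Number of people with a disability, by type", 5, True),
    DataNeed("Number and size of wheelchair-accessible apartments", 4, False),
    DataNeed("Wheelchair and blind accessibility of public spaces", 5, False),
]

for gap in prioritize_gaps(needs):
    print(gap.policy_relevance, gap.description)
```

The data already provided by the NSO drops out of the list, and the remaining gaps are ordered by the relevance your team assigned to them.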

As a reminder, below you will find different types of data that you may be looking for.

Figure 1: Overview of conventional and emerging data types

The next question is, where to find these data?

Data sources in government

The most established data sources are often the National Statistics Office (NSO) and national, regional and municipal governments and their affiliated agencies.

Linking back to the example above, the data on number of people with a disability, the type of disability and their background may have been provided by your NSO.

You are likely aware of these sources and the data they provide. However, you may still find the following summary helpful:

Exemplary data

Population size and demographic data (age, sex, language spoken)
Household data (size, type of housing, ownership status, location)
Economic data (sources, level of income)

Common sources

National Statistics Office

Exemplary data

Education level
Birth records, contraceptive use, attitudes about family planning
Maternal and child health
Malaria, HIV/AIDS, Tuberculosis, Diabetes

Common sources

Ministry of Health

Exemplary data

Individual women: birth history, post-natal health, domestic violence, life satisfaction
Individual men: fertility, domestic violence, life satisfaction
Children: breastfeeding, child labor, child discipline, parental involvement

Common sources

National Statistics Office, Ministry of Health, UNICEF

Exemplary data

Income, working hours
Duration and type of contract, social protection and benefits
Employer, responsibilities and tasks

Common sources

National Statistics Office, Ministry of Labor, International Labour Organization (ILO)

Exemplary data

GDP
Inflation rates
Unemployment rates

Common sources

National Statistics Office, National Planning Office/Department

Exemplary data

Civil registries, farmer registries
Tax records
Social protection records

Common sources

Different ministries, e.g. Ministry of Finance for tax records

Data sources in the non-government, non-profit sector

More relevant in the context of identifying new data sources are non-government, non-profit public sources:

Independent national research institutions and think tanks: These organizations often collect broad statistical and survey data on a wide range of areas at a large scale. The specifics of the data collected depend on each organization’s research focus. For example, a think tank researching people with disabilities in a country may provide insights into their living and housing standards. Furthermore, these organizations have often already produced analyses, such as case studies and political analyses, which makes them a great source of qualitative data on their research topics.
Regional, national, and local NGOs and other civil society actors: These organizations often collect relatively specific, sometimes sensitive, data linked to a concrete problem for a group of citizens. Often the data is collected on a local level, potentially also covering hard-to-reach places. They may fill data gaps that are overlooked by official statistics, making data sets more inclusive. For example, they may hold data on the wheelchair accessibility of public transport (bus stops, train stations), on domestic violence or on marine litter on public beaches. See this use case from Ghana where citizen-generated data provided by NSOs was used.
International organizations: International organizations provide a wide variety of data, depending on their mandates and focus areas. These data typically combine national-level data to provide insights on regions, groups of countries and the world. Coming back to the disability example, global data and survey methods from the World Health Organization (WHO), as well as the Disability Data Portal and the United Nations Disability Statistics, can provide additional insights.

World Bank collects and provides socio-economic data such as GDP, inflation, unemployment rate and trade statistics.
ILO collects data on labor markets, including vulnerable employment, care work and refugee labor markets.
United Nations Environment Programme (UNEP) and World Meteorological Organization collect and provide data on climate change, pollution, biodiversity and more.
WHO specializes in health data like vaccination, healthcare spending and disease outbreaks.
United Nations Development Programme (UNDP) can be a great source for political data like governance, stability and human rights.
National Aeronautics and Space Administration (NASA) and European Space Agency (ESA) provide satellite imagery.
United Nations Children’s Fund (UNICEF) collects data on children and the Sustainable Development Goals, including stillbirth data, child and youth mortality data, HIV/AIDS data and immunization data.
Asian Development Bank and Inter-American Development Bank collect social and economic data for their regions.
World Trade Organization (WTO) collects information on trade and trade policy measures.
Inter-Parliamentary Union is a resource with over 600 data points provided directly by national parliaments on their structure, composition, working methods and activities.
United Nations Industrial Development Organization (UNIDO) has disaggregated data on the manufacturing sector, including number of employees and salaries.
Universal Postal Union serves as an analytical tool for both international and national level insights, collecting postal data since 1875.
World Intellectual Property Organization (WIPO) has IP statistics for understanding policy, business and technology worldwide.

New trends for data collection:
Citizen-generated data is the practice of involving the public (non-scientific) community in gathering knowledge. While it can take different forms, it involves citizens collecting and sharing specific information for a dedicated, often non-profit, organization. Some examples include: volunteers assisting NASA in identifying clouds on Mars and Argentinians separating and documenting their solid waste in townships. This form of science is becoming increasingly relevant as technologies make data collection easier. For political decision makers, it can offer significant benefits as more relevant data is collected at a local level, providing important insights for policy design.

Data sources in the private sector

Private sources can generally be split into two different categories.

The first category includes companies (e.g. multinationals and start-ups) that collect data for their own operations. A standard example is social media companies, which hold data on user interests because users share it on their profiles or in their posts. Analysing this data can, for example, help to understand how ideas and movements on disability spread on social media and how they are perceived.

The second category consists of service providers that specialize in data collection and analysis (for example, to conduct market studies).

While both may be relevant data suppliers, the first category provides more data that can be used for policymaking. Focusing on that category, the following types of companies can be of relevance to you:

Examples of data provided

Call detail records
Profile data (age, gender)
Location data (number of people in specific locations)
Usage (number and duration of calls and text messages)
Spend data (monthly charges, currency, on-time payments)

Examples of companies

Orange, Vodafone, Safaricom, Grameenphone

Common sectors

Migration / Refugees, Health, Transport & Infrastructure

Related use cases

Argentina – Public Transport
Moldova – Refugee Management
Mexico - Safer Public Spaces for Women
Malawi - Health Care
DRC – Health Care / Covid-19
Ecuador - Sustainable and Connected Cities

Additional resources

Series on “Using MNO data” by the Digital Impact Alliance (DIAL)

Examples of data provided

User details and preferences (age, interests)
Engagement of users

Examples of companies

Meta (Data for Good), LinkedIn, WhatsApp

Common sectors

Labour, Health, Diversity & Inclusion, Governance

Related use cases

Mexico - Economic Development

Additional resources

Gender Social Media Monitoring Tool by UNDP

Examples of data provided

Real-time traffic data
Infrastructure quality

Examples of companies

Waze, Google Maps

Common sectors

Health, Transport & Infrastructure

Related use cases

Argentina – Public Transport
Ecuador - Sustainable and Connected Cities
Indonesia - Flooding

Additional resources

Waze for Cities program

Examples of data provided

Daily data on changes in land surfaces, oceans and atmosphere
Crisis data (land after earthquakes and floods)
Climate data (sea levels, air quality, weather)
Agriculture data (soil health, vegetation, livestock size)
Urban planning and rural development data (economic development, energy consumption, informal settlements)

Examples of companies

SpaceKnow, Planet

Common sectors

Crisis, Climate, Agriculture, Food Security, Urban Planning, Rural Development

Related use cases

Costa Rica – Climate
Malawi - Health Care
Moldova – Energy
Niger - Agriculture
India – Health Care/Covid-19
Zambia - Urbanisation

Additional resources

Integrated Spatial Planning Workbook by UNDP

A number of private companies share open data that are not necessarily related to their core business. Examples include:

Microsoft Data for Society covers data on equity and inclusion, sustainability and health.
Open Data on Amazon Web Services (AWS) covers a wide range of data.
Google Cloud Datasets and Google Dataset Search provide pre-built solutions and a place to search almost 25 million datasets from across the web.

The data provided by these sources may be closed data, shared data or openly available data. To learn more about how to access them, see the next section on accessing and collecting data.

Identify the right data source for your needs

Identifying the data source that suits your needs is often an iterative process. It usually starts with checking with your NSO what data they have available. In addition, someone will need to conduct brief research (desk research, expert interviews) on which non-governmental and private organizations exist in the relevant field. After that, it has proven helpful to talk directly with these actors to understand the quality of their data and their willingness to share it.

In that process, there are two key considerations to keep in mind:

Costs of accessing and collecting the data (vs. the benefit provided by the data). While there are many open datasets available, more granular datasets may come with additional costs. In addition, server costs and the time needed for analysing the data need to be considered.
Trustworthiness of the data source. Assessing this will help you understand whether the data provided are relevant, accurate, timely and ethically produced. The following section looks at this in more detail.

Assess data source trustworthiness

Reliable data, and thus trustworthy data sources, are essential for policymakers to create effective policies. The previous section on “identifying data gaps” provided guidance on ensuring a conventional data source is complete, timely, accurate and of sufficient quality.

With new data sources, however, the challenge of knowing when to trust a source becomes more complex. The data are collected for a different purpose than informing policy, raising potential questions on the data collection and management methods, the data representativeness and the applied privacy standards. Especially in the case of non-profits, the team managing the data may have limited resources, leading to an increased risk of quality issues with the data. What is more, new data sources often provide large datasets which amplify existing biases. For example, in countries where a high share of the population is not yet using the internet, data collected from activities online (social media) may not be representative. In addition to the measures outlined in the previous section, you may want to consider the following aspects, which build on this resource.
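The representativeness concern described above can be checked in a rough way by comparing how groups are distributed in a dataset with how they are distributed in the population. The following is a minimal sketch under stated assumptions: the function name, the 5 percentage point tolerance and all figures are hypothetical, and real population shares would typically come from your NSO.

```python
def representation_gaps(population_shares, sample_counts, tolerance=0.05):
    """Return groups whose share in the sample differs from their share in the
    population by more than `tolerance` (positive = over-represented)."""
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("sample is empty")
    gaps = {}
    for group, population_share in population_shares.items():
        sample_share = sample_counts.get(group, 0) / total
        difference = sample_share - population_share
        if abs(difference) > tolerance:
            gaps[group] = round(difference, 3)
    return gaps


# Hypothetical figures: 60% of the population lives in rural areas,
# but only a quarter of the social media records come from there.
population_shares = {"urban": 0.40, "rural": 0.60}
sample_counts = {"urban": 750, "rural": 250}

gaps = representation_gaps(population_shares, sample_counts)
print(gaps)
```

A large gap does not make a dataset unusable, but it signals that results should be interpreted with care or re-weighted by your data experts.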

Analyze the data producer and owners

Are they reputable? Organizations that are larger or embedded in the local ecosystem can be trusted more easily. Asking trusted partners confidentially about a certain organization may be another way to double check the reputation of an organization. Further, critically questioning the potential political agenda of an organization is crucial.

Do they have a track record of ethical data production? This can be explored by seeing if they worked with other governments in the past, published open datasets or provided resources to help the public use data more ethically.

Do they have sufficient resources available to answer your requests, provide insights into the data and continue to collect the data ethically? The information may be available online. Alternatively, through engaging with them over time, you will be able to get deeper insights into their resources.

Ensure transparency over data

Are data being released in clean datasets? This question may be answered by the NSO or a data analyst on your team.

Has sufficient metadata and other contextual information been provided? There are a few key questions that you can ask the data provider and then double check their answers with your data experts:

How do you benchmark your data? How do you know you are measuring what you want to measure? In case of surveys, can you share the survey questions?
How often do you revise your measurement? What causes you to revise it?
Do you seasonally adjust your data? If so, how do you do it?
If several data sources are combined into one indicator, what is the underlying methodology?
Are you covering a representative sample? How do you ensure you do?
What data privacy measures are in place?

Parts of the information may not be available publicly but should be provided by the data sources upon request. Ensuring you have this level of transparency with the data source will also be important for data analysis later, as comparing and combining data requires your team to have a good understanding of the methodology behind the datasets.
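To keep track of which of the questions above a data provider has already answered, a simple checklist can help. This is a minimal sketch only: the dictionary keys, the function name and the sample answer are hypothetical, and the question texts are shortened from the list above.

```python
# Shortened versions of the metadata questions listed above; keys are illustrative.
METADATA_QUESTIONS = {
    "benchmarking": "How do you benchmark your data?",
    "revisions": "How often do you revise your measurement?",
    "seasonal_adjustment": "Do you seasonally adjust your data?",
    "combination_method": "If sources are combined into one indicator, what is the methodology?",
    "representativeness": "Are you covering a representative sample?",
    "privacy": "What data privacy measures are in place?",
}


def open_questions(answers):
    """Return the questions for which the provider has given no (non-empty) answer."""
    return [
        question
        for key, question in METADATA_QUESTIONS.items()
        if not (answers.get(key) or "").strip()
    ]


# Hypothetical state after a first conversation with a provider.
answers = {"benchmarking": "Compared against the latest census", "privacy": "  "}

for question in open_questions(answers):
    print("Still open:", question)
```

Blank or missing answers are treated as open, so the list shows what still needs to be requested from the data source.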

Once data has been shared, does the data source allow for feedback on datasets as well as independent verification of sensitive data? Ideally, the data source agrees to a continuous engagement where feedback provided by you is taken into consideration.

Ensuring credibility of citizen-generated data may seem tricky. Luckily, there is already a good amount of research available on strategies to address this. They include:

Formulate minimum standards for volunteers and provide them with training
Collaborate with academic partners (university lab) to develop the project
Designate experienced volunteers or academic partners to oversee data collection in the field
Employ additional technology to verify data entries (photo-based, automated location recording)
Include statistics-driven flagging of incorrect data

More information can be found here.
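As an illustration of the statistics-driven flagging mentioned in the list above, the sketch below marks entries that lie far from the median of all submissions, using the modified z-score (based on the median absolute deviation). This is one common technique chosen for illustration, not a method prescribed by this guide; the function name, the threshold of 3.5 and the sample measurements are assumptions.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the indices of entries whose modified z-score exceeds `threshold`.

    The modified z-score uses the median and the median absolute deviation
    (MAD), so a single extreme entry does not distort the reference point.
    """
    center = median(values)
    mad = median(abs(value - center) for value in values)
    if mad == 0:
        # More than half of the entries are identical; the score is undefined,
        # so nothing is flagged here and a manual review is needed instead.
        return []
    return [
        index
        for index, value in enumerate(values)
        if 0.6745 * abs(value - center) / mad > threshold
    ]


# Hypothetical door widths (in cm) reported by volunteers for public buildings.
door_widths_cm = [90, 92, 88, 91, 900, 89, 93]

print(flag_outliers(door_widths_cm))  # prints [4], the entry of 900 cm
```

Flagged entries are candidates for review, for example by an experienced volunteer, rather than values to delete automatically.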

Have I successfully identified the right data sources?

After working through this section, you should have a list of (new) data sources that will provide more relevant data to address your policy problem. You may have to go back to the ecosystem mapping exercise and the identify data gaps section to double check if all data needs are covered.

In addition, it is important to have confidence in the data sources identified, meaning that each source provides complete, timely and accurate data of sufficient quality, with the necessary privacy measures in place, as outlined above.

It is important to remember that this is an iterative process. You may identify a private sector partner as a new data source now, only to realize later that the collaboration does not materialize. That is normal.

What's next?

Mapping your data ecosystem and understanding where data and data producers might already exist in your work is an important step towards learning what data you may or may not have available to you already. The next step is to get access to the data or collect it yourself.
