Innovations in Data and Experiments for Action Initiative (IDEA)
Increasing the use of administrative data for evidence-informed decision-making through partnerships with governments, non-profits, and private firms.
Funding: Sloan-G-2019-11391
Project Goal
J-PAL’s Innovations in Data and Experiments for Action Initiative (IDEA) aimed to increase the use of administrative data by governments, non-profits, and private firms for evidence-informed decision-making.
Around the world, vast amounts of data are now being digitally collected or stored, creating tremendous opportunities to transform lives through improved social policy. Using these data sets in creative and innovative ways to evaluate programs and improve outcomes is a vital step toward making significant progress in the fight against poverty worldwide.
Yet only a small share of these data is accessed, analyzed, or used in research to inform decision-making. The core challenges IDEA aimed to address included:
Cumbersome or non-existent data use policies and access modalities
Low usability of data stored in outmoded formats and disconnected databases
Lack of advanced technical skills for analysis and experiments
IDEA supported governments, firms, and non-profit organizations (“data providers”) who wanted to make their administrative data accessible in a safe and ethical way; analyze it to improve decision-making; and partner with researchers in using this data to design innovative programs, evaluate program impact through randomized experiments, and scale up successful programs.
Note: As of 2023, IDEA has concluded its core activities and is no longer an active initiative at J-PAL.
Project Leadership
The IDEA Initiative was led by two co-chairs and a principal investigator with diverse backgrounds in data security, privacy and ethics, research, and government systems:
Shawn Cole (Harvard Business School) - Co-Chair, Co-Principal Investigator
Anja Sautmann (J-PAL Global, MIT) - Principal Investigator
Lars Vilhuber (Cornell University) - Co-Chair, Co-Principal Investigator
Funding
The Alfred P. Sloan Foundation provided funding for the IDEA Initiative through grant Sloan-G-2019-11391 from 2019 to 2022, supporting the development and publication of the IDEA Handbook and related activities.
Internal J-PAL funds also supported the initiative.
Activities
Two pilot projects in the United States
Creation of a handbook and workshops on using administrative data in economic experiments
Building capacity of data providers to conduct research with their own data
Creating long-term institutional partnerships with data providers
Outcomes
The IDEA Handbook (2021)
The centerpiece outcome was the Handbook on Using Administrative Data for Research and Evidence-Based Policy, first released in September 2020 and formally published in January 2021. The Handbook serves as a go-to reference for researchers seeking to use administrative data and for data providers looking to make their data accessible for research.
The handbook is published online under an open licensing model and freely available to all. An overview is available on the J-PAL website. It provides information, best practices, and case studies on how to create privacy-protected access to, handle, and analyze administrative data, with the aim of pushing the research frontier as well as informing evidence-based policy innovations.
Key Features:
Comprehensive coverage of data security, statistical and differential privacy
Guidance on crafting data use agreements
Technical expertise and practical use cases
Physical and technical methods for protecting sensitive data
Statistical disclosure limitation techniques
Balance between privacy protection and data usability
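As a rough illustration of the trade-off between privacy protection and data usability, the sketch implements the textbook Laplace mechanism for releasing a single count under ε-differential privacy. It is not taken from the Handbook. The function names `laplace_noise` and `dp_count` are hypothetical, and the sketch assumes a counting query, where adding or removing one person's record changes the answer by at most 1 (sensitivity 1).

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws, each with
    mean `scale`, is Laplace-distributed with location 0 and that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy (Laplace mechanism).

    Assumption: this is a counting query, so its sensitivity is 1 and
    noise with scale sensitivity / epsilon is sufficient.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(2021)  # fixed seed so the demo is reproducible
    for eps in (0.1, 1.0):
        errors = [abs(dp_count(500, eps, rng) - 500) for _ in range(10_000)]
        print(f"epsilon={eps}: mean absolute error {sum(errors) / len(errors):.2f}")
```

The average absolute error of Laplace noise equals its scale, so the demo should report errors near 1/ε, roughly 10 for ε = 0.1 and roughly 1 for ε = 1: a smaller ε gives stronger privacy but noisier, less usable statistics.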
Accompanying the handbook publication is a webinar series featuring handbook authors and experts, providing additional guidance and discussion on using administrative data for research. The webinars can be viewed for free on YouTube.
Handbook on Using Administrative Data for Research and Evidence-based Policy
Shawn Cole, Iqbal Dhaliwal, Anja Sautmann, and Lars Vilhuber
The Handbook serves as a go-to reference for researchers seeking to use administrative data and for data providers looking to make their data accessible for research. The handbook is published online under an open licensing model and freely available to all. It provides information, best practices, and case studies on how to create privacy-protected access to, handle, and analyze administrative data, with the aim of pushing the research frontier as well as informing evidence-based policy innovations.
@book{cole_handbook_2021,
  abstract = {The Handbook serves as a go-to reference for researchers seeking to use administrative data and for data providers looking to make their data accessible for research. The handbook is published online under an open licensing model and freely available to all. It provides information, best practices, and case studies on how to create privacy-protected access to, handle, and analyze administrative data, with the aim of pushing the research frontier as well as informing evidence-based policy innovations.},
  author = {Cole, Shawn and Dhaliwal, Iqbal and Sautmann, Anja and Vilhuber, Lars},
  copyright = {Copyright (c) 2021 Shawn Cole Iqbal Dhaliwal Anja Sautmann Lars Vilhuber},
  doi = {10.31485/admindatahandbook.1.0},
  isbn = {978-1-7360216-0-6},
  language = {en},
  month = jan,
  publisher = {Abdul Latif Jameel Poverty Action Lab},
  title = {Handbook on {Using} {Administrative} {Data} for {Research} and {Evidence}-based {Policy}},
  url = {https://admindatahandbook.mit.edu/},
  urldate = {2021-04-08},
  year = {2021},
  month_numeric = {1}
}
Using Administrative Data for Research and Evidence-Based Policy: An Introduction
Shawn Cole, Iqbal Dhaliwal, Anja Sautmann, and Lars Vilhuber
In Handbook on Using Administrative Data for Research and Evidence-based Policy, Jan 2021
@incollection{cole_using_2021,
  author = {Cole, Shawn and Dhaliwal, Iqbal and Sautmann, Anja and Vilhuber, Lars},
  booktitle = {Handbook on {Using} {Administrative} {Data} for {Research} and {Evidence}-based {Policy}},
  copyright = {Copyright (c) 2021 Shawn Cole Iqbal Dhaliwal Anja Sautmann Lars Vilhuber},
  doi = {10.31485/admindatahandbook.1.0},
  editor = {Cole, Shawn and Dhaliwal, Iqbal and Sautmann, Anja and Vilhuber, Lars},
  isbn = {978-1-7360216-0-6},
  language = {en},
  month = jan,
  pages = {1--36},
  publisher = {Abdul Latif Jameel Poverty Action Lab},
  title = {Using {Administrative} {Data} for {Research} and {Evidence}-{Based} {Policy}: {An} {Introduction}},
  url = {https://admindatahandbook.mit.edu/print/v1.0/handbook_ch1_Introduction.pdf},
  urldate = {2021-04-08},
  year = {2021},
  month_numeric = {1}
}
Physically Protecting Sensitive Data
Jim Shen and Lars Vilhuber
In Handbook on Using Administrative Data for Research and Evidence-based Policy, Jan 2021
Keeping sensitive data safe relies heavily on the physical environments in which data are stored, processed, transmitted, and accessed, and from which researchers can access computers that store and process the data. However, it is also the setting that is most dependent on rapidly evolving technology. The chapter provides a snapshot of the technologies available and in use as of 2020, and characterizes the technologies along a multi-dimensional scale, allowing for some comparability across methods.
@incollection{shen_physically_2021,
  abstract = {Keeping sensitive data safe relies heavily on the physical environments in which data are stored, processed, transmitted, and accessed, and from which researchers can access computers that store and process the data. However, it is also the setting that is most dependent on rapidly evolving technology. The chapter provides a snapshot of the technologies available and in use as of 2020, and characterizes the technologies along a multi-dimensional scale, allowing for some comparability across methods.},
  author = {Shen, Jim and Vilhuber, Lars},
  booktitle = {Handbook on {Using} {Administrative} {Data} for {Research} and {Evidence}-based {Policy}},
  copyright = {Copyright (c) 2021 Shawn Cole Iqbal Dhaliwal Anja Sautmann Lars Vilhuber},
  doi = {10.31485/admindatahandbook.1.0},
  editor = {Cole, Shawn and Dhaliwal, Iqbal and Sautmann, Anja and Vilhuber, Lars},
  isbn = {978-1-7360216-0-6},
  language = {en},
  month = jan,
  pages = {37--84},
  publisher = {Abdul Latif Jameel Poverty Action Lab},
  title = {Physically {Protecting} {Sensitive} {Data}},
  url = {https://admindatahandbook.mit.edu/print/v1.0/handbook_ch2_Physical-protection.pdf},
  urldate = {2021-04-08},
  year = {2021},
  month_numeric = {1}
}
Balancing Privacy and Data Usability: An Overview of Disclosure Avoidance Methods
Ian M. Schmutte and Lars Vilhuber
In Handbook on Using Administrative Data for Research and Evidence-based Policy, Jan 2021
The Five Safes framework (safe projects, safe people, safe settings, safe data, and safe outputs) is one way of thinking about security of different aspects of a project, and is used throughout the Handbook and in research with administrative data. Within the Five Safes framework, data providers need to create safe data that can be provided to trusted safe people for use within safe settings, as part of safe projects. Finally, any findings that are shared publicly must be safe outputs. The processes used to create safe data and safe outputs (manipulations that render data less sensitive and therefore more appropriate for public release) are generally referred to as statistical disclosure limitation (SDL). This chapter describes techniques traditionally used within the field of SDL, pointing at methods as well as metrics to assess the resultant statistical quality and sensitivity of the data, and offers technical guidance applicable to any data provider or researcher looking for practical tools to apply to their own data to reduce the risk to privacy.
@incollection{schmutte_balancing_2021,
  abstract = {The Five Safes framework (safe projects, safe people, safe settings, safe data, and safe outputs) is one way of thinking about security of different aspects of a project, and is used throughout the Handbook and in research with administrative data. Within the Five Safes framework, data providers need to create safe data that can be provided to trusted safe people for use within safe settings, as part of safe projects. Finally, any findings that are shared publicly must be safe outputs. The processes used to create safe data and safe outputs (manipulations that render data less sensitive and therefore more appropriate for public release) are generally referred to as statistical disclosure limitation (SDL). This chapter describes techniques traditionally used within the field of SDL, pointing at methods as well as metrics to assess the resultant statistical quality and sensitivity of the data, and offers technical guidance applicable to any data provider or researcher looking for practical tools to apply to their own data to reduce the risk to privacy.},
  author = {Schmutte, Ian M. and Vilhuber, Lars},
  booktitle = {Handbook on {Using} {Administrative} {Data} for {Research} and {Evidence}-based {Policy}},
  copyright = {Copyright (c) 2021 Shawn Cole Iqbal Dhaliwal Anja Sautmann Lars Vilhuber},
  doi = {10.31485/admindatahandbook.1.0},
  editor = {Cole, Shawn and Dhaliwal, Iqbal and Sautmann, Anja and Vilhuber, Lars},
  isbn = {978-1-7360216-0-6},
  language = {en},
  month = jan,
  pages = {145--172},
  publisher = {Abdul Latif Jameel Poverty Action Lab},
  title = {Balancing {Privacy} and {Data} {Usability}: {An} {Overview} of {Disclosure} {Avoidance} {Methods}},
  url = {https://admindatahandbook.mit.edu/print/v1.0/handbook_ch5_SDL.pdf},
  urldate = {2021-04-08},
  year = {2021},
  month_numeric = {1}
}
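Small-cell suppression is one of the simplest traditional SDL techniques for producing safe outputs: any table cell based on too few records is withheld before release. The sketch is illustrative only and is not taken from the chapter. The threshold `MIN_CELL_SIZE = 10` and the function `tabulate_with_suppression` are hypothetical; actual thresholds are set by each data provider's disclosure-review rules.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Optional

# Hypothetical threshold: cells with fewer than this many records are withheld.
# Real thresholds come from the data provider's disclosure-review rules.
MIN_CELL_SIZE = 10


def tabulate_with_suppression(
    records: Iterable[Hashable], min_cell_size: int = MIN_CELL_SIZE
) -> Dict[Hashable, Optional[int]]:
    """Count records per category and replace any small cell with None.

    This is primary suppression only: if the table total is also released,
    a single suppressed cell can be recovered by subtraction, so real
    workflows add complementary suppression of further cells.
    """
    counts = Counter(records)  # categories appear in first-seen order
    return {
        category: (n if n >= min_cell_size else None)
        for category, n in counts.items()
    }


if __name__ == "__main__":
    rows = ["county_a"] * 42 + ["county_b"] * 3 + ["county_c"] * 15
    print(tabulate_with_suppression(rows))
    # {'county_a': 42, 'county_b': None, 'county_c': 15}
```

Suppression keeps the published counts exact but loses information in sparse cells; noise infusion is a common SDL alternative that instead perturbs every cell slightly, trading some accuracy for fewer blanks.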