Cornell Census-NSF Research Node (NCRN)

Integrated research support, training, and data documentation for administrative data

Project Overview

The Cornell node of the NSF-Census Research Network (NCRN) was funded by the National Science Foundation to develop infrastructure for using administrative data in social science research. The project focused on:

  • Privacy-preserving methods for administrative data
  • Training programs for researchers using confidential data
  • Documentation standards for statistical products
  • Synthetic data methods for broader access

Funding

  • National Science Foundation
    • Award Number: 1131848
    • Period: September 19, 2011 - September 12, 2016
    • Amount: $3,560,887
    • Role: Principal Investigator (with John M. Abowd, William C. Block, Ping Li)
  • National Science Foundation (partial)
    • Award Number: 1012593
    • Period: July 14, 2010 - June 27, 2016
    • Amount: $1,326,660
    • Role: Co-Principal Investigator (with Johannes E. Gehrke, John M. Abowd)

Team

  • Lars Vilhuber - Principal Investigator (2014-2018)
  • John M. Abowd - Former Principal Investigator (2011-2014)
  • William Block - Co-Principal Investigator
  • Ping Li - Co-Principal Investigator

Repositories

The project produced multiple open-source repositories and tools.

Publications

Publications by grant

All publications funded by grant SES-1131848:
  1. An Economic Analysis of Privacy Protection and Statistical Accuracy as Social Choices
    John M. Abowd and Ian M. Schmutte
    American Economic Review, forthcoming
  2. Sorting Between and Within Industries: A Testable Model of Assortative Matching
    John M. Abowd, Francis Kramarz, Sebastien Perez-Duarte, and 1 more author
    Annals of Economics and Statistics, 2018
  3. Earnings Inequality and Mobility Trends in the United States: Nationally Representative Estimates from Longitudinally Linked Employer-Employee Data
    John M. Abowd, Kevin L. McKinney, and Nellie Zhao
    Journal of Labor Economics, 2018
  4. An Economic Analysis of Privacy Protection and Statistical Accuracy as Social Choices
    John M. Abowd and Ian M. Schmutte
    Aug 2018
  5. An Economic Analysis of Privacy Protection and Statistical Accuracy as Social Choices
    John M. Abowd and Ian M. Schmutte
    2018
  6. Codebook for the SIPP Synthetic Beta v7 [Online]
    Lori B. Reeder, Jordan C. Stanley, and Lars Vilhuber
    2018
  7. Codebook for the SIPP Synthetic Beta 7.0 (PDF version)
    Lori B. Reeder, Jordan C. Stanley, and Lars Vilhuber
    Nov 2018
  8. Codebook for the SIPP Synthetic Beta 7.0 (DDI-C and PDF)
    Lori B. Reeder, Jordan C. Stanley, and Lars Vilhuber
    Nov 2018
  9. How Will Statistical Agencies Operate When All Data Are Private?
    John M. Abowd
    Journal of Privacy and Confidentiality, 2017
  10. Sorting Between and Within Industries: A Testable Model of Assortative Matching
    John M. Abowd, Francis Kramarz, Sebastien Perez-Duarte, and 1 more author
    2017
  11. Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods
    John M. Abowd and Ian M. Schmutte
    Apr 2017
  12. Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods
    John M. Abowd and Ian M. Schmutte
    Jan 2017
  13. Noise infusion as a confidentiality protection measure for graph-based statistics
    John M. Abowd and Kevin L. McKinney
    Statistical Journal of the IAOS, Feb 2016
  14. Modeling Endogenous Mobility in Wage Determination
    John M. Abowd, Kevin L. McKinney, and Ian M. Schmutte
    May 2016
  15. How Will Statistical Agencies Operate When All Data Are Private?
    John M. Abowd
    2016
  16. Why Statistical Agencies Need to Take Privacy-loss Budgets Seriously, and What It Means When They Do
    John M. Abowd
    2016
  17. Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods
    John M. Abowd and Ian Schmutte
    Jan 2015
  18. Codebook for the SIPP Synthetic Beta v6.0.2 [Online]
    Lori B. Reeder, Martha Stinson, Kelly E. Trageser, and 1 more author
    2015
  19. Graph Kernels via Functional Embedding
    Anshumali Shrivastava and Ping Li
    CoRR, 2014
  20. In Defense of MinHash Over SimHash
    Anshumali Shrivastava and Ping Li
    In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics (AISTATS), 2014
  21. b-Bit Minwise Hashing in Practice
    Ping Li, Anshumali Shrivastava, and Arnd Christian König
    In Internetware 2013, Oct 2013
  22. Exact Sparse Recovery with L0 Projections
    Ping Li and Cun-Hui Zhang
    In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, Illinois, USA, 2013
  23. Beyond Pairwise: Provably Fast Algorithms for Approximate k-Way Similarity Search
    Anshumali Shrivastava and Ping Li
    In Advances in Neural Information Processing Systems 26, 2013
  24. One Permutation Hashing
    Ping Li, Art Owen, and Cun-Hui Zhang
    In Advances in Neural Information Processing Systems 25, 2012
  25. GPU-based minwise hashing
    Ping Li, Anshumali Shrivastava, and Arnd Christian König
    In Proceedings of the 21st World Wide Web Conference (WWW 2012) (Companion Volume), 2012
  26. Entropy Estimations Using Correlated Symmetric Stable Random Projections
    Ping Li and Cun-Hui Zhang
    In Advances in Neural Information Processing Systems 25, 2012
  27. Fast Near Neighbor Search in High-Dimensional Binary Data
    Anshumali Shrivastava and Ping Li
    In The European Conference on Machine Learning (ECML 2012), 2012
  28. Testing for Membership to the IFRA and the NBU Classes of Distributions
    Radhendushka Srivastava, Ping Li, and Debasis Sengupta
    Journal of Machine Learning Research - Proceedings Track for the Fifteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2012), 2012
  29. Fast Multi-task Learning for Query Spelling Correction
    Xu Sun, Anshumali Shrivastava, and Ping Li
    In The 21st ACM International Conference on Information and Knowledge Management (CIKM 2012), 2012
  30. Query spelling correction using multi-task learning
    Xu Sun, Anshumali Shrivastava, and Ping Li
    In Proceedings of the 21st World Wide Web Conference (WWW 2012) (Companion Volume), 2012

Impact

The NCRN project contributed to:

  • Development of synthetic data methods used by the U.S. Census Bureau
  • Training of hundreds of researchers in confidential data access
  • Creation of open-source tools for reproducible research
  • Advancement of privacy-preserving techniques in economics research