Cornell Census-NSF Research Node (NCRN)

Integrated research support, training, and data documentation for administrative data

Funding: NSF-1131848 NSF-1012593 Sloan

Project Overview

The Cornell node of the NSF-Census Research Network (NCRN) was funded by the National Science Foundation to develop infrastructure for using administrative data in social science research. The project focused on:

  • Privacy-preserving methods for administrative data
  • Training programs for researchers using confidential data
  • Documentation standards for statistical products
  • Synthetic data methods for broader access

Funding

  • National Science Foundation
    • Award Number: 1131848
    • Period: September 19, 2011 - September 12, 2016
    • Amount: $3,560,887
    • Role: Principal Investigator (with John M Abowd, William C Block, Ping Li)
  • National Science Foundation (partial)
    • Award Number: 1012593
    • Period: July 14, 2010 - June 27, 2016
    • Amount: $1,326,660
    • Role: Co-Principal Investigator (with Johannes E Gehrke, John M Abowd)

Team

  • Lars Vilhuber - Principal Investigator (2014-2018)
  • John M. Abowd - Former Principal Investigator (2011-2014)
  • William Block - Co-Principal Investigator
  • Ping Li - Co-Principal Investigator

Repositories

The project produced multiple open-source repositories and tools. See

Publications

All 51 publications funded by grant SES-1131848:

  1. An Economic Analysis of Privacy Protection and Statistical Accuracy as Social Choices
    John M. Abowd and Ian M. Schmutte
    American Economic Review, Jan 2019
  2. Sorting Between and Within Industries: A Testable Model of Assortative Matching
    John M. Abowd, Francis Kramarz, Sebastien Perez-Duarte, and 1 more author
    Annals of Economics and Statistics, 2018
  3. Earnings Inequality and Mobility Trends in the United States: Nationally Representative Estimates from Longitudinally Linked Employer-Employee Data
    John M. Abowd, Kevin L. McKinney, and Nellie Zhao
    Journal of Labor Economics, 2018
  4. An Economic Analysis of Privacy Protection and Statistical Accuracy as Social Choices
    John M. Abowd and Ian M. Schmutte
    Center for Economic Studies, U.S. Census Bureau, Working Papers 18-35, Aug 2018
  5. An Economic Analysis of Privacy Protection and Statistical Accuracy as Social Choices
    John M. Abowd and Ian M. Schmutte
    arXiv, preprint, 2018
  6. Disclosure Limitation and Confidentiality Protection in Linked Data
    John M. Abowd, Ian M. Schmutte, and Lars Vilhuber
    Center for Economic Studies, U.S. Census Bureau, Working Papers 18-07, Jan 2018
  7. Effects of a Government-Academic Partnership: Has the NSF-Census Bureau Research Network Helped Improve the U.S. Statistical System?
    Daniel H. Weinberg, John M. Abowd, Robert F. Belli, and 13 more authors
    Journal of Survey Statistics and Methodology, 2018
  8. Codebook for the SIPP Synthetic Beta 7.0 (PDF version)
    Lori B. Reeder, Jordan C. Stanley, and Lars Vilhuber
    Cornell Institute for Social and Economic Research and Labor Dynamics Institute. Cornell University, Codebook V20181102b-pdf, Nov 2018
  9. Utility Cost of Formal Privacy for Releasing National Employer-Employee Statistics
    Samuel Haney, Ashwin Machanavajjhala, John M. Abowd, and 3 more authors
    In Proceedings of the 2017 International Conference on Management of Data, 2017
  10. How Will Statistical Agencies Operate When All Data Are Private?
    John M. Abowd
    Journal of Privacy and Confidentiality, 2017
  11. Proceedings from the 2016 NSF-Sloan Workshop on Practical Privacy
    Lars Vilhuber and Ian M. Schmutte
    Labor Dynamics Institute, Cornell University, Document 33, 2017
  12. Sorting Between and Within Industries: A Testable Model of Assortative Matching
    John M. Abowd, Francis Kramarz, Sebastien Perez-Duarte, and 1 more author
    Labor Dynamics Institute, Document 40, 2017
  13. Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods
    John M. Abowd and Ian M. Schmutte
    Labor Dynamics Institute, Document 37, Apr 2017
  14. Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods
    John M. Abowd and Ian M. Schmutte
    Center for Economic Studies, U.S. Census Bureau, Working Papers 17-37, Jan 2017
  15. Utility Cost of Formal Privacy for Releasing National Employer-Employee Statistics
    Samuel Haney, Ashwin Machanavajjhala, John M. Abowd, and 2 more authors
    Proceedings of the 2017 ACM International Conference on Management of Data, 2017
  16. Utility Cost of Formal Privacy for Releasing National Employer-Employee Statistics
    Samuel Haney, Ashwin Machanavajjhala, John M. Abowd, and 2 more authors
    Cornell University, Preprint 1813:49652, 2017
  17. Proceedings from the Synthetic LBD International Seminar
    Lars Vilhuber, Saki Kinney, and Ian Schmutte
    Labor Dynamics Institute, Cornell University, Document 44, 2017
  18. Proceedings from the 2016 NSF-Sloan Workshop on Practical Privacy
    Lars Vilhuber and Ian Schmutte
    Cornell University, Preprint 1813:46197, 2017
  19. Proceedings from the 2017 Cornell-Census-NSF-Sloan Workshop on Practical Privacy
    Lars Vilhuber and Ian Schmutte
    Labor Dynamics Institute, Cornell University, Document 43, 2017
  20. Making Confidential Data Part of Reproducible Research
    Lars Vilhuber and Carl Lagoze
    Labor Dynamics Institute, Cornell University, Document 41, 2017
  21. Effects of a Government-Academic Partnership: Has the NSF-Census Bureau Research Network Helped Improve the U.S. Statistical System?
    Daniel H. Weinberg, John M. Abowd, Robert F. Belli, and 13 more authors
    Center for Economic Studies, U.S. Census Bureau, Working Papers 17-59r, Jan 2017
  22. Total Error and Variability Measures with Integrated Disclosure Limitation for Quarterly Workforce Indicators and LEHD Origin Destination Employment Statistics in On The Map
    Kevin L. McKinney, Andrew S. Green, Lars Vilhuber, and 1 more author
    Center for Economic Studies, U.S. Census Bureau, Working Papers 17-71, Jan 2017
  23. Two Perspectives on Commuting: A Comparison of Home to Work Flows Across Job-Linked Survey and Administrative Files
    Andrew S. Green, Mark J. Kutzbach, and Lars Vilhuber
    Center for Economic Studies, U.S. Census Bureau, Working Papers 17-34, Jan 2017
  24. Using partially synthetic microdata to protect sensitive cells in business statistics
    Javier Miranda and Lars Vilhuber
    Statistical Journal of the IAOS, Feb 2016
  25. Noise infusion as a confidentiality protection measure for graph-based statistics
    John M. Abowd and Kevin L. McKinney
    Statistical Journal of the IAOS, Feb 2016
  26. Synthetic establishment microdata around the world
    Lars Vilhuber, John M. Abowd, and Jerome P. Reiter
    Statistical Journal of the IAOS, Feb 2016
  27. Modeling Endogenous Mobility in Wage Determination
    John M. Abowd, Kevin L. McKinney, and Ian M. Schmutte
    Labor Dynamics Institute, Document 28, May 2016
  28. How Will Statistical Agencies Operate When All Data Are Private?
    John M. Abowd
    Labor Dynamics Institute, Cornell University, Document 30, 2016
  29. Why Statistical Agencies Need to Take Privacy-loss Budgets Seriously, and What It Means When They Do
    John M. Abowd
    Labor Dynamics Institute, Cornell University, Document 32, 2016
  30. Economic analysis and statistical disclosure limitation
    John M. Abowd and Ian Schmutte
    Brookings Papers on Economic Activity, 2015
  31. A new method for protecting interrelated time series with Bayesian prior distributions and synthetic data
    Matthew J. Schneider and John M. Abowd
    Journal of the Royal Statistical Society: Series A (Statistics in Society), 2015
  32. Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods
    John M. Abowd and Ian Schmutte
    Labor Dynamics Institute, Document 22, Jan 2015
  33. CED²AR: The Comprehensive Extensible Data Documentation and Access Repository
    Carl Lagoze, Lars Vilhuber, Jeremy Williams, and 2 more authors
    In ACM/IEEE Joint Conference on Digital Libraries (JCDL 2014), Sep 2014
  34. Graph Kernels via Functional Embedding
    Anshumali Shrivastava and Ping Li
    CoRR, 2014
  35. In Defense of MinHash Over SimHash
    Anshumali Shrivastava and Ping Li
    In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics (AISTATS), 2014
  36. Synthetic Longitudinal Business Databases for International Comparisons
    Jörg Drechsler and Lars Vilhuber
    In Privacy in Statistical Databases, 2014
  37. Using Partially Synthetic Data to Replace Suppression in the Business Dynamics Statistics: Early Results
    Javier Miranda and Lars Vilhuber
    In Privacy in Statistical Databases, 2014
  38. Data Management of Confidential Data
    Carl Lagoze, William C. Block, Jeremy Williams, and 2 more authors
    International Journal of Digital Curation, 2013
    Presented at 8th International Digital Curation Conference 2013, Amsterdam. See also http://hdl.handle.net/1813/30924
  39. Encoding Provenance of Social Science Data: Integrating PROV with DDI
    Carl Lagoze, William C. Block, Jeremy Williams, and 1 more author
    In 5th Annual European DDI User Conference, 2013
  40. Encoding Provenance Metadata for Social Science Datasets
    Carl Lagoze, Jeremy Williams, and Lars Vilhuber
    In Metadata and Semantics Research, 2013
  41. b-Bit Minwise Hashing in Practice
    Ping Li, Anshumali Shrivastava, and Arnd Christian König
    In Internetware 2013, Oct 2013
  42. Exact Sparse Recovery with L0 Projections
    Ping Li and Cun-Hui Zhang
    In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, Illinois, USA, 2013
  43. Beyond Pairwise: Provably Fast Algorithms for Approximate k-Way Similarity Search
    Anshumali Shrivastava and Ping Li
    In Advances in Neural Information Processing Systems 26, 2013
  44. A Proposed Solution to the Archiving and Curation of Confidential Scientific Inputs
    John M. Abowd, Lars Vilhuber, and William Block
    In Privacy in Statistical Databases, 2012
  45. One Permutation Hashing
    Ping Li, Art Owen, and Cun-Hui Zhang
    In Advances in Neural Information Processing Systems 25, 2012
  46. GPU-based minwise hashing
    Ping Li, Anshumali Shrivastava, and Arnd Christian König
    In Proceedings of the 21st World Wide Web Conference (WWW 2012) (Companion Volume), 2012
  47. Entropy Estimations Using Correlated Symmetric Stable Random Projections
    Ping Li and Cun-Hui Zhang
    In Advances in Neural Information Processing Systems 25, 2012
  48. Fast Near Neighbor Search in High-Dimensional Binary Data
    Anshumali Shrivastava and Ping Li
    In The European Conference on Machine Learning (ECML 2012), 2012
  49. Testing for Membership to the IFRA and the NBU Classes of Distributions
    Radhendushka Srivastava, Ping Li, and Debasis Sengupta
    Journal of Machine Learning Research - Proceedings Track for the Fifteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2012), 2012
  50. Fast Multi-task Learning for Query Spelling Correction
    Xu Sun, Anshumali Shrivastava, and Ping Li
    In The 21st ACM International Conference on Information and Knowledge Management (CIKM 2012), 2012
  51. Query spelling correction using multi-task learning
    Xu Sun, Anshumali Shrivastava, and Ping Li
    In Proceedings of the 21st World Wide Web Conference (WWW 2012) (Companion Volume), 2012

Impact

The NCRN project contributed to:

  • Development of synthetic data methods used by the U.S. Census Bureau
  • Training of hundreds of researchers in confidential data access
  • Creation of open-source tools for reproducible research
  • Advancement of privacy-preserving techniques in economics research