SAFRN: Secure Analytics For Reticent Non-consolidated databases

Under a grant from Arnold Ventures LLC (formerly the Laura and John Arnold Foundation), Stealth collaborated with the University of Michigan’s Inter-university Consortium for Political and Social Research (ICPSR) to develop SAFRN, a software platform enabling researchers to securely conduct quantitative studies using sensitive, private datasets without requiring data owners to share their underlying data.

The Arnold Ventures grant was motivated in large part by the proposed Student Right to Know Before You Go Act, first introduced in 2012 by U.S. Senators Marco Rubio and Ron Wyden. The Act seeks to give prospective students, taxpayers, and policymakers higher-quality information about the economic costs and outcomes of higher education. Currently, universities publish summary data on graduation rates, average debt, and post-matriculation earnings, but this data is often presented at a very coarse level, and graduates’ debt and income figures are frequently derived from optional survey responses. The proposed bill would require universities, the Social Security Administration, the Department of the Treasury, the Department of Education, and other federal government entities to contribute to a higher education data system. This would yield richer data sets: in particular, student debt and income data would be sourced from federal government databases, and aggregated graduation, debt, and income metrics would be available at the level of individual institutions and majors. Such statistics can be difficult to produce accurately without compromising students’ privacy, because they require linking multiple sources of sensitive information (namely, education information held by universities with financial information held by several federal government entities). Accordingly, the most recent versions of the Act introduced in Congress would require the use of privacy-enhancing technologies such as secure multi-party computation (MPC) in the design of the higher education data system.

Generalizing from this vision, we designed SAFRN to enable researchers to perform statistical analyses (including computation of moments and linear regression, with associated significance tests and goodness-of-fit measures) on sensitive data held by multiple parties, without leaking private information in that data to the researchers or between the data-owning parties. The researcher submits the statistic to be computed, and the data owners (in the example above, the universities and a government entity supplying student financial data) then run an MPC protocol in two phases: first, a private set intersection links the records that share common identifiers across the parties' datasets; second, a private aggregate computation is performed on the linked data.
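The two-phase flow described above can be illustrated with a toy Python sketch. It is not SAFRN's code: it assumes a Diffie-Hellman-style private set intersection (one common way to build PSI) over a toy prime modulus, followed by additive secret sharing so that the researcher receives only the aggregate. The function names (`dh_psi`, `share`, `private_sum`), the modulus, and the sample records are all hypothetical, the parameters are not secure, and for simplicity the toy reveals the linked identifiers to the data owners.

```python
import hashlib
import secrets

# Hypothetical toy modulus (the Mersenne prime 2**127 - 1). Illustration only:
# these parameters and the hash-to-group map below are NOT a secure instantiation.
P = 2**127 - 1


def hash_to_group(identifier: str) -> int:
    """Map an identifier to a nonzero residue mod P (toy hash-to-group)."""
    digest = hashlib.sha256(identifier.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def blind(elements, key):
    """Raise each element to a secret exponent mod P, preserving order."""
    return [pow(e, key, P) for e in elements]


def dh_psi(ids_a, ids_b):
    """Toy DH-style PSI: return the identifiers of party A that party B also holds.

    Exponentiation commutes, so H(x)^(ka*kb) == H(x)^(kb*ka) mod P for equal x,
    while each party only ever sees the other's identifiers in blinded form.
    """
    ka = secrets.randbelow(P - 3) + 2  # A's secret exponent
    kb = secrets.randbelow(P - 3) + 2  # B's secret exponent
    ids_a = list(ids_a)
    # A sends H(a)^ka to B; B raises each to kb and returns them in order.
    a_double = blind(blind([hash_to_group(i) for i in ids_a], ka), kb)
    # B sends H(b)^kb to A; A raises each to ka.
    b_double = set(blind(blind([hash_to_group(i) for i in ids_b], kb), ka))
    return {i for i, tag in zip(ids_a, a_double) if tag in b_double}


def share(value: int, n: int = 2):
    """Split value into n additive shares mod P; any n-1 shares look random."""
    shares = [secrets.randbelow(P) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % P)
    return shares


def private_sum(university_ids, agency_income):
    """Sum agency incomes over linked students; the researcher sees only totals."""
    linked = dh_psi(university_ids, agency_income.keys())  # phase 1: linking
    partial = [0, 0]  # running share totals held by the two data owners
    for sid in linked:  # phase 2: aggregate over shares, never raw values
        s0, s1 = share(agency_income[sid])
        partial[0] = (partial[0] + s0) % P  # share kept by the agency
        partial[1] = (partial[1] + s1) % P  # share sent to the university
    # Each owner reports only its partial total; the researcher recombines them.
    return (partial[0] + partial[1]) % P, len(linked)


if __name__ == "__main__":
    university = ["s1", "s2", "s3", "s4"]  # hypothetical student IDs
    income = {"s2": 50000, "s3": 62000, "s9": 41000}  # hypothetical records
    total, count = private_sum(university, income)
    print(total, count)  # prints "112000 2"
```

Additive sharing suits the aggregate phase because the sum of shares is a sharing of the sum, so each owner can total its shares locally; computing products, such as those needed for regression, would require additional MPC machinery not shown here.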

Upon completing this project, we released a working prototype and made it freely available as open-source code on GitHub.

Work on SAFRN was performed by the subsidiary Stealth Software Technologies Commercial, Inc. This work was supported by a grant for “Computing Statistics When Data Cannot Be Shared” from Arnold Ventures LLC (formerly the Laura and John Arnold Foundation).