ANalytics and statistics with Information SEcurity
Stealth, as part of the National Institute of Technology’s FY2021 Small Business Innovation Research (SBIR) Program, received a Phase I SBIR award to develop a software platform for securely performing statistical computations on data sets held by multiple mutually untrusting parties. Originally envisioned as an extension of technology developed under previous Stealth projects like SAFRN and VLDS, through collaboration with researchers at NIST’s Computer Security Resource Center, we designed and implemented a solution for performing efficient Private Set Intersection (PSI). This can in turn be used to address numerous potential applications in multiple industries, including detection of benefit double-dipping in insurance or collaborative blacklist management among mutually distrusting financial entities.
We focused much of our effort on the design and implementation of a collaborative software analytics platform targeting the specific application of pairwise PSI. A pairwise PSI protocol enables an analyst to compute the pairwise intersections of private datasets contributed by multiple data-owner participants—that is, to determine the elements in common between any two of the data sets–in a privacy-preserving manner. More specifically, using our software, the analyst can compute the intersections of each pair of contributed data sets, but nothing else; neither the analyst, any other participant, nor any third party learns any other elements of a participant data owner’s data set.
As part of this project, we also envisioned the creation of a tool that can be used to setup and configure a proposed multiparty computation. This setup tool would be to engage the relevant parties (the analysts who wish to run the computation, the data owners who are contributing their private data, the recipient of the result) to have them work together in a structured process to formulate the parameters of the computation, including what data is to be analyzed with associated relational database metadata, and network configuration data. In several of our previous contracts building and implementing multiparty computations, we have seen that there is a significant amount of effort that must be dedicated in coordinating and orchestrating the parties even before the computation is ready to be executed. The different parties could be storing the same data but with different types (e.g. floats versus integers), or using a different naming convention. Furthermore, each party will need to share its relevant network parameters to other parties so that they are able to communicate with each other as part of the computation. Often this setup can be difficult, even for experts in distributed systems, as it involves different systems maintained by different organizations. Our hope for this setup tool is to reduce the time and effort that is spent by the various parties in preparing their systems to run the desired computation.
We envision that this setup tool (which likely be a website) would be hosted by one of parties wishing to create the computation (e.g. the analyst would then reach out to the data owners to see if they are interested in participating), but we have also considered that Stealth might host a centralized setup tool, which any interested parties are free to create accounts and then use them to configure their desired computation. Once those accounts are created, those parties could then reuse their account to participate in future studies with other parties. This version, hosted by Stealth, could come in use in more public settings from public research institutions, where the parties involved might wish to make public the computations they’ve run and on which data sets they were able to use as part of their research.
As part of our work we developed flow diagrams, written in Business Process Modeling Notation, that illustrate how we imagine such a UI would be configured. Below is the diagram describing the creation of such a study, with the input that is needed by the various parties such as the analyst proposing the study, and the dataowners who would potentially contribute data to the computation.
In addition to our implemented solution, as part of our work on ANISE, Stealth researchers made a theoretical advance in the area of secure merge (https://ia.cr/2022/590), the problem of combining two sorted lists (which are either held separately by two parties, or held by two parties in some privacy-preserving manner, e.g. via secret-sharing), and outputting a single merged (sorted) list in a privacy-preserving manner, e.g. either encrypted or secret-shared among the parties. Our result yields a protocol with security in the semi-honest model that requires O(n) communication and computation complexity, and O(log log n) rounds of communication. These metrics match those of an insecure protocol for communication and computation.
This work was performed under the following financial assistance award number 70NANB21H064 from U.S. Department of Commerce, National Institute of Standards and Technology. The statements, findings, conclusions, and recommendations are those of the author(s) and do not necessarily reflect the views of the National Institute of Standards and Technology or the U.S. Department of Commerce.