Data Confidentiality Issues and Solutions in Academic Research Computing

Many universities need to compute with “sensitive” data, such as data containing protected health information (PHI), personally identifiable information (PII), or proprietary information. Sometimes this data is subject to legal restrictions, such as those imposed by HIPAA, CUI, FISMA, DFARS, GDPR, or the CCPA; at other times, the data simply may not be shared under the terms of a data use agreement. It may be tempting to think that such data is typically found only in DOD- and NIH-funded research, but this assumption is far from reality. The issue arises in domains that come to mind immediately, such as medical research, and in many others as well: economics, sociology, and other social sciences that might examine financial data, student data, or psychological records; chemistry and biology, particularly work relating to genomic analysis and pharmaceuticals, manufacturing, and materials; engineering analyses, such as airflow dynamics; underwater acoustics; and even computer science and data analysis, including advanced AI research, quantum computing, and research involving system and network logs. Such research is funded by an array of sponsors, including the National Science Foundation (NSF) and private foundations.

The report examined both the varying needs involved in analyzing sensitive data and a variety of solutions currently in use, ranging from campus- and PI-operated clusters, to cloud and third-party computing environments, to technologies such as secure multiparty computation and differential privacy. We also discussed the procedural and policy issues that campuses face in handling sensitive data.

Read more at the Trusted CI Blog.