Enabling the secondary use of research data to advance scientific discovery while respecting participant privacy has been a priority for both NIH and the public. How can we strike the right balance, maximizing public benefit from research while remaining consistent with the many important scientific and ethical considerations?
NIH has been evaluating the best way for researchers to access genomic summary results (GSR), which, as the name implies, are ‘aggregated’ summary statistics from all participants in a genomic research study or set of studies. GSR differ from some other types of genomic research data in an important way: unlike individual genome sequences, they do not include individual-level information. Instead, GSR come from pooling genomic data from multiple individuals, yielding information such as genotype frequencies and other statistics. This information can help researchers determine which genomic variants might or might not contribute to a disease or disorder.
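To make the idea of pooling concrete, here is a minimal sketch of how individual genotypes at a single variant could be reduced to summary frequencies. The genotype values and the `summarize` helper are hypothetical, made up purely for illustration; this is not how dbGaP or any particular study computes its GSR.

```python
from collections import Counter

# Hypothetical individual-level data: one genotype per participant at a single
# variant, written as the two alleles carried (made-up values for illustration).
genotypes = ["AA", "AG", "GG", "AG", "AA", "AG", "AA", "GG"]


def summarize(genotypes):
    """Pool individual genotypes into aggregate genotype and allele frequencies."""
    n = len(genotypes)
    # Sort the alleles so that "GA" and "AG" count as the same genotype.
    genotype_counts = Counter("".join(sorted(g)) for g in genotypes)
    # Each participant contributes two alleles.
    allele_counts = Counter(allele for g in genotypes for allele in g)
    genotype_freq = {g: c / n for g, c in genotype_counts.items()}
    allele_freq = {a: c / (2 * n) for a, c in allele_counts.items()}
    return genotype_freq, allele_freq


genotype_freq, allele_freq = summarize(genotypes)
print(genotype_freq)
print(allele_freq)
```

Only the two frequency tables would be shared in this sketch; the per-participant list stays behind, which is the sense in which summary results carry no individual-level records.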
Before 2008, these types of GSR were publicly available in the NIH Database of Genotypes and Phenotypes (dbGaP). However, in 2008, an article was published showing that statistical methods applied to GSR could potentially be used to determine whether an individual had participated in a specific research study, provided the person applying them also had access to that individual’s genomic data. Because of this concern, NIH decided that, until it had a better appreciation of the state of the science and the actual risks to research participants, it was best to make GSR available only through controlled access.
Since that time, NIH has convened two workshops to bring together leaders in the field to consider a wide range of issues, including those directly related to GSR. One of the workshops, held in 2016, focused specifically on the risks and benefits of different levels of access to GSR. NIH also solicited broad input in a Request for Information (RFI) earlier in 2017. Based on the recommendations from the workshops and the public comments received through the RFI, NIH has come to realize that many stakeholders believe there is little risk when GSR are made available through unrestricted access (i.e., in an open and public way). However, they also suggested that additional protections should be in place for sensitive studies where there might be additional concerns, such as studies that include populations from isolated geographic areas or with rare or stigmatizing traits.
Based on this input, NIH has developed a proposed update to the access process for GSR under the NIH Genomic Data Sharing Policy and is now seeking public comment. This update would allow GSR from most studies to be provided via a public, rapid-access model. GSR from sensitive studies would remain under controlled access.
To view the request for comments and for instructions on how to comment, please visit: Previously Compiled Public Comments.
NIH encourages comments from all stakeholders and is especially interested in hearing from members of the general public, research participants, and the broader patient community. Comments will be accepted until October 20, 2017. In addition, experts from both OSP and NHGRI will host a webinar on GSR on October 4, during the comment period. More details on this webinar will be provided shortly.
NIH is committed to maximizing the value of government-funded research while ensuring that participant privacy is protected, and we want to take all stakeholder thoughts into account. We look forward to hearing from you!
This blog was co-authored by Dr. Eric Green, Director of the National Human Genome Research Institute (NHGRI). More information about NHGRI can be found at https://www.genome.gov/.
As a genomics researcher, I can say that summary statistics are an amazingly powerful tool that gives extraordinary insight into complex traits. Public access is a key determinant of how quickly our understanding of these traits advances. If we are to maximize the return on investment in biology research and respect all those who participate in these studies, it is critical to make the results easily available to a large audience.
I agree that summary-level data can be extraordinarily useful and that there is very low risk of individual identification as long as the data are provided to researchers who commit that they will not attempt individual identification. In the NCI-funded DRIVE project we made summary data available on our own website because NIH was reluctant to host it for public access. An ongoing issue, however, occurs when NIH-supported data are jointly analyzed with data from other countries not supported by NIH funding. Colleagues in these other countries report that (a) they are told that they are not under data sharing obligations, (b) even when their funding agencies have such obligations on paper, they do not enforce them, and (c) their academic institutions are anxious about public data sharing and forbid it. It would be very helpful if NIH would consider joint policies with the EU, UK funders, etc. to reach uniform policies. Otherwise the summary results have to be redacted to NIH-funded data only before posting.
As a rare cancer patient, I’d like to see as much patient data shared as possible. I’ve participated in NIH and OHSU research protocols, yet I never get to hear the results of the studies. I’m in touch with the few other patients with my inheritable gene mutation. We have no cure and our children could inherit our disease. I personally email researchers all over the world looking for the needle in the haystack to stop my own recurrent tumors. In the forty years since my father’s death I thought there would be some progress on why a Krebs Cycle mutation could result in three different cancers, but there haven’t been any revelations. I personally want my data, my DNA, my tumor specimens shared with anyone who will study my disease.
We’re working on a Medical Imaging exchange platform with the goal of democratizing the data while keeping patients’ privacy in mind, simply through data anonymization.