De-clunking the dbGaP Data Submission and Access Process – We’re All Ears!

Data. It is the essential output of biomedical research that allows us to move science forward and improve human health. It gets a little trickier, however, when the conversation turns to how best to provide researchers with access to that data, especially when you’re trying to balance appropriate protections for human participants in research, who deserve both the maximal use of their data for achieving medical progress and the respectful use of their data in a way that affords privacy protections and consistency with consent. At the NIH, lots of smart people spend a lot of time thinking about human data and how best to manage it. However, we can’t do it alone. We also need help from our stakeholders to solve these difficult issues.

Back in 2007, the National Center for Biotechnology Information (NCBI) developed the database of Genotypes and Phenotypes (dbGaP) to archive and distribute the results of human genotype-phenotype studies that fall under NIH’s policies for sharing genomic data.

The dbGaP is a controlled-access data repository and currently serves as a central portal to submit, locate and request access to genomic and associated phenotypic data. It is a highly utilized, valuable, and rapidly growing resource with over 750 studies available for access. Users of dbGaP have access to a wide range of data types such as microarray, genome-wide association study, whole and targeted genomic, transcriptomic, epigenomic, and metagenomic data. As of January 2017, NIH has approved approximately 28,000 Data Access Requests for over 4,500 investigators from 46 countries.

Over the years, users of the dbGaP system have shared their feedback, and many have expressed a number of frustrations relating to the difficulty in navigating the submission process. To address these concerns, NIH has made a number of improvements to dbGaP (see Box 1). To best serve the needs of the research community and enable robust and responsible data sharing, it is imperative that new resources, tools, and data management models be developed to make the system as user-friendly and efficient as possible, as well as increase its utility.

With this in mind, NIH released today a Request for Information (RFI) seeking public comments on the data submission and access processes for dbGaP, and on the management of data within dbGaP, in order to consider options to improve and streamline these processes.

It is vital that we hear from members of the research community on this topic. We want to take your thoughts and ideas into account when attempting to increase the utility of dbGaP. I invite all stakeholders who currently use or may use dbGaP to provide us with their thoughts. Comments will be accepted until April 7, 2017.

                                                   

Box 1: Recent Improvements/Upgrades to dbGaP

  • Development of standard data use limitations to promote consistent implementation of the consent group categories.
  • Development of fillable Institutional Certification forms to standardize and expedite the Institutional Certification process for institutions.
  • Implementation of user-friendly, electronic study registration, submission, DAR, project renewal, and project close-out forms.
  • Development of the dbGaP Data Browser to enable viewing of controlled-access summary statistics and individual-level genotype and sequence data associated with phenotypic features, by dbGaP approved users, without the need to download datasets.
  • In collaboration with the Global Alliance for Genomics and Health Beacon project, implementation of a simple web interface that allows users to query dbGaP for genomic variants of interest and their presence in the database.
  • Issuance of a Position on the Use of Cloud Computing Services for Storage and Analysis of Controlled-Access Data Subject to the NIH Genomic Data Sharing Policy to allow investigators to request permission to transfer controlled-access genomic data and other associated data obtained from dbGaP to public or private cloud systems for storage and analysis.
  • Creation of search filters for dbGaP datasets (e.g. data use limitations, disease area, data type).
  • Assembly of two data collections that allow investigators to submit a single DAR to gain access to most of the individual-level datasets in dbGaP approved for general research use (currently includes 96 datasets), or only the aggregated data from these datasets.
  • In an effort to promote transparency, the addition of a “Facts & Figures” section on the NIH GDS website to highlight current dbGaP data submission and access statistics, including DAR processing times and data management incidents.
  • Development of a mechanism to establish structured partnerships with external organizations or “trusted partners”.

The Revised Common Rule: A Tribute to the Past and a Promise for the Future

As humans living longer, healthier lives than at any point in history, we all owe a great deal of thanks to the countless volunteers who have served as research participants. These are people who have given of their time, of their bodies, who have accepted risks from the very small to the very large, and who have done so knowing that they might receive no personal benefit. Lifesaving vaccines, cancer therapies, cardiovascular treatments, and every drug, diagnostic, and cure in our modern medical pantheon have been made possible thanks to the willingness of research volunteers: real people with real lives and real loved ones.

And we owe it to those willing to volunteer to have the best possible safeguards in place to ensure that research involving humans is conducted ethically, safely, and equitably. In 1979, the Belmont Report was published, outlining the principles and guidelines for protecting human research participants, built on the foundations of respect for persons, beneficence, and justice. This led to the regulation now known as the “Common Rule,” whose purpose was to ensure that research involving humans was conducted in line with the highest ethical standards and practices. Today, the U.S. government has announced revisions to the Common Rule, the culmination of nearly a decade of rulemaking aimed at improving our system of oversight and facilitating research.

In the world of science policy geeks, there are few Eureka! moments or end zone dances. It is not every day that landmark rules are released. Furthermore, it is rare for significant policy changes to come about not in response to a crisis, but rather in recognition of the evolution of science, our increased understanding of what works and what doesn’t in research oversight, and the changing nature of participant engagement in research. The revision of the Common Rule is the endpoint of years of discussion, debate, thousands of public comments, and, most importantly, a dedication to the very people whom it is designed to protect and to the researchers who serve as their partners in discovery.

‘Twas the Night Before Hanukkah (and Christmas) at NIH

‘Twas the night before Hanukkah and at OSP,

We were lighting menorahs and trimming the tree,

sIRB policy was all tucked in its bed,

In hopes that the deadline extension will alleviate dread.

Wondering whether the Common Rule soon will appear,

Perhaps we will see it in the coming New Year?

The RAC’s streamlined process is now put in place,

To focus on biotech moving at rapid pace.

With 21st Century Cures passed and signed,

On implementation, we’ll spend lots of time.

More rapid than eagles, precision medicine soars,

And we juggle the policy issues that still lay in store.

Now CRISPR! Now data! Now Select Agents in vials!

On chimera! On privacy! On clinical trials!

Speaking of which, we continue to move,

Policies to help clinical trials improve.

From the start of the project! To training and checking!

To the reporting of results we now are expecting!

To protocol templates we hope you will use,

And have you seen our GCP FAQs?

The Moonshot is launched, NSABB report’s been released,

Research with chimps now has been ceased.

Your thoughts on data sharing we still want to hear,

RFI’s been extended until early next year.

Genomic Data Sharing – it’s still running along!

HeLa cell review – it’s still going strong!

While it’s hard to believe this year’s almost over,

We’ll be highlighting biosafety again, next October.

Time to start thinking about next year’s plans,

Because good science and good policy, go hand in hand.

And our policy wonks all just want to say,

“To the NIH world, have a fine holiday!”

The What and How of Data Sharing

In a recent BMJ article, Milton Packer highlights the value of data sharing using iconic scientific figures: Copernicus, whose heliocentric theory of the universe was built using others’ data; his rival, Brahe, who hoarded data for fear of confirming Copernicus’ theory; and Kepler, whose famous laws of planetary motion depended on data sharing. The National Institutes of Health (NIH) has long been a leader in data sharing, and there is a clear clamor for more and better data sharing by NIH and other federal agencies. There is little doubt that sharing biomedical research and health-related data plays a key role in advancing knowledge of human health and well-being. But data sharing is not without cost, and data shared in ways that are not useful to the research enterprise can waste, rather than maximize, resources. This is why we need you, the research community, to help us shape our data sharing strategies for the future.

Consistent with federal initiatives promoting open data and open science, NIH continues to be committed to ensuring that, to the maximum extent possible, the results of federally-funded scientific research are made publicly available to support reuse, reproducibility and discovery. In order to move forward with ongoing commitments to the data sharing enterprise, we are considering priorities for data management and sharing and how to expand upon existing data sharing policies, such as our 2003 Data Sharing Policy. However, we recognize that many factors must be considered when determining what, when, and how data should be managed and shared including, for example, the purpose for sharing, supporting data re-use and reproducibility, maturity of the science, the infrastructure, uniqueness of the data, and ethical considerations.

Today we are publishing a Request for Information (RFI) related to strategies for data management, sharing, and citation. Through this RFI, we are seeking stakeholder feedback on considerations pertaining to what types of data should be shared, the costs and benefits of sharing different types of data, and standards for citation of data and software. Your feedback will help us to prioritize our thinking in data sharing stewardship and will be considered as we move forward in developing new NIH policies in this area.

We need to hear from data users, data generators, and data scientists. By assisting us in this request for information, you can help ensure that NIH has the most robust set of information on hand when making future decisions in this important arena. I encourage all interested stakeholders to review the RFI and provide us with their thoughts. Comments on the RFI will be accepted until December 29, 2016.