Under The Poliscope: Bringing Science Policy Into Focus

De-clunking the dbGaP Data Submission and Access Process – We’re All Ears!

February 21, 2017

Data. It is the essential output of biomedical research that allows us to move science forward and improve human health. It gets a little trickier, however, when the conversation turns to how best to provide researchers with access to that data, especially when you’re trying to balance appropriate protections for human participants in research, who deserve both the maximal use of their data for achieving medical progress and the respectful use of their data in a way that affords privacy protections and consistency with consent. At the NIH, lots of smart people spend a lot of time thinking about human data and how best to manage it. However, we can’t do it alone. We also need help from our stakeholders to solve these difficult issues.

Back in 2007, the National Center for Biotechnology Information (NCBI) developed the database of Genotypes and Phenotypes (dbGaP) to archive and distribute the results of human genome-phenotype studies that fall under NIH’s policies for sharing genomic data.

dbGaP is a controlled-access data repository that currently serves as a central portal for submitting, locating, and requesting access to genomic and associated phenotypic data. It is a highly utilized, valuable, and rapidly growing resource with over 750 studies available for access. Users of dbGaP have access to a wide range of data types such as microarray, genome-wide association study, whole and targeted genomic, transcriptomic, epigenomic, and metagenomic data. As of January 2017, NIH has approved approximately 28,000 Data Access Requests (DARs) for over 4,500 investigators from 46 countries.

Over the years, users of the dbGaP system have shared their feedback, and many have expressed a number of frustrations relating to the difficulty in navigating the submission process. To address these concerns, NIH has made a number of improvements to dbGaP (see Box 1). To best serve the needs of the research community and enable robust and responsible data sharing, it is imperative that new resources, tools, and data management models be developed to make the system as user-friendly and efficient as possible, as well as increase its utility.

With this in mind, NIH released today a Request for Information (RFI) seeking public comments on the data submission and access processes for dbGaP, and on the management of data within dbGaP, in order to consider options to improve and streamline these processes. To view the RFI and for instructions on how to comment, please visit here.

It is vital that we hear from members of the research community on this topic. We want to take your thoughts and ideas into account when attempting to increase the utility of dbGaP. I invite all stakeholders who currently use or may use dbGaP to provide us with their thoughts. Comments will be accepted until April 7, 2017.


Box 1: Recent Improvements/Upgrades to dbGaP

  • Development of standard data use limitations to promote consistent implementation of the consent group categories.
  • Development of fillable Institutional Certification forms to standardize and expedite the Institutional Certification process for institutions.
  • Implementation of user-friendly, electronic study registration, submission, DAR, project renewal, and project close-out forms.
  • Development of the dbGaP Data Browser to enable viewing of controlled-access summary statistics and individual-level genotype and sequence data associated with phenotypic features, by dbGaP approved users, without the need to download datasets.
  • In collaboration with the Global Alliance for Genomics and Health Beacon project, implementation of a simple web interface that allows users to query dbGaP for genomic variants of interest and their presence in the database.
  • Issuance of a Position on the Use of Cloud Computing Services for Storage and Analysis of Controlled-Access Data Subject to the NIH Genomic Data Sharing Policy to allow investigators to request permission to transfer controlled-access genomic data and other associated data obtained from dbGaP to public or private cloud systems for storage and analysis.
  • Creation of search filters for dbGaP datasets (e.g., data use limitations, disease area, data type).
  • Assembly of two data collections that allow investigators to submit a single DAR to gain access to most of the individual-level datasets in dbGaP approved for general research use (currently 96 datasets), or only the aggregated data from these datasets.
  • In an effort to promote transparency, the addition of a “Facts & Figures” section on the NIH GDS website to highlight current dbGaP data submission and access statistics, including DAR processing times and data management incidents.
  • Development of a mechanism to establish structured partnerships with external organizations or “trusted partners”.



Thank you for this post. It is very helpful. I have a suggestion that a single master guidance document be developed that would provide step-by-step guidance for what and how to get data out and another for how to put data in. There are, at this time, multiple documents for the above purposes. Can this process be simplified or perhaps we need to be realistic and recognize that this proposal will never be possible?
For over a year we have tried to deposit sequence data from a study that was published in Cell. With no response from our queries we eventually decided to deposit it in a European database to fulfill our promise to open data. Pretty embarrassing.


Carrie D. Wolinetz, Ph.D.
Associate Director for Science Policy, NIH
