CIBR: BBSRC-NSF/BIO: Next generation Protein Data Bank - FACT infrastructure with value added FAIR data supporting diverse research and education user communities.

Project Details


This US RCSB Protein Data Bank (RCSB PDB)/Protein Data Bank in Europe (PDBe) collaborative project aims to improve data deposition, delivery, and management of three-dimensional (3D) macromolecular structure information stored in the single global public data resource known as the Protein Data Bank (PDB). The PDB currently houses ~160,000 experimentally determined 3D structures of proteins and nucleic acids. It is managed according to the FAIR Principles on an open access basis by the Worldwide Protein Data Bank (wwPDB) partnership, which includes RCSB PDB, PDBe, Protein Data Bank Japan, and BioMagResBank. This project will transform the relatively static PDB archive into a living data resource composed of 3D structure information integrated with up-to-date annotations and metadata that relate each structure to its biochemical, cellular, or organismal context. Accurate data integration will help basic and applied researchers coming from agriculture to zoology, who are not experts in structural and computational biology, better understand and more fully utilize 3D structures in their day-to-day work in wet and dry laboratories. This new data repository will also enable 3D structural comparisons of proteins from different organisms with similar or identical biochemical and biological functions, thereby improving our understanding of protein evolution at the atomic level. The wwPDB maintains an information portal describing PDB global resources at

The project addresses significant software engineering challenges resulting from growth in the number and size/complexity of newly deposited macromolecular crystallography (MX) and single-particle cryo-electron microscopy (3DEM) structures, and from the need to manage groups of related structures coming from serial femtosecond X-ray crystallography (SFX) using X-ray Free Electron Lasers (XFEL) and 3DEM. The project will improve the fidelity and completeness of 3D structure data deposited into the PDB by harvesting data automatically from structure determination software packages and streamlining the wwPDB data deposition, validation, and biocuration system known as OneDep. The project will improve the accessibility of PDB data for researchers, educators, and students by extending chemical metadata for small-molecule ligands (e.g., bound cofactors and inhibitors), incorporating enhanced descriptions of macromolecular assemblies, grouping related PDB structures into investigations for more efficient, parallel data delivery, and creating a 'Next Generation' PDB data repository with up-to-date metadata. Finally, the project will modernize wwPDB information technology infrastructure to future-proof PDB data management and weekly PDB archive release to the public domain by developing new application programming interfaces (APIs) and microservices infrastructure, and updating existing mechanisms for synchronization of data and software across wwPDB data centers in the US, Europe, and Asia. This work will directly benefit researchers, educators, and their students across the natural, physical, and engineering sciences.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Effective start/end date: 7/1/20 to 6/30/23


  • National Science Foundation: $1,612,028.00

