TY - JOUR
T1 - PDBx/mmCIF Ecosystem
T2 - Foundational Semantic Tools for Structural Biology
AU - Westbrook, John
AU - Young, Jasmine Y.
AU - Shao, Chenghua
AU - Feng, Zukang
AU - Guranovic, Vladimir
AU - Lawson, Catherine L.
AU - Vallat, Brinda
AU - Adams, Paul D.
AU - Berrisford, John M.
AU - Bricogne, Gerard
AU - Diederichs, Kay
AU - Joosten, Robbie P.
AU - Keller, Peter
AU - Moriarty, Nigel W.
AU - Sobolev, Oleg V.
AU - Velankar, Sameer
AU - Vonrhein, Clemens
AU - Waterman, David G.
AU - Kurisu, Genji
AU - Berman, Helen M.
AU - Burley, Stephen K.
AU - Peisach, Ezra
N1 - Funding Information: RCSB PDB is funded by the National Science Foundation [DBI-1832184; P.I.: S.K.B.], the US Department of Energy [DE-SC0019749; P.I.: S.K.B.], and the National Cancer Institute, National Institute of Allergy and Infectious Diseases, and National Institute of General Medical Sciences of the National Institutes of Health [R01GM133198; P.I.: S.K.B.]. Development of the PDBx/mmCIF data dictionary is funded in part by [DBI-2019297; P.I.: S.K.B.]. PDBe is funded by European Molecular Biology Laboratory-European Bioinformatics Institute; Wellcome Trust [104948]; Biotechnology and Biological Sciences Research Council [BB/N019172/1, BB/G022577/1, BB/J007471/1, BB/K016970/1, BB/K020013/1, BB/M013146/1, BB/M011674/1, BB/M020347/1, BB/M020428/1, BB/P024351/1]; European Union [284209]; ELIXIR; and Open Targets. PDBj is funded by the NBDC-JST [P.I.: G.K.] and partially by BINDS-AMED [P.I.: G.K.]. The authors would like to acknowledge the early work by Syd Hall, Paula Fitzgerald, Brian McMahon, Keith Watenpaugh, Phil Bourne, and many others who were instrumental in the development of the original mmCIF data standard. Publisher Copyright: © 2022 The Authors
PY - 2022/6/15
Y1 - 2022/6/15
N2 - PDBx/mmCIF, Protein Data Bank Exchange (PDBx) macromolecular Crystallographic Information Framework (mmCIF), has become the data standard for structural biology. With its early roots in the domain of small-molecule crystallography, PDBx/mmCIF provides an extensible data representation that is used for deposition, archiving, remediation, and public dissemination of experimentally determined three-dimensional (3D) structures of biological macromolecules by the Worldwide Protein Data Bank (wwPDB, wwpdb.org). Extensions of PDBx/mmCIF are similarly used for computed structure models by ModelArchive (modelarchive.org), integrative/hybrid structures by PDB-Dev (pdb-dev.wwpdb.org), small angle scattering data by the Small Angle Scattering Biological Data Bank (SASBDB, sasbdb.org), and for computed models generated with the AlphaFold 2.0 deep learning software suite (alphafold.ebi.ac.uk). Community-driven development of PDBx/mmCIF spans three decades, involving contributions from researchers, software and methods developers in structural sciences, data repository providers, scientific publishers, and professional societies. Having a semantically rich and extensible data framework for representing a wide range of structural biology experimental and computational results, combined with expertly curated 3D biostructure data sets in public repositories, accelerates the pace of scientific discovery. Herein, we describe the architecture of the PDBx/mmCIF data standard, the tools used to maintain representations of the data standard, governance, and the processes by which data content standards are extended, plus community tools and software libraries available for processing and checking the integrity of PDBx/mmCIF data. Use cases exemplify how the members of the Worldwide Protein Data Bank have used PDBx/mmCIF as the foundation for their pipeline for delivering Findable, Accessible, Interoperable, and Reusable (FAIR) data to many millions of users worldwide.
KW - biological data
KW - data management
KW - data standard
KW - macromolecular structure
KW - protein data bank (PDB)
UR - http://www.scopus.com/inward/record.url?scp=85129924628&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85129924628&partnerID=8YFLogxK
U2 - https://doi.org/10.1016/j.jmb.2022.167599
DO - https://doi.org/10.1016/j.jmb.2022.167599
M3 - Review article
C2 - 35460671
SN - 0022-2836
VL - 434
JO - Journal of Molecular Biology
JF - Journal of Molecular Biology
IS - 11
M1 - 167599
ER -