Modernized uniform representation of carbohydrate molecules in the Protein Data Bank

Chenghua Shao, Zukang Feng, John D. Westbrook, Ezra Peisach, John Berrisford, Yasuyo Ikegawa, Genji Kurisu, Sameer Velankar, Stephen K. Burley, Jasmine Y. Young

Research output: Contribution to journal, Article, peer-review


Since 1971, the Protein Data Bank (PDB) has served as the single global archive for experimentally determined 3D structures of biological macromolecules made freely available to the global community according to the FAIR principles of Findability-Accessibility-Interoperability-Reusability. During the first 50 years of continuous PDB operations, standards for data representation have evolved to better represent rich and complex biological phenomena. Carbohydrate molecules present in more than 14,000 PDB structures have recently been reviewed and remediated to conform to a new standardized format. This machine-readable data representation for carbohydrates occurring in the PDB structures and the corresponding reference data improves the findability, accessibility, interoperability and reusability of structural information pertaining to these molecules. The PDB Exchange MacroMolecular Crystallographic Information File data dictionary now supports (i) standardized atom nomenclature that conforms to International Union of Pure and Applied Chemistry-International Union of Biochemistry and Molecular Biology (IUPAC-IUBMB) recommendations for carbohydrates, (ii) uniform representation of branched entities for oligosaccharides, (iii) commonly used linear descriptors of carbohydrates developed by the glycoscience community and (iv) annotation of glycosylation sites in proteins. For the first time, carbohydrates in PDB structures are consistently represented as collections of standardized monosaccharides, which precisely describe oligosaccharide structures and enable improved carbohydrate visualization, structure validation, robust quantitative and qualitative analyses, search for dendritic structures and classification. The uniform representation of carbohydrate molecules in the PDB described herein will facilitate broader usage of the resource by the glycoscience community and researchers studying glycoproteins.

Original language: American English
Pages (from-to): 1204-1218
Number of pages: 15
Issue number: 9
State: Published - Sep 1 2021

ASJC Scopus subject areas

  • Biochemistry


Keywords

  • Protein Data Bank
  • carbohydrate structure
  • glycan
  • glycosylation
  • oligosaccharide

