Dock2D: Synthetic data for the molecular recognition problem

Research output: Contribution to journal › Article › peer-review

Abstract

Predicting the physical interaction of proteins is a cornerstone problem in computational biology. New classes of learning-based algorithms are actively being developed and are typically trained end-to-end on protein complex structures extracted from the Protein Data Bank. These training datasets tend to be large and difficult to use for prototyping and, unlike image or natural language datasets, they are not easily interpretable by non-experts. We present Dock2D-IP and Dock2D-IF, two “toy” datasets that can be used to select algorithms predicting protein-protein interactions, or any other type of molecular interaction. Using two-dimensional shapes as input, each example from Dock2D-IP (“interaction pose”) describes the interaction pose of two shapes known to interact, and each example from Dock2D-IF (“interaction fact”) describes whether two shapes form a stable complex or not, regardless of how they bind.

Original language: American English
Pages (from-to): 1-8
Number of pages: 8
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
State: Accepted/In press - 2024
Externally published: Yes

ASJC Scopus subject areas

  • Biotechnology
  • Genetics
  • Applied Mathematics

Keywords

  • Computer vision
  • Deep learning
  • Prediction algorithms
  • Predictive models
  • Proteins
  • Shape
  • Training
  • Vectors
  • deep convolutional neural networks
  • molecular docking
  • protein-protein interactions
