Abstract
Predicting the physical interaction of proteins is a cornerstone problem in computational biology. New classes of learning-based algorithms are actively being developed, and are typically trained end-to-end on protein complex structures extracted from the Protein Data Bank. These training datasets tend to be large and difficult to use for prototyping and, unlike image or natural language datasets, they are not easily interpretable by non-experts. We present Dock2D-IP and Dock2D-IF, two “toy” datasets that can be used to select algorithms that predict protein-protein interactions or, more generally, any other type of molecular interaction. Both datasets use two-dimensional shapes as input. Each example from Dock2D-IP (“interaction pose”) describes the interaction pose of two shapes known to interact, and each example from Dock2D-IF (“interaction fact”) describes whether two shapes form a stable complex or not, regardless of how they bind.
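To make the difference between the two datasets concrete, the following is a minimal sketch of what one example from each might look like. It is an illustration only, not the datasets' actual format: the class names, the field names, the grid representation of a shape, and the encoding of a pose as a rotation angle plus a translation are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical representation: a 2D shape as a square grid of occupancy values.
Grid = List[List[float]]


@dataclass
class InteractionPoseExample:
    """Assumed layout of a Dock2D-IP example: two shapes known to interact, plus their pose."""
    receptor: Grid
    ligand: Grid
    rotation: float               # assumed: ligand rotation angle, in radians
    translation: Tuple[int, int]  # assumed: ligand shift (dx, dy), in grid cells


@dataclass
class InteractionFactExample:
    """Assumed layout of a Dock2D-IF example: two shapes and a binary label, with no pose."""
    receptor: Grid
    ligand: Grid
    interacts: bool               # True if the two shapes form a stable complex


def blank_grid(size: int) -> Grid:
    """Return an empty size-by-size grid (placeholder shape for the sketch)."""
    return [[0.0] * size for _ in range(size)]


# One illustrative record of each kind, with placeholder values.
ip_example = InteractionPoseExample(
    receptor=blank_grid(4), ligand=blank_grid(4), rotation=0.5, translation=(1, -2)
)
if_example = InteractionFactExample(
    receptor=blank_grid(4), ligand=blank_grid(4), interacts=False
)
```

Under these assumptions, a model for the interaction-pose task would regress or score the pose fields, while a model for the interaction-fact task would only predict the binary label.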
| Original language | American English |
|---|---|
| Pages (from-to) | 1-8 |
| Number of pages | 8 |
| Journal | IEEE/ACM Transactions on Computational Biology and Bioinformatics |
| DOIs | |
| State | Accepted/In press - 2024 |
| Externally published | Yes |
ASJC Scopus subject areas
- Biotechnology
- Genetics
- Applied Mathematics
Keywords
- Computer vision
- Deep learning
- Prediction algorithms
- Predictive models
- Proteins
- Shape
- Training
- Vectors
- deep convolutional neural networks
- molecular docking
- protein-protein interactions