Dataset Description: Hetionet is an integrative network of biomedical knowledge assembled from 29 different databases of genes, compounds, diseases, and more. The network combines over 50 years of biomedical information into a single resource. TDC processes the dataset into a list of triplets, where each row contains the source_type, source_id, target_type, target_id, relation type, and direction of the relation.
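To make the row layout concrete, here is a minimal sketch of how such triplets might be represented and queried in plain Python. The field names mirror the columns listed above. Everything else is an assumption for illustration: the node IDs, the relation labels, the direction values ("forward", "both"), and the helper functions are hypothetical and are not taken from the Hetionet data or the TDC API.

```python
from collections import Counter, namedtuple

# Field names follow the row layout described above. The values below are
# made-up placeholders, not real Hetionet records.
Triplet = namedtuple(
    "Triplet",
    ["source_type", "source_id", "target_type", "target_id", "relation", "direction"],
)

rows = [
    Triplet("Compound", "C1", "Disease", "D1", "treats", "forward"),
    Triplet("Compound", "C1", "Gene", "G1", "binds", "forward"),
    Triplet("Gene", "G1", "Gene", "G2", "interacts", "both"),
    Triplet("Disease", "D1", "Gene", "G2", "associates", "both"),
]


def count_relations(triplets):
    """Count how many rows carry each relation label."""
    return Counter(t.relation for t in triplets)


def node_set(triplets):
    """Collect distinct nodes, keyed by (type, id) so IDs may repeat across types."""
    nodes = set()
    for t in triplets:
        nodes.add((t.source_type, t.source_id))
        nodes.add((t.target_type, t.target_id))
    return nodes


def neighbors(triplets, node_type, node_id):
    """List (relation, type, id) reachable from a node.

    Assumption: a row marked "both" can be traversed from either end,
    while any other direction value is followed source -> target only.
    """
    node = (node_type, node_id)
    found = []
    for t in triplets:
        if (t.source_type, t.source_id) == node:
            found.append((t.relation, t.target_type, t.target_id))
        elif t.direction == "both" and (t.target_type, t.target_id) == node:
            found.append((t.relation, t.source_type, t.source_id))
    return found
```

Keying nodes by the (type, id) pair rather than the ID alone avoids collisions if two node types happen to share an identifier. The same counting helpers could be applied to the real triplet list to cross-check node and relation totals, provided the rows are first converted to this shape.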

Dataset Statistics: 47,031 nodes (11 types) and 2,250,197 relationships (24 types).

# Load the HetioNet knowledge graph through the TDC resource loader.
from tdc.resource import BioKG
data = BioKG(name='HetioNet')


[1] Himmelstein, Daniel Scott, et al. “Systematic integration of biomedical knowledge prioritizes drugs for repurposing.” eLife 6 (2017): e26726.

[2] Himmelstein, Daniel S., and Sergio E. Baranzini. “Heterogeneous network edge prediction: a data integration approach to prioritize disease-associated genes.” PLoS Comput Biol 11.7 (2015): e1004259.

Dataset License: CC BY 4.0.